Container Service for Kubernetes (ACK) provides the topology-aware CPU scheduling feature based on the new Kubernetes scheduling framework. This feature can improve the performance of CPU-sensitive workloads. This topic describes how to enable topology-aware CPU scheduling.
How it works
Multiple pods can run on a node in a Kubernetes cluster, and some of the pods may belong to CPU-intensive workloads. In this case, pods compete for CPU resources. When the competition becomes intense, the CPU cores that are allocated to each pod may frequently change. This situation intensifies when Non-Uniform Memory Access (NUMA) nodes are used. These changes degrade the performance of the workloads. The Kubernetes CPU manager provides a CPU scheduling solution to fix this issue within a node. However, the Kubernetes CPU manager cannot find the optimal way to allocate CPU cores within a cluster. The Kubernetes CPU manager works only on guaranteed pods and does not apply to other types of pods. In a guaranteed pod, each container is configured with a CPU request and a CPU limit, and the request and limit are set to the same value.
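For reference, the following is a minimal sketch of a guaranteed pod: every container sets a CPU and memory request equal to its limit, so the pod receives the Guaranteed QoS class and the Kubernetes CPU manager can pin CPU cores for it. The image and resource values are placeholders for illustration only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo
spec:
  containers:
  - name: app
    image: nginx          # placeholder image
    resources:
      requests:
        cpu: "4"          # requests equal limits, so the pod
        memory: 8Gi       # gets the Guaranteed QoS class
      limits:
        cpu: "4"
        memory: 8Gi
```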
Topology-aware CPU scheduling applies to the following scenarios:
The workload is compute-intensive.
The application is CPU-sensitive.
The workload runs on multi-core Elastic Compute Service (ECS) bare metal instances with Intel CPUs or AMD CPUs.
To test topology-aware CPU scheduling, stress tests are performed on two NGINX applications, each of which requests 4 CPU cores and 8 GB of memory. One application is deployed on an ECS bare metal instance with 104 Intel CPU cores and the other is deployed on an ECS bare metal instance with 256 AMD CPU cores. The results show that application performance improves by 22% to 43% when topology-aware CPU scheduling is enabled. The following table shows the details.
| Performance metric | Intel | AMD |
| --- | --- | --- |
| QPS | Improved by 22.9% | Improved by 43.6% |
| AVG RT | Reduced by 26.3% | Reduced by 42.5% |
Important: Different applications have different sensitivities to the CPU core binding policy. The data from the preceding tests is for reference only. We recommend that you deploy your application in a stable environment, adjust the stress level based on the device type and other environmental factors so that the application runs normally, and then compare the performance statistics to decide whether to enable topology-aware CPU scheduling.
When you enable topology-aware CPU scheduling, you can set cpu-policy to static-burst in the template.metadata.annotations section of the Deployment object or in the metadata.annotations section of the Pod object to adjust the automatic CPU core binding policy. This policy is suitable for compute-intensive workloads and efficiently reduces CPU core contention among processes and memory access across NUMA nodes. It maximizes the utilization of fragmented CPU resources and optimizes resource allocation for compute-intensive workloads without modifications to the hardware or VM resources, which further improves CPU utilization.
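For illustration, the following minimal Pod sketch shows where the annotations are placed. The cpuset-scheduler annotation, which is described in the procedure later in this topic, enables the feature; the image name and resource values are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-policy-demo
  annotations:
    cpuset-scheduler: "true"   # enable topology-aware CPU scheduling
    cpu-policy: "static-burst" # automatic CPU core binding policy
spec:
  containers:
  - name: app
    image: nginx               # placeholder image
    resources:
      limits:
        cpu: "4"               # the CPU limit must be an integer
```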
For more information about how topology-aware CPU scheduling is implemented, see Practice of Fine-grained Cgroups Resources Scheduling in Kubernetes.
Prerequisites
An ACK Pro cluster is created. For more information, see Create an ACK Pro cluster.
ack-koordinator (FKA ack-slo-manager) is installed. For more information, see ack-koordinator (FKA ack-slo-manager).
Note: ack-koordinator is upgraded and optimized based on resource-controller. You must uninstall resource-controller after you install ack-koordinator. For more information about how to uninstall resource-controller, see resource-controller.
Limits
The following table describes the versions that are required for the system components.
| Component | Version |
| --- | --- |
| Kubernetes | ≥ 1.18 |
| ack-koordinator | ≥ 0.2.0 |
Billing
No fee is charged when you install and use the ack-koordinator component. However, fees may be charged in the following scenarios:
ack-koordinator is a non-managed component that occupies worker node resources after it is installed. You can specify the amount of resources requested by each module when you install the component.
By default, ack-koordinator exposes the monitoring metrics of features such as resource profiling and fine-grained scheduling as Prometheus metrics. If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, these metrics are considered custom metrics and fees are charged for them. The fee depends on factors such as the size of your cluster and the number of applications. Before you enable Prometheus metrics, we recommend that you read the Billing topic of Managed Service for Prometheus to learn the free quota and billing rules of custom metrics. For more information about how to monitor and manage resource usage, see Query the amount of observable data and bills.
Usage notes
Before you enable topology-aware CPU scheduling, make sure that ack-koordinator is deployed.
When you enable topology-aware CPU scheduling, make sure that cpu-policy=none is configured for the nodes.
To limit pod scheduling, add the nodeSelector parameter (see the example after these notes).
Important: Do not add the nodeName parameter, because it cannot be parsed by the pod scheduler when topology-aware CPU scheduling is enabled.
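For example, a pod can be limited to a labeled node with nodeSelector, as in the following sketch. The policy=intel label matches the label used in the test later in this topic, and the image is a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-demo
  annotations:
    cpuset-scheduler: "true"   # enable topology-aware CPU scheduling
spec:
  nodeSelector:
    policy: intel              # schedule only to nodes with this label; do not use nodeName
  containers:
  - name: app
    image: nginx               # placeholder image
    resources:
      limits:
        cpu: "2"               # integer CPU limit
```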
Enable topology-aware CPU scheduling
Before you enable topology-aware CPU scheduling, you need to configure the annotations and containers parameters when you configure pods. Perform the following steps to enable topology-aware CPU scheduling:
Set cpuset-scheduler to true in the template.metadata.annotations section of the Deployment object or in the metadata.annotations section of the Pod object to enable topology-aware CPU scheduling.
Set the resources.limits.cpu parameter in the containers section to an integer.
Create a file named go-demo.yaml based on the following content and configure the Deployment to use topology-aware CPU scheduling.
Important: You need to configure pod annotations in the template.metadata section of the Deployment.
When you configure topology-aware CPU scheduling, you can set cpu-policy to static-burst in the annotations section to adjust the automatic CPU core binding policy. To use this setting, delete the number sign (#) before cpu-policy.
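The original file content is not reproduced in this extract. The following is a minimal sketch of such a Deployment, assuming a placeholder image name that you must replace with your own. The cpuset-scheduler annotation enables topology-aware CPU scheduling, the commented cpu-policy line can be uncommented to apply the static-burst policy, and the CPU limit is an integer.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-demo
  template:
    metadata:
      annotations:
        cpuset-scheduler: "true"    # enable topology-aware CPU scheduling
        #cpu-policy: "static-burst" # uncomment to use the automatic CPU core binding policy
      labels:
        app: go-demo
    spec:
      containers:
      - name: go-demo
        image: registry.example.com/go-demo:latest  # placeholder image; replace with your own
        resources:
          requests:
            cpu: "4"
          limits:
            cpu: "4"                # must be an integer
```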
Run the following command to create a Deployment:
kubectl create -f go-demo.yaml
Verify topology-aware CPU scheduling
In this example, the following conditions apply:
The Kubernetes version of the ACK Pro cluster is 1.20.
Two cluster nodes are used in the test. One is used as the load generator. The other runs the workloads and serves as the tested machine.
The following deployment and stress testing commands are for reference only. Adjust the resource specifications and stress level to your experimental environment so that the application remains in a normal state before you proceed with the experiment.
Run the following command to add a label to the tested machine. Set the label value to intel or amd based on the CPU type of the tested machine:
kubectl label node 192.168.XX.XX policy=intel/amd
Deploy the NGINX service on the tested machine.
Use the following YAML templates to create resources for the NGINX service:
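The original YAML templates are not reproduced in this extract. The following is a minimal sketch of the three files under the assumptions stated in this topic: each NGINX pod requests 4 CPU cores and 8 GB of memory, the Service uses NodePort 32257, and the pod is pinned to the labeled tested machine through nodeSelector. The NGINX configuration and other details are placeholders that you should adapt to your environment, then split the documents into service.yaml, configmap.yaml, and nginx.yaml.

```yaml
# service.yaml: expose NGINX through NodePort 32257
apiVersion: v1
kind: Service
metadata:
  name: nginx-service-nodeport
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 32257
---
# configmap.yaml: NGINX configuration (placeholder content)
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configmap
data:
  nginx_conf: |-
    user  nginx;
    worker_processes  4;
    events {
        worker_connections  65535;
    }
    http {
        server {
            listen  80;
        }
    }
---
# nginx.yaml: the tested workload, requesting 4 CPU cores and 8 GB of memory
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        policy: intel            # or amd, matching the label you added
      containers:
      - name: nginx
        image: nginx             # public NGINX image
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "4"
            memory: 8Gi
          limits:
            cpu: "4"
            memory: 8Gi
        volumeMounts:
        - mountPath: /etc/nginx/nginx.conf
          name: config
          subPath: nginx.conf
      volumes:
      - name: config
        configMap:
          name: nginx-configmap
          items:
          - key: nginx_conf
            path: nginx.conf
```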
Run the following command to create the resources that are provisioned for the NGINX service:
kubectl create -f service.yaml
kubectl create -f configmap.yaml
kubectl create -f nginx.yaml
Log on to the load generator, download the wrk2 open source stress test tool, and decompress the package. For more information, see the wrk2 official site.
Note: For more information about how to log on to a node, see Connect to an instance by using VNC or Connect to a Windows instance by using a username and password.
Run the following command to perform stress tests and record the test data:
wrk --timeout 2s -t 20 -c 100 -d 60s --latency http://<IP address of the tested machine>:32257
Expected output:
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   600.58us    3.07ms 117.51ms   99.74%
    Req/Sec    10.67k     2.38k   22.33k    67.79%
  Latency Distribution
     50%  462.00us
     75%  680.00us
     90%  738.00us
     99%    0.90ms
  12762127 requests in 1.00m, 10.10GB read
Requests/sec: 212350.15
Transfer/sec:    172.13MB
Run the following command to delete the NGINX Deployment:
kubectl delete deployment nginx
Expected output:
deployment "nginx" deleted
Use the following YAML template to deploy an NGINX Deployment with topology-aware CPU scheduling enabled:
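The original template is likewise not reproduced in this extract. As a sketch, it is the same NGINX Deployment as above (volume configuration omitted for brevity) with the cpuset-scheduler annotation added to the pod template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      annotations:
        cpuset-scheduler: "true"   # enable topology-aware CPU scheduling
      labels:
        app: nginx
    spec:
      nodeSelector:
        policy: intel              # or amd, matching the label you added
      containers:
      - name: nginx
        image: nginx               # public NGINX image
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "4"
            memory: 8Gi
          limits:
            cpu: "4"               # integer CPU limit is required
            memory: 8Gi
```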
Run the following command to perform stress tests and record the test data for comparison:
wrk --timeout 2s -t 20 -c 100 -d 60s --latency http://<IP address of the tested machine>:32257
Expected output:
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   345.79us    1.02ms  82.21ms   99.93%
    Req/Sec    15.33k     2.53k   25.84k    71.53%
  Latency Distribution
     50%  327.00us
     75%  444.00us
     90%  479.00us
     99%  571.00us
  18337573 requests in 1.00m, 14.52GB read
Requests/sec: 305119.06
Transfer/sec:    247.34MB
Compare the data of the preceding tests. This comparison indicates that the performance of the NGINX service is improved by 43% after topology-aware CPU scheduling is enabled.
Verify that the automatic CPU core binding policy improves performance
In this example, the automatic CPU core binding policy is configured for a workload that runs on a node with 64 CPU cores. After you configure this policy for an application that has topology-aware CPU scheduling enabled, CPU usage can be further improved by 7% to 8%.
The following deployment and stress testing commands are for reference only. Adjust the resource specifications to your experimental environment so that the application remains in a normal state before you proceed with the experiment.
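The baseline application queried in the following steps is a workload named cal-pi that computes pi with 3,000 threads and does not have the CPU core binding policy configured. Its YAML is not included in this extract; the following is a minimal, hypothetical sketch with a placeholder image and placeholder resource values, shown only to make the comparison reproducible.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cal-pi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cal-pi
  template:
    metadata:
      labels:
        app: cal-pi
    spec:
      containers:
      - name: cal-pi
        image: registry.example.com/cal-pi:latest  # placeholder image; replace with your own pi-computation image
        resources:
          requests:
            cpu: "4"             # placeholder resource values
          limits:
            cpu: "4"
```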
Run the following command to query the pods:
kubectl get pods | grep cal-pi
Expected output:
NAME           READY   STATUS    RESTARTS   AGE
cal-pi-d****   1/1     Running   1          9h
Run the following command to query the log of the cal-pi-d**** application:
kubectl logs cal-pi-d****
Expected output:
computing Pi with 3000 Threads...computed the first 20000 digets of pi in 620892 ms!
the first digets are: 3.14159264
writing to pi.txt... finished!
Use topology-aware CPU scheduling.
Configure the Deployment to use topology-aware CPU scheduling and configure the automatic CPU core binding policy. For more information, see Enable topology-aware CPU scheduling.
Create a file named go-demo.yaml based on the following content and configure the Deployment to use topology-aware CPU scheduling.
Important: You need to configure pod annotations in the template.metadata section of the Deployment.
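The file content follows the go-demo.yaml sketch in Enable topology-aware CPU scheduling, with the cpu-policy annotation uncommented so that the static-burst policy takes effect. The image name is a placeholder that you must replace with your own.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-demo
  template:
    metadata:
      annotations:
        cpuset-scheduler: "true"   # enable topology-aware CPU scheduling
        cpu-policy: "static-burst" # automatic CPU core binding policy
      labels:
        app: go-demo
    spec:
      containers:
      - name: go-demo
        image: registry.example.com/go-demo:latest  # placeholder image; replace with your own
        resources:
          requests:
            cpu: "4"
          limits:
            cpu: "4"               # must be an integer
```

Run the following command to create a Deployment: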
kubectl create -f go-demo.yaml
Run the following command to query the pods:
kubectl get pods | grep go-demo
Expected output:
NAME            READY   STATUS    RESTARTS   AGE
go-demo-e****   1/1     Running   1          9h
Run the following command to query the log of the go-demo-e**** application:
kubectl logs go-demo-e****
Expected output:
computing Pi with 3000 Threads...computed the first 20000 digets of pi in 571221 ms!
the first digets are: 3.14159264
writing to pi.txt... finished!
Compare the log data with the log of the cal-pi application. The computation time drops from 620892 ms to 571221 ms, which indicates that the performance of the pod configured with the automatic CPU core binding policy improves by about 8%.
References
Kubernetes is unaware of the topology of GPU resources on nodes. Therefore, Kubernetes schedules GPU resources in a random manner. As a result, the GPU acceleration for training jobs considerably varies based on the scheduling results of GPU resources. To avoid this situation, ACK supports topology-aware GPU scheduling based on the scheduling framework of Kubernetes. You can use this feature to select multiple GPUs from GPU-accelerated nodes to achieve optimal GPU acceleration for training jobs. For more information, see Topology-aware GPU scheduling.