Container Service for Kubernetes (ACK) provides service level objective (SLO)-aware workload scheduling based on the ack-koordinator component. You can use SLO-aware workload scheduling to colocate online service applications and offline computing applications. This topic describes how to use ack-koordinator to colocate an online service and a video transcoding application.
Use scenarios
You can colocate an online NGINX service and a video transcoding application that uses FFmpeg. This allows you to utilize the resources that are allocated to pods but are not in use on the node. You can enable the resource isolation feature of ack-koordinator to isolate the resources allocated to the online NGINX service from the resources allocated to the video transcoding application that uses FFmpeg. This way, the performance of the online NGINX service can be guaranteed.
Colocation deployment architecture
In this example, an online NGINX service and a video transcoding application that uses FFmpeg are colocated on a node. The Quality of Service (QoS) class of the pod created for the NGINX service is set to latency-sensitive (LS). The QoS class of the pod created for the video transcoding application is set to best-effort (BE). In this topic, the NGINX service is deployed in different colocation modes to test the performance of the NGINX service.
The following features are used in colocation:
Resource reuse: This feature allows BE workloads to overcommit the resources that are allocated to LS workloads but are not in use. This improves the resource utilization of the cluster. For more information, see Dynamic resource overcommitment.
Resource isolation: This feature uses various methods to limit the resources used by BE workloads and prioritize the resource demand of LS workloads. For more information, see CPU QoS, CPU Suppress, and Resource isolation based on the L3 cache and MBA.
Preparations
Set up the environment
Add two nodes to an ACK Pro cluster. One of the nodes runs an NGINX web service and a video transcoding application that uses FFmpeg, and serves as the tested machine. The other node has the wrk tool installed and is used to perform stress tests by sending requests to the NGINX service. For more information, see Create an ACK Pro cluster.
To maximize the benefits provided by the colocation capability of ack-koordinator, we recommend that you deploy the tested machine on an Elastic Compute Service (ECS) Bare Metal instance and use Alibaba Cloud Linux as the operating system of the node.
Install ack-koordinator (formerly known as ack-slo-manager) and enable the colocation policies. For more information, see Getting started. In this example, ack-koordinator 0.8.0 is used.
Deploy an online NGINX service
Deploy an online NGINX service and wrk.
Create a YAML file named ls-nginx.yaml and copy the following content to the file:
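The exact manifest is not shown here. The following is a minimal sketch of what ls-nginx.yaml might look like. The koordinator.sh/qosClass: LS label, the image tag, the resource values, and the hostPort 8000 mapping are illustrative assumptions; adjust them to match your environment.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
    koordinator.sh/qosClass: LS        # assumption: marks the pod as a latency-sensitive (LS) workload
spec:
  # Add a nodeName or nodeSelector here to schedule the pod onto the tested machine.
  containers:
  - name: nginx
    image: nginx:1.23                  # illustrative image tag
    ports:
    - containerPort: 80
      hostPort: 8000                   # assumption: exposes port 8000 on the tested machine for wrk
    resources:
      requests:
        cpu: "4"                       # illustrative resource values
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi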
Run the following command to deploy the NGINX service:
kubectl apply -f ls-nginx.yaml
Run the following command to query the name of the pod that runs the NGINX service:
kubectl get pod -l app=nginx -o wide
Expected output:
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          43s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
The output shows that the pod is in the Running state. This indicates that the NGINX service runs as expected on the tested machine.
Run the following command on the other node to deploy wrk:
wget -O wrk-4.2.0.tar.gz https://github.com/wg/wrk/archive/refs/tags/4.2.0.tar.gz && tar -xvf wrk-4.2.0.tar.gz
cd wrk-4.2.0 && make && chmod +x ./wrk
Deploy an offline video transcoding application
Deploy an offline video transcoding application that uses FFmpeg.
Create a YAML file named be-ffmpeg.yaml and copy the following content to the file:
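The exact manifest is not shown here. The following is a minimal sketch of what be-ffmpeg.yaml might look like. The image name and the transcoding command are hypothetical. No resource requests or limits are specified, so Kubernetes assigns the pod the BestEffort QoS class, and you can adjust the number of transcoding processes to control the CPU utilization of the node.
apiVersion: v1
kind: Pod
metadata:
  name: be-ffmpeg
  labels:
    app: ffmpeg
spec:
  # Add a nodeName or nodeSelector here to schedule the pod onto the tested machine.
  containers:
  - name: ffmpeg
    image: registry.example.com/demo/ffmpeg:latest   # hypothetical image that contains FFmpeg and a sample video
    command: ["sh", "-c"]
    args:
    - |
      # Run several transcoding processes in parallel. Increase or decrease the
      # process count to adjust the CPU utilization of the node.
      for i in $(seq 1 8); do
        (while true; do ffmpeg -y -i input.mp4 -c:v libx264 /tmp/output_$i.mp4; done) &
      done
      wait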
Run the following command to deploy the video transcoding application that uses FFmpeg:
kubectl apply -f be-ffmpeg.yaml
Run the following command to query the status of the pod in which the video transcoding application that uses FFmpeg runs:
kubectl get pod -l app=ffmpeg -o wide
Expected output:
NAME        READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
be-ffmpeg   1/1     Running   0          15s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
The output shows that the pod is in the Running state. This indicates that the video transcoding application that uses FFmpeg runs as expected on the tested machine.
Procedure
In this topic, the online service and the offline application are deployed in three modes to test the application performance and the resource utilization of the node in each mode. The following table describes these modes.
| Deployment mode | Description |
| --- | --- |
| Exclusive deployment of the online service (baseline group) | Only the online NGINX service is deployed on the tested machine. The video transcoding application that uses FFmpeg is not deployed. Use wrk to send requests to the NGINX service to test the performance of the service and the resource utilization of the node in this mode. |
| Default colocation mode of Kubernetes (control group) | The online NGINX service and the offline video transcoding application that uses FFmpeg are deployed on the tested machine. Then, wrk is used to send requests to the NGINX service. In this example, the QoS class of the video transcoding application is set to BE, and no resource requests or limits are specified in the pod configuration of the application. You can change the number of transcoding processes to control the CPU utilization of the node. In this example, the CPU utilization of the node is 65%. Test the performance of the NGINX service and the resource utilization of the node in this mode. |
| SLO-aware colocation mode of ACK (experimental group) | The online NGINX service and the offline video transcoding application that uses FFmpeg are deployed on the tested machine. Use wrk to send requests to the NGINX service. In this example, the QoS class of the video transcoding application is set to BE, and extended resources are requested by the pod of the application. For more information, see Dynamic resource overcommitment. The CPU Suppress, CPU QoS, and L3 cache and MBA resource isolation features are enabled for the tested machine. Test the performance of the NGINX service and the resource utilization of the node in this mode. |
Test results
The following metrics are used to evaluate the performance of the NGINX service and the resource utilization of the node in different colocation modes:
Response time (RT)-percentile: RT is commonly used to evaluate the performance of an online application. A lower RT value indicates higher performance. You can obtain the RT value in the output of wrk. The RT value indicates the amount of time that the NGINX service requires to process the request from wrk. For example, RT-P50 indicates the maximum amount of time that the NGINX service requires to process 50% of the requests from wrk. RT-P90 indicates the maximum amount of time that the NGINX service requires to process 90% of the requests from wrk.
Average CPU utilization of the node: This metric indicates the CPU utilization of applications on the node within a time period. You can run the kubectl top node command to query the average CPU utilization of the node in different colocation modes.
The following table describes the metrics in different modes.
| Metric | Baseline group (exclusive deployment mode) | Control group (default colocation mode) | Experimental group (SLO-aware colocation mode) |
| --- | --- | --- | --- |
| NGINX RT-P90 (ms) | 0.533 | 0.574 (+7.7%) | 0.548 (+2.8%) |
| NGINX RT-P99 (ms) | 0.93 | 1.07 (+16%) | 0.96 (+3.2%) |
| Average CPU utilization of the node | 29.6% | 65.1% | 64.8% |
Compare the metrics of the control group with the metrics of the baseline group: The average CPU utilization of the node increases from 29.6% to 65.1%. The values of NGINX RT-P90 and NGINX RT-P99 greatly increase. The RT curve has a long tail.
Compare the metrics of the experimental group with the metrics of the baseline group: The average CPU utilization of the node increases from 29.6% to 64.8%. The values of NGINX RT-P90 and NGINX RT-P99 slightly increase.
Compare the metrics of the experimental group with the metrics of the control group: The average CPU utilization of the node is similar. The values of NGINX RT-P90 and NGINX RT-P99 greatly decrease and are close to the values of NGINX RT-P90 and NGINX RT-P99 of the baseline group.
The preceding results show that the SLO-aware colocation mode of ACK can effectively increase the CPU utilization of the node and mitigate performance interference when an online NGINX service and an offline video transcoding application are deployed on the same node.
Deploy applications
Exclusive deployment mode of the online service
Deploy only the online service on the tested machine.
Refer to the Deploy an online NGINX service section of this topic to deploy the online NGINX service on the tested machine.
Use wrk to send requests to the NGINX service.
# Replace node_ip with the IP address of the tested machine. Port 8000 of the NGINX service is exposed to the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
Run the following command to query the average CPU utilization of the tested machine:
kubectl top node
Expected output:
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   29593m       29%    xxxx            xxxx
cn-beijing.192.168.2.94   6874m        7%     xxxx            xxxx
The output shows that the average CPU utilization of the tested machine is about 29%.
After the stress test is complete, view the test results returned by wrk. To obtain precise test results, we recommend that you perform multiple stress tests.
Expected output:
Running 1m test @ http://192.168.2.94:8000/
  6 threads and 54 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   402.18us    1.07ms  59.56ms   99.83%
    Req/Sec    24.22k     1.12k   30.58k    74.15%
  Latency Distribution
     50%  343.00us
     75%  402.00us
     90%  523.00us
     99%  786.00us
  8686569 requests in 1.00m, 6.88GB read
Requests/sec: 144537.08
Transfer/sec:    117.16MB
RT is a key metric for evaluating the performance of the online NGINX service in different scenarios. In the output, the Latency Distribution section shows the distribution of RT percentile values. For example, 90% 523.00us indicates that the RT for processing 90% of the requests is 523.00 microseconds. When only the online NGINX service is deployed, the RT-P50 is 343 microseconds, the RT-P90 is 523 microseconds, and the RT-P99 is 786 microseconds.
Default colocation mode of Kubernetes
Colocate the online NGINX service and the offline video transcoding application that uses FFmpeg on the tested machine. Then, use wrk to send requests to the NGINX service. Refer to the SLO-aware colocation mode of ACK section of this topic to colocate the applications. You must configure parameters and annotations based on the requirements in the YAML template.
SLO-aware colocation mode of ACK
Set the QoS class of the video transcoding application that uses FFmpeg to BE.
Refer to the Getting started topic to enable SLO-aware colocation.
The following list describes the features that are related to SLO-aware colocation:
Dynamic resource overcommitment: After you enable this feature, use the default configurations. This feature allows the system to overcommit resources that are allocated to pods but are not in use on a node, and then schedule the overcommitted resources to BE pods.
CPU Suppress: After you enable this feature, set the cpuSuppressThresholdPercent parameter to 65 and use the default settings for the remaining configurations (see the ConfigMap sketch after this list). This feature limits the CPU usage of BE pods when the CPU utilization of the node exceeds 65%. This ensures the performance of LS pods.
CPU QoS: After you enable this feature, use the default configurations. This feature allows you to enable the CPU Identity capability of Alibaba Cloud Linux. This way, LS pods are prioritized over BE pods during CPU scheduling. This prevents a BE pod from affecting an LS pod when simultaneous multithreading (SMT) is used to run the threads of both pods at the same time.
Resource isolation based on the L3 cache and MBA: After you enable this feature, use the default configurations. This feature allows you to isolate L3 cache (last level cache) and memory bandwidth on ECS Bare Metal instances. This way, LS pods are prioritized to use L3 cache and memory bandwidth.
Important:
The CPU QoS feature can be enabled only when Alibaba Cloud Linux is used as the operating system of the node.
L3 cache and memory bandwidth isolation can be enabled only when the node is deployed on an ECS Bare Metal instance.
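The cpuSuppressThresholdPercent value described in the list above is typically configured in a cluster-level ConfigMap. The following is a minimal sketch that assumes the ack-slo-config ConfigMap in the kube-system namespace and the resource-threshold-config key read by ack-koordinator; confirm the exact ConfigMap name and field names in the CPU Suppress topic before you apply it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config        # assumption: the ConfigMap name read by ack-koordinator
  namespace: kube-system
data:
  resource-threshold-config: |
    {
      "clusterStrategy": {
        "enable": true,
        "cpuSuppressThresholdPercent": 65
      }
    }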
Refer to the Deploy an online NGINX service section of this topic to deploy the online NGINX application on the tested machine.
Create a YAML file named besteffort-ffmpeg.yaml and copy the following content to the file:
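The exact manifest is not shown here. The following is a minimal sketch of what besteffort-ffmpeg.yaml might look like. The image name and the transcoding command are hypothetical. The koordinator.sh/qosClass: BE label and the kubernetes.io/batch-cpu and kubernetes.io/batch-memory extended resources are based on the Dynamic resource overcommitment feature; confirm the exact names and choose suitable values by referring to that topic.
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-ffmpeg
  labels:
    app: ffmpeg
    koordinator.sh/qosClass: BE        # assumption: marks the pod as a best-effort (BE) workload
spec:
  # Add a nodeName or nodeSelector here to schedule the pod onto the tested machine.
  containers:
  - name: ffmpeg
    image: registry.example.com/demo/ffmpeg:latest   # hypothetical image that contains FFmpeg and a sample video
    command: ["sh", "-c"]
    args:
    - |
      # Run several transcoding processes in parallel to generate offline load.
      for i in $(seq 1 8); do
        (while true; do ffmpeg -y -i input.mp4 -c:v libx264 /tmp/output_$i.mp4; done) &
      done
      wait
    resources:
      requests:
        kubernetes.io/batch-cpu: "25000"     # assumption: overcommitted CPU in milli-units (25 cores)
        kubernetes.io/batch-memory: 40Gi     # assumption: overcommitted memory
      limits:
        kubernetes.io/batch-cpu: "25000"
        kubernetes.io/batch-memory: 40Gi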
Run the following command to deploy the video transcoding application that uses FFmpeg:
kubectl apply -f besteffort-ffmpeg.yaml
Run the following command to query the status of the pod in which the video transcoding application that uses FFmpeg runs:
kubectl get pod -l app=ffmpeg -o wide
Expected output:
NAME                READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
besteffort-ffmpeg   1/1     Running   0          15s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
Run the following command to send requests to the NGINX service by using wrk:
# Replace node_ip with the IP address of the tested machine. Port 8000 of the NGINX service is exposed to the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
Run the following command to query the CPU utilization of the tested machine:
kubectl top node
Expected output:
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   65424m       63%    xxxx            xxxx
cn-beijing.192.168.2.94   7040m        7%     xxxx            xxxx
The output shows that the CPU utilization of the tested machine is about 63%.
After the stress test is complete, view the test results returned by wrk. For more information about the test results, see the Test results section of this topic.
FAQ
What do I do if the stress test result returned by wrk displays "Socket errors: connect 54,"?
Problem description
The stress test result returned by wrk displays Socket errors: connect 54, .... The connection between the wrk client and the NGINX server fails to be established. As a result, the stress test fails.
Cause
This error occurs when the number of client connections exceeds the upper limit. As a result, the client fails to create a connection to the NGINX server.
Solution
To prevent this error, check the TCP connection settings on the stress test machine and enable the TCP connection reuse feature.
Log on to the stress test machine and run the following command to check whether TCP connection reuse is enabled:
sudo sysctl -n net.ipv4.tcp_tw_reuse
If 0 or 2 is returned, the TCP connection reuse feature is disabled.
Run the following command to enable the TCP connection reuse feature:
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
Use wrk to initiate another stress test.
If the stress test result does not display Socket errors: connect 54, ..., the stress test is successful.
Run the preceding commands only on the stress test machine. You do not need to configure the tested machine. After the test is complete, run the sudo sysctl -w net.ipv4.tcp_tw_reuse=0 command to disable the TCP connection reuse feature so that your service is not affected by this setting.