Container Service for Kubernetes (ACK) provides service level objective (SLO)-aware workload scheduling based on the ack-koordinator component. You can use SLO-aware workload scheduling to colocate online service applications and offline computing applications. This topic describes how to use ack-koordinator to colocate an online service and a video transcoding application.
Use scenarios
You can colocate an online NGINX service and a video transcoding application that uses FFmpeg. This allows you to utilize the resources that are allocated to pods but are not in use on the node. You can enable the resource isolation feature of ack-koordinator to isolate the resources allocated to the online NGINX service from the resources allocated to the video transcoding application that uses FFmpeg. This way, the performance of the online NGINX service can be guaranteed.
Colocation deployment architecture
In this example, an online NGINX service and a video transcoding application that uses FFmpeg are colocated on a node. The Quality of Service (QoS) class of the pod created for the NGINX service is set to latency-sensitive (LS). The QoS class of the pod created for the video transcoding application is set to best-effort (BE). In this topic, the NGINX service is deployed in different colocation modes to test the performance of the NGINX service.
The following features are used in colocation:
Resource reuse: This feature allows BE workloads to overcommit the resources that are allocated to LS workloads but are not in use. This improves the resource utilization of the cluster. For more information, see Dynamic resource overcommitment.
Resource isolation: This feature uses various methods to limit the resources used by BE workloads and prioritize the resource demand of LS workloads. For more information, see CPU QoS, CPU Suppress, and Resource isolation based on the L3 cache and MBA.
Preparations
Set up the environment
Add two nodes to an ACK Pro cluster. One of the nodes runs an NGINX web service and a video transcoding application that uses FFmpeg, and serves as the tested machine. The other node has the wrk tool installed and is used to perform stress tests by sending requests to the NGINX service. For more information, see Create an ACK Pro cluster.
To maximize the benefits provided by the colocation capability of ack-koordinator, we recommend that you deploy the tested machine on an Elastic Compute Service (ECS) Bare Metal instance and use Alibaba Cloud Linux as the operating system of the node.
Install ack-koordinator (formerly known as ack-slo-manager) and enable the colocation policies. For more information, see Getting started. In this example, ack-koordinator 0.8.0 is used.
Deploy an online NGINX service
Deploy an online NGINX service and wrk.
Create a YAML file named ls-nginx.yaml and copy the following content to the file:
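The exact manifest is not shown here. The following is a minimal sketch of what ls-nginx.yaml might look like. The koordinator.sh/qosClass: LS label, the image tag, the resource values, and the hostPort 8000 mapping are illustrative assumptions; adjust them to match your environment.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
    koordinator.sh/qosClass: LS        # assumption: marks the pod as a latency-sensitive (LS) workload
spec:
  # Add a nodeName or nodeSelector here to schedule the pod onto the tested machine.
  containers:
  - name: nginx
    image: nginx:1.23                  # illustrative image tag
    ports:
    - containerPort: 80
      hostPort: 8000                   # assumption: exposes port 8000 on the tested machine for wrk
    resources:
      requests:
        cpu: "4"                       # illustrative resource values
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi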
Run the following command to deploy the NGINX service:
kubectl apply -f ls-nginx.yaml
Run the following command to query the name of the pod that runs the NGINX service:
kubectl get pod -l app=nginx -o wide
Expected output:
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          43s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
The output shows that the pod is in the Running state. This indicates that the NGINX service runs as expected on the tested machine.
Run the following command on the other node to deploy wrk:
wget -O wrk-4.2.0.tar.gz https://github.com/wg/wrk/archive/refs/tags/4.2.0.tar.gz && tar -xvf wrk-4.2.0.tar.gz
cd wrk-4.2.0 && make && chmod +x ./wrk
Deploy an offline video transcoding application
Deploy an offline video transcoding application that uses FFmpeg.
Create a YAML file named be-ffmpeg.yaml and copy the following content to the file:
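The exact manifest is not shown here. The following is a minimal sketch of what be-ffmpeg.yaml might look like. The image name and the transcoding command are hypothetical. No resource requests or limits are specified, so Kubernetes assigns the pod the BestEffort QoS class, and you can adjust the number of transcoding processes to control the CPU utilization of the node.
apiVersion: v1
kind: Pod
metadata:
  name: be-ffmpeg
  labels:
    app: ffmpeg
spec:
  # Add a nodeName or nodeSelector here to schedule the pod onto the tested machine.
  containers:
  - name: ffmpeg
    image: registry.example.com/demo/ffmpeg:latest   # hypothetical image that contains FFmpeg and a sample video
    command: ["sh", "-c"]
    args:
    - |
      # Run several transcoding processes in parallel. Increase or decrease the
      # process count to adjust the CPU utilization of the node.
      for i in $(seq 1 8); do
        (while true; do ffmpeg -y -i input.mp4 -c:v libx264 /tmp/output_$i.mp4; done) &
      done
      wait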
Run the following command to deploy the video transcoding application that uses FFmpeg:
kubectl apply -f be-ffmpeg.yaml
Run the following command to query the status of the pod in which the video transcoding application that uses FFmpeg runs:
kubectl get pod -l app=ffmpeg -o wide
Expected output:
NAME        READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
be-ffmpeg   1/1     Running   0          15s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
The output shows that the pod is in the Running state. This indicates that the video transcoding application that uses FFmpeg runs as expected on the tested machine.
Procedure
In this topic, the online service and the offline application are deployed in three modes to test the application performance and the resource utilization of the node in each mode. The following table describes these modes.
| Deployment mode | Description |
| --- | --- |
| Exclusive deployment of the online service (baseline group) | Only the online NGINX service is deployed on the tested machine. The video transcoding application that uses FFmpeg is not deployed. Use wrk to send requests to the NGINX service to test the performance of the service and the resource utilization of the node in this mode. |
| Default colocation mode of Kubernetes (control group) | The online NGINX service and the offline video transcoding application that uses FFmpeg are deployed on the tested machine. Then, wrk is used to send requests to the NGINX service. In this example, the QoS class of the video transcoding application is set to BE, and no resource requests or limits are specified in the pod configuration of the application. You can change the number of transcoding processes to control the CPU utilization of the node. In this example, the CPU utilization of the node is 65%. Test the performance of the NGINX service and the resource utilization of the node in this mode. |
| SLO-aware colocation mode of ACK (experimental group) | The online NGINX service and the offline video transcoding application that uses FFmpeg are deployed on the tested machine. Use wrk to send requests to the NGINX service. In this example, the QoS class of the video transcoding application is set to BE, and extended resources are requested by the pod of the application. For more information, see Dynamic resource overcommitment. The CPU Suppress, CPU QoS, and L3 cache and MBA resource isolation features are enabled for the tested machine. Test the performance of the NGINX service and the resource utilization of the node in this mode. |
Test results
The following metrics are used to evaluate the performance of the NGINX service and the resource utilization of the node in different colocation modes:
Response time (RT)-percentile: RT is commonly used to evaluate the performance of an online application. A lower RT value indicates higher performance. You can obtain the RT value in the output of wrk. The RT value indicates the amount of time that the NGINX service requires to process the request from wrk. For example, RT-P50 indicates the maximum amount of time that the NGINX service requires to process 50% of the requests from wrk. RT-P90 indicates the maximum amount of time that the NGINX service requires to process 90% of the requests from wrk.
Average CPU utilization of the node: This metric indicates the CPU utilization of applications on the node within a time period. You can run the kubectl top node command to query the average CPU utilization of the node in different colocation modes.
The following table describes the metrics in different modes.
| Metric | Baseline group (exclusive deployment mode) | Control group (default colocation mode) | Experimental group (SLO-aware colocation mode) |
| --- | --- | --- | --- |
| NGINX RT-P90 (ms) | 0.533 | 0.574 (+7.7%) | 0.548 (+2.8%) |
| NGINX RT-P99 (ms) | 0.93 | 1.07 (+16%) | 0.96 (+3.2%) |
| Average CPU utilization of the node | 29.6% | 65.1% | 64.8% |
Compare the metrics of the control group with the metrics of the baseline group: The average CPU utilization of the node increases from 29.6% to 65.1%. The values of NGINX RT-P90 and NGINX RT-P99 greatly increase. The RT curve has a long tail.
Compare the metrics of the experimental group with the metrics of the baseline group: The average CPU utilization of the node increases from 29.6% to 64.8%. The values of NGINX RT-P90 and NGINX RT-P99 slightly increase.
Compare the metrics of the experimental group with the metrics of the control group: The average CPU utilization of the node is similar. The values of NGINX RT-P90 and NGINX RT-P99 greatly decrease and are close to the values of NGINX RT-P90 and NGINX RT-P99 of the baseline group.
The preceding results show that the SLO-aware colocation mode of ACK can effectively increase the CPU utilization of the node and mitigate performance interference when an online NGINX service and an offline video transcoding application are deployed on the same node.
Deploy applications
Exclusive deployment mode of the online service
Deploy only the online service on the tested machine.
Refer to the Deploy an online NGINX service section of this topic to deploy the online NGINX service on the tested machine.
Use wrk to send requests to the NGINX service.
# Replace node_ip with the IP address of the tested machine. Port 8000 of the NGINX service is exposed to the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
Run the following command to query the average CPU utilization of the tested machine:
kubectl top node
Expected output:
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   29593m       29%    xxxx            xxxx
cn-beijing.192.168.2.94   6874m        7%     xxxx            xxxx
The output shows that the average CPU utilization of the tested machine is about 29%.
After the stress test is complete, view the test results returned by wrk. To obtain precise test results, we recommend that you perform multiple stress tests.
Expected output:
Running 1m test @ http://192.168.2.94:8000/
  6 threads and 54 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   402.18us    1.07ms  59.56ms   99.83%
    Req/Sec    24.22k     1.12k   30.58k    74.15%
  Latency Distribution
     50%  343.00us
     75%  402.00us
     90%  523.00us
     99%  786.00us
  8686569 requests in 1.00m, 6.88GB read
Requests/sec: 144537.08
Transfer/sec:    117.16MB
RT is a key metric for evaluating the performance of the online NGINX service in different scenarios. In the output, the Latency Distribution section shows the distribution of RT percentile values. For example, 90% 523.00us indicates that the RT for processing 90% of the requests is 523.00 microseconds. When only the online NGINX service is deployed, the RT-P50 is 343 microseconds, the RT-P90 is 523 microseconds, and the RT-P99 is 786 microseconds.
Default colocation mode of Kubernetes
Colocate the online NGINX service and the offline video transcoding application that uses FFmpeg on the tested machine. Then, use wrk to send requests to the NGINX service. Refer to the SLO-aware colocation mode of ACK section of this topic to colocate the applications. You must configure parameters and annotations based on the requirements in the YAML template.
SLO-aware colocation mode of ACK
Set the QoS class of the video transcoding application that uses FFmpeg to BE.
Refer to the Getting started topic to enable SLO-aware colocation.
The following list describes the features that are related to SLO-aware colocation:
Dynamic resource overcommitment: After you enable this feature, use the default configurations. This feature allows the system to overcommit resources that are allocated to pods but are not in use on a node, and then schedule the overcommitted resources to BE pods.
CPU Suppress: After you enable this feature, set the cpuSuppressThresholdPercent parameter to 65 and use the default settings for the remaining configurations (see the ConfigMap sketch after this list). This feature limits the CPU usage of BE pods when the CPU utilization of the node exceeds 65%. This ensures the performance of LS pods.
CPU QoS: After you enable this feature, use the default configurations. This feature allows you to enable the CPU Identity capability of Alibaba Cloud Linux. This way, LS pods are prioritized over BE pods during CPU scheduling. This prevents a BE pod from affecting an LS pod when simultaneous multithreading (SMT) is used to run the threads of both pods at the same time.
Resource isolation based on the L3 cache and MBA: After you enable this feature, use the default configurations. This feature allows you to isolate L3 cache (last level cache) and memory bandwidth on ECS Bare Metal instances. This way, LS pods are prioritized to use L3 cache and memory bandwidth.
Important:
The CPU QoS feature can be enabled only when Alibaba Cloud Linux is used as the operating system of the node.
L3 cache and memory bandwidth isolation can be enabled only when the node is deployed on an ECS Bare Metal instance.
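The cpuSuppressThresholdPercent value described in the list above is typically configured in a cluster-level ConfigMap. The following is a minimal sketch that assumes the ack-slo-config ConfigMap in the kube-system namespace and the resource-threshold-config key read by ack-koordinator; confirm the exact ConfigMap name and field names in the CPU Suppress topic before you apply it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-slo-config        # assumption: the ConfigMap name read by ack-koordinator
  namespace: kube-system
data:
  resource-threshold-config: |
    {
      "clusterStrategy": {
        "enable": true,
        "cpuSuppressThresholdPercent": 65
      }
    }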
Refer to the Deploy an online NGINX service section of this topic to deploy the online NGINX application on the tested machine.
Create a YAML file named besteffort-ffmpeg.yaml and copy the following content to the file:
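The exact manifest is not shown here. The following is a minimal sketch of what besteffort-ffmpeg.yaml might look like. The image name and the transcoding command are hypothetical. The koordinator.sh/qosClass: BE label and the kubernetes.io/batch-cpu and kubernetes.io/batch-memory extended resources are based on the Dynamic resource overcommitment feature; confirm the exact names and choose suitable values by referring to that topic.
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-ffmpeg
  labels:
    app: ffmpeg
    koordinator.sh/qosClass: BE        # assumption: marks the pod as a best-effort (BE) workload
spec:
  # Add a nodeName or nodeSelector here to schedule the pod onto the tested machine.
  containers:
  - name: ffmpeg
    image: registry.example.com/demo/ffmpeg:latest   # hypothetical image that contains FFmpeg and a sample video
    command: ["sh", "-c"]
    args:
    - |
      # Run several transcoding processes in parallel to generate offline load.
      for i in $(seq 1 8); do
        (while true; do ffmpeg -y -i input.mp4 -c:v libx264 /tmp/output_$i.mp4; done) &
      done
      wait
    resources:
      requests:
        kubernetes.io/batch-cpu: "25000"     # assumption: overcommitted CPU in milli-units (25 cores)
        kubernetes.io/batch-memory: 40Gi     # assumption: overcommitted memory
      limits:
        kubernetes.io/batch-cpu: "25000"
        kubernetes.io/batch-memory: 40Gi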
Run the following command to deploy the video transcoding application that uses FFmpeg:
kubectl apply -f besteffort-ffmpeg.yaml
Run the following command to query the status of the pod in which the video transcoding application that uses FFmpeg runs:
kubectl get pod -l app=ffmpeg -o wide
Expected output:
NAME                READY   STATUS    RESTARTS   AGE   IP               NODE                      NOMINATED NODE   READINESS GATES
besteffort-ffmpeg   1/1     Running   0          15s   11.162.XXX.XXX   cn-beijing.192.168.2.93   <none>           <none>
Run the following command to send requests to the NGINX service by using wrk:
# Replace node_ip with the IP address of the tested machine. Port 8000 of the NGINX service is exposed to the tested machine.
./wrk -t6 -c54 -d60s --latency http://${node_ip}:8000/
Run the following command to query the CPU utilization of the tested machine:
kubectl top node
Expected output:
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.192.168.2.93   65424m       63%    xxxx            xxxx
cn-beijing.192.168.2.94   7040m        7%     xxxx            xxxx
The output shows that the CPU utilization of the tested machine is about 63%.
After the stress test is complete, view the test results returned by wrk. For more information about the test results, see the Test results section of this topic.
FAQ
What do I do if the stress test result returned by wrk displays "Socket errors: connect 54,"?
Problem description
The stress test result returned by wrk displays Socket errors: connect 54, .... The connection between the wrk client and the NGINX server fails to be established. As a result, the stress test fails.
Cause
This error occurs when the number of client connections exceeds the upper limit. As a result, the client fails to create a connection to the NGINX server.
Solution
To prevent this error, check the TCP connection settings on the stress test machine and enable the TCP connection reuse feature.
Log on to the stress test machine and run the following command to check whether TCP connection reuse is enabled:
sudo sysctl -n net.ipv4.tcp_tw_reuse
If 0 or 2 is returned, the TCP connection reuse feature is disabled.
Run the following command to enable the TCP connection reuse feature:
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
Use wrk to initiate another stress test.
If the stress test result does not display Socket errors: connect 54, ..., the stress test is successful.
Run the preceding commands only on the stress test machine. You do not need to configure the tested machine. After the test is complete, run the sudo sysctl -w net.ipv4.tcp_tw_reuse=0 command to disable the TCP connection reuse feature so that your service is not affected by this setting.