In clusters that run Kubernetes 1.27 or earlier, modifying the resource parameters of a running pod requires you to change the PodSpec and submit the change, after which the pod is deleted and recreated. Container Service for Kubernetes (ACK) provides a feature that allows you to use cgroup configuration files to modify the resource parameters of pods. This way, you can temporarily adjust CPU parameters, memory parameters, and disk IOPS limits without restarting the pod.
This feature is suitable for scenarios in which a temporary adjustment is required, for example, when the memory usage of a pod increases and you need to raise the memory limit of the pod to avoid triggering the out of memory (OOM) killer. For routine O&M operations, we recommend that you use the relevant features provided by ACK. For more information, see Enable CPU Burst, Enable topology-aware CPU scheduling, and Resource profiling.
Prerequisites
A kubectl client is connected to the ACK cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
The ack-koordinator component 0.5.0 or later is installed. For more information, see ack-koordinator (FKA ack-slo-manager).
Billing
No fee is charged when you install or use the ack-koordinator component. However, fees may be charged in the following scenarios:
ack-koordinator is a non-managed component that occupies worker node resources after it is installed. You can specify the amount of resources requested by each module when you install the component.
By default, ack-koordinator exposes the monitoring metrics of features such as resource profiling and fine-grained scheduling as Prometheus metrics. If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, these metrics are considered custom metrics and fees are charged for these metrics. The fee depends on factors such as the size of your cluster and the number of applications. Before you enable Prometheus metrics, we recommend that you read the Billing topic of Managed Service for Prometheus to learn about the free quota and billing rules of custom metrics. For more information about how to monitor and manage resource usage, see Query the amount of observable data and bills.
Modify the memory limit
When the memory usage of a pod increases, you can dynamically modify the memory limit of the pod by using cgroup configuration files to avoid triggering the OOM killer. In this example, a container with an original memory limit of 1 GB is created to verify that the memory limit of the container can be modified by using cgroup configuration files without the need to restart the pod.
To use this feature in clusters that run Kubernetes 1.22 or later, make sure that the ack-koordinator version is v1.5.0-ack1.14 or later. Earlier component versions support only clusters that run Kubernetes 1.22 or earlier.
If you want to modify the CPU limit to meet common requirements, we recommend that you use the CPU Burst feature. This feature can automatically modify the CPU limit of the pod. For more information, see Enable CPU Burst. If you want to temporarily modify the CPU limit of the pod, perform the steps in Upgrade from resource-controller to ack-koordinator.
Create a pod-demo.yaml file with the following YAML template:
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
spec:
  containers:
  - name: pod-demo
    image: registry-cn-beijing.ack.aliyuncs.com/acs/stress:v1.0.4
    resources:
      requests:
        cpu: 1
        memory: "50Mi"
      limits:
        cpu: 1
        memory: "1Gi" # Set the container memory limit to 1 GB.
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "256M", "-c", "2", "--vm-hang", "1"]
Run the following command to deploy the pod-demo application in the cluster:
kubectl apply -f pod-demo.yaml
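The cgroup path used in the next step contains the UID of the pod, with hyphens replaced by underscores. Before you read the cgroup file, you can look up the UID and the node that hosts the pod. The following commands are a minimal sketch and assume that the pod runs in the default namespace:
# Query the UID of the pod. The UID appears in the cgroup path with hyphens replaced by underscores.
kubectl get pod pod-demo -o jsonpath='{.metadata.uid}{"\n"}'
# Query the node that hosts the pod. Log on to this node to read the cgroup files.
kubectl get pod pod-demo -o jsonpath='{.spec.nodeName}{"\n"}'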
Run the following command to query the original memory limit of the container:
# The actual path contains the UID of the pod.
cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podaf44b779_41d8_43d5_a0d8_8a7a0b17****.slice/memory.limit_in_bytes
Expected output:
# In this example, 1073741824 is returned, which is the result of 1 × 1024 × 1024 × 1024. This indicates that the original memory limit of the container is 1 GB.
1073741824
The output shows that the original memory limit of the container is 1 GB, which is the same as the value of the spec.containers.resources.limits.memory parameter in the YAML file that you created in Step 1.
Use the following YAML template to specify the memory limit of the container and create a cgroups-sample.yaml file:
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-sample
spec:
  pod:
    name: pod-demo
    namespace: default
    containers:
    - name: pod-demo
      memory: 5Gi # Change the memory limit of the pod to 5 GB.
Run the following command to deploy the cgroups-sample.yaml file in the cluster:
kubectl apply -f cgroups-sample.yaml
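Optionally, you can confirm that the Cgroups object was created before you check the cgroup file on the node. The following command is a minimal sketch that reads the object back from the file that you just applied:
# Query the Cgroups object that was created from the cgroups-sample.yaml file.
kubectl get -f cgroups-sample.yaml -o yaml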
Run the following command to query the new memory limit of the container after you submit the change:
# The specific path can be obtained based on the UID of the pod.
cat /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podaf44b779_41d8_43d5_a0d8_8a7a0b17****.slice/memory.limit_in_bytes
Expected output:
# In this example, 5368709120 is returned, which is the result of 5 × 1024 × 1024 × 1024. This indicates that the new memory limit of the container is 5 GB.
5368709120
The output shows that the memory limit of the container is 5 GB, which is the same as the value of the spec.pod.containers.memory parameter in the YAML file that you created in Step 4. The modification is successful.
Run the following command to query the status of the pod:
kubectl describe pod pod-demo
Expected output:
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  Scheduled       36m   default-scheduler  Successfully assigned default/pod-demo to cn-hangzhou.192.168.0.50
  Normal  AllocIPSucceed  36m   terway-daemon      Alloc IP 192.XX.XX.51/24 took 4.490542543s
  Normal  Pulling         36m   kubelet            Pulling image "registry-cn-beijing.ack.aliyuncs.com/acs/stress:v1.0.4"
  Normal  Pulled          36m   kubelet            Successfully pulled image "registry-cn-beijing.ack.aliyuncs.com/acs/stress:v1.0.4" in 2.204s (2.204s including waiting). Image size: 7755078 bytes.
  Normal  Created         36m   kubelet            Created container pod-demo
  Normal  Started         36m   kubelet            Started container pod-demo
The output shows that the pod runs as normal and no restart events are generated.
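If the temporary adjustment is no longer needed, you can presumably restore the original limit by changing the memory field in cgroups-sample.yaml back to 1Gi and re-applying the file; this is a sketch based on the same mechanism and has not been verified here, so check the cgroup file again afterwards:
# After you change memory back to 1Gi in cgroups-sample.yaml, re-apply the file.
kubectl apply -f cgroups-sample.yaml
# Then read memory.limit_in_bytes on the node again to confirm that the limit is 1073741824.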
Change the vCores that are bound to a pod
If your application requires high CPU performance and you want to achieve better resource isolation, you can modify the vCores that are bound to the pod and specify the serial numbers of vCores that can be used by the pod.
In this example, a pod without bound vCores is created to verify that the bound vCores of the pod can be modified by using cgroup configuration files without the need to restart the pod.
In common cases, we recommend that you use topology-aware CPU scheduling to manage CPU resources for CPU-sensitive workloads. For more information, see Enable topology-aware CPU scheduling.
Create a pod-cpuset-demo.yaml file with the following YAML template:
apiVersion: v1
kind: Pod
metadata:
  name: pod-cpuset-demo
spec:
  containers:
  - name: pod-cpuset-demo
    image: registry-cn-beijing.ack.aliyuncs.com/acs/stress:v1.0.4
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "1000Mi"
        cpu: 0.5
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "556M", "-c", "2", "--vm-hang", "1"]
Run the following command to deploy the pod-cpuset-demo.yaml file in the cluster:
kubectl apply -f pod-cpuset-demo.yaml
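The cgroup path used in the next step contains both the UID of the pod and the ID of the container. The following command is a minimal sketch to look up the container ID, assuming that the pod runs in the default namespace:
# Query the container ID. The value is prefixed with the runtime, for example, containerd://<id>.
kubectl get pod pod-cpuset-demo -o jsonpath='{.status.containerStatuses[0].containerID}{"\n"}'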
Run the following command to query the vCores that are bound to the container:
# The actual path consists of the UID of the pod and the ID of the container.
cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf9b79bee_eb2a_4b67_befe_51c270f8****.slice/cri-containerd-aba883f8b3ae696e99c3a920a578e3649fa957c51522f3fb00ca943dc2c7****.scope/cpuset.cpus
Expected output:
0-31
The output shows that the serial numbers of the vCores that can be used by the container range from 0 to 31 before you bind vCores to the container.
Use the following YAML template to specify the vCores and create a cgroups-sample-cpusetpod.yaml file:
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-sample-cpusetpod
spec:
  pod:
    name: pod-cpuset-demo
    namespace: default
    containers:
    - name: pod-cpuset-demo
      cpuset-cpus: 2-3 # Bind vCore 2 and vCore 3 to the pod.
Run the following command to deploy the cgroups-sample-cpusetpod.yaml file in the cluster:
kubectl apply -f cgroups-sample-cpusetpod.yaml
Run the following command to query the vCores that are bound to the container after you submit the change:
# The actual path consists of the UID of the pod and the ID of the container.
cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf9b79bee_eb2a_4b67_befe_51c270f8****.slice/cri-containerd-aba883f8b3ae696e99c3a920a578e3649fa957c51522f3fb00ca943dc2c7****.scope/cpuset.cpus
Expected output:
2-3
The output shows that vCore 2 and vCore 3 are bound to the container, which is the same as the value of the spec.pod.containers.cpuset-cpus parameter in the YAML file that you created in Step 4. The modification is successful.
Run the following command to query the status of the pod:
kubectl describe pod pod-cpuset-demo
Expected output:
Events:
  Type    Reason          Age    From               Message
  ----    ------          ----   ----               -------
  Normal  Scheduled       7m7s   default-scheduler  Successfully assigned default/pod-cpuset-demo to cn-hangzhou.192.XX.XX.50
  Normal  AllocIPSucceed  7m5s   terway-daemon      Alloc IP 192.XX.XX.56/24 took 2.060752512s
  Normal  Pulled          7m5s   kubelet            Container image "registry-cn-beijing.ack.aliyuncs.com/acs/stress:v1.0.4" already present on machine
  Normal  Created         7m5s   kubelet            Created container pod-cpuset-demo
  Normal  Started         7m5s   kubelet            Started container pod-cpuset-demo
  Normal  CPUSetBind      84s    koordlet           set cpuset 2-3 to container pod-cpuset-demo success
The output shows that the pod runs as normal and no restart events are generated.
Modify the disk IOPS
If you want to modify the disk IOPS for a pod, you must use Alibaba Cloud Linux as the operating system of the worker node that you want to manage.
In this example, an I/O-intensive application is created to verify that the disk IOPS of a pod can be modified by using cgroup configuration files without the need to restart the pod.
If you specify a blkio limit in cgroup v1, the OS kernel limits only the direct I/O of a pod, not its buffered I/O. To limit the buffered I/O of a pod, use cgroup v2 or enable the cgroup writeback feature of Alibaba Cloud Linux. For more information, see Enable the cgroup writeback feature.
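Whether the node uses cgroup v1 or cgroup v2 therefore affects how buffered I/O is handled. A quick way to check the cgroup version is to inspect the file system type of the /sys/fs/cgroup mount on the node:
# Run on the node. cgroup2fs indicates cgroup v2, and tmpfs indicates cgroup v1.
stat -fc %T /sys/fs/cgroup/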
Create an I/O-intensive application with the following YAML template:
Mount the host directory /mnt to the pod. The device name of the corresponding disk is /dev/vda1.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fio-demo
  labels:
    app: fio-demo
spec:
  selector:
    matchLabels:
      app: fio-demo
  template:
    metadata:
      labels:
        app: fio-demo
    spec:
      containers:
      - name: fio-demo
        image: registry.cn-zhangjiakou.aliyuncs.com/acs/fio-for-slo-test:v0.1
        command: ["sh", "-c"]
        # Use fio to perform write stress tests on the disk.
        args: ["fio -filename=/data/test -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=12000 -group_reporting -name=mytest"]
        volumeMounts:
        - name: pvc
          mountPath: /data # The disk volume is mounted to the /data path.
      volumes:
      - name: pvc
        hostPath:
          path: /mnt
Run the following command to deploy the fio-demo Deployment in the cluster:
kubectl apply -f fio-demo.yaml
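Because the pod is created by a Deployment, its name contains a generated suffix. You can list the pod and the node that hosts it before you inspect the cgroup files, for example:
# List the fio-demo pod and the node that hosts it.
kubectl get pods -l app=fio-demo -o wide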
Create a cgroup configuration file to limit the disk write throughput of the pod.
Use the following YAML file to specify the bytes per second (BPS) limit of the /dev/vda1 device and create a file named cgroups-sample-fio.yaml:
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-sample-fio
spec:
  deployment:
    name: fio-demo
    namespace: default
    containers:
    - name: fio-demo
      blkio:
        # The write throughput limit in byte/s. Example values: 1048576, 2097152, and 3145728.
        device_write_bps: [{device: "/dev/vda1", value: "1048576"}]
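Run the following command to deploy the cgroups-sample-fio.yaml file in the cluster:
kubectl apply -f cgroups-sample-fio.yaml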
Run the following command to query the write throughput limit of the disk after you submit the change:
# The actual path consists of the UID of the pod and the ID of the container.
cat /sys/fs/cgroup/blkio/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0840adda_bc26_4870_adba_f193cd00****.slice/cri-containerd-9ea6cc97a6de902d941199db2fcda872ddd543485f5f987498e40cd706dc****.scope/blkio.throttle.write_bps_device
Expected output:
253:0 1048576
The output shows that the write throughput of the disk is limited to 1048576 byte/s for the container.
View the monitoring data of the node.
The monitoring data shows that the BPS of the container is the same as the value specified in the device_write_bps parameter of the YAML file that you created in Step 3, and that the pod is not restarted after you submit the change.
Note: For more information about how to enable Managed Service for Prometheus, see Managed Service for Prometheus. In the left-side navigation pane of the console, choose Operations > Prometheus Monitoring, and then click the Application Monitoring tab to view the disk data.
Dynamically modify Deployment-level resource parameters of a pod
The dynamic modification described in the preceding sections also works at the Deployment level. Pod-level resource parameters are modified by using the spec.pod field in the cgroup configuration file, whereas Deployment-level resource parameters are modified by using the spec.deployment field. The following example describes how to modify the vCores that are bound to a Deployment. The operations in other scenarios are similar.
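For quick comparison, the two forms differ only in the field that is nested under spec. The following sketch abbreviates the container list; the full examples appear elsewhere in this topic:
# Pod-level modification: target a single pod by name.
spec:
  pod:
    name: pod-demo
    namespace: default
    containers: [...]
# Deployment-level modification: target the pods of a Deployment.
spec:
  deployment:
    name: go-demo
    namespace: default
    containers: [...]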
Create a go-demo.yaml file with the following YAML template:
The Deployment creates two pods that run a stress testing program. Each pod requests 0.5 vCores.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo
  labels:
    app: go-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: go-demo
  template:
    metadata:
      labels:
        app: go-demo
    spec:
      containers:
      - name: go-demo
        image: polinux/stress
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "556M", "-c", "1", "--vm-hang", "1"]
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 0.5
          limits:
            cpu: 0.5
Run the following command to deploy the go-demo Deployment in the cluster:
kubectl apply -f go-demo.yaml
Use the following YAML template to specify the CPU binding information and create a file named cgroups-cpuset-sample.yaml:
apiVersion: resources.alibabacloud.com/v1alpha1
kind: Cgroups
metadata:
  name: cgroups-cpuset-sample
spec:
  deployment: # This is a Deployment.
    name: go-demo
    namespace: default
    containers:
    - name: go-demo
      cpuset-cpus: 2,3 # Bind vCore 2 and vCore 3 to the pod.
Run the following command to deploy the cgroups-cpuset-sample.yaml file in the cluster:
kubectl apply -f cgroups-cpuset-sample.yaml
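Because the Deployment creates two pods, pick one of them and note its node before you check the cgroup file in the next step. For example:
# List the go-demo pods and the nodes that host them.
kubectl get pods -l app=go-demo -o wide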
Run the following command to query the vCores that are bound to the container after you submit the change:
# The actual path consists of the UID of the pod and the ID of the container.
cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod06de7408_346a_4d00_ba25_02833b6c****.slice/cri-containerd-733a0dc93480eb47ac6c5abfade5c22ed41639958e3d304ca1f85959edc3****.scope/cpuset.cpus
Expected output:
2-3
The output shows that vCore 2 and vCore 3 are bound to the containers, which is the same as the value of the spec.deployment.containers.cpuset-cpus parameter in the cgroup configuration file.
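After you finish the verification, you can delete the example resources created in this section, for example:
# Delete the go-demo Deployment and the cgroup configuration file used in this example.
kubectl delete -f go-demo.yaml -f cgroups-cpuset-sample.yaml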