Kubernetes uses cgroups to isolate container resources. cgroup v2 provides a unified API design to access resources and supports resource monitoring and request redirects based on pods. cgroup v2 also provides enhanced resource isolation across multiple resources. This topic describes how to use ack-image-builder to build a cgroup v2 custom image, and then create cgroup v2 nodes from the custom image in the Container Service for Kubernetes (ACK) console.
Background information
cgroup
Kubernetes uses cgroups to isolate container resources. cgroups have two versions: v1 and v2. cgroup v2 provides an enhanced user experience and the following features:
A unified API design to access resources.
New features such as Pressure Stall Information (PSI).
Extended Berkeley Packet Filter (eBPF) programs can be attached to cgroups. Resource monitoring and request redirects based on pods are also supported.
Enhanced resource isolation across multiple resources.
Unified accounting for different types of memory allocations, such as network memory and kernel memory.
Accounting for non-immediate resource changes. For example, statistics about page cache write backs can be collected to limit non-immediate input and output.
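As a rough illustration of the PSI feature listed above: on a kernel with PSI enabled, pressure data is exposed as text lines that start with some or full, followed by avg10, avg60, avg300, and total fields. The sketch below extracts the 10-second average from the some line. The function name psi_some_avg10 is hypothetical and is not part of ACK or ack-image-builder, and whether PSI is enabled on a given node image is an assumption that you need to check.

```shell
# psi_some_avg10: print the 10-second "some" stall average from PSI text on stdin.
# The function name is illustrative; the input format follows the kernel PSI files.
psi_some_avg10() {
  awk '$1 == "some" {
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^avg10=/) { sub(/^avg10=/, "", $i); print $i }
    }
  }'
}

# Usage on a node where PSI is enabled (system-wide memory pressure):
#   psi_some_avg10 < /proc/pressure/memory
```

Per-cgroup pressure files such as memory.pressure in a cgroup v2 directory use the same line format, so the same function can read them.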
Kubernetes 1.18 brings cgroup v2 to Alpha, Kubernetes 1.22 brings cgroup v2 to Beta, and Kubernetes 1.25 brings cgroup v2 to general availability (GA). For more information, see About cgroup v2.
ack-image-builder
ack-image-builder is an image build tool provided by Alibaba Cloud. ack-image-builder can help automate image building. You can use ack-image-builder to build an OS image of cgroup v2 and then create cgroup v2 nodes in your ACK cluster from the image. For more information, see ack-image-builder.
ack-image-builder is developed based on open source HashiCorp Packer. HashiCorp Packer provides default configuration templates and verification scripts. For more information, see HashiCorp Packer.
Limits
Limit | Description |
Operating system | Only Alibaba Cloud Linux 3 supports cgroup v2. You can use only Alibaba Cloud Linux 3 as the base image to build a custom image that supports cgroup v2. |
Runtimes | Only containerd is supported. |
Kubernetes | Versions 1.28 and later are supported. |
Applications or components | If the applications or components in your cluster rely on cgroups, make sure that they are compatible with cgroup v2. |
Usage notes
If you use Java applications, we recommend that you use JDK 11.0.16 (or a later JDK 11 update) or JDK 15 or later to ensure compatibility with cgroup v2. For more information, see JDK-8230305.
Prerequisites
You have submitted an application in the Quota Center console to use the custom image feature.
Procedure
To build a custom image by using ack-image-builder, perform the following steps.
Install Packer.
Go to the Install Packer page, select a software version based on your operating system, and install Packer. For more information about how to install Packer, see Install Packer.
Run the following command to view the version of Packer:
packer version
Expected output:
Packer v1.*.*
The preceding output indicates that Packer is installed.
Run the following command to download the cgroup v2 configuration template:
git clone https://github.com/AliyunContainerService/ack-image-builder.git
cd ack-image-builder
Build a custom image that is used to create cgroup v2 nodes.
Run the following command to import the AccessKey pair. The AccessKey pair is used to create temporary resources when you build the custom image.
export ALICLOUD_ACCESS_KEY=XXXXXX
export ALICLOUD_SECRET_KEY=XXXXXX
Run the following command to build a custom image:
packer build -var cgroup_mode=CGROUP_MODE_V2 examples/ack-aliyunlinux3.json
The cgroup_mode parameter specifies the cgroup mode of the image. The default value is CGROUP_MODE_V1. Set the parameter to CGROUP_MODE_V2 to build a custom image that uses the cgroup v2 mode.
In the build output, scripts/set-cgroupv2.sh indicates the cgroup version and m-xxxxxxxxxxxxxxxxx indicates the ID of the custom image.
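The build in this step depends on ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY being set in the shell that runs Packer. A minimal preflight sketch follows. The helper name require_env is hypothetical and is not provided by Packer or ack-image-builder.

```shell
# require_env: report every named variable that is unset or empty.
# Returns 0 when all variables are set, 1 otherwise. The helper name is illustrative.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before building:
#   require_env ALICLOUD_ACCESS_KEY ALICLOUD_SECRET_KEY && \
#     packer build -var cgroup_mode=CGROUP_MODE_V2 examples/ack-aliyunlinux3.json
```

Failing early this way avoids starting a build that cannot create its temporary resources.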
Create an ACK cluster from the custom image.
The following example shows how to create an ACK Pro cluster.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
In the upper-right corner of the Clusters page, click Create Kubernetes Cluster.
On the Managed Kubernetes tab, configure the cluster based on the custom image that you created, and complete the cluster creation as prompted.
Configure the following key parameters as required. For more information about the parameters, see Create an ACK managed cluster.
After you configure the cluster parameters, click Next:Node Pool Configurations.
On the Node Pool Configurations wizard page, click Show Advanced Options. Click Select a custom image next to Custom Image.
In the Choose Custom Image dialog box, select the custom image and click Use.
Complete other configurations for the cluster.
After the cluster is created from the custom image, the nodes in the cluster use the custom image. Any nodes that are later added to the node pool during scaling also use this image.
Log on to a cluster node and run the following command to query the cgroup type. Make sure that the node uses cgroup v2.
df -T /sys/fs/cgroup
Expected output:
Filesystem     Type    1K-blocks  Used Available Use% Mounted on
cgroup2        cgroup2         0     0         0    - /sys/fs/cgroup
The output indicates that the node uses cgroup v2.
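The check above can also be scripted. The sketch below maps a filesystem type string to a cgroup version: cgroup2 is the type that df -T prints, and cgroup2fs and tmpfs are the values that stat -fc %T /sys/fs/cgroup commonly prints on cgroup v2 and cgroup v1 nodes respectively. The function name cgroup_version_from_fstype is hypothetical.

```shell
# cgroup_version_from_fstype: map the filesystem type of /sys/fs/cgroup to a cgroup version.
# The function name is illustrative and not part of any ACK tooling.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs|cgroup2) echo v2 ;;   # unified hierarchy mounted directly
    tmpfs)             echo v1 ;;   # v1 mounts a tmpfs that holds per-controller hierarchies
    *)                 echo unknown ;;
  esac
}

# Usage on a node:
#   cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```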
Use scenarios of cgroup v2
Limit container IOPS with cgroup v2
cgroup v1 does not accurately account for asynchronous block I/O. As a result, the IOPS values that it collects for a container, and the limits applied to them, are usually much smaller than the actual values.
cgroup v2 can accurately collect the asynchronous IOPS of a container. Therefore, you can use cgroup v2 to limit container IOPS.
How to limit container IOPS:
Write the disk IOPS limit to the io.max interface file in cgroup v2 to limit the maximum IOPS that a cgroup can consume in a container.
For more information, see Control Group v2.
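Each entry in io.max starts with the device number in MAJ:MIN form, followed by one or more of the keys rbps, wbps, riops, and wiops. As a small sketch of how such an entry can be assembled, the helper below builds a write-bandwidth entry from a value in MiB/s. The name io_max_wbps_line is hypothetical, and the helper only prints the line; writing it to a cgroup's io.max file is left to the caller.

```shell
# io_max_wbps_line: print an io.max entry that caps write bandwidth.
# $1 = device number as MAJ:MIN, $2 = limit in MiB/s (whole number).
# The helper name is illustrative.
io_max_wbps_line() {
  printf '%s wbps=%d\n' "$1" "$(( $2 * 1024 * 1024 ))"
}

# Usage:
#   io_max_wbps_line 253:0 10
#   prints: 253:0 wbps=10485760
```

To cap operations per second instead of bytes per second, the same file accepts riops and wiops keys in the same key=value form.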
Example:
The following pod writes a write limit to io.max when the container starts, and then runs the dd command together with iostat so that you can verify that the write throughput stays near the specified limit:
apiVersion: v1
kind: Pod
metadata:
  name: write-file-pod
spec:
  restartPolicy: Never
  containers:
    - name: dd-container
      image: alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com/alinux3/alinux3
      command:
      - bash
      - -c
      - "yum install -y sysstat; \
        echo '253:0 wbps=10485760' > /sys/fs/cgroup$(cat /proc/1/cgroup | awk -F ':' '{print $3}')/io.max; \
        dd if=/dev/zero of=/writefile bs=100M count=10 & iostat -dh 1 30;"
      securityContext:
        privileged: true
# "echo '253:0 wbps=10485760' > /sys/fs/cgroup/.../io.max" writes the limit to the io.max file of the container's cgroup.
# "253:0" specifies the device ID (major:minor) of the disk. Replace it with the device ID of the disk that you write to.
# "wbps" specifies the upper limit on write bandwidth in bytes per second. 10485760 is used in this example, which equals 10 MiB/s.
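To find the device ID to use in place of 253:0, one option is to read it from sysfs, where each block device exposes its MAJ:MIN number in a file named dev. The sketch below assumes the disk is named vda, as in the iostat output of this example. The function name dev_id and its optional second argument (an alternative sysfs root, included only to make the function easy to test) are hypothetical.

```shell
# dev_id: print the MAJ:MIN device number of a block device by reading sysfs.
# $1 = device name such as vda
# $2 = sysfs root, defaults to /sys (illustrative parameter for testing)
dev_id() {
  cat "${2:-/sys}/block/$1/dev"
}

# Usage on a node or in a privileged container:
#   dev_id vda
```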
After you deploy the preceding pod to the cluster, you can print the pod log to verify that the disk write throughput is limited to about 10 MB/s.
# kubectl logs write-file-pod -f
....
tps kB_read/s kB_wrtn/s kB_read kB_wrtn Device
91.00 0.0k 10.8M 0.0k 10.8M vda
tps kB_read/s kB_wrtn/s kB_read kB_wrtn Device
88.00 0.0k 9.6M 0.0k 9.6M vda