ContainerOS is an operating system that Alibaba Cloud provides for containerized development. ContainerOS is fully compatible with Kubernetes. ContainerOS accelerates operating system startups and image pulling to improve the efficiency of node scale-outs for Container Service for Kubernetes (ACK). This topic describes how to use ContainerOS to quickly scale out nodes.
Prerequisites
ContainerOS is specified as the operating system of your managed node pool. For more information, see Use ContainerOS.
If you create a managed node pool that runs ContainerOS for the first time, make sure that the following components in your ACK cluster are updated to the latest version so that cluster nodes can be quickly scaled out:
The network component of the cluster: Terway or Flannel
The default volume component of the cluster: csi-plugin
To check whether the components are updated to the latest version, go to the cluster details page of your cluster and open the Add-ons page from the left-side navigation pane. If Upgrade is displayed in the lower-right part of the card of a component, click Upgrade to update the component.
Usage notes
To accelerate the startup of the operating system, ContainerOS pre-installs container images to reduce the amount of time required for pulling the images. When you use ContainerOS, do not manually update the ACK components, including csi-plugin and Terway or Flannel. Otherwise, the pre-installed image version may differ from the application version and the startup of the operating system may be slowed down.
Container images are layered, so updating a pre-installed image takes less time than pulling the whole image and is more flexible. We recommend that you update the corresponding components in advance so that node scale-outs are not slowed down.
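The effect of image layering can be illustrated with a minimal sketch. The layer digests and sizes below are hypothetical and are not taken from any real component image; the sketch only assumes that layers already present on the node do not need to be downloaded again.

```python
from __future__ import annotations

# Hypothetical layer digests mapped to sizes in MiB (illustrative values only).
PREINSTALLED_LAYERS = {"sha256:base": 30, "sha256:runtime": 45, "sha256:app-v1": 12}
TARGET_LAYERS = {"sha256:base": 30, "sha256:runtime": 45, "sha256:app-v2": 13}


def layers_to_download(local: dict[str, int], target: dict[str, int]) -> dict[str, int]:
    """Return the layers of the target image that are not present locally."""
    return {digest: size for digest, size in target.items() if digest not in local}


# A full pull downloads every layer; an update downloads only the changed layers.
full_pull = sum(TARGET_LAYERS.values())
update = sum(layers_to_download(PREINSTALLED_LAYERS, TARGET_LAYERS).values())
print(f"full pull: {full_pull} MiB, update from pre-installed image: {update} MiB")
```

In this example only the changed top layer is fetched, which is why an update on top of a pre-installed image is cheaper than a fresh pull.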
Benefits of ContainerOS
Optimization | Description |
Operating system startup speed | ContainerOS simplifies the operating system startup procedure to accelerate operating system startups. Because ContainerOS is designed for VMs in the cloud, it relies on only a small number of hardware drivers, and the required kernel driver modules can be built into the kernel. In addition, ContainerOS drops the initial RAM file system (initramfs) and simplifies udev rules to greatly speed up operating system startups. For example, Alibaba Cloud Linux 3 requires more than one minute for the initial boot on an Elastic Compute Service (ECS) instance of the ecs.g7.large type, but ContainerOS requires only two seconds. |
Image pulling speed | After the ECS nodes start, ACK needs to pull the container images of specific components to complete basic tasks. ContainerOS pre-installs the container images of the components that are required for cluster management, which reduces the amount of time required for image pulling during node startups. For example, if your cluster uses Terway, the state of a node can change to Ready only after the Terway pod is ready. High network latency can severely increase the amount of time required for pulling images. To prevent this issue, ContainerOS pre-installs the Terway container image in the operating system, which allows ACK to obtain the image directly from the local directory and saves the time otherwise spent pulling the image over the Internet. |
Node elasticity | ContainerOS is integrated with the management capabilities of ACK to improve the elasticity of nodes. |
The following figure shows the 90% node startup duration when an empty ACK node pool is scaled out. The duration starts when the scale-out request is submitted and ends when 90% of the nodes are ready. Compared with the CentOS and Alibaba Cloud Linux 2 custom image solutions, ContainerOS shows a performance advantage.
The statistics in this example are theoretical values. The actual values may vary based on the optimization of the service and environment.
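As a rough illustration of how the 90% node startup duration defined above can be computed, the following sketch takes the time at which each node became ready, measured in seconds from the scale-out request. The function name and the sample timings are hypothetical and are not measurements from ACK.

```python
def startup_duration_at_percent(ready_seconds, total_nodes, percent=90):
    """Seconds from the scale-out request until `percent`% of the nodes are ready.

    ready_seconds: seconds after the request at which each node became ready
                   (nodes that never became ready are simply absent).
    Returns None if fewer than the required number of nodes became ready.
    """
    # Integer ceiling of total_nodes * percent / 100, avoiding float rounding.
    needed = (total_nodes * percent + 99) // 100
    if needed == 0:
        return 0
    ordered = sorted(ready_seconds)
    if len(ordered) < needed:
        return None
    # The duration ends when the `needed`-th node becomes ready.
    return ordered[needed - 1]


# Hypothetical ready times for a 10-node scale-out.
sample = [21, 25, 19, 30, 42, 23, 27, 35, 24, 58]
print(startup_duration_at_percent(sample, total_nodes=10))
```

With these sample values, nine of the ten nodes must be ready, so the slowest node does not affect the result.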
Procedure
If you want to start a large number of nodes, you can manually configure the Kubernetes controller manager, Kubernetes scheduler, and API server to accelerate node scale-outs. For example, if you want to scale out more than 100 ECS nodes at a time, you can use this method.
Some APIs support up to 100 connections by default, so no additional configuration is required when you start fewer than 100 ECS nodes.
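The decision described above can be summarized in a small helper. This is only a sketch: the function name is hypothetical, the parameter values are the ones recommended in the steps of this topic, and treating exactly 100 nodes as not requiring changes is an assumption, because the text only covers more than 100 and fewer than 100 nodes.

```python
SCALE_OUT_THRESHOLD = 100  # nodes per scale-out, taken from the text above

# Recommended values from the configuration steps in this topic.
RECOMMENDED_SETTINGS = {
    "kube-controller-manager": {"kubeAPIQPS": 800, "kubeAPIBurst": 1000},
    "kube-scheduler": {"connectionQPS": 800, "connectionBurst": 1000},
}


def throttling_changes(nodes_per_scale_out: int) -> dict:
    """Return the settings to apply for a planned scale-out size, or {} if none."""
    if nodes_per_scale_out > SCALE_OUT_THRESHOLD:
        # Return copies so that callers cannot modify the shared constants.
        return {name: dict(params) for name, params in RECOMMENDED_SETTINGS.items()}
    return {}


print(throttling_changes(50))
print(throttling_changes(300))
```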
Configure traffic throttling for the Kubernetes controller manager
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, open the Add-ons page.
On the Core Components tab of the Add-ons page, find Kube Controller Manager and click Configuration in the lower-right part of the card.
In the Kube Controller Manager Parameters dialog box, set the kubeAPIQPS parameter to 800 and the kubeAPIBurst parameter to 1000, configure other parameters based on your business requirements, and then click OK.
Note: These settings are recommended based on test results. If you have other requirements, you can modify the settings accordingly.
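To make the interaction between a QPS value and a burst value more concrete, the following is a minimal token-bucket sketch. It assumes that kubeAPIQPS and kubeAPIBurst behave like the refill rate and the capacity of a token bucket, which is how Kubernetes client-side rate limiting is commonly described; the class is illustrative and is not the controller manager's actual implementation. Time is passed in explicitly so that the example is deterministic.

```python
class TokenBucket:
    """Illustrative token bucket: `qps` tokens are added per second, up to `burst`."""

    def __init__(self, qps: float, burst: int):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # time of the previous call, in seconds

    def try_acquire(self, now: float) -> bool:
        """Consume one token if available; `now` is the current time in seconds."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=800, burst=1000)
# 1,500 requests arrive at once: only the burst capacity is admitted immediately.
accepted_at_start = sum(bucket.try_acquire(now=0.0) for _ in range(1500))
# Half a second later, 0.5 s * 800 QPS worth of tokens has been refilled.
accepted_later = sum(bucket.try_acquire(now=0.5) for _ in range(500))
print(accepted_at_start, accepted_later)
```

Under this model, the burst value bounds how many requests can be sent at once, and the QPS value bounds the sustained rate afterwards.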
Configure traffic throttling for the Kubernetes scheduler
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, open the Add-ons page.
On the Core Components tab of the Add-ons page, find Kube Scheduler and click Configuration in the lower-right part of the card.
In the Kube Scheduler Parameters dialog box, set the connectionQPS parameter to 800 and the connectionBurst parameter to 1000, configure other parameters based on your business requirements, and then click OK.
Note: These settings are recommended based on test results. If you have other requirements, you can modify the settings accordingly.
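A back-of-the-envelope estimate can show why higher limits matter during a large scale-out. The sketch below assumes that connectionQPS and connectionBurst act as a sustained rate and an initial allowance for API requests; the request count and the lower comparison values are hypothetical and are not ACK defaults.

```python
def min_seconds_to_send(requests: int, qps: float, burst: int) -> float:
    """Lower bound on the time needed to send `requests` API calls.

    The first `burst` calls can be sent immediately; the remainder is
    limited to `qps` calls per second.
    """
    return max(0, requests - burst) / qps


# Hypothetical workload: 5,000 API requests caused by a large scale-out.
REQUESTS = 5000
recommended = min_seconds_to_send(REQUESTS, qps=800, burst=1000)
# Hypothetical lower limits, used only for comparison.
lower_limits = min_seconds_to_send(REQUESTS, qps=50, burst=100)
print(f"recommended settings: {recommended:.1f} s, lower limits: {lower_limits:.1f} s")
```

The estimate ignores server-side limits and network latency, so it should be read as a lower bound rather than a prediction.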
Modify the number of replicas for the API server
The number of replicas for the API server is dynamically adjusted based on load. When a large number of nodes are scaled out, the number of replicas is increased, and more time is required for the nodes to become ready. To accelerate node scale-outs, you can submit a ticket to modify the number of replicas for the API server.