Auto scaling overview

Updated at: 2025-01-22 08:03

If the resource requests of your business are unpredictable or change periodically, we recommend that you enable auto scaling. For example, you can enable auto scaling for web applications, gaming services, or online education applications. Workload scaling automatically adjusts the number of pod replicas or the amount of resources allocated to workloads to meet their requirements, which helps you handle traffic spikes and reduce resource costs.

Usage notes

Workload scaling and compute resource scaling

The auto scaling feature of Container Service for Kubernetes (ACK) provides elasticity from the following aspects:

  • Workload scaling (scheduling layer elasticity): This scaling solution adjusts the number of pods or the amount of resources allocated to pods based on workload changes. For example, the Horizontal Pod Autoscaler (HPA) can automatically adjust the number of application pods based on traffic changes, which in turn adjusts the amount of resources that the workload consumes.

  • Compute resource scaling (resource layer elasticity): This scaling solution consists of node scaling and virtual node scaling. You can use this solution to increase or decrease the amount of resources allocated to your applications based on pod scheduling results and resource usage.

We recommend that you use the preceding solutions in combination. This allows you to scale pod replicas to improve resource utilization and scale compute resources in the cluster to meet the resource requirements of pods.

Workload scaling solutions


You can run the kubectl scale command to manually adjust the number of pods. This method is suitable for temporary scaling requirements. The following table describes how to select among the workload scaling solutions provided by ACK based on your business scenarios. You can use these solutions to meet requirements such as cost control, stability improvement, and flexible resource management.

Solution: HPA

  • Description: HPA scales out pods during peak hours to handle traffic spikes and scales in pods during off-peak hours to reduce resource costs. HPA is suitable for most scenarios.

  • Scenario: HPA is ideal for online services that include a large number of pods and require frequent scaling to handle traffic fluctuations, such as e-commerce services, online education, and financial services.

  • References: Implement horizontal pod autoscaling

Solution: CronHPA

  • Description: CronHPA uses a Crontab-like strategy to scale pods based on a predefined schedule. You can specify the time zone and dates on which scaling is performed in the schedule. You can also exclude dates, such as holidays, from the schedule. CronHPA can be used together with HPA.

  • Scaling metric: Scheduled scaling

  • Scenario: CronHPA is ideal for applications that have predictable traffic patterns and scenarios where you need to run tasks at a scheduled time.

Solution: VPA

  • Description: VPA monitors the resource consumption of pods and provides recommendations on CPU and memory allocation. VPA adjusts resource allocation but does not change the number of pod replicas.

  • Scaling metric: VPA provides recommendations on the CPU request, CPU limit, memory request, and memory limit for pods. In addition, VPA can automatically adjust these resource requests and limits.

  • Scenario: VPA is ideal for scenarios where stable resource allocation is required, such as scale-out of stateful applications and deployment of large monolithic applications. In most cases, VPA takes effect when pods are recovered from anomalies.

  • References: Vertical Pod Autoscaler (VPA)

Solution: KEDA

  • Description: KEDA supports a rich variety of event sources and enables event-driven auto scaling for workloads.

  • Scaling metric: Number of events, such as the queue length.

  • Scenario: KEDA is ideal for scenarios where instant scaling is required, especially event-based offline jobs. For example, you can enable KEDA for offline video and audio transcoding jobs, event-driven jobs, and stream processing jobs.

  • References: ACK KEDA

Solution: Advanced Horizontal Pod Autoscaler (AHPA)

  • Description: AHPA automatically learns the pattern of workload fluctuations and predicts resource demand based on historical metric data to help you implement predictive scaling.

  • Scaling metrics: Resource metrics such as CPU, memory, and GPU utilization; traffic metrics such as queries per second (QPS) and response time (RT); other custom metrics.

  • Scenario: AHPA is ideal for scenarios where traffic periodically fluctuates, such as live streaming, online education, and gaming services.

  • References: Predictive scaling based on AHPA
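As a minimal sketch of the HPA solution, the following manifest uses the standard Kubernetes autoscaling/v2 API. The Deployment name web-app, the replica bounds, and the CPU threshold are hypothetical values that you would replace with your own:

```yaml
# Hypothetical example: keep the "web-app" Deployment between 2 and 10
# replicas so that average CPU utilization stays near 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply the manifest with kubectl apply, then watch scaling decisions with kubectl get hpa.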

In addition to the preceding solutions, you can use the UnitedDeployment controller to define workloads. The controller manages multiple workloads of the same type across multiple subsets in a flexible and convenient manner, which allows you to dynamically adjust the number of pod replicas on each subset. You can combine the UnitedDeployment controller with the preceding solutions to enable flexible workload scaling and scheduling in scenarios where multiple types of compute resources are used. For more information, see Implement workload scaling based on the UnitedDeployment controller.
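As a rough illustration of subsets, the open source OpenKruise project defines a UnitedDeployment API along these lines. The subset names, zone labels, and image below are hypothetical, and the exact API version accepted by your cluster may differ:

```yaml
# Sketch based on the OpenKruise UnitedDeployment API: 6 replicas in
# total, with 2 pinned to subset-a; the rest are placed on subset-b.
apiVersion: apps.kruise.io/v1alpha1
kind: UnitedDeployment
metadata:
  name: sample-ud
spec:
  replicas: 6
  selector:
    matchLabels:
      app: sample
  template:
    # Each subset runs an ordinary workload created from this template.
    deploymentTemplate:
      metadata:
        labels:
          app: sample
      spec:
        selector:
          matchLabels:
            app: sample
        template:
          metadata:
            labels:
              app: sample
          spec:
            containers:
            - name: app
              image: nginx:alpine
  topology:
    # Each subset maps to a group of compute resources, such as a zone
    # or an instance family, selected by node labels.
    subsets:
    - name: subset-a
      nodeSelectorTerm:
        matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["zone-a"]
      replicas: 2
    - name: subset-b
      nodeSelectorTerm:
        matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["zone-b"]
```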

Compute resource scaling


In scenarios where instant scaling is required to handle traffic fluctuations, the cluster must automatically adjust compute resources based on workload changes. This improves the elasticity of your business and reduces your O&M workload. The components for compute resource scaling listen for pending pods to decide whether new Elastic Compute Service (ECS) nodes or elastic container instances are required to schedule the pods.

For more information about node scaling, see Node scaling.

Important

The resource delivery statistics provided in the following table are only theoretical values. The actual values may vary based on your environment.

Solution: Node auto scaling

  • Description: You can use the node auto scaling feature to enable ACK to automatically scale nodes when the resources in your cluster cannot fulfill pod scheduling.

  • Scenario: Node auto scaling is suitable for all scenarios and is especially ideal for online services, deep learning tasks, small-scale scaling activities, and workloads that require only one scaling activity each time. For example, you can enable node auto scaling for a cluster that contains fewer than 20 node pools with auto scaling enabled, where each of these node pools contains fewer than 100 nodes.

  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:

  • References: Enable node auto scaling

Solution: Node instant scaling

  • Description: Compared with node auto scaling, node instant scaling provides higher scaling speeds, improved scaling efficiency, and a higher success rate of resource delivery. In addition, you can view the health status of node instant scaling based on the inventory of ECS instances. For more information about the comparison between node auto scaling and node instant scaling, see Solution comparison.

  • Scenario: Node instant scaling is suitable for all scenarios and is especially ideal for large-scale clusters and clusters that require faster resource scaling, auto scaling across multiple instance types and zones, or advanced scheduling strategies such as topology spread constraints. A cluster is considered large if a node pool that has auto scaling enabled contains more than 100 nodes or the cluster has more than 20 node pools that have auto scaling enabled.

  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:

Solution: Virtual nodes

  • Description: Virtual nodes eliminate the need for node management and capacity planning. With virtual nodes, you can deploy up to 50,000 pods in a cluster. You can use virtual nodes to scale out application pods to handle traffic spikes. When you scale out applications, up to 10,000 pods can be created within 1 minute.

  • Scenario: Virtual nodes are suitable for all scenarios and are especially ideal for tasks, scheduled tasks, data computing jobs, AI applications, and scenarios where workload spikes exist.

  • Resource delivery efficiency: The time required to create 1,000 pods in a cluster: 30 seconds when image caching is disabled, or 15 seconds when image caching is enabled.

  • References: Schedule pods to elastic container instances that are deployed as virtual nodes
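One common way to place a pod on a virtual node is a nodeSelector plus a toleration for the virtual node's taint. The label and taint values below follow the generic virtual-kubelet convention and are assumptions; verify the actual values on the virtual nodes in your cluster:

```yaml
# Hypothetical example: explicitly schedule a pod onto a virtual node.
# Check the real label and taint values with:
#   kubectl describe node <virtual-node-name>
apiVersion: v1
kind: Pod
metadata:
  name: burst-task
spec:
  containers:
  - name: app
    image: nginx:alpine
  nodeSelector:
    type: virtual-kubelet      # assumed virtual node label
  tolerations:
  - key: virtual-kubelet.io/provider   # assumed virtual node taint
    operator: Exists
    effect: NoSchedule
```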

Billing

The auto scaling feature is free of charge. However, the auto scaling components are deployed in pods, so your cluster must contain at least one node. You are charged for the nodes that are added by the auto scaling feature. For more information, see Billing overview.

FAQ

For answers to frequently asked questions about the auto scaling feature, see Auto scaling FAQs.


References

  • In scenarios that require preinstallation or high performance, you can use custom OS images to facilitate auto scaling. For more information, see Create custom images.

  • For more information about how to collect auto scaling logs, see Collect log files of system components.

  • When you configure your workloads, we recommend that you follow the suggestions provided by Recommended workload configurations.

  • In scenarios where serverless containers are used, you can configure Knative to trigger scaling activities based on the number of requests and the number of requests that are concurrently processed. When no request is received, Knative automatically scales the number of pods to zero. For more information, see Knative and Enable auto scaling to withstand traffic fluctuations.
