Auto scaling overview

Updated at: 2025-01-22 08:03

If the resource requests of your business are unpredictable or change periodically, we recommend that you enable auto scaling. For example, you can enable auto scaling for web applications, gaming services, or online education applications. Workload scaling automatically adjusts the number of pod replicas or the amount of resources allocated to workloads to meet their requirements, which helps you handle traffic spikes and reduce resource costs.

Usage notes

Workload scaling and compute resource scaling

The auto scaling feature of Container Service for Kubernetes (ACK) provides elasticity from the following aspects:

  • Workload scaling (scheduling layer elasticity): This scaling solution adjusts the number of pods or the amount of resources allocated to pods based on workload changes. For example, the Horizontal Pod Autoscaler (HPA) can automatically adjust the number of application pods based on traffic changes, which in turn adjusts the amount of resources that the workload consumes.

  • Compute resource scaling (resource layer elasticity): This scaling solution consists of node scaling and virtual node scaling. You can use this solution to increase or decrease the amount of resources allocated to your applications based on pod scheduling results and resource usage.

We recommend that you use the preceding solutions in combination. This allows you to scale pod replicas to improve resource utilization and scale compute resources in the cluster to meet the resource requirements of pods.

Workload scaling solutions


You can run the kubectl scale command to manually adjust the number of pods. This method is suitable for temporary scaling requirements. The following table describes how to select among the workload scaling solutions provided by ACK based on your business scenarios. You can use these solutions to meet requirements such as cost control, stability improvement, and flexible resource management.

Solution: HPA

  • Description: HPA scales out pods during peak hours to handle traffic spikes and scales in pods during off-peak hours to reduce resource costs. HPA is suitable for most scenarios.

  • Scenario: HPA is ideal for online services that include a large number of pods and require frequent scaling to handle traffic fluctuations, such as e-commerce services, online education, and financial services.

  • References: Implement horizontal pod autoscaling

Solution: CronHPA

  • Description: CronHPA uses a Crontab-like strategy to scale pods based on a predefined schedule. You can specify the time zone and dates on which scaling is performed in the schedule. You can also exclude dates, such as holidays, from the schedule. CronHPA can be used together with HPA.

  • Scaling metric: Scheduled scaling

  • Scenario: CronHPA is ideal for applications that have predictable traffic patterns and scenarios where you need to run tasks at a scheduled time.

Solution: VPA

  • Description: VPA monitors the resource consumption of pods and provides recommendations on CPU and memory allocation. VPA adjusts resource allocation but does not change the number of pod replicas.

  • Scaling metric: VPA provides recommendations on the CPU request, CPU limit, memory request, and memory limit for pods. In addition, VPA can automatically adjust these resource requests and limits.

  • Scenario: VPA is ideal for scenarios where stable resource allocation is required, such as scale-out of stateful applications and deployment of large monolithic applications. In most cases, VPA takes effect when pods are recovered from anomalies.

  • References: Vertical Pod Autoscaler (VPA)

Solution: KEDA

  • Description: KEDA supports a rich variety of event sources and enables event-driven auto scaling for workloads.

  • Scaling metric: Number of events, such as the queue length.

  • Scenario: KEDA is ideal for scenarios where instant scaling is required, especially event-based offline jobs. For example, you can enable KEDA for offline video and audio transcoding jobs, event-driven jobs, and stream processing jobs.

  • References: ACK KEDA

Solution: Advanced Horizontal Pod Autoscaler (AHPA)

  • Description: AHPA automatically learns the pattern of workload fluctuations and predicts resource demand based on historical metric data to help you implement predictive scaling.

  • Scaling metrics: Resource metrics such as CPU, memory, and GPU utilization; traffic metrics such as queries per second (QPS) and response time (RT); other custom metrics.

  • Scenario: AHPA is ideal for scenarios where traffic periodically fluctuates, such as live streaming, online education, and gaming services.

  • References: Predictive scaling based on AHPA
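As a minimal sketch of the HPA solution, the following manifest uses the standard Kubernetes autoscaling/v2 API. The Deployment name web-app, the replica bounds, and the CPU threshold are hypothetical values that you would replace with your own:

```yaml
# Hypothetical example: keep the "web-app" Deployment between 2 and 10
# replicas so that average CPU utilization stays near 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply the manifest with kubectl apply, then watch scaling decisions with kubectl get hpa.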

In addition to the preceding solutions, you can use the UnitedDeployment controller to define workloads. The controller manages multiple workloads of the same type across multiple subsets in a flexible and convenient manner, which allows you to dynamically adjust the number of pod replicas on each subset. You can combine the UnitedDeployment controller with the preceding solutions to enable flexible workload scaling and scheduling in scenarios where multiple types of compute resources are used. For more information, see Implement workload scaling based on the UnitedDeployment controller.
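As a rough illustration of subsets, the open source OpenKruise project defines a UnitedDeployment API along these lines. The subset names, zone labels, and image below are hypothetical, and the exact API version accepted by your cluster may differ:

```yaml
# Sketch based on the OpenKruise UnitedDeployment API: 6 replicas in
# total, with 2 pinned to subset-a; the rest are placed on subset-b.
apiVersion: apps.kruise.io/v1alpha1
kind: UnitedDeployment
metadata:
  name: sample-ud
spec:
  replicas: 6
  selector:
    matchLabels:
      app: sample
  template:
    # Each subset runs an ordinary workload created from this template.
    deploymentTemplate:
      metadata:
        labels:
          app: sample
      spec:
        selector:
          matchLabels:
            app: sample
        template:
          metadata:
            labels:
              app: sample
          spec:
            containers:
            - name: app
              image: nginx:alpine
  topology:
    # Each subset maps to a group of compute resources, such as a zone
    # or an instance family, selected by node labels.
    subsets:
    - name: subset-a
      nodeSelectorTerm:
        matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["zone-a"]
      replicas: 2
    - name: subset-b
      nodeSelectorTerm:
        matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["zone-b"]
```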

Compute resource scaling


In scenarios where instant scaling is required to handle traffic fluctuations, the cluster must automatically adjust compute resources based on workload changes. This improves the elasticity of your business and reduces your O&M workload. The components for compute resource scaling listen for pending pods to decide whether new Elastic Compute Service (ECS) nodes or elastic container instances are required to schedule the pods.

For more information about node scaling, see Node scaling.

Important

The resource delivery statistics provided in the following table are only theoretical values. The actual values may vary based on your environment.

Solution: Node auto scaling

  • Description: You can use the node auto scaling feature to enable ACK to automatically scale nodes when the resources in your cluster cannot fulfill pod scheduling.

  • Scenario: Node auto scaling is suitable for all scenarios and is especially ideal for online services, deep learning tasks, small-scale scaling activities, and workloads that require only one scaling activity each time. For example, you can enable node auto scaling for a cluster that contains fewer than 20 node pools with auto scaling enabled, where each of these node pools contains fewer than 100 nodes.

  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:

  • References: Enable node auto scaling

Solution: Node instant scaling

  • Description: Compared with node auto scaling, node instant scaling provides higher scaling speeds, improved scaling efficiency, and a higher success rate of resource delivery. In addition, you can view the health status of node instant scaling based on the inventory of ECS instances. For more information about the comparison between node auto scaling and node instant scaling, see Solution comparison.

  • Scenario: Node instant scaling is suitable for all scenarios and is especially ideal for large-scale clusters and clusters that require faster resource scaling, auto scaling across multiple instance types and zones, or advanced scheduling strategies such as topology spread constraints. A cluster is considered large if a node pool that has auto scaling enabled contains more than 100 nodes or the cluster has more than 20 node pools that have auto scaling enabled.

  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:

Solution: Virtual nodes

  • Description: Virtual nodes eliminate the need for node management and capacity planning. With virtual nodes, you can deploy up to 50,000 pods in a cluster. You can use virtual nodes to scale out application pods to handle traffic spikes. When you scale out applications, up to 10,000 pods can be created within 1 minute.

  • Scenario: Virtual nodes are suitable for all scenarios and are especially ideal for tasks, scheduled tasks, data computing jobs, AI applications, and scenarios where workload spikes exist.

  • Resource delivery efficiency: The time required to create 1,000 pods in a cluster: 30 seconds when image caching is disabled, or 15 seconds when image caching is enabled.

  • References: Schedule pods to elastic container instances that are deployed as virtual nodes
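One common way to place a pod on a virtual node is a nodeSelector plus a toleration for the virtual node's taint. The label and taint values below follow the generic virtual-kubelet convention and are assumptions; verify the actual values on the virtual nodes in your cluster:

```yaml
# Hypothetical example: explicitly schedule a pod onto a virtual node.
# Check the real label and taint values with:
#   kubectl describe node <virtual-node-name>
apiVersion: v1
kind: Pod
metadata:
  name: burst-task
spec:
  containers:
  - name: app
    image: nginx:alpine
  nodeSelector:
    type: virtual-kubelet      # assumed virtual node label
  tolerations:
  - key: virtual-kubelet.io/provider   # assumed virtual node taint
    operator: Exists
    effect: NoSchedule
```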

Billing

The auto scaling feature is free of charge. However, the auto scaling components are deployed in pods, so your cluster must contain at least one node. You are charged for the nodes that are added by the auto scaling feature. For more information, see Billing overview.

FAQ

For answers to frequently asked questions about the auto scaling feature, see Auto scaling FAQs.


References

  • In scenarios that require preinstallation or high performance, you can use custom OS images to facilitate auto scaling. For more information, see Create custom images.

  • For more information about how to collect auto scaling logs, see Collect log files of system components.

  • When you configure your workloads, we recommend that you follow the suggestions provided by Recommended workload configurations.

  • In scenarios where serverless containers are used, you can configure Knative to trigger scaling activities based on the number of requests and the number of requests that are concurrently processed. When no request is received, Knative automatically scales the number of pods to zero. For more information, see Knative and Enable auto scaling to withstand traffic fluctuations.
