To automatically scale the number of Pods based on CPU utilization, memory usage, or other custom metrics, use Horizontal Pod Autoscaler (HPA) for your application's Pods. HPA automatically scales out by adding Pod replicas to handle sudden workload spikes and scales in by removing replicas when the workload decreases, which saves resources. HPA is ideal for workloads with fluctuating traffic and many services that require frequent scaling, such as e-commerce, online education, and financial services.
Before you begin
To use the HPA feature effectively, review the official Kubernetes documentation on Horizontal Pod Autoscaling to learn about its basic principles, algorithms, and configurable scaling behaviors.
In addition, Container Service for Kubernetes (ACK) provides various solutions for Workload Scaling (application-layer elasticity) and Node Scaling (resource-layer elasticity). Before proceeding, read Auto scaling overview for the use cases and limitations of different solutions.
Prerequisites
An ACK managed cluster or an ACK dedicated cluster is created. For more information, see Create a cluster.
To use `kubectl` commands for HPA, you must connect to your Kubernetes cluster with `kubectl`. For more information, see Connect to an ACK cluster using kubectl.
Create an HPA-enabled application in the console
ACK integrates the HPA feature, allowing you to create HPA-enabled applications in the ACK console. You can create an HPA when creating a new application, or enable it for an existing one. Create only one HPA per Workload to avoid conflicting scaling policies and ensure predictable behavior.
Create an HPA when creating an application
The following steps use a stateless Deployment as an example to show how to enable HPA for an application. The procedure is similar for other types of Workloads.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster you want to manage and click its name. In the left navigation pane, choose Workloads > Deployments.
On the Deployments page, click Create from Image.
On the Create page, configure the basic information, container settings, service settings, and scaling settings as prompted to create an HPA-enabled Deployment.
For detailed steps and configuration descriptions, see Create a stateless workload (Deployment). The following section describes only the key configurations.
Basic Information: Configure the application's name, number of replicas, and other settings.
Container: Configure the image and specify the required CPU and memory resources.
Use the resource profiling feature to analyze historical resource usage data and get recommendations for configuring container Requests and Limits. For more information, see Resource profiling.
**Important**: You must set resource Requests for your application; otherwise, HPA will not function.
Advanced:
In the Scaling section, select the Enable checkbox for HPA and configure the scaling conditions and parameters.
Metric: Supports CPU Usage and Memory Usage. The metric type must match the resource type for which you have set a Request. If you specify both CPU and Memory, scaling triggers as soon as either metric reaches its threshold.
Condition: The target resource utilization percentage. When the usage exceeds this value, the Pods scale out. For details about the Horizontal Pod Autoscaling algorithm, see Algorithm details.
Max. Replicas: The maximum number of replicas for the Deployment. This value must be greater than the minimum number of replicas.
Min. Replicas: The minimum number of replicas for the Deployment. This value must be an integer greater than or equal to 1.
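The scaling decision behind the Condition threshold follows the core formula documented in the Kubernetes Algorithm details: the desired replica count is the current count multiplied by the ratio of the current metric value to the target, rounded up, and no scaling happens while the ratio is within a small tolerance (0.1 by default). The sketch below is a simplification that ignores readiness and missing-metric handling:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
    Scaling is skipped while the ratio stays within the tolerance of 1.0."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no scaling
    return math.ceil(current_replicas * ratio)

# 3 replicas averaging 80% CPU against a 50% target scale out to ceil(4.8) = 5.
print(desired_replicas(3, 80, 50))  # prints 5
```

For example, 5 replicas averaging only 20% CPU against a 50% target scale in to 2, while 4 replicas at 52% stay at 4 because the ratio (1.04) is within the tolerance.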
After the Deployment is created, you can view it on the Deployments page. Click the Deployment name, then click the Pods tab on the details page. Here, you can view HPA-related metrics, such as CPU or memory utilization and the maximum and minimum number of replicas. You can also manage the HPA, including updating its configuration or disabling it.
Create an HPA for an existing application
The following steps use a stateless Deployment as an example to show how to enable HPA for an existing application. The procedure is similar for other types of Workloads.
Workload page
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster you want to manage and click its name. In the left navigation pane, choose Workloads > Deployments.
On the Deployments page, click the target application. On the Pod Scaling tab, click Create in the HPA area.
In the Create dialog box, set the scaling configurations as prompted.
Name: The name of the HPA policy.
Metric: Click Add.
Metric Name: Supports CPU Usage and Memory Usage. The metric type must match the resource type for which you have set a Request. If you specify both CPU and Memory, scaling triggers as soon as either metric reaches its threshold.
Threshold: The target resource utilization percentage. When the usage exceeds this value, the Pods scale out. For details about the Horizontal Pod Autoscaling algorithm, see Algorithm details.
Max. Containers: The maximum number of replicas for the Deployment. This value must be greater than the minimum number of replicas.
Min. Containers: The minimum number of replicas for the Deployment. This value must be an integer greater than or equal to 1.
After the configuration is complete, go to the Deployments page, click the Deployment name, then click the Pod Scaling tab. Here, you can view HPA-related metrics, such as CPU or memory utilization and the maximum or minimum number of replicas. You can also manage the HPA, including updating its configuration or disabling it.
Workload scaling page
This page is available only to allowlisted users. To request access, submit a ticket.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.
In the upper-right corner of the page, click Create Auto Scaling, and then click the HPA and CronHPA tab. Select the target workload, select the Metric-based Auto Scaling (HPA) checkbox under Configure Scaling Policy, and configure the HPA policy as prompted.
Policy Name: The name of the HPA policy.
Min. Replicas: The minimum number of replicas for the Deployment. This value must be an integer greater than or equal to 1.
Max. Replicas: The maximum number of replicas for the Deployment. This value must be greater than the minimum number of replicas.
Metric Name: Supports CPU, GPU, Memory, Nginx Ingress requests, and custom metrics. The metric type must match the resource type for which you have set a Request. If you specify multiple metric types, scaling triggers as soon as any one of the metrics reaches its threshold.
Threshold: The target resource utilization percentage. When the usage exceeds this value, the Pods scale out. For details about the Horizontal Pod Autoscaling algorithm, see Algorithm details.
After the HPA is created, you can view the list of HPAs on the Workload Scaling page. In the Actions column, you can view HPA-related metrics, such as resource utilization and the maximum or minimum number of replicas. You can also manage the HPA, including updating its configuration or disabling it.
Verify the results
On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.
Click the Horizontal Scaling tab, and then select HPA to view the scaling status and task list.
In a production environment, the application scales based on workload. To verify this behavior in a test environment, perform stress testing on the Pods.
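As one way to generate such load, assuming the application is exposed through a Service named `nginx` in the same namespace (both names are assumptions for illustration), you can run a temporary load-generator Pod, following the pattern used in the Kubernetes HPA walkthrough:

```shell
# Run a throwaway busybox Pod that requests the Service in a tight loop.
# Press Ctrl+C to stop; --rm removes the Pod automatically.
kubectl run load-generator --rm -it --image=busybox:1.28 --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://nginx; done"

# In a second terminal, watch the HPA react as CPU utilization climbs:
kubectl get hpa --watch
```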
Create an HPA-enabled application by using kubectl
You can also manually create an HPA from a manifest and bind it to the Deployment object you want to scale. This lets you configure automatic Pod scaling by using kubectl commands. Create only one HPA per Workload to avoid conflicting scaling policies and ensure predictable behavior. The following steps use an HPA-enabled Nginx application as an example.
Create a file named `nginx.yml` that defines the Nginx Deployment.

**Important**: To implement HPA, you must set resource `requests` for your Pods; otherwise, HPA will not function. Use the resource profiling feature to analyze historical resource usage data and get recommendations for configuring container Requests and Limits. For more information, see Resource profiling.
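The Deployment manifest itself is not reproduced on this page. A minimal sketch of what `nginx.yml` might contain is shown below; the image tag, replica count, and resource values are illustrative assumptions, and the Deployment name must match the `scaleTargetRef` in the HPA manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25   # illustrative image tag
        ports:
        - containerPort: 80
        resources:
          requests:         # required: HPA computes utilization against requests
            cpu: 500m
            memory: 128Mi
          limits:
            cpu: "1"
            memory: 256Mi
```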
Run the following command to create the Nginx application:

```shell
kubectl apply -f nginx.yml
```

Create a file named `hpa.yml` and copy the following content into it to create the HPA.

The `scaleTargetRef` field specifies the object to which this HPA is bound. In this example, it is bound to the Deployment named `nginx`. The HPA triggers scaling when the average CPU utilization across all Pods reaches 50%.

**Kubernetes 1.24 and later**
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1   # The minimum number of replicas to which the deployment can be scaled in. Must be an integer greater than or equal to 1.
  maxReplicas: 10  # The maximum number of replicas to which the deployment can be scaled out. Must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # The target average utilization: the ratio of average resource usage to the requested resource amount.
```

**Kubernetes earlier than 1.24**
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1   # Must be an integer greater than or equal to 1.
  maxReplicas: 10  # Must be greater than the minimum number of replicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

To scale on both CPU and memory, add both resources under the `metrics` field instead of creating two separate HPAs. Scaling triggers as soon as any of the specified metrics reaches its threshold.

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
```

Run the following command to create the HPA:
```shell
kubectl apply -f hpa.yml
```

While the HPA is deploying, running `kubectl describe hpa <HPA_NAME>` (in this example, `nginx-hpa`) may show warning messages such as the following. Run the `kubectl get hpa` command to check the status of the HPA.

```
Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
```

Wait for the HPA to be created successfully and for the Pods to meet the scaling condition (in this example, when the CPU utilization of the Nginx Pods exceeds 50%). Then, run the `kubectl describe hpa <HPA_NAME>` command again to check the horizontal scaling status. The following expected output indicates that the HPA is running correctly.

```
Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
```
Related operations
If the default scaling behavior does not meet your business needs, use the behavior field to configure more granular scaling-down (scaleDown) and scaling-up (scaleUp) behaviors. For more information, see Configurable scaling behavior.
Use cases for the behavior field include:
Scaling out rapidly during sudden traffic surges.
Scaling out quickly and scaling in slowly in environments with frequent workload fluctuations.
Disabling scaling in for state-sensitive applications.
In resource-constrained or cost-sensitive scenarios, using `stabilizationWindowSeconds` to limit the rate of scaling and reduce frequent adjustments caused by brief fluctuations.
For configuration details and examples of the behavior field, see Adjust the scaling sensitivity of HPA.
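As an illustration of the `behavior` field (the window and policy values below are assumptions to be tuned per workload), the following fragment scales out immediately during spikes but removes at most one Pod per minute after a five-minute stabilization window:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  # scaleTargetRef, minReplicas, maxReplicas, and metrics omitted for brevity
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react to spikes immediately
      policies:
      - type: Percent
        value: 100                      # at most double the replica count
        periodSeconds: 15               # per 15-second period
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling in
      policies:
      - type: Pods
        value: 1                        # remove at most 1 Pod
        periodSeconds: 60               # per minute
```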
FAQ
What do I do if unknown is displayed in the current field in the HPA metrics?
What do I do if HPA cannot collect metrics and fails to perform scaling?
What do I do if excess pods are added by HPA during a rolling update?
What do I do if HPA does not scale pods when the scaling threshold is reached?
Can CronHPA and HPA interact without conflicts? How do I enable CronHPA to interact with HPA?
How do I fix the issue that excess pods are added by HPA when CPU or memory usage rapidly increases?
What does the unit of the utilization metric collected by HPA mean?
What do I do if unknown is displayed in the TARGETS column after I run the kubectl get hpa command?
How do I configure horizontal autoscaling after I customize the format of NGINX Ingress logs?
How do I query the sls_ingress_qps metric from the command line?
How do I use the console to manage a VPA installed using kubectl?
Related documents
Related tasks
To learn how to implement HPA based on metrics from Alibaba Cloud components when your Kubernetes cluster supports External Metrics, see Horizontally scale pods with Alibaba Cloud metrics.
To learn how to convert Alibaba Cloud Prometheus metrics into HPA-compatible metrics to implement HPA, see Horizontal pod autoscaling based on Prometheus metrics.
For issues encountered while using HPA, you can first refer to the Node autoscaling FAQ for troubleshooting.
If you need to coordinate CronHPA with HPA, see Coordinate CronHPA and HPA.
Other Workload Scaling solutions
If your application has predictable, periodic resource usage and you need to scale Pods on a schedule, see Use CronHPA for scheduled horizontal scaling.
If your application's resource usage is cyclical but difficult to define with rules, you can use the Advanced Horizontal Pod Autoscaler (AHPA). It automatically identifies business cycles from historical metrics to scale Pods. For more information, see Predictive scaling (AHPA).
To automatically set resource requests and limits for your Pods based on their resource usage, ensuring they receive sufficient computing resources, see Use Vertical Pod Autoscaling (VPA).
To flexibly customize scaling policies and scale Pods based on Kubernetes events such as message queues, scheduled tasks, or custom metrics, see Event-driven autoscaling.
Combined solutions
You can combine HPA with node auto-scaling, which allows the cluster to automatically add nodes when its resources are insufficient. For more information, see Enable node autoscaling.