If you want to automatically scale pods based on CPU usage, memory usage, or other custom metrics, you can enable the Horizontal Pod Autoscaler (HPA) feature for your application containers. HPA can quickly scale out pod replicas to handle sudden workload spikes and scale in pods to save resources when the workload decreases. This process is automated and requires no manual intervention. HPA is suitable for scenarios with significant service fluctuations and frequent scaling needs, such as e-commerce, online education, and finance.
Before you begin
To help you better use the HPA feature, read the official Kubernetes document Horizontal Pod Autoscaling to understand its basic principles, algorithm details, and configurable scaling behaviors.
In addition, ACK clusters provide various workload scaling (scheduling layer elasticity) and node scaling (resource layer elasticity) solutions. Before you proceed, read Auto Scaling to understand the use cases and limitations of each solution.
Prerequisites
You have created an ACK managed cluster or an ACK dedicated cluster. For more information, see Create a cluster.
If you plan to use kubectl commands to implement HPA, connect to your Kubernetes cluster using kubectl. For more information, see Connect to an ACK cluster using kubectl.
Create an HPA application in the console
Container Service for Kubernetes (ACK) is integrated with HPA, which lets you create HPA policies in the ACK console. You can create an HPA policy when you create an application or enable HPA for an existing application. We recommend that you create only one HPA policy for each workload.
Create an HPA when you create an application
The following example shows how to enable Horizontal Pod Autoscaler (HPA) for a stateless Deployment. The steps are similar for other types of workloads.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage. In the navigation pane on the left, choose Workloads > Deployments.
On the Stateless page, click Create From Image.
On the Create page, configure the basic information, container, Service, and scaling settings for the application to create a Deployment that supports HPA.
For more information about the steps and configuration items, see Create a stateless workload (Deployment). The following list describes only the key configuration items.
Basic Information: Configure the application name, number of replicas, and other information.
Container Configuration: Configure the image and the required CPU and memory resources.
You can use the resource profile feature to analyze historical resource usage data and obtain recommendations for configuring container requests and limits. For more information, see Resource profile.
ImportantYou must set resource requests for the application. Otherwise, you cannot enable HPA.
Advanced Configuration:
In the Scaling Configuration section, select Enable for Metric-based Scaling and configure the scaling conditions and parameters.
Metric: CPU and memory are supported. The metric type must match the resource type for which you set a request. If you specify both CPU and memory, HPA triggers a scaling operation when it detects that either metric reaches its threshold.
Trigger Condition: The percentage of resource usage. When the specified usage is exceeded, the application starts to scale out. For more information about the horizontal pod autoscaling algorithm, see Algorithm details.
Maximum Replicas: The maximum number of replicas to which the Deployment can be scaled out. This value must be greater than the minimum number of replicas.
Minimum Replicas: The minimum number of replicas to which the Deployment can be scaled in. This value must be an integer greater than or equal to 1.
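The minimum and maximum replica settings bound the standard HPA control loop, which (per the Kubernetes algorithm details) scales replicas in proportion to how far the observed usage is from the threshold. The following is a minimal sketch of that formula in Python; the function and variable names are illustrative, not part of any ACK API:

```python
import math

def desired_replicas(current: int, usage: float, target: float,
                     min_replicas: int, max_replicas: int) -> int:
    # Core HPA formula: desired = ceil(current * usage / target),
    # then clamped to the configured [min_replicas, max_replicas] range.
    raw = math.ceil(current * usage / target)
    return max(min_replicas, min(max_replicas, raw))

# 3 replicas averaging 80% CPU against a 50% threshold scale out to 5.
print(desired_replicas(3, 80, 50, min_replicas=1, max_replicas=10))
# A large spike would ask for 16 replicas, but Maximum Replicas caps it at 10.
print(desired_replicas(4, 200, 50, min_replicas=1, max_replicas=10))
```

The real controller also applies a tolerance band and stabilization logic before acting, so small fluctuations around the threshold do not trigger scaling.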
After the Deployment is created, you can view it on the Deployments page. Click the Deployment name, and then click the Pod Scaling tab. On this tab, you can view HPA-related metrics, such as CPU or memory usage and the maximum or minimum number of replicas. You can also manage the HPA policy, such as by updating its configuration or disabling it.
Create an HPA for an existing application
The following steps describe how to enable HPA for an existing stateless application (Deployment). The steps for other workload types are similar.
Workload page
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage. In the navigation pane on the left, choose Workloads > Deployments.
On the Stateless page, click the target application. On the Container Scaling tab, click Create in the HPA area.
In the Create dialog box, set the scaling configuration.
Name: The name of the HPA policy.
Metric: Click Add.
Metric: CPU and memory are supported. The metric type must match the resource type for which you set a request. If you specify both CPU and memory, HPA triggers a scaling operation when it detects that either metric reaches its threshold.
Threshold: The percentage of resource usage. When the specified usage is exceeded, the application starts to scale out. For more information about the horizontal pod autoscaling algorithm, see Algorithm details.
Maximum Containers: The maximum number of replicas to which the Deployment can be scaled out. This value must be greater than the minimum number of replicas.
Minimum Containers: The minimum number of replicas to which the Deployment can be scaled in. This value must be an integer greater than or equal to 1.
After the configuration is complete, you can click the Deployment name on the Deployments page, and then click the Pod Scaling tab. On this tab, you can view HPA-related metrics, such as CPU or memory usage and the maximum or minimum number of replicas. You can also manage the HPA policy, such as by updating its configuration or disabling it.
Workload scaling page
This page is available only to users in the whitelist. To use it, submit a ticket.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.
In the upper-right corner of the page, click Create Auto Scaling, and then click the HPA and CronHPA tab. Select the target workload. In the Configure Scaling Policy section, select the Metric-based Auto Scaling (HPA) checkbox and configure the HPA policy.
Policy Name: The name of the HPA policy.
Minimum Containers: The minimum number of replicas to which the workload can be scaled in. This value must be an integer greater than or equal to 1.
Maximum Containers: The maximum number of replicas to which the workload can be scaled out. This value must be greater than the minimum number of replicas.
Metric: CPU, GPU, memory, Nginx Ingress requests, and custom metrics are supported. The metric type must match the resource type for which you set a request. If you specify multiple resource types, HPA triggers a scaling operation when it detects that any metric reaches its threshold.
Threshold: The percentage of resource usage. When the specified usage is exceeded, the application starts to scale out. For more information about the horizontal pod autoscaling algorithm, see Algorithm details.
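When several metrics are configured, the HPA computes a proposed replica count for each metric and applies the largest one. The following rough Python sketch shows that selection; the names are illustrative:

```python
import math

def proposal(current: int, usage: float, target: float) -> int:
    # Proposed replica count for a single metric: ceil(current * usage / target).
    return math.ceil(current * usage / target)

def desired_replicas(current: int, metrics: dict) -> int:
    # metrics maps a metric name to (observed usage %, target %).
    # The HPA evaluates every metric and scales to the largest proposal.
    return max(proposal(current, usage, target)
               for usage, target in metrics.values())

# CPU is below its target, but memory is above its target, so memory
# drives the scale-out: cpu proposes ceil(4 * 30/50) = 3,
# memory proposes ceil(4 * 90/60) = 6.
print(desired_replicas(4, {"cpu": (30, 50), "memory": (90, 60)}))  # 6
```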
After the HPA policy is created, you can view it on the Workload Scaling page. In the Actions column, you can view HPA-related metrics, such as resource usage and the maximum and minimum number of replicas. You can also manage the HPA policy, such as by updating its configuration or disabling it.
Result verification
On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.
Click the Horizontal Scaling tab, and then click HPA to view the scaling status and the list of tasks.
In a production environment, the application scales based on the pod load. You can also perform stress testing on pods in a staging environment to verify the horizontal scaling behavior.
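Any load tool can drive such a stress test. As one illustrative sketch (not an ACK-provided tool), a small Python script that fires concurrent HTTP requests at the application's Service or Ingress endpoint might look like the following:

```python
import concurrent.futures
import urllib.request

def generate_load(url: str, total_requests: int, concurrency: int) -> int:
    """Send `total_requests` GET requests to `url` using `concurrency`
    worker threads, and return the number of successful (HTTP 200) responses.
    Sustained load like this raises pod CPU usage so that an HPA
    scale-out can be observed with kubectl get hpa."""
    def hit(_i):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except OSError:
            return False

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(hit, range(total_requests)))
```

For example, calling generate_load("http://<service-address>/", 5000, 20) in a loop keeps the pods busy long enough for the HPA metrics window to register the increased usage.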
Create an HPA application using kubectl
You can also manually create an HPA from an orchestration template and attach it to the Deployment object that you want to scale. Then, you can use kubectl commands to configure autoscaling for the application. We recommend that you create only one HPA for each workload. The following example shows how to deploy an Nginx application that supports HPA.
Create a file named nginx.yml and copy the following content into it.
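The following is a minimal sketch of such a manifest, with CPU and memory requests set so that HPA can compute utilization. The image tag and resource amounts are illustrative, not prescribed by ACK:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25        # illustrative image tag
        ports:
        - containerPort: 80
        resources:
          requests:              # required for HPA utilization metrics
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
```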
Important: When you implement HPA, you must set resource requests for the pod. Otherwise, HPA cannot run. You can use the resource profile feature to analyze historical resource usage data and obtain recommendations for configuring container requests and limits. For more information, see Resource profile.

Run the following command to create the Nginx application:
```shell
kubectl apply -f nginx.yml
```

Create a file named hpa.yml and copy the following content into it to create an HPA.
Use scaleTargetRef to specify the object to which the HPA is attached. In this example, the HPA is attached to the Deployment named nginx. A scaling operation is triggered when the average CPU usage of all containers in all pods reaches 50%.

Kubernetes 1.24 and later

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1   # The minimum number of replicas to which the Deployment can be scaled in. This must be an integer greater than or equal to 1.
  maxReplicas: 10  # The maximum number of replicas to which the Deployment can be scaled out. This must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # The target average utilization of the resource. This is the ratio of the average resource usage to the requested resource amount.
```

Versions earlier than 1.24

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1   # Must be an integer greater than or equal to 1.
  maxReplicas: 10  # Must be greater than the minimum number of replicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

To specify both CPU and memory metrics, specify both the cpu and memory resource types in the metrics field instead of creating two HPAs. When HPA detects that either metric reaches its threshold, it triggers a scaling operation.

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
```

Run the following command to create the HPA:
```shell
kubectl apply -f hpa.yml
```

At this point, run kubectl describe hpa <HPA_name>. In this example, the HPA name is nginx-hpa. The following warning message is expected and indicates that the HPA is still being deployed. You can run the kubectl get hpa command to check the HPA status.

```
Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
```

Wait for the HPA to be created and for the pods to meet the scaling condition. In this example, the condition is met when the CPU usage of the Nginx pods exceeds 50%. Then, run the kubectl describe hpa <HPA_name> command again to check the horizontal scaling status. The expected output indicates that the HPA is running as expected.

```
Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
```
Related operations
If the default scaling behavior does not meet your business requirements, you can use the behavior field to configure the scale-in (scaleDown) and scale-out (scaleUp) behaviors with finer granularity. For more information, see Configurable scaling behavior.
Typical scenarios supported by behavior include but are not limited to:
Achieving rapid scale-out during sudden traffic spikes.
Implementing rapid scale-out and slow scale-in in scenarios with frequent load fluctuations.
Disabling scale-in for state-sensitive applications.
In resource-limited or cost-sensitive scenarios, using the stabilization window (stabilizationWindowSeconds) to limit the scaling speed and reduce frequent adjustments caused by transient fluctuations.
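As an illustrative sketch (the values are examples only, not recommendations), a behavior block that scales out quickly but scales in slowly might look like the following:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react immediately to spikes
    policies:
    - type: Percent
      value: 100                      # at most double the replica count
      periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before scaling in
    policies:
    - type: Pods
      value: 1                        # remove at most one pod per minute
      periodSeconds: 60
```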
For more information about the behavior configuration and configuration examples, see Adjust the scaling sensitivity of HPA.
FAQ
What do I do if unknown is displayed in the current field in the HPA metrics?
What do I do if HPA cannot collect metrics and fails to perform scaling?
What do I do if excess pods are added by HPA during a rolling update?
What do I do if HPA does not scale pods when the scaling threshold is reached?
Can CronHPA and HPA interact without conflicts? How do I enable CronHPA to interact with HPA?
How do I fix the issue that excess pods are added by HPA when CPU or memory usage rapidly increases?
What does the unit of the utilization metric collected by HPA mean?
What do I do if unknown is displayed in the TARGETS column after I run the kubectl get hpa command?
How do I configure horizontal autoscaling after I customize the format of NGINX Ingress logs?
How do I query the sls_ingress_qps metric from the command line?
How do I use the console to manage a VPA that was installed using kubectl?
References
Other related documents
To learn how to use metrics from Alibaba Cloud components to implement HPA with the External Metrics feature in Kubernetes, see Horizontally scale pods with Alibaba Cloud metrics.
For more information about how to convert Prometheus metrics into HPA-compatible metrics to implement HPA, see Horizontal pod autoscaling based on Prometheus metrics.
If you encounter problems when you use HPA, see Node autoscaling FAQ for troubleshooting information.
To coordinate CronHPA and HPA, see Coordinate CronHPA and HPA.
Other workload scaling solutions
If your application resource usage changes periodically and you need to scale pods based on a Crontab-like policy, see Use CronHPA for scheduled horizontal scaling.
If your application resource usage changes periodically but is difficult to define with rules, you can use Advanced Horizontal Pod Autoscaling (AHPA). AHPA automatically identifies business cycles based on historical metrics to scale pods. For more information, see Predictive scaling (AHPA).
To automatically set resource limits for pods based on their resource usage so that they receive sufficient compute resources, see Use Vertical Pod Autoscaling (VPA).
To flexibly customize scaling policies for pods based on Kubernetes events such as message queues, scheduled policies, and custom metrics, see Event-driven autoscaling.
Combined solutions
You can use HPA with the node autoscaling feature to automatically scale nodes when cluster node resources are insufficient. For more information, see Enable node autoscaling.