
Container Service for Kubernetes: Use Horizontal Pod Autoscaling (HPA)

Last Updated: Jan 13, 2026

If you want to automatically scale pods based on CPU usage, memory usage, or other custom metrics, you can enable the Horizontal Pod Autoscaler (HPA) feature for your application containers. HPA can quickly scale out pod replicas to handle sudden workload spikes and scale in pods to save resources when the workload decreases. This process is automated and requires no manual intervention. HPA is suitable for scenarios with significant load fluctuations and frequent scaling needs, such as e-commerce, online education, and finance.

Before you begin

To help you better use the HPA feature, read the official Kubernetes document Horizontal Pod Autoscaling to understand its basic principles, algorithm details, and configurable scaling behaviors.

In addition, ACK clusters provide various workload scaling (scheduling layer elasticity) and node scaling (resource layer elasticity) solutions. Before you proceed, read Auto Scaling to understand the use cases and limitations of each solution.

Prerequisites

  • You have created an ACK managed cluster or an ACK dedicated cluster. For more information, see Create a cluster.

  • If you plan to use kubectl commands to implement HPA, connect to your Kubernetes cluster using kubectl. For more information, see Connect to an ACK cluster using kubectl.

Create an HPA application in the console

Container Service for Kubernetes (ACK) is integrated with HPA, which lets you create HPA policies in the ACK console. You can create an HPA policy when you create an application or enable HPA for an existing application. We recommend that you create only one HPA policy for each workload.

Create an HPA when you create an application

The following example shows how to enable Horizontal Pod Autoscaler (HPA) for a stateless Deployment. The steps are similar for other types of workloads.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage. In the navigation pane on the left, choose Workloads > Deployments.

  3. On the Deployments page, click Create From Image.

  4. On the Create page, configure the basic information, container, Service, and scaling settings for the application to create a Deployment that supports HPA.

    For more information about the steps and configuration items, see Create a stateless workload (Deployment). The following list describes only the key configuration items.

    • Basic Information: Configure the application name, number of replicas, and other information.

    • Container Configuration: Configure the image and the required CPU and memory resources.

      You can use the resource profile feature to analyze historical resource usage data and obtain recommendations for configuring container requests and limits. For more information, see Resource profile.

      Important

      You must set resource requests for the application. Otherwise, you cannot enable HPA.

    • Advanced Configuration:

      • In the Scaling Configuration section, select Enable for Metric-based Scaling and configure the scaling conditions and parameters.

        • Metric: CPU and memory are supported. The metric type must match the resource type for which you set a request. If you specify both CPU and memory, HPA triggers a scaling operation when it detects that either metric reaches its threshold.

        • Trigger Condition: The percentage of resource usage. When the specified usage is exceeded, the application starts to scale out. For more information about the horizontal pod autoscaling algorithm, see Algorithm details.

        • Maximum Replicas: The maximum number of replicas to which the Deployment can be scaled out. This value must be greater than the minimum number of replicas.

        • Minimum Replicas: The minimum number of replicas to which the Deployment can be scaled in. This value must be an integer greater than or equal to 1.
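The Trigger Condition above follows the standard Kubernetes HPA algorithm: the controller derives the desired replica count from the ratio of the current metric value to the configured target.

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{targetMetricValue}} \right\rceil
```

For example, if 3 replicas run at an average CPU utilization of 90% against a 50% target, HPA scales out to ceil(3 × 90 / 50) = 6 replicas.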

    After the Deployment is created, you can view it on the Deployments page. Click the Deployment name, and then click the Pod Scaling tab. On this tab, you can view HPA-related metrics, such as CPU or memory usage and the maximum or minimum number of replicas. You can also manage the HPA policy, such as by updating its configuration or disabling it.

Create an HPA for an existing application

The following steps describe how to enable HPA for an existing stateless application (Deployment). The steps for other workload types are similar.

Workload page

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage. In the navigation pane on the left, choose Workloads > Deployments.

  3. On the Deployments page, click the name of the target application. On the Pod Scaling tab, click Create in the HPA section.

  4. In the Create dialog box, set the scaling configuration.

    • Name: The name of the HPA policy.

    • Metric: Click Add.

      • Metric: CPU and memory are supported. The metric type must match the resource type for which you set a request. If you specify both CPU and memory, HPA triggers a scaling operation when it detects that either metric reaches its threshold.

      • Threshold: The percentage of resource usage. When the specified usage is exceeded, the application starts to scale out. For more information about the horizontal pod autoscaling algorithm, see Algorithm details.

    • Maximum Containers: The maximum number of replicas to which the Deployment can be scaled out. This value must be greater than the minimum number of replicas.

    • Minimum Containers: The minimum number of replicas to which the Deployment can be scaled in. This value must be an integer greater than or equal to 1.

After the configuration is complete, you can click the Deployment name on the Deployments page, and then click the Pod Scaling tab. On this tab, you can view HPA-related metrics, such as CPU or memory usage and the maximum or minimum number of replicas. You can also manage the HPA policy, such as by updating its configuration or disabling it.

Workload scaling page

Note

This page is available only to users on the whitelist. To use it, submit a ticket.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.

  3. In the upper-right corner of the page, click Create Auto Scaling, and then click the HPA and CronHPA tab. Select the target workload. In the Configure Scaling Policy section, select the Metric-based Auto Scaling (HPA) checkbox and configure the HPA policy.

    • Policy Name: The name of the HPA policy.

    • Minimum Containers: The minimum number of replicas to which the workload can be scaled in. This value must be an integer greater than or equal to 1.

    • Maximum Containers: The maximum number of replicas to which the workload can be scaled out. This value must be greater than the minimum number of replicas.

    • Metric: CPU, GPU, memory, Nginx Ingress requests, and custom metrics are supported. The metric type must match the resource type for which you set a request. If you specify multiple resource types, HPA triggers a scaling operation when it detects that any metric reaches its threshold.

    • Threshold: The percentage of resource usage. When the specified usage is exceeded, the application starts to scale out. For more information about the horizontal pod autoscaling algorithm, see Algorithm details.
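For metrics beyond plain CPU and memory, the equivalent autoscaling/v2 definition uses a Pods-type metric served by a metrics adapter. The following fragment is a minimal sketch; the metric name nginx_ingress_requests_per_second is a hypothetical placeholder that must match a metric your metrics adapter actually exposes.

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: nginx_ingress_requests_per_second  # Hypothetical name; use a metric your adapter exposes.
    target:
      type: AverageValue
      averageValue: "100"  # Scale out when the per-pod average exceeds 100 requests per second.
```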

After the HPA policy is created, you can view it on the Workload Scaling page. In the Actions column, you can view HPA-related metrics, such as resource usage and the maximum and minimum number of replicas. You can also manage the HPA policy, such as by updating its configuration or disabling it.

Result verification

  1. On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.

  2. Click the Horizontal Scaling tab, and then click HPA to view the scaling status and the list of tasks.

Note

In a production environment, the application scales based on the pod load. You can also perform stress testing on pods in a staging environment to verify the horizontal scaling behavior.

Create an HPA application using kubectl

You can also manually create an HPA from an orchestration template and attach it to the Deployment object that you want to scale. Then, you can use kubectl commands to configure autoscaling for the application. We recommend that you create only one HPA for each workload. The following example shows how to deploy an Nginx application that supports HPA.

  1. Create a file named nginx.yml and copy the following content into it.

    Important

    When you implement HPA, you must set the request resources for the pod. Otherwise, HPA cannot run. You can use the resource profile feature to analyze historical resource usage data and obtain recommendations for configuring container requests and limits. For more information, see Resource profile.


    apiVersion: apps/v1 
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx  
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9 # Replace with your image in the <image_name>:<tag> format.
            ports:
            - containerPort: 80
            resources:
              requests:         # You must set requests. Otherwise, HPA cannot perform calculations and the metric is displayed as unknown.
                cpu: 500m
  2. Run the following command to create the Nginx application.

    kubectl apply -f nginx.yml
  3. Create a file named hpa.yml and copy the following content into it to create an HPA.

    Use scaleTargetRef to specify the object to which the HPA is attached. In this example, the HPA is attached to the Deployment named nginx. A scaling operation is triggered when the average CPU usage of all containers in all pods reaches 50%.

    Kubernetes 1.24 and later

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # The minimum number of replicas to which the deployment can be scaled in. This must be an integer greater than or equal to 1.
      maxReplicas: 10  # The maximum number of replicas to which the deployment can be scaled out. This must be greater than minReplicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50 # The target average utilization of the resource. This is the ratio of the average resource usage to the requested resource amount.
                   

    Versions earlier than 1.24

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # Must be an integer greater than or equal to 1.
      maxReplicas: 10  # Must be greater than the minimum number of replicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
                   

    To specify both CPU and memory metrics, specify both cpu and memory resource types in the metrics field instead of creating two HPAs. When HPA detects that either metric reaches its threshold, it triggers a scaling operation.

    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 50
  4. Run the following command to create the HPA.

    kubectl apply -f hpa.yml

    Shortly after you create the HPA, run kubectl describe hpa <HPA_name>. In this example, the HPA name is nginx-hpa. While the HPA initializes and metrics are not yet available, you may see warnings similar to the following. You can run the kubectl get hpa command to check the HPA status.

    Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
    
    Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
  5. Wait for the HPA to be created and for the pods to meet the scaling condition. In this example, the condition is met when the CPU usage of the Nginx pod exceeds 50%. Then, run the kubectl describe hpa <HPA_name> command again to check the horizontal scaling status.

    The expected output indicates that the HPA is running as expected.

    Type    Reason             Age   From                       Message
    ----    ------             ----  ----                       -------
    Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
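To trigger a scale-out in a test environment, you can run a simple load generator against the application. The following Pod is a minimal sketch; it assumes the nginx Deployment above is exposed through a Service named nginx, which is not created in this example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  containers:
  - name: load-generator
    image: busybox
    # Continuously send requests to drive up CPU usage on the nginx pods.
    command: ["/bin/sh", "-c", "while true; do wget -q -O- http://nginx > /dev/null; done"]
```

While the load generator runs, execute kubectl get hpa nginx-hpa --watch to observe the replica count increase. Delete the load-generator pod to let HPA scale the application back in.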

Related operations

If the default scaling behavior does not meet your business requirements, you can use the behavior field to configure the scale-in (scaleDown) and scale-out (scaleUp) behaviors with finer granularity. For more information, see Configurable scaling behavior.

Typical scenarios supported by behavior include but are not limited to:

  • Achieving rapid scale-out during sudden traffic spikes.

  • Implementing rapid scale-out and slow scale-in in scenarios with frequent load fluctuations.

  • Disabling scale-in for state-sensitive applications.

  • In resource-limited or cost-sensitive scenarios, using the stabilization window stabilizationWindowSeconds to limit the scale-out speed and reduce frequent adjustments caused by transient fluctuations.

For more information about the behavior configuration and configuration examples, see Adjust the scaling sensitivity of HPA.
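Such policies are configured in the behavior field of the HPA spec. As a sketch, the following configuration scales out quickly (allowing the replica count to double every 15 seconds) but scales in slowly (removing at most one pod per minute, after a five-minute stabilization window):

```yaml
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100         # Allow the replica count to double ...
      periodSeconds: 15  # ... every 15 seconds.
  scaleDown:
    stabilizationWindowSeconds: 300  # Require 5 minutes of consistently low load before scaling in.
    policies:
    - type: Pods
      value: 1           # Remove at most one pod ...
      periodSeconds: 60  # ... per minute.
```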


References


Other workload scaling solutions

  • If your application resource usage changes periodically and you need to scale pods based on a Crontab-like policy, see Use CronHPA for scheduled horizontal scaling.

  • If your application resource usage changes periodically but is difficult to define with rules, you can use Advanced Horizontal Pod Autoscaling (AHPA). AHPA automatically identifies business cycles based on historical metrics to scale pods. For more information, see Predictive scaling (AHPA).

  • To automatically set resource limits for pods based on their resource usage so that they receive sufficient compute resources, see Use Vertical Pod Autoscaling (VPA).

  • To flexibly customize scaling policies for pods based on Kubernetes events such as message queues, scheduled policies, and custom metrics, see Event-driven autoscaling.

Combined solutions

You can use HPA with the node autoscaling feature to automatically scale nodes when cluster node resources are insufficient. For more information, see Enable node autoscaling.