
Container Service for Kubernetes:Use Horizontal Pod Autoscaling (HPA)

Last Updated: Feb 24, 2026

To automatically scale the number of Pods based on CPU utilization, memory usage, or other custom metrics, use the Horizontal Pod Autoscaler (HPA) for your application's Pods. HPA scales out by adding Pod replicas to handle sudden workload spikes and scales in by removing replicas when the workload decreases, which saves resources. HPA is ideal for workloads with fluctuating traffic that require frequent scaling, such as e-commerce, online education, and financial services.

Before you begin

To use the HPA feature effectively, review the official Kubernetes documentation on Horizontal Pod Autoscaling to learn about its basic principles, algorithms, and configurable scaling behaviors.

In addition, Container Service for Kubernetes (ACK) provides various solutions for Workload Scaling (application-layer elasticity) and Node Scaling (resource-layer elasticity). Before proceeding, read Auto scaling overview for the use cases and limitations of different solutions.


Create an HPA-enabled application in the console

ACK integrates the HPA feature, allowing you to create HPA-enabled applications in the ACK console. You can create an HPA when creating a new application, or enable it for an existing one. Create only one HPA per Workload to avoid conflicting scaling policies and ensure predictable behavior.

Create an HPA when creating an application

The following steps use a stateless Deployment as an example to show how to enable HPA for an application. The procedure is similar for other types of Workloads.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster you want to manage and click its name. In the left navigation pane, choose Workloads > Deployments.

  3. On the Deployments page, click Create from Image.

  4. On the Create page, configure the basic information, container settings, service settings, and scaling settings as prompted to create an HPA-enabled Deployment.

    For detailed steps and configuration descriptions, see Create a stateless workload (Deployment). The following section describes only the key configurations.

    • Basic Information: Configure the application's name, number of replicas, and other settings.

    • Container: Configure the image and specify the required CPU and memory resources.

      Use the resource profiling feature to analyze historical resource usage data and get recommendations for configuring container Requests and Limits. For more information, see Resource profiling.

      Important

      You must set resource Requests for your application; otherwise, HPA will not function.

    • Advanced:

      • In the Scaling section, select the Enable checkbox for HPA and configure the scaling conditions and parameters.

        • Metric: Supports CPU Usage and Memory Usage. The metric type must match the resource type for which you have set a Request. If you specify both CPU and Memory, scaling triggers as soon as either metric reaches its threshold.

        • Condition: The target resource utilization percentage. When the usage exceeds this value, the Pods scale out. For details about the Horizontal Pod Autoscaling algorithm, see Algorithm details.

        • Max. Replicas: The maximum number of replicas for the Deployment. This value must be greater than the minimum number of replicas.

        • Min. Replicas: The minimum number of replicas for the Deployment. This value must be an integer greater than or equal to 1.
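
    For reference, the Horizontal Pod Autoscaling algorithm computes the desired replica count from the ratio of current to target utilization. The values in the example below are hypothetical:

        desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)
        # Example: 2 replicas at 90% average CPU with a 50% target scale out to ceil(2 × 90 / 50) = 4 replicas.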

    After the Deployment is created, you can view it on the Deployments page. Click the Deployment name, then click the Pods tab on the details page. Here, you can view HPA-related metrics, such as CPU or memory utilization and the maximum and minimum number of replicas. You can also manage the HPA, including updating its configuration or disabling it.

Create an HPA for an existing application

The following steps use a stateless Deployment as an example to show how to enable HPA for an existing application. The procedure is similar for other types of Workloads.

Workload page

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster you want to manage and click its name. In the left navigation pane, choose Workloads > Deployments.

  3. On the Deployments page, click the target application. On the Pod Scaling tab, click Create in the HPA area.

  4. In the Create dialog box, set the scaling configurations as prompted.

    • Name: The name of the HPA policy.

    • Metric: Click Add.

      • Metric Name: Supports CPU Usage and Memory Usage. The metric type must match the resource type for which you have set a Request. If you specify both CPU and Memory, scaling triggers as soon as either metric reaches its threshold.

      • Threshold: The target resource utilization percentage. When the usage exceeds this value, the Pods scale out. For details about the Horizontal Pod Autoscaling algorithm, see Algorithm details.

    • Max. Containers: The maximum number of replicas for the Deployment. This value must be greater than the minimum number of replicas.

    • Min. Containers: The minimum number of replicas for the Deployment. This value must be an integer greater than or equal to 1.

After the configuration is complete, go to the Deployments page, click the Deployment name, then click the Pod Scaling tab. Here, you can view HPA-related metrics, such as CPU or memory utilization and the maximum or minimum number of replicas. You can also manage the HPA, including updating its configuration or disabling it.

Workload scaling page

Note

This page is available only to allowlisted users. To request access, submit a ticket.

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.

  3. In the upper-right corner of the page, click Create Auto Scaling, and then click the HPA and CronHPA tab. Select the target workload, select the Metric-based Auto Scaling (HPA) checkbox under Configure Scaling Policy, and configure the HPA policy as prompted.

    • Policy Name: The name of the HPA policy.

    • Min. Replicas: The minimum number of replicas for the Deployment. This value must be an integer greater than or equal to 1.

    • Max. Replicas: The maximum number of replicas for the Deployment. This value must be greater than the minimum number of replicas.

    • Metric Name: Supports CPU, GPU, Memory, Nginx Ingress requests, and custom metrics. The metric type must match the resource type for which you have set a Request. If you specify multiple metric types, scaling triggers as soon as any one of the metrics reaches its threshold.

    • Threshold: The target resource utilization percentage. When the usage exceeds this value, the Pods scale out. For details about the Horizontal Pod Autoscaling algorithm, see Algorithm details.

After the HPA is created, you can view the list of HPAs on the Workload Scaling page. In the Actions column, you can view HPA-related metrics, such as resource utilization and the maximum or minimum number of replicas. You can also manage the HPA, including updating its configuration or disabling it.

Verify the results

  1. On the Clusters page, find the cluster you want and click its name. In the left navigation pane, click Workload Scaling.

  2. Click the Horizontal Scaling tab, and then select HPA to view the scaling status and task list.

Note

In a production environment, the application scales based on workload. To verify this behavior in a test environment, perform stress testing on the Pods.
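
For example, you can generate CPU load with a temporary load-generator Pod and watch the HPA react. This sketch assumes the cluster is reachable through kubectl and that a Service named nginx exposes the target Deployment; adjust the names to your environment.

    # Generate continuous requests against the assumed "nginx" Service to raise CPU usage.
    kubectl run load-generator -i --tty --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://nginx; done"
    # In a second terminal, watch the replica count and metrics change as load rises and falls.
    kubectl get hpa --watch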

Create an HPA-enabled application by using kubectl

You can also manually create an HPA from a manifest and bind it to the Deployment object you want to scale. This lets you configure automatic Pod scaling by using kubectl commands. Create only one HPA per Workload to avoid conflicting scaling policies and ensure predictable behavior. The following steps use an HPA-enabled Nginx application as an example.

  1. Create a file named nginx.yml and copy the following content into it.

    Important

    To implement HPA, you must set resource requests for your Pods. Otherwise, HPA will not function. Use the resource profiling feature to analyze historical resource usage data and get recommendations for configuring container Requests and Limits. For more information, see Resource profiling.

    Sample YAML

    apiVersion: apps/v1 
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx  
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9 # Replace with the actual <image_name:tag>.
            ports:
            - containerPort: 80
            resources:
              requests:         # You must set requests. Otherwise, HPA cannot perform calculations and the metric is displayed as unknown.
                cpu: 500m
  2. Run the following command to create the Nginx application.

    kubectl apply -f nginx.yml
  3. Create a file named hpa.yml and copy the following content into it to create the HPA.

    The scaleTargetRef field specifies the object to which this HPA is bound. In this example, it is bound to the Deployment named nginx. The HPA will trigger scaling when the average CPU utilization across all Pods reaches 50%.

    Kubernetes 1.24 and later

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # The minimum number of replicas to which the deployment can be scaled in. This must be an integer greater than or equal to 1.
      maxReplicas: 10  # The maximum number of replicas to which the deployment can be scaled out. This must be greater than minReplicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50 # The target average utilization of the resource. This is the ratio of the average resource usage to the requested resource amount.
                   

    Kubernetes earlier than 1.24

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # Must be an integer greater than or equal to 1.
      maxReplicas: 10  # Must be greater than the minimum number of replicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
                   

    To scale on both CPU and memory, add both resources under the metrics field instead of creating two separate HPAs. Scaling triggers as soon as any of the specified metrics reaches its threshold.

    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 50
  4. Run the following command to create the HPA.

    kubectl apply -f hpa.yml

    While the HPA is being deployed, running kubectl describe hpa <HPA_NAME> (in this example, nginx-hpa) may return warnings similar to the following. Run the kubectl get hpa command to check the status of the HPA.

    Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
    
    Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
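
    Once the metrics are available, kubectl get hpa reports the current and target utilization. The output below is illustrative; your values will differ.

        kubectl get hpa nginx-hpa
        # Illustrative output:
        # NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
        # nginx-hpa   Deployment/nginx   0%/50%    1         10        2          1m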
  5. Wait for the HPA to be created successfully and for the Pods to meet the scaling condition (in this example, when the CPU utilization of the Nginx Pods exceeds 50%). Then, run the kubectl describe hpa <HPA_NAME> command again to check the horizontal scaling status.

    The expected output below indicates that the HPA is running correctly.

      Type    Reason             Age   From                       Message
      ----    ------             ----  ----                       -------
      Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Related operations

If the default scaling behavior does not meet your business needs, use the behavior field to configure more granular scaling-down (scaleDown) and scaling-up (scaleUp) behaviors. For more information, see Configurable scaling behavior.

Use cases for the behavior field include:

  • Scaling out rapidly during sudden traffic surges.

  • Scaling out quickly and scaling in slowly in environments with frequent workload fluctuations.

  • Disabling scaling in for state-sensitive applications.

  • Using stabilizationWindowSeconds in resource-constrained or cost-sensitive scenarios to smooth scaling decisions and reduce frequent adjustments caused by brief metric fluctuations.

For configuration details and examples of the behavior field, see Adjust the scaling sensitivity of HPA.
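
For example, a behavior configuration that scales out quickly but scales in slowly might look like the following sketch. The field names come from the autoscaling/v2 API; the specific values are illustrative and should be tuned to your workload.

    spec:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0    # React to load spikes immediately.
          policies:
          - type: Percent
            value: 100                     # Allow doubling the replica count...
            periodSeconds: 15              # ...every 15 seconds.
        scaleDown:
          stabilizationWindowSeconds: 300  # Wait 5 minutes of low load before scaling in.
          policies:
          - type: Pods
            value: 1                       # Remove at most one Pod...
            periodSeconds: 60              # ...per minute.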

Related tasks

Other Workload Scaling solutions

  • If your application has predictable, periodic resource usage and you need to scale Pods on a schedule, see Use CronHPA for scheduled horizontal scaling.

  • If your application's resource usage is cyclical but difficult to define with rules, you can use the Advanced Horizontal Pod Autoscaler (AHPA). It automatically identifies business cycles from historical metrics to scale Pods. For more information, see Predictive scaling (AHPA).

  • To automatically set resource requests and limits for your Pods based on their resource usage, ensuring they receive sufficient computing resources, see Use Vertical Pod Autoscaling (VPA).

  • To flexibly customize scaling policies and scale Pods based on Kubernetes events such as message queues, scheduled tasks, or custom metrics, see Event-driven autoscaling.

Combined solutions

You can combine HPA with node auto-scaling, which allows the cluster to automatically add nodes when its resources are insufficient. For more information, see Enable node autoscaling.