Container Service for Kubernetes: Implement horizontal pod autoscaling

Last Updated: Jan 21, 2025

You can enable the Horizontal Pod Autoscaler (HPA) feature to automatically scale pods based on CPU utilization, memory usage, or other metrics. HPA can quickly scale out pods when workloads surge and scale in when workloads decrease to save resources. The entire process is automated and requires no human intervention. HPA is ideal for services with large traffic fluctuations, large numbers of services, or frequent scaling requirements, such as e-commerce, online education, and financial services.
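For example, if a Deployment already declares CPU requests for its containers, you can also enable HPA with a single imperative command instead of the console or a YAML template. The following is a minimal sketch that assumes a Deployment named nginx exists in the current namespace; it targets 50% average CPU utilization and scales between 1 and 10 replicas:

    kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10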

Prerequisites

  • An ACK cluster is created.
  • A kubectl client is connected to the cluster if you want to enable HPA by using kubectl.

Create an application that has HPA enabled in the ACK console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.

  3. In the left-side navigation pane of the details page, choose Workloads > Deployments.

  4. On the Deployments page, click Create from Image.

  5. On the Create page, enter the basic information, container configuration, service configuration, and scaling configuration as prompted to create a Deployment that supports HPA.

    For more information about specific steps and configuration parameters, see Create a stateless application from an image. The following list describes the key parameters.

    • Basic Information: Specify basic information about the application, such as the name and the number of replicas.

    • Container: Select the image and specify the required CPU and memory resources.

      Important

      Set the resource requests of the application (see the example after this list). Otherwise, HPA does not take effect.

    • Advanced:

      • In the Access Control section, click Create to the right of Services to set the parameters.

      • In the Scaling section, select Enable for HPA and configure the condition and related parameters.

        • Metrics: Select CPU Usage or Memory Usage. The selected metric must match the one that you specified in the Required Resources field. If you specify both CPU Usage and Memory Usage, HPA performs scaling operations when either metric reaches its scaling threshold.

        • Condition: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.

        • Max. Replicas: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.

        • Min. Replicas: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
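
    The resource requests mentioned in the note above map to the resources.requests fields of the containers in the Deployment. The following snippet is a minimal sketch; the container name, image, and request values are placeholders that you replace with your own:

      containers:
      - name: app                   # Placeholder container name.
        image: <image_name:tag>     # Placeholder image.
        resources:
          requests:
            cpu: 500m               # Required for CPU-based HPA metrics.
            memory: 256Mi           # Required for memory-based HPA metrics.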

Create an application that has HPA enabled by using kubectl

You can also create an HPA from an orchestration template, associate it with the Deployment that you want to scale, and then run kubectl commands to enable HPA. We recommend that you create only one HPA for each workload. In the following example, HPA is enabled for an NGINX application.

  1. Create a file named nginx.yml and copy the following content to the file.

    Important

    You must configure resource requests for the application. Otherwise, you cannot enable HPA.

    apiVersion: apps/v1 
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx  
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9 # Replace with your actual <image_name:tag>.
            ports:
            - containerPort: 80
            resources:
              requests:                         # Resource requests are required for HPA to take effect.
                cpu: 500m
  2. Run the following command to create an NGINX application:

    kubectl apply -f nginx.yml
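
    Optionally, verify that the Deployment is running and that its pods were created before you create the HPA. The following commands are standard kubectl checks; the label selector matches the app: nginx label defined above:

    kubectl get deployment nginx
    kubectl get pods -l app=nginx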
  3. Create a file named hpa.yml and copy the following content to the file to create an HPA.

    The scaleTargetRef field associates the HPA with the nginx Deployment. The HPA triggers scaling operations when the average CPU utilization of the pods of the Deployment reaches 50%.

    YAML template for clusters whose Kubernetes versions are 1.24 and later

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # The minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
      maxReplicas: 10  # The maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than minReplicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50 # The average utilization of the target resource, which is the ratio of the average value of resource usage to its request amount.
                   

    YAML template for clusters whose Kubernetes versions are earlier than 1.24

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # Must be an integer greater than or equal to 1.
      maxReplicas: 10  # Must be greater than minReplicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
                   

    To scale based on both CPU and memory, specify both cpu and memory resources under the metrics field instead of creating two HPAs. If any one of the metrics reaches its scaling threshold, HPA performs scaling operations.

    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 50
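
    As a rough illustration of how the utilization target drives scaling (the exact behavior is described in Algorithm details), the HPA controller compares the current average utilization with the target and computes the desired replica count as approximately ceil(currentReplicas × currentUtilization / targetUtilization). For example, if two pods that each request 500m of CPU are using 375m on average (75% utilization) and the target is 50%, the HPA scales the Deployment to ceil(2 × 75 / 50) = 3 replicas.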
  4. Run the following command to create an HPA:

    kubectl apply -f hpa.yml

    At this point, if you run the kubectl describe hpa <HPA name> command, a warning similar to the following output may be returned. This indicates that the HPA is still being deployed. The HPA name in this example is nginx-hpa. You can run the kubectl get hpa command to check the status of the HPA.

    Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
    
    Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
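
    After metrics become available, you can also run the following command to view the state of the HPA. The TARGETS column shows the current metric value against the 50% target, and the REPLICAS column shows the current number of pods:

    kubectl get hpa nginx-hpa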
  5. Wait for the HPA to be created and for the pods to meet the scaling condition. In this example, the condition is met when the CPU utilization of the NGINX pods exceeds 50%. Then, run the kubectl describe hpa <HPA name> command again to check the horizontal scaling status. For a way to generate test load, see the sketch after the sample output.

    If output similar to the following is returned, the HPA is running as expected:

      Type    Reason             Age   From                       Message
      ----    ------             ----  ----                       -------
      Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
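
    To reach the scaling condition in a test environment, you can generate load against the application. The following is a minimal sketch; it assumes that the nginx Deployment is exposed through a Service named nginx in the same namespace, which is not created by the YAML in this example. Delete the load-generator pod when you are done:

    kubectl run load-generator --image=busybox:1.28 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"
    # After testing, clean up:
    kubectl delete pod load-generator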