Implement horizontal pod autoscaling to scale pods based on CPU and memory - Container Service for Kubernetes

You can enable the Horizontal Pod Autoscaler (HPA) feature to automatically scale pods based on CPU utilization, memory usage, or other metrics. HPA quickly scales out pod replicas to absorb load when workloads surge and scales in to save resources when workloads decrease. The entire process is automated and requires no manual intervention. HPA is well suited to businesses with large traffic fluctuations, many services, or frequent scaling requirements, such as e-commerce, online education, and financial services.

Before you begin

To help you better use the HPA feature, we recommend that you read the Kubernetes official documentation Horizontal Pod Autoscaling to understand the basic principles, algorithm details, and scaling configurations of HPA before reading this topic.

In addition, Container Service for Kubernetes (ACK) clusters provide various workload scaling solutions for scheduling layer elasticity and node scaling solutions for resource layer elasticity. We recommend that you read the Auto scaling overview to understand the applicable scenarios and usage limits of different solutions before using the HPA feature.

Prerequisites

An ACK managed cluster or ACK dedicated cluster is created. For more information, see Create a cluster.
If you use the kubectl command to implement HPA, you must make sure that a kubectl client is connected to the Kubernetes cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.

Create an application that has HPA enabled in the ACK console

ACK is integrated with HPA, so you can create an application that has HPA enabled in the ACK console. You can enable HPA when you create an application or for an existing application. We recommend that you create only one HPA for each workload.

Enable HPA when you create an application

The following example uses a Deployment to show how to enable HPA when you create an application. The steps for other workload types are similar.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
On the Deployments page, click Create from Image.
On the Create page, enter the basic information, container configuration, service configuration, and scaling configuration as prompted to create a Deployment that supports HPA.
For more information about specific steps and configuration parameters, see Create a stateless application by using a Deployment. The following list describes the key parameters.
- Basic Information: Set the information of the application, such as the name and number of replicas.
- Container: Select the image and the required CPU and memory resources.
  You can use the resource profiling feature to analyze historical data of resource usage and get recommendations for configuring container requests and limits. For more information, see Resource profiling.
  Important
  You must configure resource requests for the application's containers. Otherwise, you cannot enable HPA.
- Advanced:
  - In the Access Control section, click Create to the right of Services to set the parameters.
  - In the Scaling section, select Enable for HPA and configure the scaling threshold and related parameters.
    - Metrics: Select CPU Usage or Memory Usage, which must be the same as the one you have specified in the Required Resources field. If both CPU Usage and Memory Usage are specified, HPA will perform scaling operations when any one of the metrics reaches the scaling threshold.
    - Condition: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.
    - Max. Replicas: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.
    - Min. Replicas: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
After creation, you can view the new Deployment on the Deployments page. Click the deployment name, and then click the Pod Scaling tab on the Basic Information page. In the HPA section, you can monitor metrics related to HPA activities, such as CPU and memory usage, and the maximum and minimum number of replicas. You can also manage HPA in this section, including updating its configuration and disabling it.

Enable HPA for an existing application

The following example uses a Deployment to show how to enable HPA for an existing application. The steps for other workload types are similar.

Use the workload page

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
On the Deployments page, click the target application name, then click the Pod Scaling tab, then click Create to the right of HPA.
In the Create dialog box, configure the HPA settings as prompted.
- Name: Enter a name for the HPA policy.
- Metric: Select CPU Usage or Memory Usage, which must be the same as the one you have specified in the Required Resources field. If both CPU Usage and Memory Usage are specified, HPA will perform scaling operations when any one of the metrics reaches the scaling threshold.
- Threshold: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.
- Max. Containers: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.
- Min. Containers: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.

After creation, you can click the Deployment name on the Deployments page. On the Basic Information page, click the Pod Scaling tab. In the HPA section, you can monitor metrics related to HPA activities, such as CPU and memory usage, and the maximum and minimum number of replicas. You can also manage HPA in this section, including updating its configuration and disabling it.

Use the workload scaling page

Note

This page is available only to users in the whitelist. To use it, submit a ticket to apply.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Auto Scaling > Workload Scaling.
In the upper right corner of the page, click Create Auto Scaling, select the target workload, then check the HPA option under the HPA and CronHPA tab, and configure the HPA policy as prompted.
- Scaling Policy Name: Enter a name for the HPA policy.
- Min. Containers: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
- Max. Containers: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.
- Scaling Metric: Select CPU or memory. The metric must match a resource type for which you have configured a request. If both CPU and memory are specified, HPA performs scaling operations when any one of the metrics reaches the scaling threshold.
- Threshold: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.

After creation, you can view the HPA list on the Workload Scaling page. Click Details in the Actions column of the new HPA task. On the details page, you can monitor metrics related to HPA activities, such as CPU and memory usage, and the maximum and minimum number of replicas. You can also manage HPA on this page, including updating its configuration and disabling it.

Result verification

On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Auto Scaling > Workload Scaling.
Click the Horizontal Scaling tab, and then select HPA to view the scaling status and task list.

Note

After the application starts to run, the number of pods is automatically scaled based on the pod load. To verify that HPA works, you can run a CPU stress test on the pods of the application in a staging environment.

Create an application that has HPA enabled by using kubectl

You can also define an HPA in an orchestration template, associate it with the Deployment for which you want to enable HPA, and then run kubectl commands to create it. We recommend that you create only one HPA for each workload. In the following example, HPA is enabled for an NGINX application.

Create a file named nginx.yml and copy the following content to the file.

Important

You must configure resource requests for the application's containers. Otherwise, you cannot enable HPA. You can use the resource profiling feature to analyze historical resource usage and get recommendations for configuring container requests and limits. For more information, see Resource profiling.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace it with your actual <image_name:tags>.
        ports:
        - containerPort: 80
        resources:
          requests: # This parameter is required for running the HPA.
            cpu: 500m
```

Run the following command to create an NGINX application:
```
kubectl apply -f nginx.yml
```

Create a file named hpa.yml and copy the following content to the file to create an HPA.

Use the scaleTargetRef parameter to associate the HPA with the nginx Deployment. In this example, scaling is triggered when the average CPU utilization across the Deployment's pods, measured against the requested CPU, exceeds 50%.

YAML template for clusters whose Kubernetes versions are 1.24 and later

```
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1  # The minimum number of pods that must run for the Deployment. Must be an integer greater than or equal to 1.
  maxReplicas: 10  # The maximum number of pods to which the Deployment can be scaled. Must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # The average utilization of the target resource, which is the ratio of the average resource usage to the requested amount.
```

YAML template for clusters whose Kubernetes versions are earlier than 1.24

```
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1  # Must be an integer greater than or equal to 1.
  maxReplicas: 10  # Must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
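
To make the 50% target concrete, the following Python sketch reproduces the replica calculation described in the Kubernetes Algorithm details: the desired replica count is ceil(currentReplicas × currentUtilization / targetUtilization), no change is made while the ratio stays within the default 10% tolerance, and the result is clamped to minReplicas and maxReplicas. The function name and the per-pod usage figures are illustrative assumptions. The real controller also accounts for pod readiness and missing metrics, which this sketch ignores.

```python
import math


def desired_replicas(pod_cpu_millicores, request_millicores, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the HPA replica calculation for a CPU utilization target.

    pod_cpu_millicores: hypothetical current CPU usage of each running pod.
    request_millicores: CPU request of each pod (500m in nginx.yml).
    """
    current_replicas = len(pod_cpu_millicores)
    # Utilization is total usage divided by total requests, as a percentage.
    utilization = 100 * sum(pod_cpu_millicores) / (request_millicores * current_replicas)
    ratio = utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas([400, 300], 500, 50))  # 70% utilization: scale out
print(desired_replicas([260, 250], 500, 50))  # 51% utilization: within tolerance
print(desired_replicas([50, 50], 500, 50))    # 10% utilization: scale in
```

With two pods that each request 500m, 70% utilization yields 3 replicas, 51% leaves the count at 2, and 10% scales in to the minimum of 1.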

To scale on both CPU and memory, list both resources under the metrics field of a single HPA instead of creating two HPAs. HPA performs scaling operations when any one of the metrics reaches its scaling threshold. A Utilization target is measured against the container's resource request, so the memory metric also requires a memory request in the Deployment.

```
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50
```
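
When several metrics are configured, Kubernetes computes a replica proposal for each metric and uses the largest one. The Python sketch below illustrates that rule under simplifying assumptions: the function names and the observed utilization values are hypothetical, and the tolerance check and the minReplicas/maxReplicas clamp are left out for brevity.

```python
import math


def replicas_for_metric(current_replicas, current_utilization, target_utilization):
    """Replica proposal for one metric: ceil(current * current/target)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


def desired_replicas(current_replicas, utilizations, targets):
    """Return the largest proposal across all metrics, plus each proposal."""
    proposals = {
        name: replicas_for_metric(current_replicas, utilizations[name], targets[name])
        for name in targets
    }
    return max(proposals.values()), proposals


targets = {"cpu": 50, "memory": 50}   # averageUtilization values from hpa.yml
observed = {"cpu": 30, "memory": 80}  # hypothetical current utilization (%)
replicas, proposals = desired_replicas(4, observed, targets)
print(proposals)
print(replicas)
```

In this example CPU alone would allow scaling in from 4 to 3 pods, but memory calls for 7, so the HPA scales out to 7.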

Run the following command to create an HPA:

```
kubectl apply -f hpa.yml
```

If you run the kubectl describe hpa <HPA name> command at this point, a warning similar to the following output may be returned, which indicates that the HPA is still being deployed. The HPA name in this example is nginx-hpa. You can run the kubectl get hpa command to check the status of the HPA.

```
Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
```

Wait for the HPA to be created and the pod to reach the scaling condition, which is when the pod CPU utilization of NGINX exceeds 50% in this example. Then, run the kubectl describe hpa <HPA name> command again to check the horizontal scaling status.
If the following output is returned, the HPA is running as expected:
```
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
```

Related operations

If the default scaling behavior does not meet your business requirements, you can specify the scaleUp and scaleDown fields in the behavior parameter for more fine-grained scale-out and scale-in settings. For more information, see Configurable scaling behavior.

Typical scenarios supported by behavior include:

- Achieve rapid scaling during traffic spikes.
- Implement swift scaling up and slow scaling down in scenarios with frequent load fluctuations.
- Prevent scaling down for applications that are sensitive to state.
- Use the stabilization window (stabilizationWindowSeconds) in resource-constrained or cost-sensitive scenarios to limit the speed of scaling up, reducing frequent adjustments caused by transient fluctuations.
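
As a rough illustration of what stabilizationWindowSeconds does, the Python sketch below models only the scale-down direction as described in the Kubernetes documentation: the autoscaler looks at the replica counts it computed during the window and uses the highest one, so a brief dip in load does not remove pods immediately. The function name and the recommendation history are hypothetical, and the 300-second default is the upstream Kubernetes default for scale-down. The sketch assumes that the history always contains the most recent recommendation.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Return the highest replica recommendation seen within the window.

    recommendations: list of (timestamp_seconds, desired_replicas) tuples,
    a hypothetical history of values the autoscaler computed earlier.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent)


# Load drops steadily; each tuple is (time in seconds, computed replica count).
history = [(0, 8), (120, 6), (240, 3), (360, 2)]

print(stabilized_scale_down(history, now=360))                     # default window
print(stabilized_scale_down(history, now=360, window_seconds=60))  # short window
print(stabilized_scale_down(history, now=360, window_seconds=0))   # no stabilization
```

With the 300-second window, the recommendation at 360 seconds is held at 6 replicas because that value was computed within the window. With a 60-second or 0-second window, the latest value of 2 takes effect immediately.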

References

Combination solutions

You can use HPA with the node auto scaling feature to automatically scale nodes when cluster node resources are insufficient. For more information, see Enable node auto scaling.
