Container Service for Kubernetes: Implement horizontal pod autoscaling

Last Updated: Jan 21, 2025

You can enable the Horizontal Pod Autoscaler (HPA) feature to automatically scale pods based on CPU utilization, memory usage, or other metrics. HPA can quickly scale out pods when workloads surge and scale in when workloads decrease to save resources. The entire process is automated and requires no human intervention. HPA is ideal for services with large traffic fluctuations, large numbers of services, or frequent scaling requirements, such as e-commerce, online education, and financial services.
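For example, if a Deployment already declares CPU requests for its containers, you can also enable HPA with a single imperative command instead of the console or a YAML template. The following is a minimal sketch that assumes a Deployment named nginx exists in the current namespace; it targets 50% average CPU utilization and scales between 1 and 10 replicas:

    kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10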

Prerequisites

  • An ACK cluster is created.
  • A kubectl client is connected to the cluster if you want to enable HPA by using kubectl.

Create an application that has HPA enabled in the ACK console

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.

  3. In the left-side navigation pane of the details page, choose Workloads > Deployments.

  4. On the Deployments page, click Create from Image.

  5. On the Create page, enter the basic information, container configuration, service configuration, and scaling configuration as prompted to create a Deployment that supports HPA.

    For more information about specific steps and configuration parameters, see Create a stateless application from an image. The following list describes the key parameters.

    • Basic Information: Specify basic information about the application, such as the name and the number of replicas.

    • Container: Select the image and specify the required CPU and memory resources.

      Important

      Set the resource requests of the application (see the example after this list). Otherwise, HPA does not take effect.

    • Advanced:

      • In the Access Control section, click Create to the right of Services to set the parameters.

      • In the Scaling section, select Enable for HPA and configure the condition and related parameters.

        • Metrics: Select CPU Usage or Memory Usage. The selected metric must match the one that you specified in the Required Resources field. If you specify both CPU Usage and Memory Usage, HPA performs scaling operations when either metric reaches its scaling threshold.

        • Condition: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.

        • Max. Replicas: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.

        • Min. Replicas: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
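
    The resource requests mentioned in the note above map to the resources.requests fields of the containers in the Deployment. The following snippet is a minimal sketch; the container name, image, and request values are placeholders that you replace with your own:

      containers:
      - name: app                   # Placeholder container name.
        image: <image_name:tag>     # Placeholder image.
        resources:
          requests:
            cpu: 500m               # Required for CPU-based HPA metrics.
            memory: 256Mi           # Required for memory-based HPA metrics.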

Create an application that has HPA enabled by using kubectl

You can also create an HPA from an orchestration template, associate it with the Deployment that you want to scale, and then run kubectl commands to enable HPA. We recommend that you create only one HPA for each workload. In the following example, HPA is enabled for an NGINX application.

  1. Create a file named nginx.yml and copy the following content to the file.

    Important

    You must configure resource requests for the application. Otherwise, you cannot enable HPA.

    apiVersion: apps/v1 
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx  
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9 # Replace with your actual <image_name:tag>.
            ports:
            - containerPort: 80
            resources:
              requests:                         # Resource requests are required for HPA to take effect.
                cpu: 500m
  2. Run the following command to create an NGINX application:

    kubectl apply -f nginx.yml
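
    Optionally, verify that the Deployment is running and that its pods were created before you create the HPA. The following commands are standard kubectl checks; the label selector matches the app: nginx label defined above:

    kubectl get deployment nginx
    kubectl get pods -l app=nginx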
  3. Create a file named hpa.yml and copy the following content to the file to create an HPA.

    The scaleTargetRef field associates the HPA with the nginx Deployment. The HPA triggers scaling operations when the average CPU utilization of the pods of the Deployment reaches 50%.

    YAML template for clusters whose Kubernetes versions are 1.24 and later

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # The minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
      maxReplicas: 10  # The maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than minReplicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50 # The average utilization of the target resource, which is the ratio of the average value of resource usage to its request amount.
                   

    YAML template for clusters whose Kubernetes versions are earlier than 1.24

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # Must be an integer greater than or equal to 1.
      maxReplicas: 10  # Must be greater than minReplicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
                   

    To scale based on both CPU and memory, specify both cpu and memory resources under the metrics field instead of creating two HPAs. If any one of the metrics reaches its scaling threshold, HPA performs scaling operations.

    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 50
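
    As a rough illustration of how the utilization target drives scaling (the exact behavior is described in Algorithm details), the HPA controller compares the current average utilization with the target and computes the desired replica count as approximately ceil(currentReplicas × currentUtilization / targetUtilization). For example, if two pods that each request 500m of CPU are using 375m on average (75% utilization) and the target is 50%, the HPA scales the Deployment to ceil(2 × 75 / 50) = 3 replicas.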
  4. Run the following command to create an HPA:

    kubectl apply -f hpa.yml

    At this point, if you run the kubectl describe hpa <HPA name> command, a warning similar to the following output may be returned. This indicates that the HPA is still being deployed. The HPA name in this example is nginx-hpa. You can run the kubectl get hpa command to check the status of the HPA.

    Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
    
    Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
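
    After metrics become available, you can also run the following command to view the state of the HPA. The TARGETS column shows the current metric value against the 50% target, and the REPLICAS column shows the current number of pods:

    kubectl get hpa nginx-hpa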
  5. Wait for the HPA to be created and for the pods to meet the scaling condition. In this example, the condition is met when the CPU utilization of the NGINX pods exceeds 50%. Then, run the kubectl describe hpa <HPA name> command again to check the horizontal scaling status. For a way to generate test load, see the sketch after the sample output.

    If output similar to the following is returned, the HPA is running as expected:

      Type    Reason             Age   From                       Message
      ----    ------             ----  ----                       -------
      Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
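
    To reach the scaling condition in a test environment, you can generate load against the application. The following is a minimal sketch; it assumes that the nginx Deployment is exposed through a Service named nginx in the same namespace, which is not created by the YAML in this example. Delete the load-generator pod when you are done:

    kubectl run load-generator --image=busybox:1.28 --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"
    # After testing, clean up:
    kubectl delete pod load-generator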