You can enable the Horizontal Pod Autoscaler (HPA) feature to automatically scale pods based on CPU utilization, memory usage, or other metrics. HPA quickly scales out replicated pods when workloads surge and scales them in when workloads decrease to save resources. The entire process is automated and requires no human intervention. HPA suits businesses whose traffic fluctuates widely, that run a large number of services, or that scale frequently, such as e-commerce, online education, and financial services.
Prerequisites
Create an application that has HPA enabled in the ACK console
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the details page, go to the Deployments page.
On the Deployments page, click Create from Image.
On the Create page, enter the basic information, container configuration, service configuration, and scaling configuration as prompted to create a Deployment that supports HPA.
For more information about specific steps and configuration parameters, see Create a stateless application from an image. The following list describes the key parameters.
Basic Information: Set the information of the application, such as the name and number of replicas.
Container: Select the image and the required CPU and memory resources.
Important: Set the request resources required by the application. Otherwise, HPA does not take effect.
Advanced:
In the Access Control section, click Create to the right of Services to set the parameters.
In the Scaling section, select Enable for HPA and configure the condition and related parameters.
Metrics: Select CPU Usage or Memory Usage, which must be the same as the one you specified in the Required Resources field. If both CPU Usage and Memory Usage are specified, HPA will perform scaling operations when any one of the metrics reaches the scaling threshold.
Condition: Specify the resource usage threshold. HPA triggers scaling events when the threshold is exceeded. For more information about the algorithms that are used to perform horizontal pod autoscaling, see Algorithm details.
Max. Replicas: Specify the maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than the minimum number of replicated pods.
Min. Replicas: Specify the minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
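The way the Condition, Max. Replicas, and Min. Replicas settings interact can be sketched in a few lines. This is a minimal sketch based on the formula given in the upstream Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function name `desired_replicas` and its parameters are hypothetical, the 10% tolerance is assumed to match the Kubernetes default, and stabilization windows and pod readiness handling are left out.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the HPA replica calculation for a single utilization metric."""
    # Rules from the parameter list: Min. Replicas >= 1, Max. Replicas > Min. Replicas.
    if min_replicas < 1 or max_replicas <= min_replicas:
        raise ValueError("need 1 <= min_replicas < max_replicas")
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, 100, 50))  # usage at twice the 50% target -> 4
print(desired_replicas(8, 100, 50))  # 16 would be needed, capped at max_replicas -> 10
```

With a 50% threshold, two pods running at 100% utilization are scaled out to four, and the result never leaves the range between Min. Replicas and Max. Replicas.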
Create an application that has HPA enabled by using kubectl
You can also create an HPA by using an orchestration template and associate the HPA with the Deployment for which you want to enable HPA. Then, you can run kubectl commands to enable HPA. We recommend that you create only one HPA for each workload. In the following example, HPA is enabled for an NGINX application.
Create a file named nginx.yml and copy the following content to the file.
Important: You must configure the request resources required by the application. Otherwise, you cannot enable HPA.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace it with your actual <image_name:tags>.
        ports:
        - containerPort: 80
        resources:
          requests: # This parameter is required for running the HPA.
            cpu: 500m
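Because HPA cannot take effect without a resource request, it can help to check a manifest before applying it. The following is a minimal sketch that works on a Deployment already loaded into a Python dictionary; the helper name `containers_missing_request` is hypothetical, and the dictionary only mirrors the fields of nginx.yml that matter for the check.

```python
def containers_missing_request(deployment, resource="cpu"):
    """Return the names of containers that declare no request for `resource`."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    missing = []
    for container in containers:
        # `resources` or `requests` may be absent or empty in a manifest.
        requests = (container.get("resources") or {}).get("requests") or {}
        if resource not in requests:
            missing.append(container["name"])
    return missing


# Trimmed-down equivalent of the nginx Deployment in this example.
nginx = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "nginx", "image": "nginx:1.7.9",
         "resources": {"requests": {"cpu": "500m"}}},
    ]}}}
}

print(containers_missing_request(nginx, "cpu"))     # []
print(containers_missing_request(nginx, "memory"))  # ['nginx']
```

An empty list for `cpu` means a CPU-based HPA has what it needs. The second call shows that a memory-based HPA would need a memory request added to this manifest.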
Run the following command to create an NGINX application:
kubectl apply -f nginx.yml
Create a file named hpa.yml and copy the following content to the file to create an HPA.
Use the scaleTargetRef parameter to associate the HPA with the nginx Deployment and trigger scaling operations when the average CPU utilization of all containers in the pod reaches 50%.

YAML template for clusters whose Kubernetes versions are 1.24 and later

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1 # The minimum number of pods that must run for the Deployment. The value of this parameter must be an integer greater than or equal to 1.
  maxReplicas: 10 # The maximum number of pods to which the Deployment can be scaled. The value of this parameter must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # The average utilization of the target resource, which is the ratio of the average value of resource usage to its request amount.
YAML template for clusters whose Kubernetes versions are earlier than 1.24
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1 # Must be an integer greater than or equal to 1.
  maxReplicas: 10 # Must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
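The averageUtilization value is a percentage of the requested amount, not of the node capacity. The following minimal sketch shows how such a percentage could be derived for pods that all request the same amount of CPU. The function name `average_utilization` and the sample usage figures are assumptions for illustration; only the 500m request comes from this example.

```python
def average_utilization(usage_millicores, request_millicores):
    """Average usage across pods as a percentage of the per-pod request."""
    if not usage_millicores:
        raise ValueError("no pod metrics available")
    if request_millicores <= 0:
        # Without a request there is no denominator, so no utilization.
        raise ValueError("missing request for cpu")
    average_usage = sum(usage_millicores) / len(usage_millicores)
    return 100 * average_usage / request_millicores


# Two pods using 250m and 350m of CPU, each requesting 500m.
print(average_utilization([250, 350], 500))  # 60.0
```

In this illustration the result of 60% is above the 50% target, so the HPA would consider scaling out. The same arithmetic shows why a missing request prevents HPA from working: the ratio cannot be computed.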
If you need to specify both CPU and memory metrics, you can specify both cpu and memory types of resources under the metrics field instead of creating two HPAs. If HPA detects that any one of the metrics reaches the scaling threshold, it will perform scaling operations.

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 50
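When several metrics are configured, upstream Kubernetes evaluates each metric separately and uses the largest proposed replica count, which is why a single metric over its threshold is enough to trigger a scale-out. The following is a minimal sketch of that selection. The function name `desired_for_metrics` and the sample utilization values are hypothetical, and tolerance and the minReplicas/maxReplicas bounds are deliberately omitted.

```python
import math


def desired_for_metrics(current_replicas, metrics):
    """Propose a replica count per metric and pick the largest.

    `metrics` maps a metric name to (current_utilization, target_utilization).
    """
    proposals = {
        name: math.ceil(current_replicas * current / target)
        for name, (current, target) in metrics.items()
    }
    return max(proposals.values()), proposals


# CPU is below its 50% target, memory is well above it.
chosen, proposals = desired_for_metrics(2, {"cpu": (30, 50), "memory": (90, 50)})
print(proposals)  # {'cpu': 2, 'memory': 4}
print(chosen)     # 4
```

Here memory alone drives the Deployment from two pods to four, even though CPU utilization stays under its threshold.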
Run the following command to create an HPA:
kubectl apply -f hpa.yml
If you run the kubectl describe hpa <HPA name> command at this point, a warning similar to the following output is returned, indicating that the HPA is still being deployed. The HPA name in this example is nginx-hpa. You can run the kubectl get hpa command to check the status of the HPA.

Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
Wait for the HPA to be created and the pod to reach the scaling condition, which is when the pod CPU utilization of NGINX exceeds 50% in this example. Then, run the kubectl describe hpa <HPA name> command again to check the horizontal scaling status.
If the following output is returned, the HPA is running as expected:

Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target