Use HPA to achieve application auto scaling based on QPS - Container Service for Kubernetes

If your application needs to dynamically adjust the total amount of computing resources based on the number of requests received per unit of time, you can use the QPS data collected by an Application Load Balancer (ALB) instance to set up auto scaling for the pods of the application.

Before you start

We recommend that you read Create an ALB Ingress to learn about the basic features of ALB Ingresses.

How it works

Queries per second (QPS) is the number of requests received per second. ALB instances can record client access data in Simple Log Service (SLS). Horizontal Pod Autoscaler (HPA) can read the QPS of a Service from these access records and scale the corresponding workload, such as a Deployment or StatefulSet.

Prerequisites

The alibaba-cloud-metrics-adapter component is installed, and the version is 2.3.0 or later. For more information, see Deploy alibaba-cloud-metrics-adapter.
The Apache Benchmark stress testing tool is installed. For more information, see the official document Compiling and Installing.
A kubectl client is connected to the ACK cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Two vSwitches are created in different zones of the VPC in which the cluster resides.
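
The version requirement above can be checked with a small helper. The following is a minimal sketch, not an official tool: `version_ge` is a hypothetical function name, the `installed` value is a placeholder that you would replace with the version shown for the component in your cluster, and the comparison assumes that `sort -V` (GNU coreutils version sort) is available.

```shell
# Return success (0) if version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available on this machine.
version_ge() {
  local lowest
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

installed="2.3.0"  # placeholder: replace with the installed alibaba-cloud-metrics-adapter version
if version_ge "$installed" "2.3.0"; then
  echo "alibaba-cloud-metrics-adapter version OK"
else
  echo "upgrade alibaba-cloud-metrics-adapter to 2.3.0 or later"
fi
```

Version sort is used instead of a plain string comparison so that a version such as 2.10.1 is correctly treated as newer than 2.3.0.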

Step 1: Create an AlbConfig and associate an SLS project

Check the SLS project associated with the cluster.
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, click Cluster Information.
3. On the Basic Information tab, find the Log Service Project resource and record the name of the SLS project on the right.

Create an AlbConfig.

Create a file named alb-qps.yaml with the following content, and fill in the SLS project information in the accessLogConfig field.

apiVersion: alibabacloud.com/v1
kind: AlbConfig
metadata:
  name: alb-qps
spec:
  config:
    name: alb-qps
    addressType: Internet
    zoneMappings:
    - vSwitchId: vsw-uf6ccg2a9g71hx8go**** # ID of the virtual switch
    - vSwitchId: vsw-uf6nun9tql5t8nh15****
    accessLogConfig:
      logProject: <LOG_PROJECT> # Name of the log project associated with the cluster
      logStore: <LOG_STORE> # Custom logstore name, must start with "alb_"
  listeners:
    - port: 80
      protocol: HTTP

The following table describes the fields in the preceding code block:

Field	Type	Description
logProject	string	The name of the Simple Log Service (SLS) project. Default value: "".
logStore	string	The name of the SLS Logstore, which must start with alb_. The Logstore is automatically created if it does not exist. For more information, see Enable Simple Log Service to collect access logs. Default value: "".
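
Because the Logstore name must start with alb_, it can help to check the name before you apply the AlbConfig. The following is a minimal sketch. `valid_logstore_name` is a hypothetical helper, and it checks only the prefix rule stated above, not any other naming rule that SLS may enforce.

```shell
# Succeed only if the Logstore name starts with the required "alb_" prefix.
valid_logstore_name() {
  case "$1" in
    alb_*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Example with a hypothetical Logstore name.
if valid_logstore_name "alb_qps"; then
  echo "Logstore name has the required prefix"
fi
```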

Run the following command to create the AlbConfig:

 kubectl apply -f alb-qps.yaml

Expected output:

albconfig.alibabacloud.com/alb-qps created

Step 2: Create sample resources

In addition to AlbConfig, ALB Ingress requires four types of resources: Deployment, Service, IngressClass, and Ingress to work as expected. You can use the following example to quickly create these resources.

Create the qps-quickstart.yaml file with the following content:

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: qps-ingressclass
spec:
  controller: ingress.k8s.alibabacloud/alb
  parameters:
    apiGroup: alibabacloud.com
    kind: AlbConfig
    name: alb-qps # Same as the name of AlbConfig
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: qps-ingress
spec:
  ingressClassName: qps-ingressclass # Same as the name of the IngressClass
  rules:
   - host: demo.alb.ingress.top # Replace with your domain name
     http:
      paths:
      - path: /qps
        pathType: Prefix
        backend:
          service:
            name: qps-svc
            port:
              number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: qps-svc
  namespace: default
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: qps-deploy
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qps-deploy
  labels:
    app: qps-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: qps-deploy
  template:
    metadata:
      labels:
        app: qps-deploy
    spec:
      containers:
      - name: qps-container
        image: nginx:1.7.9
        ports:
        - containerPort: 80

Run the following command to create the sample resources:
```
kubectl apply -f qps-quickstart.yaml
```

Step 3: Create an HPA

Create the qps-hpa.yaml file, then copy the following content into it, and save it:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: qps-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: qps-deploy # Name of the workload controlled by HPA
  minReplicas: 2 # Minimum number of pods
  maxReplicas: 10 # Maximum number of pods
  metrics:
    - type: External
      external:
        metric:
          name: sls_alb_ingress_qps # Metric name for QPS data, do not modify
          selector:
            matchLabels:
              sls.project: <LOG_PROJECT> # Name of the log project associated with the cluster
              sls.logstore: <LOG_STORE> # Custom logstore name
              sls.ingress.route: default-qps-svc-80 # Path of the service, parameter format is <namespace>-<svc>-<port>
        target:
          type: AverageValue
          averageValue: 2 # Expected target for the metric, in this example, the average QPS for all pods is 2

The following table describes the fields in the preceding code block:

Field	Description
scaleTargetRef	The workload used by the application. This example uses the Deployment named qps-deploy created in Step 2.
minReplicas	The minimum number of pods that the Deployment can be scaled in to. Set the value to an integer greater than or equal to 1.
maxReplicas	The maximum number of pods that the Deployment can be scaled out to. The value must not be less than minReplicas.
external.metric.name	The QPS-based metric for HPA. Do not modify the value.
sls.project	The SLS project for the metric. Set the value to the SLS project specified in the AlbConfig.
sls.logstore	The Logstore for the metric. Set the value to the Logstore specified in the AlbConfig.
sls.ingress.route	The path of the Service. Specify the value in the <namespace>-<svc>-<port> format. This example uses the qps-svc Service created in Step 2.
external.target	The target value for the metric. In this example, the target average QPS per pod is 2. The HPA adjusts the number of pods to keep the average QPS per pod as close to the target value as possible.
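
The sls.ingress.route value is a plain concatenation of the namespace, Service name, and Service port. The following minimal sketch builds it. `ingress_route` is a hypothetical helper name, and the arguments are the values used in this example.

```shell
# Build the sls.ingress.route value in the <namespace>-<svc>-<port> format.
ingress_route() {
  # $1 = namespace, $2 = Service name, $3 = Service port
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

ingress_route default qps-svc 80   # prints: default-qps-svc-80
```

If you deploy the Service in another namespace or expose another port, regenerate the value instead of editing it by hand so that the three parts stay consistent with the Service definition.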

Run the following command to create the HPA.
```
kubectl apply -f qps-hpa.yaml
```

Run the following command to check the HPA deployment status.

kubectl get hpa

Expected output:

NAME      REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
qps-hpa   Deployment/qps-deploy   0/2 (avg)   2         10        2          5m41s

Run the following command to view HPA configuration information.

kubectl describe hpa qps-hpa

Expected output:

Name:                                            qps-hpa
Namespace:                                       default
Labels:                                          <none>
Annotations:                                     <none>
CreationTimestamp:                               ******** # Timestamp of HPA, can be ignored
Reference:                                       Deployment/qps-deploy
Metrics:                                         ( current / target )
  "sls_alb_ingress_qps" (target average value):  0 / 2
Min replicas:                                    2
Max replicas:                                    10
Deployment pods:                                 2 current / 2 desired
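
The desired pod count in the output follows from the metric value and the target. As a rough illustration, for an average-value target the number of pods is approximately the total QPS divided by the target QPS per pod, rounded up and then limited to the range between the minimum and maximum replicas. The following sketch is a simplification under stated assumptions: `desired_replicas` is a hypothetical helper, it works with whole-number QPS values only, and it ignores the tolerance and stabilization behavior of the real HPA controller.

```shell
# Approximate the pod count that HPA aims for with an average-value target.
desired_replicas() {
  # $1 = total QPS, $2 = target average QPS per pod, $3 = minReplicas, $4 = maxReplicas
  local total=$1 target=$2 min=$3 max=$4 n
  n=$(( (total + target - 1) / target ))   # integer ceiling of total / target
  if [ "$n" -lt "$min" ]; then n=$min; fi  # never go below minReplicas
  if [ "$n" -gt "$max" ]; then n=$max; fi  # never go above maxReplicas
  echo "$n"
}

desired_replicas 0 2 2 10     # prints 2  (no traffic, held at minReplicas)
desired_replicas 7 2 2 10     # prints 4  (ceil(7 / 2))
desired_replicas 144 2 2 10   # prints 10 (capped at maxReplicas)
```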

(Optional) Step 4: Verify auto scaling

Verify application scale-out.
1. Run the following command to view Ingress information.
```
kubectl get ingress
```
  Expected output:
```
NAME            CLASS                HOSTS                  ADDRESS                         PORTS     AGE
qps-ingress     qps-ingressclass     demo.alb.ingress.top   alb-********.alb.aliyuncs.com   80        10m31s
```
  Record the values of HOSTS and ADDRESS for use in subsequent steps.
2. Run the following command to perform stress testing on the application. Replace demo.alb.ingress.top and alb-********.alb.aliyuncs.com with the values obtained in the previous step.
```
ab -r -c 5 -n 10000 -H Host:demo.alb.ingress.top http://alb-********.alb.aliyuncs.com/qps
```
3. Run the following command to check the scaling status of the application.
```
kubectl get hpa
```
  Expected output:
```
NAME      REFERENCE               TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
qps-hpa   Deployment/qps-deploy   14375m/2 (avg)   2         10        10         15m
```
  The result shows that REPLICAS is 10, indicating that as the QPS increases, the application is scaled out to 10 pods.
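
In the TARGETS column, the m suffix is the Kubernetes notation for thousandths, so 14375m means an average of 14.375 QPS per pod against a target of 2. The following minimal sketch converts such a value and estimates the total QPS. `to_milli` and `total_qps` are hypothetical helpers, they handle only whole numbers and the m suffix, and the estimate assumes that the average was computed over the 10 replicas shown.

```shell
# Convert a quantity such as "14375m" or "2" to integer milli-units.
to_milli() {
  case "$1" in
    *m) echo "${1%m}" ;;           # "14375m" -> 14375
    *)  echo $(( $1 * 1000 )) ;;   # "2"      -> 2000
  esac
}

# Estimate the total QPS (rounded down) from the per-pod average and the pod count.
total_qps() {
  local avg_milli
  avg_milli=$(to_milli "$1")
  echo $(( avg_milli * $2 / 1000 ))
}

total_qps 14375m 10   # prints 143
```

At roughly 143 QPS and a target of 2 QPS per pod, about 72 pods would be needed to meet the target, so the HPA stops at the configured maximum of 10.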
Verify application scale-in.
After the stress testing is complete, run the following command to check the scaling status of the application.
```
kubectl get hpa
```
Expected output:
```
NAME      REFERENCE               TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
qps-hpa   Deployment/qps-deploy   0/2 (avg)   2         10        2          28m
```
The result shows that REPLICAS is 2, indicating that after the QPS data drops to 0, the application is scaled in to 2 pods.
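
Scale-in is usually not immediate. Upstream Kubernetes applies a scale-down stabilization window, 5 minutes by default, before it removes pods, so you may need to check more than once. The following sketch polls until the expected pod count is reported. `wait_for_replicas` is a hypothetical helper, and the command that reports the current pod count is passed in as arguments. The commented usage line assumes the qps-hpa object from this example.

```shell
# Poll a command until it prints the wanted replica count or the attempts run out.
wait_for_replicas() {
  # $1 = wanted replica count, $2 = max attempts, $3 = seconds between attempts,
  # remaining arguments = command that prints the current replica count
  local want=$1 tries=$2 delay=$3 i=0 current
  shift 3
  while [ "$i" -lt "$tries" ]; do
    current=$("$@" 2>/dev/null) || current=""
    if [ "$current" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage against a cluster (not run here): wait up to 10 minutes for 2 pods.
# wait_for_replicas 2 60 10 kubectl get hpa qps-hpa -o jsonpath='{.status.currentReplicas}'
```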

References

If you need to scale your application based on pod CPU or memory load, see Horizontal Pod Autoscaler (HPA).
If you need to schedule the scaling of applications, see Cron Horizontal Pod Autoscaler (CronHPA).
For node auto scaling, see Overview of node scaling.