Deploy AHPA

Container Service for Kubernetes (ACK) supports Advanced Horizontal Pod Autoscaler (AHPA). AHPA learns from historical metric data to predict future resource demand and adjusts the number of pod replicas accordingly. Resources are scaled out and prefetched ahead of predicted demand peaks, which improves the response speed and stability of the system. Conversely, AHPA scales in ahead of predicted demand troughs to reduce resource costs.

Prerequisites

An ACK managed cluster or an ACK Serverless cluster is created. For more information, see Create an ACK managed cluster or Create an ACK Serverless cluster.
Managed Service for Prometheus (Prometheus) is enabled, and Prometheus has collected application statistics for at least the last seven days. The statistics include the CPU and memory usage of the applications. For more information about how to enable Prometheus, see Managed Service for Prometheus.

Step 1: Install the AHPA controller

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.
On the Add-ons page, locate the AHPA Controller card. Click Install in the AHPA Controller card and follow the on-screen instructions to install the component.

Step 2: Add Prometheus as a data source

Log on to the ARMS console.
In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
In the upper-left corner of the Instances page, select the region in which your Prometheus instance is deployed. Find the Prometheus instance that you want to manage and click its name. The name of the Prometheus instance is the same as that of the ACK cluster.
On the Settings page, find the HTTP API URL (Grafana Read URL) section, and record the following information:
- Optional. If access tokens are enabled, record the access token. You must configure this token for your cluster.
- View and record the internal endpoint.
Specify the endpoint of the Prometheus instance in the ACK cluster configurations.
1. Create a file named application-intelligence.yaml. Copy and paste the following content to the file, and set these parameters to the values that you recorded:
  - prometheusUrl: The endpoint of the Prometheus instance.
  - token: The access token of the Prometheus instance.
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: application-intelligence
  namespace: kube-system
data:
  prometheusUrl: "http://cn-hangzhou-intranet.arms.aliyuncs.com:9443/api/v1/prometheus/da9d7dece901db4c9fc7f5b9c40****/158120454317****/cc6df477a982145d986e3f79c985a****/cn-hangzhou"
  token: "eyJhxxxxx"
```
  Note
  If you want to view the Prometheus Service metrics displayed on the AHPA dashboard, set the following parameters in the ConfigMap:
  - prometheus_writer_url: the internal remote write endpoint of the Prometheus instance.
  - prometheus_writer_ak: the AccessKey ID of the Alibaba Cloud account.
  - prometheus_writer_sk: the AccessKey secret of the Alibaba Cloud account.
  For more information, see Enable Managed Service for Prometheus for AHPA.
2. Run the following command to deploy application-intelligence:
```
kubectl apply -f application-intelligence.yaml
```

Step 3: Deploy a test service

Deploy a test service that consists of a Deployment named fib-deployment and a Service named fib-svc. Deploy an application named fib-loader that is used to send requests to the test service to simulate traffic fluctuation. Then, deploy Horizontal Pod Autoscaler (HPA) to scale the test service. This way, you can compare the HPA scaling results and AHPA prediction results.

Create a file named demo.yaml with the following content, and then run `kubectl apply -f demo.yaml` to deploy the resources:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib-deployment
  namespace: default
  annotations:
    k8s.aliyun.com/eci-use-specs: "1-2Gi"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fib-deployment
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: fib-deployment
    spec:
      containers:
      - image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-server:20200820-171837
        imagePullPolicy: IfNotPresent
        name: user-container
        ports:
        - containerPort: 8080
          name: user-port
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 2000Mi
          requests:
            cpu: "1"
            memory: 2000Mi
---
apiVersion: v1
kind: Service
metadata:
  name: fib-svc
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: fib-deployment
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fib-loader
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: fib-loader
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: fib-loader
    spec:
      containers:
      - args:
        - -c
        - |
          /ko-app/fib-loader --service-url="http://fib-svc.${NAMESPACE}?size=35&interval=0" --save-path=/tmp/fib-loader-chart.html
        command:
        - sh
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: registry.cn-huhehaote.aliyuncs.com/kubeway/knative-sample-fib-loader:20201126-110434
        imagePullPolicy: IfNotPresent
        name: loader
        ports:
        - containerPort: 8090
          name: chart
          protocol: TCP
        resources:
          limits:
            cpu: "8"
            memory: 16000Mi
          requests:
            cpu: "2"
            memory: 4000Mi
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fib-hpa
  namespace: default
spec:
  maxReplicas: 50
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib-deployment
  targetCPUUtilizationPercentage: 50
---
```
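
The fib-hpa object in demo.yaml targets 50% CPU utilization with 1 to 50 replicas. As a reference point for the later comparison with AHPA, the following Python sketch approximates how the standard Kubernetes HPA derives a replica count from observed utilization. The function name `hpa_desired_replicas` is hypothetical, the 10% tolerance is assumed to be the unchanged Kubernetes default, and details such as pod readiness and stabilization are ignored.

```python
import math

def hpa_desired_replicas(current_replicas, current_utilization, target_utilization=50,
                         min_replicas=1, max_replicas=50, tolerance=0.1):
    """Approximate the documented HPA rule:
    desired = ceil(current_replicas * current_utilization / target_utilization),
    clamped to [min_replicas, max_replicas]."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods running at 90% CPU against the 50% target
print(hpa_desired_replicas(4, 90))  # prints 8
```

Because this rule reacts only to utilization that has already been observed, HPA scales after the load changes, which is the behavior that the AHPA predictions are compared against in Step 5.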

Step 4: Deploy AHPA

To deploy AHPA and configure an AHPA policy, perform the following steps:

Create a file named ahpa-demo.yaml. Copy and paste the following content to the file:

```
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscaler
metadata:
  name: ahpa-demo
spec:
  scaleStrategy: observer
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fib-deployment
  maxReplicas: 100
  minReplicas: 2
  stabilizationWindowSeconds: 300
  prediction:
    quantile: 95
    scaleUpForward: 180
  instanceBounds:
  - startTime: "2021-12-16 00:00:00"
    endTime: "2031-12-16 00:00:00"
    bounds:
    - cron: "* 0-8 ? * MON-FRI"
      maxReplicas: 15
      minReplicas: 4
    - cron: "* 9-15 ? * MON-FRI"
      maxReplicas: 15
      minReplicas: 10
    - cron: "* 16-23 ? * MON-FRI"
      maxReplicas: 20
      minReplicas: 15
```

The following table describes some of the parameters.

Parameter	Required	Description
scaleTargetRef	Yes	The Deployment for which you want to configure predictive scaling.
metrics	Yes	The metrics based on which the AHPA policy is implemented. The following metrics are supported: CPU, GPU, memory, queries per second (QPS), and response time (RT).
target	Yes	The scaling threshold. For example, if you specify `averageUtilization: 40`, the CPU utilization threshold for initiating scaling is 40%.
scaleStrategy	No	The scaling mode of AHPA. Default value: observer. Valid values: auto (AHPA automatically performs scaling operations); observer (AHPA observes the resource usage but does not perform scaling operations, which lets you check whether AHPA works as expected); scalingUpOnly (AHPA performs only scale-out operations and does not perform scale-in operations); proactive (only active predictions take effect); reactive (only passive predictions take effect).
maxReplicas	Yes	The maximum number of replicated pods that are allowed.
minReplicas	Yes	The minimum number of replicated pods that must be guaranteed.
stabilizationWindowSeconds	No	The cooldown time of scale-in activities. Default value: 300. Unit: seconds.
prediction.quantile	Yes	The quantile at which the actual metric value is expected to fall below the scaling threshold. A larger value indicates a more conservative prediction, which suggests a higher probability that the system will scale resources in or out to ensure business stability. Valid values: 0 to 1. Default value: 0.99. The value is accurate to two decimal places. We recommend that you set the parameter to a value from 0.90 to 0.99.
prediction.scaleUpForward	Yes	The duration of a cold start, which is the time period from the point in time when a pod is created to the point in time when the pod is in the Ready state.
instanceBounds	No	The duration of a scaling operation. The number of replicated pods is limited by the maximum number and minimum number of replicated pods defined by AHPA. startTime: the start time of the scaling operation. endTime: the end time of the scaling operation.
instanceBounds.bounds.cron	No	This parameter is used to create a scheduled scaling job. The CRON expression `- cron: "* 0-8 ? * MON-FRI"` matches every minute of hours 0 to 8 (00:00 through 08:59) on Monday to Friday of every month.
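
To make the scaleStrategy, maxReplicas, and minReplicas rows concrete, the following Python sketch shows how a recommended replica count might be turned into the count that takes effect. It is an illustration of the table only, not the AHPA controller's implementation: the function name `apply_scale_strategy` is hypothetical, and the proactive and reactive modes are not modeled because they concern which predictions are used rather than whether scaling happens.

```python
def apply_scale_strategy(strategy, current, recommended, min_replicas, max_replicas):
    """Return the replica count that would take effect under the given mode."""
    # Keep the recommendation inside the allowed replica range first.
    bounded = max(min_replicas, min(max_replicas, recommended))
    if strategy == "observer":
        return current                # observe only, never change replicas
    if strategy == "scalingUpOnly":
        return max(current, bounded)  # scale out when needed, never scale in
    if strategy == "auto":
        return bounded                # scale out and in automatically
    raise ValueError(f"mode not modeled in this sketch: {strategy}")

# With the values from ahpa-demo.yaml (minReplicas: 2, maxReplicas: 100)
print(apply_scale_strategy("observer", 5, 12, 2, 100))       # prints 5
print(apply_scale_strategy("scalingUpOnly", 5, 3, 2, 100))   # prints 5
print(apply_scale_strategy("auto", 5, 150, 2, 100))          # prints 100
```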

The following table describes the fields that are contained in a CRON expression. For more information, see Cron expressions.

Field	Required	Valid value	Valid special character
Minutes	Yes	0 to 59	* / , -
Hours	Yes	0 to 23	* / , -
Day of Month	Yes	1 to 31	* / , - ?
Month	Yes	1 to 12 or JAN to DEC	* / , -
Day of Week	No	0 to 6 or SUN to SAT	* / , - ?

Note

The Month and Day of Week fields are not case-sensitive. For example, you can specify SUN, Sun, or sun.
If you do not specify the Day of Week field, the default value * is used.
The following list describes the special characters:
- *: specifies an arbitrary value.
- /: specifies an increment.
- ,: separates a list of values.
- -: specifies a range.
- ?: specifies a placeholder.
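
The field rules above can be sketched in Python as a small matcher that selects the replica bounds in effect at a given time. This is a minimal illustration under stated assumptions, not the parser that AHPA uses: all function names are hypothetical, `?` is treated like `*`, all five fields must match, the startTime and endTime window is ignored, and falling back to the top-level minReplicas and maxReplicas (2 and 100) outside every window is an assumption.

```python
from datetime import datetime

MONTHS = {n: i for i, n in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
DAYS = {n: i for i, n in enumerate("SUN MON TUE WED THU FRI SAT".split())}

def _to_int(token, names):
    token = token.upper()  # names are not case-sensitive
    return names[token] if token in names else int(token)

def field_matches(expr, value, low, high, names=None):
    """Check one cron field against a value, handling * ? , - and /."""
    names = names or {}
    for part in expr.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base in ("*", "?"):
            start, end = low, high
        elif "-" in base:
            a, b = base.split("-")
            start, end = _to_int(a, names), _to_int(b, names)
        else:
            start = _to_int(base, names)
            end = high if "/" in part else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False

def cron_matches(expression, moment):
    fields = expression.split()
    if len(fields) == 4:
        fields.append("*")  # Day of Week defaults to *
    minute, hour, dom, month, dow = fields
    cron_dow = (moment.weekday() + 1) % 7  # Python: Monday=0; table: SUN=0
    return (field_matches(minute, moment.minute, 0, 59)
            and field_matches(hour, moment.hour, 0, 23)
            and field_matches(dom, moment.day, 1, 31)
            and field_matches(month, moment.month, 1, 12, MONTHS)
            and field_matches(dow, cron_dow, 0, 6, DAYS))

# (cron, minReplicas, maxReplicas) copied from ahpa-demo.yaml
BOUNDS = [
    ("* 0-8 ? * MON-FRI", 4, 15),
    ("* 9-15 ? * MON-FRI", 10, 15),
    ("* 16-23 ? * MON-FRI", 15, 20),
]

def bounds_at(moment, default=(2, 100)):
    """Return (minReplicas, maxReplicas) for the first matching window."""
    for expression, min_replicas, max_replicas in BOUNDS:
        if cron_matches(expression, moment):
            return (min_replicas, max_replicas)
    return default  # assumed fallback to the top-level bounds

print(bounds_at(datetime(2024, 1, 3, 10, 30)))  # a Wednesday at 10:30, prints (10, 15)
```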

Run the following command to apply the AHPA policy:
```
kubectl apply -f ahpa-demo.yaml
```

Step 5: View the prediction results

Check whether AHPA works as expected. You can view the results by using Prometheus. For more information, see Enable Managed Service for Prometheus for AHPA.

Note

The AHPA prediction results are generated based on historical data within the last seven days. Therefore, you must wait seven days after you apply an AHPA policy. To apply an AHPA policy to an existing application, specify the application in the AHPA policy configurations.

In this example, the AHPA policy uses the observer scaling mode. The following figure shows the AHPA prediction results that are compared with the HPA scaling results. The HPA scaling results indicate the actual resource consumption during the application runtime. You can use the comparison to check whether the AHPA prediction results are as expected.

Actual and Predicted CPU Usage: The actual CPU usage based on HPA is represented by a green line. The CPU usage predicted by AHPA is represented by a yellow line.
- The preceding figure shows that the predicted CPU usage is higher than the actual CPU usage. This indicates that the predicted CPU capacity is sufficient.
- The preceding figure shows that the predicted CPU usage reaches a specific value earlier than the actual CPU usage. This indicates that the required resources are prepared in advance.
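
If you export the two series from Prometheus, you can also check the first observation numerically. The following Python sketch assumes two equal-length lists sampled at the same timestamps and reports the share of samples in which the prediction is at least as high as the actual usage. The function name `prediction_coverage` is hypothetical, and the sample values are made up for illustration; they are not measurements from this example.

```python
def prediction_coverage(predicted, actual):
    """Fraction of samples where predicted usage is at least the actual usage."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    covered = sum(1 for p, a in zip(predicted, actual) if p >= a)
    return covered / len(actual)

# Made-up CPU usage samples (cores), for illustration only
predicted = [2.0, 3.5, 5.0, 4.0]
actual = [1.5, 3.0, 5.5, 3.0]
print(prediction_coverage(predicted, actual))  # prints 0.75
```

A value close to 1 corresponds to a yellow line that stays at or above the green line for most of the observation period.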

Pod Trends: The actual number of pods that are provisioned by HPA is represented by a green line. The number of pods that are predicted by AHPA is represented by a yellow line.
- The preceding figure shows that the value represented by the yellow line is less than the value represented by the green line. This indicates that the predicted number of pods is less than the actual number of pods.
- The preceding figure shows that the yellow curve is smoother than the green curve. This indicates that AHPA changes the number of pods more gradually than HPA, which improves business stability.

The results show that AHPA can use predictive scaling to handle fluctuating workloads as expected. After you confirm the prediction results, you can set the scaling mode to auto, which allows AHPA to automatically scale pods.

References

For more information about how to use Prometheus to monitor GPU metrics and how to use AHPA to perform predictive scaling based on GPU metrics, see Use AHPA to perform predictive scaling based on GPU metrics.
For more information about how to use dashboards provided by Prometheus to monitor your applications, see Enable Managed Service for Prometheus for AHPA.