Advanced Horizontal Pod Autoscaler (AHPA) performs predictive scaling based on historical metrics such as requests per second (RPS), concurrency, CPU usage, and memory usage. Because scaling is planned ahead of demand, pods can be ready before traffic arrives, which helps prevent scaling lag. AHPA also lets you specify the maximum and minimum number of pod replicas for a period of time. By using cron expressions, you can define replica ranges for specific time intervals, so that resource allocation matches the expected load at different times of the day.
Prerequisites
Knative is deployed in your cluster. For more information, see Deploy Knative.
AHPA is deployed. For more information, see Deploy AHPA.
Step 1: Use AHPA to configure metrics for auto scaling
Use the following YAML template to create an AHPA configuration file and deploy it to the cluster:
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscalerTemplate
metadata:
  name: ahpa-demo
spec:
  metrics:
  - type: Resource
    resource:
      name: rps
      target:
        type: Utilization
        averageUtilization: 10 # The RPS threshold is set to 10.
  maxReplicas: 50 # The maximum number of replicated pods is set to 50.
  minReplicas: 0 # The minimum number of replicated pods is set to 0.
  prediction:
    quantile: 95 # The confidence level of prediction is set to 95%.
    scaleUpForward: 180 # The time range of forward prediction is set to 180 seconds.
  # The number of replicated pods is limited by the maximum number of replicated pods and the minimum number of replicated pods defined by AHPA from 00:00:00 on June 1, 2023 to 00:00:00 on June 1, 2123.
  instanceBounds:
  - startTime: "2023-06-01 00:00:00"
    endTime: "2123-06-01 00:00:00"
    bounds:
    # The minimum number of replicated pods is 0 and the maximum number of replicated pods is 50 from 0 am to 6 am.
    - cron: '* 0-6 ? * *'
      maxReplicas: 50
      minReplicas: 0
    # The minimum number of replicated pods is 5 and the maximum number of replicated pods is 50 from 7 am to 9 am.
    - cron: '* 7-9 ? * *'
      maxReplicas: 50
      minReplicas: 5
    # The minimum number of replicated pods is 10 and the maximum number of replicated pods is 50 from 10 am to 4 pm.
    - cron: '* 10-16 ? * *'
      maxReplicas: 50
      minReplicas: 10
    # The minimum number of replicated pods is 2 and the maximum number of replicated pods is 50 from 5 pm to 11 pm.
    - cron: '* 17-23 ? * *'
      maxReplicas: 50
      minReplicas: 2
Parameter | Required | Description |
metrics | Yes | The metrics used for auto scaling. The RPS, concurrency, CPU, and memory metrics are supported. |
maxReplicas | Yes | The maximum number of replicated pods that are allowed. |
minReplicas | Yes | The minimum number of replicated pods that must be guaranteed. |
instanceBounds | No | The time period during which the number of replicated pods is limited by the maximum and minimum numbers of replicated pods defined by AHPA. |
bounds | No | The maximum and minimum numbers of replicated pods within the specified time period. |
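The interaction between the template-level limits and the time-based bounds can be sketched in Python. This is a simplified model for illustration only, not AHPA's implementation: it considers only the Hours field of each cron expression in the sample template, treats the end of the instanceBounds window as exclusive (an assumption), and uses the hypothetical helper names replica_range and clamp_replicas.

```python
from datetime import datetime

# Hour ranges taken from the sample template: (first_hour, last_hour, min, max).
# Simplification: only the Hours field of each cron expression is modelled.
BOUNDS = [
    (0, 6, 0, 50),
    (7, 9, 5, 50),
    (10, 16, 10, 50),
    (17, 23, 2, 50),
]
DEFAULT_MIN, DEFAULT_MAX = 0, 50      # spec.minReplicas / spec.maxReplicas
WINDOW_START = datetime(2023, 6, 1)   # instanceBounds startTime
WINDOW_END = datetime(2123, 6, 1)     # instanceBounds endTime (assumed exclusive)


def replica_range(now: datetime) -> tuple[int, int]:
    """Return the (min, max) replica range assumed to apply at `now`."""
    if WINDOW_START <= now < WINDOW_END:
        for first, last, lo, hi in BOUNDS:
            if first <= now.hour <= last:
                return lo, hi
    # Outside the window, fall back to the template-level limits.
    return DEFAULT_MIN, DEFAULT_MAX


def clamp_replicas(desired: int, now: datetime) -> int:
    """Clamp a desired replica count to the range that applies at `now`."""
    lo, hi = replica_range(now)
    return max(lo, min(desired, hi))
```

For example, under this model a desired count of 3 at 12:00 is raised to the 10-replica floor of the 10-16 range, and a desired count of 80 at 03:00 is capped at 50.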
Fields used in cron expressions
The following table describes the fields of a cron expression, in the order in which they appear. For more information, see Cron expressions.
Field | Special character | Required | Description |
Minutes | * / , - | Yes | Valid values: 0 to 59. |
Hours | * / , - | Yes | Valid values: 0 to 23. |
Day of month | * / , - ? | Yes | Valid values: 1 to 31. |
Month | * / , - | Yes | Valid values: 1 to 12 or JAN to DEC. Note: The valid values from JAN to DEC are not case-sensitive. |
Day of week | * / , - ? | No | Valid values: 0 to 6 or SUN to SAT. |
Special characters used in cron expressions:
An asterisk (*) indicates any value. For example, * indicates any minute or hour.
A forward slash (/) indicates the step size. For example, /5 indicates five time units.
Commas (,) are used as delimiters. For example, 1,3,5 indicates values 1, 3, and 5.
Hyphens (-) are used in value ranges. For example, 1-5 indicates values 1 to 5.
Question marks (?) are used only in the Day of month and Day of week fields to indicate variable values.
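As an illustration of how these special characters combine, the following Python sketch expands a single numeric cron field into the set of values it matches. The function name expand_field is hypothetical, and the sketch rests on assumptions: it treats ? like * (no specific value), reads a form such as 5/15 as "starting at 5, every 15 units", and does not handle names such as JAN or SUN. The parser that AHPA actually uses may behave differently.

```python
def expand_field(expr: str, lo: int, hi: int) -> set[int]:
    """Expand one numeric cron field into the set of values it matches.

    `lo` and `hi` are the valid bounds of the field, e.g. 0 and 59 for Minutes.
    """
    values: set[int] = set()
    for part in expr.split(","):            # commas separate alternatives
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base in ("*", "?"):              # any value (assumption for "?")
            start, end = lo, hi
        elif "-" in base:                   # inclusive range such as 1-5
            first, _, last = base.partition("-")
            start, end = int(first), int(last)
        else:                               # single value, optionally with a step
            start = int(base)
            end = hi if step_text else start
        if not (lo <= start <= end <= hi) or step < 1:
            raise ValueError(f"invalid cron field {expr!r} for range {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values
```

Under these assumptions, expand_field("7-9", 0, 23) yields the hours 7, 8, and 9, which corresponds to the Hours field of the second bounds entry in the sample template.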
Step 2: Create a Knative Service and enable AHPA for the Service
To use AHPA, create a Knative Service whose annotations select the AHPA autoscaler class and reference the AHPA template that you created in Step 1.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose .
On the Services tab of the Knative page, set Namespace to default, click Create from Template, copy the following YAML content to the editor, and then click Create to create a Service named helloworld-go-demo.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-demo
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: ahpa.autoscaling.knative.dev # Specify the AHPA plug-in.
        autoscaling.knative.dev.alibabacloud/ahpa-template: "ahpa-demo" # If you modify the AHPA template parameter, the corresponding revision is also updated.
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"
After the Service is created, record the gateway address and domain name of the Service, which will be used in Step 3: Access the Service.
Step 3: Access the Service
Run the following command to access the Service:
# helloworld-go-demo.default.example.com is the default domain name of the Service.
# alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com is the gateway address of the Service.
curl -H "Host: helloworld-go-demo.default.example.com" http://alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com
Expected output:
Hello Knative!
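If you want to send several requests in a row, for example to produce RPS data for AHPA to observe, the curl call can be approximated with the Python standard library. This is a minimal sketch: build_request and call_service are hypothetical helper names, and the gateway and host arguments stand for the gateway address and domain name that you recorded in Step 2.

```python
import urllib.request


def build_request(gateway: str, host: str) -> urllib.request.Request:
    """Build a GET request to the gateway that carries the Service domain as Host."""
    return urllib.request.Request(f"http://{gateway}", headers={"Host": host})


def call_service(gateway: str, host: str, count: int = 10) -> list[str]:
    """Send `count` sequential requests and return the response bodies."""
    bodies = []
    for _ in range(count):
        with urllib.request.urlopen(build_request(gateway, host), timeout=5) as resp:
            bodies.append(resp.read().decode("utf-8").strip())
    return bodies
```

Calling call_service with your gateway address and the domain name helloworld-go-demo.default.example.com sends ten requests one after another. The requests are sequential, so this generates only light load.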
Step 4 (Optional): Verify scheduled scaling
On the Monitoring Dashboards of Knative, you can view the trends of pod scaling for the Knative Service. For more information about the Knative dashboard, see View the Knative monitoring dashboard.
When a Knative application is scaled to zero pods, Managed Service for Prometheus cannot collect metrics such as the request concurrency and the number of requests sent to a pod per second. These metrics appear in the console only after you access the pods of the Knative application.
When the number of pods is not zero, these metrics are available in the console directly, without the need to access the pods first.
References
You can also configure auto scaling based on the number of concurrent requests per pod or on RPS. For more information, see Enable auto scaling to withstand traffic fluctuations.