Use Knative and AHPA to implement scheduled auto scaling

Advanced Horizontal Pod Autoscaler (AHPA) performs predictive scaling based on historical metrics such as requests per second (RPS), concurrency, CPU usage, and memory usage. Because AHPA predicts load, it can plan scaling activities in advance and avoid the delay of reactive scaling. AHPA also lets you limit the number of replicated pods to a minimum and a maximum during specified periods of time. You use cron expressions to define these periods, so that different replica ranges apply at different times of the day.

Prerequisites

Knative is deployed in your cluster. For more information, see Deploy Knative.
AHPA is deployed. For more information, see Deploy AHPA.

Step 1: Use AHPA to configure metrics for auto scaling

Use the following YAML template to create an AHPA configuration file and deploy it to the cluster:

apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscalerTemplate
metadata:
  name: ahpa-demo
spec:
  metrics:
  - type: Resource
    resource:
      name: rps
      target:
        type: Utilization
        averageUtilization: 10 # The RPS threshold is set to 10. 
  maxReplicas: 50 # The maximum number of replicated pods is set to 50. 
  minReplicas: 0 # The minimum number of replicated pods is set to 0. 
  prediction:
    quantile: 95 # The confidence level of prediction is set to 95%. 
    scaleUpForward: 180 # The time range of forward prediction is set to 180 seconds. 
# The number of replicated pods is limited by the maximum number of replicated pods and the minimum number of replicated pods defined by AHPA from 00:00:00 on June 1, 2023 to 00:00:00 on June 1, 2123. 
  instanceBounds:
  - startTime: "2023-06-01 00:00:00"
    endTime: "2123-06-01 00:00:00"
    bounds:
# The minimum number of replicated pods is 0 and the maximum number of replicated pods is 50 from 00:00 to 06:59. 
    - cron: '* 0-6 ? * *'
      maxReplicas: 50
      minReplicas: 0
# The minimum number of replicated pods is 5 and the maximum number of replicated pods is 50 from 07:00 to 09:59. 
    - cron: '* 7-9 ? * *'
      maxReplicas: 50
      minReplicas: 5
# The minimum number of replicated pods is 10 and the maximum number of replicated pods is 50 from 10:00 to 16:59. 
    - cron: '* 10-16 ? * *'
      maxReplicas: 50
      minReplicas: 10
# The minimum number of replicated pods is 2 and the maximum number of replicated pods is 50 from 17:00 to 23:59. 
    - cron: '* 17-23 ? * *'
      maxReplicas: 50
      minReplicas: 2

Parameter	Required	Description
metrics	Yes	Configure metrics for auto scaling. The RPS, concurrency, CPU, and memory metrics are supported.
maxReplicas	Yes	The maximum number of replicated pods that are allowed.
minReplicas	Yes	The minimum number of replicated pods that must be guaranteed.
instanceBounds	No	The time period during which the number of replicated pods is limited by the maximum number of replicated pods and the minimum number of replicated pods defined by AHPA. `startTime`: the start time. `endTime`: the end time.
bounds	No	The maximum and minimum numbers of replicated pods within each time period specified by a cron expression. cron: a cron expression that specifies a time period. For more information about the fields and special characters that are supported, see the Fields used in cron expressions section. maxReplicas: the maximum number of replicated pods. minReplicas: the minimum number of replicated pods.

Fields used in cron expressions

The following table describes the fields that are contained in a cron expression. For more information, see Cron expressions.

Field	Special character	Required	Description
Minutes	* / , -	Yes	Valid values: 0 to 59.
Hours	* / , -	Yes	Valid values: 0 to 23.
Day of month	* / , - ?	Yes	Valid values: 1 to 31.
Month	* / , -	Yes	Valid values: 1 to 12 or JAN to DEC. Note The valid values from JAN to DEC are not case-sensitive.
Day of week	* / , - ?	No	Valid values: 0 to 6 or SUN to SAT. Note The valid values from SUN to SAT are not case-sensitive. For example, both SUN and sun indicate Sunday. If you do not specify the Day of week field, the expression applies to every day of the week, which is equivalent to the wildcard character (`*`).

Special characters used in cron expressions:

An asterisk (*) indicates any value. For example, * in the Minutes field indicates every minute.
A forward slash (/) indicates the step size. For example, */5 in the Minutes field indicates every 5 minutes.
Commas (,) are used as delimiters. For example, 1,3,5 indicates values 1, 3, and 5.
Hyphens (-) are used in value ranges. For example, 1-5 indicates values 1 to 5.
Question marks (?) are used only in the Day of month and Day of week fields to indicate that no specific value is set.
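
The field rules above can be sketched in Python. This is a minimal illustration, not the parser that AHPA uses. The helper names (expand_field, matches, replica_range) are hypothetical. The sketch also makes three assumptions: ? is treated the same as *, the Day of month and Day of week fields must both match, and the first matching bounds entry wins, with a fallback to the template-level range when no entry matches. The bounds are copied from the ahpa-demo template in Step 1.

```python
from datetime import datetime

# Names accepted in the Month and Day of week fields (matched case-insensitively).
MONTHS = {name: i for i, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
DAYS = {name: i for i, name in enumerate("SUN MON TUE WED THU FRI SAT".split())}


def expand_field(field, low, high, names=None):
    """Expand one cron field into the set of integers that it matches."""
    names = names or {}

    def to_int(token):
        token = token.upper()
        return names[token] if token in names else int(token)

    values = set()
    for part in field.split(","):            # comma: list of values
        base, _, step = part.partition("/")  # slash: step size
        step = int(step) if step else 1
        if base in ("*", "?"):               # assumption: ? behaves like *
            start, end = low, high
        elif "-" in base:                    # hyphen: inclusive range
            first, last = base.split("-")
            start, end = to_int(first), to_int(last)
        else:                                # single value, or start of a step
            start = to_int(base)
            end = high if "/" in part else start
        if not (low <= start <= end <= high):
            raise ValueError(f"field {field!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


def matches(expr, moment):
    """Return True if the datetime falls in the period described by expr."""
    fields = expr.split()
    if len(fields) == 4:                     # Day of week is optional
        fields.append("*")
    if len(fields) != 5:
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}")
    minute, hour, dom, month, dow = fields
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (moment.minute in expand_field(minute, 0, 59)
            and moment.hour in expand_field(hour, 0, 23)
            and moment.day in expand_field(dom, 1, 31)
            and moment.month in expand_field(month, 1, 12, MONTHS)
            and cron_weekday in expand_field(dow, 0, 6, DAYS))


# (cron, minReplicas, maxReplicas) entries from the ahpa-demo template.
BOUNDS = [
    ("* 0-6 ? * *", 0, 50),
    ("* 7-9 ? * *", 5, 50),
    ("* 10-16 ? * *", 10, 50),
    ("* 17-23 ? * *", 2, 50),
]


def replica_range(moment, bounds=BOUNDS, default=(0, 50)):
    """Return (minReplicas, maxReplicas) of the first matching entry (assumed rule)."""
    for expr, min_replicas, max_replicas in bounds:
        if matches(expr, moment):
            return (min_replicas, max_replicas)
    return default
```

For example, replica_range(datetime(2023, 6, 1, 8, 30)) evaluates to (5, 50) because 08:30 falls in the hour range 7-9.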

Step 2: Create a Knative Service and enable AHPA for the Service

After AHPA is deployed, you enable it for a Knative Service by adding AHPA annotations to the Service.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Knative.

On the Services tab of the Knative page, set Namespace to default, click Create from Template, copy the following YAML content to the editor, and then click Create to create a Service named helloworld-go-demo.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-demo
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: ahpa.autoscaling.knative.dev # Specify the AHPA plug-in. 
        autoscaling.knative.dev.alibabacloud/ahpa-template: "ahpa-demo" # The name of the AHPA template created in Step 1. If you modify the AHPA template parameter, the corresponding revision is also updated. 
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"

After the Service is created, record the gateway address and domain name of the Service, which will be used in Step 3: Access the Service.

Step 3: Access the Service

Run the following command to access the Service:

# helloworld-go-demo.default.example.com is the default domain name of the Service. 
# alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com is the gateway address of the Service. 
curl -H "Host: helloworld-go-demo.default.example.com" http://alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com

Expected output:

Hello Knative!
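
If you want to send the same request from a script instead of curl, the following Python sketch builds an equivalent request with the standard library. The function names (build_request, call_service) are hypothetical, and the gateway address and domain name are parameters that you must replace with the values that you recorded in Step 2.

```python
import urllib.request


def build_request(gateway, domain):
    """Build a GET request to the gateway that carries the Service domain in the Host header."""
    return urllib.request.Request(f"http://{gateway}", headers={"Host": domain})


def call_service(gateway, domain, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(gateway, domain), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling call_service with your gateway address and helloworld-go-demo.default.example.com should return the same response body as the curl command above.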

Step 4 (Optional): Verify scheduled scaling

On the Monitoring Dashboards of Knative, you can view the trends of pod scaling for the Knative Service. For more information about the Knative dashboard, see View the Knative monitoring dashboard.

Note

When the number of pods for a Knative application is scaled to zero, metrics such as the request concurrency and the number of requests sent to a pod per second cannot be collected by Managed Service for Prometheus. You can view these metrics in the console only after you access the pods of the Knative application.
When the number of pods for a Knative application is not zero, you can directly view the metrics in the console, such as the request concurrency and the number of requests sent to a pod per second. You do not need to access the pods of the Knative application.

References

You can also configure auto scaling based on request concurrency or RPS. For more information, see Enable auto scaling to withstand traffic fluctuations.