Use Knative and AHPA to implement scheduled auto scaling - Container Service for Kubernetes

When traffic patterns follow predictable daily cycles, reactive scaling alone can cause delays during traffic spikes. Advanced Horizontal Pod Autoscaler (AHPA) addresses this by performing predictive scaling based on historical metrics such as requests per second (RPS), concurrency, CPU, and memory usage. Combined with cron-based instance bounds, AHPA lets you define minimum and maximum replica counts for specific time windows, so your Knative Services scale proactively instead of reactively.

Prerequisites

Before you begin, make sure that you have:

Knative deployed in your cluster. For more information, see Deploy and manage Knative.
AHPA deployed in your cluster. For more information, see Deploy AHPA.

Step 1: Configure AHPA metrics for auto scaling

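Create an AHPA template that defines the scaling metric, the replica limits, and the cron-based replica bounds, and then deploy it to the cluster. The Knative Service in Step 2 references this template by its name, ahpa-demo.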

apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedHorizontalPodAutoscalerTemplate
metadata:
  name: ahpa-demo
spec:
  metrics:
  - type: Resource
    resource:
      name: rps
      target:
        type: Utilization
        averageUtilization: 10 # RPS threshold set to 10.
  maxReplicas: 50 # Maximum number of replicas set to 50.
  minReplicas: 0 # Minimum number of replicas set to 0.
  prediction:
    quantile: 95 # Prediction confidence level set to 95%.
    scaleUpForward: 180 # Forward prediction time range set to 180 seconds.
  # Replica counts are bounded by the AHPA-defined max and min from 2023-06-01 to 2123-06-01.
  instanceBounds:
  - startTime: "2023-06-01 00:00:00"
    endTime: "2123-06-01 00:00:00"
    bounds:
    # 00:00 to 06:59: min 0, max 50
    - cron: '* 0-6 ? * *'
      maxReplicas: 50
      minReplicas: 0
    # 07:00 to 09:59: min 5, max 50
    - cron: '* 7-9 ? * *'
      maxReplicas: 50
      minReplicas: 5
    # 10:00 to 16:59: min 10, max 50
    - cron: '* 10-16 ? * *'
      maxReplicas: 50
      minReplicas: 10
    # 17:00 to 23:59: min 2, max 50
    - cron: '* 17-23 ? * *'
      maxReplicas: 50
      minReplicas: 2

The following table describes the AHPA parameters.

Parameter	Required	Description
metrics	Yes	The metrics used for auto scaling. Supported metrics: RPS, concurrency, CPU, and memory.
maxReplicas	Yes	The maximum number of replicas allowed.
minReplicas	Yes	The minimum number of replicas guaranteed.
instanceBounds	No	A time window that constrains replica counts to the AHPA-defined maximum and minimum. Contains `startTime` (start of the window) and `endTime` (end of the window).
bounds	No	The maximum and minimum replica counts within a specific time period. Contains: `cron` (a cron expression that defines the time period), `maxReplicas` (maximum replicas), and `minReplicas` (minimum replicas).
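
The way cron-based bounds constrain a replica count can be sketched in a few lines of Python. This is only an illustration of the clamping idea, not AHPA's implementation: the `Bound` class, the `BOUNDS` table, and the `clamp_replicas` function are hypothetical names, the hour ranges are copied by hand from the sample template instead of being parsed from cron expressions, and the `startTime`/`endTime` window and time zone handling are ignored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Bound:
    hours: range        # hours covered by the Hours field of the cron expression
    min_replicas: int
    max_replicas: int


# Hand-copied from the four cron windows in the sample ahpa-demo template.
BOUNDS = [
    Bound(range(0, 7), 0, 50),     # '* 0-6 ? * *'
    Bound(range(7, 10), 5, 50),    # '* 7-9 ? * *'
    Bound(range(10, 17), 10, 50),  # '* 10-16 ? * *'
    Bound(range(17, 24), 2, 50),   # '* 17-23 ? * *'
]

# Top-level minReplicas and maxReplicas from the template, used here as an
# assumed fallback when no window matches.
DEFAULT_MIN, DEFAULT_MAX = 0, 50


def clamp_replicas(predicted: int, now: datetime) -> int:
    """Clamp a predicted replica count to the bounds of the window that contains `now`."""
    for bound in BOUNDS:
        if now.hour in bound.hours:
            return max(bound.min_replicas, min(predicted, bound.max_replicas))
    return max(DEFAULT_MIN, min(predicted, DEFAULT_MAX))
```

For example, under these assumptions a prediction of 1 replica at 08:30 is raised to the window minimum of 5, and a prediction of 80 replicas at noon is capped at 50.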

Cron expression fields

The following table describes the fields in a cron expression. For more information, see Cron expressions.

Field	Special character	Required	Description
Minutes	* / , -	Yes	Valid values: 0 to 59.
Hours	* / , -	Yes	Valid values: 0 to 23.
Day of month	* / , - ?	Yes	Valid values: 1 to 31.
Month	* / , -	Yes	Valid values: 1 to 12 or JAN to DEC. The values JAN to DEC are not case-sensitive.
Day of week	* / , - ?	No	Valid values: 0 to 6 or SUN to SAT. The values SUN to SAT are not case-sensitive. If not specified, any day of the week is applied, which is equivalent to the wildcard character (`*`).

Special characters used in cron expressions:

An asterisk (*) indicates any value. For example, * indicates any minute or hour.
A forward slash (/) indicates the step size. For example, */5 in the Minutes field indicates every five minutes.
Commas (,) are used as delimiters. For example, 1,3,5 indicates values 1, 3, and 5.
Hyphens (-) are used in value ranges. For example, 1-5 indicates values 1 to 5.
Question marks (?) are used only in the Day of month and Day of week fields to indicate that no specific value is set for the field.
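
To make the special characters concrete, the following Python sketch expands a single numeric cron field into the set of values it matches. It is a simplified illustration and not the parser that AHPA uses: `expand_field` is a hypothetical helper, it treats `?` like `*` for matching purposes, and it does not handle names such as JAN or SUN.

```python
from typing import Set


def expand_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one numeric cron field (for example '0-6', '*/5', or '1,3,5')."""
    values: Set[int] = set()
    for part in field.split(","):
        # Split off an optional step such as '/5'.
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng in ("", "*", "?"):
            # Wildcard: cover the whole valid range of the field.
            start, end = low, high
        elif "-" in rng:
            start_text, end_text = rng.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(rng)
            # A single value with a step (for example '5/15') runs to the end of the range.
            end = high if step_text else start
        if not (low <= start <= end <= high):
            raise ValueError(f"value out of range in cron field: {part!r}")
        values.update(range(start, end + 1, step))
    return values
```

With this helper, `expand_field("0-6", 0, 23)` yields the hours 0 through 6, which is why the first window in the sample template stays active until 06:59.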

Step 2: Create a Knative Service and enable AHPA

After you deploy AHPA, you can use it through a Knative Service.

Log on to the Container Service Management Console. In the navigation pane on the left, click Clusters.
On the Clusters page, click the name of your cluster. In the navigation pane on the left, click Applications > Knative.

On the Services tab of the Knative page, set Namespace to default, click Create from Template, copy the following YAML content to the editor, and then click Create to create a Service named helloworld-go-demo.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go-demo
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: ahpa.autoscaling.knative.dev # Specify the AHPA plugin.
        autoscaling.knative.dev.alibabacloud/ahpa-template: "ahpa-demo" # If you modify the AHPA template parameter, the corresponding revision is also updated.
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:73fbdd56
        env:
        - name: TARGET
          value: "Knative"

After the Service is created, record the gateway address and domain name. You need them in Step 3.

Step 3: Access the Service

Run the following command to access the Service:

# helloworld-go-demo.default.example.com is the default domain name of the Service.
# alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com is the gateway address of the Service.
curl -H "Host: helloworld-go-demo.default.example.com" http://alb-i5lagvip6fga******.cn-shenzhen.alb.aliyuncs.com

Expected output:

Hello Knative!
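
If you prefer to send the same request from a script, the curl command can be approximated with the Python standard library. This is a minimal sketch: `build_request` and `call_service` are hypothetical helper names, and you must pass in the gateway address and domain name that you recorded in Step 2.

```python
import urllib.request


def build_request(gateway_address: str, domain_name: str) -> urllib.request.Request:
    """Build a GET request to the gateway that carries the Service domain name in the Host header."""
    url = f"http://{gateway_address}"
    return urllib.request.Request(url, headers={"Host": domain_name})


def call_service(gateway_address: str, domain_name: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    request = build_request(gateway_address, domain_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (replace the first argument with your own gateway address):
# print(call_service("<gateway address>", "helloworld-go-demo.default.example.com"))
```

Setting the Host header explicitly plays the same role as the `-H "Host: ..."` option in the curl command: the request is sent to the gateway address, and the gateway routes it based on the domain name.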

Step 4 (optional): Verify scheduled scaling

You can view pod scaling trends for the Knative Service on the Knative monitoring dashboards. For more information, see View the Knative monitoring dashboard.

Note

If a Knative application scales to zero pods, Prometheus cannot collect metrics for the pods, such as the number of concurrent requests and requests per second. These metrics appear on the console only after the Knative application pods are accessed.
If a Knative application has not scaled to zero pods, the console displays metrics for the pods, such as the number of concurrent requests and requests per second, even if the pods are not accessed.

References

For information about configuring auto scaling based on concurrent pod requests and RPS, see Enable auto scaling to withstand traffic fluctuations.