Auto-scaling of reserved instances

The reserved mode allows you to reserve instances in advance so that they are ready to handle function invocation requests. This reduces cold starts and improves the response speed of latency-sensitive online services. You can use scheduled auto-scaling or metric tracking auto-scaling to make better use of reserved instances.

Scheduled auto-scaling

Definition: Scheduled auto-scaling adjusts the number of reserved instances to a specified value at a specified time, so that the number of instances matches the concurrency your business requires.
Scenario: You can use scheduled auto-scaling to reserve instances ahead of periodic or predictable traffic peaks. When the concurrency of a function exceeds what the reserved instances can handle, the additional instances are charged on a pay-as-you-go basis.
Example of configuration: Two scheduled operations are configured. The first scheduled operation scales out the reserved instances before the traffic peak, and the second scheduled operation scales in the reserved instances after the traffic peak.

Example of parameter settings:

In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out. The scaling period for function_1 runs from 10:00:00 on November 1, 2020 to 10:00:00 on November 30, 2020 (UTC). The number of reserved instances is scaled out to 50 at 20:00 every day and scaled in to 10 at 22:00 every day.

{
  "ServiceName": "service_1",
  "FunctionName": "function_1",
  "Qualifier": "alias_1",
  "SchedulerActions": [
    {
      "Name": "action_1",
      "StartTime": "2020-11-01T10:00:00Z",
      "EndTime": "2020-11-30T10:00:00Z",
      "TargetValue": 50,
      "ScheduleExpression": "cron(0 0 20 * * *)"
    },
    {
      "Name": "action_2",
      "StartTime": "2020-11-01T10:00:00Z",
      "EndTime": "2020-11-30T10:00:00Z",
      "TargetValue": 10,
      "ScheduleExpression": "cron(0 0 22 * * *)"
    }
  ]
}
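
The configuration above can also be assembled programmatically. The following Python snippet is a minimal sketch that only builds and prints the JSON document. The helper names `utc_timestamp` and `scheduled_action` are hypothetical and are not part of any SDK, and the sketch assumes that the timestamps use the UTC format shown in the example.

```python
import json
from datetime import datetime, timezone


def utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC string such as 2020-11-01T10:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def scheduled_action(name, start, end, target_value, schedule_expression):
    """Build one entry of the SchedulerActions list (hypothetical helper)."""
    return {
        "Name": name,
        "StartTime": utc_timestamp(start),
        "EndTime": utc_timestamp(end),
        "TargetValue": target_value,
        "ScheduleExpression": schedule_expression,
    }


# Scaling period taken from the example above.
start = datetime(2020, 11, 1, 10, 0, 0, tzinfo=timezone.utc)
end = datetime(2020, 11, 30, 10, 0, 0, tzinfo=timezone.utc)

config = {
    "ServiceName": "service_1",
    "FunctionName": "function_1",
    "Qualifier": "alias_1",
    "SchedulerActions": [
        # Scale out to 50 reserved instances at 20:00 every day.
        scheduled_action("action_1", start, end, 50, "cron(0 0 20 * * *)"),
        # Scale in to 10 reserved instances at 22:00 every day.
        scheduled_action("action_2", start, end, 10, "cron(0 0 22 * * *)"),
    ],
}

print(json.dumps(config, indent=2))
```

Building timestamps from timezone-aware datetime objects avoids hand-written strings that accidentally carry a local time.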

Parameters


Parameter	Description	Schema
Name	The name of the scheduled auto-scaling task.	string
StartTime	The time when the configuration starts to take effect. Specify the value in UTC.	string
EndTime	The time when the configuration expires. Specify the value in UTC.	string
TargetValue	The number of instances to be reached.	integer (int64)
ScheduleExpression	The schedule expression that specifies when to run the scheduled task. The following formats are supported: At expressions - "at(yyyy-mm-ddThh:mm:ss)": runs the scheduled task only once. Specify the value in UTC. Cron expressions - "cron(0 0 20 * * *)": runs the scheduled task multiple times. Specify the value in the six-field CRON format described below. For example, cron(0 0 20 * * *) indicates that the scheduled task is run at 20:00 every day.	string

The following tables describe the fields and special characters of the CRON expression (Seconds Minutes Hours Day-of-month Month Day-of-week).

Table 1. Fields
Field name	Valid value	Allowed special characters
Seconds	0 to 59	None
Minutes	0 to 59	, - * /
Hours	0 to 23	, - * /
Day-of-month	1 to 31	, - * ? /
Month	1 to 12 or JAN to DEC	, - * /
Day-of-week	1 to 7 or MON to SUN	, - * ?

Table 2. Special characters
Character	Description	Example
*	Indicates any or each.	In the `Minutes` field, * indicates that the task is run at second 0 of every minute.
,	Indicates the list value.	In the `Day-of-week` field, MON, WED, and FRI indicate Monday, Wednesday, and Friday.
-	Indicates a range.	In the `Hours` field, 10-12 indicates that the time range is from 10:00 to 12:00 in UTC.
?	Indicates an uncertain value.	This value is used with other specified values. For example, if you specify a specific date, but you do not care what day of the week it is, you can use this special character in the `Day-of-week` field.
/	Indicates the increment of a value. For example, n/m means to start at n and add an increment of m each time.	In the `Minutes` field, 3/5 indicates that the task is run every 5 minutes starting from minute 3.
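
To make the field order and the special characters concrete, the following Python snippet is a minimal sketch of a parser for the six-field expression. It is not the parser that Function Compute uses. The names `FIELDS`, `expand_field`, and `parse_cron` are hypothetical, and the sketch handles numeric values only: it does not support names such as JAN or MON, and it does not check which special characters each field allows.

```python
import re

# (field name, minimum, maximum) in the order Seconds ... Day-of-week.
FIELDS = [
    ("Seconds", 0, 59),
    ("Minutes", 0, 59),
    ("Hours", 0, 23),
    ("Day-of-month", 1, 31),
    ("Month", 1, 12),
    ("Day-of-week", 1, 7),
]


def expand_field(text, low, high):
    """Expand one numeric field into the sorted list of values it matches.

    Returns None for "?", which means that no specific value is required.
    """
    if text == "?":
        return None
    values = set()
    for part in text.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"invalid increment in {part!r}")
        if base == "*":
            first, last = low, high
        elif "-" in base:
            first, last = (int(item) for item in base.split("-"))
        else:
            first = int(base)
            # "n/m" starts at n and keeps stepping; a bare "n" is a single value.
            last = high if step_text else first
        if not low <= first <= last <= high:
            raise ValueError(f"{part!r} is outside {low} to {high}")
        values.update(range(first, last + 1, step))
    return sorted(values)


def parse_cron(expression):
    """Split "cron(...)" into its six fields and expand each one."""
    match = re.fullmatch(r"cron\((.*)\)", expression.strip())
    if not match:
        raise ValueError("expected an expression of the form cron(...)")
    parts = match.group(1).split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return {
        name: expand_field(text, low, high)
        for (name, low, high), text in zip(FIELDS, parts)
    }


schedule = parse_cron("cron(0 0 20 * * *)")
print(schedule["Hours"])             # [20]
print(expand_field("10-12", 0, 23))  # [10, 11, 12]
```

An expression with only five fields, such as cron(0 0 20 * *), is rejected by this sketch because the Day-of-week field is missing.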

Metric tracking auto-scaling

Definition: Metric tracking auto-scaling tracks the metrics to dynamically scale reserved instances.
Scenario: Function Compute periodically collects the concurrency usage rate of reserved instances, and uses this metric together with the scale-out and scale-in trigger values you configured to control the scaling of reserved instances. In this way, the number of reserved instances can be scaled based on your business needs.
Principle: Reserved instances are scaled in or out every minute based on the metric.
- When the metric exceeds the scale-out threshold, the system scales out the number of instances to the target value as quickly as possible.
- When the metric is lower than the scale-in threshold, the system gradually scales in the number of instances toward the target value.
If the maximum and minimum numbers of reserved instances are configured, the system keeps the number of reserved instances between these two values. When the number of instances reaches the maximum or the minimum, scaling in that direction stops.
Example of configuration:
- When the traffic increases and the concurrency usage rate of reserved instances reaches the scale-out threshold (80% in this example), the number of reserved instances starts to scale out until it reaches the configured maximum. Requests that cannot be processed by reserved instances are sent to pay-as-you-go instances.
- When the traffic decreases and the concurrency usage rate of reserved instances drops to the scale-in threshold (60% in this example), the number of reserved instances starts to scale in.

Only statistics on reserved instances are used to calculate the concurrency usage rate of reserved instances. Statistics on pay-as-you-go instances are not included. The metric is calculated based on the following formula: Number of concurrent requests to which reserved instances are responding/Maximum number of concurrent requests to which all reserved instances can respond. The metric value ranges from 0 to 1. The maximum number of concurrent requests to which reserved instances can respond depends on how many requests one instance processes at a time:

Single request processed by one instance: Maximum concurrency = Number of instances.
Multiple requests processed by one instance: Maximum concurrency = Number of instances × Number of requests concurrently processed by one instance.
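
The formula and the two concurrency cases above can be sketched in Python as follows. The function names and the sample numbers are hypothetical and are used only for illustration.

```python
def max_reserved_concurrency(instances, requests_per_instance=1):
    """Maximum number of concurrent requests that all reserved instances can handle.

    requests_per_instance=1 corresponds to one request processed by one instance.
    """
    return instances * requests_per_instance


def concurrency_usage_rate(in_flight_on_reserved, instances, requests_per_instance=1):
    """Concurrency usage rate of reserved instances, a value from 0 to 1.

    in_flight_on_reserved counts only requests handled by reserved instances;
    requests handled by pay-as-you-go instances are excluded.
    """
    capacity = max_reserved_concurrency(instances, requests_per_instance)
    if capacity == 0:
        raise ValueError("no reserved instances are configured")
    return in_flight_on_reserved / capacity


# One request per instance: 8 in-flight requests on 10 reserved instances.
print(concurrency_usage_rate(8, 10))      # 0.8

# Five requests per instance: 30 in-flight requests on 10 reserved instances.
print(concurrency_usage_rate(30, 10, 5))  # 0.6
```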

Scale-in and scale-out values:

The values are determined by the current metric value, scaling threshold, number of reserved instances, and scaling factor.
Calculation principle: When the system scales in, it applies a scale-in factor whose value is in the range (0,1]. You do not need to set this factor. The scale-in and scale-out values are the following calculation results, rounded up to the nearest integer:
- Scale-out value = (Current metric value/Scale-out threshold) × Number of reserved instances.
- Scale-in ratio = (1 - Current metric value/Scale-out threshold) × Scale-in factor.
- Scale-in value = Current number of instances × (1 - Scale-in ratio).
Example: The current metric value is 90%, the scale-out threshold is 80%, and the number of reserved instances is 100. The scale-out value = (90%/80%) × 100 = 112.5, which is rounded up to 113. The number of reserved instances is increased to 113.
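
The formulas above can be checked with a short Python sketch. The function names are hypothetical. The scale-in factor is chosen by the system and is not documented here, so the value 0.5 used below is purely an assumption for illustration. The `clamp` helper reflects the rule that the result stays between the configured minimum and maximum. Exact fractions are used so that rounding up is not affected by floating-point error.

```python
import math
from fractions import Fraction


def _exact(value):
    """Convert a number such as 0.9 to an exact fraction (9/10)."""
    return Fraction(str(value))


def scale_out_value(metric, scale_out_threshold, reserved_instances):
    """(Current metric value / Scale-out threshold) x Number of reserved instances, rounded up."""
    result = _exact(metric) / _exact(scale_out_threshold) * reserved_instances
    return math.ceil(result)


def scale_in_value(metric, scale_out_threshold, current_instances, scale_in_factor):
    """Current number of instances x (1 - Scale-in ratio), rounded up."""
    ratio = (1 - _exact(metric) / _exact(scale_out_threshold)) * _exact(scale_in_factor)
    return math.ceil(current_instances * (1 - ratio))


def clamp(value, min_capacity, max_capacity):
    """Keep the result between the configured minimum and maximum."""
    return max(min_capacity, min(value, max_capacity))


# Example from the text: metric 90%, threshold 80%, 100 reserved instances.
print(scale_out_value(0.9, 0.8, 100))                   # 113

# With a configured maximum of 100, the result is capped.
print(clamp(scale_out_value(0.9, 0.8, 100), 10, 100))   # 100

# Scale-in with an assumed factor of 0.5: ratio = (1 - 0.3/0.6) x 0.5 = 0.25.
print(scale_in_value(0.3, 0.6, 100, 0.5))               # 75
```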

Example of parameter settings:

In this example, a function named function_1 in a service named service_1 is configured to automatically scale in and out based on the ProvisionedConcurrencyUtilization metric. The scaling period runs from 10:00:00 on November 1, 2020 to 10:00:00 on November 30, 2020. When the concurrency usage rate exceeds 60%, the number of reserved instances can scale out to at most 100. When the concurrency usage rate is lower than 60%, the number of reserved instances can scale in to no fewer than 10.

{
  "ServiceName": "service_1",
  "FunctionName": "function_1",
  "Qualifier": "alias_1",
  "TargetTrackingPolicies": [
    {
      "Name": "action_1",
      "StartTime": "2020-11-01T10:00:00Z",
      "EndTime": "2020-11-30T10:00:00Z",
      "MetricType": "ProvisionedConcurrencyUtilization",
      "MetricTarget": 0.6,
      "MinCapacity": 10,
      "MaxCapacity": 100
    }
  ]
}

Parameters


Parameter	Description	Schema
Name	The name of the metric tracking auto-scaling policy.	string
StartTime	The time when the configuration starts to take effect. Specify the value in UTC.	string
EndTime	The time when the configuration expires. Specify the value in UTC.	string
MetricType	The tracked metric: ProvisionedConcurrencyUtilization.	string
MetricTarget	The tracked value of the metric.	double
MinCapacity	The minimum scale-in value.	integer (int64)
MaxCapacity	The maximum scale-out value.	integer (int64)
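
Before a policy is submitted, it can be useful to run a few local sanity checks against the parameters in the table above. The following Python snippet is a minimal sketch. The function `check_tracking_policy` is hypothetical, and the rules it applies (MetricTarget in the range (0,1], MinCapacity not greater than MaxCapacity, StartTime earlier than EndTime) are assumptions derived from the descriptions in this topic, not documented validation rules of the service.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC format used in the examples


def check_tracking_policy(policy):
    """Return a list of problems found in one TargetTrackingPolicies entry."""
    problems = []
    if policy.get("MetricType") != "ProvisionedConcurrencyUtilization":
        problems.append("MetricType must be ProvisionedConcurrencyUtilization")
    target = policy.get("MetricTarget")
    if not isinstance(target, (int, float)) or not 0 < target <= 1:
        problems.append("MetricTarget should be a ratio in the range (0, 1]")
    low, high = policy.get("MinCapacity"), policy.get("MaxCapacity")
    if not isinstance(low, int) or not isinstance(high, int) or low > high:
        problems.append("MinCapacity and MaxCapacity must be integers, with MinCapacity <= MaxCapacity")
    try:
        start = datetime.strptime(policy["StartTime"], TIME_FORMAT)
        end = datetime.strptime(policy["EndTime"], TIME_FORMAT)
        if start >= end:
            problems.append("StartTime must be earlier than EndTime")
    except (KeyError, ValueError, TypeError):
        problems.append("StartTime and EndTime must be UTC strings such as 2020-11-01T10:00:00Z")
    return problems


policy = {
    "Name": "action_1",
    "StartTime": "2020-11-01T10:00:00Z",
    "EndTime": "2020-11-30T10:00:00Z",
    "MetricType": "ProvisionedConcurrencyUtilization",
    "MetricTarget": 0.6,
    "MinCapacity": 10,
    "MaxCapacity": 100,
}

print(check_tracking_policy(policy))                        # []
print(check_tracking_policy(dict(policy, MetricTarget=60)))  # one problem: 60 is a percentage, not a ratio
```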