
Function Compute:Limits and rules of auto instance scaling

Last Updated: Sep 03, 2024

Auto scaling in Function Compute is subject to limits on the total number of on-demand and provisioned instances and on the instance scaling speed. For provisioned instances, you can create scheduled scaling policies and water-level scaling policies to improve resource utilization.

Instance scaling behavior

Function Compute preferentially uses existing instances to process requests. If the existing instances are fully loaded, Function Compute creates new instances to process requests. As the number of invocations increases, Function Compute continues to create new instances until enough instances are created to handle requests or the upper limit is reached. The scale-out of instances is subject to the limitations of the scaling speed. For more information, see Limits on the scale-out speed of instances in different regions.

This section describes the scaling behavior of on-demand and provisioned instances. After you configure provisioned instances for a function, a specific number of instances is reserved before the function is invoked, so that request execution is not delayed by cold starts.

Scaling of on-demand instances

When the total number of instances or the scaling-out speed of instances reaches the limit, Function Compute reports a throttling error, for which the HTTP status code is 429. The following figure shows how Function Compute performs throttling in a scenario in which the number of invocations rapidly increases.

[Figure: throttling behavior of on-demand instances as the number of invocations rapidly increases]

  • ①: Before the upper limit for burst instances is reached, Function Compute immediately creates instances when the number of requests increases. During this process, a cold start occurs but no throttling error is reported.

  • ②: After the upper limit for burst instances is reached, instance creation is restricted by the growth-rate limit. Throttling errors are reported for some requests.

  • ③: When the upper limit of instances is reached, some requests are throttled.
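As a rough sketch, the throttling behavior described above can be modeled as follows. This is an illustrative toy model, not Function Compute's actual implementation; the burst limit and growth rate match the values listed for the China regions later in this topic, and max_instances is an assumed account-level limit.

```python
import math

def allowed_instances(elapsed_minutes, burst_limit=300, growth_rate=300,
                      max_instances=1000):
    """Upper bound on the number of instances that may exist
    `elapsed_minutes` after a traffic surge begins. Up to `burst_limit`
    instances can be created immediately; beyond that, growth is capped
    at `growth_rate` instances per minute, until the total instance
    limit (`max_instances`, an assumed value) is reached."""
    cap = burst_limit + math.floor(elapsed_minutes * growth_rate)
    return min(cap, max_instances)

print(allowed_instances(0))   # 300: burst capacity is available at once
print(allowed_instances(1))   # 600: one minute of growth added
print(allowed_instances(10))  # 1000: capped by the total instance limit
```

Requests that would require more instances than the current bound allows are the ones that receive the 429 throttling error.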

Scaling of provisioned instances

If a large number of invocations arrive suddenly, the creation of many instances at once is throttled, which causes request failures. Cold starts of new instances also increase request latency. To prevent these issues, you can use provisioned instances in Function Compute. Provisioned instances are reserved before invocations arrive.

The following figure shows the throttling behavior of provisioned instances in a scenario that has the same amount of traffic as the preceding figure.

[Figure: throttling behavior of provisioned instances under the same traffic as the preceding figure]

  • ①: Before the provisioned instances are fully loaded, requests are immediately processed. During this process, no cold start occurs and no throttling error is reported.

  • ②: When the provisioned instances are fully loaded, Function Compute immediately creates instances before the upper limit for burst instances is reached. During this process, a cold start occurs but no throttling error is reported.

Limits on the scale-out speed of instances in different regions

Region | Upper limit of burst instances | Upper limit of instance growth rate
--- | --- | ---
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute
Other regions | 100 | 100 per minute

  • In the same region, the limits on instance scale-out speed for provisioned instances and on-demand instances are the same.

  • The scaling speed of GPU-accelerated instances is lower than that of elastic instances. We recommend that you use GPU-accelerated instances together with the provisioned mode.

Note

If you need higher scaling speed, join the DingTalk group 64970014484 for technical support.

Auto scaling of provisioned instances

A fixed number of provisioned instances may lead to insufficient utilization of resources. You can configure scheduled scaling and water-level scaling to resolve this issue.

Note

If you configure both a scheduled scaling policy and a water-level scaling policy, the larger of the values calculated by the two policies is used as the number of provisioned instances.

Scheduled scaling

Scenarios

Scheduled instance scaling applies to functions that have noticeable cyclical patterns or predictable traffic peaks. If the number of concurrent invocations exceeds the capacity of the provisioned instances configured by the scheduled scaling policy, excess requests are sent to on-demand instances for processing.

Sample configuration

You can configure two scheduled scaling policies. The first one increases the number of provisioned instances before traffic surges. The second policy decreases the number of provisioned instances when the traffic declines. The following figure shows the details.

[Figure: scheduled scaling policies that increase provisioned instances before a traffic peak and decrease them afterward]

The following code snippet provides an example of the request parameters of PutProvisionConfig that is called to create a scheduled scaling policy. In this example, a scheduled scaling policy is configured for function_1. The time zone is set to Asia/Shanghai (UTC+8). The policy takes effect from 10:00:00 on August 1, 2024 to 10:00:00 on August 30, 2024. The policy increases the number of instances to 50 at 20:00 every day and reduces the number of instances to 10 at 22:00 every day.

"scheduledActions": [
    {
      "name": "scale_up_action",
      "startTime": "2024-08-01T10:00:00",
      "endTime": "2024-08-30T10:00:00",
      "target": 50,
      "scheduleExpression": "cron(0 0 20 * * *)",
      "timeZone": "Asia/Shanghai"
    },
    {
      "name": "scale_down_action",
      "startTime": "2024-08-01T10:00:00",
      "endTime": "2024-08-30T10:00:00",
      "target": 10,
      "scheduleExpression": "cron(0 0 22 * * *)",
      "timeZone": "Asia/Shanghai"
    }
  ]
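For illustration, the request body above can be assembled and sanity-checked in Python before it is passed to an SDK call. The scheduled_action helper is hypothetical, not part of any Function Compute SDK, and the SDK call itself is omitted.

```python
import json

def scheduled_action(name, start, end, target, expression, tz="Asia/Shanghai"):
    """Build one scheduledActions entry with basic sanity checks.
    Hypothetical helper; field names follow the example above."""
    if target < 0:
        raise ValueError("target must be non-negative")
    if not (expression.startswith("cron(") or expression.startswith("at(")):
        raise ValueError("scheduleExpression must be cron(...) or at(...)")
    return {
        "name": name,
        "startTime": start,
        "endTime": end,
        "target": target,
        "scheduleExpression": expression,
        "timeZone": tz,
    }

body = {
    "scheduledActions": [
        scheduled_action("scale_up_action", "2024-08-01T10:00:00",
                         "2024-08-30T10:00:00", 50, "cron(0 0 20 * * *)"),
        scheduled_action("scale_down_action", "2024-08-01T10:00:00",
                         "2024-08-30T10:00:00", 10, "cron(0 0 22 * * *)"),
    ]
}
print(json.dumps(body, indent=2))
```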

The following table describes the parameters.

Parameter

Description

name

The name of the scheduled scaling task.

startTime

The time when the configurations start to take effect. If no time zone is specified, UTC is used.

endTime

The time when the configurations cease to take effect. If no time zone is specified, UTC is used.

target

The number of target provisioned instances.

scheduleExpression

The schedule information. If no time zone is specified, the UTC time is automatically used.

The following formats are supported:

  • At expressions - "at(yyyy-mm-ddThh:mm:ss)": runs the scheduled task only once.

    For example, if you want to start scheduling at 20:00 on April 1, 2024 (UTC+8), set the time zone to Asia/Shanghai and set the value to at(2024-04-01T20:00:00).

  • Cron expressions - "cron(0 0 4 * * *)": runs the scheduled task multiple times, using crontab syntax.

    For example, if you want to run the scheduled task at 20:00 (UTC+8) every day, set the time zone to Asia/Shanghai and set the value of this parameter to cron(0 0 20 * * *).

timeZone

The specified time zone.
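As a sketch, the two scheduleExpression formats described above can be distinguished and split into fields as follows. This is client-side illustration only; the service-side parsing is not documented here.

```python
import re

def parse_schedule_expression(expr):
    """Classify a scheduleExpression as an 'at' or 'cron' expression
    and return its inner value. Raises ValueError for anything else."""
    m = re.fullmatch(r"at\((\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\)", expr)
    if m:
        return ("at", m.group(1))
    m = re.fullmatch(r"cron\(([^)]+)\)", expr)
    if m:
        fields = m.group(1).split()
        if len(fields) != 6:
            raise ValueError("cron expression must have 6 fields: "
                             "Seconds Minutes Hours Day-of-month Month Day-of-week")
        return ("cron", fields)
    raise ValueError(f"unsupported schedule expression: {expr}")

print(parse_schedule_expression("at(2024-04-01T20:00:00)"))
print(parse_schedule_expression("cron(0 0 20 * * *)"))
```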

Cron expressions

The following table describes fields in cron(Seconds Minutes Hours Day-of-month Month Day-of-week).

Field | Value range | Allowed special characters
--- | --- | ---
Seconds | 0 to 59 | N/A
Minutes | 0 to 59 | , - * /
Hours | 0 to 23 | , - * /
Day-of-month | 1 to 31 | , - * ? /
Month | 1 to 12 or JAN to DEC | , - * /
Day-of-week | 1 to 7 or MON to SUN | , - * ?

The following table describes the special characters in a cron expression.

Character | Description | Example
--- | --- | ---
* | Indicates any or each value. | In the Minutes field, * specifies that the task runs every minute.
, | Specifies a list of values. | In the Day-of-week field, MON,WED,FRI specifies every Monday, Wednesday, and Friday.
- | Specifies a range. | In the Hours field, 10-12 specifies the hours 10, 11, and 12.
? | Specifies an unspecified value. | Used together with other fields. For example, if you specify a day of the month but do not need it to fall on a specific day of the week, use ? in the Day-of-week field.
/ | Specifies increments. n/m indicates an increment of m starting from position n. | In the Minutes field, 3/5 specifies that the task runs every 5 minutes starting from the third minute.
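The two tables above can be turned into a quick client-side check. This is an illustrative validator, not the service's actual parser; it only flags special characters that a field does not allow.

```python
# Allowed special characters per cron field, as listed in the table above.
ALLOWED = {
    "Seconds": set(),
    "Minutes": set(",-*/"),
    "Hours": set(",-*/"),
    "Day-of-month": set(",-*?/"),
    "Month": set(",-*/"),
    "Day-of-week": set(",-*?"),
}
FIELDS = ["Seconds", "Minutes", "Hours", "Day-of-month", "Month", "Day-of-week"]

def check_special_chars(cron_fields):
    """Return the names of fields that use a special character the
    table does not allow. cron_fields is the 6-element field list."""
    bad = []
    for name, value in zip(FIELDS, cron_fields):
        specials = {c for c in value if not c.isalnum()}
        if not specials <= ALLOWED[name]:
            bad.append(name)
    return bad

print(check_special_chars(["0", "0", "20", "*", "*", "*"]))              # []
print(check_special_chars(["*", "3/5", "10-12", "?", "JAN", "MON,WED"]))  # ['Seconds']
```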

Water-level scaling

Scenarios

Function Compute periodically collects values of metrics, such as the concurrency utilization of provisioned instances or resource utilization of provisioned instances. Provisioned instances are scaled based on values of the metrics and the minimum and maximum numbers of provisioned instances you specify.

Sample configuration

Assume that you configure a water-level scaling policy that specifies a concurrency utilization threshold for provisioned instances. When traffic increases, the scale-out threshold is triggered and the system starts to scale out provisioned instances, stopping when the specified maximum is reached. Excess requests are sent to on-demand instances. When traffic decreases, the scale-in threshold is triggered and the system starts to scale in provisioned instances. The following figure shows the details.

[Figure: water-level scaling of provisioned instances as traffic rises and falls]

Note
  • If you use water-level scaling, you must enable the instance-level metrics feature. Otherwise, the 400 InstanceMetricsRequired error is reported. For more information about how to enable instance-level metrics, see Configure instance-level metrics.

  • The concurrency utilization metric only includes concurrency of provisioned instances and does not include concurrency of on-demand instances.

  • The concurrency utilization of provisioned instances is the ratio of the number of concurrent requests that provisioned instances are processing to their maximum concurrency. The value range is [0,1].
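As a sketch, the utilization described above can be computed as follows, assuming the maximum concurrency of the provisioned instances equals instance count multiplied by per-instance concurrency (an illustrative model, not Function Compute's internal metric pipeline).

```python
def provisioned_concurrency_utilization(in_flight_requests,
                                        provisioned_instances,
                                        instance_concurrency=1):
    """Ratio of in-flight requests on provisioned instances to their
    total concurrency capacity, clamped to the [0, 1] range."""
    capacity = provisioned_instances * instance_concurrency
    return min(in_flight_requests / capacity, 1.0)

print(provisioned_concurrency_utilization(30, 100))     # 0.3
print(provisioned_concurrency_utilization(80, 10, 20))  # 0.4
print(provisioned_concurrency_utilization(500, 100))    # 1.0 (fully loaded)
```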

The following code snippet provides an example of the request parameters of PutProvisionConfig that is called to create a water-level scaling policy. In this example, a water-level scaling policy is configured for function_1. The time zone is set to Asia/Shanghai (UTC+8). The policy takes effect from 10:00:00 on August 1, 2024 to 10:00:00 on August 30, 2024. The policy tracks ProvisionedConcurrencyUtilization and starts to scale out when the concurrency utilization exceeds 60% and to scale in when it falls below 60%. The upper limit is 100 instances and the lower limit is 10 instances.

"targetTrackingPolicies": [
    {
      "name": "action_1",
      "startTime": "2024-08-01T10:00:00",
      "endTime": "2024-08-30T10:00:00",
      "metricType": "ProvisionedConcurrencyUtilization",
      "metricTarget": 0.6,
      "minCapacity": 10,
      "maxCapacity": 100,
      "timeZone": "Asia/Shanghai"
    }
  ]

The following table describes the parameters.

Parameter

Description

name

The name of the metric-based scaling task.

startTime

The time when the water-level scaling configurations start to take effect. If no time zone is specified, UTC is used.

endTime

The time when the water-level scaling configurations cease to take effect. If no time zone is specified, UTC is used.

metricType

The name of the tracked metric. In this example, the ProvisionedConcurrencyUtilization metric is tracked.

metricTarget

The threshold value for metric-based scaling.

minCapacity

The minimum number of provisioned instances. The scale-in process does not reduce the number of provisioned instances below this value.

maxCapacity

The maximum number of provisioned instances. The scale-out process does not increase the number of provisioned instances above this value.

timeZone

The time zone.

Scaling principles

A relatively conservative scale-in process is achieved by using a scale-in coefficient, whose value is in the range (0,1]. The scale-in coefficient is a system parameter that slows down the scale-in speed; you do not need to set it. The target values for scaling operations are the smallest integers that are greater than or equal to the following calculation results:

  • Scale-out target = Number of current provisioned instances × (Current metric value/Specified utilization threshold)

  • Scale-in target = Number of current provisioned instances × Scale-in coefficient × (1 – Current metric value/Specified utilization threshold)

Example:

If the current metric value is 80%, the specified utilization threshold is 40%, and the number of current provisioned instances is 100, the target number of instances is 100 × (80%/40%) = 200. If the specified maximum number of provisioned instances is 200 or greater, the number of provisioned instances is increased to 200. This keeps the utilization close to the 40% threshold.
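The formulas and the worked example above can be sketched as follows. The scale-in coefficient here is an assumed value of 0.5; the real coefficient is an internal system parameter that you cannot set.

```python
import math

def scale_out_target(current, metric_value, threshold):
    """Scale-out target: smallest integer >= current * (metric/threshold)."""
    return math.ceil(current * (metric_value / threshold))

def scale_in_target(current, metric_value, threshold, coefficient=0.5):
    """Scale-in target per the formula above; the 0.5 coefficient is
    an assumed example value, not the system's actual parameter."""
    return math.ceil(current * coefficient * (1 - metric_value / threshold))

# Worked example: metric 80%, threshold 40%, 100 provisioned instances.
print(scale_out_target(100, 0.8, 0.4))  # 200
# Applying the scale-in formula with metric 20% and threshold 40%.
print(scale_in_target(100, 0.2, 0.4))   # 25
```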

Maximum concurrency

The following items describe how to calculate the maximum number of concurrent invocations for different instance concurrency values:

  • A single instance processes a single request at a time

    Maximum concurrency = Number of instances.

  • A single instance concurrently processes multiple requests at a time

    Maximum concurrency = Number of instances × Instance concurrency

For more information about the scenarios, benefits, configurations, and impacts of the instance concurrency feature, see Configure instance concurrency.
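A minimal sketch of the two cases above:

```python
def max_concurrency(instances, instance_concurrency=1):
    """Maximum number of concurrent invocations. With single-request
    instances this equals the instance count; with instance
    concurrency > 1, multiply by the per-instance concurrency."""
    return instances * instance_concurrency

print(max_concurrency(100))      # 100: one request per instance
print(max_concurrency(100, 10))  # 1000: 10 concurrent requests per instance
```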

More information

  • For more information about the basic concepts and billing methods of on-demand instances and provisioned instances, see Instance types and usage modes.

  • For more information about how to improve resource utilization of provisioned instances, see Configure provisioned instances.

  • By default, all functions within an Alibaba Cloud account in the same region share the preceding scaling limits. To limit the number of instances for a function, you can specify an upper limit for concurrent instances. For more information, see Specify the maximum number of concurrent instances. After the maximum number of on-demand instances is specified, Function Compute returns a throttling error if the total number of running instances for the function reaches the specified limit.