Limits and rules of instance scaling

Updated at: 2025-02-27 02:28

There are two usage modes of instances in Function Compute: on-demand mode and provisioned mode. In both modes, you can configure auto scaling rules based on limits related to the number of instances and their scaling speed. For provisioned instances, you can configure scheduled scaling and threshold-based scaling rules.

Instance scaling behavior

Function Compute preferentially uses existing instances to process requests. When the existing instances are at full capacity, Function Compute creates new ones to process requests. As the number of requests increases, Function Compute continues to create new instances until enough instances are created to handle incoming requests or the number of instances reaches the upper limit. The scaling speed of instances is limited by both the maximum number of burstable instances allowed and the maximum rate at which instances can grow. For more information about the limits in different regions, see Limits on the scaling speed of instances in different regions.

This section describes the scaling behaviors of on-demand and provisioned instances. Configuring provisioned instances for a function allows you to reserve a specific number of instances before function invocations, which helps mitigate cold starts.

Scaling of on-demand instances

If the number of instances or the scaling speed exceeds the limit, Function Compute returns an HTTP 429 status code, which indicates that a throttling error has occurred. The following figure shows how Function Compute applies throttling when invocations surge.

[Figure: throttling of on-demand instances during a surge in invocations, with phases ① to ③]

  • ①: Function Compute immediately creates instances to handle the surge in requests. Cold starts occur during this process. No throttling errors are reported because the number of burstable instances has not reached the upper limit.

  • ②: The increase in the number of instances is now limited by the instance growth rate, as the upper limit for burstable instances has been reached. Throttling errors are reported for some requests.

  • ③: The number of instances reaches the upper limit, resulting in throttling errors for some requests.
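
The three phases above can be approximated with a simple capacity model. The following Python sketch is illustrative only: the function names, the linear growth model, and the upper limit of 1,000 instances are assumptions made for this example, not behavior confirmed by Function Compute. The burst limit of 300 and the growth rate of 300 per minute are the values that this topic lists for regions such as China (Hangzhou).

```python
def max_instances_allowed(minutes_elapsed: float,
                          burst_limit: int = 300,
                          growth_per_minute: int = 300,
                          upper_limit: int = 1000) -> int:
    """Approximate how many instances may exist some minutes after a surge starts.

    Simplified model (an assumption, not the actual scheduler):
      phase 1: up to `burst_limit` instances can be created immediately;
      phase 2: capacity then grows by `growth_per_minute` each minute;
      phase 3: capacity never exceeds `upper_limit` (hypothetical value).
    """
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    grown = burst_limit + int(growth_per_minute * minutes_elapsed)
    return min(grown, upper_limit)


def unmet_instance_demand(needed_instances: int, allowed_instances: int) -> int:
    """Number of instances the workload needs but cannot get.

    Requests that would have been served by these instances are the ones
    that receive HTTP 429 in this simplified model.
    """
    return max(0, needed_instances - allowed_instances)


if __name__ == "__main__":
    # A surge that needs 800 instances, observed at minutes 0 to 3.
    for minute in (0, 1, 2, 3):
        allowed = max_instances_allowed(minute)
        print(minute, allowed, unmet_instance_demand(800, allowed))
```

In this model, a surge that needs 800 instances is partly throttled until the allowed capacity catches up, which matches the qualitative shape of phases ① and ②.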

When a surge in invocations is too large, throttling errors are inevitable. In addition, creating new instances introduces cold starts. Both increase request latency. To reduce latency, you can reserve instances in advance in Function Compute. These reserved instances are called provisioned instances.

Scaling of provisioned instances

The following figure shows how Function Compute applies throttling when provisioned instances are configured and invocations surge in the same manner as in the on-demand mode case.

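[Figure: throttling during a surge in invocations when provisioned instances are configured, with phases ① and ②]
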
  • ①: All incoming requests are immediately processed, until the provisioned instances reach their full capacity. During this process, no cold starts occur, and no throttling errors are reported.

  • ②: The provisioned instances are now fully utilized. Function Compute starts to create on-demand instances to handle subsequent requests until the number of burstable instances reaches the upper limit. During this process, cold starts occur, but no throttling errors are reported.

Limits on the scaling speed of instances in different regions

| Region | Maximum number of burstable instances | Maximum instance growth rate |
|---|---|---|
| China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute |
| Other regions | 100 | 100 per minute |

  • In the same region, the scaling speed limits do not distinguish between provisioned and on-demand instances.

  • GPU-accelerated instances have a slower scaling speed compared to CPU instances. Therefore, we recommend that you reserve GPU-accelerated instances in advance using provisioned mode.

Note

If you need faster scaling speeds, join the DingTalk group (group ID: 64970014484) for technical support.

Auto scaling of provisioned instances

In addition to setting a fixed number of provisioned instances, you can make flexible adjustments by configuring scheduled and threshold-based scaling policies. This helps improve instance utilization.

Important

If no scheduled or threshold-based scaling policies are configured for your provisioned instances, the number of provisioned instances will always be the value of the defaultTarget parameter.

  • If multiple scheduled scaling policies are configured, the number of provisioned instances at any given time is determined by the target instance number specified by the active policy.

  • If both scheduled and threshold-based scaling policies are configured, the number of provisioned instances at any given time is determined by the highest target instance number among the active policies.

For more information, see Example.

Scheduled scaling

Scenarios

Choose scheduled scaling when your function experiences distinct periodic patterns or predictable traffic peaks. When the number of concurrent invocations exceeds the capacity defined by the scheduled scaling policy, all excess requests will be directed to on-demand instances for processing.

Sample configuration

The following figure shows two scheduled actions for instance scaling: the first action scales out the provisioned instances before the traffic peak, while the second scales in the instances afterward.

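[Figure: two scheduled actions that scale out the provisioned instances before the traffic peak and scale them in afterward]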

The following code snippet shows how to call the PutProvisionConfig operation to configure scheduled scaling policies. In this example, a function named function_1 is configured to automatically scale in and out, with the time zone set to Asia/Shanghai (UTC+8). The configurations take effect from 10:00:00 on August 1, 2024, to 10:00:00 on August 30, 2024 (UTC+8). During this period, the number of provisioned instances is increased to 50 at 20:00 (UTC+8) and reduced to 10 at 22:00 (UTC+8) each day.

"scheduledActions": [
    {
      "name": "scale_up_action",
      "startTime": "2024-08-01T10:00:00",
      "endTime": "2024-08-30T10:00:00",
      "target": 50,
      "scheduleExpression": "cron(0 0 20 * * *)",
      "timeZone": "Asia/Shanghai"
    },
    {
      "name": "scale_down_action",
      "startTime": "2024-08-01T10:00:00",
      "endTime": "2024-08-30T10:00:00",
      "target": 10,
      "scheduleExpression": "cron(0 0 22 * * *)",
      "timeZone": "Asia/Shanghai"
    }
  ]

The following table describes the parameters in the code snippet.

| Parameter | Description |
|---|---|
| name | The name of the scheduled scaling task. |
| startTime | The time when the scaling policy starts to take effect. The system defaults to using UTC if you do not specify a time zone. |
| endTime | The time when the scaling policy expires. The system defaults to using UTC if you do not specify a time zone. |
| target | The target number of provisioned instances. |
| scheduleExpression | The schedule information, specified as an at expression or a cron expression. The system defaults to using UTC if you do not specify a time zone. |
| timeZone | The time zone used to interpret the times in this policy, such as Asia/Shanghai. |

The scheduleExpression parameter supports the following formats:

  • At expressions - "at(yyyy-mm-ddThh:mm:ss)": runs the scheduled task only once.

    For example, if you want to run the scheduled task at 20:00 on April 1, 2024 (UTC+8), set the time zone to Asia/Shanghai and configure this parameter to at(2024-04-01T20:00:00).

  • Cron expressions - "cron(0 0 4 * * *)": runs the scheduled task repeatedly. Set the value in the six-field format described in Cron expressions.

    For example, if you want to run the scheduled task at 20:00 (UTC+8) every day, set the time zone to Asia/Shanghai and configure this parameter to cron(0 0 20 * * *).
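
To make the two scheduleExpression formats concrete, the following Python sketch classifies an expression as an at expression or a cron expression. It is a minimal illustration, not the validation that Function Compute performs: the function name and the error messages are assumptions, and the only rules it checks are the at(yyyy-mm-ddThh:mm:ss) pattern and the six-field cron layout described in this topic.

```python
import re
from datetime import datetime

# Patterns for the two documented formats.
_AT_RE = re.compile(r"^at\((\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\)$")
_CRON_RE = re.compile(r"^cron\((.+)\)$")


def parse_schedule_expression(expression: str):
    """Return ("at", datetime) or ("cron", list_of_six_fields).

    Raises ValueError for anything else. This is a hypothetical helper
    written for illustration only.
    """
    at_match = _AT_RE.match(expression)
    if at_match:
        # The timestamp has no UTC offset; it is read in the policy's timeZone.
        return "at", datetime.strptime(at_match.group(1), "%Y-%m-%dT%H:%M:%S")

    cron_match = _CRON_RE.match(expression)
    if cron_match:
        fields = cron_match.group(1).split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 cron fields, got {len(fields)}")
        return "cron", fields

    raise ValueError(f"unsupported schedule expression: {expression!r}")


if __name__ == "__main__":
    print(parse_schedule_expression("at(2024-04-01T20:00:00)"))
    print(parse_schedule_expression("cron(0 0 20 * * *)"))
```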

Cron expressions

The following table describes the fields of a cron expression in the format of Seconds Minutes Hours Day-of-month Month Day-of-week.

| Field | Valid values | Allowed special characters |
|---|---|---|
| Seconds | 0 to 59 | None |
| Minutes | 0 to 59 | , - * / |
| Hours | 0 to 23 | , - * / |
| Day-of-month | 1 to 31 | , - * ? / |
| Month | 1 to 12 or JAN to DEC | , - * / |
| Day-of-week | 1 to 7 or MON to SUN | , - * ? |

The following table describes the special characters in a cron expression.

| Character | Description | Example |
|---|---|---|
| * | Indicates any or each. | In the Minutes field, * indicates that the task is run every minute. |
| , | Specifies a list of values. | In the Day-of-week field, MON,WED,FRI indicates every Monday, Wednesday, and Friday. |
| - | Specifies a range. | In the Hours field, 10-12 indicates a time range from 10:00 to 12:00 in your specified time zone. |
| ? | Indicates no specific value. | This character is used together with specified values. For example, when you specify a date without tying it to a particular day of the week, you can use this character in the Day-of-week field. |
| / | Specifies increments. n/m indicates an increment of m starting from the position of n. | In the Minutes field, 3/5 indicates that the task is run every 5 minutes starting from the third minute. |
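
The effect of the special characters is easier to see when a single field is expanded into the values it matches. The following Python sketch does that for numeric fields. It is an illustration under stated assumptions, not the parser that Function Compute uses: the function name is hypothetical, month and weekday names such as JAN or MON are not handled, and ? is treated the same as * for simplicity.

```python
from typing import List


def expand_cron_field(field: str, low: int, high: int) -> List[int]:
    """Expand one numeric cron field into the sorted list of values it matches.

    Supports * and ?, lists (a,b), ranges (a-b), and increments (n/m or */m).
    `low` and `high` are the valid bounds of the field, for example 0 and 59
    for Minutes.
    """
    values = set()
    for part in field.split(","):
        if "/" in part:
            # n/m: every m-th value starting from n.
            start_text, step_text = part.split("/", 1)
            start = low if start_text in ("*", "?") else int(start_text)
            step = int(step_text)
            if step <= 0:
                raise ValueError("step must be positive")
            values.update(range(start, high + 1, step))
        elif part in ("*", "?"):
            values.update(range(low, high + 1))
        elif "-" in part:
            # a-b: inclusive range.
            first, last = (int(x) for x in part.split("-", 1))
            values.update(range(first, last + 1))
        else:
            values.add(int(part))

    out_of_range = sorted(v for v in values if v < low or v > high)
    if out_of_range:
        raise ValueError(f"values outside {low}-{high}: {out_of_range}")
    return sorted(values)


if __name__ == "__main__":
    print(expand_cron_field("3/5", 0, 59))    # Minutes field
    print(expand_cron_field("10-12", 0, 23))  # Hours field
```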

Threshold-based scaling

Scenarios

After you configure a threshold-based scaling policy, Function Compute periodically collects the concurrency or resource utilization metrics for the provisioned instances. It uses these metrics, along with the minimum and maximum numbers of provisioned instances you specify, to control instance scaling, ensuring the number of instances aligns more closely with actual resource usage.

Sample configuration

The following figure shows an example of auto scaling based on the utilization of instance concurrency. When the traffic volume increases, the scale-out threshold is triggered and Function Compute starts to increase the number of provisioned instances. The scale-out stops when the number reaches the upper limit you set. Excess requests are sent to on-demand instances for processing.

[Figure: auto scaling of provisioned instances based on the utilization of instance concurrency]

Note
  • To configure a threshold-based scaling policy, you must first enable the collection of instance-level metrics. Otherwise, a 400 InstanceMetricsRequired error will be reported. For more information, see Enable collection of instance-level metrics.

  • The concurrency utilization metric includes only the concurrency of provisioned instances, excluding that of on-demand instances.

  • The concurrency utilization metric evaluates the ratio of concurrent requests handled by provisioned instances to the maximum number of concurrent requests that all provisioned instances can handle. The value of the metric can range from 0 to 1.

The following code snippet shows how to call the PutProvisionConfig operation to configure threshold-based scaling policies. In this example, a function named function_1 is configured to automatically scale in and out based on the ProvisionedConcurrencyUtilization metric, which tracks the concurrency utilization of provisioned instances. The time zone is set to Asia/Shanghai (UTC+8). The configurations take effect from 10:00:00 on August 1, 2024, to 10:00:00 on August 30, 2024 (UTC+8). During this period, when concurrency utilization exceeds 60%, the number of provisioned instances is increased, up to a maximum of 100. Conversely, when concurrency utilization falls below 60%, the number of provisioned instances is reduced, down to a minimum of 10.

"targetTrackingPolicies": [
    {
      "name": "action_1",
      "startTime": "2024-08-01T10:00:00",
      "endTime": "2024-08-30T10:00:00",
      "metricType": "ProvisionedConcurrencyUtilization",
      "metricTarget": 0.6,
      "minCapacity": 10,
      "maxCapacity": 100,
      "timeZone": "Asia/Shanghai"
    }
  ]

The following table describes the parameters in the code snippet.

| Parameter | Description |
|---|---|
| name | The name of the threshold-based scaling task. |
| startTime | The time when the scaling policy starts to take effect. The system defaults to using UTC if you do not specify a time zone. |
| endTime | The time when the scaling policy expires. The system defaults to using UTC if you do not specify a time zone. |
| metricType | The metric that is tracked. In this example, the value is set to ProvisionedConcurrencyUtilization. |
| metricTarget | The utilization threshold that triggers auto scaling, expressed as a ratio from 0 to 1. In this example, 0.6 corresponds to 60%. |
| minCapacity | The minimum number of provisioned instances allowed. |
| maxCapacity | The maximum number of provisioned instances allowed. |
| timeZone | The time zone used to interpret the times in this policy, such as Asia/Shanghai. |

Scaling principles

When instance scale-in is triggered, Function Compute gradually reduces the number of provisioned instances based on a scale-in coefficient that is greater than 0 and less than or equal to 1. The scale-in coefficient is a system parameter that slows down scale-in. You do not need to configure it. The target value of a scaling task is the result of the corresponding formula below, rounded up to the nearest integer:

  • Scale-out target value = Current provisioned instances × (Current metric value/Specified utilization threshold)

  • Scale-in target value = Current provisioned instances × Scale-in coefficient × (1 - Current metric value/Specified utilization threshold)

The following example demonstrates how to calculate the scale-out target. Similarly, the scale-in target can be determined using the previously mentioned principle and formula.

If the current metric value is 80%, the specified utilization threshold is 40%, and the current number of provisioned instances is 100, then the target number of instances is calculated as follows: 100 × (80%/40%) = 200. The number of provisioned instances is increased to 200 (as long as this does not exceed the maximum allowed) to ensure that utilization stays around 40%.
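
The scale-out calculation above can be reproduced in a few lines of Python. This is a sketch of the documented formula only: the function name and parameter names are chosen for this example, and the metric values are passed as decimal strings so that exact fractions are used and rounding up is not affected by floating-point error.

```python
import math
from fractions import Fraction


def scale_out_target(current_instances: int,
                     metric_value: str,
                     utilization_threshold: str,
                     max_capacity: int) -> int:
    """Scale-out target = ceil(current x (metric / threshold)), capped at max_capacity.

    `metric_value` and `utilization_threshold` are ratios from 0 to 1,
    given as strings such as "0.8" so that Fraction parses them exactly.
    """
    threshold = Fraction(utilization_threshold)
    if threshold <= 0:
        raise ValueError("utilization_threshold must be greater than 0")
    ratio = Fraction(metric_value) / threshold
    target = math.ceil(current_instances * ratio)
    # The target never exceeds the maximum number of provisioned instances.
    return min(target, max_capacity)


if __name__ == "__main__":
    # The example from the text: 100 x (80% / 40%) = 200.
    print(scale_out_target(100, "0.8", "0.4", max_capacity=500))
    # The same calculation with a lower cap is limited to the cap.
    print(scale_out_target(100, "0.8", "0.4", max_capacity=150))
```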

Example

The following example clarifies how the target values specified by the defaultTarget parameter and scheduled scaling policies determine the number of provisioned instances at a specific time. In this example, the defaultTarget parameter is set to 5, and two scheduled scaling policies are configured, using the Asia/Shanghai time zone (UTC+8). The configurations take effect from 10:00:00 on January 9, 2025, to 00:00:00 on January 11, 2025 (UTC+8). During this period, the number of provisioned instances is increased to 20 at 10:00 (UTC+8) and reduced to 10 at 22:00 (UTC+8) each day. The following code snippet shows the content of the scaling policies:

{
    "defaultTarget": 5,
    "scheduledActions": [
        {
            "name": "scale_up_action",
            "startTime": "2025-01-09T10:00:00",
            "endTime": "2025-01-11T00:00:00",
            "target": 20,
            "scheduleExpression": "cron(0 0 10 * * *)",
            "timeZone": "Asia/Shanghai"
        },
        {
            "name": "scale_down_action",
            "startTime": "2025-01-09T10:00:00",
            "endTime": "2025-01-11T00:00:00",
            "target": 10,
            "scheduleExpression": "cron(0 0 22 * * *)",
            "timeZone": "Asia/Shanghai"
        }
    ]
}

The following figure illustrates the changes in the number of provisioned instances over time:

[Figure: number of provisioned instances over time for the preceding scaling policies]
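
The expected timeline for this example can also be sketched in Python. The model below rests on assumptions that are not stated explicitly in this topic: the most recently triggered scheduled action determines the target, the number of provisioned instances falls back to defaultTarget outside the effective period, and endTime is treated as exclusive. All names are hypothetical, and the times are naive wall-clock values in Asia/Shanghai.

```python
from datetime import datetime, timedelta

DEFAULT_TARGET = 5
WINDOW_START = datetime(2025, 1, 9, 10, 0, 0)  # startTime (Asia/Shanghai)
WINDOW_END = datetime(2025, 1, 11, 0, 0, 0)    # endTime (Asia/Shanghai)
# (hour of day, target) for cron(0 0 10 * * *) and cron(0 0 22 * * *).
DAILY_ACTIONS = [(10, 20), (22, 10)]


def provisioned_target(now: datetime) -> int:
    """Return the modeled number of provisioned instances at `now`."""
    if now < WINDOW_START or now >= WINDOW_END:
        return DEFAULT_TARGET  # assumed fallback outside the effective period

    latest_fire, latest_target = None, DEFAULT_TARGET
    for hour, target in DAILY_ACTIONS:
        fire = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if fire > now:
            fire -= timedelta(days=1)  # today's run has not happened yet
        if fire < WINDOW_START:
            continue  # this action has not fired inside the effective period
        if latest_fire is None or fire > latest_fire:
            latest_fire, latest_target = fire, target
    return latest_target


if __name__ == "__main__":
    for moment in (datetime(2025, 1, 9, 9, 0), datetime(2025, 1, 9, 10, 0),
                   datetime(2025, 1, 9, 23, 0), datetime(2025, 1, 10, 12, 0),
                   datetime(2025, 1, 11, 1, 0)):
        print(moment.isoformat(), provisioned_target(moment))
```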

Maximum concurrency

The maximum number of concurrent requests that all provisioned instances can handle, or maximum concurrency, is determined by the instance concurrency setting.

  • Each instance processes a single request at a time

    Maximum concurrency = Number of instances

  • Each instance processes multiple requests at a time

    Maximum concurrency = Number of instances × Number of requests concurrently processed by an instance

For more information about the scenarios, benefits, configurations, and impacts of the instance concurrency feature, see Configure instance concurrency.
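
The two maximum concurrency formulas, and the concurrency utilization metric that is derived from them, can be written as a short Python sketch. The function and parameter names are chosen for this example and are not part of any Function Compute API.

```python
def max_concurrency(instances: int, per_instance_concurrency: int = 1) -> int:
    """Maximum concurrency = instances x requests processed concurrently per instance.

    With the default of 1, each instance processes a single request at a time,
    so the result equals the number of instances.
    """
    if instances < 0 or per_instance_concurrency < 1:
        raise ValueError("instances must be >= 0 and per_instance_concurrency >= 1")
    return instances * per_instance_concurrency


def concurrency_utilization(concurrent_requests: int,
                            instances: int,
                            per_instance_concurrency: int = 1) -> float:
    """Ratio of requests handled by provisioned instances to their maximum concurrency."""
    capacity = max_concurrency(instances, per_instance_concurrency)
    if capacity == 0:
        return 0.0  # no provisioned instances, so nothing to utilize
    return concurrent_requests / capacity


if __name__ == "__main__":
    print(max_concurrency(10))                   # single-request instances
    print(max_concurrency(10, 20))               # 20 concurrent requests per instance
    print(concurrency_utilization(120, 10, 20))  # 120 of 200 slots in use
```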

References

  • For more information about the basic concepts and billing methods of on-demand and provisioned modes, see Instance types and usage modes.

  • For more information about how to improve resource utilization of provisioned instances, see Configure provisioned instances.

  • By default, all functions within an Alibaba Cloud account in the same region share the same scaling limits. For more information about how to limit the number of instances for a specified function, see Specify the maximum number of instances. When the number of running instances exceeds the specified maximum, Function Compute returns a throttling error.
