You can configure auto scaling rules based on the total number of on-demand and provisioned instances, subject to limits on the instance scale-out speed. For provisioned instances, you can create scheduled scaling policies and water-level scaling policies to improve resource utilization.
Instance scaling behavior
Function Compute preferentially uses existing instances to process requests. If the existing instances are fully loaded, Function Compute creates new instances to process requests. As the number of invocations increases, Function Compute continues to create new instances until enough instances are created to handle requests or the upper limit is reached. The scale-out of instances is subject to the limitations of the scaling speed. For more information, see Limits on the scale-out speed of instances in different regions.
This section describes instance scaling behavior of on-demand and provisioned instances. After you configure provisioned instances for a function, a specific number of instances are reserved prior to function invocations so that execution of requests are not delayed by cold starts.
Scaling of on-demand instances
When the total number of instances or the scale-out speed of instances reaches the upper limit, Function Compute reports a throttling error with HTTP status code 429. The following figure shows how Function Compute performs throttling in a scenario in which the number of invocations rapidly increases.
①: Before the upper limit for burst instances is reached, Function Compute immediately creates instances when the number of requests increases. During this process, a cold start occurs but no throttling error is reported.
②: After the upper limit for burst instances is reached, the creation of new instances is limited by the instance growth rate. Throttling errors are reported for some requests.
③: When the upper limit of instances is reached, some requests are throttled.
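Because throttled requests fail with HTTP status code 429, callers typically retry them with backoff. The following sketch shows one common client-side pattern; `invoke` here is a placeholder for your own invocation call, not a Function Compute SDK method.

```python
import random
import time

def invoke_with_backoff(invoke, max_retries=5, base_delay=0.2):
    """Retry an invocation while it is throttled (HTTP 429).

    `invoke` is a hypothetical callable that performs the request and
    returns a dict with a "status_code" key.
    """
    for attempt in range(max_retries + 1):
        resp = invoke()
        if resp["status_code"] != 429:
            return resp
        if attempt == max_retries:
            break
        # Exponential backoff with jitter spreads out retries so that
        # the scale-out speed limit is less likely to be hit again.
        time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
    return resp
```

Retries give Function Compute time to create new instances within the growth-rate limit, so short throttling bursts are often absorbed without surfacing errors to end users.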
Scaling of provisioned instances
If a large number of invocations arrive in a short period of time, the creation of a large number of instances is throttled, which causes requests to fail. Cold starts of new instances also increase request latency. To prevent these issues, you can use provisioned instances in Function Compute. Provisioned instances are reserved before invocations arrive.
The following figure shows the throttling behavior of provisioned instances in a scenario that has the same amount of traffic as the preceding figure.
①: Before the provisioned instances are fully loaded, requests are immediately processed. During this process, no cold start occurs and no throttling error is reported.
②: When the provisioned instances are fully loaded, Function Compute immediately creates instances before the upper limit for burst instances is reached. During this process, a cold start occurs but no throttling error is reported.
Limits on the scale-out speed of instances in different regions
| Region | Upper limit of burst instances | Upper limit of instance growth rate |
| --- | --- | --- |
| China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute |
| Other regions | 100 | 100 per minute |
In the same region, the limits on instance scale-out speed for provisioned instances and on-demand instances are the same.
The scaling speed of GPU-accelerated instances is lower than that of elastic instances. We recommend that you use GPU-accelerated instances together with the provisioned mode.
If you need higher scaling speed, join the DingTalk group 64970014484 for technical support.
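The two limits in the preceding table combine as follows: instances up to the burst limit can be created immediately, and additional instances are created at the growth rate. The following sketch is an illustrative model of this behavior, not an official formula.

```python
import math

def max_instances(elapsed_minutes, burst_limit=300, growth_per_minute=300,
                  instance_cap=None):
    """Rough upper bound on the number of instances that can exist
    `elapsed_minutes` after a traffic surge starts.

    Defaults model the China regions in the table above (burst limit 300,
    growth rate 300 per minute). `instance_cap` is an optional total
    instance limit, such as a per-function maximum you configure.
    """
    n = burst_limit + math.floor(elapsed_minutes * growth_per_minute)
    return min(n, instance_cap) if instance_cap is not None else n
```

For example, two minutes into a surge in the China (Hangzhou) region, up to about 300 + 2 × 300 = 900 instances may have been created, unless a lower total limit applies.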
Auto scaling of provisioned instances
A fixed number of provisioned instances may lead to insufficient utilization of resources. You can configure scheduled scaling and water-level scaling to resolve this issue.
If you configure both a scheduled scaling policy and a water-level scaling policy, the larger of the values specified by the two policies is used as the number of provisioned instances.
Scheduled scaling
Scenarios
Scheduled instance scaling applies to functions that have noticeable cyclical patterns or predictable traffic peaks. If the number of concurrent invocations is greater than the concurrency capacity of the scheduled scaling policy, excess requests are sent to on-demand instances for processing.
Sample configuration
You can configure two scheduled scaling policies. The first one increases the number of provisioned instances before traffic surges. The second policy decreases the number of provisioned instances when the traffic declines. The following figure shows the details.
The following code snippet provides an example of the request parameters of the PutProvisionConfig operation that is called to create scheduled scaling policies. In this example, two scheduled scaling policies are configured for function_1. The time zone is set to Asia/Shanghai (UTC+8). The policies take effect from 10:00:00 on August 1, 2024 to 10:00:00 on August 30, 2024. They increase the number of provisioned instances to 50 at 20:00 every day and reduce the number to 10 at 22:00 every day.
"scheduledActions": [
{
"name": "scale_up_action",
"startTime": "2024-08-01T10:00:00",
"endTime": "2024-08-30T10:00:00",
"target": 50,
"scheduleExpression": "cron(0 0 20 * * *)",
"timeZone": "Asia/Shanghai"
},
{
"name": "scale_down_action",
"startTime": "2024-08-01T10:00:00",
"endTime": "2024-08-30T10:00:00",
"target": 10,
"scheduleExpression": "cron(0 0 22 * * *)",
"timeZone": "Asia/Shanghai"
}
]
Cron expressions
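A scheduled action sets the provisioned-instance count to its target when its cron expression fires, and that count persists until the next action fires. The following sketch models the effective target implied by the two sample actions above; the actual evaluation is performed by Function Compute, so this is only an illustration.

```python
def scheduled_target(hour, default=10):
    """Provisioned-instance target implied by the sample policies above:
    scale_up_action sets the target to 50 at 20:00 and scale_down_action
    sets it back to 10 at 22:00, Asia/Shanghai time (simplified model).
    """
    return 50 if 20 <= hour < 22 else default
```

Between 20:00 and 22:00 the function holds 50 provisioned instances to absorb the evening peak; outside that window it holds 10, which reduces idle resource costs.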
Water-level scaling
Scenarios
Function Compute periodically collects values of metrics, such as the concurrency utilization of provisioned instances or resource utilization of provisioned instances. Provisioned instances are scaled based on values of the metrics and the minimum and maximum numbers of provisioned instances you specify.
Sample configuration
Assume that you configure a water-level scaling policy in which you specify a concurrency utilization threshold for provisioned instances. When the traffic volume increases, the scale-out threshold is triggered and the system starts to scale out provisioned instances. The scale-out stops when the specified maximum number of instances is reached, and excess requests are sent to on-demand instances. When the traffic volume decreases, the scale-in threshold is triggered and the system starts to scale in provisioned instances. The following figure shows the details.
If you use water-level scaling, you must enable the instance-level metrics feature. Otherwise, the 400 InstanceMetricsRequired error is reported. For more information about how to enable instance-level metrics, see Configure instance-level metrics. The concurrency utilization metric includes only the concurrency of provisioned instances and excludes the concurrency of on-demand instances.
The concurrency utilization of provisioned instances is the ratio of the number of concurrent requests that provisioned instances are processing to their maximum concurrency. Valid values: [0,1].
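The utilization metric can be sketched as follows; the function name and parameters here are illustrative, not part of the Function Compute API.

```python
def provisioned_concurrency_utilization(in_flight_requests,
                                        provisioned_instances,
                                        per_instance_concurrency=1):
    """Ratio of concurrent requests handled by provisioned instances to
    their maximum concurrency (instances x per-instance concurrency).

    Clamped to 1.0 because requests beyond the provisioned capacity
    overflow to on-demand instances and are not counted by the metric.
    """
    capacity = provisioned_instances * per_instance_concurrency
    return min(in_flight_requests / capacity, 1.0)
```

For example, 100 provisioned instances that each handle one request at a time and currently process 30 requests have a utilization of 0.3.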
The following code snippet provides an example of the request parameters of the PutProvisionConfig operation that is called to create a water-level scaling policy. In this example, a water-level scaling policy is configured for function_1. The time zone is set to Asia/Shanghai (UTC+8). The policy takes effect from 10:00:00 on August 1, 2024 to 10:00:00 on August 30, 2024. The policy tracks the ProvisionedConcurrencyUtilization metric and triggers a scale-out when the concurrency utilization exceeds 60% and a scale-in when the concurrency utilization falls below 60%. The maximum number of provisioned instances is 100 and the minimum number is 10.
"targetTrackingPolicies": [
{
"name": "action_1",
"startTime": "2024-08-01T10:00:00",
"endTime": "2024-08-30T10:00:00",
"metricType": "ProvisionedConcurrencyUtilization",
"metricTarget": 0.6,
"minCapacity": 10,
"maxCapacity": 100,
"timeZone": "Asia/Shanghai"
}
]
Scaling principles
A relatively conservative scale-in process is achieved by using a scale-in coefficient, whose value is in the range (0,1]. The scale-in coefficient is a system parameter that slows down the scale-in speed. You do not need to set it. The target values for scaling operations are the smallest integers that are greater than or equal to the following calculation results:
Scale-out target = Number of current provisioned instances × (Current metric value/Specified utilization threshold)
Scale-in target = Number of current provisioned instances × Scale-in coefficient × (1 − Current metric value/Specified utilization threshold)
Example:
If the current metric value is 80%, the specified utilization threshold is 40%, and the number of current provisioned instances is 100, the target number of instances is 100 × (80%/40%) = 200. If the specified maximum number of provisioned instances is 200 or greater, the number of provisioned instances is increased to 200. This keeps the utilization close to the 40% threshold.
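The two formulas can be sketched as follows. The function names are illustrative, and because the scale-in coefficient is managed by the system, the `coefficient` parameter here is only a stand-in for that internal value.

```python
import math

def scale_out_target(current, metric, threshold, max_capacity):
    """Scale-out target: smallest integer >= current x (metric / threshold),
    clamped to the configured maximum number of provisioned instances."""
    return min(math.ceil(current * (metric / threshold)), max_capacity)

def scale_in_target(current, metric, threshold, min_capacity, coefficient=1.0):
    """Scale-in target per the formula above. `coefficient` stands in for
    the system-managed scale-in coefficient in (0, 1]."""
    target = math.ceil(current * coefficient * (1 - metric / threshold))
    return max(target, min_capacity)
```

With the worked example above (metric 80%, threshold 40%, 100 current instances, maximum 200), the scale-out target evaluates to 200.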
Maximum concurrency
The following items describe how to calculate the maximum number of concurrent invocations for different instance concurrency values:
A single instance processes a single request at a time
Maximum concurrency = Number of instances.
A single instance concurrently processes multiple requests at a time
Maximum concurrency = Number of instances × Instance concurrency.
For more information about the scenarios, benefits, configurations, and impacts of the instance concurrency feature, see Configure instance concurrency.
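The two cases reduce to a single expression, since an instance concurrency of 1 corresponds to processing one request at a time. This helper is illustrative, not part of any SDK.

```python
def max_concurrency(instance_count, instance_concurrency=1):
    """Maximum number of concurrent invocations: equal to the instance
    count when each instance handles one request at a time (concurrency 1),
    otherwise instance count x per-instance concurrency."""
    return instance_count * instance_concurrency
```

For example, 100 instances with an instance concurrency of 10 can serve up to 1,000 concurrent invocations.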
More information
For more information about the basic concepts and billing methods of on-demand instances and provisioned instances, see Instance types and usage modes.
For more information about how to improve resource utilization of provisioned instances, see Configure provisioned instances.
By default, all functions within an Alibaba Cloud account in the same region share the preceding scaling limits. To limit the number of instances for a function, you can specify an upper limit for concurrent instances. For more information, see Specify the maximum number of concurrent instances. After the maximum number of on-demand instances is specified, Function Compute returns a throttling error if the total number of running instances for the function reaches the specified limit.