Configure auto scaling rules for provisioned instances to maximize resource utilization - Function Compute

Provisioned instances help reduce request latencies caused by cold starts during peak hours. You can configure a scheduled or threshold-based scaling policy for provisioned instances to improve resource utilization.

Limits

The following table shows the limits on the scaling speed of provisioned instances in different regions.

Region	Upper limit of burst instances	Upper limit of instance growth rate
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen)	300	300 per minute
Other regions	100	100 per minute
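
As a rough illustration of how the two limits in the table might interact, the following Python sketch estimates how many minutes a scale-out could take. It assumes that instances up to the burst limit can be created immediately and that additional instances are then added at the growth rate. This reading of the table is an assumption, and the function name minutes_to_reach is hypothetical and not part of any Function Compute API.

```python
def minutes_to_reach(target, burst_limit, growth_per_minute):
    """Estimate the minutes needed to reach `target` provisioned instances.

    Assumption: up to `burst_limit` instances start immediately, and the
    remainder is added at `growth_per_minute` instances per minute.
    """
    if target <= burst_limit:
        return 0
    remaining = target - burst_limit
    # Integer ceiling division: a partially used minute still counts as a minute.
    return -(-remaining // growth_per_minute)


# Limits taken from the table above.
print(minutes_to_reach(1000, 300, 300))  # China (Hangzhou) and the other listed regions
print(minutes_to_reach(1000, 100, 100))  # Other regions
```

Under these assumptions, reaching 1,000 instances would take about 3 minutes in the listed Chinese regions and about 9 minutes in other regions.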

Note

If you need a higher scaling speed, join the DingTalk group 64970014484 for technical support.

Configure provisioned instances

Important

You can configure provisioned instances to mitigate cold starts for latency-sensitive online services. Once configured, provisioned instances remain active and continue to incur charges until you release them, even if they process no requests. For billing details, see Billing overview.
You can configure provisioned instances only for the LATEST version of a function or for a specific alias.

Step 1: Create a provisioned instance policy

You can use one of the following methods to create a provisioned instance policy:

Create a provisioned instance policy on the Function Details > Configurations > Provisioned Instances page. This method is used in this topic.
Create a provisioned instance policy on the Advanced Features > Auto Scaling > Provisioned Instance Policy page.

Log on to the Function Compute console. In the left-side navigation pane, click Functions. In the top navigation bar, select a region. On the Functions page, click the function that you want to manage.
On the Function Details > Configurations > Provisioned Instances page, click Create Provisioned Instance Policy.
In the Create Provisioned Instance Policy panel, specify the number of provisioned instances.
Complete the configuration of the auto scaling policy in the panel.
Scheduled scaling
Choose scheduled scaling when your service has distinct periodic patterns or predictable traffic peaks. When the number of concurrent invocations exceeds the capacity defined by the scheduled scaling policy, all excess requests will be directed to on-demand instances for processing. For more information, see Scheduled scaling.
In the example shown in the preceding figure, the time zone is set to Asia/Shanghai (UTC+8), and the policy takes effect from August 1, 2024 to August 30, 2024. During this period, the number of provisioned instances is increased to 50 at 10:00 every Monday and reduced to 10 at 22:00 every Monday.
Threshold-based scaling
Choose threshold-based scaling when your service has unpredictable traffic patterns or sudden spikes in usage. Threshold-based scaling policies adjust the number of provisioned instances every minute based on the utilization of instance concurrency or the utilization of function resources. For more information, see Water-level scaling.
In the example shown in the preceding figure, the time zone is set to Asia/Shanghai (UTC+8), and the policy takes effect from 10:00 on August 1, 2024 to 10:00 on August 30, 2024. The concurrency utilization of provisioned instances is monitored. When it exceeds 60%, instances are scaled out; when it falls below 60%, they are scaled in. The maximum number of provisioned instances is 100 and the minimum number is 10.
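
The two example policies above can be modeled with a short Python sketch. It is illustrative only and does not call any Function Compute API. The names scheduled_target and threshold_target, the default parameter (the count that applies when no scheduled action is in effect), the inclusive handling of August 30, and the proportional scaling formula are all assumptions. The actual scaling algorithm is described in the linked topics and may differ.

```python
from datetime import datetime, time, timedelta, timezone

# Asia/Shanghai observes no daylight saving time, so a fixed UTC+8 offset is used.
SHANGHAI = timezone(timedelta(hours=8))
START = datetime(2024, 8, 1, tzinfo=SHANGHAI)
END = datetime(2024, 8, 31, tzinfo=SHANGHAI)  # assumption: August 30 is included in full
# (weekday, local time, target count); Monday is weekday 0.
ACTIONS = [(0, time(10, 0), 50), (0, time(22, 0), 10)]


def scheduled_target(now, default):
    """Return the count set by the most recent scheduled action, else `default`."""
    now = now.astimezone(SHANGHAI)
    if not (START <= now < END):
        return default  # assumption: outside the effective period the base count applies
    latest = None
    for days_back in range(8):  # look back one full week, including today
        day = (now - timedelta(days=days_back)).date()
        for weekday, at, target in ACTIONS:
            fired = datetime.combine(day, at, tzinfo=SHANGHAI)
            if day.weekday() == weekday and START <= fired <= now:
                if latest is None or fired > latest[0]:
                    latest = (fired, target)
    return latest[1] if latest else default


def threshold_target(current, utilization_pct, target_pct=60, minimum=10, maximum=100):
    """Proportional target tracking (an assumption), clamped to [minimum, maximum]."""
    # Integer ceiling of current * utilization_pct / target_pct.
    desired = -(-current * utilization_pct // target_pct)
    return max(minimum, min(maximum, desired))


print(scheduled_target(datetime(2024, 8, 5, 12, 0, tzinfo=SHANGHAI), default=5))
print(threshold_target(current=20, utilization_pct=90))
```

In this model, noon on Monday, August 5, 2024 falls between the two scheduled actions, so the scheduled policy yields 50 instances. With 20 provisioned instances at 90% concurrency utilization, the threshold sketch proposes 30 instances, which brings utilization back to the 60% target if the load stays constant.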

Step 2: Verify the policy

Click the function and go to the Function Details > Monitoring > Function Metrics page. Check the number of provisioned instances under Function Provisioned Instances (count) to confirm that the policy has taken effect.

Modify or delete a provisioned instance policy

On the Configurations tab of the Function Details page, click the Provisioned Instances tab in the left-side navigation tree to view the policies you have created. Click Modify or Delete in the Actions column to modify or delete a policy.

References

For more information about the basic concepts and billing methods of on-demand and provisioned modes, see Instance types and usage modes.
For more information about the limits, behaviors, and scaling rules of function instances in on-demand and provisioned modes, see Limits and rules of auto instance scaling.
By default, all functions within an Alibaba Cloud account in the same region share the same scaling limits. For more information about how to limit the number of instances for a specified function, see Specify the maximum number of instances. Function Compute returns a throttling error if the total number of running instances for the function reaches the limit you specify.