Provisioned instances help reduce request latencies caused by cold starts during peak hours. You can configure an auto scaling policy, such as a scheduled scaling policy or water-level scaling policy, for provisioned instances to improve resource utilization and prevent resource waste.
Limits
The following table shows the limits on the scale-out rate of provisioned instances in different regions.
Region | Upper limit of burst instances | Upper limit of instance growth rate |
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), and China (Shenzhen) | 300 | 300 per minute |
Other regions | 100 | 100 per minute |
If you need higher scaling speed, join the DingTalk group 64970014484 for technical support.
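The limits in the table can be turned into a rough estimate of how long a scale-out takes. The following is a minimal sketch, not an official formula. It assumes that up to the burst limit of instances can start immediately and that further instances are added at the growth rate per minute. The function names are hypothetical and are not part of any Function Compute SDK.

```python
import math

# Regions that the table lists with the higher limits.
FAST_REGIONS = {
    "China (Hangzhou)", "China (Shanghai)", "China (Beijing)",
    "China (Zhangjiakou)", "China (Shenzhen)",
}


def scale_out_limits(region: str) -> tuple[int, int]:
    """Return (burst instances, growth per minute) as listed in the table."""
    return (300, 300) if region in FAST_REGIONS else (100, 100)


def minutes_to_scale_out(region: str, extra_instances: int) -> int:
    """Estimate whole minutes needed to add `extra_instances`.

    Assumed model: the burst allowance starts at once, the remainder
    is added at the per-minute growth rate.
    """
    burst, rate = scale_out_limits(region)
    if extra_instances <= burst:
        return 0
    return math.ceil((extra_instances - burst) / rate)
```

Under this assumed model, adding 900 instances in China (Hangzhou) takes about 2 minutes beyond the initial burst, which can help you decide how early a scheduled scale-out needs to start.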
Configure provisioned instances
Step 1: Create a provisioned instance policy
You can use one of the following methods to create a provisioned instance policy:
Create a provisioned instance policy on the
page. This method is used in this topic.
Create a provisioned instance policy on the
page.
You can configure provisioned instances to mitigate cold starts and improve performance for latency-sensitive online services. After you configure provisioned instances, they remain active. You are charged for them until they are released, even if no requests are processed. For billing details, see Billing overview.
Log on to the Function Compute console. In the left-side navigation pane, click Functions. In the top navigation bar, select a region. On the Functions page, click the function that you want to manage.
On the
page, click Create Provisioned Instance Policy.
In the Create Provisioned Instance Policy panel, specify the number of provisioned instances.
In the Create Provisioned Instance Policy panel, configure the auto scaling policy for provisioned instances.
Scheduled scaling
Scheduled scaling is suitable for functions that have distinct periodic patterns or predictable traffic peaks. If the number of concurrent invocations is greater than the concurrency capacity of the scheduled scaling policy, excess requests are sent to on-demand instances for processing. For more information, see Scheduled scaling.
In the example shown in the preceding figures, the time zone is set to Asia/Shanghai (UTC+8), and the policy takes effect from August 1, 2024 to August 30, 2024. During this period, the number of provisioned instances is increased to 50 at 10:00 every Monday and reduced to 10 at 22:00 every Monday.
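The schedule in this example can be modeled in a few lines to check which target applies at a given moment. This is an illustrative sketch only. The function name is hypothetical, the end date is assumed to be inclusive, and the sketch assumes that the most recent action stays in effect until the next one fires. It returns None when no scheduled action has fired yet or when the time is outside the effective period.

```python
from datetime import date, datetime, time, timedelta, timezone

# Asia/Shanghai has no daylight saving time, so a fixed UTC+8 offset is enough.
TZ = timezone(timedelta(hours=8))
START_DATE = date(2024, 8, 1)
END_DATE = date(2024, 8, 30)  # assumed to be inclusive
# Monday actions, latest first: (hour, target provisioned instances).
MONDAY_ACTIONS = ((22, 10), (10, 50))


def scheduled_target(now: datetime):
    """Return the target set by the most recent scheduled action, or None."""
    local = now.astimezone(TZ)
    if not (START_DATE <= local.date() <= END_DATE):
        return None
    day = local.date()
    # Walk back day by day to the most recent Monday action that has fired.
    while day >= START_DATE:
        if day.weekday() == 0:  # Monday
            for hour, target in MONDAY_ACTIONS:
                fired_at = datetime.combine(day, time(hour), tzinfo=TZ)
                if fired_at <= local:
                    return target
        day -= timedelta(days=1)
    return None
```

For example, a call with 12:00 on Wednesday, August 7, 2024 (UTC+8) returns 10, because the last action that fired was the 22:00 scale-in on Monday, August 5.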
Water-level scaling
Water-level scaling scales provisioned instances every minute based on the concurrency utilization of provisioned instances or resource utilization. For more information, see Water-level scaling.
In the example shown in the preceding figure, the time zone is set to Asia/Shanghai (UTC+8), and the policy takes effect from 10:00 on August 1, 2024 to 10:00 on August 30, 2024. Concurrency Utilization of Provisioned Instances is tracked and the threshold is set to 60%. When the concurrency utilization exceeds 60%, provisioned instances are scaled out. When the concurrency utilization is lower than 60%, provisioned instances are scaled in. The maximum number of provisioned instances is 100 and the minimum number is 10.
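The following sketch illustrates how a target-tracking rule of this kind might choose a new instance count each minute. The proportional formula is an assumption made for illustration. The text above states only the direction of scaling, not the exact algorithm that Function Compute uses. The function name is hypothetical, and utilization is passed as an integer percentage to avoid floating-point rounding.

```python
TARGET_PCT = 60       # tracked threshold from the example
MIN_INSTANCES = 10    # minimum number of provisioned instances
MAX_INSTANCES = 100   # maximum number of provisioned instances


def next_provisioned_count(current: int, utilization_pct: int) -> int:
    """Pick the instance count that would bring utilization back to the target.

    Assumed rule: desired = ceil(current * utilization / target),
    clamped to the configured minimum and maximum.
    """
    # Integer ceiling division: -(-a // b) == ceil(a / b) for positive b.
    desired = -(-current * utilization_pct // TARGET_PCT)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, desired))
```

With 20 provisioned instances at 90% utilization, this rule proposes 30 instances. At 30% utilization, it proposes 10, and the result never leaves the range of 10 to 100.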
Step 2: Verify the policy
Click the function and go to the
page to view the number of provisioned instances in Function Provisioned Instances (count).
Modify or delete a provisioned instance policy
On the Configurations tab of the Function Details page, click the Provisioned Instances tab in the left-side navigation tree to view created policies. Click Modify or Delete in the Actions column to modify or delete a policy.
More information
For more information about the basic concepts and billing methods of the on-demand mode and provisioned mode, see Instance types and usage modes.
For more information about the limits, behaviors, and scaling rules of function instances in on-demand mode and provisioned mode, see Limits and rules of auto instance scaling.
By default, all functions within an Alibaba Cloud account in the same region share the preceding scaling limits. To limit the number of instances for a function, you can configure the maximum number of concurrent instances. For more information, see Specify the maximum number of concurrent instances. After you configure the maximum number of concurrent instances, Function Compute returns a throttling error if the total number of running instances for the function reaches the specified limit.
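To make the interaction between provisioned instances, on-demand instances, and the instance limit concrete, the following simplified model splits a given load into three parts. It assumes that each instance handles one request at a time, which is a simplification chosen for this sketch. The function name is hypothetical.

```python
def split_load(concurrent_requests: int, provisioned: int, max_instances: int):
    """Return (on provisioned, on on-demand, throttled) request counts.

    Assumptions: one request per instance, provisioned instances are
    used first, and on-demand instances fill the remaining room up to
    the configured maximum number of instances.
    """
    on_provisioned = min(concurrent_requests, provisioned)
    on_demand_capacity = max(0, max_instances - provisioned)
    on_demand = min(concurrent_requests - on_provisioned, on_demand_capacity)
    throttled = concurrent_requests - on_provisioned - on_demand
    return on_provisioned, on_demand, throttled
```

With 50 provisioned instances and a limit of 100 instances, 70 concurrent requests are split into 50 on provisioned instances and 20 on on-demand instances. At 150 concurrent requests, 50 requests are throttled.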