
Function Compute:Instance types and usage modes

Last Updated:Oct 22, 2024

Function Compute provides elastic instances and GPU-accelerated instances. Both instance types can be used in on-demand and provisioned modes. On-demand instances are billed based on actual execution duration, and you can combine them with the instance concurrency feature to improve resource utilization. Billing of a provisioned instance starts when Function Compute starts the instance and ends when you release it. Provisioned instances effectively mitigate cold starts. This topic describes the types, usage modes, billing methods, and specifications of function instances in Function Compute.

Instance types

  • Elastic instance: the basic instance type of Function Compute. Elastic instances are suitable for scenarios with bursty traffic or compute-intensive workloads.

  • GPU-accelerated instance: instances that use GPUs based on the Turing, Ampere, and Ada architectures for GPU acceleration. GPU-accelerated instances are mainly used in audio and video processing, AI, and image processing scenarios, where they accelerate workloads by offloading computation to GPU hardware.

    Important
    • GPU-accelerated instances can be deployed only by using container images.

    • When you use GPU-accelerated instances, join the DingTalk user group 64970014484 and provide the following information so that technical support can assist you in a timely manner:

      • Your organization name, such as your company name.

      • The ID of your Alibaba Cloud account.

      • The region in which you want to use GPU-accelerated instances. Example: China (Shenzhen).

      • Your contact information, such as your mobile number, email address, or DingTalk account.

Instance modes

GPU-accelerated instances and elastic instances can both work in on-demand mode and provisioned mode. This section describes the two modes.

On-demand mode

Introduction

On-demand instances are allocated and released by Function Compute, which automatically scales the number of instances based on the number of function invocations: instances are created when invocations increase and excess instances are destroyed when invocations decrease. On-demand instances are created automatically upon requests and are destroyed if no requests arrive for a period of time (usually 3 to 5 minutes). The first invocation on a new on-demand instance incurs a cold start.

By default, a maximum of 300 instances can be created in on-demand mode for an Alibaba Cloud account in each region. If you need to increase this limit, join the DingTalk group 64970014484 for technical support.

Billing method

Billing of an on-demand instance starts when a request is sent to the instance for processing and ends when the request has been processed. Each on-demand instance can process one or more requests at a time. For more information, see Configure instance concurrency.

No instance is allocated if no request is submitted for processing, and therefore no fees are generated. In on-demand mode, you are charged only when your function is invoked. For more information about pricing and billing, see Billing overview.

Note

If you want to improve the resource utilization of instances, we recommend that you configure instance concurrency based on your business requirements. Multiple requests then share the CPU and memory resources of an instance, which improves resource utilization.

Instance concurrency = 1

Measurement of execution duration starts when a request arrives at an instance and ends when the request is completely executed.


Instance concurrency > 1

Measurement of the execution duration of an on-demand instance starts when the first request arrives and ends when the last request finishes. Because multiple requests concurrently reuse the resources of one instance, resource costs can be reduced.
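The difference between the two measurement rules can be sketched in Python. This is an illustration of the rules described above, not an official billing tool:

```python
def duration_per_request(requests):
    """Concurrency = 1: duration is measured from each request's arrival
    to its completion, and the requests are billed individually."""
    return sum(end - start for start, end in requests)

def duration_shared_instance(requests):
    """Concurrency > 1: one instance serves overlapping requests, so the
    billed duration runs from the first arrival to the last completion."""
    return max(end for _, end in requests) - min(start for start, _ in requests)

# Three requests, each taking 2 seconds, arriving 0.5 seconds apart.
requests = [(0.0, 2.0), (0.5, 2.5), (1.0, 3.0)]
print(duration_per_request(requests))      # 6.0 seconds billed in total
print(duration_shared_instance(requests))  # 3.0 seconds billed in total
```

With concurrency greater than 1, the three overlapping requests are billed for 3 seconds instead of 6, which is where the cost reduction comes from.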


Provisioned mode

Introduction

In provisioned mode, you manage the allocation and release of function instances yourself. Provisioned instances are retained until you release them. Invocation requests are preferentially distributed to provisioned instances. If the provisioned instances cannot handle all requests, Function Compute allocates on-demand instances to process the excess requests. For more information about how to delete a provisioned instance, see Configure auto scaling rules.

Note

If cold starts are an issue for you, we recommend that you use provisioned instances. You can specify a fixed number of provisioned instances or configure a scheduled or metric-based auto scaling policy based on factors such as your resource budget, business traffic fluctuations, and resource usage thresholds. Provisioned instances significantly reduce the average latency caused by cold starts.

Idle mode

Elastic instances

Elastic instances are classified into the active state and the idle state based on whether vCPU resources are allocated. By default, the idle mode is enabled.

  • Active instances

    Instances are considered active if they are processing requests or if the idle mode is disabled. If you disable the idle mode, vCPUs remain allocated to provisioned instances regardless of whether the instances are processing requests, so background tasks can continue to run.

  • Idle instances

    If you enable Idle Mode, Function Compute freezes vCPUs of provisioned instances when the instances are not processing requests. In this case, the provisioned instances enter the idle state. You are not charged for vCPU resources when the instances are in the idle state, which helps you save costs. If a PreFreeze hook is configured for an instance, the instance enters the idle state after the PreFreeze hook is executed. Otherwise, the instance immediately enters the idle state when it finishes processing requests. For more information about instance states, see Function instance lifecycle.

You can choose whether to enable the idle mode feature based on your business requirements.

  • Costs

    If you want to use provisioned instances to mitigate cold starts while saving costs, we recommend that you enable the idle mode. With this feature, you pay only for the memory and disk resources of provisioned instances while they are idle, and requests are still served without cold starts.

  • Background tasks

    If your function needs to run background tasks, we recommend that you do not enable the idle mode. The following items provide example scenarios:

    • Some application frameworks rely on built-in schedulers or background features. Some dependent middleware needs to regularly report heartbeats.

    • Some asynchronous operations are performed in the background, such as goroutines in Go, asynchronous functions in Node.js, or asynchronous threads in Java.

GPU-accelerated instances

GPU-accelerated instances are classified into active instances and idle instances based on whether the instances are allocated GPUs. By default, the idle mode is enabled for GPU-accelerated instances.

  • Active instances

    Instances are considered active if they are processing requests or if the idle mode is disabled for them.

  • Idle instances

    After the idle mode is enabled, Function Compute freezes the GPUs of provisioned instances when no requests are being processed, and the instances enter the idle state.

Billing method

  • Active instances

    The billing of provisioned instances starts when the instances are created and ends when they are released. Because you request and release provisioned instances yourself, you are charged at active-mode prices for the entire period, even when the instances are not processing requests, unless you release the instances or enable the idle mode.

  • Idle instances

    After the idle mode is enabled, provisioned instances enter the idle state when they are not processing requests. The prices of idle instances are much lower than those of active instances. For more information, see Conversion factors.

Instance specifications

  • Elastic instances

    The following table describes specifications of elastic instances. You can configure instance specifications based on your business requirements.

    | vCPU | Memory size (MB) | Maximum code package size (GB) | Maximum function execution duration (seconds) | Maximum disk size (GB) | Maximum bandwidth (Gbit/s) |
    | --- | --- | --- | --- | --- | --- |
    | 0.05 to 16, in multiples of 0.05 | 128 to 32,768, in multiples of 64 | 10 | 86,400 | 512 MB (default) or 10 GB | 5 |

    Note

    The ratio of vCPU to memory capacity (in GB) is 1:N, where N must be a value from 1 to 4.
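The limits above can be checked programmatically. The following Python helper is an illustrative sketch of the documented constraints, not an official validation API:

```python
def is_valid_elastic_spec(vcpu: float, memory_mb: int) -> bool:
    """Check a vCPU/memory combination against the documented limits for
    elastic instances: vCPU 0.05-16 in steps of 0.05, memory 128-32,768 MB
    in steps of 64, and a vCPU-to-memory (GB) ratio of 1:N with N in [1, 4]."""
    centi_vcpu = round(vcpu * 100)  # integer math avoids float rounding issues
    if not (5 <= centi_vcpu <= 1600 and centi_vcpu % 5 == 0):
        return False
    if not (128 <= memory_mb <= 32768 and memory_mb % 64 == 0):
        return False
    n = (memory_mb / 1024) / vcpu
    return 1 <= n <= 4

print(is_valid_elastic_spec(1, 2048))    # True: 1 vCPU with 2 GB (ratio 1:2)
print(is_valid_elastic_spec(1, 8192))    # False: ratio 1:8 exceeds 1:4
print(is_valid_elastic_spec(0.07, 256))  # False: vCPU not a multiple of 0.05
```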

  • GPU-accelerated instances

    The following table describes specifications of GPU-accelerated instances. You can configure instance specifications based on your business requirements.

    Note
    • fc.gpu.tesla.1 GPU instances provide essentially the same GPU performance as physical NVIDIA T4 cards.

    • fc.gpu.ampere.1 GPU instances provide essentially the same GPU performance as physical NVIDIA A10 cards.

    | Instance type | Full GPU size (GB) | FP16 computing power of full GPU card (TFLOPS) | FP32 computing power of full GPU card (TFLOPS) | vGPU memory (MB) | vGPU computing power (card) | vCPU | Memory size (MB) | On-demand mode | Regular provisioned mode | Idle provisioned mode |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | fc.gpu.tesla.1 | 16 | 65 | 8 | 1,024 to 16,384 (1 GB to 16 GB), in multiples of 1,024 | vGPU memory (GB)/16, automatically allocated by Function Compute. For example, 5 GB of vGPU memory provides up to 5/16 of a card. | 0.05 to [vGPU memory (GB)/2], in multiples of 0.05. For more information, see GPU specifications. | 128 to [vGPU memory (GB) × 2,048], in multiples of 64. For more information, see GPU specifications. | Y | Y | Y |
    | fc.gpu.ampere.1 | 24 | 125 | 30 | 1,024 to 24,576 (1 GB to 24 GB), in multiples of 1,024 | vGPU memory (GB)/24, automatically allocated by Function Compute. For example, 5 GB of vGPU memory provides up to 5/24 of a card. | 0.05 to [vGPU memory (GB)/3], in multiples of 0.05. For more information, see GPU specifications. | 128 to [vGPU memory (GB) × 4,096/3], in multiples of 64. For more information, see GPU specifications. | Y | Y | Y |
    | fc.gpu.ada.1 | 48 | 119 | 60 | 49,152 (48 GB). Only 48 GB of GPU memory is supported. | Full GPU card by default, automatically allocated by Function Compute. | 8. Only 8 vCPUs are supported. | 65,536. Only 64 GB of memory is supported. | N | Y | Y |
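The vGPU computing power formulas in the table can be expressed as a small Python helper. This is an illustrative sketch of the documented formula, not an official calculator:

```python
def vgpu_computing_power(vgpu_memory_gb: float, full_gpu_memory_gb: int) -> float:
    """Fraction of a full GPU card allocated automatically by Function
    Compute: vGPU memory (GB) divided by the full card's memory (GB)."""
    return vgpu_memory_gb / full_gpu_memory_gb

# fc.gpu.tesla.1 has a 16 GB full card: 5 GB of vGPU memory -> 5/16 of a card.
print(vgpu_computing_power(5, 16))            # 0.3125
# fc.gpu.ampere.1 has a 24 GB full card: 5 GB of vGPU memory -> 5/24 of a card.
print(round(vgpu_computing_power(5, 24), 4))  # 0.2083
```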

  • GPU-accelerated instances of Function Compute also support the following resource specifications.

    | Image size (GB) | Maximum function execution duration (seconds) | Maximum disk size (GB) | Maximum bandwidth (Gbit/s) |
    | --- | --- | --- | --- |
    | 15, for Container Registry Enterprise Edition (Standard, Advanced, and Basic editions) and Container Registry Personal Edition (free) | 86,400 | 10 | 5 |

    Note
    • Setting the instance type to g1 is equivalent to setting the instance type to fc.gpu.tesla.1.

    • GPU-accelerated instances of Tesla series GPU cards are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.

    • GPU-accelerated instances of Ampere series GPU cards are supported in the following regions: China (Hangzhou), China (Shanghai), Japan (Tokyo), and Singapore.

    • GPU-accelerated instances of Ada series GPU cards are supported in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), and China (Shenzhen).

GPU specifications

Details of fc.gpu.tesla.1:

| vGPU memory (MB) | vCPU | Maximum memory size (GB) | Memory size (MB) |
| --- | --- | --- | --- |
| 1,024 | 0.05–0.5 | 2 | 128–2,048 |
| 2,048 | 0.05–1 | 4 | 128–4,096 |
| 3,072 | 0.05–1.5 | 6 | 128–6,144 |
| 4,096 | 0.05–2 | 8 | 128–8,192 |
| 5,120 | 0.05–2.5 | 10 | 128–10,240 |
| 6,144 | 0.05–3 | 12 | 128–12,288 |
| 7,168 | 0.05–3.5 | 14 | 128–14,336 |
| 8,192 | 0.05–4 | 16 | 128–16,384 |
| 9,216 | 0.05–4.5 | 18 | 128–18,432 |
| 10,240 | 0.05–5 | 20 | 128–20,480 |
| 11,264 | 0.05–5.5 | 22 | 128–22,528 |
| 12,288 | 0.05–6 | 24 | 128–24,576 |
| 13,312 | 0.05–6.5 | 26 | 128–26,624 |
| 14,336 | 0.05–7 | 28 | 128–28,672 |
| 15,360 | 0.05–7.5 | 30 | 128–30,720 |
| 16,384 | 0.05–8 | 32 | 128–32,768 |

Details of fc.gpu.ampere.1:

| vGPU memory (MB) | vCPU | Maximum memory size (GB) | Memory size (MB) |
| --- | --- | --- | --- |
| 1,024 | 0.05–0.3 | 1.3125 | 128–1,344 |
| 2,048 | 0.05–0.65 | 2.625 | 128–2,688 |
| 3,072 | 0.05–1 | 4 | 128–4,096 |
| 4,096 | 0.05–1.3 | 5.3125 | 128–5,440 |
| 5,120 | 0.05–1.65 | 6.625 | 128–6,784 |
| 6,144 | 0.05–2 | 8 | 128–8,192 |
| 7,168 | 0.05–2.3 | 9.3125 | 128–9,536 |
| 8,192 | 0.05–2.65 | 10.625 | 128–10,880 |
| 9,216 | 0.05–3 | 12 | 128–12,288 |
| 10,240 | 0.05–3.3 | 13.3125 | 128–13,632 |
| 11,264 | 0.05–3.65 | 14.625 | 128–14,976 |
| 12,288 | 0.05–4 | 16 | 128–16,384 |
| 13,312 | 0.05–4.3 | 17.3125 | 128–17,728 |
| 14,336 | 0.05–4.65 | 18.625 | 128–19,072 |
| 15,360 | 0.05–5 | 20 | 128–20,480 |
| 16,384 | 0.05–5.3 | 21.3125 | 128–21,824 |
| 17,408 | 0.05–5.65 | 22.625 | 128–23,168 |
| 18,432 | 0.05–6 | 24 | 128–24,576 |
| 19,456 | 0.05–6.3 | 25.3125 | 128–25,920 |
| 20,480 | 0.05–6.65 | 26.625 | 128–27,264 |
| 21,504 | 0.05–7 | 28 | 128–28,672 |
| 22,528 | 0.05–7.3 | 29.3125 | 128–30,016 |
| 23,552 | 0.05–7.65 | 30.625 | 128–31,360 |
| 24,576 | 0.05–8 | 32 | 128–32,768 |
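The rows in the two tables above follow the bounds from the GPU specifications: the vCPU upper bound is vGPU memory (GB) divided by 2 (Tesla) or 3 (Ampere), rounded down to a multiple of 0.05, and the memory upper bound is vGPU memory (GB) × 2,048 (Tesla) or × 4,096/3 (Ampere), rounded down to a multiple of 64. As a sketch (an illustration of the pattern, not an official tool), one row can be reconstructed like this:

```python
def gpu_spec_row(vgpu_memory_mb: int, vcpu_divisor: int, memory_mb_per_gb: float):
    """Reconstruct the vCPU and memory upper bounds for one table row.
    Values are rounded down to the documented step sizes (0.05 vCPU, 64 MB)."""
    gb = vgpu_memory_mb / 1024
    vcpu_max = int(gb / vcpu_divisor * 20) / 20        # step of 0.05 vCPU
    memory_max = int(gb * memory_mb_per_gb / 64) * 64  # step of 64 MB
    return vcpu_max, memory_max

# fc.gpu.tesla.1: divisor 2, 2,048 MB of memory per GB of vGPU memory.
print(gpu_spec_row(5120, 2, 2048))      # (2.5, 10240), matching the 5,120 MB row
# fc.gpu.ampere.1: divisor 3, 4,096/3 MB of memory per GB of vGPU memory.
print(gpu_spec_row(7168, 3, 4096 / 3))  # (2.3, 9536), matching the 7,168 MB row
```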

Additional information

  • You can enable the idle mode feature when you configure auto scaling rules. For more information, see Configure auto scaling rules.

  • For more information about the billing methods and billable items of Function Compute, see Billing overview.

  • When you call an API operation to create a function, you can use the instanceType parameter to specify an instance type. For more information, see CreateFunction.
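    As an illustration of how the instanceType parameter might appear in a request body, consider the following fragment. Only instanceType is confirmed by this topic; the remaining field names and values are assumptions for illustration, so check the CreateFunction reference for the exact schema:

    ```json
    {
      "functionName": "my-gpu-function",
      "runtime": "custom-container",
      "instanceType": "fc.gpu.tesla.1",
      "gpuMemorySize": 8192,
      "cpu": 2,
      "memorySize": 8192
    }
    ```

    The values shown are consistent with the specifications above: with 8,192 MB of vGPU memory on fc.gpu.tesla.1, the vCPU value may be at most 4 and the memory size at most 16,384 MB.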

  • For more information about how to specify the type and specifications of an instance in the Function Compute console, see Manage functions.