Instance types and usage modes

Updated at: 2025-03-18 08:43

Function Compute provides CPU instances and GPU-accelerated instances. Both types can be used in on-demand mode or provisioned mode. On-demand instances are billed based on actual execution duration, and you can combine them with the instance concurrency feature to improve resource utilization. Billing of a provisioned instance starts when Function Compute starts the instance and ends when you release it. Provisioned instances effectively mitigate cold starts. This topic describes the types, usage modes, billing methods, and specifications of function instances in Function Compute.

Instance types

  • CPU instances: the basic instance type of Function Compute. CPU instances are suitable for scenarios with traffic bursts or compute-intensive workloads.

  • GPU-accelerated instances: instances that use GPUs based on the Ampere, Turing, and Ada architectures for acceleration. GPU-accelerated instances are mainly used in audio and video processing, AI, and image processing scenarios, where they accelerate business by offloading workloads to GPU hardware.


    Important
    • GPU-accelerated instances can be deployed only by using container images.

    • When you use GPU-accelerated instances, join the DingTalk user group 64970014484 and provide the following information so that technical support can be provided in a timely manner:

      • Your organization name, such as your company name.

      • The ID of your Alibaba Cloud account.

      • The region in which you want to use GPU-accelerated instances. Example: China (Shenzhen).

      • Your contact information, such as your mobile number, email address, or DingTalk account.

Instance modes

Both GPU-accelerated instances and CPU instances support on-demand mode and provisioned mode. This section describes the two usage modes.

On-demand mode

Overview

On-demand instances are allocated and released by Function Compute, which automatically scales the number of instances based on the number of function invocations: instances are created when invocations increase, and excess instances are destroyed when invocations decrease. An on-demand instance is automatically created when a request arrives and is destroyed if no requests are submitted for a period of time (usually 3 to 5 minutes). The first invocation on a new on-demand instance incurs a cold start.

By default, each Alibaba Cloud account can run up to 100 instances in each region. The actual quota displayed on the General Quotas page of the Quota Center console prevails; you can apply for an increase in the Quota Center console if you need more instances.

Billing method

Billing of an on-demand instance starts when requests are sent to the instance for processing and ends when the requests are processed. An on-demand instance can process one or more requests at a time. For more information, see Create a web function.

No instance is allocated if no request is submitted for processing, and therefore no fees are generated. In on-demand mode, you are charged only when your function is invoked. For more information about pricing and billing, see Billing overview.

Note

If you want to improve the resource utilization of instances, we recommend that you configure instance concurrency based on your business requirements. This way, multiple requests share the CPU and memory resources of an instance, which improves resource utilization. The following descriptions compare the two cases.

• Instance concurrency = 1: Measurement of the execution duration starts when a request arrives at the instance and ends when execution of the request is completed.

• Instance concurrency > 1: Measurement of the execution duration starts when the first request arrives at the instance and ends when execution of the last request is completed. Because multiple requests reuse the resources of the same instance, costs are reduced.
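To make the billing difference concrete, the following sketch compares the two cases under a simplified model in which an instance is billed for the wall-clock time during which at least one request is running on it. Actual fees also depend on instance specifications and unit prices; see Billing overview.

```python
# Illustrative sketch: billed duration with and without instance concurrency.
# Simplified model: an instance is billed for the wall-clock time during
# which at least one request is running on it.

def billed_duration(intervals):
    """Return the total length of the union of (start, end) intervals."""
    total = 0.0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:   # no overlap: close previous run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                    # overlap: extend current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Three 1-second requests arriving 0.5 s apart.
reqs = [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0)]

# Instance concurrency = 1: each request is measured separately,
# so the three billed durations simply add up.
print(sum(end - start for start, end in reqs))   # 3.0 seconds

# Instance concurrency > 1: the requests share one instance, which is
# billed from the first arrival to the last completion.
print(billed_duration(reqs))                     # 2.0 seconds
```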

Provisioned mode

Overview

In provisioned mode, you manage the allocation and release of function instances yourself. Provisioned instances are retained until you release them. Invocation requests are preferentially distributed to provisioned instances. If the provisioned instances cannot handle all requests, Function Compute allocates on-demand instances to process the excess requests. For more information about how to release a provisioned instance, see Configure provisioned instances.

Note

If cold starts are one of your major concerns, we recommend that you use provisioned instances. You can specify a fixed number of provisioned instances, or configure a scheduled or metric-based auto scaling policy based on factors such as your resource budget, business traffic fluctuations, and resource usage thresholds. Provisioned instances significantly reduce the average latency caused by cold starts.

Billing method

  • Active instances

    Generally, instances that are processing requests are active instances. In provisioned mode, instances are always active if the idle mode is not enabled. Billing of these provisioned instances starts when the instances are allocated and ends when you release them. Therefore, as long as the instances are not released and the idle mode is not enabled, you are charged at active-mode prices even when the instances are not processing any requests.

  • Idle instances

    If the idle mode is enabled and no requests are being executed on a provisioned instance, the instance enters the idle state: Function Compute freezes the vCPU or GPU resources of the instance, and you are charged at much lower prices. For more information, see CU conversion factors.

    For example, if the idle mode is enabled, billing of a provisioned instance starts when the instance is created and ends when the instance is released. When the instance is not processing requests, it is in the idle state; it returns to the active state when it starts to process requests. Fees are calculated based on the following formula: Fee = (Total idle resource usage x Unit price of idle resources) + (Total active resource usage x Unit price of active resources).
    Note

    By default, Function Compute 3.0 enables the idle mode for provisioned instances. Function Compute freezes the vCPU resources of provisioned instances that are not processing requests, and you are charged lower prices for these idle instances. Idle instances do not incur cold starts, so they can process new requests immediately when the requests arrive.
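As a rough illustration of the formula above, the following sketch computes the fee for a provisioned instance that alternates between the idle and active states. The unit prices and resource sizes are made-up placeholders, not actual Function Compute prices; see Billing overview for the real prices.

```python
# Illustrative sketch of the provisioned-instance billing formula:
#   Fee = idle_usage * idle_unit_price + active_usage * active_unit_price
# The unit prices below are made-up placeholders, not real prices.

IDLE_UNIT_PRICE = 0.00001    # placeholder price per GB-second while idle
ACTIVE_UNIT_PRICE = 0.0001   # placeholder price per GB-second while active

memory_gb = 2.0              # instance memory size
idle_seconds = 3000          # time spent idle (no requests running)
active_seconds = 600         # time spent processing requests

idle_usage = memory_gb * idle_seconds       # GB-seconds billed at the idle price
active_usage = memory_gb * active_seconds   # GB-seconds billed at the active price

fee = idle_usage * IDLE_UNIT_PRICE + active_usage * ACTIVE_UNIT_PRICE
print(f"fee = {fee:.4f}")    # 0.06 + 0.12 = 0.18 placeholder currency units
```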

Instance specifications

  • CPU instances

    The following specifications apply to CPU instances. You can select instance specifications based on your business requirements.

    • vCPUs: valid values range from 0.05 to 16 and must be a multiple of 0.05.

    • Memory size (MB): valid values range from 128 to 32768 and must be a multiple of 64.

    • Maximum code package size (GB): 10

    • Maximum function execution duration (seconds): 86400

    • Maximum disk size (GB): 10. Valid values: 512 MB (default) and 10 GB.

    • Maximum bandwidth (Gbit/s): 5

    Note

    The ratio of vCPUs to memory capacity (in GB) must be 1:N, where N ranges from 1 to 4.
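The constraints above are easy to check programmatically. The following sketch is a hypothetical validator, not part of any Function Compute SDK:

```python
# Hypothetical validator for CPU instance specifications; not part of any
# Function Compute SDK. Encodes the constraints listed above.

def is_valid_cpu_spec(vcpu: float, memory_mb: int) -> bool:
    # vCPUs: 0.05 to 16, in multiples of 0.05 (tolerant of float rounding).
    if not (0.05 <= vcpu <= 16) or abs(vcpu * 20 - round(vcpu * 20)) > 1e-9:
        return False
    # Memory: 128 MB to 32768 MB, in multiples of 64.
    if not (128 <= memory_mb <= 32768) or memory_mb % 64 != 0:
        return False
    # vCPU-to-memory (GB) ratio must be between 1:1 and 1:4.
    memory_gb = memory_mb / 1024
    return 1 <= memory_gb / vcpu <= 4

print(is_valid_cpu_spec(1.0, 2048))   # True: 1 vCPU, 2 GB -> ratio 1:2
print(is_valid_cpu_spec(1.0, 8192))   # False: ratio 1:8 exceeds 1:4
print(is_valid_cpu_spec(0.07, 128))   # False: 0.07 is not a multiple of 0.05
```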

  • GPU-accelerated instances

    The following specifications apply to GPU-accelerated instances. You can configure instance specifications based on your business requirements.

    Note

    fc.gpu.tesla.1 instances provide essentially the same GPU performance as physical NVIDIA T4 GPUs.

    • fc.gpu.tesla.1 (supported in on-demand mode, regular provisioned mode, and idle provisioned mode)

      • Full GPU size: 16 GB.

      • Computing power of a full GPU: 65 TFLOPS (FP16) and 8 TFLOPS (FP32).

      • vGPU memory (MB): valid values range from 1024 to 16384 (1 GB to 16 GB) and must be a multiple of 1024.

      • vGPU computing power (GPUs): calculated based on the following formula: vGPU computing power = vGPU memory (GB)/16. For example, if you set the vGPU memory to 5 GB, the maximum vGPU computing power is 5/16 of a full GPU. The computing power is automatically allocated by Function Compute and does not need to be manually allocated.

      • vCPUs: valid values range from 0.05 to [vGPU memory (GB)/2] and must be a multiple of 0.05. For more information, see the GPU specifications section of this topic.

      • Memory size (MB): valid values range from 128 to [vGPU memory (GB) x 2048] and must be a multiple of 64. For more information, see the GPU specifications section of this topic.

    • fc.gpu.ada.1 (supported in on-demand mode, regular provisioned mode, and idle provisioned mode)

      • Full GPU size: 48 GB.

      • Computing power of a full GPU: 119 TFLOPS (FP16) and 60 TFLOPS (FP32).

      • vGPU memory (MB): 49152 (48 GB). Only 48 GB of GPU memory is supported.

      • vGPU computing power (GPUs): the computing power of a full GPU is allocated by default. The computing power is automatically allocated by Function Compute and does not need to be manually allocated.

      • vCPUs: 8. Only 8 vCPUs are supported.

      • Memory size (MB): 65536. Only 64 GB of memory is supported.
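For fc.gpu.tesla.1, the vCPU and memory ranges can be derived from the vGPU memory that you select. The following sketch is an illustrative helper, not part of any Function Compute SDK; its output for 5 GB of vGPU memory matches the GPU specifications table later in this topic.

```python
# Illustrative helper (not part of any Function Compute SDK) that derives the
# valid vCPU and memory ranges of an fc.gpu.tesla.1 instance from its vGPU memory.

def tesla1_limits(vgpu_memory_mb: int) -> dict:
    assert vgpu_memory_mb % 1024 == 0 and 1024 <= vgpu_memory_mb <= 16384
    vgpu_memory_gb = vgpu_memory_mb // 1024
    return {
        # Fraction of a full GPU's computing power, allocated automatically.
        "vgpu_computing_power": vgpu_memory_gb / 16,
        # vCPUs: 0.05 to vGPU memory (GB)/2, in multiples of 0.05.
        "vcpu_range": (0.05, vgpu_memory_gb / 2),
        # Memory: 128 MB to vGPU memory (GB) x 2048, in multiples of 64.
        "memory_mb_range": (128, vgpu_memory_gb * 2048),
    }

print(tesla1_limits(5120))
# {'vgpu_computing_power': 0.3125, 'vcpu_range': (0.05, 2.5),
#  'memory_mb_range': (128, 10240)}
```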

    GPU-accelerated instances of Function Compute also support the following resource specifications:

    • Image size (GB): 15. This limit applies to Container Registry Enterprise Edition (Standard Edition, Advanced Edition, and Basic Edition) and Container Registry Personal Edition (free).

    • Maximum function execution duration (seconds): 86400

    • Maximum disk size (GB): 10

    • Maximum bandwidth (Gbit/s): 5

    Note
    • Setting the instance type to g1 is equivalent to setting the instance type to fc.gpu.tesla.1.

    • GPU-accelerated instances of Tesla series GPUs are supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Japan (Tokyo), US (Virginia), and Singapore.

    • GPU-accelerated instances of Ada series GPUs are supported in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), Singapore, and US (Virginia).

GPU specifications

The following table describes the valid specification combinations of fc.gpu.tesla.1 for each vGPU memory size.

vGPU memory (MB) | vCPUs    | Maximum memory size (GB) | Memory size (MB)
1024             | 0.05~0.5 | 2                        | 128~2048
2048             | 0.05~1   | 4                        | 128~4096
3072             | 0.05~1.5 | 6                        | 128~6144
4096             | 0.05~2   | 8                        | 128~8192
5120             | 0.05~2.5 | 10                       | 128~10240
6144             | 0.05~3   | 12                       | 128~12288
7168             | 0.05~3.5 | 14                       | 128~14336
8192             | 0.05~4   | 16                       | 128~16384
9216             | 0.05~4.5 | 18                       | 128~18432
10240            | 0.05~5   | 20                       | 128~20480
11264            | 0.05~5.5 | 22                       | 128~22528
12288            | 0.05~6   | 24                       | 128~24576
13312            | 0.05~6.5 | 26                       | 128~26624
14336            | 0.05~7   | 28                       | 128~28672
15360            | 0.05~7.5 | 30                       | 128~30720
16384            | 0.05~8   | 32                       | 128~32768

Relationship between GPU types and instance concurrency

  • A Tesla series GPU has a total memory capacity of 16 GB. If you configure the GPU Memory Size parameter to 1 GB, you can run 16 GPU containers simultaneously on one GPU of this series. By default, the total number of GPUs in a region is limited to 30. Therefore, at any given time, a maximum of 480 Tesla series GPU containers can run within a region.

    • If you set the instance concurrency of your GPU function to 1, a maximum of 480 inference requests can be concurrently processed by your function in a region.

    • If you set the instance concurrency of your GPU function to 5, a maximum of 2,400 inference requests can be concurrently processed by your function in a region.

  • An Ada series GPU has a total memory capacity of 48 GB and can host only one GPU container (the GPU Memory Size parameter can be set only to 48 GB). By default, the total number of GPUs in a region is limited to 30. Therefore, at any given time, a maximum of 30 Ada series GPU containers can run within a region (see the calculation sketch after this list).

    • If you set the instance concurrency of your GPU function to 1, a maximum of 30 inference requests can be concurrently processed by your function in a region.

    • If you set the instance concurrency of your GPU function to 5, a maximum of 150 inference requests can be concurrently processed by your function in a region.
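The per-region ceilings above follow from a simple multiplication, as the following sketch shows. The 30-GPU default regional quota is the value stated above.

```python
# Per-region concurrency ceiling for GPU functions:
#   containers per GPU x GPUs per region x instance concurrency.
# The 30-GPU default regional quota comes from the text above.

def max_concurrent_requests(gpu_memory_gb: int, container_memory_gb: int,
                            gpus_per_region: int, instance_concurrency: int) -> int:
    containers_per_gpu = gpu_memory_gb // container_memory_gb
    return containers_per_gpu * gpus_per_region * instance_concurrency

# Tesla series: 16 GB GPUs, 1 GB containers, default quota of 30 GPUs.
print(max_concurrent_requests(16, 1, 30, 1))   # 480
print(max_concurrent_requests(16, 1, 30, 5))   # 2400

# Ada series: 48 GB GPUs, one 48 GB container per GPU.
print(max_concurrent_requests(48, 48, 30, 1))  # 30
print(max_concurrent_requests(48, 48, 30, 5))  # 150
```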

Additional information

  • You can enable the idle mode when you configure auto scaling rules. For more information, see Configure provisioned instances.

  • For more information about the billing methods and billable items of Function Compute, see Billing overview.

  • When you call an API operation to create a function, you can use the instanceType parameter to specify an instance type. For more information, see CreateFunction.

  • For more information about how to specify the instance type and instance specifications in the Function Compute console, see Create a web function.
