
Function Compute: Best practice for reducing cold start latencies

Last Updated: Feb 02, 2024

This topic describes how to optimize cold starts of on-demand instances and improve function performance by using provisioned instances in Function Compute.

What is a cold start?

Function Compute supports two types of instances: on-demand instances and provisioned instances. On-demand instances are automatically allocated and released by Function Compute, and you are charged based on the actual amount of time that the instances spend processing requests. Using on-demand instances saves you the trouble of managing and allocating resources. However, the on-demand mode introduces cold starts, which increase invocation latency and degrade function performance.

A cold start refers to the process of preparing the execution environment and code, which includes code download, container startup, runtime initialization, and code initialization. After the cold start is complete, the function instance is ready to process subsequent requests.


Optimize cold starts of on-demand instances

Cold starts can be optimized on both the system side and the user side. On the system side, Function Compute has made many improvements to shorten cold starts. On the user side, we recommend that you use the following methods:

  • Use lightweight code packages

    Use lightweight code packages and remove unnecessary dependencies. For example, you can run npm prune in a Node.js runtime or autoflake in a Python runtime. In addition, delete redundant files from third-party libraries, such as test source code and unused binary and data files, to reduce the time required for code download and decompression.

  • Use a lightweight programming language

    Some languages, such as Java, incur longer cold starts than other languages. For applications that are sensitive to cold starts, using a lightweight programming language, such as Python, can greatly reduce long-tail latencies, provided that warm start latencies do not differ significantly between the languages.

  • Configure an appropriate memory size

    For a function with fixed concurrency settings, more CPU resources are available for function execution if a larger memory size is configured.

  • Reduce the frequency of cold starts

    • Use a time trigger to periodically invoke the function and keep instances warm.

    • Use the Initializer hook to move one-time initialization logic out of the request path. Function Compute calls initializers asynchronously, which removes the time spent on code initialization from request latency. As a result, cold starts are imperceptible to users during system upgrades and function updates in Function Compute.
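As a minimal sketch, the Initializer hook in a Python runtime might look like the following. The `initializer` and `handler` names follow the entry-point conventions of the Function Compute Python runtime; `load_model` is a hypothetical stand-in for expensive one-time setup such as loading model files.

```python
import json

# Module-level state shared across invocations on the same instance.
model = None

def load_model():
    # Hypothetical stand-in for expensive setup, e.g., downloading
    # and deserializing model files.
    return {"ready": True}

def initializer(context):
    # Called once per instance, before any request is handled, so the
    # expensive setup does not add latency to individual requests.
    global model
    model = load_model()

def handler(event, context):
    # Requests reuse the state prepared by the initializer.
    return json.dumps({"model_ready": model["ready"]})
```

Because the initializer runs once per instance before the first request, the cost of `load_model` is paid outside the request path.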

Hybrid mode

In most cases, cold starts on the user side are difficult to eliminate entirely. For example, in deep learning inference scenarios, cold starts are inevitable if a large number of model files must be loaded, or if functions must interact with legacy systems through clients that take a long time to initialize. In these scenarios, if your function is sensitive to latency, you can configure provisioned instances, or use provisioned instances together with on-demand instances.

Provisioned instances are allocated and released by you, and you are charged based on the running duration of the instances. Provisioned instances allow you to reserve computing resources in advance based on workload fluctuations. When the provisioned instances are not enough to handle the workload, the system automatically allocates on-demand instances, and the provisioned instances continue to process requests while the on-demand instances are being allocated. This eliminates the latency caused by cold starts and achieves an optimal balance between performance and resource utilization.

Provisioned instances are preferentially used if you configure both provisioned instances and on-demand instances. For example, assume that you have 10 provisioned instances. If more than 10 instances are required to process the requests that arrive within one second, Function Compute allocates on-demand instances to process the excess requests.

Whether an instance is fully loaded is determined by its concurrency setting. The system tracks the number of requests that are being processed on each instance. When the number of concurrent requests on an instance reaches the specified upper limit, the system routes new requests to another instance. When all instances reach the upper limit, new instances are created.

Provisioned instances are managed by you, and you are charged for them even when they are not processing requests. On-demand instances are managed by Function Compute, which reclaims on-demand instances that have not processed requests for a period of time. You are charged for on-demand instances only for the actual period during which they process requests. For more information about the billing rules, see Billing overview. You can also configure an upper limit for on-demand instances to ensure that resource usage stays within the expected range.
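The routing rules described above can be sketched as follows. This is an illustrative model only, not Function Compute's actual implementation; the function name and parameters are hypothetical.

```python
def assign_request(provisioned, on_demand, concurrency_limit, max_on_demand):
    """Pick an instance for a new request.

    provisioned / on_demand: lists of per-instance in-flight request counts.
    Returns ("provisioned" | "on_demand", index), or None if the configured
    resource limits are exhausted.
    """
    # Provisioned instances are preferred over on-demand instances.
    for pool_name, pool in (("provisioned", provisioned), ("on_demand", on_demand)):
        for i, in_flight in enumerate(pool):
            if in_flight < concurrency_limit:
                pool[i] += 1
                return (pool_name, i)
    # All existing instances are fully loaded: create a new on-demand
    # instance, up to the configured upper limit.
    if len(on_demand) < max_on_demand:
        on_demand.append(1)
        return ("on_demand", len(on_demand) - 1)
    return None  # beyond the configured upper limit for on-demand instances
```

In this sketch, a request spills over to an on-demand instance only after every provisioned instance has reached its per-instance concurrency limit, which mirrors the "provisioned instances are preferentially used" behavior.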