
Function Compute:FAQ about GPU-accelerated instances

Last Updated: Nov 11, 2024

This topic provides answers to some commonly asked questions about GPU-accelerated instances.

What are the driver and CUDA versions of GPU-accelerated instances in Function Compute?

The following items describe the versions of the main components of GPU-accelerated instances:

  • Driver versions: Drivers include kernel-mode drivers (KMD) such as nvidia.ko and CUDA user-mode drivers (UMD) such as libcuda.so. The drivers used by GPU-accelerated instances of Function Compute are provided by NVIDIA and deployed by Function Compute. The driver versions used by GPU-accelerated instances may change as a result of iterations, releases of new card models, bug fixes, and driver lifecycle expiration. We recommend that you do not specify a specific driver version in container images. For more information, see Image usage notes.

  • CUDA Toolkit versions: CUDA Toolkit includes various components, such as CUDA Runtime, cuDNN, and cuFFT. The CUDA Toolkit version is determined by the container image you use.

GPU drivers and CUDA Toolkit are released by NVIDIA and related to each other. For more information, see NVIDIA CUDA Toolkit Release Notes.

The current driver version of GPU-accelerated instances in Function Compute is 550.54.15, which corresponds to CUDA 12.4. For best compatibility, we recommend that you use CUDA Toolkit 11.8 or later and do not use a CUDA UMD whose version is later than that of the platform.
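
The compatibility rule above can be illustrated with a small helper that compares version numbers. This is a hypothetical function for illustration only, not a Function Compute or CUDA API; the platform version 12.4 is taken from the paragraph above.

```python
def cuda_umd_compatible(image_cuda: str, platform_cuda: str = "12.4") -> bool:
    """Return True if the image's CUDA UMD version does not exceed the
    CUDA version supported by the platform driver.
    Hypothetical helper for illustration only."""
    def parse(version: str):
        return tuple(int(part) for part in version.split("."))
    return parse(image_cuda) <= parse(platform_cuda)

print(cuda_umd_compatible("11.8"))  # True: recommended minimum, within platform support
print(cuda_umd_compatible("12.6"))  # False: newer than the platform's CUDA 12.4 driver
```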

What do I do if CUFFT_INTERNAL_ERROR is reported during a function execution?

The cuFFT library in CUDA 11.7 has forward compatibility issues. If you encounter this error on GPU card types newer than Ampere, we recommend that you upgrade CUDA to 11.8 or later. For more information about GPU card types, see Instance specifications.

Take PyTorch as an example. After the upgrade, you can run the following code snippet for verification. If no error is reported, the upgrade was successful.

import torch
out = torch.fft.rfft(torch.randn(1000).cuda())

What do I do if a CUDA GPG error is reported when I build an image?

The following GPG error is reported during an image building process:

W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease' is not signed.

In this case, add the following command to your Dockerfile, before the commands that access the NVIDIA repository, and then rebuild your image.

RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC
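
For context, the key import must run before any apt-get step that contacts the NVIDIA repository. The following is a minimal Dockerfile sketch; the base image tag and installed packages are illustrative.

```dockerfile
# Base image tag is illustrative; use the CUDA base image your application needs.
FROM nvidia/cuda:11.8.0-runtime-ubuntu20.04

# Import the NVIDIA repository public key before any apt-get command
# that accesses the CUDA repository.
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```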

Why is the instance type of my GPU-accelerated instance g1?

The g1 instance type is the same as fc.gpu.tesla.1. For more information, see Instance specifications.

Why do my provisioned GPU-accelerated instances fail to be allocated?

The allocation of provisioned instances may fail due to the following reasons:

  • The startup of the provisioned instances times out.

    • Error code: FunctionNotStarted.

    • Error message: Function instance health check failed on port XXX in 120 seconds.

    • Solution: Check the application startup logic for slow operations, such as downloading models from the Internet or loading large models (larger than 10 GB). We recommend that you start the web server before you run the model loading logic.

  • The maximum number of instances at the function level or region level is reached.

    • Error code: ResourceThrottled.

    • Error message: Reserve resource exceeded limit.

    • Solution: If you want to use more physical GPU cards, join the DingTalk group 64970014484 for technical support.

What is the limit on the size of a GPU image?

The image size limit applies only to compressed images. You can view the size of a compressed image in the Container Registry console. You can run the docker images command to query the size of an uncompressed image.

In most cases, an uncompressed image that is smaller than 20 GB in size can be deployed to Function Compute and used as expected.

What do I do if a GPU image fails to be converted to an accelerated image?

The time required to convert an image increases as the size of your image grows. A timeout error may cause a conversion failure. You can re-trigger the acceleration conversion of the GPU image by configuring and saving the function configurations in the Function Compute console. You do not need to modify the parameters if you want to retain existing settings.

Should a model be integrated into or separated from an image?

If your model files are large, frequently iterated, or exceed the size limit on images when they are published with an image, we recommend that you separate the model from the image. If the model is small in size, for example, 100 MB, and not frequently changed, you can distribute the model file with the image. For more information about the limits on the size of a platform image, see What is the limit on the size of a GPU image?

To deploy a model separately from an image, you can store the model in a File Storage NAS (NAS) file system or an Object Storage Service (OSS) file system. The model is loaded from the mount target when the application starts. For more information, see Configure a NAS file system and Configure an OSS file system.

  • We recommend that you store models in a Performance NAS file system, which is compatible with Portable Operating System Interface (POSIX) and has a high initial read bandwidth that helps reduce the time required to load your model. For more information, see General-purpose NAS file systems.

  • You can also store models in OSS buckets. This way, you can use OSS accelerators to achieve lower latency and higher throughput. If you mount an OSS file system, the mount is handled by a user-mode process, which consumes instance memory and temporary storage space. Therefore, we recommend that you use OSS on GPU-accelerated instances with large specifications.
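
Loading a model from a mount target at startup, as described above, can be sketched as follows. The mount path /mnt/model, the file name, and the byte-reading loader are hypothetical; substitute the mount path you configured and your framework's deserialization API.

```python
import os

def load_model(path: str) -> bytes:
    """Read model weights from a local mount path instead of downloading
    them over the Internet. Placeholder loader for illustration."""
    with open(path, "rb") as f:
        return f.read()

def startup(model_dir: str = "/mnt/model") -> bytes:
    # /mnt/model is a hypothetical NAS/OSS mount target; use the mount
    # path configured for your function.
    weights = os.path.join(model_dir, "weights.bin")
    if not os.path.exists(weights):
        raise FileNotFoundError(f"model not found at {weights}; check the mount")
    return load_model(weights)
```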

How do I perform a model warm-up?

We recommend that you warm up a model in the /initialize method. The model is connected to production traffic only after the /initialize method is completed.

What do I do if the [FunctionNotStarted] Function Instance health check failed on port xxx in 120 seconds error is reported when I start a GPU image?

  • Cause: The AI/GPU application takes too long to start. As a result, the health check of Function Compute fails. A common reason is that loading the model takes too long, which causes the startup of the web server to time out.

  • Solution:

    • Do not dynamically load the model over the Internet when the application starts. We recommend that you place the model in an image or in a NAS file system and load the model from the nearest path.

    • Place model initialization in the /initialize method so that the web server starts first. Specifically, load the model after the web server is started.

      Note

      For more information about lifecycles of function instances, see Function instance lifecycle.

What do I do if the end-to-end latency of my function is high and fluctuates greatly?

  1. Make sure that the state of image acceleration is Available in the environment information.

  2. Check the type of the NAS file system. If your function needs to read data, such as a model, from a NAS file system, we recommend that you use a Performance NAS file system, instead of the Capacity type, to ensure the performance. For more information, see General-purpose NAS file systems.

What do I do if the system fails to find the NVIDIA driver?

This issue occurs if you start a container with the docker run --gpus all command and then build the application image with docker commit. The resulting image contains NVIDIA driver files from your local environment, which prevents the platform driver from being mounted after the image is deployed to Function Compute. As a result, the system cannot find the NVIDIA driver.

To resolve the issue, we recommend that you use Dockerfile to build an application image. For more information, see Dockerfile.
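
A minimal sketch of such a Dockerfile follows. The base image tag, file layout, and start command are illustrative; the key point is that no NVIDIA driver is installed in the image, because Function Compute mounts the driver into the container at run time.

```dockerfile
# Base image tag is illustrative; pick a CUDA runtime image that matches
# your toolkit requirements. Do not install or bundle an NVIDIA driver:
# Function Compute mounts the driver into the container at run time.
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

WORKDIR /app
COPY requirements.txt /app/
RUN pip install -r requirements.txt   # assumes Python/pip in the base image
COPY . /app
CMD ["python", "app.py"]
```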

Do not specify a specific driver version in a container image. For more information, see Image usage notes.

What do I do if "On-demand invocation of current GPU type is disabled..." is reported on Ada-series GPU-accelerated instances?

Generally, the ResourceExhausted: On-demand invocation of current GPU type is disabled, please provision instances instead error is reported because the actual number of requests exceeds the capacity of your provisioned instances. Ada-series GPU-accelerated instances support only the provisioned mode. We recommend that you increase the number of provisioned instances based on the actual number of requests.