
Function Compute:GPU-accelerated instance FAQ

Last Updated: Jan 30, 2026

This topic describes common issues you may encounter when using GPU-accelerated instances and provides solutions.

What are the driver and CUDA versions for Function Compute GPU-accelerated instances?

The component versions for GPU-accelerated instances are divided into two parts:

  • Driver version: This includes the kernel mode driver nvidia.ko and the CUDA user mode driver libcuda.so. The drivers for Function Compute GPU-accelerated instances are provided by NVIDIA and deployed by the Function Compute platform. The driver version for GPU-accelerated instances may change in the future due to feature iterations, new card models, bug fixes, or driver lifecycle expiration. Avoid adding driver-specific content to your container image. For more information, see What do I do if the NVIDIA driver cannot be found?.

  • CUDA Toolkit version: This includes CUDA Runtime, cuDNN, and cuFFT. You determine the CUDA Toolkit version when you build the container image.

The GPU driver and CUDA Toolkit are released by NVIDIA and have specific version dependencies. For more information, see the CUDA Toolkit Release Notes for the corresponding version.

Function Compute GPU-accelerated instances currently use driver version 580.95.05, which corresponds to CUDA user mode driver version 13.0. For optimal compatibility, use a minimum CUDA Toolkit version of 11.8. The CUDA Toolkit version must not exceed the CUDA user mode driver version that is provided by the platform.
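The compatibility rule above can be expressed as a simple version check. The following is a minimal sketch, not a platform API; the version numbers are the illustrative values from this FAQ, and the helper name is hypothetical.

```python
# Sketch of the compatibility rule described above: the CUDA Toolkit
# version baked into your image must be at least the recommended minimum
# (11.8) and must not exceed the CUDA user mode driver version provided
# by the platform. Version numbers below are illustrative.

def toolkit_is_compatible(toolkit: str, driver_cuda: str, minimum: str = "11.8") -> bool:
    """Return True if `toolkit` falls within [minimum, driver_cuda]."""
    def parse(v: str) -> tuple:
        return tuple(int(x) for x in v.split("."))
    return parse(minimum) <= parse(toolkit) <= parse(driver_cuda)

# Platform currently provides CUDA user mode driver 13.0 (per this FAQ).
print(toolkit_is_compatible("12.4", "13.0"))  # True: within the supported range
print(toolkit_is_compatible("11.7", "13.0"))  # False: below the 11.8 minimum
```

Inside a running image, you can read the toolkit version from nvcc --version, or from torch.version.cuda if you use PyTorch.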

What do I do if I encounter a CUFFT_INTERNAL_ERROR during execution?

The cuFFT library in CUDA 11.7 has a known forward compatibility issue that may cause this error on newer card models. Upgrade to at least CUDA 11.8. For more information about GPU card models, see Specifications.

For example, in PyTorch, you can use the following code snippet to verify the upgrade. If no error occurs, the upgrade was successful.

import torch

# Runs an FFT on the GPU; with CUDA 11.8 or later this completes without error.
out = torch.fft.rfft(torch.randn(1000).cuda())

How do I resolve a CUDA GPG error when building an image?

When you build an image, you may encounter a GPG error. The error message is as follows.

W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease' is not signed.

To resolve this, add the following script after the RUN rm command in your Dockerfile and then rebuild the image.

RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC

Why is my GPU-accelerated instance type displayed as g1?

Setting the instance type to g1 is equivalent to setting it to fc.gpu.tesla.1. For more information, see Specifications.

Why did my provisioned GPU-accelerated instance fail to be provisioned?

A provisioned instance may fail for the following reasons:

  • Provisioned instance startup timeout

    • Error code: "FunctionNotStarted"

    • Error message: "Function instance health check failed on port XXX in 120 seconds"

    • Solution: Check your application startup logic. Look for logic that downloads models from the public network or loads Large Language Models (LLMs) larger than 10 GB. Start the web server first, and then load the model.

  • The instance quota for the function or region is reached

    • Error code: "ResourceThrottled"

    • Error message: "Reserve resource exceeded limit"

    • Solution: The default quota for physical GPU cards for a single Alibaba Cloud account in a region is 30. The actual quota is displayed in the Quota Center. If you require more physical cards, submit a request in the Quota Center.
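The first solution above, starting the web server before loading the model, can be sketched with only the Python standard library. The model loader below is a stand-in for real loading work, and the port is an arbitrary example.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

model = None  # populated in the background after the server is up

def load_model():
    """Stand-in for a slow model load (e.g. a multi-GB LLM)."""
    global model
    time.sleep(0.1)  # placeholder for real loading work
    model = "ready"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health checks succeed as soon as the server is listening,
        # even while the model is still loading.
        if self.path == "/healthz":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        elif model is None:
            self.send_response(503)  # model not yet ready
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(model.encode())

    def log_message(self, *args):  # silence per-request logging
        pass

def main(port: int = 9000) -> HTTPServer:
    server = HTTPServer(("127.0.0.1", port), Handler)
    # Start serving first so the platform health check passes ...
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # ... then load the model in the background.
    threading.Thread(target=load_model, daemon=True).start()
    return server
```

With this ordering, the health check on the listening port passes immediately, while requests that need the model return 503 until loading finishes.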

What is the size limit for a GPU image?

The image size limit applies to the compressed image, not the uncompressed image. You can view the compressed image size in the Alibaba Cloud Container Registry console. You can also run the docker images command locally to query the uncompressed image size.

Typically, an image with an uncompressed size of less than 20 GB can be deployed to Function Compute and used normally.

What do I do if GPU image acceleration fails?

Image acceleration takes longer for larger images, which can cause the process to fail due to a timeout. You can re-trigger the accelerated image conversion by editing and saving the function configuration in the Function Compute console. You do not need to change any parameters.

Should the model be packaged in the image or separated from it?

If your model file is large, iterates frequently, or exceeds the platform's image size limit when published with the image, separate the model from the image. If you choose to separate the model from the image, you can store the model in a NAS or OSS file system. For more information, see Best practices for model storage on GPU-accelerated instances.

How do I perform model warm-up, and are there any best practices?

Perform model warm-up in the /initialize method. The instance starts receiving production traffic only after the /initialize method is complete. For more information, see Function instance lifecycle.

Why does my GPU image fail to start with the error "[FunctionNotStarted] Function Instance health check failed on port xxx in 120 seconds"?

  • Cause: The AI/GPU application takes too long to start, which causes the health check on the Function Compute (FC) platform to fail. A common reason for the long startup time is that loading the model takes too long, which causes the web server to time out.

  • Solution:

    • Do not dynamically load models from the public network during application startup. Place the model in the image or in File Storage NAS to load it from a nearby location.

    • Place the model initialization in the /initialize method. This allows the web server to start before the model begins to load.

      Note

      For more information about the function instance lifecycle, see Function instance lifecycle.
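The /initialize pattern above can be sketched as an HTTP handler that performs warm-up work and returns only when the model is ready. This is a minimal illustration, not the platform's runtime interface; the warm_up body is a placeholder.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Minimal sketch: warm-up work lives in the /initialize handler, which
# the platform invokes before routing production traffic to the instance.

state = {"warm": False}

def warm_up():
    """Placeholder for real warm-up: load weights, run a dummy inference."""
    state["warm"] = True

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/initialize":
            warm_up()  # blocks until the model is ready to serve
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"initialized")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass
```

Because /initialize returns only after warm_up completes, the first production request never races against model loading.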

My function has high and fluctuating end-to-end latency. How can I fix this?

  1. First, confirm that the image acceleration status in the environment context is active.

  2. Confirm the type of your NAS file system. If your function needs to read data from a NAS file system, such as a model, use a compute-optimized General-purpose NAS file system for better performance. Do not use a storage-optimized file system. For more information, see General-purpose NAS.
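A quick way to check whether slow NAS reads contribute to end-to-end latency is to time the model read directly from the mounted path. The sketch below uses only the standard library; the NAS path in the commented example is a placeholder.

```python
import time

def measure_read(path: str, chunk_size: int = 1 << 20) -> tuple:
    """Read `path` sequentially in chunks; return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start

# Example: time a model file on a mounted NAS path (placeholder path).
# size, secs = measure_read("/mnt/nas/model.bin")
# print(f"{size / 1e6:.1f} MB in {secs:.2f}s -> {size / 1e6 / secs:.1f} MB/s")
```

If the measured throughput is far below what your NAS type should deliver, the file system type or mount configuration is a likely culprit.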

What do I do if the NVIDIA driver cannot be found?

When you start a container with the docker run --gpus all command and then build an application image from it using docker commit, the resulting image contains local NVIDIA driver files. These files prevent the driver from mounting correctly after the image is deployed to Function Compute, so the system cannot find the NVIDIA driver.

To resolve this issue, build your application image using a Dockerfile. For more information, see Dockerfile.

Also, do not add driver-related components to the image, and avoid making your application dependent on a specific driver version. For example, do not include libcuda.so, which provides the CUDA Driver API, in your image. This dynamic library is strongly tied to the device kernel driver version. A mismatch between this type of dynamic library in the image and the host environment can cause abnormal application behavior due to compatibility issues.
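One way to catch accidentally bundled driver libraries before deployment is to scan the usual library directories during the image build. The sketch below is an assumption-laden helper, not a platform tool: the library name prefixes and directory list are common defaults, not an exhaustive set.

```python
import os

# Driver user mode libraries that should NOT be baked into the image.
# Prefix list is a common default (libcuda.so, libnvidia-ml.so), not exhaustive.
DRIVER_LIB_PREFIXES = ("libcuda.so", "libnvidia-ml.so")

def find_driver_libs(roots=("/usr/lib", "/usr/lib64", "/usr/local/lib")):
    """Walk the given directories and return paths of driver libraries found."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.startswith(DRIVER_LIB_PREFIXES):
                    hits.append(os.path.join(dirpath, name))
    return hits

# In a Dockerfile you could run this as a build step (hypothetical script
# name) and fail the build if any hits are found:
# RUN python3 check_driver_libs.py
```

An empty result suggests the image relies on the platform-injected driver components, which is the intended setup.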

When a function instance is created, the Function Compute platform injects user mode components related to the driver into the container. These components match the driver version provided by the platform. This matches the behavior of GPU container virtualization technologies such as NVIDIA Container Runtime, which delegate driver-specific tasks to the platform resource provider to maximize the environmental adaptability of GPU container images. The drivers for Function Compute GPU-accelerated instances are provided by NVIDIA. The driver version for GPU-accelerated instances may change in the future due to feature iterations, new card models, bug fixes, or driver lifecycle expiration.

If you are already using GPU container virtualization technologies such as NVIDIA Container Runtime, avoid creating images with the docker commit command. Such images will contain the injected driver-related components. When you use such an image on the Function Compute platform, a version mismatch with the platform's components may cause undefined behavior, such as application exceptions.

What should I do if on-demand calls fail to create GPU-accelerated instances and return "ResourceExhausted" or "ResourceThrottled" errors?

Because GPU resources are scarce, on-demand calls are affected by fluctuations in the resource pool. This may prevent instances from being created in time to serve requests. For predictable resource delivery, configure scaling rules for your function to reserve resources. For more information about billing for provisioned instances, see Billing overview.