This topic provides answers to some commonly asked questions about GPU-accelerated instances.
What are the driver and CUDA versions of GPU-accelerated instances in Function Compute?
The following items list the versions of the main components of GPU-accelerated instances:
Driver versions: Drivers include kernel-mode drivers (KMD), such as nvidia.ko, and CUDA user-mode drivers (UMD), such as libcuda.so. The drivers used by GPU-accelerated instances of Function Compute are provided by NVIDIA and deployed by Function Compute. The driver versions used by GPU-accelerated instances may change as a result of iterations, releases of new card models, bug fixes, and driver lifecycle expiration. We recommend that you do not specify a specific driver version in container images. For more information, see Image usage notes.
CUDA Toolkit versions: CUDA Toolkit includes various components, such as CUDA Runtime, cuDNN, and cuFFT. The CUDA Toolkit version is determined by the container image you use.
GPU drivers and CUDA Toolkit are released by NVIDIA and related to each other. For more information, see NVIDIA CUDA Toolkit Release Notes.
The current driver version of GPU-accelerated instances in Function Compute is 550.54.15, and the version of the CUDA module is 12.4. For best compatibility, we recommend that you use CUDA Toolkit 11.8 or later and do not use a CUDA UMD whose version is later than that of the platform.
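As a quick illustration of the version rule above, the following sketch compares the CUDA UMD version bundled in an image against the platform's CUDA 12.4. The helper names are illustrative, not a Function Compute API; the platform values are taken from the text above.

```python
# Hypothetical helper: verify that the CUDA user-mode driver (UMD) version
# in your image does not exceed the platform's CUDA version (12.4 at the
# time of writing, per the text above).

def parse_version(v: str) -> tuple:
    """Turn a dotted version string such as '12.4' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

PLATFORM_CUDA = "12.4"  # CUDA version supported by the platform driver

def umd_is_compatible(image_cuda: str) -> bool:
    """A CUDA UMD newer than the platform's CUDA version may fail to load."""
    return parse_version(image_cuda) <= parse_version(PLATFORM_CUDA)

print(umd_is_compatible("11.8"))  # True: 11.8 <= 12.4
print(umd_is_compatible("12.6"))  # False: newer than the platform
```

The tuple comparison handles multi-digit components (for example, 12.10 vs 12.4) correctly, which a plain string comparison would not.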
What do I do if CUFFT_INTERNAL_ERROR is reported during a function execution?
The cuFFT library in CUDA 11.7 has forward compatibility issues. If you encounter this error on GPU card types newer than the Ampere architecture, we recommend that you upgrade CUDA to 11.8 or later. For more information about GPU card types, see Instance types and usage modes.
Take PyTorch as an example. After the upgrade, you can run the following code snippet for verification. If no error is reported, the upgrade took effect.
import torch
out = torch.fft.rfft(torch.randn(1000).cuda())
What do I do if a CUDA GPG error is reported when I build an image?
The following GPG error is reported during the image building process:
W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is not signed.
In this case, add the following command after the RUN rm line in your Dockerfile and rebuild the image.
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC
Why is the instance type of my GPU-accelerated instance g1?
The g1 instance type is the same as fc.gpu.tesla.1. For more information, see Instance specifications.
Why do my provisioned GPU-accelerated instances fail to be allocated?
The allocation of provisioned instances may fail due to the following reasons:
What is the limit on the size of a GPU image?
The image size limit applies only to compressed images. You can view the size of a compressed image in the Container Registry console, and run the docker images command to query the size of an uncompressed image.
In most cases, an uncompressed image that is smaller than 20 GB in size can be deployed to Function Compute and used as expected.
What do I do if a GPU image fails to be converted to an accelerated image?
The time required to convert an image increases as the size of your image grows. A timeout error may cause a conversion failure. You can re-trigger the acceleration conversion of the GPU image by configuring and saving the function configurations in the Function Compute console. You do not need to modify the parameters if you want to retain existing settings.
Should a model be integrated into or separated from an image?
If your model files are large, frequently iterated, or exceed the size limit on images when they are published with an image, we recommend that you separate the model from the image. If the model is small in size, for example, 100 MB, and not frequently changed, you can distribute the model file with the image. For more information about the limits on the size of a platform image, see What is the limit on the size of a GPU image?
To deploy a model separately from an image, you can store the model in a File Storage NAS (NAS) file system or an Object Storage Service (OSS) file system. The model is loaded from the mount target when the application starts. For more information, see Configure a NAS file system and Configure an OSS file system.
We recommend that you store models in a Performance NAS file system, which is compatible with Portable Operating System Interface (POSIX) and has a high initial read bandwidth that helps reduce the time required to load your model. For more information, see General-purpose NAS file systems.
You can also store models in OSS buckets. This way, you can use OSS accelerators to achieve lower latency and higher throughput. If you mount an OSS file system, requests are processed in user mode, which consumes instance memory and temporary storage space. Therefore, we recommend that you use OSS on GPU-accelerated instances with large specifications.
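The separation approach above can be sketched as follows. The mount path and file name are hypothetical stand-ins (the demo writes a dummy file to a temporary directory so the snippet runs anywhere), and a real loader would deserialize the weights, for example with torch.load, instead of reading raw bytes.

```python
import os
import tempfile
import time

def load_model(model_dir: str) -> bytes:
    """Read model weights from a mount target (e.g. a NAS or OSS mount)
    instead of baking them into the container image."""
    path = os.path.join(model_dir, "model.bin")  # illustrative file name
    if not os.path.exists(path):
        raise FileNotFoundError(f"model not found at {path}; check the mount")
    start = time.time()
    with open(path, "rb") as f:
        weights = f.read()  # a real loader would deserialize here
    print(f"loaded {len(weights)} bytes in {time.time() - start:.3f}s")
    return weights

# Demo: a temporary directory stands in for the real mount target.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "model.bin"), "wb") as f:
    f.write(b"\x00" * 1024)  # dummy weights
weights = load_model(demo_dir)
```

On a real instance you would pass the configured mount path rather than a temporary directory, and the initial read bandwidth of the file system dominates the load time.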
How do I perform a model warm-up?
We recommend that you warm up your model in the /initialize method. Production traffic is directed to the model only after the warm-up performed in /initialize is complete.
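As a sketch of this warm-up flow, the following minimal HTTP handler performs the heavy loading when the platform sends POST /initialize and rejects inference requests until that has happened. MODEL and warm_up are illustrative names, not a platform API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL = {"ready": False}  # stand-in for the loaded model object

def warm_up():
    """Load the model and run one dummy inference so that the first real
    request does not pay the loading cost."""
    # e.g. model = torch.load("/mnt/nas/models/model.pt"); model(dummy_input)
    MODEL["ready"] = True

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/initialize":
            warm_up()  # the platform calls /initialize before routing traffic
            self._reply(200, {"status": "initialized"})
        elif not MODEL["ready"]:
            self._reply(503, {"error": "not initialized"})
        else:
            self._reply(200, {"result": "inference output"})

    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

# To serve for real: HTTPServer(("0.0.0.0", 9000), Handler).serve_forever()
```

Because /initialize returns only after warm_up completes, the first production request already hits a loaded model.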
What do I do if the "[FunctionNotStarted] Function Instance health check failed on port xxx in 120 seconds" error is reported when I start a GPU image?
Cause: AI/GPU applications take a long time to start, which causes the health check in the Function Compute console to fail. In most cases, startup is slow because model loading takes a long time, which in turn causes the startup of the web server to time out.
Solution:
Do not dynamically load the model over the Internet when the application starts. We recommend that you include the model in the image or store it in a NAS file system, and load it from the local or mounted path.
Place model initialization in the /initialize method so that the application itself starts first. Specifically, load the model after the web server has started.
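The "server first, model afterwards" ordering can be sketched with a plain socket and a background thread. The port choice and the 0.1-second sleep are stand-ins; in a real function the slow load would run in the /initialize handler or a background task.

```python
import socket
import threading
import time

model_ready = threading.Event()

def slow_model_load():
    """Stands in for minutes of real model loading."""
    time.sleep(0.1)
    model_ready.set()

def open_server_port() -> socket.socket:
    """Bind and listen immediately: the platform's health check only needs
    the port to be open, not the model to be loaded."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # 0 = any free port, for this demo
    srv.listen()
    return srv

srv = open_server_port()                         # port open: health check passes
threading.Thread(target=slow_model_load, daemon=True).start()
model_ready.wait(timeout=5)                      # requests block until ready
print("model ready:", model_ready.is_set())
srv.close()
```

Binding the port before loading the model keeps the health check within its time limit even when loading takes minutes.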
What do I do if the end-to-end latency of my function is large and fluctuates greatly?
Make sure that the state of image acceleration is Available in the environment information.
Check the type of the NAS file system. If your function needs to read data, such as a model, from a NAS file system, we recommend that you use a Performance NAS file system instead of a Capacity NAS file system to ensure performance. For more information, see General-purpose NAS file systems.
What do I do if the system fails to find the NVIDIA driver?
This issue occurs when you run the docker run --gpus all command to start a container and then use docker commit to build the application image. As a result, the driver cannot be mounted and the NVIDIA driver cannot be found after the image is deployed to Function Compute.
To resolve the issue, we recommend that you use Dockerfile to build an application image. For more information, see Dockerfile.
Do not specify a specific driver version in a container image. For more information, see Image usage notes.
What do I do if "On-demand invocation of current GPU type is disabled..." is reported on Ada-series GPU-accelerated instances?
Generally, ResourceExhausted:On-demand invocation of current GPU type is disabled, please provision instances instead is reported because the actual number of requests exceeds the maximum number that your provisioned instances can handle. GPU-accelerated instances of the Ada card type support only the provisioned mode. We recommend that you increase the number of provisioned instances based on the actual request volume.
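To estimate how many provisioned instances you need, a back-of-the-envelope calculation based on Little's law can help. The formula and the numbers below are illustrative assumptions, not platform guidance; measure your own peak QPS and latency.

```python
import math

def provisioned_instances(peak_qps: float, avg_latency_s: float,
                          per_instance_concurrency: int) -> int:
    """Little's law: concurrent requests = QPS x latency; divide by how
    many requests one instance can handle at a time, rounding up."""
    concurrent = peak_qps * avg_latency_s
    return math.ceil(concurrent / per_instance_concurrency)

# e.g. 20 requests/s, 1.5 s inference latency, 1 request per instance:
print(provisioned_instances(20, 1.5, 1))  # 30 instances
```

Rounding up and adding headroom for traffic spikes avoids hitting the on-demand-disabled error at peak load.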
What are the usage notes for idle GPU instances?
CUDA version
We recommend that you use CUDA 12.2 or an earlier version.
Image permissions
We recommend that you run container images as the default root user.
Instance logon
You cannot log on to an idle GPU-accelerated instance because the GPU cards are frozen.
Model warmup and pre-inference
To ensure that the latency of the initial wake-up of an idle GPU-accelerated instance meets your business requirements, we recommend that you use the /initialize hook in your business code to warm up or preload your model. For more information, see Model warm-up.