Automatically install or load the Tesla driver when you create a GPU-accelerated instance

In general-purpose computing and graphics acceleration scenarios, GPU-accelerated instances can provide enhanced computing and graphics rendering capabilities after you install the NVIDIA Tesla driver on the instances. You can configure parameters to automatically install or load the Tesla driver when you create a GPU-accelerated instance. You can also manually install the Tesla driver after a GPU-accelerated instance is created. This topic describes how to automatically install or load the Tesla driver when you create a GPU-accelerated instance.

Driver installation methods

The following table describes the methods that can be used to automatically install or load the Tesla driver. You can choose a method based on the performance requirements in general-purpose computing and graphics acceleration scenarios.

Method	Description	References
Public image	When you create a GPU-accelerated instance, select a public image and Auto-install GPU Driver.	Automatically install the driver by using a public image
Automatic installation script	When you create a GPU-accelerated instance, do not select Auto-install GPU Driver in the Image section. Instead, enter an automatic installation script in the User Data field to install the Tesla driver.	Install the driver by using an automatic installation script

Automatically install the driver by using a public image

You can select Auto-install GPU Driver only for specific Linux public images. If you use a public image and select Auto-install GPU Driver, the system automatically installs the Tesla driver when you create a GPU-accelerated instance.

Go to the instance buy page in the Elastic Compute Service (ECS) console.
Click the Custom Launch tab.

Configure parameters for the instance based on your business requirements. The parameters include Billing Method, Region, Network and Zone, Instance Type, and Image.

This section describes how to configure the Instance Type and Image parameters. For more information about other parameters, see Parameter settings. The following table lists the instance families of GPU-accelerated instances for which you can install the Tesla driver when you create the instances, the supported image versions, and the corresponding driver versions.

Note

The Tesla driver drives physical GPUs and can be used together with the CUDA and cuDNN libraries to improve GPU utilization. The CUDA and cuDNN libraries are installed together with the Tesla driver. To keep your system up to date, we recommend that you use the latest versions of the Tesla driver, CUDA library, and cuDNN library.

Instance family	Public image version	Tesla driver version	CUDA library version	cuDNN library version
gn7e, gn7s, gn7i, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7e, ebmgn7i, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i; gn8is, ebmgn8is, gn8v, and ebmgn8v	Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 22.04, Ubuntu 20.04, and Ubuntu 18.04; CentOS 8.x and CentOS 7.x. Note: ebmgn8v and ebmgn7e do not support Ubuntu 18.04 images.	550.127.08	12.4.1	9.2.0.82
gn7e, gn7s, gn7i, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7e, ebmgn7i, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i; gn8is and ebmgn8is	Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 20.04 and Ubuntu 18.04; CentOS 8.x and CentOS 7.x. Note: ebmgn7e does not support Ubuntu 18.04 images.	535.216.03	12.1.1	8.9.7.29
gn7i, gn7e, gn7s, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7i, ebmgn7e, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i	Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x; Debian 10.10. Note: ebmgn7e does not support Ubuntu 18.04 and Ubuntu 20.04 images.	470.256.02	11.4.1	8.2.4
gn7, gn7i, gn7e, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7i, ebmgn7e, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i	Alibaba Cloud Linux 2; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x	460.91.03	11.2.2	8.1.1
gn7, gn7e, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7e, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i	Alibaba Cloud Linux 2; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x	460.91.03	11.0.2	8.1.1, 8.0.4
gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i	Alibaba Cloud Linux 2; Ubuntu 18.04 and Ubuntu 16.04; CentOS 8.x and CentOS 7.x	460.91.03	10.2.89	8.1.1, 8.0.4, 7.6.5
gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i	Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x	450.80.02, 440.64.00	10.1.168	8.0.4, 7.6.5, 7.5.0
gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i	Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x	450.80.02, 440.64.00	10.0.130	7.6.5, 7.5.0, 7.4.2, 7.3.1

Important

- To change the operating system of an instance after the instance is created, you must use a public image that supports automatic installation of the Tesla driver. If you use a public image that does not support automatic installation of the Tesla driver, you must disable automatic installation of the Tesla driver for the instance before you change the operating system. For more information, see How do I disable the automatic installation feature of the Tesla driver when I replace the operating system of a GPU-accelerated instance?
- If you installed PyTorch 2.1.2 by using pip3 install torch, you must install CUDA 12.1. Otherwise, an error is thrown when you use PyTorch. For more information, see What do I do if the "undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12" error message appears when I use PyTorch?

In this example, a gn7i instance is used. On the Public Images tab of the Image section, select a Linux distribution and version, such as Alibaba Cloud Linux 3.2104 LTS 64-bit. Then, select Auto-install GPU Driver, a CUDA library version, a driver version, and a cuDNN library version. This way, the system automatically installs the Tesla driver when you create the GPU-accelerated instance.

After the instance is created or started, take note of the following information about the Tesla driver:

The system requires approximately 10 to 20 minutes to automatically install the Tesla driver. The duration varies based on the private bandwidth and the number of vCPUs supported by the instance type. To view the installation process, you can connect to the instance. You can also check the installation log in /root/auto_install/auto_install.log after the installation is complete. The following table describes the information displayed during the installation process.

Installation state	Displayed information
Installing	The installation progress bar appears.
Installed	The installation result ALL INSTALL OK appears.
Installation failed	The installation result INSTALL FAIL appears.

Important

Do not perform operations on the instance during the installation process, because the GPU is unavailable until the installation is complete. If specific GPU-related software fails to be automatically installed, the instance may become unavailable.

Follow the on-screen instructions to complete the payment.
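
The installation states described in the preceding table can also be checked from the installation log after you connect to the instance. The following is a minimal sketch, assuming that the result strings ALL INSTALL OK and INSTALL FAIL are written to /root/auto_install/auto_install.log as plain text. The function name install_status is hypothetical and is not part of the installation tooling.

```shell
#!/bin/sh
# Hypothetical helper: report the Tesla driver installation state from a log file.
# Assumption: the log contains the result strings shown in the installation state table.
install_status() {
    log_file="${1:-/root/auto_install/auto_install.log}"
    if [ ! -f "$log_file" ]; then
        # No log file yet, so nothing can be said about the installation.
        echo "unknown"
    elif grep -q "ALL INSTALL OK" "$log_file"; then
        echo "installed"
    elif grep -q "INSTALL FAIL" "$log_file"; then
        echo "failed"
    else
        # A log exists but holds no final result yet.
        echo "installing"
    fi
}
```

Call install_status without arguments to read the default log path, or pass a different path as the first argument.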

Install the driver by using an automatic installation script

If you do not select Auto-install GPU Driver in the Image section when you create a GPU-accelerated instance, you can enter an automatic installation script in the User Data field to install the Tesla driver.

Parameters in an automatic installation script

If you use an automatic installation script, you must modify the following parameters based on your business requirements.

Change the versions of the Tesla driver, CUDA library, and cuDNN library based on the instance family and image that you use. For more information about the supported versions, see the table in the Automatically install the driver by using a public image section.

In this example, the Tesla driver version is changed to 550.127.08, the CUDA library version is changed to 12.4.1, and the cuDNN library version is changed to 9.2.0.82. Sample code:

DRIVER_VERSION="550.127.08"
CUDA_VERSION="12.4.1"
CUDNN_VERSION="9.2.0.82"
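
Because the three versions are expected to come from the same row of the version table, it can help to check the combination before you paste it into the script. The following sketch hard-codes only the three newest rows of that table. The function name is_supported_combo is hypothetical, and the check does not cover the instance family or image.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the driver, CUDA, and cuDNN versions
# come from the same row of the version table.
# Only the three newest rows are encoded here; extend the list as needed.
is_supported_combo() {
    case "$1/$2/$3" in
        550.127.08/12.4.1/9.2.0.82) return 0 ;;
        535.216.03/12.1.1/8.9.7.29) return 0 ;;
        470.256.02/11.4.1/8.2.4)    return 0 ;;
        *) return 1 ;;
    esac
}

DRIVER_VERSION="550.127.08"
CUDA_VERSION="12.4.1"
CUDNN_VERSION="9.2.0.82"

if is_supported_combo "$DRIVER_VERSION" "$CUDA_VERSION" "$CUDNN_VERSION"; then
    echo "supported combination"
else
    echo "combination not found in the table"
fi
```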

Procedure

Go to the instance buy page in the ECS console.
Click the Custom Launch tab.
Configure parameters for the instance based on your business requirements. The parameters include Billing Method, Region, Network and Zone, Instance Type, Image, and User Data.
For more information about the parameters, see Parameter settings.

In the User Data field of the Advanced Settings (Optional) section, enter the automatic installation script that you prepared.

For information about how to prepare the script, see Parameters in an automatic installation script.

In this example, the script uses the .run installation package to install modules, such as the Tesla driver. Sample script:

#!/bin/sh

# Versions to install. Change these values based on your instance family and image.
DRIVER_VERSION="550.127.08"
CUDA_VERSION="12.4.1"
CUDNN_VERSION="9.2.0.82"
IS_INSTALL_eRDMA="FALSE"
IS_INSTALL_RDMA="FALSE"
INSTALL_DIR="/root/auto_install"

# Use the .run installation package to install the driver and CUDA.
auto_install_script="auto_install_v4.0.sh"

# Build the download URL from the source address returned by the instance metadata service.
script_download_url=$(curl http://100.100.100.200/latest/meta-data/source-address | head -1)"/opsx/ecs/linux/binary/script/${auto_install_script}"
echo "$script_download_url"

# Recreate the installation directory, then download and run the installation script.
rm -rf "$INSTALL_DIR"
mkdir -p "$INSTALL_DIR"
cd "$INSTALL_DIR" && wget -t 10 --timeout=10 "$script_download_url" && bash "${INSTALL_DIR}/${auto_install_script}" "$DRIVER_VERSION" "$CUDA_VERSION" "$CUDNN_VERSION" "$IS_INSTALL_RDMA" "$IS_INSTALL_eRDMA"

Follow the on-screen instructions to complete the payment.
Note
- If you call the RunInstances operation to create a GPU-accelerated instance, you can install the Tesla driver only by using the UserData parameter to upload the automatic installation script. For more information, see RunInstances.
- If the system does not automatically install the Tesla driver when you create a GPU-accelerated instance, you can run an automatic installation script after the instance is created to install software, such as the Tesla driver. To install software, you must log on to the instance by using SSH, create a file on the instance, copy your automatic installation script to the instance, and then run the script as a shell script. For more information about how to connect to an instance, see Methods for connecting to an ECS instance.
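
When the script is passed through the UserData parameter of RunInstances, the script content is expected to be Base64-encoded. The following is a minimal sketch of the encoding step, assuming a POSIX shell with the base64 utility and a local file that holds the script. The function name encode_user_data and the file name in the usage note are hypothetical. Check the RunInstances reference for the exact requirements, such as size limits.

```shell
#!/bin/sh
# Hypothetical helper: print the Base64 encoding of a script file on a single line.
# tr removes the line wraps that some base64 implementations insert.
encode_user_data() {
    base64 < "$1" | tr -d '\n'
}
```

For example, USER_DATA=$(encode_user_data auto_install_userdata.sh) stores the encoded script in a variable that can then be supplied as the UserData value.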

References

If the system does not automatically install or load the Tesla driver when you create a GPU-accelerated compute-optimized instance in general-purpose computing and graphics acceleration scenarios, you must install the driver after the instance is created. For more information, see the following topics:
