In general-purpose computing and graphics acceleration scenarios, GPU-accelerated instances can provide enhanced computing and graphics rendering capabilities after you install the NVIDIA Tesla driver on the instances. You can configure parameters to automatically install or load the Tesla driver when you create a GPU-accelerated instance. You can also manually install the Tesla driver after a GPU-accelerated instance is created. This topic describes how to automatically install or load the Tesla driver when you create a GPU-accelerated instance.
Driver installation methods
The following table describes the methods that can be used to automatically install or load the Tesla driver. You can choose a method based on the performance requirements in general-purpose computing and graphics acceleration scenarios.
Method | Description | References |
Public image | When you create a GPU-accelerated instance, select a public image and select Auto-install GPU Driver. | Automatically install the driver by using a public image |
Automatic installation script | When you create a GPU-accelerated instance, do not select Auto-install GPU Driver in the Image section. Instead, enter an automatic installation script in the field in the User Data part to install the Tesla driver. | Install the driver by using an automatic installation script |
Automatically install the driver by using a public image
You can select Auto-install GPU Driver only for specific Linux public images. If you use a public image and select Auto-install GPU Driver, the system automatically installs the Tesla driver when you create the GPU-accelerated instance.
Go to the instance buy page in the Elastic Compute Service (ECS) console.
Click the Custom Launch tab.
Configure parameters for the instance based on your business requirements. The parameters include Billing Method, Region, Network and Zone, Instance Type, and Image.
This section describes how to configure the Instance Type and Image parameters. For more information about other parameters, see Parameter settings. The following table lists the instance families of GPU-accelerated instances for which you can install the Tesla driver when you create the instances, the supported image versions, and the corresponding driver versions.
Note: The Tesla driver is used to drive physical GPUs and can be used together with the CUDA and cuDNN libraries to improve GPU utilization. The CUDA and cuDNN libraries are installed together with the Tesla driver. To keep your system up-to-date, we recommend that you use the latest versions of the Tesla driver, CUDA library, and cuDNN library.
Instance family | Public image version | Tesla driver version | CUDA library version | cuDNN library version |
gn7e, gn7s, gn7i, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7e, ebmgn7i, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 22.04, Ubuntu 20.04, and Ubuntu 18.04; CentOS 8.x and CentOS 7.x. Note: ebmgn7e does not support images of Ubuntu 18.04. | 550.90.07 | 12.4.1 | 9.2.0.82 |
gn7e, gn7s, gn7i, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7e, ebmgn7i, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 20.04 and Ubuntu 18.04; CentOS 8.x and CentOS 7.x. Note: ebmgn7e does not support images of Ubuntu 18.04. | 535.154.05 | 12.1.1 | 8.9.7.29 |
gn7e, gn7s, gn7i, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7i, ebmgn7e, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 20.04 and Ubuntu 18.04; CentOS 8.x and CentOS 7.x | 525.105.17 | 12.0.1 | 8.9.1.23 |
gn7i, gn7e, gn7s, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7i, ebmgn7e, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x; Debian 10.10 | 470.161.03 | 11.4.1 | 8.2.4 |
gn7, gn7i, gn7e, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7i, ebmgn7e, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Alibaba Cloud Linux 2; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x | 460.91.03 | 11.2.2 | 8.1.1 |
gn7, gn7e, gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn7, ebmgn7e, ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Alibaba Cloud Linux 2; Ubuntu 20.04, Ubuntu 18.04, and Ubuntu 16.04; CentOS 8.x and CentOS 7.x | 460.91.03 | 11.0.2 | 8.1.1, 8.0.4 |
gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Alibaba Cloud Linux 2; Ubuntu 18.04 and Ubuntu 16.04; CentOS 8.x and CentOS 7.x | 460.91.03 | 10.2.89 | 8.1.1, 8.0.4, 7.6.5 |
gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x | 450.80.02, 440.64.00 | 10.1.168 | 8.0.4, 7.6.5, 7.5.0 |
gn6v, gn6i, gn6e, gn5, and gn5i; ebmgn6v, ebmgn6i, ebmgn6e, and ebmgn5i | Ubuntu 18.04 and Ubuntu 16.04; CentOS 7.x | 450.80.02, 440.64.00 | 10.0.130 | 7.6.5, 7.5.0, 7.4.2, 7.3.1 |
Important: To change the OS of an instance after the instance is created, you must use a public image that supports automatic installation of the Tesla driver. If you use a public image that does not support automatic installation of the Tesla driver, you must disable automatic installation of the Tesla driver for the instance before you change the OS. For more information, see How do I disable the automatic installation feature of the Tesla driver when I replace the operating system of a GPU-accelerated instance?
If you installed PyTorch 2.1.2 by running pip3 install torch, you must install CUDA 12.1. Otherwise, an error is thrown when you use PyTorch. For more information, see What do I do if the "undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12" error message appears when I use PyTorch?
In this example, a gn7i instance is used. On the Public Images tab of the Image section, select a Linux distribution and version, such as Alibaba Cloud Linux 3.2104 LTS 64-bit. Then, select Auto-install GPU Driver, and select a CUDA library version, driver version, and cuDNN library version. This way, the system automatically installs the Tesla driver when you create the GPU-accelerated instance.
After the instance is created or started, take note of the following information about the Tesla driver:
The system requires approximately 10 to 20 minutes to automatically install the Tesla driver. The duration varies based on the private bandwidth and the number of vCPUs supported by the instance type. To view the installation process, you can connect to the instance. You can also check the installation log in /root/auto_install/auto_install.log after the installation is complete. The following table describes the information displayed during the installation process.
Installation state | Displayed information |
Installing | The installation progress bar appears. |
Installed | The installation result ALL INSTALL OK appears. |
Installation failed | The installation result INSTALL FAIL appears. |
Important: Do not perform operations on the instance during the installation process, because the GPU is unavailable until the installation is complete. If specific GPU-related software fails to be automatically installed, the instance may become unavailable.
Follow the on-screen instructions to complete the payment.
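The log check described above can be scripted. The following is a minimal sketch, not an official tool: the function name check_install_status is hypothetical, and the sketch assumes that the result strings ALL INSTALL OK and INSTALL FAIL shown on screen are also written verbatim to /root/auto_install/auto_install.log.

```shell
#!/bin/sh
# Report the auto-install state by scanning the installation log.
# Assumption: the result markers named in this topic ("ALL INSTALL OK",
# "INSTALL FAIL") also appear verbatim in the log file.
# check_install_status is a hypothetical helper, not part of the vendor tooling.
check_install_status() {
    log_file="${1:-/root/auto_install/auto_install.log}"

    # No log file yet: the installation has not started or the path differs.
    if [ ! -f "$log_file" ]; then
        echo "NO_LOG"
        return 2
    fi

    # Check the failure marker first so that a failed run is never reported as OK.
    if grep -q "INSTALL FAIL" "$log_file"; then
        echo "FAILED"
        return 1
    fi

    if grep -q "ALL INSTALL OK" "$log_file"; then
        echo "INSTALLED"
        return 0
    fi

    # The log exists but contains no result marker: still in progress.
    echo "INSTALLING"
    return 3
}
```

After you connect to the instance, you could source this file and run check_install_status without arguments to read the default log path, or pass a different path as the first argument.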
Install the driver by using an automatic installation script
If you do not select Auto-install GPU Driver in the Image section when you create a GPU-accelerated instance, you can enter an automatic installation script in the field in the User Data part to install the Tesla driver.
Parameters in an automatic installation script
If you use an automatic installation script, you must modify the following parameters based on your business requirements.
Change the versions of the Tesla driver, CUDA library, and cuDNN library based on the instance family and image that you use. For more information about the supported versions, see the table in the Automatically install the driver by using a public image section.
In this example, the Tesla driver version is changed to 470.161.03, the CUDA library version is changed to 11.4.1, and the cuDNN library version is changed to 8.2.4. Sample code:
DRIVER_VERSION="470.161.03"
CUDA_VERSION="11.4.1"
CUDNN_VERSION="8.2.4"
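To reduce the chance of pairing incompatible versions, the three values can be derived from a single CUDA version. The following sketch uses a hypothetical helper, lookup_versions, that hardcodes only the rows of the version table in this topic that list exactly one driver version and one cuDNN version. It is an illustration of the lookup, not a complete compatibility check, and it does not verify the instance family or image.

```shell
#!/bin/sh
# Print "<driver version> <cuDNN version>" for a CUDA version, using the
# single-valued rows of the version table in this topic.
# lookup_versions is a hypothetical helper name, not part of the vendor script.
lookup_versions() {
    case "$1" in
        12.4.1) echo "550.90.07 9.2.0.82" ;;
        12.1.1) echo "535.154.05 8.9.7.29" ;;
        12.0.1) echo "525.105.17 8.9.1.23" ;;
        11.4.1) echo "470.161.03 8.2.4" ;;
        11.2.2) echo "460.91.03 8.1.1" ;;
        *)
            # Rows with several driver or cuDNN choices are not covered here.
            echo "unknown or multi-valued CUDA version: $1" >&2
            return 1
            ;;
    esac
}

# Example of filling the script parameters from one CUDA version:
#   CUDA_VERSION="11.4.1"
#   set -- $(lookup_versions "$CUDA_VERSION")
#   DRIVER_VERSION="$1"
#   CUDNN_VERSION="$2"
```

For the sample values above, lookup_versions 11.4.1 prints 470.161.03 8.2.4.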
Procedure
Go to the instance buy page in the ECS console.
Click the Custom Launch tab.
Configure parameters for the instance based on your business requirements. The parameters include Billing Method, Region, Network and Zone, Instance Type, Image, and User Data.
For more information about the parameters, see Parameter settings.
In the field in the User Data part of the Advanced Settings(Optional) section, enter the automatic installation script that you prepared.
For more information about how to prepare the automatic installation script, see Parameters in an automatic installation script.
In this example, the script uses the .run installation package to install modules, such as the Tesla driver. Sample script:
#!/bin/sh

#Please input version to install
DRIVER_VERSION="550.90.07"
CUDA_VERSION="12.4.1"
CUDNN_VERSION="9.2.0.82"
IS_INSTALL_eRDMA="FALSE"
IS_INSTALL_RDMA="FALSE"
INSTALL_DIR="/root/auto_install"

#using .run to install driver and cuda
auto_install_script="auto_install_v4.0.sh"

script_download_url=$(curl http://100.100.100.200/latest/meta-data/source-address | head -1)"/opsx/ecs/linux/binary/script/${auto_install_script}"
echo $script_download_url

rm -rf $INSTALL_DIR
mkdir -p $INSTALL_DIR
cd $INSTALL_DIR && wget -t 10 --timeout=10 $script_download_url && bash ${INSTALL_DIR}/${auto_install_script} $DRIVER_VERSION $CUDA_VERSION $CUDNN_VERSION $IS_INSTALL_RDMA $IS_INSTALL_eRDMA
Follow the on-screen instructions to complete the payment.
Note: If you call the RunInstances operation to create a GPU-accelerated instance, you can install the Tesla driver only by using the UserData parameter to upload the automatic installation script. For more information, see RunInstances.
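When the script is passed through the UserData parameter of an API call, it typically has to be supplied as a Base64-encoded string rather than as plain text. Treat that as an assumption and confirm the exact encoding and size limits in the RunInstances reference. The following sketch shows one portable way to produce a single-line Base64 string from a script file. The function name encode_user_data and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Encode a script file as a single-line Base64 string.
# Assumption: the UserData parameter expects Base64-encoded content;
# verify this against the RunInstances reference before use.
# encode_user_data is a hypothetical helper name.
encode_user_data() {
    # Some base64 implementations wrap output lines, so strip all newlines.
    base64 < "$1" | tr -d '\n'
}

# Example (install_tesla.sh is a placeholder file name):
#   USER_DATA=$(encode_user_data ./install_tesla.sh)
#   The value of USER_DATA would then be supplied as UserData in the request.
```

Stripping the newlines keeps the value on one line, which is easier to embed in a request parameter or a command line.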
If the system does not automatically install the Tesla driver when you create a GPU-accelerated instance, you can run an automatic installation script after the instance is created to install software, such as the Tesla driver. To install software, you must log on to the instance by using SSH, create a file on the instance, copy your automatic installation script to the instance, and then run the script as a shell script. For more information about how to connect to an instance, see ECS instance connection method overview.
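After the script finishes, regardless of whether it ran from User Data or was started manually over SSH, you may want to confirm that the loaded driver matches the version you requested. The following sketch compares an expected version with the one reported by nvidia-smi. The function name verify_driver_version is hypothetical, and the optional second argument exists only so that the comparison can be exercised on a machine without a GPU.

```shell
#!/bin/sh
# Compare the expected Tesla driver version with the version in use.
# Usage: verify_driver_version <expected> [actual]
# If [actual] is omitted, the version is read from nvidia-smi.
# verify_driver_version is a hypothetical helper name.
verify_driver_version() {
    expected="$1"
    if [ "$#" -ge 2 ]; then
        actual="$2"
    else
        # Query the driver version of the first GPU; empty if the driver is not loaded.
        actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)
    fi

    if [ -z "$actual" ]; then
        echo "NOT_LOADED"
        return 2
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK $actual"
        return 0
    fi
    echo "MISMATCH expected=$expected actual=$actual"
    return 1
}
```

On the instance, verify_driver_version "$DRIVER_VERSION" would be called with the same value that was set in the installation script.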
References
If the system does not automatically install or load the Tesla driver when you create a GPU-accelerated compute-optimized instance in general-purpose computing and graphics acceleration scenarios, you must install the driver after the instance is created. For more information, see the following topics: