Manually install the Tesla driver on Linux - Elastic GPU Service

GPU-accelerated instances on which the NVIDIA Tesla driver is installed provide high-performance computing for general-purpose workloads such as deep learning and AI, and graphics acceleration for workloads such as Open Graphics Library (OpenGL), Direct3D, and cloud gaming. If you did not choose to install the Tesla driver when you created a GPU-accelerated compute-optimized Linux instance, you must install the driver manually after the instance is created. This topic describes how to manually install the Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Note

If a GPU-accelerated compute-optimized instance runs Alibaba Cloud Linux 3 and the Tesla driver is not automatically installed when you create the instance, you can use YUM to install the driver. For more information, see Use YUM to quickly install the NVIDIA Tesla driver on a GPU-accelerated compute-optimized Alibaba Cloud Linux 3 instance.

Procedure

This procedure applies to all GPU-accelerated compute-optimized Linux instances. For the list of these instance families, see gn, ebm, and scc series of GPU-accelerated compute-optimized instance families. The Tesla driver must match the OS of the instance. For example, you can install only the Linux Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Step 1: Download the Tesla driver

Visit the NVIDIA driver download page.
Note
For more information about how to install and configure an NVIDIA driver, see NVIDIA CUDA Installation Guide for Linux.

Configure search conditions and click Find to search for a driver that is suitable for your instance.


The following table describes the search conditions.

| Condition | Description | Example |
| --- | --- | --- |
| Product type | Select the product type based on the GPU of the GPU-accelerated instance. | Data Center / Tesla |
| Product series | Select the product series based on the GPU of the GPU-accelerated instance. | A-Series |
| Product family | Select the product family based on the GPU of the GPU-accelerated instance. | NVIDIA A10 |
| OS | Select a Linux version based on the image of the instance. | Linux 64-bit |
| CUDA Toolkit version | Select a CUDA Toolkit version. | 11.4 |
| Language | Select a language for the driver. | Chinese (Simplified) |

Note

For more information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and OS, see View instance information.

GPUs, supported driver versions, and CUDA Toolkit versions of specific GPU-accelerated compute-optimized instance families

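| Instance family | Product type | Product series | Recommended Tesla driver version | Recommended CUDA Toolkit version |
| --- | --- | --- | --- | --- |
| gn8v | Data Center / Tesla | H-Series | Version 550.90.07 or later | CUDA Toolkit 12.4 Update 1 |
| gn8is | Data Center / Tesla | L-Series | Version 550.90.07 or later | CUDA Toolkit 12.4 Update 1 |
| gn7e | Data Center / Tesla | A-Series | Version 450.80.02 or later | CUDA Toolkit 11.0 Update 1 |
| gn7i | Data Center / Tesla | A-Series | Version 460.73.01 or later | CUDA Toolkit 11.2 |
| gn7 | Data Center / Tesla | A-Series | Version 450.80.02 or later | CUDA Toolkit 11.0 Update 1 |
| gn6e | Data Center / Tesla | V-Series | Version 410.79 or later | CUDA Toolkit 10.1 Update 2 |
| gn6i | Data Center / Tesla | T-Series | Version 410.79 or later | CUDA Toolkit 10.1 Update 2 |
| gn6v | Data Center / Tesla | V-Series | Version 410.79 or later | CUDA Toolkit 10.1 Update 2 |
| gn5i | Data Center / Tesla | P-Series | Version 410.79 or later | CUDA Toolkit 10.1 Update 2 |
| gn5 | Data Center / Tesla | P-Series | Version 410.79 or later | CUDA Toolkit 10.1 Update 2 |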

Note

The preceding table lists only popular GPU-accelerated compute-optimized instance families. Instances that use the same GPU share the same product type, product series, and product family. For example, instances of the ebmgn7i and gn7i instance families both use NVIDIA A10 GPUs, so you select the same search conditions for both.
When you manually install the Tesla driver and CUDA Toolkit, you must make sure that the driver version is compatible with the CUDA Toolkit version. For more information, see CUDA Compatibility.
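The minimum driver versions in the preceding table are easier to apply with a small comparison helper. The following is a minimal POSIX shell sketch, assuming dot-separated numeric version strings such as 470.161.03; the function name version_ge is hypothetical. It only compares numbers and does not encode CUDA compatibility itself: use CUDA Compatibility to look up the minimum driver version for a given CUDA Toolkit release.

```shell
#!/bin/sh
# version_ge A B: succeeds (exit status 0) if version A >= version B.
# Compares dot-separated numeric fields from left to right; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, ".")
    nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Example: check a candidate driver against the gn7i recommendation (460.73.01 or later).
if version_ge 470.161.03 460.73.01; then
  echo "470.161.03 meets the minimum version for gn7i"
fi
```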

On the result page, click View More Versions.
Find the driver that you want to download and click View next to the driver name.
In this example, Data Center Driver for Linux x64 version 470.161.03 (CUDA Toolkit 11.4) is selected.
On the details page of the driver that you want to download, right-click Download and select Copy URL.
Connect to the GPU-accelerated compute-optimized Linux instance.
For more information, see Use Workbench to connect to a Linux instance over SSH.
Run the following command to download the driver installation package:
Replace the URL in the sample command with the download URL that you copied from the driver details page.
```
wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
```
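If you download several driver versions, you can build the URL from the version number. This sketch assumes that every URL follows the same pattern as the sample URL above (https://us.download.nvidia.com/tesla/VERSION/NVIDIA-Linux-x86_64-VERSION.run). NVIDIA does not document this pattern as stable, so prefer the URL copied from the driver details page. The function name tesla_driver_url is hypothetical.

```shell
#!/bin/sh
# tesla_driver_url VERSION: print the download URL of a Linux x86_64 Tesla driver.
# Assumption: the URL pattern matches the sample URL in this topic.
tesla_driver_url() {
  printf 'https://us.download.nvidia.com/tesla/%s/NVIDIA-Linux-x86_64-%s.run\n' "$1" "$1"
}

# Usage on the instance:
#   driver_version=470.161.03
#   wget "$(tesla_driver_url "$driver_version")"
```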

Step 2: Install the Tesla driver

The method for installing the Tesla driver on an instance varies based on the OS of the instance. The following section describes how to install the Tesla driver on different OSs.

CentOS

Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance:
```
rpm -qa | grep "$(uname -r)"
```
- If the command output includes the version information about the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:
```
kernel-3.10.0-1062.18.1.el7.x86_64
kernel-devel-3.10.0-1062.18.1.el7.x86_64
kernel-headers-3.10.0-1062.18.1.el7.x86_64
```
- If the command output does not include the version information about the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, you must download and install the packages of the required version. For more information, see kernel-devel and kernel-headers.
  Important
  If the kernel-devel version differs from the kernel version, the kernel module of the driver fails to compile when you install the driver. Therefore, check the kernel version in the command output before you download kernel-devel. In the preceding command output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
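The preceding check can be made explicit with a small function that succeeds only if both the kernel-devel and kernel-headers packages exactly match the running kernel version, and that names any missing package. This is a minimal sketch that follows the package naming shown in the sample output; it reads the package list from standard input so that you can pipe rpm -qa into it. The function name check_kernel_pkgs is hypothetical.

```shell
#!/bin/sh
# check_kernel_pkgs KERNEL_VERSION < package_list
# Succeeds if kernel-devel-KERNEL_VERSION and kernel-headers-KERNEL_VERSION
# both appear as whole lines in the package list read from stdin.
check_kernel_pkgs() {
  kver="$1"
  pkgs=$(cat)
  status=0
  for name in kernel-devel kernel-headers; do
    # -x: match the whole line; -F: treat the dots in the version literally
    if ! printf '%s\n' "$pkgs" | grep -qxF "${name}-${kver}"; then
      echo "missing: ${name}-${kver}" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage on the instance:
#   rpm -qa | check_kernel_pkgs "$(uname -r)"
```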
Grant execute permission on the Tesla driver installation package and install the driver.
In this example, the Linux 64-bit Tesla driver is used. We recommend that you use the .run installation package of the Tesla driver, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permission on the installation package and install the Tesla driver:
Note
If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.
```
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
```
```
sudo sh NVIDIA-Linux-x86_64-xxxx.run
```
Run the following command to check whether the Tesla driver is installed:
```
nvidia-smi
```
If the command output lists the GPUs of the instance together with the driver version and CUDA version, the Tesla driver is installed.
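To script this check, you can query only the driver version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one line per GPU, and compare each line with the version that you installed. The following is a minimal sketch; the function name driver_matches is hypothetical.

```shell
#!/bin/sh
# driver_matches EXPECTED_VERSION < query_output
# Reads one driver version per line (one line per GPU) and succeeds only if at
# least one GPU is listed and every GPU reports EXPECTED_VERSION.
driver_matches() {
  expected="$1"
  seen=0
  while IFS= read -r line; do
    line=$(printf '%s' "$line" | tr -d '[:space:]')
    [ -z "$line" ] && continue
    seen=1
    if [ "$line" != "$expected" ]; then
      echo "GPU reports driver $line, expected $expected" >&2
      return 1
    fi
  done
  [ "$seen" -eq 1 ]
}

# Usage on the instance:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_matches 470.161.03
```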
(Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence Mode is disabled (Off) by default. When Persistence Mode is enabled, the GPU stays initialized even when no application is using it, which gives the Tesla driver more stable performance. To ensure business continuity, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.
Note
- Persistence Mode (Persistence-M) is a term for a user-settable driver property that keeps a GPU in the initialized state.
- If you enable Persistence Mode by running the nvidia-smi -pm 1 command, the setting may be lost after the instance is restarted. For more information about how to fix this issue, see What do I do if Persistence Mode that I enabled does not take effect and the ECC status or the MIG feature fails to be configured after a GPU-accelerated instance is restarted? We recommend that you use the NVIDIA Persistence Daemon to enable Persistence Mode.
1. Run the following command to run the NVIDIA Persistence Daemon:
```
sudo nvidia-persistenced --user username 
# Replace username with your username.
```
2. Run the following command to view the status of Persistence Mode:
```
nvidia-smi
```
  If the Persistence-M column in the command output shows On, Persistence Mode is enabled.
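In a script, `nvidia-smi --query-gpu=persistence_mode --format=csv,noheader` prints Enabled or Disabled for each GPU, which is easier to parse than the table output. The following sketch, with a hypothetical function name, succeeds only if every GPU reports Enabled.

```shell
#!/bin/sh
# persistence_all_enabled < query_output
# Reads one persistence_mode value per line (one line per GPU) and succeeds only
# if at least one GPU is listed and every GPU reports "Enabled".
persistence_all_enabled() {
  seen=0
  while IFS= read -r mode; do
    mode=$(printf '%s' "$mode" | tr -d '[:space:]')
    [ -z "$mode" ] && continue
    seen=1
    [ "$mode" = "Enabled" ] || return 1
  done
  [ "$seen" -eq 1 ]
}

# Usage on the instance:
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | persistence_all_enabled \
#     && echo "Persistence Mode is on for all GPUs"
```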
(Optional) Enable Persistence Mode after you restart the system.
A Persistence Daemon that you start manually does not start again after the system restarts, so Persistence Mode returns to Off. To keep Persistence Mode enabled across restarts, perform the following operations.
When you install the Tesla driver from the installation package, NVIDIA sample scripts for the Persistence Daemon, including an installer script, are placed in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.
1. Run the following commands to decompress and install the installation script provided by NVIDIA:
```
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
```
2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
```
sudo systemctl status nvidia-persistenced
```
  If the Active field in the command output shows active (running), the NVIDIA Persistence Daemon runs as expected.
  Note
  You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.
3. Run the following command to verify that Persistence Mode is in the enabled (On) state:
```
nvidia-smi
```
4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.
  You can disable the NVIDIA Persistence Daemon based on your business requirements.
```
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
```
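To confirm after a restart that the daemon is both configured to start at boot and currently running, you can combine systemctl is-enabled and systemctl is-active, both of which report through their exit status when run with --quiet. A minimal sketch; the function name persistenced_ready is hypothetical, and the unit name nvidia-persistenced is the one used in the preceding commands.

```shell
#!/bin/sh
# persistenced_ready: succeeds if the nvidia-persistenced unit is enabled at boot
# and active now. --quiet suppresses output; only the exit status is used.
persistenced_ready() {
  systemctl is-enabled --quiet nvidia-persistenced &&
    systemctl is-active --quiet nvidia-persistenced
}

# Usage on the instance, for example after a restart:
#   if persistenced_ready; then echo "Persistence Daemon OK"; else echo "check nvidia-persistenced"; fi
```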
(Conditionally required) Install nvidia-fabricmanager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family.
Important
- If the GPU-accelerated instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family, you must install nvidia-fabricmanager that matches the driver version. Otherwise, you cannot use the instance as expected.
- You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn8v, ebmgn7, or ebmgn7e instance family.
1. Install nvidia-fabricmanager.
  You can install nvidia-fabricmanager by using the source code method or an installation package. The source code method adds the NVIDIA CUDA package repository and installs nvidia-fabricmanager with yum or dnf, whereas the installation package method downloads the .rpm file and installs it with rpm. The commands vary based on the OS. In the following examples, the driver version is 460.91.03 and the OS is CentOS 7.x or CentOS 8.x. Set driver_version to the version of the driver that you downloaded in Step 1: Download the Tesla driver.
  - Source code
    - CentOS 7.x
```
driver_version=460.91.03
sudo yum -y install yum-utils
sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
sudo yum install -y nvidia-fabric-manager-${driver_version}-1
```
    - CentOS 8.x
```
driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
distribution=rhel8
ARCH=$( /bin/arch )
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
sudo dnf module enable -y nvidia-driver:${driver_version_main}
sudo dnf install -y nvidia-fabric-manager-0:${driver_version}-1
```
  - Installation package
    - CentOS 7.x
```
driver_version=460.91.03
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
```
    - CentOS 8.x
```
driver_version=460.91.03
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
```
2. Run the following commands to start nvidia-fabricmanager:
```
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
```
3. Run the following command to check whether nvidia-fabricmanager is installed:
```
systemctl status nvidia-fabricmanager
```
  If the Active field in the command output shows active (running), nvidia-fabricmanager is installed and running.
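Because nvidia-fabricmanager works only with the matching driver version, you may want to compare the two versions after installation. This sketch strips an RPM epoch (such as 0:) and a release suffix (such as -1) from the package version before comparing it with the driver version; the function name fm_version_matches is hypothetical.

```shell
#!/bin/sh
# fm_version_matches FM_PACKAGE_VERSION DRIVER_VERSION
# Succeeds if the nvidia-fabricmanager package version, without epoch and
# release suffix, equals the driver version.
fm_version_matches() (
  fm="${1#*:}"     # drop an epoch such as "0:" if present
  fm="${fm%%-*}"   # drop a release suffix such as "-1" if present
  [ "$fm" = "$2" ]
)

# Usage on the instance:
#   fm_version_matches "$(rpm -q --qf '%{VERSION}' nvidia-fabric-manager)" \
#     "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```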

Other Linux distributions such as Ubuntu

Grant execute permission on the Tesla driver installation package and install the driver.
In this example, the Linux 64-bit Tesla driver is used. We recommend that you use the .run installation package of the Tesla driver, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permission on the installation package and install the Tesla driver:
Note
If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.
```
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
```
```
sudo sh NVIDIA-Linux-x86_64-xxxx.run
```
Run the following command to check whether the Tesla driver is installed:
```
nvidia-smi
```
If the command output lists the GPUs of the instance together with the driver version and CUDA version, the Tesla driver is installed.
(Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence Mode is disabled (Off) by default. When Persistence Mode is enabled, the GPU stays initialized even when no application is using it, which gives the Tesla driver more stable performance. To ensure business continuity, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.
Note
- Persistence Mode (Persistence-M) is a term for a user-settable driver property that keeps a GPU in the initialized state.
- If you enable Persistence Mode by running the nvidia-smi -pm 1 command, the setting may be lost after the instance is restarted. For more information about how to fix this issue, see What do I do if Persistence Mode that I enabled does not take effect and the ECC status or the MIG feature fails to be configured after a GPU-accelerated instance is restarted? We recommend that you use the NVIDIA Persistence Daemon to enable Persistence Mode.
1. Run the following command to run the NVIDIA Persistence Daemon:
```
sudo nvidia-persistenced --user username 
# Replace username with your username.
```
2. Run the following command to view the status of Persistence Mode:
```
nvidia-smi
```
  If the Persistence-M column in the command output shows On, Persistence Mode is enabled.
(Optional) Enable Persistence Mode after you restart the system.
A Persistence Daemon that you start manually does not start again after the system restarts, so Persistence Mode returns to Off. To keep Persistence Mode enabled across restarts, perform the following operations.
When you install the Tesla driver from the installation package, NVIDIA sample scripts for the Persistence Daemon, including an installer script, are placed in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.
1. Run the following commands to decompress and install the installation script provided by NVIDIA:
```
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
```
2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
```
sudo systemctl status nvidia-persistenced
```
  If the Active field in the command output shows active (running), the NVIDIA Persistence Daemon runs as expected.
  Note
  You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.
3. Run the following command to verify that Persistence Mode is in the enabled (On) state:
```
nvidia-smi
```
4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.
  You can disable the NVIDIA Persistence Daemon based on your business requirements.
```
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
```

(Conditionally required) Install nvidia-fabricmanager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family.

Important

If the GPU-accelerated instance belongs to the ebmgn8v, ebmgn7, or ebmgn7e instance family, you must install nvidia-fabricmanager that matches the driver version. Otherwise, you cannot use the instance as expected.
You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn8v, ebmgn7, or ebmgn7e instance family.

Install nvidia-fabricmanager.

You can install nvidia-fabricmanager by using the source code method or an installation package. The source code method adds the NVIDIA CUDA package repository and installs nvidia-fabricmanager with APT, whereas the installation package method downloads the .deb file and installs it with dpkg. The commands vary based on the OS. In the following examples, the driver version is 460.91.03 on Ubuntu 16.04, Ubuntu 18.04, and Ubuntu 20.04, and 535.154.05 on Ubuntu 22.04. Set driver_version to the version of the driver that you downloaded in Step 1: Download the Tesla driver.

Important

When you install nvidia-fabricmanager on Ubuntu 22.04, the version of the Tesla driver must be later than 515.48.07. In the following sample commands for Ubuntu 22.04, the driver version is 535.154.05.
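You can check the Ubuntu 22.04 requirement before installation. The following sketch reads ID and VERSION_ID from an os-release file (normally /etc/os-release, as in the commands below) and, on Ubuntu 22.04 only, requires a driver version strictly later than 515.48.07. The function name fm_driver_ok is hypothetical, and it deliberately enforces no minimum on other OS versions because this topic states none.

```shell
#!/bin/sh
# fm_driver_ok OS_RELEASE_FILE DRIVER_VERSION
# On Ubuntu 22.04, succeeds only if DRIVER_VERSION is strictly later than 515.48.07.
# On any other OS it succeeds without checking.
# The body runs in a subshell ( ... ) so that sourcing the file does not leak variables.
fm_driver_ok() (
  . "$1"
  if [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "22.04" ]; then
    awk -v a="$2" -v b="515.48.07" 'BEGIN {
      na = split(a, x, ".")
      nb = split(b, y, ".")
      n = (na > nb) ? na : nb
      for (i = 1; i <= n; i++) {
        xi = (i <= na) ? x[i] + 0 : 0
        yi = (i <= nb) ? y[i] + 0 : 0
        if (xi != yi) exit (xi > yi ? 0 : 1)
      }
      exit 1  # equal versions are not "later than"
    }'
  fi
)

# Usage on the instance:
#   fm_driver_ok /etc/os-release 535.154.05 || echo "driver too old for nvidia-fabricmanager on Ubuntu 22.04"
```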

Source code

Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04

```
driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
sudo apt-key add 3bf863cc.pub
sudo rm 3bf863cc.pub
# sudo must apply to tee, which writes the file under /etc.
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
```

Ubuntu 22.04

```
driver_version=535.154.05
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
sudo apt-key add 3bf863cc.pub
sudo rm 3bf863cc.pub
# sudo must apply to tee, which writes the file under /etc.
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
```

Installation package

Ubuntu 16.04

```
driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
```

Ubuntu 18.04

```
driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
```

Ubuntu 20.04

```
driver_version=460.91.03
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
```

Ubuntu 22.04

```
driver_version=535.154.05
driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
```

Run the following commands to start nvidia-fabricmanager:

```
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
```

Run the following command to check whether nvidia-fabricmanager is installed:
```
systemctl status nvidia-fabricmanager
```
If the Active field in the command output shows active (running), nvidia-fabricmanager is installed and running.
Note
The GPU can work as expected only if the version of nvidia-fabricmanager is consistent with the Tesla driver version. For GPU-accelerated compute-optimized instances that run Ubuntu, the apt-daily service may automatically update nvidia-fabricmanager if you installed nvidia-fabricmanager by using an installation package. This results in version inconsistency between nvidia-fabricmanager and the Tesla driver. As a result, nvidia-fabricmanager fails to be started and the GPU cannot work as expected. For more information about how to resolve this issue, see What do I do if the GPU fails to work because the nvidia-fabricmanager version is inconsistent with the Tesla driver version?
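A common general-purpose way to stop automatic upgrades of a single Debian package is to place it on hold with apt-mark, which unattended upgrades respect. The following sketch is an assumption, not the resolution documented in the FAQ referenced above: it derives the package name from the driver version in the same way as the preceding commands (nvidia-fabricmanager-MAJOR_VERSION) and checks the output of apt-mark showhold. The function names are hypothetical. Remember to run apt-mark unhold before you upgrade the driver and nvidia-fabricmanager together.

```shell
#!/bin/sh
# fm_package_name DRIVER_VERSION: print the Ubuntu package name, for example
# nvidia-fabricmanager-535 for driver 535.154.05 (major version = text before the first dot).
fm_package_name() {
  printf 'nvidia-fabricmanager-%s\n' "${1%%.*}"
}

# is_held PACKAGE < showhold_output: succeeds if PACKAGE is listed as a whole line.
is_held() {
  grep -qxF "$1"
}

# Usage on the instance:
#   pkg=$(fm_package_name 535.154.05)
#   sudo apt-mark hold "$pkg"
#   apt-mark showhold | is_held "$pkg" && echo "$pkg is held"
```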

References

If you use a GPU-accelerated compute-optimized Windows instance for general-purpose computing, such as deep learning and AI, install the Tesla driver on it. For more information, see Manually install the Tesla driver on a GPU-accelerated compute-optimized Windows instance.
You can install the Tesla driver when you create a GPU-accelerated instance. For more information, see Automatically install or load the Tesla driver when you create a GPU-accelerated instance.
If you no longer need the Tesla driver, you can uninstall it. For more information, see Uninstall the NVIDIA Tesla driver.
If the driver version of your GPU-accelerated instance cannot meet your business requirements, or the GPU-accelerated instance becomes unavailable due to an invalid driver type or version, you can uninstall the driver and install a new driver. You can also upgrade the driver. For more information, see Upgrade the NVIDIA Tesla or GRID driver.