A GPU-accelerated instance on which the NVIDIA Tesla driver is installed can deliver high-performance computing or smoother graphics display. Typical scenarios include general-purpose computing, such as deep learning and AI, and graphics acceleration, such as Open Graphics Library (OpenGL), Direct3D, and cloud gaming. If the Tesla driver is not installed when you create a GPU-accelerated compute-optimized Linux instance, you must install the driver manually after the instance is created. This topic describes how to manually install the Tesla driver on a GPU-accelerated compute-optimized Linux instance.
If a GPU-accelerated compute-optimized instance runs Alibaba Cloud Linux 3 and the Tesla driver is not automatically installed when you create the instance, you can use YUM to install the driver. For more information, see Use YUM to quickly install the NVIDIA Tesla driver on a GPU-accelerated compute-optimized Alibaba Cloud Linux 3 instance.
Procedure
This procedure applies to all GPU-accelerated compute-optimized Linux instances. For more information, see GPU-accelerated compute-optimized instance families (gn, ebm, and scc series). The Tesla driver that you install must be built for the OS of the instance. For example, you can install only the Linux Tesla driver on a GPU-accelerated compute-optimized Linux instance.
Step 1: Download the Tesla driver
Visit the NVIDIA driver download page.
Note: For more information about how to install and configure an NVIDIA driver, see NVIDIA CUDA Installation Guide for Linux.
Configure search conditions and click Find to search for a driver that is suitable for your instance.
The following table describes the search conditions.
| Condition | Description | Example |
| --- | --- | --- |
| Product type, Product series, Product family | Select the product type, product series, and product family based on the GPU of the GPU-accelerated instance. Note: For more information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and OS, see View instance information. | Data Center / Tesla, A-Series, NVIDIA A10 |
| OS | Select a Linux version based on the image of the instance. | Linux 64-bit |
| CUDA Toolkit version | Select a CUDA Toolkit version. | 11.4 |
| Language | Select a language for the driver. | Chinese (Simplified) |
On the result page, click View More Versions.
Find the driver that you want to download and click View next to the driver name.
In this example, the Data Center Driver for Linux x64 driver whose driver version is 470.161.03 and CUDA Toolkit version is 11.4 is selected.
On the details page of the driver that you want to download, right-click Download and select Copy URL.
Connect to the GPU-accelerated compute-optimized Linux instance.
For more information, see Use Workbench to connect to a Linux instance over SSH.
Run the following command to download the driver installation package:
Replace the URL in the sample code with the URL that you obtained in Substep 5.
wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
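Later steps need the installer file name and the driver version, and both can be derived from the copied URL. The following is a minimal sketch, assuming that the URL ends in a file named NVIDIA-Linux-x86_64-<version>.run as in the example above. The function names are hypothetical helpers and are not part of any NVIDIA tool.

```shell
# Hypothetical helpers; they assume the URL ends in NVIDIA-Linux-x86_64-<version>.run.

# Print the installer file name, that is, the last path component of the URL.
driver_file_from_url() {
    basename "$1"
}

# Print the driver version embedded in the installer file name.
# Prints nothing if the file name does not follow the assumed pattern.
driver_version_from_file() {
    printf '%s\n' "$1" | sed -n 's/^NVIDIA-Linux-x86_64-\(.*\)\.run$/\1/p'
}

# Example usage (commented out so that loading this file downloads nothing):
# url="https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run"
# file=$(driver_file_from_url "$url")
# [ -f "$file" ] || wget "$url"
# driver_version_from_file "$file"
```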
Step 2: Install the Tesla driver
The method for installing the Tesla driver on an instance varies based on the OS of the instance. The following section describes how to install the Tesla driver on different OSs.
CentOS
Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance:
sudo rpm -qa | grep $(uname -r)
If the command output includes the version information about the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:
kernel-3.10.0-1062.18.1.el7.x86_64
kernel-devel-3.10.0-1062.18.1.el7.x86_64
kernel-headers-3.10.0-1062.18.1.el7.x86_64
If the command output does not include the version information about the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, you must download and install the packages of the required version. For more information, see kernel-devel and kernel-headers.
Important: If the kernel-devel version differs from the kernel version, a compilation error occurs when the driver is installed. Therefore, check the kernel version in the command output before you download the kernel-devel package. In the preceding command output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
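The package check can also be scripted. The following is a minimal sketch, assuming that package names have the form kernel-devel-<kernel release> and kernel-headers-<kernel release>, as in the sample output above. The function name is a hypothetical helper.

```shell
# Hypothetical helper: reads a package list on stdin (one name per line, as
# printed by `rpm -qa`) and prints each required package that is missing for
# the kernel release given as $1. Returns 0 only if nothing is missing.
missing_kernel_pkgs() {
    release=$1
    pkgs=$(cat)
    status=0
    for name in kernel-devel kernel-headers; do
        # -x matches the whole line, -F treats the name as a fixed string.
        if ! printf '%s\n' "$pkgs" | grep -qxF "${name}-${release}"; then
            printf '%s\n' "${name}-${release}"
            status=1
        fi
    done
    return "$status"
}

# Example usage on the instance:
# rpm -qa | missing_kernel_pkgs "$(uname -r)"
```

Matching the full package name against the output of uname -r catches the case in which kernel-devel is installed but for a different kernel version.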
Grant execute permission on the Tesla driver installation package and install the driver.
In this example, the Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permission on the installation package and install the Tesla driver:
Note: If the installation package of your Tesla driver is in another format, such as .deb or .rpm, see NVIDIA CUDA Installation Guide for Linux for the installation method.
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
sudo sh NVIDIA-Linux-x86_64-xxxx.run
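Because the xxxx placeholder must be replaced with the real version, a script can look up the downloaded file instead of hard-coding the name. The following is a minimal sketch, assuming that exactly one NVIDIA-Linux-x86_64-*.run file is present in the download directory. The function name is a hypothetical helper.

```shell
# Hypothetical helper: print the single NVIDIA .run installer found in
# directory $1 (default: current directory). Fails if none or several exist,
# so that the wrong package is not installed by accident.
find_driver_installer() {
    dir=${1:-.}
    set -- "$dir"/NVIDIA-Linux-x86_64-*.run
    # With no match the pattern stays unexpanded, so also test that the file exists.
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one NVIDIA-Linux-x86_64-*.run in $dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Example usage (commented out so that loading this file installs nothing):
# installer=$(find_driver_installer .) && sudo chmod +x "$installer" && sudo sh "$installer"
```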
Run the following command to check whether the Tesla driver is installed:
nvidia-smi
If the command output lists the driver version and the GPUs of the instance, the Tesla driver is installed.
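For an unattended check, the driver version can be queried in machine-readable form and compared with the version that you downloaded. The following is a minimal sketch, assuming that nvidia-smi prints one bare version string per GPU when it is run with the query options shown in the comment. The function name is a hypothetical helper.

```shell
# Hypothetical helper: reads one driver version per line on stdin and succeeds
# only if at least one line is present and every line equals the expected
# version given as $1.
all_gpus_report_version() {
    expected=$1
    count=0
    while IFS= read -r line; do
        [ "$line" = "$expected" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Example usage on the instance:
# nvidia-smi --query-gpu=driver_version --format=csv,noheader | all_gpus_report_version 470.161.03
```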
(Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence Mode is in the disabled (Off) state by default. The Tesla driver delivers more stable performance when Persistence Mode is enabled. To ensure business continuity, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.
Note: Persistence Mode (Persistence-M) is a user-settable driver property that keeps a GPU in the initialized state.
If you enable Persistence Mode by running the nvidia-smi -pm 1 command, the setting may be lost after the instance is restarted. For more information about how to fix this issue, see What do I do if Persistence Mode that I enabled does not take effect and the ECC status or the MIG feature fails to be configured after a GPU-accelerated instance is restarted?
Run the following command to run the NVIDIA Persistence Daemon:
sudo nvidia-persistenced --user username # Replace username with your username.
Run the following command to view the status of Persistence Mode:
nvidia-smi
If Persistence-M is displayed as On in the command output, Persistence Mode is in the enabled (On) state.
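The Persistence Mode state can also be read per GPU without inspecting the full nvidia-smi table. The following is a minimal sketch, assuming that the persistence_mode query field prints Enabled or Disabled on one line per GPU. The function name is a hypothetical helper.

```shell
# Hypothetical helper: reads one Persistence Mode state per line on stdin and
# succeeds only if at least one GPU is listed and every state is "Enabled".
persistence_enabled_on_all_gpus() {
    count=0
    while IFS= read -r state; do
        [ "$state" = "Enabled" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Example usage on the instance:
# nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | persistence_enabled_on_all_gpus
```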
(Optional) Enable Persistence Mode after you restart the system.
If you restart the system, the enabled (On) state of Persistence Mode is lost. You can perform the following operations to enable Persistence Mode again:
The Tesla driver installation package places the scripts provided by NVIDIA, such as the sample script and the installer script, in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.
Run the following commands to decompress and install the installation script provided by NVIDIA:
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
sudo systemctl status nvidia-persistenced
If the command output shows that the service is in the active (running) state, the NVIDIA Persistence Daemon runs as expected.
Note: You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.
Run the following command to verify that Persistence Mode is in the enabled (On) state:
nvidia-smi
(Optional) Run the following commands to disable the NVIDIA Persistence Daemon.
You can disable the NVIDIA Persistence Daemon based on your business requirements.
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
(Conditionally required) Install nvidia-fabricmanager that matches the driver version. This operation is required only when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.
Important: If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install nvidia-fabricmanager that matches the driver version. Otherwise, you cannot use the instance as expected.
You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.
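A provisioning script can decide from the instance type whether this operation is needed. The following is a minimal sketch, assuming that instance type names have the form ecs.<family>.<size>, for example ecs.ebmgn7e.32xlarge. Both the naming pattern and the function name are assumptions made for illustration.

```shell
# Hypothetical helper: succeeds if the instance type given as $1 belongs to
# the ebmgn7 or ebmgn7e instance family.
# Assumption: instance type names look like ecs.<family>.<size>.
needs_fabricmanager() {
    case "$1" in
        ecs.ebmgn7.*|ecs.ebmgn7e.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
# if needs_fabricmanager "ecs.ebmgn7e.32xlarge"; then
#     echo "install nvidia-fabricmanager that matches the driver version"
# fi
```

The trailing dot in each pattern keeps other families whose names merely start with ebmgn7 from matching.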
Install nvidia-fabricmanager.
You can install nvidia-fabricmanager by using the source code or the installation package. The commands that are required to install nvidia-fabricmanager vary based on the OS. In the following examples, the driver version is 460.91.03, and CentOS 7.x and CentOS 8.x are used. Replace driver_version with the version of the driver that you downloaded in Step 1: Download the Tesla driver.
Source code
Installation package
Run the following commands to start nvidia-fabricmanager:
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
Run the following command to check whether nvidia-fabricmanager is installed:
systemctl status nvidia-fabricmanager
If the command output shows that the service is in the active (running) state, nvidia-fabricmanager is installed.
Other Linux distributions such as Ubuntu
Grant execute permission on the Tesla driver installation package and install the driver.
In this example, the Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permission on the installation package and install the Tesla driver:
Note: If the installation package of your Tesla driver is in another format, such as .deb or .rpm, see NVIDIA CUDA Installation Guide for Linux for the installation method.
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
sudo sh NVIDIA-Linux-x86_64-xxxx.run
Run the following command to check whether the Tesla driver is installed:
nvidia-smi
If the command output lists the driver version and the GPUs of the instance, the Tesla driver is installed.
(Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence Mode is in the disabled (Off) state by default. The Tesla driver delivers more stable performance when Persistence Mode is enabled. To ensure business continuity, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.
Note: Persistence Mode (Persistence-M) is a user-settable driver property that keeps a GPU in the initialized state.
If you enable Persistence Mode by running the nvidia-smi -pm 1 command, the setting may be lost after the instance is restarted. For more information about how to fix this issue, see What do I do if Persistence Mode that I enabled does not take effect and the ECC status or the MIG feature fails to be configured after a GPU-accelerated instance is restarted?
Run the following command to run the NVIDIA Persistence Daemon:
sudo nvidia-persistenced --user username # Replace username with your username.
Run the following command to view the status of Persistence Mode:
nvidia-smi
If Persistence-M is displayed as On in the command output, Persistence Mode is in the enabled (On) state.
(Optional) Enable Persistence Mode after you restart the system.
If you restart the system, the enabled (On) state of Persistence Mode is lost. You can perform the following operations to enable Persistence Mode again:
The Tesla driver installation package places the scripts provided by NVIDIA, such as the sample script and the installer script, in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.
Run the following commands to decompress and install the installation script provided by NVIDIA:
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
sudo systemctl status nvidia-persistenced
If the command output shows that the service is in the active (running) state, the NVIDIA Persistence Daemon runs as expected.
Note: You can adapt the NVIDIA Persistence Daemon installation script based on your OS to ensure that the NVIDIA Persistence Daemon works as expected.
Run the following command to verify that Persistence Mode is in the enabled (On) state:
nvidia-smi
(Optional) Run the following commands to disable the NVIDIA Persistence Daemon.
You can disable the NVIDIA Persistence Daemon based on your business requirements.
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
(Conditionally required) Install nvidia-fabricmanager that matches the driver version. This operation is required only when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.
Important: If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install nvidia-fabricmanager that matches the driver version. Otherwise, you cannot use the instance as expected.
You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.
Install nvidia-fabricmanager.
You can install nvidia-fabricmanager by using the source code or the installation package. The commands that are required to install nvidia-fabricmanager vary based on the OS. In the following examples, the driver versions are 460.91.03 and 535.154.05, and Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, and Ubuntu 22.04 are used. Replace driver_version with the version of the driver that you downloaded in Step 1: Download the Tesla driver.
Important: When you install nvidia-fabricmanager on Ubuntu 22.04, the version of the Tesla driver must be later than 515.48.07. In the following sample commands for Ubuntu 22.04, the driver version is 535.154.05.
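The Ubuntu 22.04 version requirement can be checked before installation. The following is a minimal sketch, assuming that GNU sort with the -V (version sort) option is available, as it is on common Linux distributions. The function name is a hypothetical helper.

```shell
# Hypothetical helper: succeeds if version $1 is strictly later than version $2.
# Relies on GNU `sort -V`, which orders dotted version strings numerically.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example usage:
# version_gt 535.154.05 515.48.07 && echo "driver version is new enough for Ubuntu 22.04"
```

A plain string comparison is not used here because it would compare version components character by character instead of numerically.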
Source code
Installation package
Run the following commands to start nvidia-fabricmanager:
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
Run the following command to check whether nvidia-fabricmanager is installed:
systemctl status nvidia-fabricmanager
If the command output shows that the service is in the active (running) state, nvidia-fabricmanager is installed.
Note: The GPU works as expected only if the version of nvidia-fabricmanager is the same as the Tesla driver version. On GPU-accelerated compute-optimized instances that run Ubuntu, the apt-daily service may automatically update nvidia-fabricmanager if you installed it by using an installation package. The update makes the nvidia-fabricmanager version differ from the Tesla driver version. As a result, nvidia-fabricmanager fails to start and the GPU cannot work as expected. For more information about how to resolve this issue, see What do I do if the GPU fails to work because the nvidia-fabricmanager version is inconsistent with the Tesla driver version?
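Version drift of this kind can be detected by comparing the two versions periodically. The following is a minimal sketch, assuming that the package version reported for nvidia-fabricmanager may carry a packaging revision suffix such as -1 after the driver version. The function names and the suffix format are assumptions, and the exact package name to query is left to you.

```shell
# Hypothetical helper: strip an assumed packaging revision suffix
# (everything from the first "-") from a package version string.
strip_pkg_revision() {
    printf '%s\n' "${1%%-*}"
}

# Hypothetical helper: succeeds if the nvidia-fabricmanager package version $2
# corresponds to the Tesla driver version $1.
fabricmanager_matches_driver() {
    [ "$(strip_pkg_revision "$2")" = "$1" ]
}

# Example usage on the instance:
# driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# Obtain the installed nvidia-fabricmanager version from your package manager,
# then compare the two values:
# fabricmanager_matches_driver "$driver" "535.154.05-1" || echo "version mismatch"
```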
References
On a GPU-accelerated compute-optimized Windows instance, you can install only the Tesla driver, which suits general-purpose computing scenarios such as deep learning and AI. For more information, see Manually install the Tesla driver on a GPU-accelerated compute-optimized Windows instance.
You can install the Tesla driver when you create a GPU-accelerated instance. For more information, see Automatically install or load the Tesla driver when you create a GPU-accelerated instance.
If you no longer need the Tesla driver due to a specific reason, you can uninstall the driver. For more information, see Uninstall the NVIDIA Tesla driver.
If the driver version of your GPU-accelerated instance cannot meet your business requirements, or the GPU-accelerated instance becomes unavailable due to an invalid driver type or version, you can uninstall the driver and install a new driver. You can also upgrade the driver. For more information, see Upgrade the NVIDIA Tesla or GRID driver.