GPU-accelerated instances that have an NVIDIA Tesla driver installed can deliver high-performance computing or smoother graphics rendering. Typical scenarios include general computing, such as deep learning and AI, and graphics acceleration, such as Open Graphics Library (OpenGL), Direct3D, and cloud gaming. If you do not install a Tesla driver when you create a GPU-accelerated compute-optimized Linux instance, you must install the driver manually after the instance is created. This topic describes how to manually install a Tesla driver on a GPU-accelerated compute-optimized Linux instance.
If a GPU-accelerated compute-optimized instance runs the Alibaba Cloud Linux 3 operating system and the Tesla driver is not automatically installed when you create the instance, you can use YUM to install the driver. For more information, see Use YUM to quickly install the NVIDIA Tesla driver on a GPU-accelerated compute-optimized Alibaba Cloud Linux 3 instance.
Procedure
This procedure applies to all GPU-accelerated compute-optimized Linux instances. For more information, see GPU-accelerated compute-optimized instance families. The Tesla driver must be built for the operating system of the instance. For example, you can install only a Linux Tesla driver on a GPU-accelerated compute-optimized Linux instance.
Step 1: Download a Tesla driver
Visit the NVIDIA driver download page.
Note: For more information about how to install and configure an NVIDIA driver, see NVIDIA CUDA Installation Guide for Linux.
Configure search conditions and click Find to search for a driver that is suitable for your instance.
The following table describes the search conditions.
Condition | Description | Example
Product category, Product series, Product | Select the product category, product series, and product based on the GPU of the GPU-accelerated instance. | Data Center / Tesla, A-Series, NVIDIA A10
Operating system | Select a Linux version based on the image of the instance. | Linux 64-bit
CUDA Toolkit version | Select a CUDA Toolkit version. | 11.4
Language | Select a language for the driver. | Chinese (Simplified)
Note: For information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and operating system, see View instance information.
On the result page, click View More Versions.
Find the driver that you want to download and click View next to the driver name.
In this example, the Data Center Driver for Linux x64 driver whose driver version is 470.161.03 and CUDA Toolkit version is 11.4 is selected.
On the details page of the driver that you want to download, right-click Download and select Copy URL.
Connect to the GPU-accelerated compute-optimized Linux instance.
For more information, see Connect to a Linux instance by using a password or key.
Run the following command to download the driver installation package:
Replace the URL in the sample command with the URL that you copied from the driver details page.
wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
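Later steps refer to the installer by file name, so it can help to derive the file name and driver version from the copied URL instead of retyping them. The following is a minimal sketch, assuming the URL ends in a file named NVIDIA-Linux-x86_64-<version>.run as in the preceding example. The function names are illustrative and are not part of any NVIDIA or Alibaba Cloud tool.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the file name portion of a download URL.
installer_name() {
    printf '%s\n' "${1##*/}"
}

# Print the driver version embedded in the installer file name,
# assuming the NVIDIA-Linux-x86_64-<version>.run naming pattern.
installer_version() {
    name=$(installer_name "$1")
    name=${name#NVIDIA-Linux-x86_64-}
    printf '%s\n' "${name%.run}"
}

# Example usage (uncomment to download; -c resumes a partial download):
# url="https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run"
# wget -c "$url" && ls -l "$(installer_name "$url")"
```

The helpers only manipulate strings, so they can be tried on any machine before you connect to the instance.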
Step 2: Install the Tesla driver
The method for installing a Tesla driver on an instance varies based on the OS of the instance. The following section describes how to install a Tesla driver on different OSs.
CentOS
Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance:
sudo rpm -qa | grep $(uname -r)
If the command output includes the version information about the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:
kernel-3.10.0-1062.18.1.el7.x86_64
kernel-devel-3.10.0-1062.18.1.el7.x86_64
kernel-headers-3.10.0-1062.18.1.el7.x86_64
If the command output does not include the version information about the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, you must download and install the packages of the required version. For more information, see kernel-devel and kernel-headers.
Important: If the kernel-devel version is different from the kernel version, a compilation error occurs when you install the driver, because the driver kernel module is built against the kernel-devel package. Therefore, check the kernel version in the command output before you download the kernel-devel package. In the preceding command output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
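The version check described above can be scripted so that a mismatch is caught before the driver build starts. The following is a sketch under stated assumptions: the package list has one package per line in the name-version-release.arch form shown in the sample output, and the function name is hypothetical.

```shell
# Hypothetical helper: reads a package list (one package per line) on stdin
# and succeeds only if both kernel-devel and kernel-headers are present
# for the given kernel release.
kernel_build_packages_present() {
    release=$1
    packages=$(cat)
    for pkg in kernel-devel kernel-headers; do
        printf '%s\n' "$packages" | grep -qxF "${pkg}-${release}" || {
            echo "missing: ${pkg}-${release}" >&2
            return 1
        }
    done
}

# Example usage on the instance:
# rpm -qa | kernel_build_packages_present "$(uname -r)" && echo "ready to build"
```

The match is exact (grep -x with a fixed string), so a kernel-devel package for a different kernel release is reported as missing rather than accepted.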
Grant execute permissions on the Tesla driver installation package and install the driver.
In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package for your Tesla driver, such as the NVIDIA-Linux-x86_64-xxxx.run package. Run the following commands to grant the execution permissions on the installation package and install the Tesla driver:
Note: If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
sudo sh NVIDIA-Linux-x86_64-xxxx.run
Run the following command to check whether the Tesla driver is installed:
nvidia-smi
If the command output lists the driver version and the GPUs of the instance, the Tesla driver is installed.
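For a check that does not depend on reading the table by eye, nvidia-smi can print only the driver version, one line per GPU. The following sketch compares that output with the version you expect. The helper name is hypothetical, and 470.161.03 is simply the version used in the example in this topic.

```shell
# Hypothetical helper: reads one driver version per line on stdin (one line
# per GPU) and succeeds only if at least one line is present and every line
# equals the expected version.
all_gpus_report_version() {
    expected=$1
    count=0
    while read -r line; do
        [ "$line" = "$expected" ] || return 1
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Example usage on the instance:
# nvidia-smi --query-gpu=driver_version --format=csv,noheader \
#     | all_gpus_report_version 470.161.03 && echo "driver 470.161.03 is loaded"
```

Empty input is treated as a failure, so the check also fails when nvidia-smi prints nothing on standard output.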
(Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence-M is in the disabled (off) state by default. The Tesla driver delivers more stable performance when Persistence-M is enabled. To ensure business continuity, we recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.
Note: Persistence-M is a user-settable driver property that keeps a GPU in the initialized state.
Enabling Persistence-M by running the nvidia-smi -pm 1 command causes issues. For example, Persistence-M returns to the disabled state after the instance is restarted. For more information, see After a GPU-accelerated compute-optimized instance is restarted, the operation that I performed to run the nvidia-smi -pm 1 command to enable Persistence-M does not take effect and the ECC state fails to be configured. How do I fix the issues? Use the NVIDIA Persistence Daemon to enable Persistence-M instead.
Run the following command to run the NVIDIA Persistence Daemon:
sudo nvidia-persistenced --user username # Replace username with your username.
Run the following command to view the status of Persistence-M:
nvidia-smi
If the Persistence-M column in the command output shows On, Persistence-M is in the enabled state.
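The persistence state can also be queried per GPU, which is easier to use in a script than the full nvidia-smi table. The following sketch counts the GPUs that do not report Enabled. The helper name is hypothetical. The persistence_mode query field prints Enabled or Disabled for each GPU.

```shell
# Hypothetical helper: reads one persistence state per line on stdin and
# prints the number of GPUs whose state is not "Enabled".
count_gpus_without_persistence() {
    awk '{ gsub(/^[ \t]+|[ \t]+$/, "") }
         $0 != "Enabled" { n++ }
         END { print n + 0 }'
}

# Example usage on the instance:
# nvidia-smi --query-gpu=persistence_mode --format=csv,noheader \
#     | count_gpus_without_persistence
```

A result of 0 means that every GPU on the instance has Persistence-M enabled.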
(Optional) Enable Persistence-M after you restart the system.
If you restart the system, Persistence-M is no longer in the enabled (on) state. You can perform the following operations to enable Persistence-M again:
The Tesla driver installation package installs the scripts provided by NVIDIA, such as the sample script and the installer script, to the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 path.
Run the following commands to decompress the archive and run the installer script provided by NVIDIA:
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
sudo systemctl status nvidia-persistenced
If the command output shows that the service is in the active (running) state, the NVIDIA Persistence Daemon runs as expected.
Note: You can adapt the NVIDIA Persistence Daemon installation script based on your operating system to ensure that the NVIDIA Persistence Daemon works as expected.
Run the following command to verify that Persistence-M is in the enabled (on) state:
nvidia-smi
(Optional) Run the following commands to disable the NVIDIA Persistence Daemon.
You can disable the NVIDIA Persistence Daemon based on your business requirements.
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
(Conditionally required) Install NVIDIA Fabric Manager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.
Important: If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.
You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.
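In a provisioning script, the decision to install NVIDIA Fabric Manager can be made from the instance type string. The following sketch assumes that instance types follow the ecs.<family>.<size> naming pattern. The helper name and the sample sizes in the usage comment are illustrative only.

```shell
# Hypothetical helper: succeeds if the instance type belongs to the ebmgn7
# or ebmgn7e instance family, assuming the ecs.<family>.<size> naming pattern.
needs_fabric_manager() {
    case $1 in
        ecs.ebmgn7.*|ecs.ebmgn7e.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (the instance type comes from the instance details):
# if needs_fabric_manager "ecs.ebmgn7e.32xlarge"; then
#     echo "install NVIDIA Fabric Manager"
# fi
```

The patterns require a dot after the family name, so families whose names merely start with ebmgn7 are not matched.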
Install NVIDIA Fabric Manager.
You can install NVIDIA Fabric Manager by using the source code or the installation package. The commands that are required to install NVIDIA Fabric Manager vary based on the operating system. In the following examples, the driver version is 460.91.03, and CentOS 7.x and CentOS 8.x are used. Replace driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.
Source code
Installation package
Run the following commands to start NVIDIA Fabric Manager:
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
Run the following command to check whether NVIDIA Fabric Manager is installed:
systemctl status nvidia-fabricmanager
If the command output shows that the service is in the active (running) state, NVIDIA Fabric Manager is installed.
Other Linux distributions such as Ubuntu
Grant execute permissions on the Tesla driver installation package and install the driver.
In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package for your Tesla driver, such as the NVIDIA-Linux-x86_64-xxxx.run package. Run the following commands to grant the execution permissions on the installation package and install the Tesla driver:
Note: If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.
sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
sudo sh NVIDIA-Linux-x86_64-xxxx.run
Run the following command to check whether the Tesla driver is installed:
nvidia-smi
If the command output lists the driver version and the GPUs of the instance, the Tesla driver is installed.
(Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.
After the Tesla driver is installed, Persistence-M is in the disabled (off) state by default. The Tesla driver delivers more stable performance when Persistence-M is enabled. To ensure business continuity, we recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.
Note: Persistence-M is a user-settable driver property that keeps a GPU in the initialized state.
Enabling Persistence-M by running the nvidia-smi -pm 1 command causes issues. For example, Persistence-M returns to the disabled state after the instance is restarted. For more information, see After a GPU-accelerated compute-optimized instance is restarted, the operation that I performed to run the nvidia-smi -pm 1 command to enable Persistence-M does not take effect and the ECC state fails to be configured. How do I fix the issues? Use the NVIDIA Persistence Daemon to enable Persistence-M instead.
Run the following command to run the NVIDIA Persistence Daemon:
sudo nvidia-persistenced --user username # Replace username with your username.
Run the following command to view the status of Persistence-M:
nvidia-smi
If the Persistence-M column in the command output shows On, Persistence-M is in the enabled state.
(Optional) Enable Persistence-M after you restart the system.
If you restart the system, Persistence-M is no longer in the enabled (on) state. You can perform the following operations to enable Persistence-M again:
The Tesla driver installation package installs the scripts provided by NVIDIA, such as the sample script and the installer script, to the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 path.
Run the following commands to decompress the archive and run the installer script provided by NVIDIA:
cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
sudo tar xf nvidia-persistenced-init.tar.bz2
cd nvidia-persistenced-init
sudo sh install.sh
Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:
sudo systemctl status nvidia-persistenced
If the command output shows that the service is in the active (running) state, the NVIDIA Persistence Daemon runs as expected.
Note: You can adapt the NVIDIA Persistence Daemon installation script based on your operating system to ensure that the NVIDIA Persistence Daemon works as expected.
Run the following command to verify that Persistence-M is in the enabled (on) state:
nvidia-smi
(Optional) Run the following commands to disable the NVIDIA Persistence Daemon.
You can disable the NVIDIA Persistence Daemon based on your business requirements.
sudo systemctl stop nvidia-persistenced
sudo systemctl disable nvidia-persistenced
(Conditionally required) Install NVIDIA Fabric Manager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.
Important: If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.
You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.
Install NVIDIA Fabric Manager.
You can install NVIDIA Fabric Manager by using the source code or the installation package. The commands that are required to install NVIDIA Fabric Manager vary based on the operating system. In the following examples, the driver versions are 460.91.03 and 535.154.05, and Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, and Ubuntu 22.04 are used. Replace driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.
Important: When you install NVIDIA Fabric Manager on Ubuntu 22.04, the version of the Tesla driver must be later than 515.48.07. In the following sample commands for Ubuntu 22.04, the driver version is 535.154.05.
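The Ubuntu 22.04 version requirement can be checked before installation with a version-aware comparison. The following is a minimal sketch that relies on sort -V from GNU coreutils. The function name is hypothetical.

```shell
# Hypothetical helper: succeeds if version $1 is strictly later than $2.
# Relies on sort -V (version sort) from GNU coreutils.
version_later_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example usage:
# version_later_than 535.154.05 515.48.07 && echo "driver is new enough for Ubuntu 22.04"
```

A version sort is used instead of a plain string comparison so that a component such as 105 is treated as later than 48.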
Source code
Installation package
Run the following commands to start NVIDIA Fabric Manager:
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
Run the following command to check whether NVIDIA Fabric Manager is installed:
systemctl status nvidia-fabricmanager
If the command output shows that the service is in the active (running) state, NVIDIA Fabric Manager is installed.
Note: The GPU can work as expected only if the version of NVIDIA Fabric Manager is consistent with the Tesla driver version. For GPU-accelerated compute-optimized instances that run Ubuntu, the apt-daily service may automatically update NVIDIA Fabric Manager if you install NVIDIA Fabric Manager by using an installation package. This results in version inconsistency between NVIDIA Fabric Manager and the Tesla driver. As a result, NVIDIA Fabric Manager fails to start and the GPU fails to work as expected. For information about how to resolve this issue, see What do I do if the GPU fails to work because the nvidia-fabricmanager version is inconsistent with the Tesla driver version?
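A version mismatch of this kind can be detected by comparing the driver version with the installed package version. The following sketch is not the resolution from the linked topic. It assumes that the package version has the form <driver version>-<revision>, such as 535.154.05-1, and that the package name matches nvidia-fabricmanager-*. Both are assumptions to verify on your instance, and the helper name is hypothetical.

```shell
# Hypothetical helper: succeeds if a package version such as 535.154.05-1
# corresponds to the given driver version (the -<revision> suffix is ignored).
fabricmanager_matches_driver() {
    driver_version=$1
    package_version=${2%%-*}
    [ "$driver_version" = "$package_version" ]
}

# Example usage on an Ubuntu instance. The package name pattern is an
# assumption; check the actual name with: dpkg -l | grep -i fabricmanager
# drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# pkg=$(dpkg-query -W -f='${Version}\n' 'nvidia-fabricmanager-*' | head -n 1)
# fabricmanager_matches_driver "$drv" "$pkg" || echo "version mismatch: $drv vs $pkg"
#
# One way to keep apt from upgrading the package automatically:
# sudo apt-mark hold <package-name>
```

Holding the package with apt-mark is one possible safeguard against automatic upgrades. Remember to release the hold with apt-mark unhold before you upgrade the driver and NVIDIA Fabric Manager together.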
References
On a GPU-accelerated compute-optimized Windows instance, you can install only a Tesla driver, which suits general computing scenarios such as deep learning and AI. For more information, see Manually install a Tesla driver on a GPU-accelerated compute-optimized Windows instance.
You can install a Tesla driver when you create a GPU-accelerated instance. For more information, see Automatically install or load the Tesla driver when you create a GPU-accelerated instance.
If you no longer need a Tesla driver due to a specific reason, you can uninstall the driver. For more information, see Uninstall an NVIDIA Tesla driver.
If the driver version of your GPU-accelerated instance cannot meet your business requirements, or the GPU-accelerated instance becomes unavailable due to an invalid driver type or version, you can uninstall the driver and install a new driver. You can also upgrade the driver. For more information, see Upgrade an NVIDIA Tesla or GRID driver.