
Elastic GPU Service: Manually install the Tesla driver on a GPU-accelerated compute-optimized Linux instance

Last Updated: Dec 12, 2024

GPU-accelerated instances with the NVIDIA Tesla driver installed deliver high-performance computing for general-purpose workloads such as deep learning and AI, and smoother rendering for graphics workloads such as Open Graphics Library (OpenGL), Direct3D, and cloud gaming. If you did not choose to install the Tesla driver when you created a GPU-accelerated compute-optimized Linux instance, you must install it manually after the instance is created. This topic describes how to manually install the Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Note

If a GPU-accelerated compute-optimized instance runs Alibaba Cloud Linux 3 and the Tesla driver was not automatically installed when you created the instance, you can use YUM to install the driver. For more information, see Use YUM to quickly install the NVIDIA Tesla driver on a GPU-accelerated compute-optimized Alibaba Cloud Linux 3 instance.

Procedure

This procedure applies to all GPU-accelerated compute-optimized Linux instances. For more information, see GPU-accelerated compute-optimized instance families (gn, ebm, and scc series). The driver must match the OS of the instance. For example, you can install only the Linux version of the Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Step 1: Download the Tesla driver

  1. Visit the NVIDIA driver download page.

    Note

    For more information about how to install and configure an NVIDIA driver, see NVIDIA CUDA Installation Guide for Linux.

  2. Configure search conditions and click Find to search for a driver that is suitable for your instance.

    The following list describes the search conditions.

    • Product type, Product series, and Product family: Select these values based on the GPU of the GPU-accelerated instance. Example: Data Center / Tesla, A-Series, and NVIDIA A10.

      Note

      For more information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and OS, see View instance information.

    • OS: Select a Linux version based on the image of the instance. Example: Linux 64-bit.

    • CUDA Toolkit version: Select a CUDA Toolkit version. Example: 11.4.

    • Language: Select a language for the driver. Example: Chinese (Simplified).

    GPUs, supported driver versions, and CUDA Toolkit versions of specific GPU-accelerated compute-optimized instance families

    Instance family | Product type        | Product series | Recommended Tesla driver version | Recommended CUDA Toolkit version
    gn8is           | Data Center / Tesla | L-Series       | 550.90.07 or later               | CUDA Toolkit 12.4 Update 1
    gn7e            | Data Center / Tesla | A-Series       | 450.80.02 or later               | CUDA Toolkit 11.0 Update 1
    gn7i            | Data Center / Tesla | A-Series       | 460.73.01 or later               | CUDA Toolkit 11.2
    gn7             | Data Center / Tesla | A-Series       | 450.80.02 or later               | CUDA Toolkit 11.0 Update 1
    gn6e            | Data Center / Tesla | V-Series       | 410.79 or later                  | CUDA Toolkit 10.1 Update 2
    gn6i            | Data Center / Tesla | T-Series       | 410.79 or later                  | CUDA Toolkit 10.1 Update 2
    gn6v            | Data Center / Tesla | V-Series       | 410.79 or later                  | CUDA Toolkit 10.1 Update 2
    gn5i            | Data Center / Tesla | P-Series       | 410.79 or later                  | CUDA Toolkit 10.1 Update 2
    gn5             | Data Center / Tesla | P-Series       | 410.79 or later                  | CUDA Toolkit 10.1 Update 2

    Note
    • The preceding table lists only popular GPU-accelerated compute-optimized instance families. Instances that use the same GPU share the same GPU information, such as product type, product series, and product family. For example, instances of the ebmgn7i and gn7i instance families both use NVIDIA A10 GPUs, so you select the same product type, product series, and product family for them.

    • When you manually install the Tesla driver and CUDA Toolkit, make sure that the driver version is compatible with the CUDA Toolkit version. For more information, see CUDA Compatibility.
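
    To check a candidate driver against the "or later" minimums in the preceding table, you can compare version strings in the shell. The following is a minimal sketch that assumes GNU coreutils `sort -V` (version sort) is available; the function name version_ge is hypothetical and is not part of any NVIDIA tool. It checks only the minimum driver version, not full CUDA compatibility.

    ```shell
    # version_ge A B: succeed (exit 0) if version A is greater than or equal to version B.
    # Hypothetical helper; relies on GNU "sort -V" to order dotted version strings.
    version_ge() {
        # Sort both versions ascending. If B sorts first (or the two are equal), then A >= B.
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }
    ```

    For example, `version_ge 470.161.03 450.80.02 && echo "meets the gn7 minimum"` prints the message, while `version_ge 410.79 460.73.01` fails because 410.79 is below the gn7i minimum.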

  3. On the result page, click View More Versions.

  4. Find the driver that you want to download and click View next to the driver name.

    In this example, the Data Center Driver for Linux x64 with driver version 470.161.03 and CUDA Toolkit version 11.4 is selected.

  5. On the details page of the driver that you want to download, right-click Download and select Copy URL.

  6. Connect to the GPU-accelerated compute-optimized Linux instance.

    For more information, see Use Workbench to connect to a Linux instance over SSH.

  7. Run the following command to download the driver installation package:

    Replace the URL in the sample code with the URL that you obtained in Substep 5.

    wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
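
    If you already know the driver version, you can construct the download URL instead of copying it. This sketch assumes that the URL follows the same pattern as the sample wget command above; NVIDIA does not guarantee this pattern, so use the URL that you copied in Substep 5 if the download fails. The helper name tesla_driver_url is hypothetical.

    ```shell
    # tesla_driver_url VERSION: print the download URL of a Linux x86_64 Tesla driver .run package.
    # Hypothetical helper; assumes the URL pattern of the sample wget command.
    tesla_driver_url() {
        printf 'https://us.download.nvidia.com/tesla/%s/NVIDIA-Linux-x86_64-%s.run\n' "$1" "$1"
    }
    ```

    Usage: `wget "$(tesla_driver_url 470.161.03)"` downloads the same package as the sample command.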

Step 2: Install the Tesla driver

The method for installing the Tesla driver on an instance varies based on the OS of the instance. The following section describes how to install the Tesla driver on different OSs.

CentOS

  1. Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance:

    rpm -qa | grep "$(uname -r)"
    • If the command output includes the version information about the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:

      kernel-3.10.0-1062.18.1.el7.x86_64
      kernel-devel-3.10.0-1062.18.1.el7.x86_64
      kernel-headers-3.10.0-1062.18.1.el7.x86_64
    • If the command output does not include the version information about the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, you must download and install the packages of the required version. For more information, see kernel-devel and kernel-headers.

      Important

      If the kernel-devel version differs from the kernel version, the kernel module of the driver fails to compile when you install the driver. Therefore, check the kernel version in the command output before you download kernel-devel. In the preceding sample output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
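
    Instead of reading the rpm output by eye, you can test for a kernel-devel package whose version exactly matches the running kernel. This is a minimal sketch that assumes the package naming shown in the sample output (kernel-devel-<kernel version>); the function name has_matching_kernel_devel is hypothetical. It reads the package list on standard input so that you can pipe rpm -qa into it.

    ```shell
    # has_matching_kernel_devel KERNEL: read package names (one per line) on stdin and
    # succeed if kernel-devel-KERNEL is among them. Hypothetical helper.
    has_matching_kernel_devel() {
        # -x matches the whole line; -F treats the name literally so dots are not regex wildcards.
        grep -qxF "kernel-devel-$1"
    }
    ```

    Usage: `rpm -qa | has_matching_kernel_devel "$(uname -r)" || sudo yum install -y "kernel-devel-$(uname -r)"`. Running `rpm -q "kernel-devel-$(uname -r)"` performs an equivalent check directly against the RPM database.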

  2. Grant execute permissions on the Tesla driver installation package and install the driver.

    In this example, the Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permissions on the installation package and install the Tesla driver:

    Note

    If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.

    sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sudo sh NVIDIA-Linux-x86_64-xxxx.run
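
    Later steps, such as installing nvidia-fabricmanager, need the exact driver version. This is a minimal sketch that recovers the version from the .run file name, assuming the NVIDIA-Linux-x86_64-<version>.run naming used in this topic; driver_version_from_run is a hypothetical helper.

    ```shell
    # driver_version_from_run FILE: print the driver version embedded in a .run package name,
    # for example NVIDIA-Linux-x86_64-470.161.03.run -> 470.161.03. Hypothetical helper.
    driver_version_from_run() {
        name=$(basename "$1")              # drop any leading directory
        name=${name#NVIDIA-Linux-x86_64-}  # strip the fixed prefix
        printf '%s\n' "${name%.run}"       # strip the .run suffix
    }
    ```

    Usage: `driver_version=$(driver_version_from_run NVIDIA-Linux-x86_64-470.161.03.run)` sets driver_version to 470.161.03.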
  3. Run the following command to check whether the Tesla driver is installed:

    nvidia-smi

    If the command output shows the driver version and the GPUs of the instance, the Tesla driver is installed.
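
    For scripted checks, nvidia-smi can print only the driver version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, one line per GPU. The following sketch checks that every line matches the version that you installed. The helper name all_gpus_report_driver is hypothetical, and it reads the nvidia-smi output on standard input so that the logic can be tested without a GPU.

    ```shell
    # all_gpus_report_driver VERSION: read one driver version per line on stdin
    # (as printed by nvidia-smi --query-gpu=driver_version --format=csv,noheader)
    # and succeed only if at least one line exists and every line equals VERSION.
    all_gpus_report_driver() {
        seen=0
        while IFS= read -r line; do
            [ "$line" = "$1" ] || return 1   # any mismatching GPU fails the check
            seen=1
        done
        [ "$seen" -eq 1 ]                    # empty output (no GPUs listed) also fails
    }
    ```

    Usage: `nvidia-smi --query-gpu=driver_version --format=csv,noheader | all_gpus_report_driver 470.161.03 && echo "driver OK"`.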

  4. (Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence Mode is disabled (Off) by default. When Persistence Mode is enabled, the driver keeps the GPUs initialized even when no application is using them, which gives more stable performance. To ensure business continuity, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.

    Note
    1. Run the following command to run the NVIDIA Persistence Daemon:

      # Replace username with your username.
      sudo nvidia-persistenced --user username
    2. Run the following command to view the status of Persistence Mode:

      nvidia-smi

      If the Persistence-M column in the command output shows On, Persistence Mode is enabled.
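
    To check Persistence Mode in a script rather than in the nvidia-smi table, you can query it with `nvidia-smi --query-gpu=persistence_mode --format=csv,noheader`, which prints Enabled or Disabled for each GPU. This is a sketch under that assumption; persistence_enabled_on_all is a hypothetical helper that reads the query output on standard input.

    ```shell
    # persistence_enabled_on_all: read one persistence_mode value per line on stdin and
    # succeed only if at least one GPU is listed and every GPU reports "Enabled".
    persistence_enabled_on_all() {
        # Mark any line that is not "Enabled"; exit 1 if such a line exists or input is empty.
        awk '$0 != "Enabled" { bad = 1 } END { exit (NR == 0 || bad) }'
    }
    ```

    Usage: `nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | persistence_enabled_on_all && echo "Persistence Mode is On"`.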

  5. (Optional) Enable Persistence Mode after you restart the system.

    Persistence Mode does not stay enabled after the system restarts. To enable it automatically at every startup, install the NVIDIA Persistence Daemon as a system service by performing the following operations:

    The Tesla driver installation package places the installation scripts provided by NVIDIA, such as the sample script and the installer script, in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress and install the installation script provided by NVIDIA:

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      sudo tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sudo sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:

      sudo systemctl status nvidia-persistenced

      If the command output shows that the service is active (running), the NVIDIA Persistence Daemon runs as expected.

      Note

      You may need to adapt the installation script to your OS so that the NVIDIA Persistence Daemon works as expected.

    3. Run the following command to verify that Persistence Mode is in the enabled (On) state:

      nvidia-smi
    4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.

      You can disable the NVIDIA Persistence Daemon based on your business requirements.

      sudo systemctl stop nvidia-persistenced
      sudo systemctl disable nvidia-persistenced
  6. (Conditionally required) Install nvidia-fabricmanager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.

    Important
    • If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install nvidia-fabricmanager that matches the driver version. Otherwise, you cannot use the instance as expected.

    • You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.

    1. Install nvidia-fabricmanager.

      You can install nvidia-fabricmanager from the NVIDIA CUDA package repository or from a downloaded installation package. The commands vary based on the OS. In the following examples, the driver version is 460.91.03 and the OS is CentOS 7.x or CentOS 8.x. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download the Tesla driver.

      • Package repository

        • CentOS 7.x

          driver_version=460.91.03
          sudo yum -y install yum-utils
          sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
          sudo yum install -y nvidia-fabric-manager-${driver_version}-1
        • CentOS 8.x

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          distribution=rhel8
          ARCH=$( /bin/arch )
          sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
          sudo dnf module enable -y nvidia-driver:${driver_version_main}
          sudo dnf install -y nvidia-fabric-manager-0:${driver_version}-1
      • Installation package

        • CentOS 7.x

          driver_version=460.91.03
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
        • CentOS 8.x

          driver_version=460.91.03
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
    2. Run the following commands to start nvidia-fabricmanager:

      sudo systemctl enable nvidia-fabricmanager
      sudo systemctl start nvidia-fabricmanager
    3. Run the following command to check whether nvidia-fabricmanager is installed:

      systemctl status nvidia-fabricmanager

      If the command output shows that the service is active (running), nvidia-fabricmanager is installed and running.
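
    Because nvidia-fabricmanager must match the driver version exactly, you may want to compare the two after installation. This is a minimal sketch that assumes you obtain the driver version from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and the package version from `rpm -q --qf '%{VERSION}\n' nvidia-fabric-manager`; fm_matches_driver is a hypothetical helper.

    ```shell
    # fm_matches_driver DRIVER_VERSION FM_VERSION: succeed if the nvidia-fabricmanager
    # package version equals the driver version. A release suffix such as "-1" is ignored.
    # Hypothetical helper.
    fm_matches_driver() {
        [ "${2%%-*}" = "$1" ]
    }
    ```

    Usage: `fm_matches_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" "$(rpm -q --qf '%{VERSION}\n' nvidia-fabric-manager)" || echo "version mismatch"`.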

Other Linux distributions such as Ubuntu

  1. Grant execute permissions on the Tesla driver installation package and install the driver.

    In this example, the Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permissions on the installation package and install the Tesla driver:

    Note

    If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.

    sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sudo sh NVIDIA-Linux-x86_64-xxxx.run
  2. Run the following command to check whether the Tesla driver is installed:

    nvidia-smi

    If the command output shows the driver version and the GPUs of the instance, the Tesla driver is installed.

  3. (Optional) Enable Persistence Mode by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence Mode is disabled (Off) by default. When Persistence Mode is enabled, the driver keeps the GPUs initialized even when no application is using them, which gives more stable performance. To ensure business continuity, we recommend that you enable Persistence Mode by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.

    Note
    1. Run the following command to run the NVIDIA Persistence Daemon:

      # Replace username with your username.
      sudo nvidia-persistenced --user username
    2. Run the following command to view the status of Persistence Mode:

      nvidia-smi

      If the Persistence-M column in the command output shows On, Persistence Mode is enabled.

  4. (Optional) Enable Persistence Mode after you restart the system.

    Persistence Mode does not stay enabled after the system restarts. To enable it automatically at every startup, install the NVIDIA Persistence Daemon as a system service by performing the following operations:

    The Tesla driver installation package places the installation scripts provided by NVIDIA, such as the sample script and the installer script, in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress and install the installation script provided by NVIDIA:

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      sudo tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sudo sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:

      sudo systemctl status nvidia-persistenced

      If the command output shows that the service is active (running), the NVIDIA Persistence Daemon runs as expected.

      Note

      You may need to adapt the installation script to your OS so that the NVIDIA Persistence Daemon works as expected.

    3. Run the following command to verify that Persistence Mode is in the enabled (On) state:

      nvidia-smi
    4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.

      You can disable the NVIDIA Persistence Daemon based on your business requirements.

      sudo systemctl stop nvidia-persistenced
      sudo systemctl disable nvidia-persistenced
  5. (Conditionally required) Install nvidia-fabricmanager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.

    Important
    • If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install nvidia-fabricmanager that matches the driver version. Otherwise, you cannot use the instance as expected.

    • You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.

    1. Install nvidia-fabricmanager.

      You can install nvidia-fabricmanager from the NVIDIA CUDA package repository or from a downloaded installation package. The commands vary based on the OS. In the following examples, the driver version is 460.91.03 on Ubuntu 16.04, Ubuntu 18.04, and Ubuntu 20.04, and 535.154.05 on Ubuntu 22.04. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download the Tesla driver.

      Important

      When you install nvidia-fabricmanager on Ubuntu 22.04, the Tesla driver version must be later than 515.48.07. In the following sample commands for Ubuntu 22.04, the driver version is 535.154.05.

      • Package repository

        Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04

        driver_version=460.91.03
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        sudo apt-key add 3bf863cc.pub
        sudo rm 3bf863cc.pub
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        sudo apt-get update
        sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*

        Ubuntu 22.04

        driver_version=535.154.05
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        sudo apt-key add 3bf863cc.pub
        sudo rm 3bf863cc.pub
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        sudo apt-get update
        sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
      • Installation package

        • Ubuntu 16.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 18.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 20.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 22.04

          driver_version=535.154.05 
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
    2. Run the following commands to start nvidia-fabricmanager:

      sudo systemctl enable nvidia-fabricmanager
      sudo systemctl start nvidia-fabricmanager
    3. Run the following command to check whether nvidia-fabricmanager is installed:

      systemctl status nvidia-fabricmanager

      If the command output shows that the service is active (running), nvidia-fabricmanager is installed and running.

      Note

      The GPU works as expected only if the nvidia-fabricmanager version matches the Tesla driver version. On GPU-accelerated compute-optimized instances that run Ubuntu, if you installed nvidia-fabricmanager by using an installation package, the apt-daily service may automatically update nvidia-fabricmanager. The resulting version mismatch prevents nvidia-fabricmanager from starting, and the GPU cannot work as expected. For more information about how to resolve this issue, see What do I do if the GPU fails to work because the nvidia-fabricmanager version is inconsistent with the Tesla driver version?
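
    One common way to keep apt from upgrading nvidia-fabricmanager is to put the package on hold with `apt-mark hold`. This is a sketch, not the procedure in the linked FAQ: it assumes the nvidia-fabricmanager-<major version> package naming used in the preceding commands and that your automatic updates run through apt, which skips held packages. fm_package_name is a hypothetical helper that derives the package name from the driver version, like the awk command in the preceding examples.

    ```shell
    # fm_package_name DRIVER_VERSION: print the Ubuntu package name of nvidia-fabricmanager,
    # for example 535.154.05 -> nvidia-fabricmanager-535. Hypothetical helper.
    fm_package_name() {
        # Keep only the major version, which is the part before the first dot.
        printf 'nvidia-fabricmanager-%s\n' "${1%%.*}"
    }
    ```

    Usage: `sudo apt-mark hold "$(fm_package_name 535.154.05)"` holds the package, and `apt-mark showhold` lists held packages. Run `sudo apt-mark unhold "$(fm_package_name 535.154.05)"` before you deliberately upgrade the Tesla driver and nvidia-fabricmanager together.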

References