
Elastic GPU Service: Manually install a Tesla driver on a GPU-accelerated compute-optimized Linux instance

Last Updated: Oct 17, 2024

GPU-accelerated instances on which NVIDIA Tesla drivers are installed can deliver high-performance computing capabilities in general computing scenarios, such as deep learning and AI, and smoother graphics display in graphics acceleration scenarios, such as Open Graphics Library (OpenGL), Direct3D, and cloud gaming. If you do not install a Tesla driver when you create a GPU-accelerated compute-optimized Linux instance, you must manually install the driver after the instance is created. This topic describes how to manually install a Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Note

If a GPU-accelerated compute-optimized instance runs the Alibaba Cloud Linux 3 operating system and the Tesla driver is not automatically installed when you create the instance, you can use YUM to install the driver. For more information, see Use YUM to quickly install the NVIDIA Tesla driver on a GPU-accelerated compute-optimized Alibaba Cloud Linux 3 instance.

Procedure

Note

This procedure applies to all GPU-accelerated compute-optimized Linux instances. For more information, see GPU-accelerated compute-optimized instance families. The Tesla driver must match the operating system of the instance. For example, you can install only a Linux Tesla driver on a GPU-accelerated compute-optimized Linux instance.

Step 1: Download a Tesla driver

  1. Visit the NVIDIA driver download page.

    Note

    For more information about how to install and configure an NVIDIA driver, see NVIDIA CUDA Installation Guide for Linux.

  2. Configure search conditions and click Find to search for a driver that is suitable for your instance.


    The following table describes the search conditions.

    | Condition | Description | Example |
    |---|---|---|
    | Product category, Product series, Product | Select the product category, product series, and product based on the GPU of the GPU-accelerated instance. | Data Center / Tesla; A-Series; NVIDIA A10 |
    | Operating system | Select a Linux version based on the image of the instance. | Linux 64-bit |
    | CUDA Toolkit version | Select a CUDA Toolkit version. | 11.4 |
    | Language | Select a language for the driver. | Chinese (Simplified) |

    Note

    For information about how to view the details of a GPU-accelerated instance, such as the instance ID, instance type, and operating system, see View instance information.

    GPUs of GPU-accelerated instances and the recommended Tesla driver and CUDA Toolkit versions

    | Instance family | Product category | Product series | Recommended Tesla driver version | Recommended CUDA Toolkit version |
    |---|---|---|---|---|
    | gn5 | Data Center / Tesla | P-Series | 410.79 or later | CUDA Toolkit 10.1 Update 2 |
    | gn5i | Data Center / Tesla | P-Series | 410.79 or later | CUDA Toolkit 10.1 Update 2 |
    | gn6v | Data Center / Tesla | V-Series | 410.79 or later | CUDA Toolkit 10.1 Update 2 |
    | gn6i | Data Center / Tesla | T-Series | 410.79 or later | CUDA Toolkit 10.1 Update 2 |
    | gn6e | Data Center / Tesla | V-Series | 410.79 or later | CUDA Toolkit 10.1 Update 2 |
    | gn7 | Data Center / Tesla | A-Series | 450.80.02 or later | CUDA Toolkit 11.0 Update 1 |
    | gn7i | Data Center / Tesla | A-Series | 460.73.01 or later | CUDA Toolkit 11.2 |
    | gn7e | Data Center / Tesla | A-Series | 450.80.02 or later | CUDA Toolkit 11.0 Update 1 |

    Note
    • The preceding table lists only specific popular GPU-accelerated compute-optimized instance families. Instances that use the same GPU share the same GPU information, that is, the same product category, product series, and product. For example, instances of the ebmgn7i and gn7i instance families both use NVIDIA A10 GPUs, so their product category, product series, and product are the same.

    • When you manually install the Tesla driver and CUDA Toolkit, you must make sure that the driver version is compatible with the CUDA Toolkit version. For more information, see CUDA Compatibility.
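    The version recommendations in the preceding table can be checked in a script before you download a driver. The following shell sketch is illustrative only: the function names version_ge and min_driver_for_family are hypothetical, and the lookup assumes that the gn5, gn5i, gn6v, gn6i, and gn6e families share the 410.79 minimum. The comparison treats each dot-separated field as a number, so 450.80.02 is compared as 450, 80, and 2.

    ```shell
    # Hypothetical helper: succeed if dotted version $1 >= $2, comparing
    # dot-separated fields numerically from left to right.
    version_ge() {
        awk -v a="$1" -v b="$2" 'BEGIN {
            na = split(a, x, "."); nb = split(b, y, ".")
            n = (na > nb) ? na : nb
            for (i = 1; i <= n; i++) {
                xi = (i <= na) ? x[i] + 0 : 0
                yi = (i <= nb) ? y[i] + 0 : 0
                if (xi > yi) exit 0
                if (xi < yi) exit 1
            }
            exit 0
        }'
    }

    # Hypothetical lookup of the recommended minimum driver version per
    # instance family, assuming the grouping of the preceding table.
    min_driver_for_family() {
        case "$1" in
            gn5|gn5i|gn6v|gn6i|gn6e) echo "410.79" ;;
            gn7|gn7e) echo "450.80.02" ;;
            gn7i) echo "460.73.01" ;;
            *) return 1 ;;
        esac
    }

    family=gn7i
    driver=470.161.03
    if min=$(min_driver_for_family "$family") && version_ge "$driver" "$min"; then
        echo "$driver meets the recommended minimum $min for $family"
    else
        echo "$driver does not meet the recommendation for $family, or $family is not listed"
    fi
    ```

    Families that are not in the table, such as ebmgn7i, make the lookup fail. Check them against the family that uses the same GPU.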

  3. On the result page, click View More Versions.

  4. Find the driver that you want to download and click View next to the driver name.

    In this example, Data Center Driver for Linux x64 version 470.161.03 with CUDA Toolkit 11.4 is selected.

  5. On the details page of the driver that you want to download, right-click Download and select Copy URL.


  6. Connect to the GPU-accelerated compute-optimized Linux instance.

    For more information, see Connect to a Linux instance by using a password or key.

  7. Run the following command to download the driver installation package:

    Replace the URL in the sample code with the URL that you obtained in Substep 5.

    wget https://us.download.nvidia.com/tesla/470.161.03/NVIDIA-Linux-x86_64-470.161.03.run
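    The URL in the preceding command follows the pattern https://us.download.nvidia.com/tesla/<version>/NVIDIA-Linux-x86_64-<version>.run. The sketch below assumes that this pattern also holds for the version you selected; the URL that you copied in Substep 5 remains authoritative. The functions driver_url and download_driver are hypothetical helpers that build the URL from a version string and skip the download if the package file already exists.

    ```shell
    # Hypothetical helper: build the .run package URL from a driver version,
    # assuming the URL pattern of the 470.161.03 example.
    driver_url() {
        echo "https://us.download.nvidia.com/tesla/$1/NVIDIA-Linux-x86_64-$1.run"
    }

    # Hypothetical helper: download the package unless it is already present
    # in the current directory.
    download_driver() {
        local file="NVIDIA-Linux-x86_64-$1.run"
        if [ -f "$file" ]; then
            echo "$file already exists; skipping download"
        else
            wget "$(driver_url "$1")"
        fi
    }

    # Usage (performs a network download):
    # download_driver 470.161.03
    ```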

Step 2: Install the Tesla driver

The installation procedure varies based on the operating system of the instance. Follow the steps for your operating system.

CentOS

  1. Run the following command to check whether the kernel-devel and kernel-headers packages are installed on the GPU-accelerated instance:

    sudo rpm -qa | grep "$(uname -r)"
    • If the command output includes the version information about the kernel-devel and kernel-headers packages, the packages are installed. Sample command output:

      kernel-3.10.0-1062.18.1.el7.x86_64
      kernel-devel-3.10.0-1062.18.1.el7.x86_64
      kernel-headers-3.10.0-1062.18.1.el7.x86_64
    • If the command output does not include the version information about the kernel-devel (kernel-devel-*) and kernel-headers (kernel-headers-*) packages, you must download and install the packages of the required version. For more information, see kernel-devel and kernel-headers.

      Important

      If the kernel-devel version differs from the version of the running kernel, the kernel module of the driver fails to compile when you install the Tesla driver. Therefore, check the kernel version in the command output before you download kernel-devel. In the preceding sample output, the kernel version is 3.10.0-1062.18.1.el7.x86_64.
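    The check in the preceding substep can be scripted. In the following sketch, the hypothetical function missing_kernel_pkgs reads package names, such as the output of rpm -qa, from standard input and prints each of kernel-devel and kernel-headers that is not installed for a given kernel version. Missing packages of the matching version can typically be installed with sudo yum install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r), provided that the configured repositories still offer that exact version.

    ```shell
    # Hypothetical helper: read installed package names on standard input and
    # print the kernel-devel and kernel-headers packages that are missing for
    # kernel version $1 (for example, the output of uname -r).
    missing_kernel_pkgs() {
        local installed pkg
        installed=$(cat)
        for pkg in kernel-devel kernel-headers; do
            # -x matches whole lines only; -F treats the name as a fixed string.
            if ! printf '%s\n' "$installed" | grep -qxF "$pkg-$1"; then
                echo "$pkg-$1"
            fi
        done
    }

    # Usage on a CentOS instance:
    # rpm -qa | missing_kernel_pkgs "$(uname -r)"
    ```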

  2. Grant execute permissions on the Tesla driver installation package and install the driver.

    In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permissions on the installation package and install the Tesla driver:

    Note

    If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.

    sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sudo sh NVIDIA-Linux-x86_64-xxxx.run
  3. Run the following command to check whether the Tesla driver is installed:

    nvidia-smi

    If the command output lists the GPUs of the instance together with the installed driver version, the Tesla driver is installed.
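    For scripted checks, nvidia-smi can print selected fields in CSV format by using the --query-gpu and --format=csv,noheader options. The following sketch assumes one line per GPU in the form name, driver_version. The function all_gpus_on_driver is hypothetical, and 470.161.03 is the example version used in this topic.

    ```shell
    # Hypothetical helper: read "name, driver_version" lines (one per GPU) on
    # standard input and succeed only if at least one GPU is listed and every
    # GPU reports driver version $1.
    all_gpus_on_driver() {
        local found=0 name version
        while IFS=, read -r name version; do
            # nvidia-smi separates CSV fields with ", "; drop the spaces.
            version=$(printf '%s' "$version" | tr -d ' ')
            [ "$version" = "$1" ] || return 1
            found=1
        done
        [ "$found" -eq 1 ]
    }

    # Usage on the instance:
    # nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | all_gpus_on_driver 470.161.03
    ```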

  4. (Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence-M is disabled (Off) by default. When Persistence-M is enabled, the driver remains loaded even when no application is using the GPU, which avoids repeated driver initialization and makes performance more stable. To ensure business continuity, we recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.

    1. Run the following command to run the NVIDIA Persistence Daemon:

      # Replace username with your username.
      sudo nvidia-persistenced --user username
    2. Run the following command to view the status of Persistence-M:

      nvidia-smi

      If the Persistence-M field in the command output shows On, Persistence-M is enabled.
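      To check Persistence-M without reading the full nvidia-smi table, you can query the persistence_mode field for each GPU. The following sketch assumes that nvidia-smi reports this field as Enabled or Disabled. The function persistence_all_enabled is hypothetical.

      ```shell
      # Hypothetical helper: read persistence_mode values (one per GPU) on
      # standard input, print a summary, and succeed only if every GPU is Enabled.
      persistence_all_enabled() {
          local total=0 enabled=0 mode
          while read -r mode; do
              total=$((total + 1))
              if [ "$mode" = "Enabled" ]; then
                  enabled=$((enabled + 1))
              fi
          done
          echo "$enabled/$total GPUs report persistence mode Enabled"
          [ "$total" -gt 0 ] && [ "$enabled" -eq "$total" ]
      }

      # Usage on the instance:
      # nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | persistence_all_enabled
      ```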

  5. (Optional) Enable Persistence-M after you restart the system.

    Persistence-M does not remain enabled after the system restarts. To enable Persistence-M automatically at every startup, perform the following operations.

    When you install the Tesla driver from the installation package, sample scripts provided by NVIDIA for the daemon, including an installer script, are also installed in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress and install the installation script provided by NVIDIA:

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      sudo tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sudo sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:

      sudo systemctl status nvidia-persistenced

      If the command output shows that the service is active (running), the NVIDIA Persistence Daemon runs as expected.

      Note

      If the NVIDIA Persistence Daemon does not work as expected, you may need to adapt the installation script to your operating system.

    3. Run the following command to verify that Persistence-M is in the enabled (on) state:

      nvidia-smi
    4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.

      You can disable the NVIDIA Persistence Daemon based on your business requirements.

      sudo systemctl stop nvidia-persistenced
      sudo systemctl disable nvidia-persistenced
  6. (Conditionally required) Install NVIDIA Fabric Manager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.

    Important
    • If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.

    • You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.

    1. Install NVIDIA Fabric Manager.

      You can install NVIDIA Fabric Manager from the NVIDIA package repository or from an installation package. The commands vary based on the operating system. The following examples use driver version 460.91.03 on CentOS 7.x and CentOS 8.x. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.

      • Package repository

        • CentOS 7.x

          driver_version=460.91.03
          sudo yum -y install yum-utils
          sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
          sudo yum install -y nvidia-fabric-manager-${driver_version}-1
        • CentOS 8.x

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          distribution=rhel8
          ARCH=$( /bin/arch )
          sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$distribution/${ARCH}/cuda-$distribution.repo
          sudo dnf module enable -y nvidia-driver:${driver_version_main}
          sudo dnf install -y nvidia-fabric-manager-0:${driver_version}-1
      • Installation package

        • CentOS 7.x

          driver_version=460.91.03
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
        • CentOS 8.x

          driver_version=460.91.03
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
          sudo rpm -ivh nvidia-fabric-manager-${driver_version}-1.x86_64.rpm
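      The two installation-package examples differ only in the RHEL repository name (rhel7 or rhel8). As a sketch that assumes the RPM URL pattern shown above, the hypothetical function fm_rpm_url builds the Fabric Manager package URL from the driver version and the CentOS major version, and rejects other major versions.

      ```shell
      # Hypothetical helper: print the Fabric Manager RPM URL used in the
      # installation-package examples. $1 = driver version, $2 = CentOS major version.
      fm_rpm_url() {
          case "$2" in
              7|8) ;;
              *) echo "unsupported CentOS major version: $2" >&2; return 1 ;;
          esac
          echo "https://developer.download.nvidia.com/compute/cuda/repos/rhel$2/x86_64/nvidia-fabric-manager-$1-1.x86_64.rpm"
      }

      # Usage on a CentOS 7.x instance:
      # url=$(fm_rpm_url 460.91.03 7) && sudo wget "$url" && sudo rpm -ivh "$(basename "$url")"
      ```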
    2. Run the following commands to start NVIDIA Fabric Manager:

      sudo systemctl enable nvidia-fabricmanager
      sudo systemctl start nvidia-fabricmanager
    3. Run the following command to check whether NVIDIA Fabric Manager is installed:

      systemctl status nvidia-fabricmanager

      If the command output shows that the nvidia-fabricmanager service is active (running), NVIDIA Fabric Manager is installed and running.
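      NVIDIA Fabric Manager must match the driver version. The following sketch compares the two, assuming that the Fabric Manager package version has the form <driver version>-<release>, as in nvidia-fabric-manager-460.91.03-1. The function fm_matches_driver is hypothetical; the commented usage lines read the versions with nvidia-smi and rpm -q --qf.

      ```shell
      # Hypothetical helper: succeed if Fabric Manager package version $2
      # (for example 460.91.03-1) belongs to driver version $1 (for example 460.91.03).
      fm_matches_driver() {
          # Strip the trailing "-<release>" from the package version.
          local fm_version=${2%-*}
          [ "$fm_version" = "$1" ]
      }

      # Usage on a CentOS instance (first GPU only):
      # drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
      # fm=$(rpm -q --qf '%{VERSION}-%{RELEASE}' nvidia-fabric-manager)
      # fm_matches_driver "$drv" "$fm" || echo "Fabric Manager $fm does not match driver $drv"
      ```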

Other Linux distributions such as Ubuntu

  1. Grant execute permissions on the Tesla driver installation package and install the driver.

    In this example, a Linux 64-bit Tesla driver is used. We recommend that you use a .run installation package, such as NVIDIA-Linux-x86_64-xxxx.run. Run the following commands to grant execute permissions on the installation package and install the Tesla driver:

    Note

    If the installation package of your Tesla driver is in another format, such as the .deb or .rpm format, refer to NVIDIA CUDA Installation Guide for Linux for the installation method.

    sudo chmod +x NVIDIA-Linux-x86_64-xxxx.run
    sudo sh NVIDIA-Linux-x86_64-xxxx.run
  2. Run the following command to check whether the Tesla driver is installed:

    nvidia-smi

    If the command output lists the GPUs of the instance together with the installed driver version, the Tesla driver is installed.

  3. (Optional) Enable the persistence mode (Persistence-M) by using the NVIDIA Persistence Daemon.

    After the Tesla driver is installed, Persistence-M is disabled (Off) by default. When Persistence-M is enabled, the driver remains loaded even when no application is using the GPU, which avoids repeated driver initialization and makes performance more stable. To ensure business continuity, we recommend that you enable Persistence-M by using the NVIDIA Persistence Daemon. For more information, see Persistence Daemon on the official NVIDIA website.

    1. Run the following command to run the NVIDIA Persistence Daemon:

      # Replace username with your username.
      sudo nvidia-persistenced --user username
    2. Run the following command to view the status of Persistence-M:

      nvidia-smi

      If the Persistence-M field in the command output shows On, Persistence-M is enabled.

  4. (Optional) Enable Persistence-M after you restart the system.

    Persistence-M does not remain enabled after the system restarts. To enable Persistence-M automatically at every startup, perform the following operations.

    When you install the Tesla driver from the installation package, sample scripts provided by NVIDIA for the daemon, including an installer script, are also installed in the /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 archive.

    1. Run the following commands to decompress and install the installation script provided by NVIDIA:

      cd /usr/share/doc/NVIDIA_GLX-1.0/samples/
      sudo tar xf nvidia-persistenced-init.tar.bz2
      cd nvidia-persistenced-init
      sudo sh install.sh
    2. Run the following command to check whether the NVIDIA Persistence Daemon runs as expected:

      sudo systemctl status nvidia-persistenced

      If the command output shows that the service is active (running), the NVIDIA Persistence Daemon runs as expected.

      Note

      If the NVIDIA Persistence Daemon does not work as expected, you may need to adapt the installation script to your operating system.

    3. Run the following command to verify that Persistence-M is in the enabled (on) state:

      nvidia-smi
    4. (Optional) Run the following commands to disable the NVIDIA Persistence Daemon.

      You can disable the NVIDIA Persistence Daemon based on your business requirements.

      sudo systemctl stop nvidia-persistenced
      sudo systemctl disable nvidia-persistenced
  5. (Conditionally required) Install NVIDIA Fabric Manager that matches the driver version. This operation is required when your GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family.

    Important
    • If the GPU-accelerated instance belongs to the ebmgn7 or ebmgn7e instance family, you must install NVIDIA Fabric Manager that matches the driver version. Otherwise, you cannot use the instance as expected.

    • You can skip this operation if the GPU-accelerated instance does not belong to the ebmgn7 or ebmgn7e instance family.

    1. Install NVIDIA Fabric Manager.

      You can install NVIDIA Fabric Manager from the NVIDIA package repository or from an installation package. The commands vary based on the operating system. The following examples use driver versions 460.91.03 and 535.154.05 on Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, and Ubuntu 22.04. Replace the value of driver_version with the version of the driver that you downloaded in Step 1: Download a Tesla driver.

      Important

      When you install NVIDIA Fabric Manager on Ubuntu 22.04, the Tesla driver version must be later than 515.48.07. The following sample commands for Ubuntu 22.04 use driver version 535.154.05.

      • Package repository

        Ubuntu 16.04, Ubuntu 18.04, or Ubuntu 20.04

        driver_version=460.91.03
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        sudo apt-key add 3bf863cc.pub
        sudo rm 3bf863cc.pub
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        sudo apt-get update
        sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*

        Ubuntu 22.04

        driver_version=535.154.05
        driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
        distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
        sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
        sudo wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/3bf863cc.pub
        sudo apt-key add 3bf863cc.pub
        sudo rm 3bf863cc.pub
        echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        sudo apt-get update
        sudo apt-get -y install nvidia-fabricmanager-${driver_version_main}=${driver_version}-*
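        The distribution variable in the preceding commands is built from the ID and VERSION_ID fields of /etc/os-release, so ubuntu and 20.04 become ubuntu2004. The following sketch isolates that step in a hypothetical function, cuda_distribution, which sources the file in a subshell so that variables in the calling shell are not overwritten. When you write the repository entry, run tee with sudo, because sudo applied to echo does not grant write access to the file that tee opens.

        ```shell
        # Hypothetical helper: derive the CUDA repository name (for example
        # ubuntu2004) from an os-release file, defaulting to /etc/os-release.
        cuda_distribution() {
            # Source the file in a subshell so that its variables do not leak.
            (
                . "${1:-/etc/os-release}"
                echo "$ID$VERSION_ID" | sed -e 's/\.//g'
            )
        }

        # Usage on the instance:
        # distribution=$(cuda_distribution)
        # echo "deb https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
        ```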
      • Installation package

        • Ubuntu 16.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 18.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 20.04

          driver_version=460.91.03
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
        • Ubuntu 22.04

          driver_version=535.154.05
          driver_version_main=$(echo $driver_version | awk -F '.' '{print $1}')
          sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
          sudo dpkg -i nvidia-fabricmanager-${driver_version_main}_${driver_version}-1_amd64.deb
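      The four installation-package examples differ only in the repository name and the driver version. The following sketch assumes the .deb URL pattern shown above; the functions version_gt and fm_deb_url are hypothetical. fm_deb_url derives the major version in the same way as the awk command above and refuses ubuntu2204 unless the driver version is later than 515.48.07, as the preceding Important note requires.

      ```shell
      # Hypothetical helper: succeed if dotted version $1 is strictly later than $2.
      version_gt() {
          awk -v a="$1" -v b="$2" 'BEGIN {
              na = split(a, x, "."); nb = split(b, y, ".")
              n = (na > nb) ? na : nb
              for (i = 1; i <= n; i++) {
                  xi = (i <= na) ? x[i] + 0 : 0
                  yi = (i <= nb) ? y[i] + 0 : 0
                  if (xi > yi) exit 0
                  if (xi < yi) exit 1
              }
              exit 1
          }'
      }

      # Hypothetical helper: print the Fabric Manager .deb URL for driver version $1
      # and repository name $2 (ubuntu1604, ubuntu1804, ubuntu2004, or ubuntu2204).
      fm_deb_url() {
          local main=${1%%.*}   # major version, for example 535 for 535.154.05
          if [ "$2" = "ubuntu2204" ] && ! version_gt "$1" 515.48.07; then
              echo "Ubuntu 22.04 requires a driver version later than 515.48.07" >&2
              return 1
          fi
          echo "https://developer.download.nvidia.com/compute/cuda/repos/$2/x86_64/nvidia-fabricmanager-${main}_${1}-1_amd64.deb"
      }

      # Usage on an Ubuntu 22.04 instance:
      # url=$(fm_deb_url 535.154.05 ubuntu2204) && sudo wget "$url" && sudo dpkg -i "$(basename "$url")"
      ```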
    2. Run the following commands to start NVIDIA Fabric Manager:

      sudo systemctl enable nvidia-fabricmanager
      sudo systemctl start nvidia-fabricmanager
    3. Run the following command to check whether NVIDIA Fabric Manager is installed:

      systemctl status nvidia-fabricmanager

      If the command output shows that the nvidia-fabricmanager service is active (running), NVIDIA Fabric Manager is installed and running.

      Note

      The GPU can work as expected only if the version of NVIDIA Fabric Manager is consistent with the Tesla driver version. For GPU-accelerated compute-optimized instances that run Ubuntu, the apt-daily service may automatically update NVIDIA Fabric Manager if you install NVIDIA Fabric Manager by using an installation package. This results in version inconsistency between NVIDIA Fabric Manager and the Tesla driver. As a result, NVIDIA Fabric Manager fails to start and the GPU fails to work as expected. For information about how to resolve this issue, see What do I do if the GPU fails to work because the nvidia-fabricmanager version is inconsistent with the Tesla driver version?
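      One common way to stop apt from upgrading a package automatically is to place it on hold with apt-mark hold. The following sketch assumes the nvidia-fabricmanager-<major version> package name used in the preceding commands; fm_package_name and hold_fabricmanager are hypothetical functions. The FAQ referenced in the preceding note remains the reference procedure.

      ```shell
      # Hypothetical helper: print the Fabric Manager package name for driver
      # version $1, assuming the nvidia-fabricmanager-<major version> naming above.
      fm_package_name() {
          echo "nvidia-fabricmanager-${1%%.*}"
      }

      # Hypothetical helper: put the package on hold so that apt upgrades,
      # including unattended ones, should skip it, then confirm the hold is listed.
      hold_fabricmanager() {
          local pkg
          pkg=$(fm_package_name "$1")
          sudo apt-mark hold "$pkg" && apt-mark showhold | grep -qxF "$pkg"
      }

      # Usage on the instance, followed by a check of the installed version:
      # hold_fabricmanager 535.154.05
      # dpkg-query -W -f='${Version}\n' nvidia-fabricmanager-535
      ```

      With the hold in place, upgrade NVIDIA Fabric Manager deliberately, together with the Tesla driver, so that the two versions stay consistent.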

References