Elastic Compute Service: Enable eRDMA on an enterprise-level instance

Last Updated: Jan 29, 2026

Some enterprise-level Elastic Compute Service (ECS) instances support elastic Remote Direct Memory Access (eRDMA). This feature delivers a high-performance RDMA network service that offers ultra-low latency, high throughput, and high elasticity without requiring changes to your existing network architecture. This topic describes how to enable eRDMA on an enterprise-level ECS instance.

Limits

Limitations

Description

Instance type

Only some instance families support eRDMA.

Some instance families, such as g9ae, c9ae, r9ae, g9a, c9a, r9a, and u2a, support eRDMA only on instance types that have four or more vCPUs.

Image

  • Alibaba Cloud Linux 3 (recommended)

  • Alibaba Cloud Linux 2 (x86-based systems only)

  • CentOS 7.9 (x86-based systems only)

  • Ubuntu 18.04, 20.04, 22.04, or 24.04

  • Anolis OS 8.4 ANCK or 8.6 ANCK (Arm-based systems only)

Note

The available images vary by instance type. The images shown on the instance purchase page prevail.

Number of eRDMA devices

To find the maximum number of Elastic RDMA Interfaces (ERIs) that you can bind to an ECS instance of a specific instance type, call the DescribeInstanceTypes operation and check the value of the EriQuantity parameter in the response. A value of 0 indicates that the instance type does not support ERIs.

Network limits

  • You cannot assign an IPv6 address to an elastic network interface (ENI) after you enable the ERI feature for the ENI.

  • When two instances communicate over an eRDMA connection, the communication path cannot pass through network elements, such as Server Load Balancer (SLB) instances.

  • An eRDMA-capable GPU-accelerated instance cannot directly communicate with an eRDMA-capable enterprise-level instance because they use different eRDMA working modes. To enable communication, deploy eRDMA on the enterprise-level instance in the same way that you deploy eRDMA on a GPU-accelerated instance. This includes installing the eRDMA and OpenFabrics Enterprise Distribution (OFED) drivers and attaching an ERI to the enterprise-level instance. For more information, see Enable eRDMA on a GPU-accelerated instance.
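
The EriQuantity check described under "Number of eRDMA devices" can be scripted. The following is a minimal sketch, assuming that the DescribeInstanceTypes response is already available as JSON text. The function name eri_quantity and the sample response are hypothetical, and the sketch reads only the first EriQuantity value that it finds, so the request should be limited to one instance type.

```shell
# Hypothetical helper: print the first EriQuantity value found in a
# DescribeInstanceTypes JSON response that is read from standard input.
eri_quantity() {
    grep -o '"EriQuantity"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | head -n 1 \
        | grep -o '[0-9][0-9]*$'
}

# Illustrative response fragment (not real data).
sample='{"InstanceTypes":{"InstanceType":[{"InstanceTypeId":"ecs.example.large","EriQuantity":1}]}}'

quantity=$(printf '%s\n' "$sample" | eri_quantity)
if [ "${quantity:-0}" -gt 0 ]; then
    echo "Up to $quantity ERI(s) can be bound to this instance type."
else
    echo "This instance type does not support ERIs."
fi
```

The helper uses only grep and head, so it does not depend on a JSON parser being installed. If a parser is available, it is a more robust choice for multi-entry responses.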

Configure eRDMA for an enterprise-level instance

Configure eRDMA when you create an instance

Important
  • If the operating system does not support the eRDMA driver or if the automatic installation fails, you can install the driver using a script or by performing a manual installation after the instance is created. For more information, see Configure eRDMA for an existing instance.

  • After the instance starts, the eRDMA driver installation may take some time to complete.

  1. Go to the instance purchase page.

  2. Create an enterprise-level instance that supports ERIs. During the creation process, take note of the following configuration items. For information about other parameters, see Create an instance using the wizard.

    • Instance and Image: Select an instance type that supports eRDMA and select the option to install the eRDMA driver.

      • Instance: For more information, see Limits.

      • Image: Select Public Image and then select Install eRDMA driver. The eRDMA driver is automatically installed when the instance starts. You do not need to perform a manual installation.

    • ENI: To the right of Primary ENI, enable the ERI feature to attach an ERI to the ECS instance.

      Note

      When you purchase an enterprise-level instance, you can enable the ERI feature only for the primary ENI. To configure eRDMA for a secondary ENI, you can enable the ERI feature for the secondary ENI in the console or by calling an API operation. For more information, see Elastic RDMA Interface (ERI).

Configure eRDMA for an existing instance

  1. Make sure that the instance type supports eRDMA.

    For the supported instance types, see Limits.

  2. Check whether eRDMA is already configured for the instance.

    • For more information, see Verify eRDMA configurations.

    • If eRDMA is not configured for the instance, perform the following steps to install the eRDMA driver and attach an ERI to the instance.

  3. Install the eRDMA driver on the instance.

    If you do not select the eRDMA driver when you create the instance, the driver is not automatically installed. You must then install it manually or using a script.

    • Script-based installation: The latest stable version of the driver package is downloaded by default.

    • Manual installation: You can download a specific version of the driver package.

    Install using a one-click script

    1. Run the following command to download the installation script. By default, the script installs the latest stable version of the driver package.

      curl -O http://mirrors.cloud.aliyuncs.com/erdma/env_setup.sh
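    2. Run the following command to run the downloaded script and install the driver. The redirection runs inside the sudo shell so that the log file can be written to /var/log.

      sudo sh -c '/bin/bash env_setup.sh > /var/log/erdma_install.log 2>&1'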

      The script installs the software dependencies required by the eRDMA driver and then installs the driver. Wait for the script to finish running.

      Note

      If the driver fails to install using the script, check the installation log. The path of the installation log is /var/log/erdma_install.log.
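
The run-then-check-the-log pattern above can be wrapped in a small helper. The following is a minimal sketch, assuming that the installation script exits with a non-zero status when it fails (this topic does not state that). The function name run_logged is hypothetical.

```shell
# Hypothetical helper: run a command, write its output to a log file,
# and print the last lines of the log if the command fails.
# Usage: run_logged <log_file> <command> [args...]
run_logged() {
    log_file=$1
    shift
    if "$@" > "$log_file" 2>&1; then
        echo "Command succeeded. Log: $log_file"
    else
        status=$?
        echo "Command failed with status $status. Last lines of $log_file:"
        tail -n 20 "$log_file"
        return "$status"
    fi
}
```

For example, run_logged ./erdma_install.log sudo /bin/bash env_setup.sh would run the installer and keep the log in the current directory, which does not require write access to /var/log.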

    Install manually step by step

    1. Run the following command to update the prerequisite software packages.

      • For Alibaba Cloud Linux 3, CentOS, or Anolis OS:

        sudo yum update -y
      • For Ubuntu: You do not need to perform an update. Skip this step.

    2. Run the following commands in sequence to view the latest kernel package version and the kernel version of the operating system.

      rpm -qa | grep kernel  # View the latest kernel package version.
      uname -r  # View the kernel version of the operating system.

      If the kernel version that uname -r returns matches the latest kernel package version in the rpm output, the versions are consistent and no further action is required. If the versions are inconsistent, restart the instance so that the updated kernel takes effect.

    3. Run the following command to install the dependency packages.

      • For x86-based instances, perform the following operations:

        • For Alibaba Cloud Linux 3, CentOS, or Anolis OS:

          sudo yum install gcc-c++ dkms cmake kernel-devel kernel-headers libnl3 libnl3-devel
        • For Ubuntu:

          sudo apt-get install dkms cmake libnl-3-dev libnl-route-3-dev linux-headers-generic
      • For Arm-based instances, the driver is built from source code, which requires many software dependencies that may change over time. You can skip this step and run the installation script directly. If the installation script fails, it prompts you to install the required dependencies. Install the dependencies as prompted and then run the installation script again.

    4. Run one of the following commands to download the driver installation package.

      • Obtain the software package from an internal network address.

        wget http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-latest.tar.gz
      • Obtain the software package from an Internet address.

        wget https://mirrors.aliyun.com/erdma/erdma_installer-latest.tar.gz

      By default, the latest version of the driver installation package is downloaded. You can also download a specific version of the driver package. For information about the release of different versions of the eRDMA installation package, see Step 2: Install the eRDMA driver for an ECS instance.

    5. Run the following command to decompress the installation package and go to the file directory.

      tar -xvf erdma_installer-latest.tar.gz && cd erdma_installer
    6. Run the following command to install the driver.

      • Method 1: Manually confirm the uninstallation and automatic download steps during the installation process.

        sudo sh install.sh
      • Method 2: Install without confirmation.

        sudo sh install.sh --batch

      Check the returned information to confirm the installation result.

      If the output indicates that the installation succeeded, the driver is installed.

      If the output indicates that the installation failed, perform the operations as prompted and then try to install the driver again.

      Note

      If you use CentOS 7 and a message about missing software packages appears when you reinstall the driver, and yum cannot find those packages, run the yum install -y epel-release command to enable the EPEL repository, and then install the packages.
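
The kernel version comparison in step 2 of the manual installation can also be scripted. The following is a minimal sketch, assuming a sort command that supports version sorting (-V, as in GNU coreutils) and version strings in the format that uname -r prints. The function names and the sample version strings are hypothetical.

```shell
# Print the highest version from a newline-separated list on standard input.
newest_version() {
    sort -V | tail -n 1
}

# Succeed if the running kernel ($1) equals the newest installed kernel.
# $2 is a newline-separated list of installed kernel versions.
kernel_is_current() {
    [ "$1" = "$(printf '%s\n' "$2" | newest_version)" ]
}

# On an RPM-based system, the inputs could be gathered as follows
# (an assumption; the kernel package name may differ between images):
#   running=$(uname -r)
#   installed=$(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n')

# Illustrative values (not taken from a real instance).
installed='5.10.84-10.4.al8.x86_64
5.10.134-16.al8.x86_64'
if kernel_is_current "5.10.84-10.4.al8.x86_64" "$installed"; then
    echo "The running kernel is the newest installed kernel."
else
    echo "Restart the instance to boot the newest installed kernel."
fi
```

With the illustrative values, the running kernel is older than the newest installed package, so the sketch prints the restart message.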

  4. Attach an ERI to the instance.

    You can attach an ERI to an instance in one of the following ways.

    Note

    To query the maximum number of ERIs that you can bind to an ECS instance of a specific instance type, call the DescribeInstanceTypes operation and check the value of the EriQuantity parameter in the response. A value of 0 indicates that you cannot bind an ERI to an ECS instance of the instance type.

    • Enable the ERI feature for an ENI that is bound to an ECS instance

      You can enable the ERI feature for an ENI that is bound to an ECS instance by modifying the attributes of the ENI. For more information, see the Change the status of the ERI feature for an existing ENI section of the "ERIs" topic.

    • Create an ERI and bind the ERI to an ECS instance

    • Call API operations to create an ERI and bind the ERI to an ECS instance

      Perform the following steps:

      1. Call an API operation to create an ERI.

        Call the CreateNetworkInterface operation to create an ENI and set the NetworkInterfaceTrafficMode parameter to HighPerformance to enable the ERI feature for the ENI.

        After the call is successful, record the return value of the NetworkInterfaceId parameter, which is the ERI ID.

      2. Call the AttachNetworkInterface operation to bind the ERI to the ECS instance. Set the NetworkInterfaceId parameter to the ERI ID recorded in the preceding step and the InstanceId parameter to the ID of the ECS instance.

        Important

        If the instance type of the ECS instance supports multiple ERIs per instance, we recommend that you set the NetworkCardIndex parameter to a different value for each ERI when you bind multiple ERIs to the instance. This ensures that the ERIs are bound to different channels and the maximum network bandwidth is achieved for the instance. For more information, see Network card indexes.
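
The two API calls above can be sketched with the Alibaba Cloud CLI. This is a minimal sketch under several assumptions: the aliyun command is installed and configured, calls take the form aliyun ecs <Operation> --<Parameter> <value>, and RegionId, VSwitchId, and SecurityGroupId are the additional request parameters that you supply (this topic does not name them). The function names and the ALIYUN_BIN variable are hypothetical. Setting ALIYUN_BIN to echo prints the arguments instead of calling the API.

```shell
# Hypothetical wrappers around the Alibaba Cloud CLI.
# ALIYUN_BIN defaults to "aliyun"; set it to "echo" for a dry run.

# Create an ENI with the ERI feature enabled.
# Usage: create_eri <region_id> <vswitch_id> <security_group_id>
create_eri() {
    "${ALIYUN_BIN:-aliyun}" ecs CreateNetworkInterface \
        --RegionId "$1" \
        --VSwitchId "$2" \
        --SecurityGroupId "$3" \
        --NetworkInterfaceTrafficMode HighPerformance
}

# Bind an ERI to an instance.
# The optional fourth argument sets the NetworkCardIndex parameter.
# Usage: attach_eri <region_id> <eri_id> <instance_id> [network_card_index]
attach_eri() {
    if [ -n "${4:-}" ]; then
        "${ALIYUN_BIN:-aliyun}" ecs AttachNetworkInterface \
            --RegionId "$1" --NetworkInterfaceId "$2" --InstanceId "$3" \
            --NetworkCardIndex "$4"
    else
        "${ALIYUN_BIN:-aliyun}" ecs AttachNetworkInterface \
            --RegionId "$1" --NetworkInterfaceId "$2" --InstanceId "$3"
    fi
}
```

When you bind several ERIs to one instance, passing a different fourth argument to attach_eri for each ERI follows the NetworkCardIndex recommendation above.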

Test the eRDMA write latency of an instance

You can install perftest and then use ib_write_lat to test the write latency on two enterprise-level instances that have eRDMA configured. For more information about perftest tests, see perftest test set.

Prerequisites

  1. Prepare two enterprise-level instances that have eRDMA configured. The eRDMA software stack must be installed and ERIs must be attached to the instances. One instance serves as the server and the other as the client.

  2. Make sure that the network is configured correctly and that the two instances can communicate with each other over the internal network. For more information, see Enable service interconnection between ECS instances.

Procedure

  1. Remotely connect to the two instances.

    For more information, see Connect to a Linux instance using Workbench.

  2. Verify that the eRDMA configurations on both instances are correct.

    For more information, see Verify eRDMA configurations.

  3. Run the following commands on the two instances to install the perftest tool.

    You can download the perftest package from the official perftest repository and install perftest, or use a Yellowdog Updater, Modified (YUM) or Advanced Packaging Tool (APT) repository to install perftest.

    Official perftest repository

    1. Enable public bandwidth for an ECS instance on which you want to install perftest. For more information, see Enable public bandwidth.

    2. Download the perftest package from the official perftest repository and install perftest.

    YUM or APT repository

    Note

    Different versions of perftest are included in the repositories of different Linux distributions. Incompatibility may occur. To prevent incompatibility, we recommend that you identify the Linux distribution run by the ECS instance on which you want to install perftest and install the perftest version included in the repository of the same Linux distribution. Otherwise, download the perftest package from the official perftest repository and install perftest.

    • Alibaba Cloud Linux 3/CentOS/Anolis OS

      sudo yum install perftest -y
    • Ubuntu

      sudo apt install perftest -y
  4. Test the eRDMA network latency against the expected performance.

    1. On the server instance, run the following command to start ib_write_lat as a server that listens for connections from the client.

      ib_write_lat -R -a -F
      • -R: uses RDMA_CM to establish a connection.

        Important
        • By default, CPU-based instance types that support eRDMA install the eRDMA kernel driver in Standard mode. In this mode, only the RDMA_CM connection establishment method is supported. For more information, see Connection establishment methods.

        • By default, perftest establishes out-of-band (OOB) connections. When you run a perftest test on a CPU-based instance, specify the -R parameter on both the server and the client to use the RDMA_CM connection establishment method. Otherwise, an exception may occur when a connection is established.

        • You can also use the command line to make the RDMA_CM and OOB connection establishment methods compatible. For more information, see Modify the connection establishment modes of eRDMA and bRPC to ensure compatibility. After you make the methods compatible, you do not need to add the -R parameter to the command.

      • -a: runs tests for all message sizes, from 2 bytes to 2^23 bytes. This lets you test the effect of different message sizes on latency.

      • -F: skips the CPU frequency check. With this option, perftest does not show a warning when the cpufreq_ondemand module is loaded or the CPU is not running at its maximum frequency. The option does not affect how connections are established.

    2. On the client instance, run the following command to start ib_write_lat and connect to the server.

      ib_write_lat -R -a -F <server_ip>

      Replace <server_ip> with the private IP address of the network interface card (NIC) for which the ERI feature is enabled on the server ECS instance. For information about how to obtain an IP address, see View IP addresses.

    3. View the test results.

      After the client test is complete, ib_write_lat outputs the test configuration information, connection information, and performance test results. The results include latency-related statistics, such as the minimum, maximum, and average latency.

      Description of latency data in the ib_write_lat test results

      • #bytes: The message size. This is the size of the payload used in the test, ranging from 2 bytes to 8,388,608 bytes. Different message sizes help you understand the performance under different loads.

      • #iterations: The number of iterations. This indicates how many times the test for each message size was repeated. A high number of iterations provides more stable averages and statistics.

      • t_min[usec]: The minimum latency. This is the minimum latency recorded in all measurements, in microseconds. This value provides a reference for the best-case network latency.

      • t_max[usec]: The maximum latency. This is the maximum latency recorded in all measurements, in microseconds. A high maximum latency may indicate network problems or transient congestion.

      • t_typical[usec]: The typical latency. This is the common latency in the test, in microseconds. It is usually the median of all measured values.

      • t_avg[usec]: The average latency. This is the average latency of all measured values, in microseconds. This value provides an overall impression of the network latency.

      • t_stdev[usec]: The standard deviation of the latency. This indicates the degree of variation in latency values, in microseconds. A smaller standard deviation means that the latency is more stable, while a larger standard deviation means that the latency fluctuates more.

      • 99% percentile[usec]: The 99th percentile of latency. This indicates that 99% of the measured values are below this value, in microseconds. This data point helps you understand latency performance in extreme cases.

      • 99.9% percentile[usec]: The 99.9th percentile of latency. This indicates that 99.9% of the measured values are below this value, in microseconds. This data point helps you understand latency performance in extreme cases.

      By combining this data, you can obtain a comprehensive understanding of the RDMA network performance and use it for network optimization and troubleshooting. For example, if you find that the latency suddenly increases at a specific message size, you may need to check whether the network configuration or hardware performance meets the requirements. If you see large fluctuations in latency, you may need to further investigate congestion or instability in the network.
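
The check for a sudden latency increase at a specific message size can be automated. The following is a minimal sketch, assuming that each result row starts with the message size and that t_avg is the sixth column, in the column order listed above. The function name, the default factor of 3, and the sample rows are hypothetical. Because the -a option doubles the message size at each step, large messages can legitimately take about twice as long from one row to the next, so the default factor is set above 2.

```shell
# Hypothetical helper: read ib_write_lat output from standard input and print
# each message size whose t_avg exceeds <factor> times the t_avg of the
# previous message size.
# Usage: flag_latency_jumps [factor] < results.txt
flag_latency_jumps() {
    awk -v factor="${1:-3}" '
        # Result rows start with a numeric message size; skip headers and banners.
        $1 ~ /^[0-9]+$/ && NF >= 6 {
            if (prev > 0 && $6 > factor * prev) {
                printf "%s bytes: t_avg %s usec (previous %s usec)\n", $1, $6, prev
            }
            prev = $6
        }
    '
}

# Illustrative rows (not real measurements), columns as in the header line.
flag_latency_jumps <<'EOF'
 #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
 2       1000       3.10        9.80        3.20            3.25        0.20          4.10                 9.80
 4       1000       3.12        9.90        3.22            3.27        0.21          4.12                 9.90
 8       1000       3.15        40.10       12.50           12.80       2.10          30.20                40.10
EOF
```

With the illustrative rows, only the 8-byte row is reported, because its t_avg is more than three times that of the 4-byte row. A flagged row is a prompt for further investigation, not proof of a network problem.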