You can configure Elastic Remote Direct Memory Access (eRDMA) on specific enterprise-level Elastic Compute Service (ECS) instances to use low-latency, high-throughput, and highly scalable RDMA network services. eRDMA improves network performance without requiring changes to your network architecture.
Limits
Item | Description |
Region | eRDMA is available in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Guangzhou), China (Ulanqab), and China (Heyuan). |
Instance family | The following instance families support eRDMA: g8a (general-purpose), c8a (compute-optimized), r8a (memory-optimized), g8i (general-purpose), c8i (compute-optimized), r8i (memory-optimized), g8ae (performance-enhanced general-purpose), c8ae (performance-enhanced compute-optimized), r8ae (performance-enhanced memory-optimized), g8y (general-purpose), c8y (compute-optimized), r8y (memory-optimized), and i4 (instance family with local SSDs). |
Image | Alibaba Cloud Linux 3 (recommended); Alibaba Cloud Linux 2 for x86; CentOS 7.9 for x86; Ubuntu 18.04/20.04/22.04; Anolis OS 8.4 ANCK for Arm and Anolis OS 8.6 ANCK for Arm. Note: The images that are available for selection vary based on the instance type. The images that are available for selection are displayed on the instance buy page when you select an instance type that supports eRDMA. |
Number of eRDMA devices | To query the maximum number of ERIs that you can bind to an ECS instance of a specific instance type, call the DescribeInstanceTypes operation and check the value of the EriQuantity parameter in the response. A value of 0 indicates that you cannot bind an ERI to an ECS instance of the instance type. |
Network | You cannot assign IPv6 addresses to elastic RDMA interfaces (ERIs). When two ECS instances communicate over eRDMA, the communication path cannot span across network elements, such as Server Load Balancer (SLB) instances. |
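As a rough sketch of the EriQuantity check described in the table, the following shell function extracts the EriQuantity value from a DescribeInstanceTypes JSON response read on standard input. The sample response is abridged and hypothetical, the instance type name is used only for illustration, and the function name eri_quantity is not part of any Alibaba Cloud tool.

```shell
#!/bin/sh
# Extract the first EriQuantity value from a DescribeInstanceTypes JSON
# response read on stdin. Only grep and sed are used, so no JSON tool is needed.
eri_quantity() {
    grep -o '"EriQuantity"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | head -n 1 \
        | sed 's/.*:[[:space:]]*//'
}

# Hypothetical, abridged response used only for illustration.
sample='{"InstanceTypes":{"InstanceType":[{"InstanceTypeId":"ecs.g8a.large","EriQuantity":1}]}}'

qty=$(printf '%s\n' "$sample" | eri_quantity)
if [ "${qty:-0}" -gt 0 ]; then
    echo "ERIs supported: up to $qty"
else
    echo "ERIs not supported"
fi
```

In practice, you would pipe the real API response into eri_quantity instead of the sample string. How you obtain that response (SDK, CLI, or OpenAPI Explorer) depends on your own tooling.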
Configure eRDMA on an enterprise-level ECS instance
Configure eRDMA when you create an ECS instance
Configure eRDMA on an existing ECS instance
Important
When you create an eRDMA-capable instance that runs Alibaba Cloud Linux, Ubuntu, or Anolis OS, you can enable eRDMA by selecting the Auto-install eRDMA Driver option, which automatically installs the eRDMA driver, and by enabling the ERI feature for the primary ENI.
If the Auto-install eRDMA Driver option is unavailable for the operating system version that you select, or if the eRDMA driver fails to be automatically installed, you can install the driver manually or by using a script after the instance is created. For more information, see the Configure eRDMA on an existing ECS instance section of this topic.
After you start the ECS instance, wait for a period of time for the system to install the eRDMA driver.
Go to the ECS instance buy page.
Create an enterprise-level ECS instance that supports ERIs. When you create the ECS instance, take note of the following parameters or options. For information about other parameters on the ECS instance buy page, see Create an instance on the Custom Launch tab.
Instances & Images: Select an instance type that supports eRDMA and install the eRDMA driver.
Instance: For more information, see the Limits section of this topic.
Image: Click the Public Images tab, select a public image, and then select Auto-install eRDMA Driver. The system automatically installs the eRDMA driver when the instance is started.

ENI: Select the eRDMA Interface option on the right side of Primary ENI to bind an ERI to the ECS instance.

Note
When you create an enterprise-level instance, you can enable the ERI feature only for the primary elastic network interface (ENI). You can enable the ERI feature for a secondary ENI in the ECS console or by calling an API operation. For more information, see ERIs.
Check the instance type against the list of instance types that support eRDMA.
Make sure that the instance type supports eRDMA.
Check whether eRDMA is configured as expected for the instance.
For information about how to check whether eRDMA is configured as expected for the instance, see the Verify the correctness of the eRDMA configurations section of the "Use eRDMA" topic.
If eRDMA is not configured as expected for the instance, you can perform the following steps to install the eRDMA driver and bind an ERI to the ECS instance.
Install the eRDMA driver on the instance.
If you do not select Auto-install eRDMA Driver when you create the instance, the eRDMA driver is not automatically installed on the instance. Install the eRDMA driver manually or by using a script based on the actual scenario.
If you use a script to install the eRDMA driver, the installation package for the latest stable eRDMA driver version is automatically downloaded.
If you want to manually install the eRDMA driver, you can download the package for a specific eRDMA driver version.
Execute a script to install the eRDMA driver
Manually install the eRDMA driver
Run the following command to download the installation script, which retrieves the latest stable eRDMA driver package:
curl -O http://mirrors.cloud.aliyuncs.com/erdma/env_setup.sh
Run the following command to install the eRDMA driver package:
sudo /bin/bash -c 'bash env_setup.sh > /var/log/erdma_install.log 2>&1'
The script automatically installs the dependencies that are required by the eRDMA driver and then the eRDMA driver. Wait for the script execution to complete.
Note
If the eRDMA driver fails to be installed by using the script, check logs in the /var/log/erdma_install.log file.
Update the prerequisite package.
For Alibaba Cloud Linux 3, CentOS, and Anolis OS, run the following command:
For Ubuntu, skip this step.
Run the following commands in sequence to query the most recent kernel package version and the operating system kernel version:
rpm -qa | grep kernel   # Query the latest kernel package version.
uname -r                # Query the operating system kernel version.
The command outputs shown in the following figure indicate that the kernel package version is the same as the operating system kernel version. In this case, you do not need to perform additional operations. If the versions are different, restart the ECS instance to make the versions the same.
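The comparison above can be sketched as a small script for RPM-based images. This is a minimal sketch under assumptions: it assumes that the kernel package is named kernel, that GNU sort with the -V option is available, and the helper names latest_version and kernel_is_current are hypothetical.

```shell
#!/bin/sh
# Print the highest version from a newline-separated list on stdin.
latest_version() {
    sort -V | tail -n 1
}

# Return 0 when the running kernel equals the newest installed kernel package.
#   $1: newline-separated VERSION-RELEASE.ARCH strings of installed kernels
#   $2: running kernel version (output of uname -r)
kernel_is_current() {
    newest=$(printf '%s\n' "$1" | latest_version)
    [ "$newest" = "$2" ]
}

# Run the check only where rpm exists (Alibaba Cloud Linux, CentOS, Anolis OS).
if command -v rpm >/dev/null 2>&1; then
    installed=$(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n')
    if kernel_is_current "$installed" "$(uname -r)"; then
        echo "Kernel package and running kernel match."
    else
        echo "Versions differ; restart the instance before you install the driver."
    fi
fi
```

The script only reports the result. Restarting the instance is left as a manual decision.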

Install dependency packages.
If the ECS instance is an x86 instance, run one of the following commands based on the instance operating system.
For Alibaba Cloud Linux 3, CentOS, and Anolis OS, run the following command:
sudo yum install gcc-c++ dkms cmake kernel-devel kernel-headers libnl3 libnl3-devel
For Ubuntu, run the following command:
sudo apt-get install dkms cmake libnl-3-dev libnl-route-3-dev kernel-headers
If the ECS instance is an Arm instance, the driver is built from source code, which requires a large number of dependencies that are subject to change. You can skip this step and execute the installation script. If the installation script fails to install dependency packages, you are prompted to install the required dependency packages. Install the dependency packages as prompted and then re-install the eRDMA driver.
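For x86 instances, the choice between the two dependency commands above can be sketched as a lookup on the ID field of /etc/os-release. The ID strings (alinux, centos, anolis, and ubuntu) are assumptions about what each image reports, and the function name dep_install_cmd is hypothetical. The commands themselves are copied from this topic. The function only prints the command and does not run it.

```shell
#!/bin/sh
# Print the dependency install command from this topic for a given
# /etc/os-release ID value. The ID strings are assumptions.
dep_install_cmd() {
    case "$1" in
        alinux|centos|anolis)
            echo "sudo yum install gcc-c++ dkms cmake kernel-devel kernel-headers libnl3 libnl3-devel"
            ;;
        ubuntu)
            echo "sudo apt-get install dkms cmake libnl-3-dev libnl-route-3-dev kernel-headers"
            ;;
        *)
            echo "unsupported operating system ID: $1" >&2
            return 1
            ;;
    esac
}

# Read ID from /etc/os-release when the file exists, then show the command.
if [ -r /etc/os-release ]; then
    os_id=$(. /etc/os-release && printf '%s' "$ID")
    dep_install_cmd "$os_id" || true
fi
```

Printing the command instead of executing it lets you review the package list before anything is installed.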
Download the driver installation package.
In this example, the installation package for the latest eRDMA driver version is downloaded. You can download the installation package for a specific eRDMA driver version based on your business scenarios. For information about the release of different versions of the eRDMA installation package, see the Install the eRDMA driver for an ECS instance section of the "Use eRDMA" topic.
Run the following command to decompress the installation package and then go to the directory to which the installation package is decompressed:
tar -xvf erdma_installer-latest.tar.gz && cd erdma_installer
Use one of the following methods to install the eRDMA driver:
Method 1: Run the following command to install the eRDMA driver. During the installation process, confirm relevant uninstallation steps and automatic installation steps.
Method 2: Run the following command to automatically install the eRDMA driver:
sudo sh install.sh --batch
View the command output to check whether the driver is installed.
The following command output indicates that the eRDMA driver is installed.

The following command output indicates that the eRDMA driver failed to be installed. Perform operations as prompted and then re-install the eRDMA driver.

Note
If the ECS instance runs CentOS 7 and you receive an error message indicating that packages are missing when you re-install the driver, you may fail to obtain the packages by running the yum commands. In this case, you may need to run the yum install -y epel-release command to install the Extra Packages for Enterprise Linux (EPEL) repository before you obtain the packages.
Bind an ERI to the ECS instance.
You can use one of the following methods to bind an ERI to the ECS instance.
Note
To query the maximum number of ERIs that you can bind to an ECS instance of a specific instance type, call the DescribeInstanceTypes operation and check the value of the EriQuantity parameter in the response. A value of 0 indicates that you cannot bind an ERI to an ECS instance of the instance type.
Enable the ERI feature for an ENI that is bound to an ECS instance
You can enable the ERI feature for an ENI that is bound to an ECS instance by modifying the attributes of the ENI. For more information, see the Change the status of the ERI feature for an existing ENI section of the "ERIs" topic.
Test the eRDMA write latency
You can install Perftest and test the write latency by using ib_write_lat on two enterprise-level instances that have eRDMA configured. For information about Perftest tests, see the Perftest test set section of the "Use eRDMA" topic.
Prepare the environment
Create two enterprise-level ECS instances that function as the server and client. Make sure that the eRDMA software stack is installed on the ECS instances and the ERIs are bound to the instances.
Make sure that the instances have valid network configurations and can communicate with each other over the internal network. For more information, see Connect ECS instances through an internal network.
Procedure
Connect to the two ECS instances.
For more information, see Use Workbench to connect to a Linux instance over SSH.
Verify that the eRDMA configurations on both instances are correct.
For more information, see the Verify the correctness of the eRDMA configurations section of the "Use eRDMA" topic.
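As a quick preliminary check before you follow the referenced verification procedure, the following sketch tests whether a kernel module named erdma is loaded. The module name is an assumption based on the upstream Linux driver, and the function name erdma_module_loaded is hypothetical. ibv_devinfo is part of rdma-core and is called only if it is installed.

```shell
#!/bin/sh
# Return 0 if a module named "erdma" appears in /proc/modules-style input.
# The module name "erdma" is an assumption based on the upstream Linux driver.
erdma_module_loaded() {
    awk '$1 == "erdma" { found = 1 } END { exit (found ? 0 : 1) }'
}

if [ -r /proc/modules ] && erdma_module_loaded < /proc/modules; then
    echo "erdma kernel module is loaded"
    # List RDMA devices if the rdma-core utility is available.
    if command -v ibv_devinfo >/dev/null 2>&1; then
        ibv_devinfo
    fi
else
    echo "erdma kernel module is not loaded"
fi
```

This sketch does not replace the full verification steps in the "Use eRDMA" topic. It only tells you whether the driver module is present in the running kernel.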
Install Perftest on each ECS instance.
You can download the perftest package from the official perftest repository and install perftest, or use a Yellowdog Updater, Modified (YUM) or Advanced Packaging Tool (APT) repository to install perftest.
Official perftest repository
YUM or APT repository
Note
Different versions of perftest are included in the repositories of different Linux distributions. Incompatibility may occur. To prevent incompatibility, we recommend that you identify the Linux distribution run by the ECS instance on which you want to install perftest and install the perftest version included in the repository of the same Linux distribution. Otherwise, download the perftest package from the official perftest repository and install perftest.
Test whether the eRDMA network latency meets the expected performance.
On the server-side instance, run the following command to start ib_write_lat as a server that listens for connections from the client:
ib_write_lat -R -a -F
-R: uses RDMA Connection Manager (CM) to establish a connection.
Important
By default, CPU-based instance families that support eRDMA install the eRDMA kernel-mode driver in Standard mode. In this mode, only the RDMA_CM connection establishment method is supported. For more information, see the Connection establishment method section of the "eRDMA" topic.
By default, Perftest establishes out-of-band (OOB) connections. When you perform Perftest tests on a CPU-based instance, specify the -R parameter on both the server and client to establish a connection by using the RDMA_CM method. Otherwise, an exception may occur during the connection establishment process.
You can also use the CLI method to enable compatibility between the RDMA_CM and OOB connection establishment methods. For more information, see Change the connection establishment mode of eRDMA to be compatible with bRPC. After you enable compatibility between the RDMA_CM and OOB connection establishment methods by using the CLI method, do not specify the -R parameter in the connection establishment command.
-a: sends test messages of all sizes. The size range is 2 bytes to 2^23 bytes. This allows you to test the impacts of different message sizes on latency.
-F: suppresses the CPU frequency warning that perftest otherwise displays when the cpufreq_ondemand module is loaded. This parameter does not affect how the connection is established.
On the client-side instance, run the following command to start ib_write_lat and connect to the server:
ib_write_lat -R -a -F <server_ip>
Replace <server_ip> with the private IP address of the ERI bound to the server-side ECS instance. For information about how to query IP addresses, see View IP addresses.
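The relationship between the connection establishment method and the -R parameter can be sketched as a small helper that assembles the command line for each role. This is an illustration only: the function name ib_write_lat_cmd and the mode labels cm and oob are hypothetical, the server command is assumed to use the same parameters as the client command without an address, and 192.0.2.10 is a documentation placeholder address.

```shell
#!/bin/sh
# Build the ib_write_lat command line for a role.
#   $1: "server" or "client"
#   $2: "cm"  to add -R (RDMA_CM, the default Standard mode), or
#       "oob" to omit -R (after RDMA_CM/OOB compatibility is enabled)
#   $3: private IP address of the server ERI (client only)
ib_write_lat_cmd() {
    role=$1
    mode=$2
    server_ip=$3
    flags="-a -F"
    if [ "$mode" = "cm" ]; then
        flags="-R $flags"
    fi
    case "$role" in
        server)
            echo "ib_write_lat $flags"
            ;;
        client)
            if [ -z "$server_ip" ]; then
                echo "client role requires a server IP address" >&2
                return 1
            fi
            echo "ib_write_lat $flags $server_ip"
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

# Example: print both commands for the default RDMA_CM method.
ib_write_lat_cmd server cm
ib_write_lat_cmd client cm 192.0.2.10
```

The helper prints the commands instead of running them, so you can start the server first and then run the client command on the other instance.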
Check the test results.
After the test on the client is complete, ib_write_lat outputs the test configuration information, connection information, and performance test results. The statistics include the minimum, maximum, and average latency.

Latency data in the ib_write_lat test results
#bytes: the size of the payload of a test message. Valid values: 2 to 8388608. Unit: bytes. Different message sizes help you understand the performance under different loads.
#iterations: the number of iterations, which specifies the number of times messages of each size are repeatedly tested. A larger value indicates more stable statistics results, including average values.
t_min[usec]: the minimum latency recorded in all tests. Unit: microseconds. This value provides a reference for the best-case network latency.
t_max[usec]: the maximum latency recorded in all tests. Unit: microseconds. A large value may indicate specific network issues or transient traffic congestion.
t_typical[usec]: the typical latency recorded in tests. Unit: microseconds. In most cases, the value is the median of all tests.
t_avg[usec]: the average latency of all tests. Unit: microseconds. The average latency reflects the overall user experience on network latency.
t_stdev[usec]: the standard deviation of the latency. Unit: microseconds. A smaller value indicates more stable latency. A larger value indicates that the latency fluctuates.
99% percentile[usec]: the latency value at the 99th percentile, which indicates that 99% of the latency values in the test results are lower than this value. Unit: microseconds. The data points at the 99th percentile help you understand latency performance in extreme cases.
99.9% percentile[usec]: the latency value at the 99.9th percentile, which indicates that 99.9% of the latency values in the test results are lower than this value. Unit: microseconds. The data points at the 99.9th percentile help you understand latency performance in extreme cases.
The preceding latency statistics provide you with a comprehensive understanding of the RDMA network performance to help you optimize network performance and troubleshoot network issues. For example, if the test results indicate a sudden increase in latency when test messages of a specific size are sent, you can check whether the network configurations or hardware performance meets your business requirements. If the test results indicate fluctuations in latency, you can check for traffic congestion or network instability issues.
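One way to spot the latency fluctuations described above is to scan saved ib_write_lat output for message sizes whose 99.9th percentile latency is far above the typical latency. The following is a minimal sketch that assumes each data row has nine numeric columns in the order listed in this section. The function name flag_tail_latency, the default factor of 3, and the sample figures are hypothetical and are not measured results.

```shell
#!/bin/sh
# Print message sizes whose 99.9th percentile latency exceeds
# FACTOR times the typical latency. Reads ib_write_lat output on stdin.
#   $1: FACTOR (optional, defaults to 3)
# Assumed column order: #bytes, #iterations, t_min, t_max, t_typical,
# t_avg, t_stdev, 99% percentile, 99.9% percentile.
flag_tail_latency() {
    awk -v factor="${1:-3}" '
        $1 ~ /^[0-9]+$/ && NF >= 9 {
            typical = $5
            p999 = $9
            if (typical > 0 && p999 > factor * typical)
                printf "%d bytes: typical=%s usec, p99.9=%s usec\n", $1, typical, p999
        }'
}

# Hypothetical sample rows, used only to show the input shape.
flag_tail_latency 3 <<'EOF'
 #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]
 2       1000   3.10   9.80   3.20   3.25   0.20   4.10   6.00
 4096    1000   4.00   60.00  4.20   4.50   1.90   9.00   25.00
EOF
```

With the sample rows, only the 4096-byte row is reported, because 25.00 is more than three times 4.20. To analyze a real run, redirect the client output to a file and pass that file to the function on standard input.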