Elastic Compute Service: Use eRDMA

Last Updated: Dec 13, 2024

To use elastic Remote Direct Memory Access (eRDMA), create an Elastic Compute Service (ECS) instance of an instance type that supports eRDMA and bind elastic RDMA interfaces (ERIs) to the instance. The instance can then benefit from the low latency and large-scale networking capabilities of RDMA. An ERI is an elastic network interface (ENI) for which the ERI feature is enabled.

Enable eRDMA on an ECS instance

Select an instance type that supports eRDMA

Bind ERIs to an ECS instance

You can enable the ERI feature for the primary ENI when you create an ECS instance or bind an ERI to an ECS instance after the instance is created.

  • Create an ERI and bind the ERI to an ECS instance

  • Enable the ERI feature for an ENI that is bound to an ECS instance

    You can enable the ERI feature for an ENI that is bound to an ECS instance by modifying the attributes of the ENI. For more information, see the Change the status of the ERI feature for an existing ENI section of the "ERIs" topic.

  • Call API operations to create an ERI and bind the ERI to an ECS instance

    Perform the following steps:

    1. Call an API operation to create an ERI.

      Call the CreateNetworkInterface operation to create an ENI and set the NetworkInterfaceTrafficMode parameter to HighPerformance to enable the ERI feature for the ENI.

      After the call is successful, record the return value of the NetworkInterfaceId parameter, which is the ERI ID.

    2. Call the AttachNetworkInterface operation to bind the ERI to the ECS instance. Set the NetworkInterfaceId parameter to the ERI ID recorded in the preceding step and the InstanceId parameter to the ID of the ECS instance.

      Important

      If the instance type of the ECS instance supports multiple ERIs per instance, we recommend that you set the NetworkCardIndex parameter to a different value for each ERI when you bind multiple ERIs to the instance. This ensures that the ERIs are bound to different channels and the maximum network bandwidth is achieved for the instance. For more information, see the Request parameters section of the "AttachNetworkInterface" topic.
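
The two API calls described above can be sketched with Alibaba Cloud CLI. This is a minimal sketch that assumes the aliyun command is installed and configured. The region, vSwitch, security group, ENI, and instance IDs are placeholders, and the NetworkCardIndex values 0 and 1 are illustrative only. The run, create_eri, and attach_eri helpers are hypothetical names. The script prints the commands instead of running them unless RUN=1 is set.

```shell
# Print a command instead of executing it unless RUN=1 is set (dry run by default).
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Step 1: create an ENI with the ERI feature enabled.
# args: region-id vswitch-id security-group-id
create_eri() {
  run aliyun ecs CreateNetworkInterface \
    --RegionId "$1" --VSwitchId "$2" --SecurityGroupId "$3" \
    --NetworkInterfaceTrafficMode HighPerformance
}

# Step 2: bind the ERI to an instance on a specific network card index.
# args: region-id eni-id instance-id network-card-index
attach_eri() {
  run aliyun ecs AttachNetworkInterface \
    --RegionId "$1" --NetworkInterfaceId "$2" --InstanceId "$3" \
    --NetworkCardIndex "$4"
}

# Placeholder IDs: replace them with real values, including the
# NetworkInterfaceId values returned by CreateNetworkInterface.
create_eri cn-hangzhou vsw-example sg-example
attach_eri cn-hangzhou eni-example-1 i-example 0
attach_eri cn-hangzhou eni-example-2 i-example 1
```

Using a different NetworkCardIndex value for each ERI follows the recommendation above for instance types that support multiple ERIs.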

Install the eRDMA driver on an ECS instance

Important
  • The eRDMA driver is developed by Alibaba Cloud in-house. Alibaba Cloud provides technical support for the eRDMA driver.

  • Installing the eRDMA driver takes some time to complete.

  • eRDMA driver installation packages

    Release notes for installation packages of different eRDMA driver versions (ordered by release date, from the latest to the earliest)

    | Version | Release date | Download link | Checksum | Description |
    | --- | --- | --- | --- | --- |
    | 1.4.0 | 2024-09-27 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.4.0.tar.gz | MD5: 77135d946dddc015000c8f3ea4e6c586; SHA256: 8613d3d81e8eb3b78bf840c37cbe02c79f62631df36cdc8b2c7c101f49f5af29 | Performance in heterogeneous GPU-based scenarios is optimized. |
    | 1.3.3 | 2023-10-09 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.3.tar.gz | MD5: 51ffb06266255139554275bc86fa4caa; SHA256: 5aad6d006662bd902ef5e913fb97d2a6623aadeeacd06f1c3f1c74cbd1f57ded | This version is updated to include the latest patches. |
    | 1.3.2 | 2023-09-08 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.2.tar.gz | MD5: 8492016fc96eece6a60687b0e4ea66dd; SHA256: 89ab265dc9fa8d56f1b2d8b13d7f50032390a265eddb2e04eeee3aa86fd169ce | This version is updated to include the latest patches. |
    | 1.3.1 | 2023-08-18 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.1.tar.gz | MD5: b9b90212e6ba49d57b81d3c5d4210deb; SHA256: 4ebe31760443613f8f61fcdbef7a85b277dabc59039d048898536ea4fe5d8d4a | The underlying transmission mode on the driver side can be set to strong ordering. In strong ordering mode, data packets are transmitted to memory only in sequence. |
    | 1.3.0 | 2023-06-26 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.0.tar.gz | MD5: 2da0c65643b5e2ffb61d75e1b5e5a7ab; SHA256: cce03aac0e07d0890884c35ad4f10e9d15f587535d788c8fc97ea268312ad4a9 | Multi-level page tables are supported during memory region (MR) registration. The IPv6 feature is supported, and IPv6 support from underlying hardware is required. Ubuntu 22.04 is supported. This version is updated to include the latest patches. |
    | 1.2.3 | 2023-05-30 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.3.tar.gz | MD5: 7496a6324f3872469d7194c2e234b19f; SHA256: 16c2de0d90da6906db91c2e2469aaad9e24131c44ce52b9464036f1c3747f8a2 | This version is updated to include the latest patches. |
    | 1.2.2 | 2023-05-04 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.2.tar.gz | MD5: f449d3961a41ff6a97a53cfa29e20d6c; SHA256: 11fdb4b3c778762ad0bdf2d0327008aa2ecb22dc508c9f9fae3568b41ae5462b | Ubuntu 22.04 is supported. |
    | 1.2.1 | 2023-04-04 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.1.tar.gz | MD5: e080103934da76ce83924da789aecece; SHA256: be3a89e57143d7544cf968052250df92f911aebb035f07b06ebeb8c5f13bf976 | This version is updated to include the latest patches. |
    | 1.2.0 | 2023-03-09 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.0.tar.gz | MD5: c8d440a6e35ec6d2aaf1a568affea876; SHA256: d484997e28e29f862dc580c112b55b389a00faf88dc6aa89eea588ee1369a8ca | The compatible mode is supported. This version is updated to include the latest patches. |
    | 1.1.0 | 2023-01-16 | http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.1.0.tar.gz | MD5: 1fea69d819919a77384f902213eb681e; SHA256: 176c3bb35d5584e8c8e43eba9b1824b8cb2b43a19d802c4e469363ed8e33fea6 | This version is updated to include the latest patches. |

  • Install the eRDMA driver

    You can automatically install the eRDMA driver by selecting the Auto-install eRDMA Driver option when you create an ECS instance that supports eRDMA, or manually install the eRDMA driver after the instance is created.

  • Check the version of the eRDMA kernel-mode driver

    After the eRDMA driver is installed, you can run the eadm ver command to check the version of the eRDMA kernel-mode driver. The latest eRDMA driver version 1.4.0 corresponds to the eRDMA kernel-mode driver version 0.2.37.
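
Before you manually install a downloaded package, you can compare its digest with the SHA256 value listed above for that version. The following is a minimal sketch that assumes the 1.4.0 package is already in the current directory and that the sha256sum utility is available. verify_sha256 is a hypothetical helper name. The final step runs eadm ver only if the eadm command exists.

```shell
# Return success only if the SHA-256 digest of a file matches the expected value.
# args: file expected-hex-digest
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  [ "$actual" = "$2" ]
}

pkg=erdma_installer-1.4.0.tar.gz
# SHA256 value of version 1.4.0 from the release notes above.
expected=8613d3d81e8eb3b78bf840c37cbe02c79f62631df36cdc8b2c7c101f49f5af29

# Check the package only if it has already been downloaded.
if [ -f "$pkg" ]; then
  if verify_sha256 "$pkg" "$expected"; then
    echo "checksum OK: $pkg"
  else
    echo "checksum mismatch: $pkg" >&2
  fi
fi

# After installation, print the kernel-mode driver version if eadm is present.
if command -v eadm >/dev/null 2>&1; then
  eadm ver
fi
```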

Verify the correctness of eRDMA configurations

The ibv_devinfo command reports information about RDMA devices, such as their hardware attributes, port status, and supported features. You can use the command to check whether eRDMA functions as expected: if at least one port is in the PORT_ACTIVE state, the RDMA components are running and RDMA features are available. For more information, see "13.7. Testing Early InfiniBand RDMA operation".

You can also pass the -v parameter to the ibv_devinfo command to query more detailed information about each device, including the hardware version, supported maximum message size, number of queues, and memory window size. You can optimize and check RDMA network performance based on the preceding information.

When you run the ibv_devinfo command on an ECS instance, the output corresponds to one of the following cases:

  • Correct eRDMA configurations: The ERIs are bound to the ECS instance, and the eRDMA driver is installed on the instance as expected.

    Note
    • If the instance supports multiple ERIs and multiple ERIs are bound to it, the ERIs function as expected when the state field for the port of each eRDMA device on the instance is PORT_ACTIVE.

    • If the state field for the port of an ERI is invalid state, the ERI is abnormal. In this case, check whether the ERI is properly configured. For example, run the ifconfig command to check whether all configurations of the ERI, including IP addresses, exist. For more information, see Configure a secondary ENI.

  • No ERIs bound to the instance: The eRDMA driver is installed on the instance, but no ERIs are bound to the instance. In this case, you must bind ERIs to the instance. For more information, see the Bind ERIs to an ECS instance section of this topic.

  • eRDMA driver not installed as expected: ERIs are bound to the instance, but the eRDMA driver is not installed on the instance as expected. In this case, you must re-install the eRDMA driver on the instance. For more information, see the Install the eRDMA driver on an ECS instance section of this topic.

You can also use the diagnose tool to check the basic functionality of eRDMA. For more information, see the Use the diagnose tool to check RDMA-related issues and evaluate eRDMA performance section of the "Monitor and check eRDMA" topic.
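
The port state check described above can be scripted. The following is a minimal sketch that assumes the usual ibv_devinfo text layout, in which each device starts with an hca_id: line and each port has a state: line. port_states and all_ports_active are hypothetical helper names. Only the first word of the state value is kept, so an invalid state value is shown as invalid.

```shell
# Read ibv_devinfo output on stdin and print "<device> <state>" for each port.
port_states() {
  awk '
    $1 == "hca_id:" { dev = $2 }
    $1 == "state:"  { print dev, $2 }
  '
}

# Succeed only if at least one port is found and every port is PORT_ACTIVE.
all_ports_active() {
  port_states | awk '
    { n++ }
    $2 != "PORT_ACTIVE" { bad++ }
    END { exit !(n > 0 && bad == 0) }
  '
}

# Run the check only on a host where ibv_devinfo is installed.
if command -v ibv_devinfo >/dev/null 2>&1; then
  ibv_devinfo | port_states
  if ibv_devinfo | all_ports_active; then
    echo "all eRDMA ports are PORT_ACTIVE"
  else
    echo "no ports found, or at least one port is not PORT_ACTIVE" >&2
  fi
fi
```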

Test eRDMA network performance

Perftest is a performance test toolkit that provides various test options to evaluate network operations, such as the send, receive, read, and write operations. Perftest allows you to measure the performance metrics, such as latency and bandwidth, of RDMA operations. You can determine the performance of RDMA devices and networks based on the measurements and optimize configurations or resolve potential issues. For more information, see perftest.

Test programs included in perftest

Perftest includes a collection of test programs. You can use the test programs based on your business requirements to test network bandwidth or latency and evaluate network performance. The following table describes the test programs.

| RDMA operation | Bandwidth test program | Latency test program |
| --- | --- | --- |
| Send | ib_send_bw (send bandwidth test) | ib_send_lat (send latency test) |
| RDMA Read | ib_read_bw (read bandwidth test) | ib_read_lat (read latency test) |
| RDMA Write | ib_write_bw (write bandwidth test) | ib_write_lat (write latency test) |
| RDMA Atomic | ib_atomic_bw (atomic bandwidth test) | ib_atomic_lat (atomic latency test) |
| Native Ethernet | raw_ethernet_bw (raw Ethernet bandwidth test) | raw_ethernet_lat (raw Ethernet latency test) |

Install perftest

You can download the perftest package from the official perftest repository and install perftest, or use a Yellowdog Updater, Modified (YUM) or Advanced Packaging Tool (APT) repository to install perftest.

Official perftest repository
  1. Enable public bandwidth for an ECS instance on which you want to install perftest. For more information, see Enable public bandwidth for an ECS instance.

  2. Download the perftest package from the official perftest repository and install perftest.

YUM or APT repository
Note

Different versions of perftest are included in the repositories of different Linux distributions. Incompatibility may occur. To prevent incompatibility, we recommend that you identify the Linux distribution run by the ECS instance on which you want to install perftest and install the perftest version included in the repository of the same Linux distribution. Otherwise, download the perftest package from the official perftest repository and install perftest.

  • Alibaba Cloud Linux 3, CentOS, and Anolis OS

    sudo yum install perftest -y
  • Ubuntu

    sudo apt install perftest -y
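
To identify the distribution before you choose a command, you can read the ID field of /etc/os-release. The following is a minimal sketch. The ID values alinux, centos, anolis, and ubuntu are assumptions about what these distributions report, and perftest_install_cmd is a hypothetical helper name. The script only prints the matching command and does not install anything.

```shell
# Map an os-release ID to the matching install command listed above.
# arg: ID value from /etc/os-release
perftest_install_cmd() {
  case "$1" in
    alinux|centos|anolis) echo "sudo yum install perftest -y" ;;
    ubuntu)               echo "sudo apt install perftest -y" ;;
    *)                    return 1 ;;
  esac
}

if [ -r /etc/os-release ]; then
  # Source the file in a subshell so that its variables do not leak into this shell.
  os_id=$(. /etc/os-release && printf '%s' "${ID:-}")
  perftest_install_cmd "$os_id" ||
    echo "no packaged perftest known for '$os_id'; use the official perftest repository" >&2
fi
```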

Example of using perftest

You can run each test program included in perftest as a separate command. For example, run ib_send_lat as a command to perform a send latency test.
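
A perftest program runs as a pair: one side is started without a peer address and waits as the server, and the other side is given the server address and acts as the client. The following is a minimal sketch that only prints the two command lines. The device name erdma_0 and the address 192.168.0.10 are placeholders, perftest_cmd is a hypothetical helper name, and your environment may need additional options.

```shell
# Build the command line for one side of a perftest run.
# args: test-program device [server-address]
perftest_cmd() {
  if [ -n "${3:-}" ]; then
    # Client side: connects to the server at the given address.
    printf '%s -d %s %s\n' "$1" "$2" "$3"
  else
    # Server side: waits for a client to connect.
    printf '%s -d %s\n' "$1" "$2"
  fi
}

perftest_cmd ib_send_lat erdma_0               # run the printed command on the server instance
perftest_cmd ib_send_lat erdma_0 192.168.0.10  # run the printed command on the client instance
```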

Choosing correct test parameters is crucial when you use perftest. Properly configured parameters give you finer control over perftest behavior so that tests meet your requirements and produce more accurate results. The following table describes key perftest parameters.

Common test parameters

You can run a test program with the -h option, for example ib_send_lat -h, to query the test parameters and how to configure them.

Test category

Test parameter

Latency test

  • -C, --report-cycles: Report times in CPU cycle units. This parameter is helpful in accurately measuring latency.

  • -H, --report-histogram: Print out all results. By default, only the summary is printed out. This parameter helps you understand the data distribution.

  • -U, --report-unsorted: Print out unsorted results. You can specify this parameter to analyze the original data distribution. By default, sorted results are printed out.

Bandwidth test

  • -b, --bidirectional: Measure bidirectional bandwidth. By default, unidirectional bandwidth is measured. Bidirectional bandwidth is an important metric for determining the bidirectional transmission capabilities of networks.

  • -N, --no peak-bw: Cancel peak bandwidth (peak-bw) calculation. By default, peak bandwidth is calculated. Canceling the calculation allows you to focus on stable bandwidth performance.

  • -t, --tx-depth=<dep>: Specify the size of the transmit (Tx) queue, which affects the concurrency and performance of the test. Default value: 128.

  • -D, --duration=<sec>: Run the test for the specified number of seconds.

Send test

  • -r, --rx-depth=<dep>: Specify the size of the receive (Rx) queue, which affects the buffer size and performance. Default value: 512.

  • -g, --mcg=<num_of_qps>: Send messages to the multicast group to which <num_of_qps> Queue Pairs (QPs) are attached. This parameter is helpful in testing multicast performance.

Other advanced options

  • -u, --qp-timeout=<timeout>: Specify the QP timeout as an exponent. Default value: 14. The actual timeout in microseconds is calculated by using the following formula: 4 × 2^timeout, where timeout is the value passed to -u.

  • --force-link=<type>: Force the links to a specific type: IB or Ethernet. You can specify this parameter to test a specific type of network links.

  • --use_hugepages: Use Hugepages instead of contig or memalign allocations. This allows you to optimize memory usage and performance.

  • --rate_limit=<limit>: Set the maximum rate of sent packets. Default unit: Gbit/s. You can use the --rate_units parameter to change the unit of the maximum rate.

  • For information about how to perform a network latency test, see the Test the eRDMA write latency section of the "Configure eRDMA on an enterprise-level instance" topic.
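
As a worked example of the -u formula, reading it as 4 microseconds multiplied by 2 raised to the value passed to -u, the default value 14 gives 4 × 2^14 = 65536 microseconds, or about 65.5 milliseconds. The following minimal sketch computes the timeout for any exponent. qp_timeout_us is a hypothetical helper name.

```shell
# Timeout in microseconds for a given -u value, using 4 us x 2^value.
# arg: -u value (non-negative integer)
qp_timeout_us() {
  echo $(( 4 * (1 << $1) ))
}

qp_timeout_us 14   # default -u value: prints 65536 (about 65.5 ms)
qp_timeout_us 16   # prints 262144 (about 262 ms)
```

Each increment of the -u value doubles the timeout.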

References