Elastic Remote Direct Memory Access (eRDMA) is an RDMA network service provided by Alibaba Cloud that features low latency, high throughput, and high elasticity. To use eRDMA at scale, your instances must be of an instance type that supports eRDMA and must use elastic network interfaces (ENIs) that support Elastic RDMA Interfaces (ERIs). This topic introduces eRDMA and describes its benefits, use scenarios, and limits.
Introduction
What is eRDMA?
eRDMA is an elastic Remote Direct Memory Access (RDMA) network that Alibaba Cloud developed for the cloud. eRDMA reuses virtual private clouds (VPCs) as the underlying link and uses a congestion control (CC) algorithm developed by Alibaba Cloud. It retains the high throughput and low latency of traditional RDMA networks and, unlike them, can set up large-scale RDMA networking within seconds. eRDMA is compatible with traditional high-performance computing (HPC) applications and with Transmission Control Protocol/Internet Protocol (TCP/IP) applications.
You can use eRDMA to deploy HPC applications in the cloud and obtain high-performance, highly elastic application clusters at low cost. You can also run other applications over the eRDMA network instead of the standard VPC network to accelerate them.
How to implement the capabilities of eRDMA
eRDMA is available only on instance types that support it. To use eRDMA, create eRDMA-capable elastic network interfaces (ENIs) and bind them to instances of those types.
Elastic RDMA Interfaces (ERIs) are virtual network interfaces that can be bound to ECS instances. An ERI depends on an ENI to enable an RDMA device, and it reuses the network to which the ENI belongs. This lets you use RDMA in your existing network and benefit from its low latency without changing your service networking.
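After an ERI is bound and the driver is loaded, the guest operating system should expose an RDMA device. The following is a minimal sketch for checking this, assuming a Linux guest where the kernel registers RDMA devices under /sys/class/infiniband. The function name list_rdma_devices is hypothetical and used here only for illustration. The actual device names depend on the installed driver.

```python
import os


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of RDMA devices registered with the kernel.

    Assumption: a Linux guest that exposes RDMA devices as entries under
    sysfs_root. Returns an empty list if the directory does not exist,
    for example because no RDMA driver is loaded.
    """
    try:
        return sorted(os.listdir(sysfs_root))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices:", ", ".join(devices))
    else:
        print("No RDMA devices found")
```

An empty result suggests that the ERI is not bound, the instance type does not support eRDMA, or the driver is not installed.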
Benefits
eRDMA provides the following benefits:
High performance
RDMA transfers data directly from user-mode programs to the host channel adapter (HCA) for network transmission, bypassing the kernel network stack. This greatly reduces CPU load and latency. eRDMA retains the benefits of traditional RDMA interfaces and brings RDMA technology to VPCs, so you get the ultra-low latency of RDMA in cloud networks.
Inclusiveness
You can enable eRDMA free of charge. To enable it, you only need to select eRDMA when you purchase an ECS instance.
Large-scale deployment
Traditional RDMA depends on lossless networks, which makes large-scale deployment costly and difficult. eRDMA uses the CC algorithm developed by Alibaba Cloud to tolerate changes in transmission quality in VPCs, such as latency variation and packet loss. As a result, eRDMA maintains good performance in lossy networks.
Scalability
Traditional RDMA interfaces require separate network cards. eRDMA is built on the SHENLONG architecture and serves as an RDMA HCA that can be used in the cloud. On ECS, you can dynamically add eRDMA devices, perform hot migration, and deploy eRDMA flexibly.
Shared VPCs
eRDMA depends on ENIs and reuses the networks to which the ENIs belong. This lets you enable RDMA in your existing networks without changing your service networking.
Common scenarios
TCP/IP is the mainstream network communication protocol suite, and many applications are built on it. As data center workloads grow, they demand higher network performance, such as lower latency and higher throughput. TCP/IP has become a bottleneck for network performance because of its high data copy overhead, heavy protocol stack processing, complex CC algorithms, and frequent context switches.
RDMA addresses these pain points. Features such as zero-copy and kernel bypass eliminate the overhead of data copies and frequent context switches. Compared with TCP/IP, RDMA delivers lower latency, higher throughput, and lower CPU utilization. However, RDMA has seen limited adoption because of its high prices and O&M costs.
eRDMA is designed to make RDMA broadly accessible in the cloud. It meets low-latency requirements in common scenarios and can be used in more fields than traditional RDMA, such as cache databases, big data, HPC, and AI training, where it can yield considerable performance gains.
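The difference between copying data and zero-copy access can be illustrated inside a single process. The following sketch is only an analogy and does not use RDMA or any eRDMA API: it contrasts a Python slice that copies the payload with a memoryview that references the original buffer. The function names copy_slice and view_slice are hypothetical.

```python
def copy_slice(buf: bytearray, start: int, end: int) -> bytes:
    # Slicing and converting to bytes allocates a new buffer and copies the payload.
    return bytes(buf[start:end])


def view_slice(buf: bytearray, start: int, end: int) -> memoryview:
    # A memoryview slice references the original buffer; no payload is copied.
    return memoryview(buf)[start:end]


buf = bytearray(b"hello rdma")
copied = copy_slice(buf, 0, 5)
viewed = view_slice(buf, 0, 5)

# Modify the original buffer in place after both slices were taken.
buf[0] = ord("H")

print(copied)         # b'hello'  (the copy is unaffected)
print(bytes(viewed))  # b'Hello'  (the view sees the change)
```

Each extra copy costs CPU cycles and memory bandwidth in proportion to the payload size. Zero-copy designs such as RDMA avoid that cost by letting the network adapter read from and write to application buffers directly.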
Limits
Before you use eRDMA, take note of the following limits:
ECS instance: For information about limits on ECS instances, see Configuring eRDMA on enterprise-level instances.