You can deploy a better Remote Procedure Call (bRPC) application on Elastic Compute Service (ECS) instances that support Elastic Remote Direct Memory Access (eRDMA). This way, you can take full advantage of the low latency, high throughput, and low CPU utilization provided by eRDMA to optimize the data transmission efficiency of the bRPC application. This is suitable for scenarios that require high message throughput and are latency sensitive. This topic describes how to deploy a bRPC application on ECS instances that support eRDMA and how to test the performance improvement of the bRPC application when eRDMA is used.
bRPC is a high-performance, general-purpose remote procedure call framework written in C++. bRPC provides a rich set of features and tools to simplify service development and deployment. bRPC is commonly used in search, storage, machine learning, and advertising scenarios. bRPC can be used to build high-concurrency and low-latency microservices and large-scale distributed systems. For more information about bRPC, see Getting Started.
eRDMA is a Remote Direct Memory Access (RDMA) service developed by Alibaba Cloud to ensure high network performance with low latency, high throughput, and high elasticity. For more information, see Overview.
Step 1: Make preparations
Create two eRDMA-capable ECS instances. One ECS instance serves as the server and the other instance serves as the client. During instance creation, take note of the following items:
Instance type: The selected instance type must support eRDMA. For more information, see the Limits section of the "Configure eRDMA on an enterprise-level instance" topic. Example: ecs.g8a.8xlarge.
Image: The selected image must support eRDMA. For more information, see the Limits section of the "Configure eRDMA on an enterprise-level instance" topic. Example: Alibaba Cloud Linux 3.2104 LTS 64-bit.
eRDMA driver: Select Auto-install eRDMA Driver. Then, the eRDMA driver is automatically installed during ECS instance creation.
Note: After the instance starts, wait 3 to 5 minutes for the eRDMA driver installation to complete. For more information, see Configure eRDMA on an enterprise-level instance.
Network:
You must enable each instance to access the Internet.
By default, instances in the same virtual private cloud (VPC) can communicate with each other over the internal network.
Elastic network interface (ENI): Select eRDMA Interface on the right side of the ENI section.
For information about other parameters, see Create an instance on the Custom Launch tab.
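Once an instance has started and the driver installation window has passed, you may want to confirm that the driver is loaded before you continue. The following is a minimal sketch, not an official check. The helper name has_module is hypothetical. The module name erdma is taken from the rmmod and modprobe commands used later in this topic.

```shell
# has_module NAME
# Reads lsmod-style output on stdin and succeeds (exit status 0)
# if a kernel module called NAME appears in the first column.
has_module() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit(found ? 0 : 1) }'
}

# Example, to be run on the instance itself:
#   lsmod | has_module erdma && echo "erdma driver is loaded"
```

The helper only matches the first column, so a line such as "ib_core ... erdma", where erdma appears as a dependent module, does not count as a match.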
Step 2: Deploy and compile bRPC
Deploy and compile bRPC on both ECS instances (server and client). In this example, the Alibaba Cloud Linux 3 operating system is used. For information about how to build bRPC on other operating systems, see Build of bRPC.
Log on to the two ECS instances in sequence.
For more information, see Connect to a Linux instance by using a password or key.
Run the following commands to change the connection establishment mode of eRDMA to be compatible with bRPC.
Note: By default, eRDMA establishes connections in RDMA_CM mode, whereas bRPC establishes connections in Out-of-Band (OOB) mode. Therefore, you must change the connection establishment mode of eRDMA to be compatible with bRPC.
sudo sh -c "echo 'options erdma compat_mode=Y' >> /etc/modprobe.d/erdma.conf"
sudo dracut --force
sudo rmmod erdma
sudo modprobe erdma compat_mode=Y
Remove the limit on locked memory. For an eRDMA-capable application that requires a large amount of memory, the locked-memory (memlock) limit must be removed to improve data transmission efficiency.
Run the following command to modify the limits.conf file:
sudo vi /etc/security/limits.conf
Add the following content to the end of the file, and then save and close the file:
* soft memlock unlimited
* hard memlock unlimited
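To double-check the edit, a small helper can confirm that both memlock entries are present in the file. This is a sketch under assumptions. The function name memlock_unlimited is hypothetical, and the helper only inspects the file contents, not the limits of a running session.

```shell
# memlock_unlimited FILE
# Succeeds if FILE contains both a "soft" and a "hard" entry of the form
#   * <soft|hard> memlock unlimited
# Commented-out lines are ignored because their first field is not "*".
memlock_unlimited() {
    awk '$1 == "*" && $3 == "memlock" && $4 == "unlimited" { seen[$2] = 1 }
         END { exit((seen["soft"] && seen["hard"]) ? 0 : 1) }' "$1"
}

# Example, to be run on the instance itself:
#   memlock_unlimited /etc/security/limits.conf && echo "memlock entries present"
```

Limits from limits.conf are normally applied to new login sessions, so after you log on again, ulimit -l is expected to print unlimited.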
Run the following commands to deploy the bRPC application:
sudo yum install git gcc-c++ make openssl-devel gflags-devel protobuf-devel protobuf-compiler leveldb-devel -y
git clone https://github.com/apache/brpc.git
When you use eRDMA to test bRPC, we recommend that you apply the following patch on both the server and the client to achieve better performance.
Create a patch file in the brpc directory. In this example, the file is named erdma-multi-sge.patch. Sample commands:
cd ~/brpc
sudo vi erdma-multi-sge.patch
Add the following content to the file, and then save and close the file:
diff --git a/src/brpc/rdma/rdma_helper.cpp b/src/brpc/rdma/rdma_helper.cpp
index cf1cce95..d2592cbb 100644
--- a/src/brpc/rdma/rdma_helper.cpp
+++ b/src/brpc/rdma/rdma_helper.cpp
@@ -619,7 +619,7 @@ void DeregisterMemoryForRdma(void* buf) {
 }
 
 int GetRdmaMaxSge() {
-    return g_max_sge;
+    return 4;
 }
 
 int GetRdmaCompVector() {
-- 
2.39.3
Run the following command to apply the patch file to the source code of bRPC:
patch -p1 < erdma-multi-sge.patch
Run the following commands to compile the source code of bRPC:
sh config_brpc.sh --with-rdma --headers="/usr/include" --libs="/usr/lib64 /usr/bin"
make -j
cd example/rdma_performance; make -j
Step 3: Test performance
Test the performance of bRPC when eRDMA is used and when eRDMA is not used, and compare the test results to evaluate the performance improvement brought by eRDMA to bRPC.
Test the performance of bRPC in the following scenarios.
Test performance when eRDMA is used
Run the following command on the server to start the server and enable communication over eRDMA:
./server --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=true
Run the following command on the client to connect to the server and enable communication over eRDMA:
./client --servers=<Private IP address of the server>:8002 --rpc_timeout_ms=-1 --attachment_size=1024 --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=true --queue_depth=16
Take note of the following parameters:
<Private IP address of the server>: Replace this placeholder with the actual private IP address of the server.
--attachment_size: Specifies the size of the data that is attached to each bRPC call, that is, the amount of data transferred per request. This parameter affects data transmission efficiency. Transmitting data in large blocks makes better use of the advantages of eRDMA, but an excessively large size increases the complexity of memory management. We recommend that you set this parameter based on your actual test requirements.
--queue_depth: Specifies the depth of the request queue, that is, the number of concurrent requests in the queue. A larger queue depth copes better with bursts of requests in high-concurrency scenarios and prevents requests from being rejected because the queue is full. However, an excessively large queue depth may consume more memory. We recommend that you set this parameter based on your actual test requirements.
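Because --attachment_size and --queue_depth are meant to be tuned for your own tests, it can help to generate the client command lines for several sizes up front. The following is a minimal sketch that only prints commands and does not run them. The helper names client_cmd and print_sweep, the list of sizes, and the address 192.0.2.10 (a documentation-only placeholder) are assumptions for illustration. The flags themselves are copied from the client command in this topic.

```shell
# client_cmd SERVER_IP ATTACHMENT_SIZE QUEUE_DEPTH USE_RDMA
# Prints (does not execute) a client command line using the flags from this topic.
client_cmd() {
    printf './client --servers=%s:8002 --rpc_timeout_ms=-1 --attachment_size=%s --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=%s --queue_depth=%s\n' \
        "$1" "$2" "$4" "$3"
}

# print_sweep SERVER_IP USE_RDMA
# Prints one client command per attachment size at a fixed queue depth of 16.
# The sizes below are arbitrary examples, not recommended values.
print_sweep() {
    for size in 1024 4096 16384 65536; do
        client_cmd "$1" "$size" 16 "$2"
    done
}

# Example: print_sweep 192.0.2.10 true
```

Passing false as the last argument produces the matching TCP commands, so both scenarios can be tested with identical sizes and queue depth.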
Test performance when eRDMA is not used
Run the following command on the server to start the server and enable communication over TCP:
./server --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=false
Run the following command on the client to connect to the server and enable communication over TCP:
./client --servers=<Private IP address of the server>:8002 --rpc_timeout_ms=-1 --attachment_size=1024 --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=false --queue_depth=16
Take note of the following parameters:
<Private IP address of the server>: Replace this placeholder with the actual private IP address of the server.
--attachment_size: Specifies the size of the data that is attached to each bRPC call, that is, the amount of data transferred per request. This parameter affects data transmission efficiency. Transmitting data in large blocks makes better use of the advantages of eRDMA, but an excessively large size increases the complexity of memory management. We recommend that you set this parameter based on your actual test requirements.
--queue_depth: Specifies the depth of the request queue, that is, the number of concurrent requests in the queue. A larger queue depth copes better with bursts of requests in high-concurrency scenarios and prevents requests from being rejected because the queue is full. However, an excessively large queue depth may consume more memory. We recommend that you set this parameter based on your actual test requirements.
Compare the results of the two tests to evaluate the performance improvement that eRDMA brings to bRPC: view the Avg-Latency field to compare latency, and view the QPS field to compare throughput.
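To turn the two sets of readings into a single figure, you can compute the relative change of each field between the TCP run and the eRDMA run. The following is a minimal sketch. The helper name pct_change is hypothetical, and the numbers in the usage comments are placeholders for your own readings, not measured results.

```shell
# pct_change BASELINE NEW
# Prints the relative change from BASELINE to NEW as a percentage with one
# decimal place: (NEW - BASELINE) / BASELINE * 100.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Usage with placeholder values (substitute the values you observed):
#   pct_change <TCP Avg-Latency> <eRDMA Avg-Latency>   # negative means lower latency
#   pct_change <TCP QPS> <eRDMA QPS>                   # positive means higher QPS
```

Use the TCP reading as the baseline in both cases so that the sign is easy to interpret: a negative change in Avg-Latency and a positive change in QPS both indicate an improvement with eRDMA.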