
Elastic Compute Service: Deploy a high-performance bRPC application based on eRDMA

Last Updated: Nov 19, 2024

You can deploy a better Remote Procedure Call (bRPC) application on Elastic Compute Service (ECS) instances that support Elastic Remote Direct Memory Access (eRDMA). The application can then use the low latency, high throughput, and low CPU utilization of eRDMA to transfer data more efficiently. This setup suits latency-sensitive scenarios that require high message throughput. This topic describes how to deploy a bRPC application on eRDMA-capable ECS instances and how to measure the performance improvement that eRDMA provides.

Note
  • bRPC is a high-performance, general-purpose remote procedure call framework written in C++. bRPC provides a rich set of features and tools to simplify service development and deployment. bRPC is commonly used in search, storage, machine learning, and advertising scenarios. bRPC can be used to build high-concurrency and low-latency microservices and large-scale distributed systems. For more information about bRPC, see Getting Started.

  • eRDMA is a Remote Direct Memory Access (RDMA) service developed by Alibaba Cloud to ensure high network performance with low latency, high throughput, and high elasticity. For more information, see Overview.

Step 1: Make preparations

Create two eRDMA-capable ECS instances. One ECS instance serves as the server and the other instance serves as the client. During instance creation, take note of the following items:

  • Instance type: The selected instance type must support eRDMA. For more information, see the Limits section of the "Configure eRDMA on an enterprise-level instance" topic. Example: ecs.g8a.8xlarge.

  • Image: The selected image must support eRDMA. For more information, see the Limits section of the "Configure eRDMA on an enterprise-level instance" topic. Example: Alibaba Cloud Linux 3.2104 LTS 64-bit.

  • eRDMA driver: Select Auto-install eRDMA Driver. The eRDMA driver is then installed automatically after the instance starts.

    Note

    After the instance starts, wait 3 to 5 minutes for the eRDMA driver installation to complete. For more information, see Configure eRDMA on an enterprise-level instance.

  • Network:

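    • Each instance must have Internet access. Step 2 uses this access to install packages and to download the bRPC source code.

    • Create both instances in the same virtual private cloud (VPC). By default, instances in the same VPC can communicate with each other over the internal network. The client uses this network to reach the server.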

    • Elastic network interface (ENI): Select eRDMA Interface on the right side of the ENI section.

For information about other parameters, see Create an instance on the Custom Launch tab.

Step 2: Deploy and compile bRPC

Deploy and compile bRPC on the two ECS instances (server and client). In this example, the Alibaba Cloud Linux 3 operating system is used. For information about how to deploy bRPC in other operating systems, see Build of bRPC.

  1. Log on to the two ECS instances in sequence.

    For more information, see Connect to a Linux instance by using a password or key.

  2. Run the following commands to switch eRDMA to a connection establishment mode that is compatible with bRPC.

    Note

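    By default, eRDMA establishes connections in RDMA_CM mode, whereas bRPC establishes them in Out-of-Band (OOB) mode. Therefore, you must enable the compatibility mode of the erdma driver. The first two commands make the setting persistent: they add the module option to /etc/modprobe.d/erdma.conf and regenerate the initramfs by using dracut. The last two commands reload the erdma driver so that the setting takes effect immediately.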

    sudo sh -c "echo 'options erdma compat_mode=Y' >> /etc/modprobe.d/erdma.conf"
    sudo dracut --force
    sudo rmmod erdma
    sudo modprobe erdma compat_mode=Y
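  3. Remove the limit on locked memory. RDMA applications register memory buffers by pinning them in physical memory, and pinned memory counts against the memlock limit. An eRDMA application that uses large buffers therefore needs an unlimited memlock limit to transfer data efficiently. The new limits apply only to new login sessions, so log on to the instance again after you save the file.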

    1. Run the following command to modify the limits.conf file:

      sudo vi /etc/security/limits.conf
    2. Add the following lines to the end of the file. Then, save and close the file:

      * soft memlock unlimited
      * hard memlock unlimited
  4. Run the following commands to install the build dependencies and download the bRPC source code:

    sudo yum install git gcc-c++ make openssl-devel gflags-devel protobuf-devel protobuf-compiler leveldb-devel -y
    git clone https://github.com/apache/brpc.git
  5. When you use eRDMA to test bRPC, we recommend that you apply the following patch on both the server and the client for better performance.

    1. Create a patch file in the brpc directory. In this example, the file is named erdma-multi-sge.patch. Sample commands:

      cd ~/brpc
      vi erdma-multi-sge.patch
    2. Add the following content to the file. Then, save and close the file:

      diff --git a/src/brpc/rdma/rdma_helper.cpp b/src/brpc/rdma/rdma_helper.cpp
      index cf1cce95..d2592cbb 100644
      --- a/src/brpc/rdma/rdma_helper.cpp
      +++ b/src/brpc/rdma/rdma_helper.cpp
      @@ -619,7 +619,7 @@ void DeregisterMemoryForRdma(void* buf) {
       }
      
       int GetRdmaMaxSge() {
      -    return g_max_sge;
      +    return 4;
       }
      
       int GetRdmaCompVector() {
      --
      2.39.3
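    3. Run the following command to apply the patch to the bRPC source code. If the patch fails because the upstream source code has changed, manually change the return value of GetRdmaMaxSge() in src/brpc/rdma/rdma_helper.cpp to 4.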

      patch -p1 < erdma-multi-sge.patch
    4. Run the following commands to compile bRPC with RDMA support and then build the rdma_performance example:

      sh config_brpc.sh --with-rdma --headers="/usr/include" --libs="/usr/lib64 /usr/bin"
      make -j
      cd example/rdma_performance; make -j
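The echo command in sub-step 2 of Step 2 appends a line every time it runs. If you run it twice, /etc/modprobe.d/erdma.conf ends up with a duplicate options line. The following C++ function is a minimal sketch that checks whether the setting is already present before you append it. It assumes the standard modprobe.d line format (`options <module> <parameter>=<value> ...`) and recognizes only the exact `compat_mode=Y` token used in this topic. `HasErdmaCompatMode` is a hypothetical helper and is not part of bRPC or the eRDMA driver.

```cpp
#include <sstream>
#include <string>

// Returns true if modprobe.d-style text contains an "options erdma ..." line
// that sets compat_mode=Y. Comment lines start with '#', so their first token
// never equals "options" and they are skipped by the keyword check.
// Only the exact token "compat_mode=Y" is recognized (simplifying assumption).
bool HasErdmaCompatMode(const std::string& conf_text) {
    std::istringstream lines(conf_text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream tokens(line);
        std::string keyword, module;
        if (!(tokens >> keyword >> module)) continue;  // blank or too-short line
        if (keyword != "options" || module != "erdma") continue;
        std::string param;
        while (tokens >> param) {
            if (param == "compat_mode=Y") return true;
        }
    }
    return false;
}
```

To use it, read the contents of /etc/modprobe.d/erdma.conf into a string, for example with std::ifstream. Then append the options line only if the function returns false.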
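Sub-step 3 of Step 2 removes the memlock limit. RDMA memory registration pins buffers in physical memory, and pinned memory counts against RLIMIT_MEMLOCK. After you log on again, you can confirm that the new limit is in effect. From a shell, `ulimit -l` reports the soft limit. The following C++ functions are a minimal sketch that performs the same check from inside a process, using the POSIX getrlimit() call on Linux. `IsMemlockUnlimited` and `CurrentMemlockIsUnlimited` are hypothetical helper names and are not part of bRPC.

```cpp
#include <sys/resource.h>

// Returns true if neither the soft limit (rlim_cur) nor the hard limit
// (rlim_max) caps locked memory, matching the "soft" and "hard" memlock
// lines added to /etc/security/limits.conf.
bool IsMemlockUnlimited(const rlimit& lim) {
    return lim.rlim_cur == RLIM_INFINITY && lim.rlim_max == RLIM_INFINITY;
}

// Queries RLIMIT_MEMLOCK for the calling process.
// Returns false if getrlimit() fails or if either limit is finite.
bool CurrentMemlockIsUnlimited() {
    rlimit lim{};
    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0) {
        return false;
    }
    return IsMemlockUnlimited(lim);
}
```

The check is split into two functions so that the comparison logic can be tested with constructed limit values, independently of the limits of the current session.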
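The patch in sub-step 5 of Step 2 changes GetRdmaMaxSge() so that bRPC uses at most 4 scatter-gather entries (SGEs) per RDMA work request, instead of the value stored in g_max_sge. Each SGE describes one contiguous memory region. A message that spans several non-contiguous buffers therefore needs one SGE per buffer, and the SGE limit determines how many work requests must be posted. The following C++ function is a simplified model of that relationship, under the assumption of one SGE per buffer. It is not bRPC's actual posting logic, and `WorkRequestsNeeded` is a hypothetical name.

```cpp
#include <cstddef>

// Number of work requests needed to post `num_blocks` non-contiguous buffers
// when each work request can carry at most `max_sge` scatter-gather entries
// (one entry per buffer). Returns 0 if there is nothing to post or if
// max_sge is 0.
std::size_t WorkRequestsNeeded(std::size_t num_blocks, std::size_t max_sge) {
    if (num_blocks == 0 || max_sge == 0) return 0;
    return (num_blocks + max_sge - 1) / max_sge;  // ceiling division
}
```

For example, with a limit of 4 SGEs, 9 buffers require 3 work requests. With a limit of 1 SGE, they require 9.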

Step 3: Test performance

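Test the performance of bRPC with and without eRDMA, and compare the results to evaluate the performance improvement that eRDMA brings to bRPC. Run the commands in this step from the example/rdma_performance directory of the bRPC source code, where the server and client programs are built.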

  1. Test the performance of bRPC in the following scenarios.

    Test performance when eRDMA is used

    1. Run the following command on the server to start the server and enable communication over eRDMA:

      ./server --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=true
    2. Run the following command on the client to connect to the server and enable communication over eRDMA:

      ./client --servers=<Private IP address of the server>:8002 --rpc_timeout_ms=-1 --attachment_size=1024 --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=true --queue_depth=16

      Take note of the following parameters:

      • <Private IP address of the server>: Replace this placeholder with the private IP address of the server.

      • --attachment_size: Specifies the size of the attachment carried by each RPC call, that is, the payload transferred per request. This parameter affects data transmission efficiency. Larger payloads make better use of the advantages of eRDMA, but very large payloads make memory management more complex. Set this parameter based on your test requirements.

      • --queue_depth: Specifies the maximum number of requests that the client keeps pending at the same time. A larger queue depth generates more concurrent load, which helps saturate the network in high-concurrency tests, but it consumes more memory. Set this parameter based on your test requirements.

    Test performance when eRDMA is not used

    1. Run the following command on the server to start the server and enable communication over TCP:

      ./server --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=false
    2. Run the following command on the client to connect to the server and enable communication over TCP:

      ./client --servers=<Private IP address of the server>:8002 --rpc_timeout_ms=-1 --attachment_size=1024 --rdma_gid_index=1 --rdma_prepared_qp_cnt=0 --use_rdma=false --queue_depth=16

      The <Private IP address of the server>, --attachment_size, and --queue_depth parameters have the same meanings as in the test with eRDMA.

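  2. Compare the results of the two tests to evaluate the performance improvement that eRDMA brings to bRPC. The Avg-Latency field shows the average latency of each call. The QPS field shows the number of calls completed per second. Multiply QPS by the attachment size to estimate the achieved bandwidth.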
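The QPS field counts completed calls per second. To express it as data throughput, multiply QPS by the attachment size. The following C++ functions are a minimal sketch. They assume that --attachment_size is specified in bytes (check `./client --help` to confirm the unit), and they count only the request attachment, not protocol headers or any response payload. The function names are hypothetical.

```cpp
#include <cstdint>

// Approximate one-direction payload throughput in bytes per second:
// each completed call is assumed to carry attachment_size bytes.
// Protocol headers and response payloads are ignored (assumption).
double PayloadBytesPerSecond(double qps, std::uint64_t attachment_size) {
    return qps * static_cast<double>(attachment_size);
}

// Converts bytes per second to gigabits per second (1 Gbit = 1e9 bits).
double ToGbitPerSecond(double bytes_per_second) {
    return bytes_per_second * 8.0 / 1e9;
}
```

Apply both functions to the QPS values from the eRDMA and TCP runs with the same --attachment_size. This compares the two transports in bandwidth terms.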
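Avg-Latency and QPS are not independent. For a client that keeps a fixed number of requests in flight, Little's law gives QPS ≈ in-flight requests / average latency. You can use this relationship to sanity-check the two fields against each other and to reason about how --queue_depth affects load. The following C++ function is a minimal sketch with two assumptions. First, latency is in microseconds; convert it if the client reports another unit. Second, the total number of in-flight requests is known; depending on how the client distributes work, this may be --queue_depth multiplied by the number of client threads or connections. `ExpectedQps` is a hypothetical name.

```cpp
// Little's law for a closed-loop load generator: if N requests are always
// in flight and each takes L microseconds on average, the steady-state
// completion rate is N * 1e6 / L requests per second.
// Returns 0 when the latency is not positive.
double ExpectedQps(double in_flight_requests, double avg_latency_us) {
    if (avg_latency_us <= 0.0) return 0.0;
    return in_flight_requests * 1e6 / avg_latency_us;
}
```

A measured QPS far below this estimate suggests that the client is not keeping as many requests in flight as assumed.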
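To summarize the comparison as numbers, compute the relative latency reduction and the QPS ratio between the TCP run and the eRDMA run. The following C++ functions are a minimal sketch of that arithmetic. They assume that both runs used the same --attachment_size and --queue_depth, and that both latency values use the same unit. The function names are hypothetical.

```cpp
// Relative reduction in average latency when switching from TCP to eRDMA,
// as a fraction: 0.25 means the eRDMA latency is 25% lower than TCP.
// Returns 0 if the TCP latency is not positive.
double LatencyReduction(double tcp_avg_latency, double erdma_avg_latency) {
    if (tcp_avg_latency <= 0.0) return 0.0;
    return (tcp_avg_latency - erdma_avg_latency) / tcp_avg_latency;
}

// Ratio of eRDMA QPS to TCP QPS: 2.0 means the eRDMA run completed twice as
// many calls per second. Returns 0 if the TCP QPS is not positive.
double QpsSpeedup(double tcp_qps, double erdma_qps) {
    if (tcp_qps <= 0.0) return 0.0;
    return erdma_qps / tcp_qps;
}
```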