
Alibaba Cloud Service Mesh:Accelerate network communications between pods in an ASM instance by eRDMA

Last Updated:Oct 08, 2024

Alibaba Cloud Linux 3 provides Shared Memory Communication (SMC), a high-performance network protocol that runs in kernel space. SMC uses Remote Direct Memory Access (RDMA) technology behind standard socket interfaces and can significantly improve the performance of network communications. However, when you use SMC in a native Elastic Compute Service (ECS) environment, you must carefully maintain the SMC whitelist and the configurations in the network namespace of each pod to prevent connections from being unexpectedly downgraded to TCP. Service Mesh (ASM) provides the SMC optimization capability in a controllable network environment (that is, a cluster) and automatically optimizes the network communications between pods in an ASM instance, so you do not need to manage the SMC configuration yourself.

Prerequisites

The cluster is added to the ASM instance.

Limits

Note

The feature that enables SMC in an ASM instance to accelerate network communications is in the beta phase.

Procedure

Step 1: Initialize the related nodes

SMC uses elastic RDMA interfaces (ERIs) to accelerate network communications. Before you enable SMC, you must initialize the related nodes.

  1. Upgrade the kernel version of Alibaba Cloud Linux 3 to 5.10.134-16.3 or later.

    Note

    The known issue of kernel version 5.10.134-16.3 and how to fix it are described in the Known issue section.

    1. Run the uname -r command to view the kernel version. If the kernel version is 5.10.134-16.3 or later, skip the kernel upgrade step.

      $ uname -r
      5.10.134-16.3.al8.x86_64
    2. View kernel versions that can be installed.

      $ sudo yum search kernel --showduplicates | grep kernel-5.10
      Last metadata expiration check: 3:01:27 ago on Tue 09 Apr 2024 07:40:15 AM CST.
      kernel-5.10.134-15.1.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-15.2.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-15.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.1.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.2.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.3.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      [...]
    3. Install the latest version of the kernel or install a specific version of the kernel.

      1. Install the latest version of the kernel:

        $ sudo yum update kernel
      2. Install a specific version of the kernel. In this example, kernel-5.10.134-16.3.al8.x86_64 is used:

        $ sudo yum install kernel-5.10.134-16.3.al8.x86_64
    4. Restart the nodes. After a node restarts, run the uname -r command to check whether its kernel has been upgraded to the expected version.
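
The manual version comparison above can also be scripted. The following sketch uses version sort to check the running kernel against the 5.10.134-16.3 minimum; the suffix handling assumes the al8 version naming shown in the preceding outputs, and the sample value is illustrative:

```shell
# Compare the running kernel version against the required minimum using
# version sort. The suffix stripping assumes the al8 naming scheme.
required="5.10.134-16.3"
current="5.10.134-15.2.al8.x86_64"    # in practice: current=$(uname -r)
base=${current%%.al8*}                 # strip the ".al8.x86_64" suffix
lowest=$(printf '%s\n%s\n' "$required" "$base" | sort -V | head -n1)
if [ "$lowest" = "$required" ]; then
    echo "kernel OK"
else
    echo "kernel too old, upgrade required"
fi
```

With the illustrative 5.10.134-15.2 value above, the script reports that an upgrade is required; with a 5.10.134-16.3 or later kernel, it prints "kernel OK".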

  2. Configure an elastic network interface (ENI) filter for the Terway network plug-in of the ACK cluster. This way, the Terway network plug-in does not manage the secondary ERI that will be added. For more information, see Configure an ENI filter.

  3. For each node in the cluster, create a secondary ERI and bind it to the node. For more information, see Configure eRDMA on an existing instance.

    Note

    You only need to create secondary ERIs and bind them to nodes. In this scenario, each secondary ERI must be in the same subnet as the primary network interface card (NIC). For more information about how to configure a secondary ERI, see the next step.
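
After binding, you can confirm that the node sees the eRDMA device in the same way that the configuration script in the next step does: it takes the last field of each `rdma link show` output line as the netdev name. A sample-based sketch of that parsing step (the sample line is illustrative, not captured from a real node):

```shell
# The configuration script discovers the eRDMA-capable netdev by taking
# the last field of `rdma link show` output. Illustrative sample line:
sample='link erdma_0/1 state ACTIVE physical_state LINK_UP netdev eth2'
# On a real node you would run: eth=$(rdma link show | awk '{print $NF}')
eth=$(printf '%s\n' "$sample" | awk '{print $NF}')
echo "$eth"    # prints: eth2
```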

  4. Configure a secondary ERI on a node.

    1. Save the following script in any directory of the node. Run the sudo chmod +x asm_erdma_eth_config.sh command to grant execute permissions on the script.

      asm_erdma_eth_config.sh

      #!/bin/bash
      
      #
      # Params
      #
      mode=
      mac=
      ipv4=
      mask=
      gateway=
      state=  # UP/DOWN
      
      #
      # Functions
      #
      function find_erdma_eth
      {
              echo "$(rdma link show | awk '{print $NF}')"
      }
      
      function get_erdma_eth_info
      {
              e=$1
              echo "Find ethernet device with erdma: $e"
      
              # UP/DOWN
              ip link show $e | grep -q "state UP" && state="UP" || state="DOWN"
      
              # MAC address
              mac=$(ip a show dev $e | grep ether | awk '{print $2}')
      
              # IPv4 address
              ipv4=$(curl --silent --show-error --connect-timeout 5 \
                      http://100.100.100.200/latest/meta-data/network/interfaces/macs/"$mac"/primary-ip-address \
                      2>&1)
              if [ $? -ne 0 ]; then
                      echo "failed to retrieve $e IPv4 address. Error: $ipv4"
                      exit 1
              fi
      
              # Mask
              mask=$(curl --silent --show-error --connect-timeout 5 \
                      http://100.100.100.200/latest/meta-data/network/interfaces/macs/"$mac"/netmask \
                      2>&1)
              if [ $? -ne 0 ]; then
                      echo "failed to retrieve $e network mask. Error: $mask"
                      exit 1
              fi
      
              # Gateway
              gateway=$(curl --silent --show-error --connect-timeout 5 \
                      http://100.100.100.200/latest/meta-data/network/interfaces/macs/"$mac"/gateway \
                      2>&1)
              if [ $? -ne 0 ]; then
                      echo "failed to retrieve $e gateway. Error: $gateway"
                      exit 1
              fi
              echo "- state <$state>, IPv4 <$ipv4>, mask <$mask>, gateway <$gateway>"
      }
      
      function set_erdma_eth
      {
              local eths=()
      
              # find all eths with erdma
              eths=($(find_erdma_eth))
              if [ ${#eths[@]} -eq 0 ]; then
                      echo "Can't find ethernet device with erdma capability"
                      exit 1
              fi
      
              for e in ${eths[@]}
              do
                      if [[ $e == "eth0" ]]; then
                              echo "Skip eth0, no need to configure."
                              continue
                      fi
                      get_erdma_eth_info $e
                      echo "Config.."
      
                      # Enable
                      if [ "$state" == "DOWN" ]; then
                              ip link set $e up 1>/dev/null 2>&1
                              if ip link show $e | grep -q "state UP" ; then
                                      echo "- successfully set $e UP"
                              else
                                      echo "- failed to set $e UP"
                                      exit 1
                              fi
                      else
                              echo "- $e has been activated, nothing to do."
                      fi
      
                      # Set IP & mask
                      if ! ip addr show $e | grep -q "inet\b"; then
                              local eth0_metric=$(ip route | grep "dev eth0 proto kernel scope link" \
                                                      | awk '/metric/ {print $NF}')
                              ip addr add $ipv4/$mask dev $e metric $((eth0_metric + 1)) 1>/dev/null 2>&1
                              if [ $? -eq 0 ]; then
                                      echo "- successfully configured $e IPv4/mask and direct route"
                              else
                                      echo "- failed to configure $e IPv4/mask and direct route"
                              fi
                      else
                              echo "- $e has been configured with IPv4(s), nothing to do."
                      fi
      
                      echo "Complete all configurations of $e"
              done
      }
      
      function reset_erdma_eth
      {
              local eths=()
      
              # Find all eths with erdma
              eths=($(find_erdma_eth))
              if [ ${#eths[@]} -eq 0 ]; then
                      echo "Can't find ethernet device with erdma capability"
                      exit 1
              fi
      
              for e in ${eths[@]}
              do
                      if [[ $e == "eth0" ]]; then
                              echo "Skip eth0, no need to configure."
                              continue
                      fi
                      get_erdma_eth_info $e
                      echo "Reset.."
      
                      # Remove IPv4
                      ip addr flush dev $e scope global 1>/dev/null 2>&1
                      if [ $? -eq 0 ]; then
                              echo "- successfully flushed $e IPv4(s)"
                      else
                              echo "- failed to flush $e IPv4(s)"
                      fi
      
                      # Disable
                      ip link set $e down 1>/dev/null 2>&1
                      if [ $? -eq 0 ]; then
                              echo "- successfully set $e DOWN"
                      else
                              echo "- failed to set $e DOWN"
                      fi
                      echo "Complete all resets of $e"
              done
      }
      
      print_help() {
              echo "Usage: $0 [option]"
              echo "Options:"
              echo "  -s            Enable eRDMA-cap Eth and configure its IPv4"
              echo "  -r            Disable eRDMA-cap Eth and remove all its IPv4"
              echo "  -h, --help    Show this help message"
      }
      
      while [ "$1" != "" ]; do
              case $1 in
                      -s)
                              set_erdma_eth
                              exit 0
                              ;;
                      -r)
                              reset_erdma_eth
                              exit 0
                              ;;
                      -h | --help)
                              print_help
                              exit 0
                              ;;
                      *)
                              echo "Invalid option: $1"
                              print_help
                              exit 1
                              ;;
              esac
              shift
      done
      
      if [ -z "$1" ]; then
              print_help
              exit 1
      fi
      Note

      This script is only used to configure a secondary ERI in the current context and is not applicable to other NIC configuration scenarios.
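
The script passes the dotted netmask returned by the metadata service (for example, 255.255.255.0) directly to `ip addr add`, which iproute2 accepts. If you prefer CIDR notation, a small helper (an illustrative addition, not part of the script above) can convert the mask to a prefix length:

```shell
# Convert a dotted netmask to a CIDR prefix length by counting set bits.
# Valid for contiguous masks such as those returned by the metadata service.
mask_to_prefix() {
    local octet prefix=0
    for octet in $(printf '%s' "$1" | tr '.' ' '); do
        while [ "$octet" -gt 0 ]; do
            prefix=$((prefix + octet % 2))
            octet=$((octet / 2))
        done
    done
    echo "$prefix"
}
mask_to_prefix 255.255.255.0    # prints: 24
```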

    2. Run the sudo ./asm_erdma_eth_config.sh -s command to set the status of the new secondary ERI to UP and configure an IPv4 address for it. Expected output:

      $ sudo ./asm_erdma_eth_config.sh -s
      Find ethernet device with erdma: eth2
      - state <DOWN>, IPv4 <192.168.x.x>, mask <255.255.255.0>, gateway <192.168.x.x>
      Config..
      - successfully set eth2 UP
      - successfully configured eth2 IPv4/mask and direct route
      Complete all configurations of eth2
    3. (Optional) The preceding secondary ERI configuration does not persist across reboots, so you must repeat it each time the node restarts. To have the secondary ERI configured automatically at startup, perform the following steps to create the corresponding systemd service.

      1. Add the following asm_erdma_eth_config.service file to the /etc/systemd/system directory of the node, and replace /path/to/asm_erdma_eth_config.sh with the actual path of the asm_erdma_eth_config.sh script for the node.

        asm_erdma_eth_config.service

        [Unit]
        Description=Run asm_erdma_eth_config.sh script after network is up
        Wants=network-online.target
        After=network-online.target
        
        [Service]
        Type=oneshot
        ExecStart=/bin/sh /path/to/asm_erdma_eth_config.sh -s
        RemainAfterExit=yes
        
        [Install]
        WantedBy=multi-user.target
      2. Enable asm_erdma_eth_config.service.

        sudo systemctl daemon-reload
        sudo systemctl enable asm_erdma_eth_config.service

        The secondary ERI is then automatically configured at node startup. After the node starts, you can run the sudo systemctl status asm_erdma_eth_config.service command to view the status of asm_erdma_eth_config.service. The expected state is active. Expected output:

        # sudo systemctl status asm_erdma_eth_config.service
        ● asm_erdma_eth_config.service - Run asm_erdma_eth_config.sh script after network is up
           Loaded: loaded (/etc/systemd/system/asm_erdma_eth_config.service; enabled; vendor preset: enabled)
           Active: active (exited) since [time]
         Main PID: 1689 (code=exited, status=0/SUCCESS)
            Tasks: 0 (limit: 403123)
           Memory: 0B
           CGroup: /system.slice/asm_erdma_eth_config.service
        
        [time] <hostname> sh[1689]: Find ethernet device with erdma: eth2
        [time] <hostname> systemd[1]: Starting Run asm_erdma_eth_config.sh script after network is up...
        [time] <hostname> sh[1689]: - state <DOWN>, IPv4 <192.168.x.x>, mask <255.255.255.0>, gateway <192.168.x.x>
        [time] <hostname> sh[1689]: Config..
        [time] <hostname> sh[1689]: - successfully set eth2 UP
        [time] <hostname> sh[1689]: - successfully configured eth2 IPv4/mask and direct route
        [time] <hostname> sh[1689]: Complete all configurations of eth2
        [time] <hostname> systemd[1]: Started Run asm_erdma_eth_config.sh script after network is up.
      3. If asm_erdma_eth_config.service is no longer needed, you can run the sudo systemctl disable asm_erdma_eth_config.service command to remove it.

Step 2: Deploy test applications

  1. Enable automatic sidecar proxy injection for the default namespace, which is used in the following tests. For more information, see Enable automatic sidecar proxy injection.

  2. Create a fortioserver.yaml file that contains the following content:

    Expand to view the fortioserver.yaml file

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: fortioserver
    spec:
      ports:
      - name: http-echo
        port: 8080
        protocol: TCP
      - name: tcp-echoa
        port: 8078
        protocol: TCP
      - name: grpc-ping
        port: 8079
        protocol: TCP
      selector:
        app: fortioserver
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: fortioserver
      name: fortioserver
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fortioserver
      template:
        metadata:
          labels:
            app: fortioserver
          annotations:
            sidecar.istio.io/inject: "true"
            sidecar.istio.io/proxyCPULimit: 2000m
            proxy.istio.io/config: |
              concurrency: 2 
        spec:
          shareProcessNamespace: true
          containers:
          - name: captured
            image: fortio/fortio:latest_release
            ports:
            - containerPort: 8080
              protocol: TCP
            - containerPort: 8078
              protocol: TCP
            - containerPort: 8079
              protocol: TCP
          - name: anolis
            securityContext:
              runAsUser: 0
            image: openanolis/anolisos:latest
            args:
            - /bin/sleep
            - 3650d
    ---
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
          service.beta.kubernetes.io/alibaba-cloud-loadbalancer-health-check-switch: "off"
      name: fortioclient
    spec:
      ports:
      - name: http-report
        port: 8080
        protocol: TCP
      selector:
        app: fortioclient
      type: LoadBalancer
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: fortioclient
      name: fortioclient
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fortioclient
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "true"
            sidecar.istio.io/proxyCPULimit: 4000m
            proxy.istio.io/config: |
               concurrency: 4
          labels:
            app: fortioclient
        spec:
          shareProcessNamespace: true
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - fortioserver
                topologyKey: "kubernetes.io/hostname"
          containers:
          - name: captured
            volumeMounts:
            - name: shared-data
              mountPath: /var/lib/fortio
            image: fortio/fortio:latest_release
            ports:
            - containerPort: 8080
              protocol: TCP
          - name: anolis
            securityContext:
              runAsUser: 0
            image: openanolis/anolisos:latest
            args:
            - /bin/sleep
            - 3650d
          volumes:
          - name: shared-data
            emptyDir: {}
    
  3. Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file and then run the following command to deploy the test applications:

    kubectl apply -f fortioserver.yaml
  4. Run the following command to view the status of the test applications:

    kubectl get pods | grep fortio

    Expected output:

    NAME                            READY   STATUS    RESTARTS      
    fortioclient-8569b98544-9qqbj   3/3     Running   0
    fortioserver-7cd5c46c49-mwbtq   3/3     Running   0

    The output indicates that both test applications start as expected.

Step 3: Run a test in the baseline environment and view the baseline test results

After the fortio application starts, it listens on port 8080, which serves the fortio console page. To generate test traffic, map the port of the fortioclient application to a local port, and then open the fortio console page from your on-premises host.

  1. Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file and then run the following command to map port 8080 of the fortioclient application to the local port 8080.

    kubectl port-forward service/fortioclient 8080:8080
  2. In the address bar of your browser, enter http://localhost:8080/fortio to access the console of the fortioclient application and modify related configurations.

    image

    Modify the parameter settings on the page shown in the preceding figure according to the following table.

    Parameter                           Example
    URL                                 http://fortioserver:8080/echo
    QPS                                 100000
    Duration                            30s
    Threads/Simultaneous connections    64
    Payload                             The following 128-byte string:

    xhsyL4ELNoUUbC3WEyvaz0qoHcNYUh0j2YHJTpltJueyXlSgf7xkGqc5RcSJBtqUENNjVHNnGXmoMyILWsrZL1O2uordH6nLE7fY6h5TfTJCZtff3Wib8YgzASha8T8g

  3. After the configuration is complete, click Start in the lower part of the page to start the test. The test ends after the progress bar reaches the end.

    image

    After the test ends, the results of the test are displayed on the page. The following figure is for reference only. Test results vary with test environments.

    image

    On the test results page, the x-axis of the histogram shows request latencies and the y-axis shows the number of processed requests, so the distribution of the bars shows the latency distribution. The purple curve shows the cumulative number of requests processed within each response time. The P50, P75, P90, P99, and P99.9 request latencies are listed at the top of the histogram. After you obtain the test data of the baseline environment, enable SMC for the applications and test their performance again.
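
As a concrete illustration of how the percentile figures are read, the following sketch computes a nearest-rank P90 over a toy latency sample; the numbers are invented for illustration and are not test results:

```shell
# Nearest-rank P90 over a sorted latency sample (milliseconds).
# The values are toy data for illustration only.
latencies="1.2 1.3 1.5 1.6 1.8 2.0 2.2 2.5 3.1 9.7"
p90=$(echo $latencies | awk '{ idx = int(0.90 * NF); if (idx < 1) idx = 1; print $idx }')
echo "P90 = ${p90}ms"    # prints: P90 = 3.1ms
```

Note how a single slow request (9.7 ms) barely moves the P90 but would dominate the P99.9, which is why the higher percentiles are the ones to watch.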

Step 4: Enable network communication acceleration based on SMC for your ASM instance and workloads

  1. Use kubectl to connect to the ASM instance based on the information in the kubeconfig file. Then, run the following command and add the smcEnabled: true field to the ASMMeshConfig object to enable network communication acceleration based on SMC.

    $ kubectl edit asmmeshconfig
    
    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMMeshConfig
    metadata:
      name: default
    spec:
      ambientConfiguration:
        redirectMode: ""
        waypoint: {}
        ztunnel: {}
      cniConfiguration:
        enabled: true
        repair: {}
      smcEnabled: true
  2. Use kubectl to connect to the ACK cluster based on the information in the kubeconfig file. Then, modify the Deployments of the fortioserver and fortioclient applications to add the smc.asm.alibabacloud.com/enabled: "true" annotation to their pod templates.

    After you enable SMC for the ASM instance, you must also enable SMC for the workloads by adding the smc.asm.alibabacloud.com/enabled: "true" annotation to the related pods. SMC must be enabled for the workloads on both the client and server sides.

    1. Modify the Deployment of the fortioclient application.

      $ kubectl edit deployment fortioclient
      
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        ......
        name: fortioclient
      spec:
        ......
        template:
          metadata:
            ......
            annotations:
              smc.asm.alibabacloud.com/enabled: "true"
              
    2. Modify the Deployment of the fortioserver application.

      $ kubectl edit deployment fortioserver
      
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        ......
        name: fortioserver
      spec:
        ......
        template:
          metadata:
            ......
            annotations:
              smc.asm.alibabacloud.com/enabled: "true"
              
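If you prefer a non-interactive alternative to kubectl edit, the same annotation can be applied with kubectl patch. The following sketch only prints the commands (using the Deployment names from this example) so that you can review them before running:

```shell
# Print kubectl patch commands that add the SMC annotation to the pod
# template of each example Deployment; review the output, then run it.
patch='{"spec":{"template":{"metadata":{"annotations":{"smc.asm.alibabacloud.com/enabled":"true"}}}}}'
for d in fortioclient fortioserver; do
    echo "kubectl patch deployment $d --type merge -p '$patch'"
done
```

Patching the pod template triggers the same rolling restart as editing the Deployment interactively.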

Step 5: Run the test in the environment in which SMC is enabled and view the test results

The workloads restart after you modify the Deployments. Therefore, map the port of the fortioclient application to a local port again by referring to Step 3. Then, start the test again and wait until it is complete.

image

Compared with the test results before SMC is enabled, request latencies decrease and queries per second (QPS) increases significantly after you enable SMC for the ASM instance.
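
To quantify the difference between the two runs, you can compute the relative QPS change. The values below are placeholders that show the arithmetic, not measured results:

```shell
# Relative QPS change between the baseline run and the SMC-enabled run.
# Both values are illustrative placeholders; substitute your own results.
baseline_qps=52000
smc_qps=68000
awk -v b="$baseline_qps" -v s="$smc_qps" \
    'BEGIN { printf "QPS change: %+.1f%%\n", (s - b) / b * 100 }'
```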

Known issue

  1. After SMC is enabled on an ECS instance that runs Alibaba Cloud Linux 3 with kernel version 5.10.134-16.3, the system reports an error message similar to unregister_netdevice: waiting for eth* to become free. Usage count = * when the related pod is restarted, and the pod cannot be deleted.

    This issue occurs because the SMC kernel module does not properly release the reference count of the network interface. You can install the following hotfix to fix the issue. The issue does not exist in later kernel versions. For more information about kernel hotfixes, see Operations related to kernel hotfixes.

    $ sudo yum install kernel-hotfix-16664924-5.10.134-16.3