
Container Service for Kubernetes: Use eRDMA to accelerate container networking in ACK clusters

Last Updated: Dec 12, 2024

Elastic Remote Direct Memory Access (eRDMA) is a low-latency, high-throughput, and highly scalable RDMA network service provided by Alibaba Cloud. eRDMA is built on the fourth-generation SHENLONG architecture and Virtual Private Cloud (VPC). It is fully compatible with the RDMA ecosystem and provides large-scale, broadly available RDMA networking for Elastic Compute Service (ECS) instances. This topic describes how to configure and use eRDMA in Container Service for Kubernetes (ACK) clusters.

Prerequisites

An ACK cluster is created. For more information, see Create an ACK managed cluster.

Step 1: Install ACK eRDMA Controller

You can perform the following steps to install ACK eRDMA Controller.

Note
  • If your ACK cluster uses Terway, configure an elastic network interface (ENI) filter for Terway to prevent Terway from modifying the eRDMA ENIs. For more information, see Configure an ENI filter.

  • If a node has multiple ENIs, ACK eRDMA Controller configures routes for the additional eRDMA ENIs with a default routing priority of 200, which is lower than the priority of routes for ENIs in the same CIDR block. If you manually configure ENIs after you install ACK eRDMA Controller, make sure that your routes do not conflict with these routes.

  1. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.

  2. On the Add-ons page, click the Networking tab, find ACK eRDMA Controller, and follow the on-screen instructions to configure and install the component.

    The component provides the following parameters:

    • preferDriver (Driver type): the type of the eRDMA driver used on the cluster nodes. Valid values:

      ◦ default: the default driver mode.

      ◦ compat: the driver mode that is compatible with RDMA over Converged Ethernet (RoCE).

      ◦ ofed: the OFED-based driver mode, which is intended for GPU-accelerated instance types.

      For more information about the driver types, see Use eRDMA.

    • Specifies whether to assign all eRDMA devices of nodes to pods (check box). Valid values:

      ◦ True: If you select this check box, all eRDMA devices on the node are assigned to the pod.

      ◦ False: If you clear this check box, the pod is assigned an eRDMA device based on the non-uniform memory access (NUMA) topology. For CPUs and eRDMA devices to be allocated to pods according to the NUMA topology, you must enable the static CPU policy on the node. For more information about how to configure CPU policies, see Create a node pool.

  3. In the left-side navigation pane, choose Workloads > Pods. On the Pods page, select the ack-erdma-controller namespace and check that the component pods are in the Running state.
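The pod check above can also be done from the command line. The following is a minimal sketch, not a tool shipped with ACK: the function names are hypothetical. all_pods_running assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, ...). cpu_policy_is_static assumes that the kubelet keeps its CPU Manager state as compact JSON in the default file /var/lib/kubelet/cpu_manager_state, so it must run on the node itself; it is only relevant if you do not assign all eRDMA devices to pods.

```shell
# Hypothetical helpers for sanity-checking an ACK eRDMA Controller installation.
# Source this file or paste the functions into a shell.

# Succeeds if every pod line read from stdin has STATUS ($3) "Running" and a
# READY column ($2, "ready/total") in which all containers are ready.
# Expects the output of: kubectl get pods -n ack-erdma-controller --no-headers
all_pods_running() {
  awk '
    NF == 0 { next }
    {
      total++
      split($2, r, "/")
      if ($3 == "Running" && r[1] == r[2]) ok++
    }
    END { exit !(total > 0 && ok == total) }
  '
}

# Succeeds if the kubelet CPU Manager state file reports the static policy.
# Run on the node. The default path is an assumption about your kubelet setup.
cpu_policy_is_static() {
  state_file="${1:-/var/lib/kubelet/cpu_manager_state}"
  grep -q '"policyName":"static"' "$state_file"
}

# Example:
#   kubectl get pods -n ack-erdma-controller --no-headers | all_pods_running \
#     && echo "ACK eRDMA Controller pods are ready"
```

An empty pod list is treated as a failure, so a typo in the namespace does not pass silently.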

Step 2: Use eRDMA to accelerate container networking

After you install ACK eRDMA Controller, use the following configurations to enable eRDMA for a pod.

Enable eRDMA

To allocate eRDMA devices to a pod, request the aliyun/erdma resource in the resources.limits field of the container:

spec:
  containers:
  - name: erdma-container
    resources:
      limits:
        aliyun/erdma: 1

After the eRDMA device is allocated, you can view it in the pod:

/# ls /dev/infiniband/
rdma_cm  uverbs0
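To script the check above, for example in a readiness or startup script, the following sketch tests for the device files shown in the listing. The function name is hypothetical, and the check only confirms that the device nodes exist in the container; it does not send any RDMA traffic.

```shell
# Hypothetical check, run inside the pod: succeeds if the RDMA CM device and at
# least one uverbs device (as in the listing above) exist in the directory.
has_erdma_devices() {
  dir="${1:-/dev/infiniband}"
  [ -e "$dir/rdma_cm" ] || return 1
  for dev in "$dir"/uverbs*; do
    # If nothing matches, the glob stays literal and the -e test fails.
    [ -e "$dev" ] && return 0
  done
  return 1
}

# Example: has_erdma_devices && echo "eRDMA devices are available"
```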

Enable Shared Memory Communication over RDMA (SMC-R)

After you enable eRDMA, add the network.alibabacloud.com/erdma-smcr: "true" annotation to the pod to accelerate its TCP connections.

metadata:
  annotations:
    network.alibabacloud.com/erdma-smcr: "true"

SMC-R uses eRDMA to accelerate a TCP connection only if SMC-R is enabled on both ends of the connection. Otherwise, the connection falls back to TCP.

You can install smc-tools in the pod and run the smcss command to check the acceleration status of the connection.

Note
  • This feature is supported only on Alibaba Cloud Linux 3 with kernel version 5.10.134-17 or later. For more information, see Release notes for Alibaba Cloud Linux 3.

  • This feature is not supported if ofed or compat is selected as the eRDMA driver type.

  • Alibaba Cloud Elastic RDMA Interface (ERI) devices and SMC do not support IPv6 addresses. If an application uses IPv6, SMC falls back to TCP.
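The kernel requirement in the notes above can be checked before you add the annotation. Because containers share the kernel of their node, uname -r returns the same version in a pod as on the node. The following sketch assumes GNU sort with version sorting (sort -V); the function name is hypothetical, and it checks only the kernel version, not the OS distribution or the driver type.

```shell
# Hypothetical helper: succeeds if the running kernel (or the version passed
# as $2) is at least the required version $1, using GNU version sort.
kernel_at_least() {
  required="$1"
  current="${2:-$(uname -r)}"
  # The lower of the two versions sorts first. The check passes if that is
  # the required version, so equal versions also pass.
  lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
}

# Example, based on the minimum kernel version for SMC-R listed above:
#   kernel_at_least 5.10.134-17 && echo "kernel meets the SMC-R requirement"
```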

Scenario 1: Use eRDMA to accelerate NCCL on GPU-accelerated instances

  1. When you install ACK eRDMA Controller as described in Step 1: Install ACK eRDMA Controller, set the preferDriver parameter to ofed to accelerate the NVIDIA Collective Communications Library (NCCL).

  2. Add GPU-accelerated nodes to a node pool. For more information, see Create a node pool.

  3. Install the eRDMA-related packages when you build an application container image.


    # Debian or Ubuntu: replace {OS|ubuntu} and {Version|focal} with the OS name and release codename of your base image.
    wget -qO - https://mirrors.aliyun.com/erdma/GPGKEY | apt-key add - && \
      echo "deb [ arch=amd64 ] https://mirrors.aliyun.com/erdma/apt/{OS|ubuntu} {Version|focal}/erdma main" | tee /etc/apt/sources.list.d/erdma.list && \
      apt update && \
      apt install -y libibverbs1 ibverbs-providers ibverbs-utils librdmacm1
    
    # Alibaba Cloud Linux or Red Hat Enterprise Linux (RHEL): add the eRDMA repository to the /etc/yum.repos.d directory.
    cat > /etc/yum.repos.d/erdma.repo <<EOF
    [erdma]
    name = ERDMA Repository
    baseurl = http://mirrors.aliyun.com/erdma/yum/redhat/7/erdma/x86_64/
    gpgcheck = 0
    enabled = 1
    EOF
    yum install --disablerepo='*' --enablerepo=erdma -y libibverbs ibverbs-providers ibverbs-utils librdmacm
  4. Run a GPU application that uses eRDMA in the cluster. The following StatefulSet uses nccl-tests as an example.


    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: nccltest
    spec:
      selector:
        matchLabels:
          app: nccltest
      serviceName: "nccltest"
      replicas: 2
      template:
        metadata:
          labels:
            app: nccltest
        spec:
          hostNetwork: true 
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - env:
            - name: NCCL_SOCKET_IFNAME
              value: "eth0"
            - name: NCCL_DEBUG
              value: "INFO"
            - name: NCCL_IB_GID_INDEX
              value: "1"
            image: <nccl-test-image-with-erdma>
            imagePullPolicy: Always
            name: nccltest
            securityContext:
              privileged: true
            resources:
              limits:
                nvidia.com/gpu: "8"
                aliyun/erdma: "1"
              requests:
                nvidia.com/gpu: "8"
                aliyun/erdma: "1"
  5. Verify that eRDMA is used by NCCL.

    Check the application logs for the communication type and the network devices that NCCL uses. For example, if the logs show that NCCL uses the erdma_0 and erdma_1 eRDMA devices, NCCL communication is accelerated by eRDMA.
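Both checks can be scripted. The following sketch is hypothetical. erdma_devices filters the output of ibv_devices, which is provided by the ibverbs-utils package installed in step 3, for device names that start with erdma_, assuming the device name is in the first column. nccl_erdma_devices extracts erdma_N device names from NCCL INFO log lines (NCCL_DEBUG is set to INFO in the template above). It relies only on the erdma_0 and erdma_1 naming mentioned above and makes no other assumptions about the NCCL log format.

```shell
# Hypothetical helpers for Scenario 1. Both read text from stdin.

# Prints the eRDMA device names listed by ibv_devices (first column).
# Example, inside the container: ibv_devices | erdma_devices
erdma_devices() {
  awk '$1 ~ /^erdma_[0-9]+$/ { print $1 }'
}

# Prints the distinct eRDMA device names that appear in NCCL INFO log lines.
# Example: kubectl logs nccltest-0 | nccl_erdma_devices
nccl_erdma_devices() {
  grep 'NCCL INFO' | grep -o 'erdma_[0-9][0-9]*' | sort -u
}
```

If erdma_devices prints nothing, the user-space eRDMA provider in the image probably cannot see the device. If nccl_erdma_devices prints nothing, NCCL is probably not using eRDMA.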

Scenario 2: Use SMC-R to accelerate application networking

  1. When you install ACK eRDMA Controller as described in Step 1: Install ACK eRDMA Controller, set the preferDriver parameter to default. SMC-R is not supported with the ofed or compat driver type.

  2. Create an application that uses SMC-R in the cluster based on the following sample Deployment:


    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: app-with-erdma
      name: app-with-erdma
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: app-with-erdma
      template:
        metadata:
          labels:
            app: app-with-erdma
          annotations:
            network.alibabacloud.com/erdma-smcr: "true"
        spec:
          containers:
          - image: <application image>
            imagePullPolicy: Always
            name: app-with-erdma
            resources:
              limits:
                aliyun/erdma: 1
  3. Check the status of network connections in the pod.

    You can install smc-tools in a container and run the smcss command to view the acceleration results.

    /# smcss
    State          UID   Inode   Local Address           Peer Address            Intf Mode 
    ACTIVE         00000 0059964 172.17.192.73:47772     172.17.192.10:80        0000 SMCR

    In the command output, SMCR is displayed in the Mode column, which indicates that eRDMA is used by the connection.
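To summarize many connections at once, the following hypothetical helper counts ACTIVE connections in smcss output based on whether one of their columns reads SMCR. It assumes the column layout shown in the sample above. Any other ACTIVE connection, such as one that fell back to TCP, is counted as not accelerated.

```shell
# Hypothetical helper: reads smcss output on stdin and prints how many ACTIVE
# connections use SMC-R and how many do not. Other states are ignored.
smc_summary() {
  awk '
    $1 == "ACTIVE" {
      smcr = 0
      for (i = 2; i <= NF; i++) if ($i == "SMCR") smcr = 1
      if (smcr) accelerated++; else other++
    }
    END { printf "SMCR=%d other=%d\n", accelerated, other }
  '
}

# Example, inside the pod: smcss | smc_summary
```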