Remote Direct Memory Access (RDMA) was developed to eliminate the latency introduced by server-side data processing during network transmission.
With RDMA, the data to be transmitted is transferred directly from the memory of one computer to the memory of another, without involving either operating system or protocol stack. Because the communication process bypasses the operating systems and protocol stacks, RDMA greatly lowers CPU usage, reduces memory copies in the kernel, and cuts down on context switches between user mode and kernel mode.
Common RDMA implementations include RDMA over Converged Ethernet (RoCE), InfiniBand, and iWARP.
Alibaba Cloud Super Computing Cluster (SCC) instances provide both a RoCE network and a Virtual Private Cloud (VPC) network, with the RoCE network dedicated to RDMA communication. SCC is mainly used for high-performance computing, artificial intelligence, machine learning, scientific computing, engineering computing, data analysis, and audio and video processing.
RoCE provides network performance comparable to InfiniBand while supporting a broader range of Ethernet-based applications.
Learn more about Alibaba Cloud ECS Bare Metal Instance and Super Computing Clusters at https://www.alibabacloud.com/help/doc-detail/60576.htm
You can purchase SCC instances on a yearly or monthly subscription basis in the Elastic Compute Service (ECS) console. For more information, visit https://www.alibabacloud.com/help/doc-detail/61978.htm
Currently, Alibaba Cloud Container Service supports RDMA. You can add SCC ECS instances to a container cluster and deploy an RDMA device plug-in to support RDMA at the scheduling level.
You can then schedule containers onto RDMA-capable ECS instances by declaring the resource limit rdma/hca: 1 in the container spec.
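For example, a container that needs one RDMA device declares the limit as follows (a minimal sketch; the pod name and image are placeholders, and the complete test templates appear later in this article):
apiVersion: v1
kind: Pod
metadata:
  name: rdma-example        # hypothetical name for illustration
spec:
  containers:
  - name: app
    image: my-rdma-app      # placeholder image
    resources:
      limits:
        rdma/hca: 1         # request one RDMA host channel adapter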
Log on to the Container Service console and create a Kubernetes cluster. Because SCC is currently available only in Shanghai, select China East 2 (Shanghai) as the region for the cluster. After setting the other parameters, click Create and wait until the cluster is created.
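Once the cluster is running, you can confirm that its nodes have joined, for example with kubectl:
kubectl get nodes -o wide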
In the Container Service console, deploy the RDMA device plug-in from a template. Select the corresponding cluster and namespace. The template is as follows.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
      "mode" : "hca"
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: rdma-device-plugin
  namespace: kube-system
spec:
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: rdma-sriov-dp-ds
    spec:
      hostNetwork: true
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: registry.cn-shanghai.aliyuncs.com/acs/rdma-device-plugin
        name: k8s-rdma-device-plugin
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: config
          mountPath: /k8s-rdma-sriov-dev-plugin
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: config
        configMap:
          name: rdma-devices
          items:
          - key: config.json
            path: config.json
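After the DaemonSet is deployed, you can check that the plug-in pods are running and that the nodes now advertise the rdma/hca resource (the commands below are illustrative):
# Check that the device plug-in pods are running on every SCC node
kubectl -n kube-system get pods -l name=rdma-sriov-dp-ds -o wide
# Confirm that the nodes report the rdma/hca resource
kubectl describe nodes | grep rdma/hca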
Next, use a template to deploy two test pods that request the rdma/hca resource:
apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000
---
apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod-1
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000
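After both pods are scheduled onto SCC nodes, you can verify that the RDMA device has been mounted into the containers (pod names match the templates above):
# The startup command lists the InfiniBand devices; view its output
kubectl logs rdma-test-pod
# Or check interactively inside the container
kubectl exec -it rdma-test-pod -- ls -l /dev/infiniband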
Run ib_read_bw -q 30 in one container to start the server side of the bandwidth test. Then run ib_read_bw -q 30 <IP address of the preceding container> in the other container to start the client side.
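For example, the test can be driven with kubectl exec, assuming the pod names from the templates above (replace <server IP> with the address reported inside rdma-test-pod):
# Server side, in the first pod
kubectl exec -it rdma-test-pod -- ib_read_bw -q 30
# Client side, in the second pod
kubectl exec -it rdma-test-pod-1 -- ib_read_bw -q 30 <server IP>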
Test results show that data can be transmitted between the two containers through RDMA. The measured bandwidth is about 5,500 MB/s, or roughly 44 Gbit/s (5,500 MB/s × 8 ≈ 44,000 Mbit/s).
Note: An RDMA connection is usually established through either TCP or RDMA_CM. If an application uses the RDMA_CM mode, the pod IP address assigned by the VPC network plug-in cannot be used as the RDMA_CM address. In that case, configure the container with the host network and use the host's bond0 IP address as the RDMA_CM communication address.
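A minimal sketch of such a host-network pod, assuming the same test image as above (the pod name is hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: rdma-cm-test-pod      # hypothetical name for illustration
spec:
  hostNetwork: true           # use the host network so RDMA_CM can bind to the bond0 address
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command: [ "sh", "-c", "sleep 1000000" ]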