Among the many cloud computing providers that have deployed RDMA (Remote Direct Memory Access) networks in their data centers, Alibaba has taken an early lead in the scale of its deployment. Dozens of its data centers currently support RDMA networking, which reduces latency by 90% and meets the requirements of scenarios such as artificial intelligence and scientific computing.
Beijing Winter Olympics Cloud Data Center at Alibaba Cloud
Alibaba Cloud products such as the high-performance cloud disk ESSD, the cloud-native database POLARDB, Super Computing Cluster (SCC), and PAI run on the RDMA network. These highly popular products have shared the benefits of the network technology advances.
RDMA is currently the most popular high-performance network technology in the industry. It can significantly reduce data transmission time and is considered key to increasing the efficiency of AI and supercomputing workloads. Statistics show that, without an RDMA network, each task iteration in speech recognition training takes between 650 ms and 700 ms, of which 400 ms is communication latency.
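To put those figures in perspective, the following is a rough back-of-the-envelope sketch rather than a measured result. It assumes that the 90% latency reduction quoted in this article applies directly to the 400 ms communication portion and that compute time stays unchanged. The article does not state either assumption, and the function names are purely illustrative.

```python
# Back-of-the-envelope estimate only. Assumptions (not stated in the article):
#   1. the quoted 90% latency reduction applies to the 400 ms communication part;
#   2. the compute part of each iteration is unaffected.

def communication_share(iteration_ms: float, comm_ms: float) -> float:
    """Fraction of one training iteration spent on communication."""
    return comm_ms / iteration_ms


def iteration_after_reduction(iteration_ms: float, comm_ms: float,
                              reduction: float) -> float:
    """Estimated iteration time after cutting communication latency."""
    compute_ms = iteration_ms - comm_ms
    return compute_ms + comm_ms * (1.0 - reduction)


COMM_MS = 400.0      # communication latency quoted in the article
REDUCTION = 0.90     # latency reduction quoted in the article

if __name__ == "__main__":
    for total_ms in (650.0, 700.0):  # iteration range quoted in the article
        share = communication_share(total_ms, COMM_MS)
        new_total = iteration_after_reduction(total_ms, COMM_MS, REDUCTION)
        print(f"{total_ms:.0f} ms iteration: {share:.1%} communication, "
              f"about {new_total:.0f} ms after reduction "
              f"({total_ms / new_total:.2f}x faster)")
```

Under these assumptions, communication accounts for roughly 57% to 62% of each iteration, which is why reducing network latency has such a large effect on overall training time.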
To improve data transmission speed and meet user needs, leading cloud providers such as Amazon and Microsoft have begun to focus on the R&D and deployment of this technology. However, few enterprises have deployed RDMA at large scale in their data centers.
In 2016, Alibaba launched a dedicated research project to rework RDMA and improve its transmission performance. Starting from the underlying network interface controller layer, Alibaba designed networks that can support large-scale deployment and combined them with its own vSwitch to maximize performance. As a result, Alibaba successfully built a high-speed network in the largest data centers in the world, eliminating transmission speed bottlenecks within clusters and reducing latency by 90%.
Take the 2018 Tmall Double 11 event as an example: the RDMA-based cloud storage and e-commerce database servers easily handled the large volume of traffic during peak business hours. SAIC Motor is adopting SCC, which runs on RDMA, for its simulation workloads and has improved overall efficiency by 25%.
"RDMA has become essential for high-performance and storage services such as AI and scientific computing. In the future, we will continue to explore network technologies that enable higher bandwidth and deploy a 100G high-speed network to provide enterprises with highly stable and low-latency network services," said Cai Dezhong, Chief Network Architect at Alibaba.
As the cloud service provider ranked first in China and among the top three in the world, Alibaba Cloud currently has 56 availability zones in 19 regions worldwide. Its total network bandwidth has reached the PB level. Alibaba Cloud is now working on the R&D of a 400G network, and the 400G QSFP-DD industry standard it put forward has been widely recognized by global enterprises.