On December 21, 2021, the annual Alibaba Cloud Elastic Computing Summit was held in Shanghai. Wang Zhikun, Alibaba Cloud Elastic Computing Product Director, gave a speech entitled "Strong, Reliable, and Ubiquitous Cloud Paves the Way for Innovation." He explained in detail the latest major products released by Alibaba Cloud Elastic Computing and the best practices of its users.
Wang Zhikun, Director of Alibaba Cloud Elastic Computing Products
Elastic computing is the earliest and most basic product of Alibaba Cloud. It has been around for 12 years and has seen rapid innovation throughout that time. After the release of the X-Dragon architecture in 2017, Alibaba Cloud announced the One Cloud, Multiple States strategy in 2020. In 2021, the fourth generation of the X-Dragon architecture was launched, introducing the eRDMA capability and enabling many customers to innovate more on the cloud.
In recent years, livestreaming, large games, and many core enterprise systems have gradually migrated to the cloud. The requirement for performance is growing. Internet enterprises' core systems need horizontal expansion of resources and different levels of performance bursts in different periods. Customers have begun to pay attention to data security and privacy in more scenarios.
Therefore, giving customers a more secure and efficient operating environment on the cloud is an important issue for Alibaba Cloud to address.
In early 2021, Alibaba Cloud released the seventh-generation ECS instance, the first in China to offer burst capability for both network performance and cloud disk performance. This lets customers handle sudden surges in I/O demand at a good cost-performance ratio. The seventh-generation ECS instance is equipped with TPM 2.0 chips and adds new trusted computing and enclave-based full encryption capabilities. With these, Alibaba Cloud has established a comprehensive, end-to-end, multi-layered security protection system.
Built on the extreme performance of the in-house ECS architecture and backed by these security capabilities, the seventh-generation ECS instance provides strong support for customers. It supplies the very high computing power behind Perfect World Entertainment's new sci-fi open-world game Tower of Fantasy, giving players a smooth game experience. Relying on the TPM- and SGX-based trust mechanism unique to Alibaba Cloud's seventh-generation instance, InsightOne has built a data intelligence federation and broken down data silos.
Data processing demands from data-intensive applications are growing with the rapid increase in data volume. In addition to computing power and security, higher requirements are placed on networks. Many online business systems use in-memory databases to cope with massive concurrency. However, the network latency between in-memory databases and other systems has become a new challenge. As the network scales up, latency has a big impact on the timeliness of the system in scenarios such as real-time big data search and recommendation engines.
In addition, scenarios such as AI deep learning and HPC industrial simulation are even more sensitive to latency. The traditional approach is to adopt Remote Direct Memory Access (RDMA) networks. RDMA has an advantage in latency, but it requires proprietary equipment, is costly, involves complex networking, and cannot be applied on a large scale.
The VPC network, which was born on the public cloud, is low-cost, flexible, and convenient and can be used for ultra-large-scale networking. However, due to the limits of the protocol stack and technology, the latency can only reach 20 to 30 microseconds. Alibaba Cloud has made a lot of efforts in product layout and R&D process as we try to balance elasticity, flexibility, low latency, cost, and other factors of the cloud-based network.
Finally, Alibaba Cloud elastic computing implemented eRDMA in the fourth-generation X-Dragon architecture and released China's first RDMA-enhanced instance, c7re, making RDMA technology broadly accessible on the cloud. Only one Elastic RDMA Interface (ERI) device needs to be loaded into the user's business system. After the device is passed through to the operating system, data can be transmitted over the VPC network, and the overall latency is reduced to five microseconds. This innovative technology allows the eRDMA network and the VPC network to form one large network on the cloud, so more resources can be pooled and used flexibly.
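To put the latency figures above in perspective, here is a minimal Python sketch. It assumes the quoted numbers (20 to 30 microseconds for the VPC network, 5 microseconds with eRDMA) can be read as the cost of one request, and it uses a hypothetical 1 ms time budget that does not come from the talk. The function name is illustrative only.

```python
def sequential_requests(budget_us: float, latency_us: float) -> int:
    """Number of back-to-back requests that fit into a time budget."""
    return int(budget_us // latency_us)


# Hypothetical budget for one user-facing operation (assumption, not from the talk).
BUDGET_US = 1000.0

# Latency figures quoted in the text, in microseconds.
VPC_LATENCY_RANGE_US = (20.0, 30.0)
ERDMA_LATENCY_US = 5.0

vpc_best = sequential_requests(BUDGET_US, VPC_LATENCY_RANGE_US[0])
vpc_worst = sequential_requests(BUDGET_US, VPC_LATENCY_RANGE_US[1])
erdma = sequential_requests(BUDGET_US, ERDMA_LATENCY_US)

print(f"VPC:   {vpc_worst} to {vpc_best} sequential requests per ms")
print(f"eRDMA: {erdma} sequential requests per ms")
```

Under these assumptions, a system that issues dependent requests one after another (for example, chained lookups against an in-memory database) could fit roughly four to six times as many of them into the same budget.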
A new business-oriented acceleration solution based on RDMA-enhanced instances has brought significant benefits to many business systems through the standard Verbs interface. For example, performance can be improved by 130% in Redis database scenarios and by 30% in AI training scenarios.
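Percentage gains like these are easier to compare as multipliers. The short Python sketch below assumes that "improved by N%" means throughput rises by N% over the baseline. The talk does not state how the figures were measured, so this reading is an assumption, and the helper name is illustrative.

```python
def speedup_factor(percent_improvement: float) -> float:
    """Convert 'improved by N%' into a throughput multiplier over the baseline."""
    return 1.0 + percent_improvement / 100.0


# Figures quoted in the text.
redis_factor = speedup_factor(130)  # Redis database scenario
ai_factor = speedup_factor(30)      # AI training scenario

for name, factor in [("Redis", redis_factor), ("AI training", ai_factor)]:
    # Time to finish a fixed amount of work, relative to the baseline.
    relative_time = 1.0 / factor
    print(f"{name}: {factor:.1f}x throughput, {relative_time:.0%} of baseline time")
```

Read this way, a 130% improvement is about 2.3 times the baseline throughput, so a fixed workload would finish in a little under half the original time.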
As the problems of computing performance, security, and network latency are solved, the bottleneck of the entire system gradually returns to the memory. Since there is a lot of system data in the memory, the bottleneck of the memory wall appears. Memory is much more expensive than hard disks. Last year, Alibaba Cloud released memory instance re6p based on persistent memory technology. This year, we upgraded the product based on the second generation persistent memory technology and released memory instance re7p and performance-enhanced local disk instance i4p.
Memory instance re7p improves the performance by 30% compared with the previous generation. It can support the ultra-large memory capacity ratio of 1:20, which improves the performance ratio of Redis and parameter servers by more than 50%. Performance-enhanced local disk instance i4p is the world's first enhanced instance based on persistent memory technology. The read/write latency is as low as 170 nanoseconds, achieving quasi-memory access performance. The performance in RocksDB and ClickHouse scenarios is improved 2-3 times.
Currently, Memvert, an enterprise that builds innovative large-memory solutions, uses persistent memory instances provided by Alibaba Cloud to deliver services in genomics, finance, and chip design, achieving larger memory, lower cost, and higher efficiency. For example, in the biological sciences field, the overall task training efficiency of single-cell gene sequencing is improved 20 times with Alibaba Cloud's persistent memory instances.
In 2017, Alibaba Cloud elastic computing introduced the GPU cloud server, becoming the first enterprise in China to do so. Today, the parameters of AI deep learning models have grown from millions to trillions. Changing user needs continue to push Alibaba Cloud to build larger and stronger cloud training clusters to meet the demand for computing power. In addition, scenarios such as the Metaverse, digital twins, and cloud games have also driven Alibaba Cloud to improve the layout of its heterogeneous computing products.
After five years of development, Alibaba Cloud's heterogeneous computing products have deployed a full family of products for AI, visual computing, and customized computing. More importantly, we have built a series of software products on top of the instances, including AI acceleration engine AIACC and deployment tools, to help customers lower the usage threshold and improve efficiency, thus making it more cost-effective in the cloud.
In early 2019, as China's first cloud service provider to build heterogeneous supercomputing instances on the cloud, Alibaba Cloud helped autonomous driving and natural language processing customers reduce overall cluster training time to the order of minutes. Today, Alibaba Cloud has upgraded again and launched GPU supercomputing instances based on 800G RDMA networks, creating the strongest computing power and network capabilities on the cloud. With the Alibaba Cloud AIACC acceleration tool, AI deep model training efficiency has been improved by up to 9.75 times, helping customers meet the challenges of massive models with trillions of parameters.
gn7i, the seventh-generation GPU instance of Alibaba Cloud, is fully commercialized. It can build efficient streaming capabilities and support the RGC protocol in scenarios such as cloud games and the metaverse. Customers can easily obtain strong computing power on the cloud because gn7i works out of the box. Compared with the previous generation of products, its cost-performance ratio is improved by 130% in AI inference scenarios. In graphics and image scenarios, the performance is more than doubled.
Red Star Macalline designed a cloud SaaS platform whose real-time ray tracing requires high computing power on the cloud to produce vivid furniture design renderings. With the strong computing power of Alibaba Cloud's seventh-generation GPU instance gn7i, the platform achieves quasi-real-time rendering. The final rendering takes as little as ten minutes and meets the demands of business flexibility.
In addition, Alibaba Cloud has released a new generation of FPGA cloud server instance f5 in the customized computing field, which is 100% more cost-effective than the previous generation. It has implemented strong IP protection through the security protection mechanism of images. Based on the FPGA instance, Snowlake Tech and Alibaba Cloud jointly released the molecular dynamics (MD) FPGA acceleration solution. Compared with other mainstream solutions in the industry, its overall performance has improved dozens of times, and the cost-performance has improved by 400%, driving the development of new drugs and new materials.
Today, Alibaba Cloud's elastic computing products have evolved from one or two products at the beginning to dozens of product families facing hundreds of business scenarios, providing customers with a more cost-effective choice.
As more customers recognize public cloud products and their capabilities, some enterprises have encountered new challenges in the process of cloud migration. For example, due to their industry attributes or business scenarios, data localization is required. This has pushed Alibaba Cloud to keep extending its public cloud products so they can serve more customers. At the same time, low-latency data processing scenarios, such as 5G and the Internet of Things, have prompted Alibaba Cloud to rethink its product layout. In the first half of 2021, the One Cloud, Multiple States strategy was released, including intelligent hosting, local region, CloudBox, and other products, to meet customers' needs for full coverage of computing power.
The intelligent hosting product, dedicated to large cloud customers, can create a new intelligent fully-hosted zone in an Availability Zone of the Alibaba Cloud public cloud based on the needs of those customers. It provides products consistent with the public cloud, including O&M capabilities and OpenAPI interfaces, helping customers solve problems in use and O&M. At the same time, customers can enjoy the ultra-high stability of the Alibaba Cloud public cloud, and the SLA of the overall service is guaranteed. This enables customers to accelerate business innovation while enjoying the public cloud.
Alibaba CloudBox is a locally deployed, exclusive public cloud product. It is deployed in the customer's own data center (IDC) and delivers mainstream Alibaba Cloud public cloud products, such as computing, storage, network, security, and database. It also provides the same experience and usage as the Alibaba Cloud public cloud. Its low barrier to deployment helps solve the problems enterprises face during cloud transformation. CloudBox was commercialized in 2021 and has opened nine products and four regions. It will open to more regions in the future.
Alibaba Cloud provides local regions to solve the problem of data asset residency and satisfy customers' data localization needs. Currently, a local region is available in Nanjing, China, and supports five product categories and more than 40 individual products.
Alibaba Cloud elastic computing is not just an instance. It has been innovating in multiple dimensions. The Alibaba Cloud Elastic Computing Team (with the help of customers and partners) has been working on the innovation of products and improving product layout to create a strong, reliable, and ubiquitous cloud for customers.