As the high-performance branch of file storage, parallel file systems have a roughly 20-year history and are widely used in high-performance computing fields such as weather prediction, oil exploration, high-energy physics, automobile manufacturing, chip manufacturing, autonomous driving, and film and television rendering. As GPU parallel computing catches on in the AI era, Alibaba Cloud CPFS has officially evolved into its 2.0 era, with a series of innovations and practices built on the technical foundation of traditional parallel file systems.
With the growing maturity of the Kubernetes container platform, the platform on which AI training runs has shifted from virtual machines and physical machines to containers on cloud platforms. Against this background, traditional parallel file systems face significant challenges.
This article explains how Alibaba Cloud CPFS responds to these challenges and describes the technical exploration and production practice of parallel file systems, starting with the lightweight changes CPFS has made on the client side.
As is well known, because general-purpose protocols (such as the NFS of the early 2000s) could not deliver the required performance, traditional parallel file systems ship with dedicated clients. The dedicated client has long been the identity badge of a high-performance parallel file system.
Dedicated clients are an important part of how parallel file systems achieve high performance: they provide MPI-IO interfaces, connections to multiple backend servers, load balancing, and standalone data caching. With the advent of the container era, however, dedicated clients have exposed many problems.
Solving these problems requires slimming down the client and adopting a lightweight NFS protocol: decouple from the operating system so that any Linux distribution can easily use CPFS and developers are unburdened, retain the high-performance advantages of a distributed file system, and implement elastic Kubernetes PVs with strict data isolation between PVs. The specific methods cover the following three aspects:
NFS is the most widely used protocol in the file storage field; its maturity, generality, and ease of use are accepted by the majority of customers. CPFS must be compatible with NFS to lower the threshold for adopting CPFS.
Traditional parallel file system clients often require a specific operating system and kernel version, and the client must be reinstalled after each kernel upgrade, which drives up operation and maintenance costs. The CPFS-NFS client, by contrast, runs in user mode and does not depend on the kernel version. This brings two benefits:
Traditional parallel file system clients require complex configuration to run well. For example, Lustre requires tuning the concurrency and block size of the network component (LNET), the metadata component (MDC), and the data component (OSC), which raises user maintenance costs. The CPFS-NFS client is simple to use: a single mount command suffices (see the sketch after the next paragraph), and the default configuration is handled by the CPFS-NFS client itself, lowering the threshold for users.
Parallel file systems usually push file system logic down to the client. For example, Lustre's OSC must know which storage servers each file stripe resides on before it can read data, which adds CPU and memory overhead on the client. The CPFS-NFS client's resource overhead is lightweight, spent only on transmitting data and necessary metadata operations; its CPU overhead is usually less than one logical core.
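To illustrate that ease of use, here is a minimal mount sketch. The mount target domain name, export path, and mount options below are illustrative placeholders rather than documented values; consult the CPFS documentation for the exact command:

```bash
# Mount a CPFS file system over NFSv3 with a single command.
# The mount target and export path are placeholders; replace them
# with the values shown in your CPFS console.
sudo mkdir -p /mnt/cpfs
sudo mount -t nfs -o vers=3,nolock,proto=tcp,noresvport \
    cpfs-xxxxxxxx.region.example.com:/share /mnt/cpfs
```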
Built on the base capabilities of CPFS parallel I/O and its fully symmetric distributed architecture, the NFS protocol delivers high-throughput, high-IOPS cluster performance far beyond what the traditional standalone NAS architecture can offer. For example, under the 200 MB/s/TiB specification, the NFS protocol provides 200 MB/s of throughput per TiB of capacity, so a 100 TiB file system can reach the maximum throughput of 20 GB/s; the maximum IOPS is close to one million.
The NFS protocol service forms a protocol cluster that scales horizontally with the CPFS file system capacity. CPFS-NFS balances load between clients and protocol nodes: when a client mounts, the best protocol node is selected for the connection based on protocol node load (including connection count, idle bandwidth, and CPU usage). This effectively prevents hot, heavy clients from crowding onto a single protocol node and degrading performance.
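The following bash sketch illustrates the flavor of that selection. It is purely illustrative: the node names and load scores are made up, and the real CPFS-NFS client gathers connection count, idle bandwidth, and CPU internally rather than from a script:

```bash
#!/usr/bin/env bash
# Illustrative least-loaded selection: pick the protocol node with the
# lowest load score. Node names and scores are hypothetical inputs.
declare -A load=( [node-a]=12 [node-b]=4 [node-c]=9 )  # lower = less loaded

best="" ; best_load=$((1<<30))
for node in "${!load[@]}"; do
  if (( load[$node] < best_load )); then
    best="$node" ; best_load=${load[$node]}
  fi
done
echo "connect via protocol node: $best"
```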
To meet the requirements of elastic Kubernetes PVs and enforce strict data isolation between PVs, CPFS supports multiple mounting capabilities, including:
Traditional parallel file system clients are stateful, and this limits client scale. For example, open files and read-write locks are kept on the client, and clients issue and recall state to one another to ensure data consistency. The larger the client fleet, the more resources these operations consume, which caps the number of clients.
The CPFS-NFS client is stateless: it holds a connection only to the storage node, and the load does not grow as the number of clients increases. CPFS-NFS supports 10,000 clients/pods mounting and accessing data simultaneously.
The CPFS-NFS client is deeply integrated with Alibaba Cloud Container Service for Kubernetes (ACK): the CSI plug-in supports mounting CPFS volumes both statically and dynamically. Please see Statically provisioned CPFS volumes and Dynamically provisioned CPFS volumes for more information.
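As a minimal, hypothetical sketch of the static case (the CSI driver name, volume handle, and server address are placeholders; the linked ACK documentation gives the exact fields):

```bash
# Illustrative only: statically provision a CPFS volume and claim it.
# Driver name, volumeHandle, and server are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cpfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteMany"]
  csi:
    driver: nasplugin.csi.alibabacloud.com   # placeholder driver name
    volumeHandle: cpfs-pv
    volumeAttributes:
      server: "cpfs-xxxxxxxx.region.example.com"   # placeholder mount target
      path: "/share"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cpfs-pvc
spec:
  storageClassName: ""          # bind directly to the PV above
  volumeName: cpfs-pv
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 100Gi
EOF
```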
Directory-level mount points provide access isolation on the client: a container mounts only its own subdirectory, which prevents container applications from directly accessing the entire file system and creating data security risks. CPFS can provide stronger directory isolation with Fileset and ACLs: Fileset will additionally support quotas in the future, letting you configure the number of files and total capacity of a directory subtree, while ACLs configure user access permissions. A sketch of both follows.
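As a minimal sketch (the mount target, subdirectory, and user name are placeholders, and POSIX ACL semantics are assumed):

```bash
# Mount only one job's subdirectory instead of the whole file system
# (placeholder mount target and paths).
sudo mkdir -p /mnt/job01
sudo mount -t nfs -o vers=3,nolock,proto=tcp,noresvport \
    cpfs-xxxxxxxx.region.example.com:/share/job01 /mnt/job01

# Then restrict access with POSIX ACLs (assumes ACL support is enabled):
# grant user "alice" read/write/execute and remove access for others.
sudo setfacl -m u:alice:rwx /mnt/job01
sudo chmod o-rwx /mnt/job01
```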
For customers whose operating system versions previously prevented them from using CPFS on the cloud, access through the standard NFS protocol enables flexible business migration to the cloud. Combined with Alibaba Cloud Container Service (ACK), it gives customers the ability to dynamically scale hundreds of pods per second, expanding quickly during busy hours and releasing quickly during idle hours, which reduces the idle cost of GPU resources.
Support for the NFS protocol is an important improvement in the capabilities of CPFS file storage: containers and virtual machines can easily access the high-performance CPFS parallel file system regardless of the Linux version, which helps accelerate the landing of autonomous driving scenarios.