Fluid is an open source, Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications in cloud-native scenarios, such as big data applications and AI applications. This topic provides an overview of Fluid and describes its features.
Features
Fluid delivers its features through two kinds of objects: datasets and runtimes. The following figure shows the features.
Fluid natively supports the dataset abstraction. This gives data-intensive applications a foundation for efficient data access and makes data management more cost-effective.
Fluid provides an extensible data engine plug-in with a unified interface for integration with third-party storage services. A variety of runtimes are supported.
Fluid automates data operations and supports multiple modes to integrate with automated O&M systems.
Fluid accelerates data access by combining data caching with elastic scaling and data affinity scheduling.
Fluid is independent of runtime platforms and supports Kubernetes clusters, Container Service for Kubernetes (ACK) Edge clusters, and ACK Serverless clusters. Fluid is also suitable for multi-cluster scenarios and hybrid cloud scenarios.
Terms
dataset: a set of logically related data that is used by computing engines. For example, Apache Spark uses datasets in big data scenarios and TensorFlow uses datasets in AI scenarios. Datasets power intelligent applications and create core value in many industries. Dataset management involves multiple aspects, including security, versions, and data acceleration.
runtime: the execution engine that implements security, version management, and data acceleration for datasets. A runtime also defines a series of lifecycle interfaces that are used to manage and accelerate datasets.
AlluxioRuntime: the execution engine of open source Alluxio. AlluxioRuntime supports dataset management and caching and accelerates access to persistent volume claims (PVCs), Ceph, and Cloud Parallel File System (CPFS). You can use AlluxioRuntime in hybrid cloud scenarios.
JuiceFSRuntime: a distributed cache acceleration engine developed based on JuiceFS. JuiceFSRuntime supports scenario-specific data caching and acceleration. For more information about JuiceFS, see Introduction to JuiceFS. For more information about how to use JuiceFS in Fluid, see Use JuiceFS in Fluid.
JindoRuntime: the execution engine of JindoFS developed by the Alibaba Cloud E-MapReduce (EMR) team. JindoRuntime is based on C++ and supports dataset management and caching. JindoRuntime also accelerates access to Object Storage Service (OSS), OSS-HDFS, and Hadoop Distributed File System (HDFS).
EFCRuntime: the runtime for the EFC elastic acceleration client developed by the File Storage NAS (NAS) technical team. EFCRuntime can accelerate access to NAS and CPFS, and supports hot updates and fault tolerance.
ThinRuntime: an extensible general-purpose runtime that allows you to access various storage systems in a low-code way. ThinRuntime reuses the data orchestration and management capabilities of Fluid and its core capabilities for integrating with runtime platforms.
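To show how a dataset and a runtime fit together, the following Python sketch builds one manifest of each kind and serializes them as JSON. It is an illustration under stated assumptions, not a reference configuration. The API version `data.fluid.io/v1alpha1` and the field names `mounts`, `mountPoint`, `replicas`, and `tieredstore` are assumed to match the Fluid version that you use. The bucket path, object name, replica count, and cache quota are hypothetical. Access credentials and endpoint options that a real OSS mount requires are omitted.

```python
import json

# Assumption: both custom resources live in this API group and version.
API_VERSION = "data.fluid.io/v1alpha1"


def make_dataset(name: str, mount_point: str) -> dict:
    """Build a Dataset manifest that points at one underlying storage location."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def make_runtime(kind: str, name: str, replicas: int, cache_quota: str) -> dict:
    """Build a runtime manifest with a single in-memory cache tier.

    The runtime uses the same name as the dataset so that the two can be
    paired (assumed pairing convention).
    """
    return {
        "apiVersion": API_VERSION,
        "kind": kind,
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "tieredstore": {
                "levels": [
                    {"mediumtype": "MEM", "path": "/dev/shm", "quota": cache_quota}
                ]
            },
        },
    }


# Hypothetical values, used only for illustration.
dataset = make_dataset("demo-dataset", "oss://example-bucket/training-data")
runtime = make_runtime("JindoRuntime", "demo-dataset", replicas=2, cache_quota="2Gi")

# Wrap both objects in a List so that they can be written to a single file.
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": [dataset, runtime]}, indent=2
)
print(manifest_json)
```

To target a different engine, pass another runtime kind from the list above, such as `AlluxioRuntime`. The cache-related fields may differ between runtimes, so check them against the documentation of the engine that you choose.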
The distributed cache acceleration engines AlluxioRuntime and JuiceFSRuntime in ack-fluid are free open source components provided by third-party open source communities or enterprises. You can choose to install the corresponding server and client components to use the distributed cache acceleration services.
However, Alibaba Cloud is not responsible for the stability, service limits, and security compliance of third-party components. You shall pay close attention to the official websites of the third-party open source communities or enterprises and updates on code hosting platforms, and read and comply with the open source licenses. You are liable for any potential risks related to application development, maintenance, troubleshooting, and security due to the use of third-party components.
Feature | Alluxio | JuiceFS | Jindo | EFC |
Underlying storage | PVC, Ceph, HDFS, CPFS, Network File System (NFS), and OSS | JuiceFS | OSS, OSS-HDFS, and PVC | NAS and CPFS |
Supported by | Open source projects | Open source projects | Alibaba Cloud services | Alibaba Cloud services |
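The comparison table can also be read as a lookup from a storage type to the runtimes that list it as underlying storage. The following Python sketch transcribes only the table rows above. The names `UNDERLYING_STORAGE`, `SUPPORTED_BY`, and `runtimes_for` are hypothetical helpers and are not part of Fluid.

```python
from typing import Dict, List, Set

# Underlying storage per runtime, transcribed from the comparison table.
UNDERLYING_STORAGE: Dict[str, Set[str]] = {
    "Alluxio": {"PVC", "Ceph", "HDFS", "CPFS", "NFS", "OSS"},
    "JuiceFS": {"JuiceFS"},
    "Jindo": {"OSS", "OSS-HDFS", "PVC"},
    "EFC": {"NAS", "CPFS"},
}

# Who supports each runtime, transcribed from the same table.
SUPPORTED_BY: Dict[str, str] = {
    "Alluxio": "Open source projects",
    "JuiceFS": "Open source projects",
    "Jindo": "Alibaba Cloud services",
    "EFC": "Alibaba Cloud services",
}


def runtimes_for(storage: str) -> List[str]:
    """Return the runtimes whose table row lists the given storage type."""
    return sorted(
        name for name, backends in UNDERLYING_STORAGE.items() if storage in backends
    )


for storage in ("OSS", "CPFS", "NAS"):
    candidates = runtimes_for(storage)
    details = ", ".join(f"{name} ({SUPPORTED_BY[name]})" for name in candidates)
    print(f"{storage}: {details}")
```

A storage type that the table does not list, such as a hypothetical `"S3"`, returns an empty list. This means only that the table does not cover it, not that no runtime can access it.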