Container Service for Kubernetes: Overview of Fluid

Last Updated: Nov 01, 2024

Fluid is an open source, Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications in cloud-native scenarios, such as big data and AI applications. This topic provides an overview of Fluid and describes its features.

Features

Fluid implements its features through two kinds of objects: datasets and runtimes. The following figure shows these features.

[Figure: Fluid features]
  • Fluid natively supports the dataset abstraction. This abstraction gives data-intensive applications a foundation for efficient data access and makes data management more cost-effective in multiple respects.

  • Fluid provides an extensible data engine plug-in mechanism with a unified interface for integrating third-party storage services, and supports a variety of runtimes.

  • Fluid automates data operations and provides multiple modes of integration with automated O&M systems.

  • Fluid accelerates data access by combining data caching with elastic scaling and data affinity scheduling.

  • Fluid is independent of the platform it runs on. It supports standard Kubernetes clusters, Container Service for Kubernetes (ACK) Edge clusters, and ACK Serverless clusters, and is also suitable for multi-cluster and hybrid cloud scenarios.
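To make the idea of data affinity scheduling more concrete, the following toy Python model ranks candidate nodes for an application pod by how much of the dataset is already cached on each node. It is a simplified illustration under stated assumptions, not Fluid's actual scheduling logic: the `Node` type, its fields, the function name, and the node names are all hypothetical, and a real scheduler weighs many more factors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    # Hypothetical node model used only for this illustration.
    name: str
    cached_gib: float  # amount of the dataset cached on this node, in GiB
    schedulable: bool  # whether the node can currently accept the pod


def rank_by_cache_affinity(nodes: List[Node]) -> List[str]:
    """Return schedulable node names, preferring nodes that hold more cached data.

    Toy model: reading from a local cache avoids a remote read from the
    underlying storage, so nodes with more cached data are ranked first.
    """
    eligible = [n for n in nodes if n.schedulable]
    ranked = sorted(eligible, key=lambda n: n.cached_gib, reverse=True)
    return [n.name for n in ranked]


if __name__ == "__main__":
    nodes = [
        Node("node-a", cached_gib=0.0, schedulable=True),
        Node("node-b", cached_gib=40.0, schedulable=True),
        Node("node-c", cached_gib=80.0, schedulable=False),
    ]
    print(rank_by_cache_affinity(nodes))  # ['node-b', 'node-a']
```

In this toy model, node-c holds the most cached data but is excluded because it cannot accept the pod: cache affinity acts as a preference among nodes that are already eligible.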

Terms

  • dataset: a set of logically related data that is used by computing engines. For example, Apache Spark uses datasets in big data scenarios and TensorFlow uses datasets in AI scenarios. Datasets power intelligent applications and are core assets in many industries. Dataset management covers multiple aspects, including security, version management, and data acceleration.

  • runtime: the execution engine that implements security, version management, and data acceleration for datasets. A runtime also defines a set of lifecycle interfaces that are used to manage and accelerate datasets.

  • AlluxioRuntime: the runtime based on the open source Alluxio engine. AlluxioRuntime supports dataset management and caching, and accelerates access to persistent volume claims (PVCs), Ceph, and Cloud Parallel File System (CPFS). AlluxioRuntime is suitable for hybrid cloud scenarios.

  • JuiceFSRuntime: a distributed cache acceleration engine based on JuiceFS. JuiceFSRuntime supports scenario-specific data caching and acceleration. For more information about JuiceFS, see Introduction to JuiceFS. For more information about how to use JuiceFS in Fluid, see Use JuiceFS in Fluid.

  • JindoRuntime: the runtime based on the JindoFS engine, which is developed by the Alibaba Cloud E-MapReduce (EMR) team. JindoRuntime is implemented in C++ and supports dataset management and caching. It also accelerates access to Object Storage Service (OSS), OSS-HDFS, and Hadoop Distributed File System (HDFS).

  • EFCRuntime: the runtime for the EFC elastic acceleration client developed by the File Storage NAS (NAS) technical team. EFCRuntime can accelerate access to NAS and CPFS, and supports hot updates and fault tolerance.

  • ThinRuntime: an extensible, general-purpose runtime that lets you access various storage systems with little code. ThinRuntime reuses the data orchestration and management capabilities and other core capabilities of Fluid to integrate with runtime platforms.
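The pairing of a dataset with a runtime can be sketched as Kubernetes manifests. The following Python sketch builds a `Dataset` and a `JindoRuntime` as plain dictionaries and checks the pairing rule that a runtime serves the dataset with the same name in the same namespace. It assumes the `data.fluid.io/v1alpha1` API group; the helper function names, bucket path, object names, and cache size are hypothetical, and fields that a real OSS mount needs (such as the endpoint and access credentials) are deliberately omitted.

```python
import json

# Assumed Fluid API group/version for Dataset and runtime objects.
API_VERSION = "data.fluid.io/v1alpha1"


def make_dataset(name: str, namespace: str, mount_point: str) -> dict:
    """Build a Dataset manifest that points at one underlying storage location."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name, "namespace": namespace},
        # Endpoint and credential options for the mount are omitted in this sketch.
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def make_runtime(kind: str, name: str, namespace: str, replicas: int,
                 cache_path: str, quota: str) -> dict:
    """Build a runtime manifest (for example, JindoRuntime) with a single memory cache tier."""
    return {
        "apiVersion": API_VERSION,
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,  # number of cache workers
            "tieredstore": {
                "levels": [{"mediumtype": "MEM", "path": cache_path, "quota": quota}]
            },
        },
    }


def is_bound(dataset: dict, runtime: dict) -> bool:
    """A Dataset and a runtime are paired when they share a name and a namespace."""
    ds_meta, rt_meta = dataset["metadata"], runtime["metadata"]
    return ds_meta["name"] == rt_meta["name"] and ds_meta["namespace"] == rt_meta["namespace"]


if __name__ == "__main__":
    ds = make_dataset("demo", "default", "oss://example-bucket/train-data/")
    rt = make_runtime("JindoRuntime", "demo", "default", 2, "/dev/shm", "2Gi")
    print(is_bound(ds, rt))  # True
    for manifest in (ds, rt):
        print(json.dumps(manifest, indent=2))
```

Each printed JSON document could be saved to its own file and applied with `kubectl apply -f`, which accepts JSON as well as YAML manifests.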

Important

The distributed cache acceleration engines AlluxioRuntime and JuiceFSRuntime in ack-fluid are free open source components provided by third-party open source communities or enterprises. You can choose to install the corresponding server and client components to use the distributed cache acceleration services.

However, Alibaba Cloud is not responsible for the stability, service limits, and security compliance of third-party components. You shall pay close attention to the official websites of the third-party open source communities or enterprises and updates on code hosting platforms, and read and comply with the open source licenses. You are liable for any potential risks related to application development, maintenance, troubleshooting, and security due to the use of third-party components.

Feature            | Alluxio                             | JuiceFS              | Jindo                  | EFC
Underlying storage | PVC, Ceph, HDFS, CPFS, NFS, and OSS | JuiceFS              | OSS, OSS-HDFS, and PVC | NAS and CPFS
Supported by       | Open source projects                | Open source projects | Alibaba Cloud services | Alibaba Cloud services
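The comparison table can also be encoded as a lookup when you shortlist runtimes for a given storage backend. The following Python sketch copies the table's contents into dictionaries. The function name `candidate_runtimes` is hypothetical, and the mapping covers only what the table lists, so ThinRuntime and any backend that is not in the table are omitted.

```python
from typing import Dict, List, Optional, Set

# Underlying storage supported by each runtime, as listed in the comparison table.
RUNTIME_STORAGE: Dict[str, Set[str]] = {
    "AlluxioRuntime": {"PVC", "Ceph", "HDFS", "CPFS", "NFS", "OSS"},
    "JuiceFSRuntime": {"JuiceFS"},
    "JindoRuntime": {"OSS", "OSS-HDFS", "PVC"},
    "EFCRuntime": {"NAS", "CPFS"},
}

# Provider category of each runtime, also taken from the table.
RUNTIME_PROVIDER: Dict[str, str] = {
    "AlluxioRuntime": "Open source projects",
    "JuiceFSRuntime": "Open source projects",
    "JindoRuntime": "Alibaba Cloud services",
    "EFCRuntime": "Alibaba Cloud services",
}


def candidate_runtimes(storage: str, provider: Optional[str] = None) -> List[str]:
    """Return runtimes whose listed underlying storage includes `storage`.

    If `provider` is given, keep only runtimes in that provider category.
    """
    return sorted(
        name
        for name, backends in RUNTIME_STORAGE.items()
        if storage in backends
        and (provider is None or RUNTIME_PROVIDER[name] == provider)
    )


if __name__ == "__main__":
    print(candidate_runtimes("CPFS"))  # ['AlluxioRuntime', 'EFCRuntime']
    print(candidate_runtimes("OSS", provider="Alibaba Cloud services"))  # ['JindoRuntime']
```

A match in this lookup only means that the table lists the backend for that runtime; the support model described in the Important note still applies to the open source runtimes.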