SmartData is a core component of E-MapReduce (EMR) that is developed in-house. It centrally optimizes storage, caching, and computing for the EMR computing engines and extends storage features. SmartData is used in data access, data governance, and data security scenarios.
The following figure shows the position of SmartData in EMR.
SmartData consists of the following components:
- JindoFS core subsystem: provides caching and cache-based acceleration features for various remote storage systems. For more information, see Overview and usage of JindoFS.
- JindoTable core subsystem: provides table- and partition-level optimization and governance for data sources, such as a Hive warehouse. For more information, see Use JindoTable.
- JindoManager: provides a web UI to manage JindoFS and JindoTable services and features. For example, you can view cache metrics for files and tables.
- JindoSDK: provides a unified SDK for the open source computing engines of EMR. It supports the Java, C, C++, and Python programming languages and provides a variety of access interfaces and APIs, such as Hadoop Compatible File System (HCFS) interfaces, Portable Operating System Interface (POSIX) interfaces, and table-related interfaces.
- Toolset: includes Jindo tools and the data copy tool Jindo DistCp.
- Connectors: include the Hadoop, Flink, and TensorFlow connectors. Kite SDK, Apache Beam, Flume, Sqoop, and Kafka are also supported.
The data sources that are supported by JindoFS and JindoTable include Alibaba Cloud OSS, Apache Hadoop HDFS, Hive, and Alibaba Cloud MaxCompute.
SmartData is independently developed and released. For more information about SmartData versions, see Overview.
For more information about how to use SmartData, see the following topics: