SmartData is a storage service for the E-MapReduce (EMR) Jindo engine. It provides centralized storage, caching, and computing optimization for EMR computing engines, along with other features. SmartData consists of JindoFS, JindoTable, and related tools. This topic describes the updates in SmartData 3.6.X.
JindoFS
The following table describes the updates to JindoFS.
Feature | Description |
---|---|
Cross-cloud access to Amazon Simple Storage Service (S3) and other services that use the S3 protocol, and cache-based acceleration for access to these services | JindoFS supports the S3 protocol and can access Amazon S3 and other systems that use the S3 protocol. You can use the cache-based acceleration feature to accelerate access to these systems. |
Cache-based acceleration for access to HDFS | You can use the cache-based acceleration feature to accelerate access to HDFS. |
Transactional loading options MetaSync and Data Cache | Two transactional loading options, MetaSync and Data Cache, are added to JindoFS. They make preloading tasks transactional and ensure that intermediate states are not exposed in the metadata view while data is being loaded. |
Optimization of the cache preloading mechanism | |
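The transactional guarantee described for MetaSync and Data Cache can be pictured as "stage, then commit": a preloading task builds its results out of sight and publishes them in one step, so readers see either the old view or the complete new one. The following Python sketch is purely conceptual. `CacheNamespace`, `preload`, and `fetch` are hypothetical names, not JindoFS APIs, and the sketch does not describe how JindoFS implements these options.

```python
from typing import Callable, Dict, Iterable


class CacheNamespace:
    """Hypothetical cache namespace illustrating all-or-nothing preloading."""

    def __init__(self) -> None:
        # The view that readers see. It is replaced wholesale, never mutated in place.
        self._visible: Dict[str, bytes] = {}

    def view(self) -> Dict[str, bytes]:
        # Return a copy so callers cannot modify the published view.
        return dict(self._visible)

    def preload(self, paths: Iterable[str], fetch: Callable[[str], bytes]) -> None:
        # Stage all entries first; readers keep seeing the old view meanwhile.
        staged = dict(self._visible)
        for path in paths:
            staged[path] = fetch(path)  # an exception here aborts the whole task
        # Commit: a single reference swap publishes every staged entry at once.
        self._visible = staged


def flaky_fetch(path: str) -> bytes:
    # Simulated data source that fails on one specific path.
    if path.endswith("bad"):
        raise OSError("read failed: " + path)
    return path.encode()


ns = CacheNamespace()
ns.preload(["oss://bucket/a", "oss://bucket/b"], flaky_fetch)
try:
    # This task fails midway, so none of its entries become visible.
    ns.preload(["oss://bucket/c", "oss://bucket/bad"], flaky_fetch)
except OSError:
    pass
print(sorted(ns.view()))
```

Because the failed task never reaches the commit step, `oss://bucket/c` is not visible even though it was fetched successfully.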
JindoSDK
The following table describes the updates to JindoSDK.
Feature | Description |
---|---|
Local caching | JindoSDK allows you to configure a local caching policy, which enables local data caching even if the SmartData service is not deployed. This speeds up access to Object Storage Service (OSS) data. |
Object Store API at the same level as the FileSystem API | The Object Store API is added to JindoSDK alongside the existing FileSystem API. It is better suited to object storage systems such as OSS, simplifies access to these systems, and makes copy and rename operations more efficient. |
Optimization of OSS server-side caching | An OSS access accelerator is provided. After you start the accelerator, you can customize the bandwidth for accessing OSS data based on the capacity of the accelerator. |
OSS second-level domain | Access to OSS by using a second-level domain is supported. After you enable this feature, you can use a second-level domain or an IP address to access OSS in some special environments. This feature is disabled by default. |
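To illustrate the difference in addressing, the sketch below builds an OSS object URL in two styles. It assumes that "second-level domain" access means the bucket name goes in the URL path rather than in the host name, which is what allows a bare endpoint or an IP address to be used as the host. The function `oss_object_url` and its parameters are hypothetical and are not JindoSDK configuration options.

```python
from urllib.parse import quote


def oss_object_url(endpoint: str, bucket: str, key: str,
                   second_level_domain: bool = False,
                   scheme: str = "https") -> str:
    """Build an OSS object URL in one of two addressing styles (illustrative only)."""
    # Percent-encode the object key but keep "/" separators intact.
    encoded_key = quote(key, safe="/")
    if second_level_domain:
        # Assumed second-level domain style: the bucket is part of the path,
        # so the host can be the bare endpoint or an IP address.
        return f"{scheme}://{endpoint}/{bucket}/{encoded_key}"
    # Default style: the bucket is a prefix of the host name.
    return f"{scheme}://{bucket}.{endpoint}/{encoded_key}"


default_url = oss_object_url("oss-cn-hangzhou.aliyuncs.com", "examplebucket", "logs/2021/a.txt")
ip_url = oss_object_url("10.0.0.1", "examplebucket", "logs/a.txt",
                        second_level_domain=True, scheme="http")
print(default_url)  # https://examplebucket.oss-cn-hangzhou.aliyuncs.com/logs/2021/a.txt
print(ip_url)       # http://10.0.0.1/examplebucket/logs/a.txt
```

The second form is why this mode helps in environments where bucket-prefixed host names cannot be resolved.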
JindoTable
The following table describes the updates to JindoTable.
Feature | Description |
---|---|
Tiered storage and archiving of HDFS data in OSS | You can run the MoveTo command to migrate data in multiple tables or partitions to OSS in a single operation. The command also updates the metadata automatically. You can specify filter conditions to select partitions and configure storage policies for the data that is migrated to OSS. You can also archive large volumes of data that are already stored in OSS in a single operation. |
Unarchiving and retrieval of data in OSS | You can run the unarchiveTable command to unarchive and retrieve a large volume of data that is archived in OSS. |
Faster Presto queries on Parquet files | JindoTable provides a native engine that accelerates Presto queries on files in the Parquet format. |
Faster Spark 3 queries on Parquet and ORC files | JindoTable provides a native engine that accelerates Spark 3 queries on files in the Parquet or ORC format. |
Faster Spark and Presto queries on HDFS data | JindoTable provides a native engine that accelerates Spark and Presto queries on data stored in HDFS. |
Analysis of OSS access logs | You can execute SQL statements to analyze OSS access logs. |
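The MoveTo row mentions filter conditions that select which partitions to move. As a rough illustration of that kind of selection (not of MoveTo's actual syntax), the Python sketch below picks Hive-style partitions whose `dt` value is older than a cutoff date. The function `select_partitions` and the `dt` partition key are hypothetical examples.

```python
from datetime import date
from typing import List


def select_partitions(partitions: List[str], before: date) -> List[str]:
    """Return Hive-style partition specs whose dt value is earlier than `before`."""
    selected = []
    for spec in partitions:
        # Parse "key=value/key=value" into a dict, e.g. {"dt": "2021-01-01", "region": "cn"}.
        values = dict(part.split("=", 1) for part in spec.split("/"))
        dt = values.get("dt")
        # Partitions without a dt key are never selected.
        if dt is not None and date.fromisoformat(dt) < before:
            selected.append(spec)
    return selected


candidates = ["dt=2021-01-01/region=cn", "dt=2021-03-15/region=cn", "region=us"]
cold = select_partitions(candidates, before=date(2021, 2, 1))
print(cold)  # ['dt=2021-01-01/region=cn']
```

Selecting cold partitions by age is a common tiering policy; the real filter conditions that MoveTo accepts should be taken from its own documentation.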
JindoFuse
The following table describes the updates to JindoFuse.
Feature | Description |
---|---|
Training and online mounting scenarios | You can mount a specific JindoFS namespace or an OSS directory as a file system. In SDK mode, you can specify an OSS directory as the mount source. |