
E-MapReduce:Use JindoFS in E-MapReduce 3.22.0 or later

Last Updated:Jan 20, 2026

JindoFS is a cloud-native file system that combines OSS and local storage to serve as the next-generation storage system for E-MapReduce. It provides efficient and reliable storage for computing services. This topic describes how to configure and use JindoFS and covers common use cases.

Overview

JindoFS provides the block storage mode and the cache mode.

JindoFS uses a heterogeneous multi-backup mechanism that leverages both local storage and OSS. The Storage Service uses OSS as the storage backend to ensure high data reliability and uses local storage for redundant backups to accelerate data reads. In addition, the local Namespace Service manages JindoFS metadata, providing metadata operation performance comparable to that of HDFS.

Note
  • E-MapReduce 3.20.0 and later support JindoFS. To use JindoFS, you must select the JindoFS service when you create a cluster.
  • This topic describes how to use JindoFS in E-MapReduce 3.22.0 or later. For information about how to use JindoFS in E-MapReduce versions from 3.20.0 to 3.22.0 (exclusive), see Use SmartData in EMR 3.20.0 to 3.22.0.

Prepare the environment

  • Create a cluster

    Select E-MapReduce 3.22.0 or a later version, and select SmartData in the Optional Services section. For more information, see Create a cluster.

  • Configure the cluster
    The JindoFS file system provided by SmartData uses OSS as its storage backend, so you must configure OSS-related parameters before you use JindoFS. You can configure the parameters in two ways: modify the Bigboot parameters after the cluster is created and then restart the SmartData service for the changes to take effect, or add custom configurations when you create the cluster so that the related services start with your custom parameters.
    • Initialize parameters after the cluster is created

      All JindoFS-related configurations are located in the Bigboot component.

      1. On the Service Configuration page, click the bigboot tab.

      2. Click Custom Configuration.
      Note
      • The configuration items in the red boxes are required.
      • JindoFS supports multiple namespaces. This topic uses a namespace named test as an example.
      Parameter | Description | Example
      jfs.namespaces | The namespaces that JindoFS supports. Separate multiple namespaces with commas (,). | test
      jfs.namespaces.test.uri | The storage backend of the test namespace. You can also set this parameter to a specific directory in an OSS bucket. The namespace then uses this directory as its root directory to read and write data. | oss://oss-bucket/oss-dir
      jfs.namespaces.test.mode | The storage mode of the test namespace. JindoFS supports the block and cache storage modes. | block
      jfs.namespaces.test.oss.access.key | The AccessKey ID for the OSS storage backend. | xxxx
      jfs.namespaces.test.oss.access.secret | The AccessKey secret for the OSS storage backend. |
      Note For better performance and stability, use an OSS bucket in the same region and under the same account as the cluster. The E-MapReduce cluster can then access OSS without a password, and you do not need to configure an AccessKey ID or secret.

      After you complete the configuration, save and deploy it. Then, restart all components in the SmartData service. You can then start using JindoFS.

    • Add custom configurations when you create a cluster
      E-MapReduce clusters allow you to add custom configurations during creation. For example, to enable passwordless access to OSS in the same region, select Custom Software Settings as shown in the following figure. Configure the parameters for the test namespace as follows:
      [
        {
          "ServiceName": "BIGBOOT",
          "FileName": "bigboot",
          "ConfigKey": "jfs.namespaces",
          "ConfigValue": "test"
        },
        {
          "ServiceName": "BIGBOOT",
          "FileName": "bigboot",
          "ConfigKey": "jfs.namespaces.test.uri",
          "ConfigValue": "oss://oss-bucket/oss-dir"
        },
        {
          "ServiceName": "BIGBOOT",
          "FileName": "bigboot",
          "ConfigKey": "jfs.namespaces.test.mode",
          "ConfigValue": "block"
        }
      ]

Use JindoFS

Using JindoFS is similar to using HDFS. JindoFS provides the jfs:// scheme; to use it, replace the hdfs:// prefix in a path with jfs://.

JindoFS supports most computing components on EMR clusters, such as Hadoop, Hive, Spark, Flink, Presto, and Impala.

Examples:

  • Shell commands
    hadoop fs -ls jfs://your-namespace/ 
    hadoop fs -mkdir jfs://your-namespace/test-dir
    hadoop fs -put test.log jfs://your-namespace/test-dir/
    hadoop fs -get jfs://your-namespace/test-dir/test.log ./
  • MapReduce job
    hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar teragen -Dmapred.map.tasks=1000 10737418240 jfs://your-namespace/terasort/input
    hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar terasort -Dmapred.reduce.tasks=1000 jfs://your-namespace/terasort/input jfs://your-namespace/terasort/output
  • Spark SQL
    CREATE EXTERNAL TABLE IF NOT EXISTS src_jfs (key INT, value STRING) location 'jfs://your-namespace/Spark_sql_test/';

Control disk space usage

JindoFS uses OSS as the data storage backend, which lets you store large volumes of data. However, the capacity of local disks is limited, so JindoFS automatically evicts cold data from local disks. By default, JindoFS uses the total storage capacity of all data disks. Two parameters, whose values range from 0 to 1 and represent a percentage of disk space, control local disk usage:

  • node.data-dirs.watermark.high.ratio specifies the upper limit of space usage on each disk. When the space used by JindoFS on a disk reaches this limit, less frequently accessed data on that disk is released.
  • node.data-dirs.watermark.low.ratio specifies the lower limit of space usage on each disk. After the space usage of a disk reaches the upper limit, less frequently accessed data is released until the space usage drops to this lower limit.

You can configure the upper and lower limits to adjust how much disk space is assigned to JindoFS. Make sure that the upper limit is greater than the lower limit.
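To make the two watermark parameters concrete, the following sketch computes the eviction thresholds for a hypothetical 500 GiB data disk with node.data-dirs.watermark.high.ratio set to 0.4 and node.data-dirs.watermark.low.ratio set to 0.2. The disk size and ratios are illustrative values, not defaults.

```shell
# Hypothetical numbers for illustration only: a 500 GiB data disk with
# high.ratio = 0.4 and low.ratio = 0.2 (expressed as percentages for shell math).
disk_gib=500
high_pct=40
low_pct=20

high_gib=$((disk_gib * high_pct / 100))   # eviction starts once usage exceeds this
low_gib=$((disk_gib * low_pct / 100))     # eviction frees space down to this

echo "Eviction starts above ${high_gib} GiB and frees space down to ${low_gib} GiB"
```

With these example values, cold blocks start to be released once JindoFS uses more than 200 GiB on the disk, and eviction continues until usage falls to 100 GiB.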

Storage policies

JindoFS provides a Storage Policy feature that offers flexible policies for different storage needs. You can set one of the following four storage policies for a directory:

Policy | Description
COLD | Data has only one backup in OSS and no local backups. This policy is suitable for cold data storage.
WARM | The default policy. Data has one backup in OSS and one local backup. The local backup can effectively accelerate subsequent reads.
HOT | Data has one backup in OSS and multiple local backups. This provides further acceleration for the hottest data.
TEMP | Data has only one local backup. This provides high-performance reads and writes for temporary data but reduces data reliability. It is suitable for storing and accessing temporary data.

JindoFS provides an Admin tool to set the storage policy for a directory. The default policy is WARM. New files inherit the storage policy of their parent directory. The command is used as follows:

jindo dfsadmin -R -setStoragePolicy [path] [policy]

Use the following command to retrieve the storage policy of a directory:

jindo dfsadmin -getStoragePolicy [path]
Note [path] specifies the path for which you want to set the policy. -R recursively sets the policy for all subdirectories within the specified path.
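As a concrete illustration of the syntax above, the following hedged sketch wraps the setStoragePolicy command in a small helper that rejects unknown policy names before printing the command it would run. The namespace and path are hypothetical examples, not values from the documentation.

```shell
# Hypothetical helper, for illustration only: build the setStoragePolicy
# command for a directory, rejecting policy names JindoFS does not support.
set_policy_cmd() {
  case "$2" in
    COLD|WARM|HOT|TEMP)
      echo "jindo dfsadmin -R -setStoragePolicy $1 $2" ;;
    *)
      echo "unknown policy: $2" >&2
      return 1 ;;
  esac
}

# Example: mark a closed-out archive directory (hypothetical path) as cold data.
set_policy_cmd "jfs://your-namespace/archive/2018" COLD
```

Validating the policy name first avoids issuing a malformed admin command against the cluster.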

Admin tool

JindoFS provides the archive and jindo commands in the Admin tool.
  • The Admin tool provides the archive command to archive cold data.

    This command allows users to explicitly evict local data blocks. For example, Hive partitions tables by day. If business data in partitions that are more than a week old is no longer frequently accessed, you can periodically run the archive command on those partition directories. This evicts the local backups. The file backups are retained only on the backend OSS.

    The archive command is used as follows:

    jindo dfsadmin -archive [path]
    Note [path] is the path of the directory where the files to be archived are located.
  • The Admin tool provides the jindo command, which offers several administrator commands for the Namespace Service.

    jindo dfsadmin [-options]
    Note Run the jindo dfsadmin --help command to view help information.
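The periodic-eviction pattern for day-partitioned Hive tables described above can be sketched as a small script that generates archive commands for partitions older than a cutoff date. The namespace, partition layout (dt=YYYY-MM-DD directories), and dates are hypothetical, and the script only prints the commands it would run.

```shell
# Hypothetical layout: day partitions under a warehouse directory in JindoFS.
base="jfs://your-namespace/warehouse/logs"
cutoff="2026-01-13"   # e.g. seven days before 2026-01-20

archived=""
for dt in 2026-01-05 2026-01-12 2026-01-19; do   # in practice, list real partitions
  if [[ "$dt" < "$cutoff" ]]; then               # ISO dates sort lexicographically
    echo "jindo dfsadmin -archive ${base}/dt=${dt}"
    archived="${archived}${dt} "
  fi
done
```

Running such a script on a schedule (for example, from a daily cron job) keeps only recent partitions cached locally while older partitions remain available from OSS.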
The Admin tool provides the diff and sync commands for Cache mode.
  • The diff command is used to show the differences between local data and data on the backend storage system.
    jindo dfsadmin -R -diff [path]
    Note By default, the command compares the metadata differences in the subdirectories of the specified [path]. The -R option recursively compares all paths within the specified [path].
  • The sync command is used to synchronize metadata between local storage and the backend storage.
    jindo dfsadmin -R -sync [path]
    Note [path] is the path for which you want to synchronize metadata. By default, only the metadata of the immediate subdirectories of [path] is synchronized. The -R option recursively synchronizes all paths within the specified [path].