
Object Storage Service:Quick start for connecting to the OSS-HDFS service from a non-EMR cluster

Last Updated:Jan 19, 2026

The OSS-HDFS service, also known as the JindoFS service, is fully compatible with Hadoop Distributed File System (HDFS) interfaces and supports directory-level operations. The Jindo software development kit (SDK) allows Apache Hadoop computing and analytics applications, such as MapReduce, Hive, Spark, and Flink, to access the OSS-HDFS service. This topic describes how to deploy the JindoSDK on an ECS instance and perform common operations to get started with the OSS-HDFS service.

Note

If you use an Alibaba Cloud EMR cluster, connect to the OSS-HDFS service from the EMR cluster. For more information, see Quick start for connecting to the OSS-HDFS service from an EMR cluster.

Prerequisites

Procedure

  1. Connect to the ECS instance. For more information, see Connect to an ECS instance.

  2. Download the JindoSDK installation package. For the download link, see GitHub.

  3. Run the following command to decompress the JindoSDK installation package.

    The following example shows how to decompress jindosdk-x.x.x-linux.tar.gz. If you use a different version of JindoSDK, replace the package name with the actual one.

    tar zxvf jindosdk-x.x.x-linux.tar.gz
    Note

    x.x.x indicates the version number of the JindoSDK installation package.

  4. Configure environment variables.

    1. Configure JINDOSDK_HOME.

      In this example, the package is decompressed to the /usr/lib/jindosdk-x.x.x-linux directory:

      export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
    2. Configure HADOOP_CLASSPATH.

      export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
      Important

      Deploy the installation directory and set the environment variables on all required nodes.
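    Exported variables apply only to the current shell session. To make them survive new sessions on each node, you can append them to a profile file. The following is a minimal sketch, assuming the example install directory /usr/lib/jindosdk-x.x.x-linux and a Bash login shell; adjust the directory and the profile file (for example, /etc/profile to cover all users) to your deployment:

    ```shell
    # Example install directory from the steps above; replace x.x.x with your version.
    export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
    export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*"

    # Persist the variables for future sessions (use /etc/profile to cover all users).
    PROFILE_FILE=~/.bashrc
    echo "export JINDOSDK_HOME=${JINDOSDK_HOME}" >> "${PROFILE_FILE}"
    echo 'export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*"' >> "${PROFILE_FILE}"
    ```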

  5. Configure the OSS-HDFS service implementation classes and AccessKey pair.

    1. Run the following command to open the core-site.xml configuration file of Hadoop.

      vim /usr/local/hadoop/etc/hadoop/core-site.xml
    2. In the core-site.xml file, configure the JindoSDK implementation classes for the OSS-HDFS service (also known as the DLS service).

      <configuration>
          <property>
              <name>fs.AbstractFileSystem.oss.impl</name>
              <value>com.aliyun.jindodata.oss.JindoOSS</value>
          </property>
      
          <property>
              <name>fs.oss.impl</name>
              <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
          </property>
      </configuration>
    3. In the core-site.xml file, configure the AccessKey pair of your Alibaba Cloud account or a RAM user that has the required permissions.

      For more information about the permissions required for a RAM user in this scenario, see Grant a RAM user the permissions to connect to the OSS-HDFS service from a non-EMR cluster.

      <configuration>
          <property>
              <name>fs.oss.accessKeyId</name>
              <value>xxx</value>
          </property>
      
          <property>
              <name>fs.oss.accessKeySecret</name>
              <value>xxx</value>
          </property>
      </configuration>
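    Because the AccessKey pair is stored in plaintext in core-site.xml, you may want to restrict who can read the file. The following is a minimal sketch, assuming the Hadoop configuration path used above; adjust the path and ownership to your deployment:

    ```shell
    # Assumed configuration path from the steps above; adjust to your deployment.
    CORE_SITE=/usr/local/hadoop/etc/hadoop/core-site.xml

    # Make the file readable and writable only by its owner.
    chmod 600 "${CORE_SITE}"
    ls -l "${CORE_SITE}"
    ```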
  6. Configure the OSS-HDFS service Endpoint.

    When you access a bucket over the OSS-HDFS service, you must specify the Endpoint in the access path. Use the following path format: oss://<Bucket>.<Endpoint>/<Object>. For example, oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt. The JindoSDK uses the Endpoint in the access path to call the corresponding OSS-HDFS service API operations.
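    After the implementation classes, AccessKey pair, and Endpoint are in place, you can verify connectivity with a simple listing. The bucket name and region below are the sample values used throughout this topic; replace them with your own:

    ```shell
    # List the root directory of the bucket through the OSS-HDFS service.
    hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
    ```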

  7. Use HDFS Shell commands to perform common operations on the OSS-HDFS service.

    • Upload a file

      The following example shows how to upload the examplefile.txt file from the local root directory to examplebucket:

      hdfs dfs -put examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
    • Download a file

      The following example shows how to download the exampleobject.txt file from examplebucket to the local /tmp directory:

      hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt /tmp/

    For more information about other operations, see Access the OSS-HDFS service using Hadoop shell commands.
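    Beyond uploads and downloads, the standard HDFS Shell verbs work the same way against an OSS-HDFS path. The following is a short sketch using the same sample bucket; the directory and file names are illustrative:

    ```shell
    # Create a directory.
    hdfs dfs -mkdir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/
    # List the contents of the bucket root.
    hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
    # Print the contents of an object.
    hdfs dfs -cat oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt
    # Delete a directory and its contents.
    hdfs dfs -rm -r oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/
    ```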

Appendix 1: Performance tuning

To tune performance, you can add the following configuration items to the core-site.xml file of Hadoop. These configuration items are supported only in JindoSDK 4.0 and later.

<configuration>
    <property>
        <!-- The directories in which the client stores temporary files. Separate multiple directories with commas. In a multi-user environment, grant read and write permissions on these directories. -->
        <name>fs.oss.tmp.data.dirs</name>
        <value>/tmp/</value>
    </property>

    <property>
        <!-- The number of retries after a failed attempt to access OSS. -->
        <name>fs.oss.retry.count</name>
        <value>5</value>
    </property>

    <property>
        <!-- The timeout period of OSS requests, in milliseconds. -->
        <name>fs.oss.timeout.millisecond</name>
        <value>30000</value>
    </property>

    <property>
        <!-- The timeout period for connecting to OSS, in milliseconds. -->
        <name>fs.oss.connection.timeout.millisecond</name>
        <value>3000</value>
    </property>

    <property>
        <!-- The number of concurrent threads used to upload a single file to OSS. -->
        <name>fs.oss.upload.thread.concurrency</name>
        <value>5</value>
    </property>

    <property>
        <!-- The size of the queue for concurrent upload tasks to OSS. -->
        <name>fs.oss.upload.queue.size</name>
        <value>5</value>
    </property>

    <property>
        <!-- The maximum number of pending upload tasks allowed per stream. -->
        <name>fs.oss.upload.max.pending.tasks.per.stream</name>
        <value>16</value>
    </property>

    <property>
        <!-- The size of the queue for concurrent download tasks from OSS. -->
        <name>fs.oss.download.queue.size</name>
        <value>5</value>
    </property>

    <property>
        <!-- The number of concurrent download threads within a process. -->
        <name>fs.oss.download.thread.concurrency</name>
        <value>16</value>
    </property>

    <property>
        <!-- The size of each buffer used to prefetch (read ahead) data from OSS, in bytes. -->
        <name>fs.oss.read.readahead.buffer.size</name>
        <value>1048576</value>
    </property>

    <property>
        <!-- The number of buffers used to concurrently prefetch data from OSS. -->
        <name>fs.oss.read.readahead.buffer.count</name>
        <value>4</value>
    </property>
</configuration>