OSS-HDFS (JindoFS) is fully compatible with Hadoop Distributed File System (HDFS) API operations and supports directory-level operations. JindoSDK allows Apache Hadoop-based computing and analysis applications, such as MapReduce, Hive, Spark, and Flink, to access OSS-HDFS. This topic describes how to deploy JindoSDK on an Elastic Compute Service (ECS) instance and perform common operations related to OSS-HDFS.
If you use an Alibaba Cloud E-MapReduce (EMR) cluster, connect the EMR cluster to OSS-HDFS by using the methods described in Connect EMR clusters to OSS-HDFS.
Prerequisites
An Alibaba Cloud account or a RAM user that has the required permissions is created. By default, an Alibaba Cloud account has the permissions to connect non-EMR clusters to OSS-HDFS and perform common operations related to OSS-HDFS. If you want to use a RAM user to connect non-EMR clusters to OSS-HDFS, the RAM user must be granted the required permissions. For more information, see Grant a RAM user permissions to connect non-EMR clusters to OSS-HDFS.
An ECS instance is created. For more information, see Create an instance.
A Hadoop environment is created. For more information about how to install Hadoop, see Step 2: Create a Hadoop runtime environment.
OSS-HDFS is enabled for a bucket and permissions are granted to the RAM role to access OSS-HDFS. For more information, see Enable OSS-HDFS and grant access permissions.
Video tutorial
The following video provides an example on how to connect non-EMR clusters to OSS-HDFS and perform common operations.
Procedure
Connect to an ECS instance. For more information, see Connect to an instance.
Download and decompress the JindoSDK JAR package. To download JindoSDK, visit GitHub.
Run the following command to decompress the JindoSDK JAR package. The following example decompresses a package named jindosdk-x.x.x-linux.tar.gz. If you use another version of JindoSDK, replace the package name with the name of the corresponding JindoSDK JAR package.
tar zxvf jindosdk-x.x.x-linux.tar.gz
Note: x.x.x indicates the version number of the JindoSDK JAR package.
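For example, the following commands show one possible way to decompress the package and move it to the /usr/lib directory that is used as JINDOSDK_HOME later in this topic. The target directory is only an example; adjust the version number and path to match your environment.
# Decompress the downloaded JindoSDK package.
tar zxvf jindosdk-x.x.x-linux.tar.gz
# Optionally move the decompressed directory to /usr/lib so that it matches
# the JINDOSDK_HOME value used in the next step. The path is an example.
sudo mv jindosdk-x.x.x-linux /usr/lib/jindosdk-x.x.x-linux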
Configure environment variables.
Configure JINDOSDK_HOME.
The following sample code provides an example on how to configure JINDOSDK_HOME if the package is decompressed to the /usr/lib/jindosdk-x.x.x-linux directory:
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
Configure HADOOP_CLASSPATH.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
Important: Specify the installation directory of the package and configure the environment variables on all required nodes.
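The export commands above take effect only in the current shell session. One way to make them persistent, assuming Hadoop is installed in /usr/local/hadoop as in the core-site.xml example below, is to append them to hadoop-env.sh. The paths below are examples; adjust them to your actual installation directories.
# Append the JindoSDK environment variables to hadoop-env.sh so that they are
# loaded by Hadoop commands and daemons in new sessions.
cat <<'EOF' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*
EOF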
Configure the implementation class of OSS-HDFS and specify the AccessKey pair that you want to use to access the bucket.
Run the following command to open the Hadoop configuration file named core-site.xml:
vim /usr/local/hadoop/etc/hadoop/core-site.xml
Configure the JindoSDK implementation classes of OSS-HDFS in the core-site.xml file.
<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
</configuration>
In the core-site.xml file, configure the AccessKey pair of the Alibaba Cloud account or the RAM user that has the required permissions.
For more information about the permissions that a RAM user must have in this scenario, see Grant a RAM user permissions to connect non-EMR clusters to OSS-HDFS.
<configuration>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>xxx</value>
    </property>
</configuration>
Specify the endpoint of OSS-HDFS.
You must specify the endpoint of OSS-HDFS if you want to use OSS-HDFS to access OSS buckets. We recommend that you configure the access path in the following format: oss://<Bucket>.<Endpoint>/<Object>. Example: oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt. After you configure the access path, JindoSDK calls the corresponding OSS-HDFS operation based on the endpoint that is specified in the access path.
You can also configure the endpoint of OSS-HDFS by using other methods. The endpoints that are configured by using different methods have different priorities. For more information, see Appendix 1: Other methods used to configure the endpoint of OSS-HDFS.
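For example, assuming that OSS-HDFS is enabled for a bucket named examplebucket in the China (Hangzhou) region, as in the examples in this topic, the following command can be used to verify that the configuration works:
# List the root directory of the bucket through OSS-HDFS to verify the
# JindoSDK configuration. Replace the bucket name and endpoint with your own.
hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/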
Run HDFS Shell commands to perform common operations that are related to OSS-HDFS.
Upload local files
Run the following command to upload a file named examplefile.txt from the local root directory to a bucket named examplebucket:
hdfs dfs -put examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
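The following commands show related upload operations, such as creating a directory in the bucket first. These are standard hdfs dfs commands; the directory name dir/ is only an example.
# Create a directory named dir/ in the bucket.
hdfs dfs -mkdir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/
# Upload the local file into that directory, overwriting an existing object if one exists.
hdfs dfs -put -f examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/dir/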
Download objects
Run the following command to download an object named exampleobject.txt from a bucket named examplebucket to a local directory named /tmp:
hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt /tmp/
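Other common operations can be performed with the same access-path format. The following standard hdfs dfs commands are examples with placeholder object names:
# List objects and directories in the bucket.
hdfs dfs -ls oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/
# View the content of an object.
hdfs dfs -cat oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt
# Delete an object.
hdfs dfs -rm oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt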
For more information, see Use Hadoop Shell commands to access OSS-HDFS.