The OSS-HDFS service, also known as the JindoFS service, is fully compatible with Hadoop Distributed File System (HDFS) interfaces and supports directory-level operations. The Jindo software development kit (SDK) allows Apache Hadoop computing and analytics applications, such as MapReduce, Hive, Spark, and Flink, to access the OSS-HDFS service. This topic describes how to deploy the JindoSDK on an ECS instance and perform common operations to get started with the OSS-HDFS service.
If you use an Alibaba Cloud EMR cluster, connect to the OSS-HDFS service from the EMR cluster. For more information, see Quick start for connecting to the OSS-HDFS service from an EMR cluster.
Prerequisites
By default, an Alibaba Cloud account has the permissions to connect to the OSS-HDFS service from a non-EMR cluster and perform common operations. If you want to use a Resource Access Management (RAM) user, the RAM user must have the required permissions. For more information, see Grant a RAM user the permissions to connect to the OSS-HDFS service from a non-EMR cluster.
A deployment environment is prepared. For example, if your deployment environment is Alibaba Cloud ECS, you must purchase an ECS instance.
A Hadoop environment is created. For more information, see Create a Hadoop runtime environment.
The OSS-HDFS service is enabled for a bucket, and access to the bucket is authorized. For more information, see Enable the OSS-HDFS service.
Procedure
Connect to the ECS instance. For more information, see Connect to an ECS instance.
Download and decompress the JindoSDK JAR package. For the download link, see GitHub.
Run the following command to decompress the JindoSDK JAR package. The following example shows how to decompress jindosdk-x.x.x-linux.tar.gz. If you use a different version of JindoSDK, replace the JAR package name with the actual name.

tar zxvf jindosdk-x.x.x-linux.tar.gz

Note: x.x.x indicates the version number of the JindoSDK JAR package.
Configure environment variables.
Configure JINDOSDK_HOME. In this example, the package is decompressed to the /usr/lib/jindosdk-x.x.x-linux directory:

export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux

Configure HADOOP_CLASSPATH:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*

Important: Deploy the installation directory and set the environment variables on all required nodes.
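The two exports above can be combined into a short snippet that also prints the resulting classpath, which is a quick way to verify the variables on each node. The installation path is the example path used in this topic; replace x.x.x with your actual version.

```shell
# Example installation path from this topic; replace x.x.x with your version.
export JINDOSDK_HOME=/usr/lib/jindosdk-x.x.x-linux

# Append the JindoSDK JARs to the Hadoop classpath.
# The wildcard is kept literal in the variable; Hadoop expands it at startup.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${JINDOSDK_HOME}/lib/*

# Quick check: the classpath should end with the JindoSDK lib directory.
echo "$HADOOP_CLASSPATH"
```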
Configure the OSS-HDFS service implementation classes and AccessKey pair.
Run the following command to open the core-site.xml configuration file of Hadoop:

vim /usr/local/hadoop/etc/hadoop/core-site.xml

In the core-site.xml file, configure the JindoSDK OSS-HDFS (DLS) implementation classes:

<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
</configuration>

In the core-site.xml file, configure the AccessKey pair of your Alibaba Cloud account or of a RAM user that has the required permissions. For more information about the permissions required for a RAM user in this scenario, see Grant a RAM user the permissions to connect to the OSS-HDFS service from a non-EMR cluster.

<configuration>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>xxx</value>
    </property>
</configuration>
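Putting the fragments above together, a complete core-site.xml for this setup might look like the following sketch. The AccessKey values are placeholders, and Hadoop expects all property entries inside a single configuration root element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- JindoSDK implementation classes for the oss:// scheme -->
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
    <!-- AccessKey pair of your account or an authorized RAM user (placeholders) -->
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>xxx</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>xxx</value>
    </property>
</configuration>
```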
Configure the OSS-HDFS service endpoint.
You must configure an endpoint to access an OSS bucket. Use the following path format: oss://<Bucket>.<Endpoint>/<Object>. For example, oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt. JindoSDK uses the endpoint in the access path to call the corresponding OSS-HDFS service API.
Use HDFS Shell commands to perform common operations on the OSS-HDFS service.
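As a sketch, the access path can be assembled from its parts. The bucket, endpoint, and object names below are the example values used in this topic; substitute your own.

```shell
# Example values from this topic; substitute your own bucket, region, and object.
BUCKET="examplebucket"
ENDPOINT="cn-hangzhou.oss-dls.aliyuncs.com"   # region-specific OSS-HDFS endpoint
OBJECT="exampleobject.txt"

# Access path format: oss://<Bucket>.<Endpoint>/<Object>
ACCESS_PATH="oss://${BUCKET}.${ENDPOINT}/${OBJECT}"
echo "$ACCESS_PATH"
# → oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt
```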
Upload a file
The following example shows how to upload the examplefile.txt file from the local root directory to examplebucket:
hdfs dfs -put examplefile.txt oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/

Download a file
The following example shows how to download the exampleobject.txt file from examplebucket to the local /tmp directory:
hdfs dfs -get oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/exampleobject.txt /tmp/
For more information about other operations, see Access the OSS-HDFS service using Hadoop shell commands.