This topic describes how to use open source HDFS clients to access LindormDFS.
Prerequisites
Java Development Kit (JDK) 1.7 or later is installed.
The IP address of your client is added to the whitelist of your Lindorm instance. For more information, see Configure whitelists.
Usage notes
If your client is deployed on an Elastic Compute Service (ECS) instance, the ECS instance and the Lindorm instance must meet the following requirements to ensure network connectivity:
The ECS instance and the Lindorm instance are deployed in the same region. We recommend that you also deploy the two instances in the same zone to reduce network latency.
The ECS instance and the Lindorm instance belong to the same virtual private cloud (VPC).
Download the client
You can download the Apache Hadoop 2.7.3 package hadoop-2.7.3.tar.gz from the Apache Hadoop official site.
Configure Apache Hadoop
Run the following command to decompress the downloaded SDK package:
tar -zxvf hadoop-2.7.3.tar.gz
Run the following command to configure environment variables:
export HADOOP_HOME=/${Hadoop installation directory}/hadoop-2.7.3
Run the following command to go to the Hadoop installation directory:
cd $HADOOP_HOME
Add the JAVA_HOME variable to the hadoop-env.sh file in the etc/hadoop/ directory. In this example, Java is installed in the /opt/install/java directory.
# set to the root of your Java installation
export JAVA_HOME=/opt/install/java
Modify the etc/hadoop/hdfs-site.xml file. The following sample shows how to modify the hdfs-site.xml file. You must replace ${Instance ID} in the file with the ID of your Lindorm instance.
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>${Instance ID}</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.${Instance ID}</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.${Instance ID}</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.${Instance ID}.nn1</name>
        <value>${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.${Instance ID}.nn2</name>
        <value>${Instance ID}-master2-001.lindorm.rds.aliyuncs.com:8020</value>
    </property>
</configuration>
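If you manage multiple environments, you can generate the hdfs-site.xml content from the instance ID instead of editing the file by hand. The following Python sketch reproduces the property set shown above; the instance ID ld-xxxx is a placeholder, not a real instance:

```python
# Sketch: generate hdfs-site.xml content for a Lindorm instance ID.
# The instance ID used below is a placeholder.

PROPERTY = """    <property>
        <name>{name}</name>
        <value>{value}</value>
    </property>"""

def hdfs_site_xml(instance_id: str) -> str:
    """Return hdfs-site.xml content for the given Lindorm instance ID."""
    props = [
        ("dfs.nameservices", instance_id),
        (f"dfs.client.failover.proxy.provider.{instance_id}",
         "org.apache.hadoop.hdfs.server.namenode.ha."
         "ConfiguredFailoverProxyProvider"),
        ("dfs.ha.automatic-failover.enabled", "true"),
        (f"dfs.ha.namenodes.{instance_id}", "nn1,nn2"),
        (f"dfs.namenode.rpc-address.{instance_id}.nn1",
         f"{instance_id}-master1-001.lindorm.rds.aliyuncs.com:8020"),
        (f"dfs.namenode.rpc-address.{instance_id}.nn2",
         f"{instance_id}-master2-001.lindorm.rds.aliyuncs.com:8020"),
    ]
    body = "\n".join(PROPERTY.format(name=n, value=v) for n, v in props)
    return f"<configuration>\n{body}\n</configuration>\n"

print(hdfs_site_xml("ld-xxxx"))
```

Write the output to $HADOOP_HOME/etc/hadoop/hdfs-site.xml, or compare it against the configuration file generated by the console.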
You can also use the configuration file that is automatically generated by the system. For more information, see Activate LindormDFS.
The preceding example shows how to configure Apache Hadoop for a single Lindorm instance. The ${Instance ID} field is the ID of that instance. To connect to multiple Lindorm instances, add one copy of the <property> elements in the example to the <configuration> element for each instance, and replace the instance ID in each copy with the ID of the corresponding instance.
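The multi-instance layout described above can be sketched the same way: each instance contributes its own copy of the per-instance properties, all inside one <configuration> element. Note that in standard HDFS client configuration, dfs.nameservices takes a comma-separated list of nameservice IDs rather than duplicate entries; check the configuration file generated by the console for the exact layout. The instance IDs below are placeholders:

```python
# Sketch: build one hdfs-site.xml that covers several Lindorm instances.
# Instance IDs here are placeholders, not real instances.

def properties_for(instance_id):
    """Per-instance (name, value) property pairs."""
    host = f"{instance_id}-master{{n}}-001.lindorm.rds.aliyuncs.com:8020"
    return [
        (f"dfs.client.failover.proxy.provider.{instance_id}",
         "org.apache.hadoop.hdfs.server.namenode.ha."
         "ConfiguredFailoverProxyProvider"),
        (f"dfs.ha.namenodes.{instance_id}", "nn1,nn2"),
        (f"dfs.namenode.rpc-address.{instance_id}.nn1", host.format(n=1)),
        (f"dfs.namenode.rpc-address.{instance_id}.nn2", host.format(n=2)),
    ]

def multi_instance_config(instance_ids):
    # dfs.nameservices lists every instance ID once; the per-instance
    # properties are then appended for each ID.
    props = [("dfs.nameservices", ",".join(instance_ids)),
             ("dfs.ha.automatic-failover.enabled", "true")]
    for iid in instance_ids:
        props += properties_for(iid)
    rows = "\n".join(
        f"    <property>\n        <name>{n}</name>\n"
        f"        <value>{v}</value>\n    </property>"
        for n, v in props)
    return f"<configuration>\n{rows}\n</configuration>"

print(multi_instance_config(["ld-aaaa", "ld-bbbb"]))
```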
Examples of common operations
Upload a local file.
Create a directory.
$HADOOP_HOME/bin/hadoop fs -mkdir hdfs://${Instance ID}/test
Prepare a file and upload the file to the created directory in LindormDFS.
echo "test" > test.log
$HADOOP_HOME/bin/hadoop fs -put test.log hdfs://${Instance ID}/test
View the uploaded file.
$HADOOP_HOME/bin/hadoop fs -ls hdfs://${Instance ID}/test
Download the file to your local computer.
$HADOOP_HOME/bin/hadoop fs -get hdfs://${Instance ID}/test/test.log
Note: You must replace ${Instance ID} in the preceding commands with the ID of your Lindorm instance.
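The preceding operations can also be scripted. The following Python sketch builds the hdfs:// URIs from the instance ID and shells out to the hadoop binary. The instance ID is a placeholder, and run_example only works with HADOOP_HOME set and a reachable Lindorm instance; it is defined but not invoked here:

```python
import os
import subprocess

def hdfs_uri(instance_id: str, path: str) -> str:
    """Build an hdfs:// URI for a path on the given Lindorm instance."""
    return f"hdfs://{instance_id}/{path.lstrip('/')}"

def hadoop_fs(*args: str) -> None:
    """Run a 'hadoop fs' subcommand via $HADOOP_HOME/bin/hadoop."""
    hadoop = os.path.join(os.environ["HADOOP_HOME"], "bin", "hadoop")
    subprocess.run([hadoop, "fs", *args], check=True)

def run_example(instance_id: str) -> None:
    # Mirrors the commands above; requires HADOOP_HOME and a
    # reachable Lindorm instance, so it is not called here.
    hadoop_fs("-mkdir", hdfs_uri(instance_id, "/test"))
    hadoop_fs("-put", "test.log", hdfs_uri(instance_id, "/test"))
    hadoop_fs("-ls", hdfs_uri(instance_id, "/test"))
    hadoop_fs("-get", hdfs_uri(instance_id, "/test/test.log"))
```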