This topic describes how to use open source HDFS clients to access LindormDFS.
Prerequisites
Java Development Kit (JDK) 1.7 or later is installed.
The IP address of your client is added to the whitelist of your Lindorm instance. For more information, see Configure whitelists.
Usage notes
If your client is deployed on an Elastic Compute Service (ECS) instance, the ECS instance and the Lindorm instance must meet the following requirements to ensure network connectivity:
The ECS instance and the Lindorm instance are deployed in the same region. We recommend that you also deploy the two instances in the same zone to reduce network latency.
The ECS instance and the Lindorm instance belong to the same virtual private cloud (VPC).
Download the client
You can download the Apache Hadoop 2.7.3 package hadoop-2.7.3.tar.gz from the Apache Hadoop official site.
Configure Apache Hadoop
Run the following command to decompress the downloaded SDK package:
tar -zxvf hadoop-2.7.3.tar.gz
Run the following command to configure environment variables:
export HADOOP_HOME=/${Hadoop installation directory}/hadoop-2.7.3
Run the following command to go to the Hadoop installation directory:
cd $HADOOP_HOME
Add the JAVA_HOME variable to the hadoop-env.sh file in the etc/hadoop/ directory. In this example, Java is installed in the /opt/install/java directory.
# set to the root of your Java installation
export JAVA_HOME=/opt/install/java
Modify the etc/hadoop/hdfs-site.xml file. The following sample shows how to modify the hdfs-site.xml file. You must replace ${Instance ID} in the file with the ID of your Lindorm instance.
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>${Instance ID}</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.${Instance ID}</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.${Instance ID}</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.${Instance ID}.nn1</name>
        <value>${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.${Instance ID}.nn2</name>
        <value>${Instance ID}-master2-001.lindorm.rds.aliyuncs.com:8020</value>
    </property>
</configuration>
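If you manage multiple environments, you can generate the hdfs-site.xml content from the instance ID instead of editing the file by hand. The following Python sketch reproduces the property set shown above; the instance ID ld-xxxx is a placeholder, not a real instance:

```python
# Sketch: generate hdfs-site.xml content for a Lindorm instance ID.
# The instance ID used below is a placeholder.

PROPERTY = """    <property>
        <name>{name}</name>
        <value>{value}</value>
    </property>"""

def hdfs_site_xml(instance_id: str) -> str:
    """Return hdfs-site.xml content for the given Lindorm instance ID."""
    props = [
        ("dfs.nameservices", instance_id),
        (f"dfs.client.failover.proxy.provider.{instance_id}",
         "org.apache.hadoop.hdfs.server.namenode.ha."
         "ConfiguredFailoverProxyProvider"),
        ("dfs.ha.automatic-failover.enabled", "true"),
        (f"dfs.ha.namenodes.{instance_id}", "nn1,nn2"),
        (f"dfs.namenode.rpc-address.{instance_id}.nn1",
         f"{instance_id}-master1-001.lindorm.rds.aliyuncs.com:8020"),
        (f"dfs.namenode.rpc-address.{instance_id}.nn2",
         f"{instance_id}-master2-001.lindorm.rds.aliyuncs.com:8020"),
    ]
    body = "\n".join(PROPERTY.format(name=n, value=v) for n, v in props)
    return f"<configuration>\n{body}\n</configuration>\n"

print(hdfs_site_xml("ld-xxxx"))
```

Write the output to $HADOOP_HOME/etc/hadoop/hdfs-site.xml, or compare it against the configuration file generated by the console.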
You can also use the configuration file that is automatically generated by the system. For more information, see Activate LindormDFS.
The preceding example shows how to configure Apache Hadoop for a single Lindorm instance. The ${Instance ID} field is the ID of that instance. To connect to multiple Lindorm instances, add one copy of the <property> elements in the example to the <configuration> element for each instance, and replace the instance ID in each copy with the ID of the corresponding instance.
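The multi-instance layout described above can be sketched the same way: each instance contributes its own copy of the per-instance properties, all inside one <configuration> element. Note that in standard HDFS client configuration, dfs.nameservices takes a comma-separated list of nameservice IDs rather than duplicate entries; check the configuration file generated by the console for the exact layout. The instance IDs below are placeholders:

```python
# Sketch: build one hdfs-site.xml that covers several Lindorm instances.
# Instance IDs here are placeholders, not real instances.

def properties_for(instance_id):
    """Per-instance (name, value) property pairs."""
    host = f"{instance_id}-master{{n}}-001.lindorm.rds.aliyuncs.com:8020"
    return [
        (f"dfs.client.failover.proxy.provider.{instance_id}",
         "org.apache.hadoop.hdfs.server.namenode.ha."
         "ConfiguredFailoverProxyProvider"),
        (f"dfs.ha.namenodes.{instance_id}", "nn1,nn2"),
        (f"dfs.namenode.rpc-address.{instance_id}.nn1", host.format(n=1)),
        (f"dfs.namenode.rpc-address.{instance_id}.nn2", host.format(n=2)),
    ]

def multi_instance_config(instance_ids):
    # dfs.nameservices lists every instance ID once; the per-instance
    # properties are then appended for each ID.
    props = [("dfs.nameservices", ",".join(instance_ids)),
             ("dfs.ha.automatic-failover.enabled", "true")]
    for iid in instance_ids:
        props += properties_for(iid)
    rows = "\n".join(
        f"    <property>\n        <name>{n}</name>\n"
        f"        <value>{v}</value>\n    </property>"
        for n, v in props)
    return f"<configuration>\n{rows}\n</configuration>"

print(multi_instance_config(["ld-aaaa", "ld-bbbb"]))
```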
Examples of common operations
Upload a local file.
Create a directory.
$HADOOP_HOME/bin/hadoop fs -mkdir hdfs://${Instance ID}/test
Prepare a file and upload the file to the created directory in LindormDFS.
echo "test" > test.log
$HADOOP_HOME/bin/hadoop fs -put test.log hdfs://${Instance ID}/test
View the uploaded file.
$HADOOP_HOME/bin/hadoop fs -ls hdfs://${Instance ID}/test
Download the file to your local computer.
$HADOOP_HOME/bin/hadoop fs -get hdfs://${Instance ID}/test/test.log
Note: You must replace ${Instance ID} in the preceding commands with the ID of your Lindorm instance.
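The preceding operations can also be scripted. The following Python sketch builds the hdfs:// URIs from the instance ID and shells out to the hadoop binary. The instance ID is a placeholder, and run_example only works with HADOOP_HOME set and a reachable Lindorm instance; it is defined but not invoked here:

```python
import os
import subprocess

def hdfs_uri(instance_id: str, path: str) -> str:
    """Build an hdfs:// URI for a path on the given Lindorm instance."""
    return f"hdfs://{instance_id}/{path.lstrip('/')}"

def hadoop_fs(*args: str) -> None:
    """Run a 'hadoop fs' subcommand via $HADOOP_HOME/bin/hadoop."""
    hadoop = os.path.join(os.environ["HADOOP_HOME"], "bin", "hadoop")
    subprocess.run([hadoop, "fs", *args], check=True)

def run_example(instance_id: str) -> None:
    # Mirrors the commands above; requires HADOOP_HOME and a
    # reachable Lindorm instance, so it is not called here.
    hadoop_fs("-mkdir", hdfs_uri(instance_id, "/test"))
    hadoop_fs("-put", "test.log", hdfs_uri(instance_id, "/test"))
    hadoop_fs("-ls", hdfs_uri(instance_id, "/test"))
    hadoop_fs("-get", hdfs_uri(instance_id, "/test/test.log"))
```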