Hortonworks Data Platform (HDP) is a big data platform released by Hortonworks that consists of open source components such as Hadoop, Hive, and HBase. HDP 3.0.1 includes Hadoop 3.1.1, which supports Object Storage Service (OSS). However, earlier versions of HDP do not support OSS. This topic uses HDP 2.6.1.0 as an example to describe how to configure HDP 2.6 to read and write OSS data.
Prerequisites
An HDP 2.6.1.0 cluster is created. If you do not have an HDP 2.6.1.0 cluster, you can use one of the following methods to create one:
- Use Ambari to create an HDP 2.6.1.0 cluster.
- If Ambari is not available, you can manually create an HDP 2.6.1.0 cluster.
Procedure
- Download the HDP 2.6.1.0 package that supports OSS.
- Run the following command to decompress the downloaded package:
sudo tar -xvf hadoop-oss-hdp-2.6.1.0-129.tar
Sample success response:
hadoop-oss-hdp-2.6.1.0-129/
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ram-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-core-3.4.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ecs-4.2.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-sts-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-sdk-oss-3.4.1.jar
hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar
- Move the JAR packages to the required directories. Note: In this topic, all content enclosed by ${} indicates an environment variable. Modify the environment variables based on your actual environment.
- Move the hadoop-aliyun-2.7.3.2.6.1.0-129.jar package to the ${/usr/hdp/current}/hadoop-client/ directory. Run the following command to check whether the package is in place:
sudo ls -lh /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar
Sample success response:
-rw-r--r-- 1 root root 64K Oct 28 20:56 /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar
- Move the other JAR packages to the ${/usr/hdp/current}/hadoop-client/lib/ directory. Run the following command to check whether the packages are in place:
sudo ls -ltrh /usr/hdp/current/hadoop-client/lib
Sample success response:
total 27M
......
drwxr-xr-x 2 root root 4.0K Oct 28 20:10 ranger-hdfs-plugin-impl
drwxr-xr-x 2 root root 4.0K Oct 28 20:10 ranger-yarn-plugin-impl
drwxr-xr-x 2 root root 4.0K Oct 28 20:10 native
-rw-r--r-- 1 root root 114K Oct 28 20:56 aliyun-java-sdk-core-3.4.0.jar
-rw-r--r-- 1 root root 513K Oct 28 20:56 aliyun-sdk-oss-3.4.1.jar
-rw-r--r-- 1 root root  13K Oct 28 20:56 aliyun-java-sdk-sts-3.0.0.jar
-rw-r--r-- 1 root root 211K Oct 28 20:56 aliyun-java-sdk-ram-3.0.0.jar
-rw-r--r-- 1 root root 770K Oct 28 20:56 aliyun-java-sdk-ecs-4.2.0.jar
-rw-r--r-- 1 root root 150K Oct 28 20:56 jdom-1.1.jar
- Perform the preceding operations on all HDP nodes.
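The JAR-placement steps above can be sketched as a small helper script to run on each node. This is a sketch under assumptions: the function name `install_oss_jars` and its arguments are hypothetical, and the HDP client directory may differ in your environment.

```shell
#!/bin/sh
# Sketch: copy the OSS support JARs into the HDP client directories on one
# node. install_oss_jars is a hypothetical helper name; adjust the paths to
# match your environment.
install_oss_jars() {
    src_dir="$1"     # directory extracted from hadoop-oss-hdp-2.6.1.0-129.tar
    hdp_client="$2"  # typically /usr/hdp/current/hadoop-client

    # The hadoop-aliyun JAR goes directly into the client directory ...
    cp "$src_dir"/hadoop-aliyun-*.jar "$hdp_client"/
    # ... and the Aliyun SDK JARs plus jdom go into its lib/ subdirectory.
    cp "$src_dir"/aliyun-*.jar "$src_dir"/jdom-*.jar "$hdp_client"/lib/
}

# Example invocation (run on every HDP node):
# install_oss_jars ./hadoop-oss-hdp-2.6.1.0-129 /usr/hdp/current/hadoop-client
```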
- Use Ambari to add configurations. If your cluster is not managed by Ambari, modify core-site.xml directly. In this example, Ambari is used. The following table describes the configurations that you must add.
| Parameter | Description |
| --- | --- |
| fs.oss.endpoint | The endpoint of the region in which the bucket that you want to access is located. Example: oss-cn-zhangjiakou-internal.aliyuncs.com. |
| fs.oss.accessKeyId | The AccessKey ID used to access OSS. |
| fs.oss.accessKeySecret | The AccessKey secret used to access OSS. |
| fs.oss.impl | The class used to implement the OSS file system based on Hadoop. Set the value to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem. |
| fs.oss.buffer.dir | The directory used to store temporary files. We recommend that you set this parameter to /tmp/oss. |
| fs.oss.connection.secure.enabled | Specifies whether to enable HTTPS. Performance may be affected when HTTPS is enabled. We recommend that you set this parameter to false. |
| fs.oss.connection.maximum | The maximum number of connections to OSS. We recommend that you set this parameter to 2048. |
For more information about other parameters, see the Hadoop-Aliyun module documentation.
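If you modify core-site.xml directly instead of using Ambari, the parameters in the table above can be added as properties. The following is a minimal sketch; the endpoint and AccessKey values are placeholders that you must replace with your own:

```xml
<!-- core-site.xml: OSS-related properties (placeholder values) -->
<property>
  <name>fs.oss.endpoint</name>
  <value>oss-cn-zhangjiakou-internal.aliyuncs.com</value>
</property>
<property>
  <name>fs.oss.accessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.oss.accessKeySecret</name>
  <value>YOUR_ACCESS_KEY_SECRET</value>
</property>
<property>
  <name>fs.oss.impl</name>
  <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>
<property>
  <name>fs.oss.buffer.dir</name>
  <value>/tmp/oss</value>
</property>
<property>
  <name>fs.oss.connection.secure.enabled</name>
  <value>false</value>
</property>
<property>
  <name>fs.oss.connection.maximum</name>
  <value>2048</value>
</property>
```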
- Restart the cluster as prompted by Ambari.
- Test whether data can be read from and written to OSS.
- Run the following command to test whether data can be read from OSS:
sudo hadoop fs -ls oss://${your-bucket-name}/
- Run the following command to test whether data can be written to OSS:
sudo hadoop fs -mkdir oss://${your-bucket-name}/hadoop-test
If data can be read from and written to OSS, the configurations are successful. Otherwise, check whether the configurations are correct.
- To run MapReduce jobs, run the following commands to add the OSS support JAR packages to the hdfs://hdp-master:8020/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz package. Note: In this example, MapReduce jobs are used. To run jobs of other types, perform similar operations on the corresponding package. For example, to run TEZ jobs, update the hdfs://hdp-master:8020/hdp/apps/2.6.1.0-129/tez/tez.tar.gz package in the same way.
sudo su hdfs
cd
hadoop fs -copyToLocal /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz
hadoop fs -rm /hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz
cp mapreduce.tar.gz mapreduce.tar.gz.bak
tar zxf mapreduce.tar.gz
cp /usr/hdp/current/hadoop-client/hadoop-aliyun-2.7.3.2.6.1.0-129.jar hadoop/share/hadoop/tools/lib/
cp /usr/hdp/current/hadoop-client/lib/aliyun-* hadoop/share/hadoop/tools/lib/
cp /usr/hdp/current/hadoop-client/lib/jdom-1.1.jar hadoop/share/hadoop/tools/lib/
tar zcf mapreduce.tar.gz hadoop
hadoop fs -copyFromLocal mapreduce.tar.gz /hdp/apps/2.6.1.0-129/mapreduce/
Verify the configurations
You can test TeraGen and TeraSort to check whether the configurations take effect.
- Run the following command to test TeraGen:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen -Dmapred.map.tasks=100 10995116 oss://{bucket-name}/1G-input
Sample success response:
18/10/28 21:32:38 INFO client.RMProxy: Connecting to ResourceManager at cdh-master/192.168.0.161:8050
18/10/28 21:32:38 INFO client.AHSProxy: Connecting to Application History server at cdh-master/192.168.0.161:10200
18/10/28 21:32:38 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5BD5BA7641FCE369BC1D052C
[HostId]: null
18/10/28 21:32:38 INFO aliyun.oss: [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 5BD5BA7641FCE369BC1D052F
[HostId]: null
18/10/28 21:32:39 INFO terasort.TeraSort: Generating 10995116 using 100
18/10/28 21:32:39 INFO mapreduce.JobSubmitter: number of splits:100
18/10/28 21:32:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540728986531_0005
18/10/28 21:32:39 INFO impl.YarnClientImpl: Submitted application application_1540728986531_0005
18/10/28 21:32:39 INFO mapreduce.Job: The url to track the job: http://cdh-master:8088/proxy/application_1540728986531_0005/
18/10/28 21:32:39 INFO mapreduce.Job: Running job: job_1540728986531_0005
18/10/28 21:32:49 INFO mapreduce.Job: Job job_1540728986531_0005 running in uber mode : false
18/10/28 21:32:49 INFO mapreduce.Job:  map 0% reduce 0%
18/10/28 21:32:55 INFO mapreduce.Job:  map 1% reduce 0%
18/10/28 21:32:57 INFO mapreduce.Job:  map 2% reduce 0%
18/10/28 21:32:58 INFO mapreduce.Job:  map 4% reduce 0%
...
18/10/28 21:34:40 INFO mapreduce.Job:  map 99% reduce 0%
18/10/28 21:34:42 INFO mapreduce.Job:  map 100% reduce 0%
18/10/28 21:35:15 INFO mapreduce.Job: Job job_1540728986531_0005 completed successfully
18/10/28 21:35:15 INFO mapreduce.Job: Counters: 36
...
- Run the following command to test TeraSort:
sudo hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort -Dmapred.map.tasks=100 oss://{bucket-name}/1G-input oss://{bucket-name}/1G-output
Sample success response:
18/10/28 21:39:00 INFO terasort.TeraSort: starting
...
18/10/28 21:39:02 INFO mapreduce.JobSubmitter: number of splits:100
18/10/28 21:39:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540728986531_0006
18/10/28 21:39:02 INFO impl.YarnClientImpl: Submitted application application_1540728986531_0006
18/10/28 21:39:02 INFO mapreduce.Job: The url to track the job: http://cdh-master:8088/proxy/application_1540728986531_0006/
18/10/28 21:39:02 INFO mapreduce.Job: Running job: job_1540728986531_0006
18/10/28 21:39:09 INFO mapreduce.Job: Job job_1540728986531_0006 running in uber mode : false
18/10/28 21:39:09 INFO mapreduce.Job:  map 0% reduce 0%
18/10/28 21:39:17 INFO mapreduce.Job:  map 1% reduce 0%
18/10/28 21:39:19 INFO mapreduce.Job:  map 2% reduce 0%
18/10/28 21:39:20 INFO mapreduce.Job:  map 3% reduce 0%
...
18/10/28 21:42:50 INFO mapreduce.Job:  map 100% reduce 75%
18/10/28 21:42:53 INFO mapreduce.Job:  map 100% reduce 80%
18/10/28 21:42:56 INFO mapreduce.Job:  map 100% reduce 86%
18/10/28 21:42:59 INFO mapreduce.Job:  map 100% reduce 92%
18/10/28 21:43:02 INFO mapreduce.Job:  map 100% reduce 98%
18/10/28 21:43:05 INFO mapreduce.Job:  map 100% reduce 100%
18/10/28 21:43:56 INFO mapreduce.Job: Job job_1540728986531_0006 completed successfully
18/10/28 21:43:56 INFO mapreduce.Job: Counters: 54
...
If the tests are successful, the configurations take effect.