Running MapReduce Workloads in an Alibaba Cloud EMR Cluster

This article explains how to run MapReduce jobs on an Alibaba Cloud EMR cluster.

Introduction

E-MapReduce (EMR) is a cloud-native open-source big data platform that provides easy-to-integrate open-source big data computing and storage engines such as Hadoop, Hive, Spark, Flink, Presto, and ClickHouse. EMR allows you to adjust computing resources based on your business needs and to deploy them on Alibaba Cloud Elastic Compute Service (ECS), Alibaba Cloud Container Service for Kubernetes (ACK), and Apsara Stack. This blog walks through running MapReduce jobs on an Alibaba Cloud EMR cluster, using the example programs that ship with Hadoop.

Step-1: Create an EMR cluster as shown in the image below.

[Image 1]

Step-2: Upload the hadoop-mapreduce-examples-2.7.2.jar file and the text file to be processed to Alibaba Cloud Object Storage Service (OSS), as shown below.

[Image 2]
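The upload can also be done from a terminal instead of the console. The following is a minimal sketch, assuming the ossutil command-line tool is installed and configured with your credentials. The bucket name my-emr-bucket and the helpers run and upload_to_oss are hypothetical and not part of the original walkthrough.

```shell
#!/usr/bin/env bash
# run: print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical bucket name; replace it with a bucket you own.
BUCKET="${BUCKET:-my-emr-bucket}"

# upload_to_oss <local-file>...: copy each file to the root of the bucket.
upload_to_oss() {
  for f in "$@"; do
    run ossutil cp "$f" "oss://$BUCKET/$f"
  done
}

upload_to_oss hadoop-mapreduce-examples-2.7.2.jar file.txt
```

By default the script only prints the commands it would run. Set DRY_RUN=0 to perform the upload.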

Step-3: Log in to the master node using SSH, as shown below.

[Image 3]

Step-4: Download the JAR file and the text file from OSS to the master node using the wget command.

[Image 4]
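As a rough sketch of this download step, the script below builds the HTTPS URL of each object and passes it to wget. It assumes the objects are readable over HTTPS (for example, public-read objects). The bucket name my-emr-bucket, the region cn-hangzhou, and the helper oss_url are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical values; replace them with your own bucket and region.
BUCKET="${BUCKET:-my-emr-bucket}"
REGION="${REGION:-cn-hangzhou}"

# oss_url <bucket> <region> <object-key>: print the public HTTPS URL of an OSS object.
oss_url() {
  printf 'https://%s.oss-%s.aliyuncs.com/%s\n' "$1" "$2" "$3"
}

# Print the wget commands by default; set DRY_RUN=0 on the master node to download.
for key in hadoop-mapreduce-examples-2.7.2.jar file.txt; do
  url="$(oss_url "$BUCKET" "$REGION" "$key")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "wget $url"
  else
    wget "$url"
  fi
done
```

For private objects, a signed URL copied from the OSS console can be passed to wget in the same way.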

Step-5: Run the following commands:

5.1: hadoop fs -ls / -> to list the root directory of the Hadoop file system
5.2: hadoop fs -mkdir /input -> to create an input directory in the Hadoop file system
5.3: hadoop fs -mkdir /output -> to create an output directory in the Hadoop file system
5.4: hadoop fs -put file.txt /input/ -> to upload the downloaded story file to the Hadoop file system
5.5: hadoop fs -ls /input -> to confirm that the file was uploaded

[Image 5]
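The preparation commands above can be collected into one script that is safe to run more than once. This is a sketch only: the helpers run and prepare_hdfs are hypothetical, and the -p and -f flags are additions that the original commands do not use.

```shell
#!/usr/bin/env bash
# run: print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# prepare_hdfs <local-file>: create /input and /output, then upload the file.
prepare_hdfs() {
  local file="$1"
  run hadoop fs -mkdir -p /input /output   # -p: no error if a directory already exists
  run hadoop fs -put -f "$file" /input/    # -f: overwrite a copy from an earlier run
  run hadoop fs -ls /input                 # confirm that the file is in place
}

prepare_hdfs file.txt
```

Run it with DRY_RUN=0 on the master node to execute the commands instead of printing them.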

Step-6: Run the job using the command shown in the image below.

[Image 6]
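The command for this step is visible only in the screenshot. A plausible reconstruction, assuming the job is the wordcount example from the uploaded JAR, reading from /input and writing to the /output/res directory that Step-7 inspects, might look like the sketch below. The program name wordcount and the helpers run and run_wordcount are assumptions, not taken from the original text.

```shell
#!/usr/bin/env bash
# run: print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

JAR="hadoop-mapreduce-examples-2.7.2.jar"

# run_wordcount <input-dir> <output-dir>: submit the (assumed) wordcount example job.
# The output directory must not exist yet; Hadoop rejects a job whose output path exists.
run_wordcount() {
  run hadoop jar "$JAR" wordcount "$1" "$2"
}

run_wordcount /input /output/res
```

To rerun the job, either pick a new output directory or remove the old one first with hadoop fs -rm -r /output/res.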

Step-7: Run the following commands to view the output file

7.1: hadoop fs -ls /output/res -> to list the contents of the /output/res directory
7.2: hadoop fs -get /output/res/part-r-00004 -> to copy one result file to the local working directory
7.3: ls -> to confirm that the file was copied
7.4: vim part-r-00004 -> to open the file as shown below

[Image 7]
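If the job in Step-6 is the wordcount example (an assumption, since the command is not reproduced in the text), each part-r-* file holds lines of the form word, tab, count. A quick way to sanity-check the cluster output on a small file is to compute the same counts locally. The helper local_wordcount below is a hypothetical sketch that splits on whitespace, which is how the wordcount example tokenizes its input.

```shell
#!/usr/bin/env bash
# local_wordcount <file>: print "word<TAB>count" lines, sorted by word in byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" |   # one whitespace-separated token per line
    LC_ALL=C sort |                 # group identical tokens together
    uniq -c |                       # prefix each token with its count
    awk 'NF == 2 { printf "%s\t%s\n", $2, $1 }'   # reorder to word<TAB>count, skip blanks
}

# Show the first few counts when the sample file is present.
# On the cluster, compare with: hadoop fs -cat '/output/res/part-r-*' | LC_ALL=C sort
if [ -f file.txt ]; then
  local_wordcount file.txt | head
fi
```

Because the job output is spread over several part-r-* files, comparing against all of them together (rather than a single part file) gives the full picture.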

Step-8: Get the frequency of the word Broken in the file

8.1: hadoop jar hadoop-mapreduce-examples-2.7.2.jar grep /input/ /output/res1 Broken -> to count the occurrences of Broken in the input
8.2: hadoop fs -ls /output/res1 -> to list the contents of the res1 directory
8.3: hadoop fs -cat /output/res1/part-r-00000 -> to view the result file

[Image 8]
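The grep example counts every match of the given regular expression and writes lines of the form count, tab, match. The same count can be approximated locally to cross-check the result. The helper count_matches below is a hypothetical sketch; it uses POSIX extended regular expressions, which agree with the job's Java regular expressions for a plain word such as Broken but may differ for more complex patterns.

```shell
#!/usr/bin/env bash
# count_matches <regex> <file>: print "count<TAB>match" lines, most frequent first.
# Matching is case-sensitive and also counts matches inside longer words.
count_matches() {
  grep -oE -- "$1" "$2" |   # print each match on its own line
    LC_ALL=C sort |
    uniq -c |
    sort -rn |
    awk '{ printf "%s\t%s\n", $1, $2 }'
}

# Run against the sample file when it is present.
if [ -f file.txt ]; then
  count_matches Broken file.txt
fi
```

Note that a lowercase "broken" is not counted, while "Broken" inside a longer word is.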

Step-9: Generate random text

9.1: hadoop jar hadoop-mapreduce-examples-2.7.2.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=100000000 /output/res2 -> to generate about 100 MB (100,000,000 bytes) of random text
9.2: hadoop fs -ls /output/res2 -> to list the resulting files
9.3: vim part-m-00000 -> to view the generated file, once it has been copied to the local directory with hadoop fs -get, as shown below

[Image 9]
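The number of part-m-* files depends on how many map tasks the generator starts. The sketch below estimates that number, assuming the example divides the total bytes by a per-map size that defaults to 1 GiB and always starts at least one map. That default, and the helpers run and expected_maps, are assumptions to verify against your Hadoop version. The script then prints a command for reading the output with hadoop fs -text, which decodes Hadoop SequenceFiles into readable text.

```shell
#!/usr/bin/env bash
# run: print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# expected_maps <total-bytes> <bytes-per-map>: estimated number of map tasks,
# and therefore of part-m-* output files (at least one if any bytes are requested).
expected_maps() {
  local n=$(( $1 / $2 ))
  if [ "$n" -eq 0 ] && [ "$1" -gt 0 ]; then
    n=1
  fi
  echo "$n"
}

TOTAL_BYTES=100000000
BYTES_PER_MAP=$(( 1024 * 1024 * 1024 ))   # assumed default: 1 GiB per map task

echo "expected map tasks: $(expected_maps "$TOTAL_BYTES" "$BYTES_PER_MAP")"
run hadoop fs -text /output/res2/part-m-00000
```

Under these assumptions, the 100,000,000 bytes requested in 9.1 fit into a single map task, which matches the single part-m-00000 file opened in 9.3.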

Conclusion

Alibaba Cloud E-MapReduce (EMR), a cloud-native open-source big data platform, provides easy-to-integrate open-source big data computing and storage engines such as Hadoop, Hive, Spark, Flink, Presto, and ClickHouse. The Alibaba Cloud EMR service can also be used to create an EMR cluster within minutes with just a few mouse clicks. In this blog, we have provided an overview of the steps involved in running MapReduce workloads in the Alibaba Cloud EMR Cluster.

GAVASKAR S
