How to Install Hadoop Cluster on Ubuntu 16.04

In this tutorial, we will learn how to setup an Apache Hadoop on a single node cluster in an Alibaba Cloud Elastic Compute Service (ECS) instance with Ubuntu 16.04.

Install Hadoop

Before starting, you will need to download the latest version of the Hadoop from their official website. You can download it with the following command:

wget http://www-eu.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz

Once the download is completed, extract the downloaded file with the following command:

tar -xvzf hadoop-3.1.0.tar.gz

Next, move the extracted directory to the /opt with the following command:

mv hadoop-3.1.0 /opt/hadoop

Next, change the ownership of the hadoop directory using the following command:

chown -R hadoop:hadoop /opt/hadoop/

Next, you will need to set an environment variable for Hadoop. You can do this by editing .bashrc file:

First, log in to hadoop user:

su - hadoop

Next, open .bashrc file:

nano .bashrc

Add the following lines at the end of the file:

export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Save and close the file, when you are finished. Then, initialize the environment variables using the following command:

source .bashrc

Next, you will also need to setup Java environment variable for Hadoop. You can do this by editing hadoop-env.sh file:

First, find the default Java path using the following command:

readlink -f /usr/bin/java | sed "s:bin/java::"

Output:

/usr/lib/jvm/java-8-openjdk-amd64/jre/

Now, open hadoop-env.sh file and paste above output in the hadoop-env.sh file:

nano /opt/hadoop/etc/hadoop/hadoop-env.sh

Make the following changes:

#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/

Save and close the file, when you are finished.

Configure Hadoop

Next, you will need to configure multiple configuration files to setup Hadoop infrastructure. First, log in with hadoop user and create a directory for hadoop file system storage:

mkdir -p /opt/hadoop/hadoopdata/hdfs/namenode
mkdir -p /opt/hadoop/hadoopdata/hdfs/datanode

First, you will need to edit core-site.xml file. This file contains the Hadoop port number information, file system allocated memory, data store memory limit and the size of Read/Write buffers.

nano /opt/hadoop/etc/hadoop/core-site.xml

Make the following changes:

<configuration>
<property>
  <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
</configuration>

Save the file, then open the hdfs-site.xml file. This file contains the replication data value, namenode path and datanode path for local file systems.

nano /opt/hadoop/etc/hadoop/hdfs-site.xml

Make the following changes:

<configuration>
<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>

<property>
  <name>dfs.name.dir</name>
    <value>file:///opt/hadoop/hadoopdata/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
    <value>file:///opt/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

Save the file, then open the mapred-site.xml file.

nano /opt/hadoop/etc/hadoop/mapred-site.xml

Make the following changes:

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>

Save the file, then open the yarn-site.xml file:

nano /opt/hadoop/etc/hadoop/yarn-site.xml

Make the following changes:

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
 </property>
</configuration>

Save and close the file, when you are finished. If you would like to have more information, like how to install java to get started, access Hadoop services and test hadoop, please go to this tutorial.

Related Market Product

Apache-Solr powered by Websoft9(Ubuntu16.04)

Websoft9 Apache Solr is a pre-configured, ready to run image for running Apache Solr on Alibaba Cloud.Solr is a standalone enterprise search server with a REST-like API.

Related Products

Anti-DDoS Pro

Alibaba Cloud Anti-DDoS Pro is a paid service that features a set of high-defensive IPs, and acts as a protective barrier for the origin. It safeguards network servers under high volume DDoS attacks. After configuring the high defensive IPs for the network servers, all traffic passes through the Anti-DDoS Pro instance before rerouting to the origin.

Elastic Compute Service

Alibaba Cloud Elastic Compute Service (ECS) provides fast memory and the latest Intel CPUs to help you to power your cloud applications and achieve faster results with low latency. All ECS instances come with Anti-DDoS protection to safeguard your data and applications from DDoS and Trojan attacks.

Related Course

Hadoop Cluster Installation on Alibaba Cloud ECS

This course is designed to help users who want to understand Big Data technology, as well as data processing through cloud products. Through learning, users can fully understand the construction and basic use of Hadoop cluster, SSH protocol foundation, Hadoop directory structure, ECS security group configuration. It provides a reference for enterprises to build Hadoop cluster by themselves.

Community

How to Install Hadoop Cluster on Ubuntu 16.04

Install Hadoop

Configure Hadoop

Related Blog Posts

How to Safeguard Apache Web Server on Ubuntu

How to Install and Configure Seafile on Ubuntu 16.04

Related Market Product

Apache-Solr powered by Websoft9(Ubuntu16.04)

Related Documentation

Deploy a Java Web project

Harden Hadoop environment security

Related Products

Anti-DDoS Pro

Elastic Compute Service

Hadoop Cluster Installation on Alibaba Cloud ECS

Read previous post:

Read next post:

Alibaba Clouder

You may also like

Comments

Alibaba Clouder

Related Products

ECS(Elastic Compute Service)

Anti-DDoS

A Free Trial That Lets You Build Big!