Introduction to paths of frequently used files in E-MapReduce

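This topic describes the paths of files that are frequently used in DataLake clusters and Hadoop clusters of E-MapReduce (EMR). You can log on to the master node of your cluster to view most of these paths. Some logs, such as the NodeManager and DataNode logs of a Hadoop cluster, are stored on core nodes or task nodes.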

DataLake cluster

Big data service directories

Big data services are installed in directories in the /opt/apps/xxx format. Examples:

HDFS: /opt/apps/HDFS/hdfs-current
Hive: /opt/apps/HIVE/hive-current
Hudi: /opt/apps/HUDI/hudi-current
YARN: /opt/apps/YARN/yarn-current
Presto: /opt/apps/PRESTO/presto-current
Ranger: /opt/apps/RANGER/ranger-current
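The examples above share a pattern: the directory under /opt/apps is the service name in uppercase, and the installation directory is the lowercase name followed by -current. The following Python sketch derives a candidate path from that pattern and checks whether it exists. The pattern is inferred from the listed examples only and is not guaranteed for every service; for instance, the PATH value in the env output in this topic places Spark under /opt/apps/SPARK3/spark-current. The function names datalake_install_dir and check are hypothetical.

```python
import os


# Hypothetical helper: builds the path that the documented DataLake examples
# follow, /opt/apps/<SERVICE>/<service>-current. This is a naming pattern
# inferred from examples, not a guaranteed rule (Spark, for example, is
# installed under /opt/apps/SPARK3/spark-current).
def datalake_install_dir(service: str) -> str:
    return f"/opt/apps/{service.upper()}/{service.lower()}-current"


def check(service: str) -> None:
    path = datalake_install_dir(service)
    # On the master node, os.path.isdir shows whether the guessed path exists.
    print(path, "exists" if os.path.isdir(path) else "not found")


if __name__ == "__main__":
    for name in ["hdfs", "hive", "hudi", "yarn", "presto", "ranger"]:
        check(name)
```

Treat a "not found" result as a cue to fall back to env | grep rather than as proof that the service is missing.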

You can also log on to the master node of your cluster and run the env | grep xxx command to query the directory where a service is installed. Replace xxx with the name of the service.

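For example, you can run the env | grep hive command to query the directory where the Hive service is installed. Output similar to the following is returned. HIVE_HOME=/opt/apps/HIVE/hive-current indicates that Hive is installed in /opt/apps/HIVE/hive-current.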

JINDOTABLE_EXTRA_CLASSPATH=/opt/apps/METASTORE/metastore-current/hive2
HIVE_HOME=/opt/apps/HIVE/hive-current
HIVE_LOG_DIR=/var/log/taihao-apps/hive
HIVE_CONF_DIR=/etc/taihao-apps/hive-conf
PATH=/opt/apps/JINDOSDK/jindosdk-current/bin:/opt/apps/HADOOP-COMMON/hadoop-common-current/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/apps/HIVE/hive-current/bin:/opt/apps/JINDODATA/jindodata-current/bin:/opt/apps/JINDODATA/jindodata-current/sbin:/opt/apps/SPARK-EXTENSION/spark-extension-current/bin:/opt/apps/SPARK3/spark-current/bin:/root/bin
OLDPWD=/var/log/emr/hive
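If you query these directories from a script rather than an interactive shell, the same filtering can be done in Python. The following minimal sketch mirrors env | grep <pattern> by returning the KEY=VALUE lines that contain the pattern. It is a simplification: it matches a plain, case-sensitive substring, whereas grep interprets the pattern as a regular expression. The function name grep_env is hypothetical.

```python
import os
from typing import List, Mapping


# Hypothetical helper that mirrors `env | grep <pattern>`.
# Simplification: `pattern` is matched as a plain, case-sensitive substring,
# while grep treats it as a regular expression.
def grep_env(pattern: str, environ: Mapping[str, str] = os.environ) -> List[str]:
    lines = [f"{key}={value}" for key, value in environ.items()]
    return [line for line in lines if pattern in line]


if __name__ == "__main__":
    # On a DataLake master node, this prints entries such as HIVE_HOME=...
    for line in grep_env("hive"):
        print(line)
```

When you already know the variable name, os.environ.get("HIVE_HOME") returns its value directly, provided the variable is set in the environment of your process.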

Log directories

Logs are stored in directories in the /var/log/emr/xxx format. Examples:

Spark: /var/log/emr/spark/
Hive: /var/log/emr/hive/
YARN: /var/log/emr/yarn/
JindoSDK: /var/log/emr/jindosdk/
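When you troubleshoot a service, the most recently modified files in its log directory are usually the place to start. The following Python sketch lists them for a directory such as /var/log/emr/hive/. The function name newest_logs and the default limit of five files are illustrative choices, not part of EMR.

```python
import os
from typing import List


# Hypothetical helper: returns regular files under a service log directory
# (for example /var/log/emr/hive/), newest first by modification time.
# A directory that does not exist yields an empty list.
def newest_logs(log_dir: str, limit: int = 5) -> List[str]:
    files: List[str] = []
    for root, _dirs, names in os.walk(log_dir):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                files.append(path)
    files.sort(key=os.path.getmtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in newest_logs("/var/log/emr/hive/"):
        print(path)
```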

Configuration file directories

Configuration files are stored in directories in the /etc/emr/xxx format. Examples:

HDFS: /etc/emr/hdfs-conf/
Spark: /etc/emr/spark-conf/
Hive: /etc/emr/hive-conf/
Hudi: /etc/emr/hudi-conf/
Knox: /etc/emr/knox-conf/
YARN: /etc/emr/hadoop-conf/
ZooKeeper: /etc/emr/zookeeper-conf/

Hadoop cluster

Big data service directories

Big data services are installed in directories in the /usr/lib/xxx format. Examples:

Hadoop: /usr/lib/hadoop-current
Spark: /usr/lib/spark-current
Hive: /usr/lib/hive-current
Flink: /usr/lib/flink-current
Flume: /usr/lib/flume-current

You can also log on to the master node of your cluster and run the env | grep xxx command to query the directory where a service is installed. Replace xxx with the name of the service.

For example, you can run the following command to query the directory where the Spark service is installed:

env | grep spark

Output similar to the following is returned. SPARK_HOME=/usr/lib/spark-current indicates that Spark is installed in /usr/lib/spark-current.

SPARK_HOME=/usr/lib/spark-current
SPARK_CONF_DIR=/etc/ecm/spark-conf
SPARK_LOG_DIR=/mnt/disk1/log/spark
PATH=/usr/lib/sqoop-current/bin:/usr/lib/jindosdk-current/bin:/usr/lib/hudi-current/bin:/usr/lib/hive-current/hcatalog/bin:/usr/lib/hive-current/bin:/usr/lib/datafactory-current/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/lib/flow-agent-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/usr/lib/jindodata-current//bin:/usr/lib/jindodata-current//sbin:/usr/lib/spark-current/bin:/usr/lib/hadoop-current/bin:/usr/lib/hadoop-current/sbin:/root/bin
HADOOP_CLASSPATH=/opt/apps/extra-jars/*:/usr/lib/spark-current/yarn/spark-3.2.1-yarn-shuffle.jar
SPARK_PID_DIR=/usr/lib/spark-current/pids

Log directories

Service logs are stored in directories in the /mnt/disk1/log/xxx format. Examples:

YARN ResourceManager logs: /mnt/disk1/log/hadoop-yarn on the master node
YARN NodeManager logs: /mnt/disk1/log/hadoop-yarn on a core node or a task node
HDFS NameNode logs: /mnt/disk1/log/hadoop-hdfs on the master node
HDFS DataNode logs: /mnt/disk1/log/hadoop-hdfs on a core node or a task node
Hive logs: /mnt/disk1/log/hive on the master node
ESS logs: /mnt/disk1/log/ess/ on the master node, a core node, or a task node
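The same directory can hold logs of different daemons depending on the node role: /mnt/disk1/log/hadoop-yarn contains ResourceManager logs on the master node and NodeManager logs on core and task nodes. A script that collects logs therefore needs to know which kind of node it runs on. The following Python sketch encodes the list above as a lookup table. The names HADOOP_CLUSTER_LOGS and logs_on_node, and the role labels master, core, and task, are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical lookup table built from the list above:
# component -> (log directory, node roles on which that component logs).
HADOOP_CLUSTER_LOGS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "ResourceManager": ("/mnt/disk1/log/hadoop-yarn", ("master",)),
    "NodeManager": ("/mnt/disk1/log/hadoop-yarn", ("core", "task")),
    "NameNode": ("/mnt/disk1/log/hadoop-hdfs", ("master",)),
    "DataNode": ("/mnt/disk1/log/hadoop-hdfs", ("core", "task")),
    "Hive": ("/mnt/disk1/log/hive", ("master",)),
    "ESS": ("/mnt/disk1/log/ess/", ("master", "core", "task")),
}


def logs_on_node(role: str) -> Dict[str, str]:
    """Returns component -> log directory for the components on this node role."""
    return {
        component: directory
        for component, (directory, roles) in HADOOP_CLUSTER_LOGS.items()
        if role in roles
    }


if __name__ == "__main__":
    for component, directory in logs_on_node("core").items():
        print(f"{component}: {directory}")
```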

Configuration file directories

Configuration files are stored in directories in the /etc/ecm/xxx format. Examples:

Hadoop: /etc/ecm/hadoop-conf/
Spark: /etc/ecm/spark-conf/
Hive: /etc/ecm/hive-conf/
Flink: /etc/ecm/flink-conf/
Flume: /etc/ecm/flume-conf/

If you log on to your cluster over SSH, you can only view the parameters in the configuration files. To modify these parameters, you must use the EMR console.
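Hadoop-style configuration files are XML documents made up of <property> elements, each with a <name> and a <value>. To view parameters over SSH without editing anything, you can parse these files as in the following Python sketch. The file name yarn-site.xml in /etc/ecm/hadoop-conf/ and the property yarn.resourcemanager.hostname are assumptions used for illustration; list the directory first to see which files your cluster contains. The function name read_site_xml is hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict


# Hypothetical helper: reads <property><name>/<value> pairs from a
# Hadoop-style *-site.xml file. It only reads the file and never writes it;
# parameters are modified in the EMR console.
def read_site_xml(path: str) -> Dict[str, str]:
    root = ET.parse(path).getroot()  # the <configuration> element
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:  # skip properties without a name
            props[name.strip()] = prop.findtext("value", default="").strip()
    return props


if __name__ == "__main__":
    # Assumed file and property names, for illustration only.
    path = "/etc/ecm/hadoop-conf/yarn-site.xml"
    if os.path.exists(path):
        print(read_site_xml(path).get("yarn.resourcemanager.hostname"))
    else:
        print(f"{path} not found on this node")
```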

Data directories

Cached data in JindoFS: /mnt/disk1/jindodata/
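To check how much disk space the JindoFS cache uses, you can add up the sizes of the files in this directory. The following Python sketch does that with os.walk. It reports the apparent size of regular files and skips symbolic links, so its result can differ from the output of du. The function name dir_size_bytes is hypothetical.

```python
import os


# Hypothetical helper: sums the apparent size in bytes of regular files
# under a directory, without following symbolic links.
# A directory that does not exist yields 0.
def dir_size_bytes(path: str) -> int:
    total = 0
    for root, _dirs, names in os.walk(path):
        for name in names:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    size = dir_size_bytes("/mnt/disk1/jindodata/")
    print(f"JindoFS cache on disk1: {size / 1024**3:.2f} GiB")
```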