E-MapReduce: High availability feature of YARN

Last Updated: Oct 07, 2023

This topic describes the high availability feature of YARN and related configurations.

Overview

Hadoop YARN is a distributed cluster resource management system based on a master-slave architecture. ResourceManager is the master component that manages cluster resources and schedules jobs. NodeManager is the slave component that manages and monitors the tasks that run on a single node.

The high availability feature of YARN provides the following functionalities:

  • ResourceManager High Availability (RM HA) allows you to start multiple ResourceManager processes on different nodes. This helps prevent single points of failure (SPOFs). For more information, see ResourceManager High Availability.

  • ResourceManager restart continuously synchronizes application information and state to ZooKeeper in real time. When a ResourceManager restarts, it reloads the application state from ZooKeeper. This ensures that applications are automatically recovered after an EMR cluster is upgraded or restarted. For more information, see ResourceManager Restart.

  • NodeManager restart continuously synchronizes the information and state of running containers to a local store, such as LevelDB. When a NodeManager restarts, it reloads the container state. This ensures that running containers are not affected by node upgrades or restarts. For more information, see NodeManager Restart.

With these features, applications and containers are not interrupted in common scenarios, such as a ResourceManager single point of failure, ResourceManager upgrades or restarts, and NodeManager upgrades or restarts.

Dependency

The high availability feature of YARN relies on ZooKeeper for distributed leader election and for storing application information and state metadata. This ensures strong consistency across the cluster. The following table describes the related ZooKeeper configuration item.

| Configuration file | Configuration item | Recommended value | Description |
| --- | --- | --- | --- |
| core-site.xml | hadoop.zk.address | <zk1-host>:<zk1-port>,<zk2-host>:<zk2-port>,<zk3-host>:<zk3-port> | The ZooKeeper endpoint, which is used to store leader election information and the information and state of applications. Separate multiple endpoints with commas (,). |
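For reference, the following core-site.xml snippet shows how this item is typically set. It is a minimal sketch: the ZooKeeper hostnames and port 2181 are placeholders, not values taken from this topic.

```xml
<!-- core-site.xml: ZooKeeper quorum used by YARN HA.
     The hostnames and port 2181 below are placeholders. -->
<configuration>
  <property>
    <name>hadoop.zk.address</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
</configuration>
```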

Functionalities

RM HA

RM HA allows you to start multiple ResourceManager processes on different nodes, but only one ResourceManager is elected as active; it records the basic information and state of applications and synchronizes them to ZooKeeper. If the active ResourceManager or the node on which it resides fails, the standby ResourceManagers elect a new active ResourceManager based on the distributed lock mechanism of ZooKeeper. The new active ResourceManager restores the information and state of all applications from ZooKeeper and continues to provide resource management and scheduling services. This way, a single point of failure is prevented.

The following table describes the configuration items that are related to RM HA.

| Configuration file | Configuration item | Recommended value | Description |
| --- | --- | --- | --- |
| yarn-site.xml | yarn.resourcemanager.ha.enabled | true | Specifies whether to enable the RM HA feature. Default value: false. |
| yarn-site.xml | yarn.resourcemanager.ha.automatic-failover.enabled | true or left empty | Specifies whether to enable automatic failover. Default value: true. |
| yarn-site.xml | yarn.resourcemanager.ha.automatic-failover.embedded | true or left empty | Specifies whether to use the embedded leader elector to elect the active ResourceManager. Default value: true. |
| yarn-site.xml | yarn.resourcemanager.ha.curator-leader-elector.enabled | true | Specifies whether to use the Curator-based leader elector. Default value: false. |
| yarn-site.xml | yarn.resourcemanager.ha.automatic-failover.zk-base-path | Left empty | The ZooKeeper root path in which leader election data is stored. Default value: /yarn-leader-election. |
| yarn-site.xml | yarn.resourcemanager.ha.rm-ids | rm1,rm2,rm3 | The IDs of the ResourceManagers. Separate multiple IDs with commas (,). |
| yarn-site.xml | yarn.resourcemanager.cluster-id | <cluster-id> | The ID of the cluster. The storage path used by RM HA depends on this configuration item. |
| yarn-site.xml | yarn.resourcemanager.hostname.<rm-id> | Left empty | The hostname of a ResourceManager. This is an instance-level item; specify one entry for each ResourceManager instance (<rm-id>). |
| yarn-site.xml | yarn.resourcemanager.address.<rm-id> | Left empty | The remote procedure call (RPC) address that the YARN client uses to submit jobs. Instance-level item; one entry per ResourceManager instance. |
| yarn-site.xml | yarn.resourcemanager.scheduler.address.<rm-id> | Left empty | The RPC address from which ApplicationMasters request resources. Instance-level item; one entry per ResourceManager instance. |
| yarn-site.xml | yarn.resourcemanager.resource-tracker.address.<rm-id> | Left empty | The RPC address that NodeManagers use to report the status of resources and containers. Instance-level item; one entry per ResourceManager instance. |
| yarn-site.xml | yarn.resourcemanager.admin.address.<rm-id> | Left empty | The RPC address to which admin commands are submitted. Instance-level item; one entry per ResourceManager instance. |
| yarn-site.xml | yarn.resourcemanager.webapp.address.<rm-id> | Left empty | The HTTP address of the ResourceManager web UI. Instance-level item; one entry per ResourceManager instance. |
| yarn-site.xml | yarn.resourcemanager.webapp.https.address.<rm-id> | Left empty | The HTTPS address of the ResourceManager web UI. This item takes effect only if you set the yarn.http.policy parameter to HTTPS_ONLY. Instance-level item; one entry per ResourceManager instance. |
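As an illustration, the following yarn-site.xml snippet sketches an RM HA setup with two ResourceManagers. The hostnames, the cluster ID emr-cluster, and the two-instance layout are placeholders; the instance-level address items listed in the table are left unset, as recommended.

```xml
<!-- yarn-site.xml: minimal RM HA sketch with two ResourceManagers.
     Hostnames and the cluster ID are placeholders. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>emr-cluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master-1.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>master-2.example.com</value>
  </property>
</configuration>
```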

ResourceManager restarts

The following types of restarts are available for ResourceManagers:

  • Non-work-preserving restart: After a ResourceManager is restarted, all running applications are resubmitted based on the restored application information and state. This type of restart has a significant impact on applications: every running application is stopped and then resubmitted.

  • Work-preserving restart: After a ResourceManager is restarted, it takes over all running applications based on the restored application information and state. This minimizes the impact on running applications.

The following table describes the configurations of the work-preserving restart of ResourceManagers.

| Configuration file | Configuration item | Recommended value | Description |
| --- | --- | --- | --- |
| yarn-site.xml | yarn.resourcemanager.recovery.enabled | true | Specifies whether to enable the ResourceManager restart feature. Default value: false. |
| yarn-site.xml | yarn.resourcemanager.work-preserving-recovery.enabled | true or left empty | Specifies whether to enable work-preserving ResourceManager restarts. Default value: true. |
| yarn-site.xml | yarn.resourcemanager.store.class | org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore | The state store implementation class. Only the ZooKeeper-based state store (ZKRMStateStore) supports the RM HA feature. |
| yarn-site.xml | yarn.resourcemanager.zk-state-store.parent-path | Left empty | The ZooKeeper path in which the application information and state maintained by the ResourceManager are stored. Default value: /rmstore. |
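A minimal yarn-site.xml sketch of the work-preserving restart settings follows; it includes only the items with explicit recommended values in the table.

```xml
<!-- yarn-site.xml: work-preserving ResourceManager restart.
     Values follow the recommendations in the table above. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
```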

NodeManager restarts

NodeManager restarts ensure that the containers running on a node keep running after the NodeManager is restarted and recovered within a short period of time, so the running containers do not need to be stopped and rerun. This ensures that applications are not interrupted when a NodeManager is restarted.

The following table describes the configuration items of NodeManager restarts.

| Configuration file | Configuration item | Recommended value | Description |
| --- | --- | --- | --- |
| yarn-site.xml | yarn.nodemanager.recovery.enabled | true | Specifies whether to enable the NodeManager restart feature. Default value: false. |
| yarn-site.xml | yarn.nodemanager.recovery.dir | /home/hadoop/yarn-nm-recovery | The local directory in which container information and state data are stored. Default value: ${hadoop.tmp.dir}/yarn-nm-recovery. We recommend a system disk directory other than /tmp, such as /home/hadoop/yarn-nm-recovery, on which the hadoop user has read and write permissions. This prevents NodeManagers from being affected by data loss in the /tmp directory or by damaged data disks. |
| yarn-site.xml | yarn.nodemanager.recovery.supervised | true | Specifies whether to retain local data when a NodeManager exits. If you set this item to true, the local data can be restored after a NodeManager restart. Default value: false. |
| yarn-site.xml | yarn.nodemanager.address | ${yarn.nodemanager.hostname}:8041 or 0.0.0.0:8041 | The RPC address of a NodeManager, which is also used as the NodeManager ID. If the port number is not 0, a fixed port is used and you do not need to change this item. If the port number is 0, a random port is used and NodeManager recovery does not work because the NodeManager ID changes after each restart. We recommend that you use a fixed port. |
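The following yarn-site.xml sketch applies the recommendations above; the recovery directory and the fixed port 8041 are the suggested values from the table, not requirements.

```xml
<!-- yarn-site.xml: NodeManager restart with container recovery.
     The directory and fixed port follow the recommendations above. -->
<configuration>
  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/home/hadoop/yarn-nm-recovery</value>
  </property>
  <property>
    <name>yarn.nodemanager.recovery.supervised</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>${yarn.nodemanager.hostname}:8041</value>
  </property>
</configuration>
```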