View the health status of nodes

You can check whether a node is running as expected based on its health status. The health status is derived from the results of multiple health check items. This topic describes how to view the health status of a node and the related health check items.

Prerequisites

An E-MapReduce (EMR) cluster is created. For more information, see Create a cluster.

Limits

This topic is applicable only to DataLake, Dataflow, online analytical processing (OLAP), DataServing, and custom clusters.

View the latest health status of nodes

Go to the Nodes tab.
1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.
3. On the EMR on ECS page, find the desired cluster and click Nodes in the Actions column.

On the Nodes tab, view the health status of nodes in each node group.

Green number in the Health Status column: indicates the number of nodes in the Good state in the current node group.
Yellow number in the Health Status column: indicates the number of nodes in the Warning state in the current node group.
Red number in the Health Status column: indicates the number of nodes in the Abnormal state in the current node group.
Gray number in the Health Status column: indicates the number of nodes in the Unknown state and nodes in the Stateless state in the current node group.
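
The color-to-state mapping above can be sketched in a few lines of Python. This is only an illustration of how the four numbers relate to the five node states, not the console's implementation; the names `STATE_TO_COLOR` and `summarize_node_group` are hypothetical, and the list of states is assumed to come from wherever you record them.

```python
from collections import Counter

# Mapping taken from the list above: Unknown and Stateless nodes
# are both counted in the gray number.
STATE_TO_COLOR = {
    "Good": "green",
    "Warning": "yellow",
    "Abnormal": "red",
    "Unknown": "gray",
    "Stateless": "gray",
}


def summarize_node_group(states):
    """Return the four per-color counts for one node group.

    `states` is an iterable of health status strings, one per node.
    """
    counts = Counter(STATE_TO_COLOR[state] for state in states)
    # Always report all four colors, including those with zero nodes.
    return {color: counts.get(color, 0) for color in ("green", "yellow", "red", "gray")}
```

For example, a node group with two Good nodes, one Warning node, one Unknown node, and one Stateless node yields a gray count of 2, because both Unknown and Stateless nodes are shown in gray.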

On the Nodes tab, click the icon on the left of the name of a node group. In the node list that appears, you can view the health status of each node in the Health Status column.

A node can be in one of the following states: Good, Warning, Abnormal, Unknown, or Stateless. Each state is indicated by a different icon.

Icon	Health status	Description
	Good	The node is running as expected.
	Warning	The node is running as expected, but the health check items of the node reveal potential risks. Pay attention to these risks.
	Abnormal	The node is unavailable. Serious issues are detected based on the health check items of the node. You must troubleshoot the issues at the earliest opportunity.
	Stateless	No health check is performed on the node after an installation process or a manual stop. You do not need to focus on nodes that are in this state.
	Unknown	The results of health check items of the node cannot be obtained. If no issue occurs in the business, you do not need to focus on nodes that are in this state.

View health check items of a node

On the Nodes tab, find the desired node group and click the icon on the left of the name of the node group.
Find the desired node and click View Check Items to the right of the health status in the Health Status column.

In the panel that appears, view the latest results of health check items and the health check history of the current node.

The following table describes the health check items. The value of each check item is indicated by u.

Name	Description	Threshold	Unit
status_alive	Checks whether the node status is normal.	None	-
host_fd_usage	Checks the file descriptor usage.	Warning: 95 ≤ u < 99 Abnormal: u ≥ 99	%
host_disk_fault	Checks whether a disk exception occurs on the underlying layer.	None	-
host_system_env	Checks the availability of important configuration files, Java, and Python.	None	-
host_service_env	Checks whether storage directories and package files on which the cluster services depend are available.	None	-
host_network_transmit_drop_rate	Checks the outbound packet loss rate during network transmission.	Warning: 1.0 ≤ u < 2.5 Abnormal: u ≥ 2.5	%
host_network_receive_error_rate	Checks the inbound packet error rate during network transmission.	Warning: 0.1 ≤ u < 0.5 Abnormal: u ≥ 0.5	%
host_disk_io_latency	Checks the average disk read/write latency.	Warning: 400 ≤ u < 800 Abnormal: u ≥ 800	ms
host_network_receive_drop_rate	Checks the inbound packet loss rate during network transmission.	Warning: 1.0 ≤ u < 2.5 Abnormal: u ≥ 2.5	%
host_network_transmit_error_rate	Checks the outbound packet error rate during network transmission.	Warning: 0.1 ≤ u < 0.5 Abnormal: u ≥ 0.5	%
host_system_fault	Checks whether a system exception occurs on the underlying layer.	None	-
host_cpu_usage	Checks the CPU load of the node.	Warning: 95 ≤ u < 99 Abnormal: u ≥ 99	%
host_disk_inode_usage	Checks the index node (inode) usage of disks.	Warning: 90 ≤ u < 99 Abnormal: u ≥ 99	%
host_mem_usage	Checks the memory usage of the node.	Warning: 95 ≤ u < 99 Abnormal: u ≥ 99	%
host_disk_space_usage	Checks the disk usage.	Warning: 90 ≤ u < 99 Abnormal: u ≥ 99	%
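
The threshold rules in the table can be read as a simple two-bound comparison on the value u. The following Python sketch shows that reading for some of the items that have thresholds. It is an illustration only, not how EMR evaluates the check items. The names `THRESHOLDS` and `classify` are hypothetical, and mapping a value below the Warning bound to "Good" is an assumption, because the table lists only the Warning and Abnormal ranges.

```python
# (warning_lower_bound, abnormal_lower_bound), copied from the table above.
# Units follow the table: % for usage and rate items, ms for latency.
THRESHOLDS = {
    "host_fd_usage": (95.0, 99.0),
    "host_network_transmit_drop_rate": (1.0, 2.5),
    "host_network_transmit_error_rate": (0.1, 0.5),
    "host_disk_io_latency": (400.0, 800.0),
    "host_cpu_usage": (95.0, 99.0),
    "host_disk_inode_usage": (90.0, 99.0),
    "host_mem_usage": (95.0, 99.0),
    "host_disk_space_usage": (90.0, 99.0),
}


def classify(item: str, u: float) -> str:
    """Classify the value u of a check item against its thresholds."""
    warning, abnormal = THRESHOLDS[item]
    if u >= abnormal:
        return "Abnormal"
    if u >= warning:
        return "Warning"
    # Assumption: anything below the Warning bound counts as Good.
    return "Good"
```

For example, `classify("host_disk_space_usage", 92.0)` falls in the range 90 ≤ u < 99 and is therefore reported as a Warning, while a disk I/O latency of 800 ms or more is reported as Abnormal.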