About Abnormal Cluster Loads or Status

Dive into expert troubleshooting tips for managing your Alibaba Cloud Elasticsearch cluster, addressing common issues, and shard allocation for optimized performance.

FAQ About Abnormal Cluster Loads or Status

The CPU utilization and loads of some nodes in an Elasticsearch cluster are normal, whereas other nodes are in the idle state. What do I do?

This issue is caused by unbalanced loads on the cluster. Unbalanced loads can have several causes, including inappropriate shard settings, uneven segment sizes, hot and cold data that is not separated, and persistent connections that are used for Server Load Balancer (SLB) instances and multi-zone architecture. Resolve the issue based on your actual scenario. For more information, see Unbalanced loads on a cluster.

What do I do if an Elasticsearch cluster is in a state indicated by the color yellow?

● Cause

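If the number of replica shards that you specify for an index is greater than the number of nodes minus 1, the cluster enters a state indicated by the color yellow. Elasticsearch does not allocate a replica shard to a node that already holds a copy of the same shard, so the excess replica shards remain unassigned.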

● Solutions

Run the GET /_cat/indices?v command to query the distribution of shards for indexes and identify the index that is in a state indicated by the color yellow. Then, change the number of replica shards for the index to 0, as shown in the following example for an index named test. After the cluster recovers to a normal state, change the number of replica shards for the index from 0 back to the original setting.

PUT test/_settings
{
  "index" : {
    "number_of_replicas":"0"
  }
}
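
The answer above also involves two steps whose commands are not shown: locating the affected index and restoring the replica count afterwards. The following is a minimal sketch in the same console syntax, not necessarily the exact commands the author had in mind. It assumes that the index is named test, as in the example above, and that its original replica count was 1. Both values are placeholders.

```
# Assumption: list only the indexes whose health is yellow.
GET _cat/indices?v&health=yellow

# Assumption: show which shards of the index "test" are unassigned.
GET _cat/shards/test?v&h=index,shard,prirep,state,node

# After the cluster recovers, restore the replica count.
# The value 1 is a placeholder for the original setting.
PUT test/_settings
{
  "index" : {
    "number_of_replicas" : 1
  }
}
```

Following the cause described above, a cluster with N data nodes can assign at most N - 1 replica shards for each primary shard.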

What do I do if an Elasticsearch cluster is in a state indicated by the color red due to heavy loads?

If an error occurs on the node on which primary shards are distributed, the cluster enters a state indicated by the color red. You can run the GET /_cat/indices?v command to query the distribution of shards for indexes and identify the index that is in a state indicated by the color red. Then, troubleshoot the issue based on the causes and solutions described in the following table.

Cause	Solution
The resources of the cluster are insufficient due to unbalanced loads on nodes.	Change the total number of primary and replica shards to an integral multiple of the number of data nodes in the cluster to balance loads on nodes. For more information, see [What do I do if shards are not evenly distributed on nodes in an Elasticsearch cluster?](https://www.alibabacloud.com/help/en/es/support/faq-about-alibaba-cloud-elasticsearch-clusters?spm=a2c63.p38356.0.0.40777c42mIWWwd#section-ceh-v6l-poq)
The cluster stores invalid indexes.	Clear invalid indexes on a regular basis, such as monitoring indexes whose names start with .monitoring. For more information about how to configure monitoring indexes, see [Configure monitoring indexes](https://www.alibabacloud.com/help/en/es/user-guide/configure-monitoring-indexes#task-2458007).
Shards are not allocated to nodes.	Query the reason why shards are not allocated to nodes and resolve the issue based on the actual situation. After the issue is resolved, reallocate the shards to nodes.
The cache occupies a large amount of resources.	Clear the cache.
A cluster update operation such as configuration upgrade is being performed on the cluster.	Pause the update operation and select Forced Update on the Upgrade/Downgrade page to forcefully update the cluster. For more information, see [Upgrade the configuration of a cluster](https://www.alibabacloud.com/help/en/es/user-guide/upgrade-the-configuration-of-a-cluster#task-2446863).
The resources of the cluster are insufficient because the cluster uses low specifications such as 1 vCPU and 2 GiB of memory or 2 vCPUs and 4 GiB of memory.	Upgrade the configuration of the cluster. For more information, see [Upgrade the configuration of a cluster](https://www.alibabacloud.com/help/en/es/user-guide/upgrade-the-configuration-of-a-cluster#task-2446863).
The disk usage exceeds 85%.	We recommend that you delete the historical data you no longer require or that you expand the capacity of disks. For more information, see [High disk usage and read-only indexes](https://www.alibabacloud.com/help/en/es/support/high-disk-usage-and-read-only-indexes#concept-2415081).
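
For the row about shards that are not allocated to nodes, the table does not show the commands. One plausible pair, offered as an assumption rather than as the author's exact commands, uses the cluster allocation explain API and the cluster reroute API of open source Elasticsearch. The index name test and the shard number 0 are placeholders.

```
# Assumption: ask why a specific shard is unassigned.
# "test" and shard 0 are placeholders for your own index and shard.
GET _cluster/allocation/explain
{
  "index" : "test",
  "shard" : 0,
  "primary" : true
}

# Assumption: after the cause is fixed, retry allocations that failed earlier.
POST _cluster/reroute?retry_failed=true
```

If the request body is omitted, the explain API reports on an unassigned shard that Elasticsearch picks itself and returns an error if no shard is unassigned.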

Monitoring data or an alert shows that the CPU utilization of my Elasticsearch cluster is excessively high. What do I do?

Troubleshoot the issue based on the causes and solutions described in the following table.

Cause	Solution
The number of queries or write requests per second spikes.	Reduce the number of queries or write requests per second for the cluster, reduce the amount of data to write to the cluster in parallel, or scale out or up the cluster. We recommend that you perform stress testing in the production environment and select appropriate specifications.
The cache for indexes occupies a large amount of resources.	Clear the cache for the indexes.
The cluster uses low specifications.	Upgrade the configuration of the cluster. For more information, see [Upgrade the configuration of a cluster](https://www.alibabacloud.com/help/en/es/user-guide/upgrade-the-configuration-of-a-cluster#task-2446863).
Loads on nodes in the cluster are unbalanced.	Change the total number of primary and replica shards to an integral multiple of the number of data nodes in the cluster to balance loads on nodes. For more information, see [What do I do if shards are not evenly distributed on nodes in an Elasticsearch cluster?](https://www.alibabacloud.com/help/en/es/support/faq-about-alibaba-cloud-elasticsearch-clusters?spm=a2c63.p38356.0.0.40777c42mIWWwd#section-ceh-v6l-poq)
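
The table mentions clearing the cache and checking for unbalanced loads without showing commands. The following sketch uses the clear cache API and the cat nodes API of open source Elasticsearch. It is an assumption about what fits here, and the index name test is a placeholder.

```
# Assumption: clear all caches of one index ("test" is a placeholder).
POST test/_cache/clear

# Assumption: clear only the fielddata cache of that index.
POST test/_cache/clear?fielddata=true

# Assumption: compare CPU utilization and load across nodes to spot imbalance.
GET _cat/nodes?v&h=name,cpu,load_1m,heap.percent
```

Caches are rebuilt on demand, so queries may run slower for a short time after the cache is cleared.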

What do I do if the disk usage of my Elasticsearch cluster is excessively high?

Delete invalid indexes. After the disk usage drops below 75%, forcefully upgrade the configuration of disks in the Elasticsearch console. For more information, see Upgrade the configuration of a cluster. If the disk usage of a node is excessively high, you must optimize the configuration of shards. For more information, see What do I do if shards are not evenly distributed on nodes in an Elasticsearch cluster?
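
The steps in this answer can be sketched with the following console requests. They are assumptions based on the open source Elasticsearch APIs, and old-index-name is a hypothetical placeholder for an index that you no longer need.

```
# Assumption: show disk usage and shard count for each node.
GET _cat/allocation?v

# Assumption: list indexes sorted by size on disk, largest first.
GET _cat/indices?v&s=store.size:desc

# Delete an index that is no longer needed.
# "old-index-name" is a hypothetical placeholder. Deletion cannot be undone.
DELETE old-index-name
```

Checking per-node disk usage first helps distinguish a cluster that is full overall from one in which only a few nodes are full because of uneven shard distribution.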

Monitoring data or an alert shows that the memory usage of my Elasticsearch cluster is excessively high. What do I do?

Troubleshoot the issue based on the causes and solutions described in the following table.

Cause	Solution
The cache for the cluster occupies a large amount of memory.	If the cache for the cluster occupies a large amount of memory for a short period of time, clear the cache. If the cache for the cluster occupies a large amount of memory for a long period of time, upgrade the configuration of the cluster. For more information, see [Upgrade the configuration of a cluster](https://www.alibabacloud.com/help/en/es/user-guide/upgrade-the-configuration-of-a-cluster#task-2446863). The memory usage of the cluster may periodically increase without triggering an alert. This may be caused by business fluctuations or memory reclamation of the cluster and is normal.
The read or write throughput of the cluster is high.	Stop the read or write operation, install a throttling plug-in, and then enable the throttling feature of the plug-in. For more information, see [Use the aliyun-qos plug-in](https://www.alibabacloud.com/help/en/es/user-guide/use-the-aliyun-qos-plug-in#task-2438917).
Invalid indexes occupy a large amount of memory.	Delete invalid indexes such as monitoring indexes whose names start with .monitoring to release resources. You can specify a retention duration for such indexes. For more information, see [Configure monitoring indexes](https://www.alibabacloud.com/help/en/es/user-guide/configure-monitoring-indexes#task-2458007).
Shards are not evenly distributed on nodes, and loads on nodes are unbalanced.	Change the total number of primary and replica shards to an integral multiple of the number of data nodes in the cluster and make sure that shards are evenly distributed on nodes to balance loads on the nodes. For more information, see [What do I do if shards are not evenly distributed on nodes in an Elasticsearch cluster?](https://www.alibabacloud.com/help/en/es/support/faq-about-alibaba-cloud-elasticsearch-clusters?spm=a2c63.p38356.0.0.40777c42mIWWwd#section-ceh-v6l-poq)
Abnormal queries exist. For example, a user on the business side sends a query request that contains a string with numerous special characters.	Obtain the ID of the time-consuming query task, and then obtain the detailed query statement of the task. Save and analyze the statement. To quickly cancel the query, you can call the [task cancel API](https://www.elastic.co/guide/en/elasticsearch/reference/7.10/tasks.html#task-cancellation), restart the cluster, or restart only the heavily loaded nodes in the cluster.
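
For the row about abnormal queries, the following sketch shows one way to find and cancel a long-running search with the task management API of open source Elasticsearch. The task ID node_id:12345 is a hypothetical placeholder, and these requests are an assumption rather than the author's exact commands.

```
# Assumption: list running search tasks with their descriptions and running times.
GET _tasks?actions=*search&detailed

# Assumption: fetch one task by its ID. "node_id:12345" is a placeholder.
GET _tasks/node_id:12345

# Cancel that task.
POST _tasks/node_id:12345/_cancel
```

With the detailed parameter, the description field of a search task includes the query source, which can be saved for later analysis.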

What do I do if shards are not evenly distributed on nodes in an Elasticsearch cluster?

Appropriately plan shards and reallocate shards for nodes. Make sure that the total number of primary and replica shards is an integral multiple of the number of data nodes in the cluster. This ensures that data is evenly distributed on each data node and prevents heavy loads on a node due to uneven shard distribution. The following descriptions provide examples on how to allocate primary and replica shards for nodes:

● If the cluster has three data nodes, you can configure three primary shards and one replica shard for each primary shard. The total number of primary and replica shards that you can configure is six.

● If the cluster has eight data nodes, you can configure four primary shards and one replica shard for each primary shard. The total number of primary and replica shards that you can configure is eight. Alternatively, you can configure eight primary shards and one replica shard for each primary shard. In this case, the total number of primary and replica shards that you can configure is 16.
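
The examples above follow one rule: total shards = primary shards × (1 + replica shards per primary shard), and the total should be an integral multiple of the number of data nodes. The first example can be sketched as an index creation request. The index name my-index is a hypothetical placeholder.

```
# Assumption: "my-index" is a hypothetical index on a cluster with 3 data nodes.
# Total shards = 3 primaries x (1 + 1 replica) = 6, which is a multiple of 3.
PUT my-index
{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
    }
  }
}
```

The number of primary shards is fixed when the index is created, whereas the number of replica shards can be changed later through the index settings.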

My Elasticsearch cluster is heavily loaded, and the cluster logs contain the following error message: java.lang.StackOverflowError for the entire cluster. What do I do?

The error message indicates that a stack overflow error occurs because the amount of data that Lucene writes to the stack exceeds the upper limit. This issue is related to regular expression-based queries and fuzzy matching. This issue is fixed in Elasticsearch V6.0 and later. We recommend that you upgrade the cluster to a version in which the issue is fixed at the earliest opportunity or optimize the query statement that you use. For more information, see java.lang.StackOverflowError for the entire cluster.

How do I query the size of the JVM heap memory that is allocated to an Elasticsearch cluster?

Run the command. By default, the Java Virtual Machine (JVM) heap memory of an Elasticsearch cluster is half of the memory of the cluster. You cannot change the size of the JVM heap memory of an Elasticsearch cluster.
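
The command itself is not shown above. One request that returns this information in open source Elasticsearch is the cat nodes API with heap-related columns. The column selection below is an assumption chosen for illustration.

```
# Assumption: show the maximum heap, current heap usage, and physical memory of each node.
GET _cat/nodes?v&h=name,heap.max,heap.current,heap.percent,ram.max
```

Comparing heap.max with ram.max for a node shows the ratio between the JVM heap and the memory of that node.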

Alibaba Cloud Elasticsearch provides robust features for managing complex data workloads, a user-friendly interface, and seamless scalability. With our 30-Day Free Trial, you can explore these capabilities firsthand:

Embark on Your 30-Day Free Trial

Experience Alibaba Cloud Elasticsearch today and transform your data management journey with precision, efficiency, and peace of mind.
