Cluster restart or update error - Elasticsearch - Alibaba Cloud Documentation Center

This topic describes an error that can occur when you restart or update a cluster, along with its possible causes and solutions.

Problem description

When you restart or update an Elasticsearch cluster, the system reports the following error message:

The operation cannot be performed because the cluster is unhealthy or contains indexes in the close state. We recommend that you perform the operation again after the cluster becomes healthy or the indexes are enabled.

Causes and solutions

The system reports the preceding error message if the cluster meets one or more of the following conditions:

The cluster contains indexes in the close state.
You can run the `GET /_cat/indices?v` command to view the status of indexes. If an index is in the close state, you can run the `POST /<index_name>/_open` command to set the state of the index to open.
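
As a minimal sketch of the check above, the following Python helper lists the `_open` commands needed for every index that reports the close state. It assumes the index list was fetched as JSON with `GET /_cat/indices?format=json` instead of the tabular `?v` form. The function name and the sample index names are hypothetical and used only for illustration.

```python
import json
from typing import List


def open_commands_for_closed_indices(cat_indices_json: str) -> List[str]:
    """Return one `POST /<index_name>/_open` command per index in the close state."""
    rows = json.loads(cat_indices_json)
    # Each row is a dict; closed indexes report "close" in the status field.
    return [
        f"POST /{row['index']}/_open"
        for row in rows
        if row.get("status") == "close"
    ]


# Hypothetical sample of what `GET /_cat/indices?format=json` might return.
sample = """[
  {"health": "green", "status": "open", "index": "logs-2024.06"},
  {"status": "close", "index": "logs-2023.01"}
]"""

for command in open_commands_for_closed_indices(sample):
    print(command)
```

The helper only builds the command strings. You would still run each command against the cluster yourself.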

The health status of the cluster is red or yellow.

You can run the `GET /_cat/health?v` command to view the status of the cluster. The following table describes the common causes and solutions.

Cause	Solution
Automatic shard allocation failed, and the maximum number of retries is reached. A maximum of 5 retries is allowed.	We recommend that you run the `POST /_cluster/reroute?retry_failed=true` command to retry the allocation of the failed shards.
The primary and replica shards of an index are allocated to the same node, which is indicated by the following error message: the shard cannot be allocated to the same node on which a copy of the shard already exists.	We recommend that you set the number of replica shards to 0 and set it back to 1 after the cluster becomes normal.
The maximum number of shards that can be simultaneously allocated is reached.	Wait until shards are allocated. If shards are still not allocated after a period of time, you can run the `GET _cluster/allocation/explain` command to view the reason why shards are not allocated.
One or more nodes in the cluster are disconnected.	Run the `GET _cat/nodes?v` command to check whether one or more nodes in the cluster are disconnected. We recommend that you restart the nodes that are disconnected.
The disk usage of a node in the cluster is high.	After the disk usage of the node drops below 85%, we recommend that you restart the node so that the diagnostic results return to normal.
The heap memory usage of the cluster is high, and the operation is suspended.	We recommend that you throttle read and write traffic and close historical indexes to reduce memory consumption.
Other causes	If the cluster contains shards that are not allocated, you can view the CPU utilization and heap memory usage of the cluster and run the `GET _cluster/allocation/explain` command to obtain the reason why shards are not allocated.
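
The replica workaround in the table above can be sketched as two index settings updates. The following Python snippet only builds the requests. It assumes the standard Elasticsearch update index settings API (`PUT /<index_name>/_settings`) with the `index.number_of_replicas` setting. The function name and the index name `my_index` are hypothetical.

```python
import json
from typing import List, Tuple


def replica_reset_requests(
    index_name: str, restored_replicas: int = 1
) -> List[Tuple[str, str, str]]:
    """Build the two settings updates: drop replicas to 0, then restore them."""
    path = f"/{index_name}/_settings"

    def body(replicas: int) -> str:
        return json.dumps({"index": {"number_of_replicas": replicas}})

    return [
        ("PUT", path, body(0)),                  # step 1: remove replica shards
        ("PUT", path, body(restored_replicas)),  # step 2: run only after the cluster is healthy
    ]


for method, path, payload in replica_reset_requests("my_index"):
    print(method, path, payload)
```

Send the second request only after `GET /_cat/health?v` shows that the cluster has recovered.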

The cluster is in a normal state but is heavily loaded.

The following table describes the common troubleshooting methods, causes, and solutions.

Troubleshooting method	Cause	Solution
View the monitoring data for disk usage, run the `GET _cat/allocation` and `GET _cluster/allocation/explain` commands, and view logs.	The disk usage reaches 85%.	If the disk usage reaches 85%, shard creation may be affected. We recommend that you delete historical indexes, expand disks, or set the number of replica shards to 0. You can then view the monitoring data for disk usage to check whether the disk usage drops below 85%.
View the monitoring data for CPU utilization and the information about hot threads.	The CPU utilization reaches 85%.	If the CPU utilization reaches 85%, cluster stability may be affected. You can view the monitoring data for read QPS and write QPS and reduce traffic, scale out the cluster, or upgrade the configuration of the cluster.
View the monitoring data for heap memory usage, logs, and the monitoring data of the old gc collection count and old gc collecting.ms metrics.	The heap memory usage is greater than or equal to 75%.	If the heap memory usage is excessively high, cluster stability may be affected. We recommend that you reduce read and write traffic, upgrade the configuration of the cluster, or close historical indexes to reduce memory consumption.
View the monitoring data of the NodeLoad_1m(value) metric.	The value of the NodeLoad_1m(value) metric for a node is greater than the number of vCPUs for the node.	If the value of the NodeLoad_1m(value) metric for a node is greater than the number of vCPUs for the node, the node is heavily loaded. You can view the monitoring data for read QPS, write QPS, and disk throughput and reduce read or write traffic, scale out the cluster, or upgrade the configuration of the cluster at the earliest opportunity.
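
The thresholds in the table above can be summarized as a simple check. The following Python snippet is a minimal sketch that takes metric values you have already read from the monitoring data. The `NodeMetrics` class, its field names, and the `overload_findings` function are hypothetical and are not part of any Alibaba Cloud or Elasticsearch API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeMetrics:
    disk_usage_pct: float       # disk usage, in percent
    cpu_utilization_pct: float  # CPU utilization, in percent
    heap_usage_pct: float       # heap memory usage, in percent
    load_1m: float              # value of the NodeLoad_1m(value) metric
    vcpus: int                  # number of vCPUs of the node


def overload_findings(m: NodeMetrics) -> List[str]:
    """Return the names of the load conditions from the table that the node meets."""
    findings = []
    if m.disk_usage_pct >= 85:
        findings.append("disk")
    if m.cpu_utilization_pct >= 85:
        findings.append("cpu")
    if m.heap_usage_pct >= 75:
        findings.append("heap")
    if m.load_1m > m.vcpus:
        findings.append("load")
    return findings


print(overload_findings(NodeMetrics(90.0, 40.0, 75.0, 3.5, 4)))
```

An empty result means that the node meets none of the four conditions listed in the table.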

Note

For more information about metrics, see Metrics and exception handling suggestions.
For more information about logs, see Query logs.
