This topic describes an error that can occur when you restart or update a cluster, along with its possible causes and solutions.

Problem description

When you restart or update an Elasticsearch cluster, the system reports the following error message:

The operation cannot be performed because the cluster is unhealthy or contains indexes in the close state. We recommend that you perform the operation again after the cluster becomes healthy or the indexes are enabled.

Causes and solutions

The system reports the preceding error message if the cluster meets one or more of the following conditions:
  • The cluster contains indexes in the close state.

    You can run the GET /_cat/indices?v command to view the status of indexes. If an index is in the close state, you can run the POST /<index_name>/_open command to set the state of the index to open.
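
    The check above can be sketched in Python. This is a minimal illustration, not an official tool: it assumes the rows were already fetched with GET /_cat/indices?format=json and parsed into dictionaries with "index" and "status" keys. The function names closed_indexes and open_requests and the sample index names are hypothetical.

```python
from urllib.parse import quote


def closed_indexes(cat_rows):
    """Return the names of indexes whose status column is 'close'."""
    return [row["index"] for row in cat_rows if row.get("status") == "close"]


def open_requests(cat_rows):
    """Build one (method, path) pair per closed index for the _open API."""
    return [
        ("POST", "/" + quote(name, safe="") + "/_open")
        for name in closed_indexes(cat_rows)
    ]


# Hypothetical sample shaped like the JSON output of GET /_cat/indices?format=json.
sample_rows = [
    {"index": "logs-2023.01", "status": "close"},
    {"index": "logs-2024.01", "status": "open"},
]

for method, path in open_requests(sample_rows):
    print(method, path)
```

    The sketch only builds the requests. Sending them (for example, from the Kibana console) is left to you, so that no index is opened without review.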

  • The health status of the cluster is red or yellow.
    You can run the GET /_cat/health?v command to view the status of the cluster. The common causes and their solutions are as follows.

    Cause: Shards are automatically allocated, and the maximum number of retries is reached. A maximum of 5 retries are allowed.
    Solution: We recommend that you run the POST /_cluster/reroute?retry_failed=true command to reallocate shards.

    Cause: The primary and replica shards of an index are allocated to the same node, which is indicated by the following error message: the shard cannot be allocated to the same node on which a copy of the shard already exists.
    Solution: We recommend that you set the number of replica shards to 0 and set it back to 1 after the cluster becomes normal.

    Cause: The maximum number of shards that can be simultaneously allocated is reached.
    Solution: Wait until shards are allocated. If shards are still not allocated after a period of time, you can run the GET _cluster/allocation/explain command to view the reason why shards are not allocated.

    Cause: One or more nodes in the cluster are disconnected.
    Solution: Run the GET _cat/nodes?v command to check whether one or more nodes in the cluster are disconnected. We recommend that you restart the nodes that are disconnected.

    Cause: The disk usage of a node in the cluster is high.
    Solution: After the disk usage of the node drops below 85%, we recommend that you restart the node to make the diagnostic results normal.

    Cause: The heap memory usage of the cluster is high, and the operation is suspended.
    Solution: We recommend that you perform throttling and set the states of historical indexes to close to reduce memory consumption.

    Cause: Other causes.
    Solution: If the cluster contains shards that are not allocated, you can view the CPU utilization and heap memory usage of the cluster and run the GET _cluster/allocation/explain command to obtain the reason why shards are not allocated.
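
    Two of the solutions above can be expressed as request builders. The following Python sketch is illustrative only: it assumes the health rows come from GET /_cat/health?format=json and carry a "status" key, and that the replica count is changed through the index settings API (PUT /<index_name>/_settings with index.number_of_replicas). The names is_unhealthy, replica_settings_request, and RETRY_FAILED_REQUEST, as well as the index name, are hypothetical.

```python
import json

# Request that retries shard allocations that hit the retry limit.
RETRY_FAILED_REQUEST = ("POST", "/_cluster/reroute?retry_failed=true", None)


def is_unhealthy(health_rows):
    """Return True if any row reports a red or yellow cluster status."""
    return any(row.get("status") in ("red", "yellow") for row in health_rows)


def replica_settings_request(index_name, replicas):
    """Build a (method, path, body) triple that sets the replica count."""
    body = {"index": {"number_of_replicas": replicas}}
    return ("PUT", f"/{index_name}/_settings", json.dumps(body))


# Hypothetical sample shaped like the JSON output of GET /_cat/health?format=json.
sample_health = [{"cluster": "es-demo", "status": "yellow"}]

if is_unhealthy(sample_health):
    # Drop replicas to 0 first; set them back to 1 once the cluster is normal.
    print(replica_settings_request("my-index", 0))
    print(replica_settings_request("my-index", 1))
```

    Keeping the two replica requests separate reflects the procedure in the text: the second request is meant to be sent only after the cluster returns to a normal state.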
  • The cluster is in a normal state but is heavily loaded.
    The common troubleshooting methods, causes, and solutions are as follows.

    Troubleshooting method:
    • View the monitoring data for disk usage.
    • Run the GET _cat/allocation command.
    • Run the GET _cluster/allocation/explain command.
    • View logs.
    Cause: The disk usage reaches 85%.
    Solution: If the disk usage reaches 85%, shard creation may be affected. We recommend that you perform one or more of the following operations to resolve this issue. After you perform the operations, you can view the monitoring data for the disk usage to check whether the disk usage drops below 85%.
    • Delete historical indexes.
    • Expand disks.
    • Set the number of replica shards to 0.

    Troubleshooting method: View the monitoring data for CPU utilization and the information about hot threads.
    Cause: The CPU utilization reaches 85%.
    Solution: If the CPU utilization reaches 85%, cluster stability may be affected. You can view the monitoring data for read QPS and write QPS and reduce traffic, scale out the cluster, or upgrade the configuration of the cluster.

    Troubleshooting method: View the monitoring data for heap memory usage, logs, and the monitoring data of the old gc collection count and old gc collecting.ms metrics.
    Cause: The heap memory usage is greater than or equal to 75%.
    Solution: If the heap memory usage is excessively high, cluster stability may be affected. We recommend that you perform one or more of the following operations to resolve the issue:
    • Reduce read and write traffic.
    • Upgrade the configuration of the cluster.
    • Set the states of historical indexes to close to reduce memory consumption.

    Troubleshooting method: View the monitoring data of the NodeLoad_1m(value) metric.
    Cause: The value of the NodeLoad_1m(value) metric for a node is greater than the number of vCPUs for the node.
    Solution: If the value of the NodeLoad_1m(value) metric for a node is greater than the number of vCPUs for the node, the node is heavily loaded. You can view the monitoring data for read QPS, write QPS, and disk throughput and reduce read or write traffic, scale out the cluster, or upgrade the configuration of the cluster at the earliest opportunity.
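
    The four load thresholds above can be combined into a single check. The following Python sketch is a hedged illustration: the function name load_warnings and its parameters are hypothetical, and the input values are assumed to be read manually from the monitoring data rather than fetched by the code.

```python
def load_warnings(disk_pct, cpu_pct, heap_pct, load_1m, vcpus):
    """Return the list of load conditions from the text that a node meets."""
    warnings = []
    if disk_pct >= 85:   # disk usage reaches 85%
        warnings.append("disk")
    if cpu_pct >= 85:    # CPU utilization reaches 85%
        warnings.append("cpu")
    if heap_pct >= 75:   # heap memory usage is greater than or equal to 75%
        warnings.append("heap")
    if load_1m > vcpus:  # NodeLoad_1m exceeds the number of vCPUs
        warnings.append("load")
    return warnings


# Hypothetical readings for one node with 4 vCPUs.
print(load_warnings(disk_pct=88.0, cpu_pct=40.0, heap_pct=76.5, load_1m=3.2, vcpus=4))
```

    An empty list means that none of the listed conditions applies. Any other result points to the matching entry above for the recommended operations.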
    Note