When your Alibaba Cloud Elasticsearch cluster's disk usage exceeds 85%, the cluster or Kibana may stop providing services. This troubleshooting guide helps you quickly restore write access and prevent recurrence.
Disclaimer: This topic may contain information about third-party products. Such information is only for reference. Alibaba Cloud does not make any guarantee, express or implied, with respect to the performance and reliability of third-party products, as well as potential impacts of operations on the products.
Symptoms
- Error message: FORBIDDEN/12/index read-only / allow delete (api)
- Cluster health status shows red. In severe cases, some nodes do not join the cluster. Run GET /_cat/nodes?v to view the nodes in the cluster. Some shards are unassigned. Run GET /_cat/allocation?v to view shard allocation.
  Note: If cluster health status is red, at least one primary shard is unavailable, and data may be lost.
- Kibana returns "internal server error" when creating ingest pipelines or enrolling Beats.
- Cluster or Kibana monitoring shows disk usage approaching 100%.
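The shard-allocation check in the symptoms above can also be scripted. The following is a minimal sketch, assuming the request is sent as GET /_cat/allocation?format=json so that the response is a JSON array with "shards", "disk.percent", and "node" fields. The function name summarize_allocation and the sample rows are hypothetical and are not real cluster output.

```python
import json


def summarize_allocation(cat_allocation_json):
    """Summarize the JSON form of GET /_cat/allocation?format=json.

    Returns (disk_percent_by_node, unassigned_shards).
    """
    rows = json.loads(cat_allocation_json)
    disk_percent_by_node = {}
    unassigned_shards = 0
    for row in rows:
        node = row.get("node")
        shards = int(row.get("shards") or 0)
        if node == "UNASSIGNED":
            # Assumption: unassigned shards appear as a row whose node is
            # "UNASSIGNED" and whose disk fields are empty.
            unassigned_shards += shards
            continue
        percent = row.get("disk.percent")
        if percent is not None:
            disk_percent_by_node[node] = int(percent)
    return disk_percent_by_node, unassigned_shards


# Illustrative sample only; node names and values are made up.
sample = json.dumps([
    {"shards": "12", "disk.percent": "96", "node": "es-node-1"},
    {"shards": "11", "disk.percent": "72", "node": "es-node-2"},
    {"shards": "4", "disk.percent": None, "node": "UNASSIGNED"},
])
usage, unassigned = summarize_allocation(sample)
print(usage, unassigned)
```

A non-zero unassigned count together with a node near 100% disk usage matches the symptoms listed above.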
Root cause
Disk usage thresholds
Elasticsearch monitors disk usage and takes automatic actions at three watermark levels:
- 85% (low watermark): The system stops allocating new shards to the node.
- 90% (high watermark): The system attempts to relocate shards from the node to nodes with lower disk usage.
- 95% (flood stage): The system forces every index that has a shard on the node to read-only by adding the index.blocks.read_only_allow_delete attribute. Write operations fail.
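The three thresholds can be expressed as a small classifier. This is a sketch that assumes the percentages listed above are in effect on your cluster and that a threshold is crossed only when usage is strictly greater than it. The constant names and the function watermark_stage are hypothetical.

```python
# Thresholds from the list above; adjust if your cluster overrides them.
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0
FLOOD_STAGE = 95.0


def watermark_stage(disk_used_percent):
    """Return the highest watermark that the given disk usage exceeds."""
    # Assumption: usage exactly equal to a threshold does not cross it.
    if disk_used_percent > FLOOD_STAGE:
        return "flood_stage"  # indexes on the node become read-only
    if disk_used_percent > HIGH_WATERMARK:
        return "high"  # shards are relocated away from the node
    if disk_used_percent > LOW_WATERMARK:
        return "low"  # no new shards are allocated to the node
    return "ok"


for percent in (80, 87, 92, 97):
    print(percent, watermark_stage(percent))
```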
Quick fix (10-15 minutes)
- Delete old or unused indexes to free disk space:
  Warning: Deleted data cannot be restored. To preserve data, consider increasing storage capacity instead.
    curl -u <username>:<password> -XDELETE http://<host>:<port>/<index-name>
  - <host>: Your cluster's internal or public endpoint. Configure the access allowlist before running this command.
  - If the cluster is unresponsive, trigger a forced restart and run this command during the restart.
- Even after you free disk space, indexes may remain read-only. Remove the lock by setting the index.blocks.read_only_allow_delete attribute to null:
    PUT /_all/_settings
    {
      "index.blocks.read_only_allow_delete": null
    }
- Verify cluster health status. If the status is red, run GET /_cat/allocation?v to check for unassigned shards.
- If the cluster has unassigned shards, run GET /_cluster/allocation/explain to determine the cause. If the output indicates exhausted allocation retries, run POST /_cluster/reroute?retry_failed=true.
- If cluster health status remains red after all recovery steps, contact Alibaba Cloud technical support.
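The recovery steps above can be prepared as a sequence of HTTP requests. The sketch below only builds the requests with the Python standard library and does not send them. It assumes HTTP basic authentication, as in the curl command. The helper names build_request, recovery_requests, and retry_failed_request, along with the endpoint, credentials, and index name in the usage lines, are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_request(base_url, method, path, username, password, body=None):
    """Build (but do not send) an authenticated Elasticsearch REST request."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode()
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def recovery_requests(base_url, username, password, index_to_delete):
    """Return the delete, unlock, and verification requests, in order."""
    auth = (username, password)
    return [
        # Step 1: free disk space. Deleted data cannot be restored.
        build_request(base_url, "DELETE", f"/{index_to_delete}", *auth),
        # Step 2: clear the read-only block (None serializes to JSON null).
        build_request(base_url, "PUT", "/_all/_settings", *auth,
                      body={"index.blocks.read_only_allow_delete": None}),
        # Step 3: check shard allocation.
        build_request(base_url, "GET", "/_cat/allocation?v", *auth),
    ]


def retry_failed_request(base_url, username, password):
    """Step 4, only needed when allocation retries are exhausted."""
    return build_request(base_url, "POST",
                         "/_cluster/reroute?retry_failed=true",
                         username, password)


# Placeholder values; replace with your own endpoint and credentials.
reqs = recovery_requests("http://localhost:9200", "elastic", "secret", "logs-2023.01")
for req in reqs:
    print(req.get_method(), req.full_url)
```

To run a step, pass the request to urllib.request.urlopen(req) and inspect the response before moving on to the next step. Keeping request construction separate from sending makes it possible to review the exact calls, the DELETE in particular, before anything is executed.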
Prevention
To avoid future incidents, enable disk usage monitoring and set up alerts when disk usage exceeds 80%. Configure alert channels to ensure alerts reach the operations team promptly. For setup instructions, see Configure monitoring and alerting.