Elasticsearch clusters are dynamic systems that require close monitoring, especially of disk usage. Exceeding the configured watermark levels can trigger errors and disrupt the normal operation of an Elasticsearch instance. In this article, we walk through diagnosing and resolving high disk usage on Alibaba Cloud Elasticsearch, which provides robust features for handling such scenarios efficiently.
The Elasticsearch cluster uses three watermark levels to safeguard against excessive disk usage:
Property Name | Default Value | Description |
---|---|---|
cluster.routing.allocation.disk.watermark.low | 85% | Low watermark: Elasticsearch stops allocating new shards to the node |
cluster.routing.allocation.disk.watermark.high | 90% | High watermark: Elasticsearch attempts to relocate shards away from the node |
cluster.routing.allocation.disk.watermark.flood_stage | 95% | Flood-stage watermark: indices with a shard on the node are made read-only |
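You can confirm the values currently in effect on your cluster, including the built-in defaults, with the cluster settings API (flat_settings makes the watermark keys easy to spot):

```console
GET _cluster/settings?include_defaults=true&flat_settings=true
```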
When disk usage on a node reaches the flood-stage watermark (95% by default), Elasticsearch applies a read-only block (index.blocks.read_only_allow_delete) to every index that has at least one shard on that node. This stops further writes to those indices so the disk cannot fill up completely.
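Before changing any settings, it helps to confirm which nodes are actually close to a watermark. The cat allocation API reports per-node disk usage, and the s parameter sorts the output (here by disk.percent, descending):

```console
GET _cat/allocation?v=true&s=disk.percent:desc
```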
To begin troubleshooting, first check how shards are allocated across the cluster:

```console
GET _cat/shards?v=true
```
If shards are still allocated to a node nearing full capacity, further investigation is needed using the allocation explain API:
```console
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false,
  "current_node": "my-node"
}
```
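The response indicates whether the shard can stay where it is and which allocation decider is responsible. An abridged, illustrative excerpt of what a disk-pressure result can look like (field values here are examples, not output from a real cluster):

```console
{
  "can_remain_on_current_node": "no",
  "can_remain_decisions": [
    {
      "decider": "disk_threshold",
      "decision": "NO",
      "explanation": "the shard cannot remain on this node because it is above the high disk watermark ..."
    }
  ]
}
```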
To resume writing immediately, you might consider temporarily increasing the watermark levels and removing write blocks:
```console
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
```
And to remove the read-only block:
```console
PUT */_settings?expand_wildcards=all
{
  "index.blocks.read_only_allow_delete": null
}
```
It is important to note that while this immediate response quickly restores write functionality, it is only an interim measure: the underlying disk pressure remains.
For a sustainable resolution, consider adding nodes to your cluster or expanding the disks of existing nodes. For instance, if a node holding the data_hot role is overwhelmed, you might add more hot-tier nodes or increase the disk size of the existing ones.
Deleting older, unnecessary indices can also free up disk space. Keep in mind that deletion is irreversible unless you have a snapshot:

```console
DELETE my-index
```
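To choose candidates for deletion, you can first list indices sorted by their on-disk size (the s sort parameter is supported by the cat APIs):

```console
GET _cat/indices?v=true&s=store.size:desc
```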
Once you've implemented a long-term solution, you can reset the disk watermark levels:
```console
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null
  }
}
```
Avoiding disk space crunches requires proactive measures: monitor disk usage regularly and configure alerts, and plan shard distribution with the index.routing.allocation.total_shards_per_node setting so that no single node accumulates a disproportionate share of shards.
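As a sketch, the per-index form of this setting can be applied as follows (the index name my-index and the limit of 2 are illustrative values; choose a limit based on your node count and shard sizes):

```console
PUT my-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}
```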
We invite users to share their experiences and solutions regarding disk space issues in Elasticsearch.
Alibaba Cloud's Elasticsearch offers a comprehensive environment to implement and manage your indices effectively, handling high-volume data with ease. Getting started is straightforward with tailored cloud solutions that provide everything necessary to transform your data architecture.
Embark on Your 30-Day Free Trial with Alibaba Cloud Elasticsearch and elevate your data operations today.