This topic describes how to use the index lifecycle management (ILM) feature to separate hot data and cold data in an Alibaba Cloud Elasticsearch cluster that uses the hot-warm architecture. Separating hot and cold data improves the read and write performance of the cluster, automates the maintenance of hot and cold data, and reduces costs.
Background information
Phase | Description |
---|---|
hot | The index receives time series data in real time. When the number of documents, the data size, or the age of the index reaches a specified threshold, the index can be rolled over to a new index by using the rollover API. |
warm | Data is no longer written to the index, but the index can still be queried. |
cold | The index is no longer updated and is queried infrequently. Queries on the index may be slower. |
delete | The index is deleted. |
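In the hot phase, the rollover thresholds are evaluated together: a rollover is triggered as soon as any one of the configured max_docs, max_size, or max_age conditions is met. The following Python sketch models that check. The function name `should_roll_over` and the use of plain integers (bytes and seconds) are illustrative assumptions, not part of any Elasticsearch API.

```python
from typing import Optional


def should_roll_over(
    doc_count: int,
    size_bytes: int,
    age_seconds: float,
    max_docs: Optional[int] = None,
    max_size_bytes: Optional[int] = None,
    max_age_seconds: Optional[float] = None,
) -> bool:
    """Return True if any configured rollover threshold is reached.

    Hypothetical helper that mirrors the "any condition" semantics of the
    max_docs, max_size, and max_age rollover conditions. Thresholds left
    as None are ignored.
    """
    checks = [
        (max_docs, doc_count),
        (max_size_bytes, size_bytes),
        (max_age_seconds, age_seconds),
    ]
    return any(limit is not None and value >= limit for limit, value in checks)


# An index with 1,200 documents rolls over when max_docs is 1000,
# even though its size and age are still below their limits.
print(should_roll_over(1200, 10_000, 60, max_docs=1000,
                       max_size_bytes=50 * 1024**3, max_age_seconds=86400))
```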
You can use one of the following methods to attach an ILM policy to one or more indexes:
- Attach the ILM policy to an index template: The ILM policy takes effect on all indexes that are created from the template and share the same rollover alias, including the new indexes generated during rollovers. This method is used in this topic.
- Attach the ILM policy to a single index: The ILM policy takes effect only on that index. The new index generated during a rollover is not managed by the ILM policy.
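With the template method, the policy is attached through index settings in the template, and the first index is created with the rollover alias marked as its write index. The Python sketch below only builds the JSON request bodies for `PUT _template/<template-name>` and for creating the initial index, which you can then paste into the Kibana console. The names `myindex`, `myindex-*`, and `hot-warm-policy`, and the node attribute `box_type` with the value `hot`, are assumptions for illustration; check the attribute name and values that your cluster actually uses.

```python
import json


def build_template_body(index_pattern: str, policy_name: str, rollover_alias: str) -> dict:
    """Body for PUT _template/<template-name> (legacy index template API)."""
    return {
        "index_patterns": [index_pattern],
        "settings": {
            # Attach the ILM policy to every index created from this template.
            "index.lifecycle.name": policy_name,
            # Alias that the rollover action switches to the new index.
            "index.lifecycle.rollover_alias": rollover_alias,
            # Assumption: hot nodes carry the node attribute box_type=hot.
            "index.routing.allocation.require.box_type": "hot",
        },
    }


def build_initial_index_body(rollover_alias: str) -> dict:
    """Body for PUT <prefix>-000001: makes the alias point at this index for writes."""
    return {"aliases": {rollover_alias: {"is_write_index": True}}}


if __name__ == "__main__":
    # Hypothetical names; replace them with your own.
    print(json.dumps(build_template_body("myindex-*", "hot-warm-policy", "myindex"), indent=2))
    print(json.dumps(build_initial_index_body("myindex"), indent=2))
```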
Applying ILM to time series data and separating hot data from cold data can significantly reduce data storage costs. This topic uses the following business scenario as an example:
- Data is written to an index in the Elasticsearch cluster in real time. When the volume of data in the index reaches a specified threshold, the system rolls the index over: a new index is created, and subsequent data is written to the new index.
- The original index stays in the hot phase for 30 minutes after the rollover and then enters the warm phase.
- In the warm phase, the system shrinks the original index and merges its segments. The index enters the cold phase 1 hour after the rollover.
- In the cold phase, the system migrates the index from hot nodes to warm nodes to separate hot data from cold data. The index is deleted 2 hours after the rollover.
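The scenario above maps onto one ILM policy with four phases. The Python sketch below builds a possible body for `PUT _ilm/policy/<policy-name>`, for example `hot-warm-policy` (a hypothetical name). The rollover thresholds passed in the usage example (1,000 documents, 50gb, 1d) are illustrative values, not requirements, and the `box_type` node attribute is an assumption about how warm nodes are tagged in your cluster. For an index that has been rolled over, `min_age` is measured from the rollover time, which matches the 30-minute, 1-hour, and 2-hour timings in the scenario.

```python
import json


def build_policy_body(max_docs: int, max_size: str, max_age: str) -> dict:
    """Body for PUT _ilm/policy/<policy-name> matching the example scenario."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over when any of these thresholds is reached.
                        "rollover": {
                            "max_docs": max_docs,
                            "max_size": max_size,
                            "max_age": max_age,
                        }
                    }
                },
                "warm": {
                    # Enter the warm phase 30 minutes after the rollover.
                    "min_age": "30m",
                    "actions": {
                        "shrink": {"number_of_shards": 1},
                        "forcemerge": {"max_num_segments": 1},
                    },
                },
                "cold": {
                    "min_age": "1h",
                    "actions": {
                        # Assumption: warm nodes carry the node attribute box_type=warm.
                        "allocate": {"require": {"box_type": "warm"}}
                    },
                },
                "delete": {"min_age": "2h", "actions": {"delete": {}}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_policy_body(1000, "50gb", "1d"), indent=2))
```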
Precautions
- Configure ILM policies based on your business requirements. For example, we recommend that you use different aliases and ILM policies for indexes that have different structures. This makes the indexes easier to manage.
- If you want to use the rollover feature, the name of the initial index must end with a hyphen followed by a six-digit number, such as -000001, that is incremented at each rollover. Otherwise, ILM policies cannot take effect. For example, if the initial index is named myindex-000001, the new index generated during a rollover is named myindex-000002. If the names of your indexes do not meet this requirement, we recommend that you reindex the data into indexes whose names do.
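The naming rule can be checked before you create the initial index. Elasticsearch derives the new index name by incrementing the trailing number and zero-padding it to six digits. The Python sketch below reproduces that rule; `next_rollover_name` is a hypothetical helper for illustration, not an Elasticsearch function.

```python
import re

# Name must end with "-" followed by digits, e.g. myindex-000001.
_ROLLOVER_NAME = re.compile(r"^(?P<prefix>.+)-(?P<counter>\d+)$")


def next_rollover_name(index_name: str) -> str:
    """Return the name that a rollover would give the next index.

    Raises ValueError if the name does not end with "-" followed by digits.
    """
    match = _ROLLOVER_NAME.match(index_name)
    if match is None:
        raise ValueError(f"{index_name!r} does not end with -<number>, e.g. -000001")
    counter = int(match.group("counter")) + 1
    # The incremented counter is zero-padded to six digits.
    return f"{match.group('prefix')}-{counter:06d}"


print(next_rollover_name("myindex-000001"))  # myindex-000002
```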
- Data is written only to indexes in the hot phase. To keep data in chronological order, we recommend that you do not write data to indexes in the warm or cold phase. For example, configure the shrink or readonly action for the warm phase so that indexes become read-only after they enter the warm phase.
Procedure
Step 1: Create an Alibaba Cloud Elasticsearch cluster that uses the hot-warm architecture and view the hot or warm attribute of nodes in the cluster
An Elasticsearch cluster that uses the hot-warm architecture contains both hot nodes and warm nodes. This architecture improves the performance and stability of Elasticsearch clusters. The following table describes the differences between hot nodes and warm nodes.
Node type | Stored data | Read and write performance | Node specifications | Disk |
---|---|---|---|---|
Hot node | Recent data, such as log data from the last two days | High | High, such as 32 vCPUs and 64 GiB of memory | We recommend that you use a standard SSD. Specify the storage space based on the volume of data. |
Warm node | Historical data, such as log data older than two days | Low | Low, such as 8 vCPUs and 32 GiB of memory | We recommend that you use an ultra disk. Specify the storage space based on the volume of data. |
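To check which nodes are hot and which are warm, you can run `GET _cat/nodeattrs?v&h=node,attr,value` in the Kibana console. The Python sketch below groups the nodes in that text output by attribute value. It assumes that the hot/warm attribute is named `box_type` and that node names contain no spaces; the node names in the usage example are made up.

```python
from collections import defaultdict


def group_nodes_by_attr(cat_output: str, attr_name: str = "box_type") -> dict:
    """Group node names by the value of one node attribute.

    Expects the text returned by GET _cat/nodeattrs?v&h=node,attr,value:
    a header line followed by whitespace-separated node, attr, value columns.
    """
    groups = defaultdict(list)
    lines = cat_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row printed because of ?v
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        node, attr, value = parts
        if attr == attr_name:  # other attributes are ignored
            groups[value].append(node)
    return dict(groups)


sample = """node       attr     value
es-node-1  box_type hot
es-node-2  box_type warm
es-node-3  box_type hot"""
print(group_nodes_by_attr(sample))
```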
Step 2: Configure an ILM policy for indexes
Step 3: Verify data distribution
Step 4: Update the ILM policy
Step 5: Switch the ILM policy
FAQ
Q: How do I configure a check interval for an ILM policy?
A: The system periodically checks for indexes that match an ILM policy. The default check interval is 10 minutes. If a matched index meets the rollover conditions, the system rolls the index over. For example, if you set max_docs to 1000 when you create an ILM policy, the system triggers a rollover when a check finds that the index contains at least 1,000 documents. You can change the check interval by configuring the indices.lifecycle.poll_interval parameter so that indexes are rolled over in a timely manner.
Notice: Set this parameter to an appropriate value. An overly small value may overload nodes.
In this example, this parameter is set to 1m.
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m"
  }
}
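Because conditions are only evaluated at each check, an index can keep receiving documents after it reaches max_docs until the next check runs. The Python sketch below estimates that overshoot as ingest rate multiplied by the poll interval. It is a rough model that ignores the time the rollover itself takes, and the 10 documents per second used in the usage example is a hypothetical ingest rate. `parse_time_value` handles only simple values such as 500ms, 30s, 1m, 10m, 1h, and 1d.

```python
_UNITS = [("ms", 0.001), ("d", 86400), ("h", 3600), ("m", 60), ("s", 1)]


def parse_time_value(value: str) -> float:
    """Convert a simple Elasticsearch-style time value such as '10m' to seconds."""
    for suffix, seconds in _UNITS:  # "ms" is checked before "m" and "s"
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * seconds
    raise ValueError(f"unsupported time value: {value!r}")


def max_extra_docs(docs_per_second: float, poll_interval: str) -> float:
    """Rough upper bound on documents written after max_docs is reached but
    before the next ILM check detects it (ignores rollover execution time)."""
    return docs_per_second * parse_time_value(poll_interval)


# Compare the default 10-minute interval with the 1-minute interval set above.
for interval in ("10m", "1m"):
    print(interval, max_extra_docs(10, interval))
```

At a steady 10 documents per second, shortening the interval from 10m to 1m reduces the possible overshoot from about 6,000 to about 600 documents, at the cost of running checks ten times as often.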