During off-peak hours, or when the volume of data stored in your cluster decreases, you can remove data nodes to scale in the cluster. This topic describes how to remove data nodes from a cluster.
Prerequisites
The following operation is performed:
In the Kibana console of your cluster, check whether your cluster stores indexes in the close state. If it does, you must open those indexes. Otherwise, the scale-in fails.
Run the following command to view the statuses of indexes:
GET /_cat/indices?v
Run the following command to open an index in the close state:
POST /<index_name>/_open
Replace <index_name> with the name of the index in the close state.
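The check above can also be scripted. The following is a minimal sketch, assuming the index list was fetched as JSON (for example with GET /_cat/indices?format=json) and parsed into a list of dictionaries. The function name closed_indices and the sample data are hypothetical and only illustrate the filtering step.

```python
def closed_indices(cat_indices):
    """Return the names of indexes whose status is "close".

    cat_indices: list of dicts as returned by GET /_cat/indices?format=json,
    each with at least the "index" and "status" keys.
    """
    return sorted(row["index"] for row in cat_indices if row.get("status") == "close")


# Hypothetical sample data for illustration only.
sample = [
    {"index": "logs-2023", "status": "open"},
    {"index": "logs-2021", "status": "close"},
    {"index": "logs-2022", "status": "close"},
]

for name in closed_indices(sample):
    # Each of these indexes would need to be opened before the scale-in.
    print(f"POST /{name}/_open")
```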
Precautions
After you remove data nodes from a cluster, the system restarts the cluster. The time required for the restart varies based on the size, data volume, and load of your cluster. We recommend that you remove data nodes during off-peak hours.
If the indexes of your cluster have replica shards and the load of your cluster is normal, your cluster can still provide services during a restart. The load of a cluster is normal if the CPU utilization of each node in the cluster is about 60%, the heap memory usage of each node in the cluster is about 50%, and the value of NodeLoad_1m for each node is less than the number of vCPUs for the node.
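The load criteria above can be expressed as a simple check. This is a sketch only: it assumes that "about 60%" and "about 50%" are treated as upper bounds, and the NodeLoad type and is_load_normal function are hypothetical names, not part of any Elasticsearch or console API.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    cpu_percent: float   # CPU utilization of the node
    heap_percent: float  # heap memory usage of the node
    load_1m: float       # NodeLoad_1m value of the node
    vcpus: int           # number of vCPUs of the node


def is_load_normal(nodes, cpu_limit=60.0, heap_limit=50.0):
    """Return True if every node stays within the limits described above.

    Assumption: the "about 60%" and "about 50%" figures are used as upper bounds.
    """
    return all(
        n.cpu_percent <= cpu_limit
        and n.heap_percent <= heap_limit
        and n.load_1m < n.vcpus
        for n in nodes
    )


# Hypothetical metrics for a two-node cluster.
nodes = [NodeLoad(45, 40, 1.2, 4), NodeLoad(58, 49, 3.9, 4)]
print(is_load_normal(nodes))
```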
To ensure the high availability of an Elasticsearch cluster, the replica shards and primary shards of an index in the cluster must be distributed on different nodes. For a single-zone cluster, an excessive number of shards may affect cluster changes. We recommend that you set the number of replica shards for a single-zone cluster to 1 or to a value that is one less than the number of data nodes after a cluster scale-in. Before you make a change to a multi-zone cluster, you must make sure that the number of replica shards of each index in the cluster is less than the number of zones in which the cluster is deployed. After the change is complete, you can manually increase the number of replica shards based on your business requirements. For more information about how to change the number of replica shards of indexes in a cluster, see Index Templates.
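The replica recommendation above reduces to a small calculation. The sketch below assumes the limits exactly as stated in this topic; max_replicas and replica_settings are hypothetical helper names, and the returned dictionary is the body you would send with PUT /<index_name>/_settings.

```python
def max_replicas(data_nodes_after_scale_in, zones=1):
    """Upper bound on replica shards per index before a scale-in.

    Single-zone: one less than the number of remaining data nodes.
    Multi-zone: strictly less than the number of zones.
    """
    if data_nodes_after_scale_in < 2:
        raise ValueError("at least two data nodes must remain")
    if zones > 1:
        return zones - 1
    return data_nodes_after_scale_in - 1


def replica_settings(replicas):
    """Request body for PUT /<index_name>/_settings."""
    return {"index": {"number_of_replicas": replicas}}


# Hypothetical example: a single-zone cluster that keeps three data nodes.
print(replica_settings(max_replicas(3)))
```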
If the indexes of your cluster do not have replica shards, the load of the cluster is excessively high, and large amounts of data are written to or queried in your cluster, access to your cluster may time out when you remove data nodes from the cluster. Before you remove data nodes from your cluster, we recommend that you configure a retry mechanism for your client to reduce the impact on your business.
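A client-side retry mechanism can be as simple as retrying with exponential backoff. The following is a generic sketch, not tied to a specific Elasticsearch client: call_with_retries is a hypothetical helper, and it assumes that your client raises TimeoutError or ConnectionError when a request fails during the restart.

```python
import time


def call_with_retries(request, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request() until it succeeds or the attempts are used up.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Assumption: failures surface as TimeoutError or ConnectionError;
    adjust the exception types to match your client library.
    """
    for attempt in range(attempts):
        try:
            return request()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage with a hypothetical request that times out twice and then succeeds.
outcomes = iter([TimeoutError("restarting"), TimeoutError("restarting"), "ok"])


def fake_request():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


waits = []  # records the delays instead of actually sleeping
print(call_with_retries(fake_request, sleep=waits.append), waits)
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.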
Remove data nodes
- Log on to the Alibaba Cloud Elasticsearch console.
- In the left-side navigation pane, click Elasticsearch Clusters.
- Navigate to the desired cluster.
- In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
- On the Elasticsearch Clusters page, find the cluster and click its ID.
In the lower-right corner of the Basic Information page, choose .
In the Remove Data Nodes section of the page that appears, configure the Node Type and Nodes to Remove parameters.
Select the data nodes that you want to remove.
The system selects data nodes.
The system checks whether the conditions for removing the data nodes are met. After the check is passed, click OK. Then, the system migrates data stored in the selected data nodes and removes the data nodes.
Note: If one or more conditions are not met, a message appears. You must handle the related exceptions as prompted. After the exceptions are handled, you can remove the data nodes.
You select the data nodes yourself. To do so, perform the following steps:
Select data nodes from the node list.
After you select data nodes, the system checks whether the conditions for removing the data nodes are met. If one or more conditions are not met, a message appears. You must handle the related exceptions as prompted. After the exceptions are handled, you can remove the data nodes.
Check item
Expected result
Cluster status
The cluster is in the Normal state (indicated by the color green).
Index allocation configuration
The cluster.routing.allocation.enable parameter is set to all. This value indicates that all types of shards can be allocated to all data nodes in the cluster.
Distribution of replica shards for each index
The replica shards of each index are distributed on different data nodes.
Number of remaining data nodes after the scale-in
The number of remaining data nodes after the scale-in is greater than or equal to two. For a multi-zone cluster, the number of data nodes in each zone is greater than or equal to two, and the numbers of remaining data nodes in all zones are the same.
Disk usage of the destination data node for data migration
If you want to migrate data during the scale-in, the disk usage of the destination data node after the scale-in is no more than 75%.
Memory usage of the destination data node for data migration
If you want to migrate data during the scale-in, the memory usage of the destination data node after the scale-in is no more than 70%.
Number of shards on each data node that you want to remove
No shards are stored on each data node that you want to remove.
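Several of the check items above are simple threshold comparisons, so you can run a rough pre-check before you open the console. The sketch below covers only a subset of the table for a single-zone cluster (it does not check replica distribution, zone balance, or shards on the nodes to be removed). The function scale_in_blockers and its parameters are hypothetical; the console's own check remains authoritative.

```python
def scale_in_blockers(status, allocation_enable, remaining_nodes,
                      dest_disk_percent, dest_memory_percent):
    """Return a list of unmet conditions; an empty list means none were found."""
    blockers = []
    if status != "green":
        blockers.append("cluster status is not green")
    if allocation_enable != "all":
        blockers.append("cluster.routing.allocation.enable is not all")
    if remaining_nodes < 2:
        blockers.append("fewer than two data nodes would remain")
    if dest_disk_percent > 75:
        blockers.append("destination disk usage would exceed 75%")
    if dest_memory_percent > 70:
        blockers.append("destination memory usage would exceed 70%")
    return blockers


# Hypothetical values for illustration only.
for problem in scale_in_blockers("yellow", "all", 3, 80, 55):
    print(problem)
```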
Migrate data.
For security purposes, make sure that no data is stored on the data nodes you want to remove. If these data nodes store data, the system prompts you to migrate the data. After the data is migrated, no index data is stored on the data nodes, and no index data is written to the data nodes.
Click Data Migration Tool in the message that appears.
The data migration tool implements smooth data migration based on Elasticsearch shard allocation filters. The data migration does not affect your service availability.
In the Migrate Data dialog box, select a data migration method.
Parameter
Description
Smart Migration
The system selects the data nodes whose data is to be migrated.
Custom Migration
You must select the data nodes whose data you want to migrate.
Read the terms of data migration, select the check box, and click OK.
Then, the system restarts the cluster. During the restart, you can view the data migration progress in the Tasks dialog box. After the cluster is restarted, the data stored on the selected data nodes is migrated.
Note: During data migration, you can click Pause in the Tasks dialog box to pause the migration.
In the lower-right corner of the Basic Information page, choose again.
In the Remove Data Nodes section of the page that appears, select the data nodes whose data is migrated and click OK.
Then, the system restarts the cluster. During the restart, you can view the scale-in progress in the Tasks dialog box. After the cluster is restarted, the data nodes are removed from the cluster.
Roll back data migration
Data migration is time-consuming. Cluster status changes or data modifications may result in a data migration failure. You can view detailed information in the Tasks dialog box. To roll back data migration, perform the following steps:
- Log on to the Kibana console of your Elasticsearch cluster and go to the homepage of the Kibana console as prompted. For more information about how to log on to the Kibana console, see Log on to the Kibana console.
  Note: In this example, an Elasticsearch V6.7.0 cluster is used. Operations on clusters of other versions may differ. The actual operations in the console prevail.
- In the left-side navigation pane of the page that appears, click Dev Tools.
On the Console tab of the page that appears, run the following command to obtain the IP addresses of the data nodes whose data is migrated:
GET _cluster/settings
If the command is successfully run, the following result is returned:
{
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "exclude": {
            "_ip": "192.168.xx.xx,192.168.xx.xx,192.168.xx.xx"
          }
        }
      }
    }
  }
}
Roll back data.
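Roll back the data on some data nodes. In the exclude parameter, keep only the IP addresses of the data nodes whose data you do not want to roll back. Shards can then be reallocated to the data nodes that you leave out of the list.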
PUT _cluster/settings
{
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "exclude": {
            "_ip": "192.168.xx.xx,192.168.xx.xx"
          }
        }
      }
    }
  }
}
Roll back the data on all data nodes.
PUT _cluster/settings
{
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "exclude": {
            "_ip": null
          }
        }
      }
    }
  }
}
Run the following command to check whether the data is rolled back:
GET _cluster/settings
If the command output does not contain the IP addresses of the data nodes whose data is rolled back, the rollback is successful. You can also check the rollback progress based on whether shards are reallocated to the data nodes.
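If you roll back often, the request body can be derived from the current exclusion list. The following sketch assumes you pass in the _ip value returned by GET _cluster/settings and the IP addresses you want to roll back; rollback_body is a hypothetical helper, and the IP addresses in the example are placeholders.

```python
import json


def rollback_body(excluded_ips, restore_ips):
    """Build the body for PUT _cluster/settings that rolls back restore_ips.

    excluded_ips: comma-separated _ip value returned by GET _cluster/settings.
    restore_ips: IP addresses of the data nodes whose data should be rolled back.
    """
    restore = set(restore_ips)
    current = [ip.strip() for ip in excluded_ips.split(",") if ip.strip()]
    keep = [ip for ip in current if ip not in restore]
    # None is serialized as null, which clears the exclusion entirely.
    value = ",".join(keep) if keep else None
    return {"transient": {"cluster": {"routing": {"allocation": {"exclude": {"_ip": value}}}}}}


# Placeholder addresses for illustration only.
body = rollback_body("192.168.0.1,192.168.0.2,192.168.0.3", ["192.168.0.3"])
print(json.dumps(body, indent=2))
```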
Note: You can run the GET _cat/shards?v command to check the status of a data migration or rollback task.
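To follow the progress programmatically, you can count the shards reported on the affected nodes. This sketch assumes the shard list was fetched as JSON (for example with GET _cat/shards?format=json) and parsed into a list of dictionaries with an "ip" key; shards_on_nodes is a hypothetical helper and the sample rows are made up. A count of zero on an excluded node suggests that its data has been migrated, and a rising count after a rollback suggests that shards are being reallocated to it.

```python
def shards_on_nodes(cat_shards, ips):
    """Count the shards currently reported on each of the given node IPs.

    cat_shards: list of dicts as returned by GET _cat/shards?format=json.
    Unassigned shards have no IP address and are ignored.
    """
    counts = {ip: 0 for ip in ips}
    for row in cat_shards:
        ip = row.get("ip")
        if ip in counts:
            counts[ip] += 1
    return counts


# Hypothetical sample rows for illustration only.
sample_shards = [
    {"index": "logs", "shard": "0", "prirep": "p", "state": "STARTED", "ip": "192.168.0.1"},
    {"index": "logs", "shard": "0", "prirep": "r", "state": "STARTED", "ip": "192.168.0.2"},
    {"index": "logs", "shard": "1", "prirep": "p", "state": "STARTED", "ip": "192.168.0.1"},
    {"index": "logs", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "ip": None},
]

print(shards_on_nodes(sample_shards, ["192.168.0.1", "192.168.0.3"]))
```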