A multi-zone Elasticsearch cluster provides optimized disaster recovery capabilities. The system automatically selects the zones that have sufficient Elastic Compute Service (ECS) instances to deploy a cluster. If replica shards are configured for indexes in the cluster and nodes in one zone fail, the nodes in the remaining zones can still provide services without interruption. This significantly enhances the availability of the cluster. In addition, you can perform a switchover in the Elasticsearch console to isolate the faulty nodes. Then, the system adds computing resources to the remaining zones to make up for the resources lost in the zone that contains the faulty nodes. This topic describes how to deploy a multi-zone Elasticsearch cluster and perform a switchover and recovery for a zone.
Scenarios
You can deploy an Alibaba Cloud Elasticsearch cluster by using one of the following methods:
In one zone: This is the default deployment method. In most cases, it is used to handle non-critical workloads.
Across two zones: This deployment method implements cross-zone disaster recovery. In most cases, it is used to handle production workloads.
Across three zones: This deployment method implements high availability. We recommend that you use this deployment method to handle production workloads that have high requirements for service availability.
Deploy a multi-zone Elasticsearch cluster
Operations
When you purchase an Alibaba Cloud Elasticsearch cluster, you can select the number of zones for the cluster. If you select two or three zones, the system deploys the cluster across these zones. You do not need to select each zone. The system automatically selects zones to deploy the cluster. For more information, see Create an Alibaba Cloud Elasticsearch cluster and Parameters on the buy page.
If you choose to deploy a cluster across zones, the Elasticsearch console displays only the zones where nodes for receiving network traffic from clients are deployed, such as Hangzhou Zone I. The system deploys the cluster to the zones that have sufficient ECS instances, such as Hangzhou Zone H and Hangzhou Zone J.
Precautions
Category | Precaution |
Nodes |
|
Replica shards of indexes |
Note If a cluster contains indexes for which no replica shards are configured, data loss may occur when you perform a switchover or recovery for a zone. Make sure that indexes in your cluster are configured based on the preceding suggestions, and perform routine O&M and troubleshooting on your cluster. |
Configuration description
During the deployment of the cluster, the system automatically enables shard allocation awareness for the cluster. For more information, see Shard allocation awareness. The following table describes the parameters that are configured for an Elasticsearch cluster deployed across the cn-hangzhou-f and cn-hangzhou-g zones.
Parameter | Description | Example |
cluster.routing.allocation.awareness.attributes | Important Do not call an Elasticsearch API operation to change the value of this parameter. Otherwise, an exception may occur. Specifies the node attributes that are used to enable shard allocation awareness for the cluster. The system adds the Enode.attr.zone_id parameter to the start parameters of a node in a multi-zone cluster to identify the zone of the node. For example, a node of a multi-zone cluster is deployed in the cn-hangzhou-g zone. In this case, the system adds | zone_id |
cluster.routing.allocation.awareness.force.zone_id.values | Specifies the zones for which forced awareness is applied during shard allocation. Forced awareness prevents a zone from being overloaded when other zones become unavailable. For example, an index in an Elasticsearch cluster that is deployed across the cn-hangzhou-f and cn-hangzhou-g zones has one primary shard and three replica shards. Based on the shard allocation awareness policy, the system allocates two shards to each of the cn-hangzhou-f and cn-hangzhou-g zones. If you configure the cluster.routing.allocation.awareness.force.zone_id.values parameter and the cn-hangzhou-f zone becomes unavailable, forced awareness prevents the system from reallocating the shards of the cn-hangzhou-f zone to the cn-hangzhou-g zone. Note By default, this parameter is not configured. This indicates that forced awareness is disabled for shard allocation. You can configure this parameter based on your business requirements. | ["cn-hangzhou-f", "cn-hangzhou-g"] |
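The forced awareness example in the table can be worked through with a small simulation. This is a simplified sketch, not the Elasticsearch allocator: the function name `allocate_copies` and its parameters are hypothetical, and the model ignores node-level constraints such as the rule that two copies of the same shard cannot share a node.

```python
import math


def allocate_copies(total_copies, zones, failed_zones=(), forced=False):
    """Distribute shard copies across zones under a simplified awareness model.

    Returns (copies placed per surviving zone, number of unassigned copies).
    """
    surviving = [z for z in zones if z not in failed_zones]
    if not surviving:
        return {}, total_copies

    if forced:
        # Forced awareness: each zone is capped at its share across ALL
        # configured zones, even when some of those zones are unavailable.
        cap = math.ceil(total_copies / len(zones))
    else:
        # Plain awareness: copies are rebalanced across surviving zones only.
        cap = math.ceil(total_copies / len(surviving))

    placed = {z: 0 for z in surviving}
    remaining = total_copies
    while remaining > 0:
        progressed = False
        for z in surviving:
            if remaining > 0 and placed[z] < cap:
                placed[z] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            break  # every surviving zone is at its cap
    return placed, remaining


zones = ["cn-hangzhou-f", "cn-hangzhou-g"]
# One primary shard plus three replica shards = four copies.
print(allocate_copies(4, zones))
print(allocate_copies(4, zones, failed_zones=["cn-hangzhou-f"], forced=True))
print(allocate_copies(4, zones, failed_zones=["cn-hangzhou-f"], forced=False))
```

In this model, both zones hold two copies when healthy. With forced awareness and cn-hangzhou-f unavailable, cn-hangzhou-g keeps its two copies and two copies stay unassigned. Without forced awareness, all four copies would be placed in cn-hangzhou-g.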
Perform a switchover and recovery
If your Elasticsearch cluster is deployed across zones and the nodes in one zone become faulty, you can perform a switchover for the zone. The system removes the nodes in this zone from the cluster and routes client traffic only to nodes in the other zones that are in the Enabled state. After the faulty nodes in the switched-over zone recover, you can perform a recovery for the zone. The system adds the nodes that were removed during the switchover back to the zone and routes client traffic to nodes in all zones that are in the Enabled state.
Before the switchover, you must make sure that the indexes in the cluster have replica shards. This ensures normal read and write operations on the cluster after the switchover.
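One way to verify this precondition is to inspect the index settings before you start. The following sketch assumes you have already fetched the response of `GET /_all/_settings` and parsed it into a Python dictionary. The helper name `indexes_without_replicas` and the sample index names are hypothetical. A missing `number_of_replicas` value is treated as the Elasticsearch default of 1.

```python
def indexes_without_replicas(settings_response):
    """Return the names of indexes whose number_of_replicas is 0.

    settings_response is the parsed JSON body of GET /_all/_settings,
    shaped like {index_name: {"settings": {"index": {...}}}}.
    """
    missing = []
    for name, body in settings_response.items():
        index_settings = body.get("settings", {}).get("index", {})
        # Elasticsearch returns setting values as strings, for example "1".
        replicas = int(index_settings.get("number_of_replicas", 1))
        if replicas == 0:
            missing.append(name)
    return sorted(missing)


# Hypothetical sample response for illustration only.
sample = {
    "logs-app": {"settings": {"index": {"number_of_replicas": "1"}}},
    "tmp-import": {"settings": {"index": {"number_of_replicas": "0"}}},
}
print(indexes_without_replicas(sample))  # ['tmp-import']
```

Any index that this check reports has no replica shards and may lose data during a switchover, so consider adding replicas to it first.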
- Log on to the Alibaba Cloud Elasticsearch console.
- In the left-side navigation pane, click Elasticsearch Clusters.
- Navigate to the desired cluster:
  - In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
  - On the Elasticsearch Clusters page, find the cluster and click its ID.
In the Node Visualization section of the Basic Information page of the cluster, perform a switchover.
Move the pointer over the zone for which a switchover needs to be performed and click Switch Over.
In the Confirm Operation message, click Continue.
Then, the system restarts the cluster to make the switchover take effect. After the switchover is successful, the state of the zone changes from Enabled to Disabled.
Note When you perform a switchover, the system automatically adds the corresponding number of nodes, such as dedicated master nodes, client nodes, and data nodes, to the remaining zones in the Enabled state. This ensures that your Elasticsearch cluster has sufficient computing resources. However, the nodes are not guaranteed to be added successfully because of factors such as underlying resource inventory and scheduling concurrency limits.
After a switchover, the computing resources of the cluster and the maximum workload that it can handle are reduced. To limit the impact of a fault, reduce the load on the cluster, for example by throttling requests, as soon as the fault occurs.
If your indexes have replica shards before you perform the switchover, the status of your cluster becomes abnormal (indicated by the color yellow) after the switchover. In this case, after the switchover is complete, log on to the Kibana console and run the following command. The command allocates the shards of the switched-over zone to the remaining zones. After the shards are allocated, the status of the cluster becomes normal (indicated by the color green).
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.force.zone_id.values": {"0": null, "1": null, "2": null}
  }
}
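After you run the command, you can confirm that the shards have been reallocated by checking the cluster health. The sketch below assumes you have fetched the response of `GET /_cluster/health` and parsed it into a dictionary. The helper name `switchover_settled` is hypothetical. It relies only on the `status` and `unassigned_shards` fields of the health response.

```python
def switchover_settled(health):
    """Return True when the cluster is green and no shards are unassigned.

    health is the parsed JSON body of GET /_cluster/health.
    """
    return (
        health.get("status") == "green"
        and health.get("unassigned_shards", 0) == 0
    )


# Hypothetical sample responses for illustration only.
print(switchover_settled({"status": "yellow", "unassigned_shards": 2}))  # False
print(switchover_settled({"status": "green", "unassigned_shards": 0}))   # True
```

A result of False right after the settings change is expected while shards are still being allocated. Check again after allocation has had time to finish.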
In the Node Visualization section, recover the zone for which the switchover is performed.
Move the pointer over the zone and click Switch Back.
In the Confirm Operation message, click Continue.
Then, the system restarts your Elasticsearch cluster to make the recovery take effect. After the recovery is successful, the state of the zone changes from Disabled to Enabled.
Note When you recover the zone, the system removes the nodes that were added during the switchover, such as dedicated master nodes, client nodes, and data nodes. In addition, the system migrates the data stored on the removed data nodes to other data nodes.