Deploy and use a multi-zone Elasticsearch cluster

A multi-zone Elasticsearch cluster provides optimized disaster recovery capabilities. The system automatically selects the zones that have sufficient Elastic Compute Service (ECS) instances to deploy a cluster. If replica shards are configured for indexes in the cluster and nodes in one zone fail, the nodes in the remaining zones can still provide services without interruption. This significantly enhances the availability of the cluster. In addition, you can perform a switchover in the Elasticsearch console to isolate the faulty nodes. Then, the system adds computing resources to the remaining zones to make up for the resources lost in the zone that contains the faulty nodes. This topic describes how to deploy a multi-zone Elasticsearch cluster and perform a switchover and recovery for a zone.

Scenarios

You can deploy an Alibaba Cloud Elasticsearch cluster by using one of the following methods:

In one zone: This is the default deployment method. In most cases, it is used to handle non-critical workloads.
Across two zones: This deployment method implements cross-zone disaster recovery. In most cases, it is used to handle production workloads.
Across three zones: This deployment method implements high availability. We recommend that you use this deployment method to handle production workloads that have high requirements for service availability.

Deploy a multi-zone Elasticsearch cluster

Operations

When you purchase an Alibaba Cloud Elasticsearch cluster, you can select the number of zones for the cluster. If you select two or three zones, the system deploys the cluster across these zones. You do not need to select each zone. The system automatically selects zones to deploy the cluster. For more information, see Create an Alibaba Cloud Elasticsearch cluster and Parameters on the buy page.

Important

If you choose to deploy a cluster across zones, the Elasticsearch console displays only the zones in which the nodes that receive network traffic from clients are deployed, such as Hangzhou Zone I. The system deploys the cluster to the zones that have sufficient ECS instances, such as Hangzhou Zone H and Hangzhou Zone J, so the zones displayed in the console may differ from the zones that host the cluster nodes.

Precautions

Nodes

You must purchase three dedicated master nodes.
The number of data nodes, warm nodes, and client nodes must be a multiple of the number of zones. For more information about zones, see Regions and zones.
If you choose to deploy your cluster across two zones, Alibaba Cloud Elasticsearch uses one of the following methods to deploy your cluster:
- If the current region has at least three zones and all these zones have sufficient ECS instances, the dedicated master nodes of the cluster are deployed in three zones. In this case, if nodes in one zone fail, your cluster can still elect a master node.
- If the current region has only two zones or only two zones in the region have sufficient ECS instances, the dedicated master nodes are deployed in the two zones. If the nodes in the zone that contains only one dedicated master node fail, your cluster can still elect a master node. If the nodes in the zone that contains two dedicated master nodes fail, the single remaining dedicated master node is not a majority, and you must perform a switchover in the Elasticsearch console. Until the faulty nodes are recovered, you cannot perform write operations on the cluster but can still perform read operations.
  Note
  For critical production workloads, we recommend that you do not select a region that has only one or two zones.
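
The node rules above can be checked with simple arithmetic. The following Python sample is a minimal sketch, not part of the Elasticsearch console or API. It assumes that a master node can be elected only if more than half of the dedicated master nodes are reachable. The function names and the zone names `zone-a`, `zone-b`, and `zone-c` are hypothetical.

```python
def node_count_is_valid(node_count, zone_count):
    """Data, warm, and client node counts must be a multiple of the zone count."""
    return node_count % zone_count == 0


def can_elect_master(masters_per_zone, failed_zone):
    """Return True if the dedicated master nodes outside failed_zone form a majority.

    masters_per_zone maps a zone name to the number of dedicated master
    nodes deployed in that zone (assumed layout, for illustration only).
    """
    total = sum(masters_per_zone.values())
    surviving = total - masters_per_zone.get(failed_zone, 0)
    # Simplified model: a strict majority of master nodes is required.
    return surviving >= total // 2 + 1


# Three dedicated master nodes spread over three zones.
three_zone_layout = {"zone-a": 1, "zone-b": 1, "zone-c": 1}
# Three dedicated master nodes squeezed into two zones.
two_zone_layout = {"zone-a": 2, "zone-b": 1}

print(can_elect_master(three_zone_layout, "zone-a"))  # any single zone may fail
print(can_elect_master(two_zone_layout, "zone-b"))    # zone with one master fails
print(can_elect_master(two_zone_layout, "zone-a"))    # zone with two masters fails
```

In this model, only the last case loses the majority, which matches the situation in which a switchover is required.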

Replica shards of indexes

If an Elasticsearch cluster is deployed across two zones and one zone becomes unavailable, the other zone is used to provide services. Therefore, you must configure at least one replica shard for each primary shard of an index.
By default, one replica shard is configured for each primary shard of an index. If you do not have specific requirements for read performance, you can use the default setting.

If an Elasticsearch cluster is deployed across three zones and one or two of the zones become unavailable, the remaining zones are used to provide services. Therefore, you must configure at least two replica shards for each primary shard of an index.
By default, one replica shard is configured for each primary shard of an index. Therefore, you must change the number of replica shards in the index template. For more information, see Index Templates. The following sample code shows how to change the number of replica shards to 2 in the index template:
```
PUT _template/template_1
{
  "template": "*",
  "settings": {
    "number_of_replicas": 2
  }
}
```

Note

If a cluster contains indexes for which no replica shards are configured, data loss may occur when you perform a switchover or recovery for a zone. Make sure that indexes in your cluster are configured based on the preceding suggestions, and perform routine O&M and troubleshooting on your cluster.
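
An index template applies only to indexes that are created after the template is saved, so existing indexes may still have too few replica shards. The following Python sample is a minimal sketch of how you might audit them before a switchover. It assumes that the input is the parsed JSON output of `GET _cat/indices?h=index,rep&format=json`, in which `rep` is the number of replica shards returned as a string. The function names and the sample index names are hypothetical.

```python
def required_replicas(zone_count):
    """Minimum replica shards per primary shard: 1 for two zones, 2 for three zones."""
    return max(zone_count - 1, 0)


def under_replicated(cat_indices, min_replicas):
    """Return the names of indexes that have fewer than min_replicas replica shards."""
    return [row["index"] for row in cat_indices if int(row["rep"]) < min_replicas]


def replica_settings_body(replicas):
    """Request body for PUT /<index>/_settings to change the replica count."""
    return {"index": {"number_of_replicas": replicas}}


# Sample data in the shape of the _cat/indices JSON output (made up for illustration).
sample = [
    {"index": "logs-2024", "rep": "1"},
    {"index": "orders", "rep": "2"},
    {"index": "tmp", "rep": "0"},
]

needed = required_replicas(3)
print(under_replicated(sample, needed))
print(replica_settings_body(needed))
```

For each index that the audit reports, you can send the generated body to the index settings API to raise the number of replica shards.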

Configuration description

During the deployment of the cluster, the system automatically enables shard allocation awareness for the cluster. For more information, see Shard allocation awareness. The following table describes the parameters that are configured for an Elasticsearch cluster deployed across the cn-hangzhou-f and cn-hangzhou-g zones.

Parameter	Description	Example
cluster.routing.allocation.awareness.attributes	Specifies the node attributes that are used to enable shard allocation awareness for the cluster. The system adds the `-Enode.attr.zone_id` parameter to the startup parameters of each node in a multi-zone cluster to identify the zone of the node. For example, if a node of a multi-zone cluster is deployed in the cn-hangzhou-g zone, the system adds `-Enode.attr.zone_id=cn-hangzhou-g` to the startup parameters of the node. Therefore, the cluster.routing.allocation.awareness.attributes parameter has a fixed value of zone_id. Important: Do not call an Elasticsearch API operation to change the value of this parameter. Otherwise, an exception may occur.	zone_id
cluster.routing.allocation.awareness.force.zone_id.values	Specifies the zones for which forced awareness is enabled for shard allocation. Forced awareness prevents a zone from being overloaded when other zones become unavailable. For example, an index of an Elasticsearch cluster that is deployed across the cn-hangzhou-f and cn-hangzhou-g zones has one primary shard and three replica shards. Based on the shard allocation awareness policy, the system allocates two shards to each of the cn-hangzhou-f and cn-hangzhou-g zones. If you configure the cluster.routing.allocation.awareness.force.zone_id.values parameter and the cn-hangzhou-f zone becomes unavailable, forced awareness prevents the system from reallocating the shards of the cn-hangzhou-f zone to the cn-hangzhou-g zone. Note: By default, this parameter is not configured, which means that forced awareness is disabled for shard allocation. You can configure this parameter based on your business requirements.	["cn-hangzhou-f", "cn-hangzhou-g"]
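
The effect of forced awareness in the preceding example can be illustrated with a small model. The following Python sample is a simplified sketch, not the actual Elasticsearch allocator. It assumes that each zone may hold at most the total number of shard copies divided by the number of counted zones, rounded up, and it ignores other constraints such as node count and disk usage. The function name `allocate` is hypothetical.

```python
import math


def allocate(total_copies, live_zones, forced_values=None):
    """Distribute shard copies over the live zones.

    Without forced awareness, only zones that have live nodes are counted.
    With forced awareness, every zone in forced_values is counted, even
    if it is currently unavailable.
    Returns (copies per live zone, number of unassigned copies).
    """
    counted_zones = forced_values if forced_values else live_zones
    limit = math.ceil(total_copies / len(counted_zones))
    allocation = {}
    remaining = total_copies
    for zone in live_zones:
        allocation[zone] = min(limit, remaining)
        remaining -= allocation[zone]
    return allocation, remaining


zones = ["cn-hangzhou-f", "cn-hangzhou-g"]
copies = 4  # one primary shard and three replica shards

# Both zones are available: two copies per zone.
print(allocate(copies, zones))
# cn-hangzhou-f is down and forced awareness is disabled: all copies may move to cn-hangzhou-g.
print(allocate(copies, ["cn-hangzhou-g"]))
# cn-hangzhou-f is down and forced awareness is enabled: two copies stay unassigned.
print(allocate(copies, ["cn-hangzhou-g"], forced_values=zones))
```

In the last case, the copies that stay unassigned are the reason why the cluster remains in the yellow state until the zone recovers or the forced awareness setting is cleared.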

Perform a switchover and recovery

If your Elasticsearch cluster is deployed across zones and the nodes in one zone become faulty, you can perform a switchover for the zone. The system removes the nodes from this zone and routes client traffic only to nodes in the remaining zones that are in the Enabled state. After the faulty nodes recover, you can perform a recovery for the zone. The system adds the nodes that were removed during the switchover back to the zone and routes client traffic to nodes in all zones that are in the Enabled state.

Important

Before the switchover, you must make sure that the indexes in the cluster have replica shards. This ensures normal read and write operations on the cluster after the switchover.

Log on to the Alibaba Cloud Elasticsearch console.
In the left-side navigation pane, click Elasticsearch Clusters.
Navigate to the desired cluster.
1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
2. On the Elasticsearch Clusters page, find the cluster and click its ID.
In the Node Visualization section of the Basic Information page of the cluster, perform a switchover.
1. Move the pointer over the zone for which a switchover needs to be performed and click Switch Over.
2. In the Confirm Operation message, click Continue.
  Then, the system restarts the cluster to make the switchover take effect. After the switchover is successful, the state of the zone changes from Enabled to Disabled.
  Note
  When you perform a switchover, the system automatically adds the corresponding number of nodes, such as dedicated master nodes, client nodes, and data nodes, to the remaining zones in the Enabled state. This ensures that your Elasticsearch cluster has sufficient computing resources. However, the nodes are not guaranteed to be added successfully because of factors such as underlying resource inventory and scheduling concurrency limits.
  After a switchover, the computing resources of the cluster and the maximum workload that the cluster can handle are reduced. To reduce the impact of faults on your cluster, limit the cluster usage and perform operations such as throttling as soon as faults occur.
If your indexes have replica shards before you perform the switchover, the cluster enters the abnormal state (indicated by the color yellow) after the switchover. In this case, after the switchover is complete, log on to the Kibana console and run the following command. The command allows the shards of the zone for which the switchover is performed to be allocated to the remaining zones. After the shards are allocated, the cluster returns to the normal state (indicated by the color green).
```
PUT /_cluster/settings
{
    "persistent" : {
        "cluster.routing.allocation.awareness.force.zone_id.values" : {"0": null, "1": null, "2": null}
    }
}
```
In the Node Visualization section, recover the zone for which the switchover is performed.
1. Move the pointer over the zone and click Switch Back.
2. In the Confirm Operation message, click Continue.
  Then, the system restarts your Elasticsearch cluster to make the recovery take effect. After the recovery is successful, the state of the zone changes from Disabled to Enabled.
  Note
  When you recover the zone, the system removes the nodes that were added during the switchover, such as dedicated master nodes, client nodes, and data nodes. In addition, the system migrates the data stored on the removed data nodes to other data nodes.
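
After you clear the forced awareness setting in the switchover step, you may want to wait until the cluster returns to the green state before you continue. The following Python sample is a minimal sketch of such a wait loop. It assumes that `fetch_health` is a hypothetical callable that you supply and that returns the parsed JSON response of `GET _cluster/health`, which contains a `status` field. The function name and the default retry values are also assumptions.

```python
import time


def wait_for_green(fetch_health, attempts=30, delay_seconds=10.0, sleep=time.sleep):
    """Poll the cluster health until the status is green.

    fetch_health: hypothetical callable that returns the parsed
    GET _cluster/health response as a dict.
    Returns True if the status becomes green within the given attempts.
    """
    for attempt in range(attempts):
        health = fetch_health()
        if health.get("status") == "green":
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Usage with canned responses in place of a real cluster.
canned = iter([{"status": "yellow"}, {"status": "yellow"}, {"status": "green"}])
print(wait_for_green(lambda: next(canned), attempts=5, delay_seconds=0))
```

The health fetcher is injected as a parameter so that the loop can be tried without a cluster. In practice, you would replace the canned responses with a call to your own HTTP client.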