This topic describes how to perform a disaster recovery drill for a PolarDB for MySQL cluster.
Overview
You can use the disaster recovery drill feature to perform a disaster recovery drill on a PolarDB for MySQL cluster. Disaster recovery drills can be performed in the node dimension or the zone dimension. In the node dimension, you can select only one node for a drill. After the drill starts and faults are injected, the specified node becomes unavailable. The PolarDB for MySQL cluster is then recovered based on the selected recovery policy. For more information about recovery policies, see Recovery phase.
The disaster recovery drill feature is in canary release. To use the feature, contact us.
Prerequisites
The PolarDB for MySQL cluster must meet the following requirements:
Database edition: Enterprise Edition.
Database engine:
MySQL 5.6 with revision version 5.6.1.0.42 or later.
MySQL 5.7 with revision version 5.7.1.0.26 or later.
MySQL 8.0.1 with revision version 8.0.1.1.30.2 or later.
MySQL 8.0.2 with revision version 8.0.2.2.19 or later.
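The version requirements above can be checked with a short script. The following is a minimal sketch, not an official tool: the function names and the engine keys ("5.6", "5.7", "8.0.1", "8.0.2") are assumptions made for illustration, and the sketch assumes that revision versions are dot-separated integers that can be compared part by part. The Enterprise Edition requirement is not checked here.

```python
from typing import Tuple

# Minimum revision versions copied from the prerequisites list above.
# The dictionary keys are illustrative labels, not an official identifier.
MIN_REVISIONS = {
    "5.6": "5.6.1.0.42",
    "5.7": "5.7.1.0.26",
    "8.0.1": "8.0.1.1.30.2",
    "8.0.2": "8.0.2.2.19",
}


def parse_revision(revision: str) -> Tuple[int, ...]:
    """Split a dot-separated revision string into a tuple of integers."""
    return tuple(int(part) for part in revision.split("."))


def meets_drill_prerequisite(engine: str, revision: str) -> bool:
    """Return True if the revision is at or above the listed minimum."""
    minimum = MIN_REVISIONS.get(engine)
    if minimum is None:
        return False  # engine series not listed in the prerequisites
    if not revision.startswith(engine + "."):
        return False  # revision does not belong to this engine series
    # Tuples compare element by element; a shorter tuple that is a prefix
    # of a longer one sorts first, so 8.0.1.1.30 < 8.0.1.1.30.2.
    return parse_revision(revision) >= parse_revision(minimum)
```

For example, meets_drill_prerequisite("8.0.1", "8.0.1.1.30") is False under this comparison because the listed minimum carries an extra ".2" component, whereas "8.0.1.1.31" passes.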
Precautions
Disaster recovery drills entail risks such as transient connections and data loss. To prevent your business from being affected, we recommend that you clone the cluster and perform the drill on the cloned cluster. You can also use the intelligent stress testing feature together with the drill to simulate production workloads during fault injection.
Notes on data loss risks:
When you perform a disaster recovery drill in the zone dimension, data loss may occur if RPO is less than 60.
However, no data loss occurs under the following conditions: the drill is performed in the node dimension, the cluster contains hot standby nodes or read-only nodes, and RPO is 0.
Procedure
Log on to the PolarDB console.
In the left-side navigation pane, click Clusters.
In the upper-left corner, select the region where the cluster to which you want to connect is deployed.
Find the cluster and click its ID.
In the left-side navigation pane, click Service Availability. On the page that appears, click the Drill tab.
On the Drill page, select the zone or nodes where you want to perform the drill. Click Start Drill.
Note: On the Drill page, the Intelligent Stress Testing and Clone Cluster shortcuts are provided.
If you select a zone, the drill is performed on all nodes in the zone.
In the Fault Injection Method dialog box, select Fault Injection By Node. Click OK.
On the Drill page, you can view the status of the drill task and the phases in the Drill List section.
Node dimension
Zone dimension
Drill phases
Fault injection
Faults are injected based on the Drill Nodes and Fault Injection Method values.
Recovery phase
The following sections detail the recovery phase.
Node dimension
If you select the primary node to perform the drill and the cluster contains a hot standby node, the hot standby node is promoted to the primary node.
If you select the primary node to perform the drill and the cluster contains a read-only node but not a hot standby node, the read-only node is promoted to the primary node.
If you select the primary node to perform the drill and the cluster does not contain a hot standby node or a read-only node, the drill is performed in the zone dimension.
If you select a read-only node to perform the drill, a new read-only node is created.
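The node-dimension rules above amount to a small decision procedure. The following is a minimal sketch that only restates those rules for illustration; the function name, the role strings, and the returned descriptions are hypothetical and do not correspond to any PolarDB API.

```python
def node_recovery_action(target_role: str,
                         has_hot_standby: bool,
                         has_read_only: bool) -> str:
    """Describe the recovery action for a node-dimension drill.

    target_role is the role of the node selected for the drill:
    "primary" or "read_only" (labels assumed for this sketch).
    """
    if target_role == "read_only":
        # Drilling on a read-only node: a replacement node is created.
        return "create a new read-only node"
    if target_role != "primary":
        raise ValueError(f"unknown node role: {target_role!r}")
    if has_hot_standby:
        # A hot standby node takes precedence over read-only nodes.
        return "promote the hot standby node to primary"
    if has_read_only:
        return "promote the read-only node to primary"
    # No candidate for promotion: the drill falls back to the zone dimension.
    return "perform the drill in the zone dimension"
```

Calling node_recovery_action("primary", False, True), for instance, returns the read-only promotion case, which matches the second rule in the list.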
Zone dimension
If you select a zone to perform the drill, the drill is performed on all nodes in the zone. The business is switched over to the secondary zone.
The following figure shows the architecture before the drill starts.
The following figure shows the drill performed on all nodes in the zone.
The primary node becomes available, as shown in the following figure.
The system determines whether to rebuild other nodes in the new primary zone (zone 2) based on the resources available in that zone. The following figure shows the case where other nodes are rebuilt in the new primary zone.
Post-drill phase
This phase is triggered only for drills performed in the zone dimension. The time consumed in this phase is related to the data volume and network latency. The business is not affected in this phase.
Other nodes rebuilt in the new primary zone
Rebuild read-only nodes
Switch back primary node
Other nodes not rebuilt in the new primary zone
Rebuild read-only nodes
Switch back primary node