PolarDB: Disaster recovery drills

Last Updated: Aug 09, 2024

This topic describes how to perform a disaster recovery drill for a PolarDB for MySQL cluster.

Overview

You can use the disaster recovery drill feature to perform a disaster recovery drill on a PolarDB for MySQL cluster. Drills can be performed in the node dimension or the zone dimension. In the node dimension, you can select only one node for a drill. After the drill starts and faults are injected, the specified node becomes unavailable, and the cluster is recovered based on the selected recovery policy. For more information about recovery policies, see Recovery phase.

Note

The disaster recovery drill feature is in canary release. To use the feature, contact us.

Prerequisites

The PolarDB for MySQL cluster must meet the following requirements:

  • Database edition: Enterprise Edition.

  • Database engine:

    • MySQL 5.6 with revision version 5.6.1.0.42 or later.

    • MySQL 5.7 with revision version 5.7.1.0.26 or later.

    • MySQL 8.0.1 with revision version 8.0.1.1.30.2 or later.

    • MySQL 8.0.2 with revision version 8.0.2.2.19 or later.
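The version requirements above can be checked programmatically. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that the engine series can be recognized from the leading components of the revision version string (for example, `8.0.1.…` for MySQL 8.0.1).

```python
# Minimum revision versions from the prerequisites, keyed by engine series.
# Assumption: the series is identified by the leading parts of the revision.
MINIMUM_REVISIONS = {
    (5, 6): (5, 6, 1, 0, 42),
    (5, 7): (5, 7, 1, 0, 26),
    (8, 0, 1): (8, 0, 1, 1, 30, 2),
    (8, 0, 2): (8, 0, 2, 2, 19),
}


def parse_revision(revision: str) -> tuple:
    """Turn a dotted revision string such as '8.0.2.2.19' into a tuple of ints."""
    return tuple(int(part) for part in revision.split("."))


def meets_minimum_revision(revision: str) -> bool:
    """Return True if the revision is in a listed series and not older than its minimum."""
    parsed = parse_revision(revision)
    for series, minimum in MINIMUM_REVISIONS.items():
        if parsed[:len(series)] == series:
            # Tuples compare component by component, left to right.
            return parsed >= minimum
    return False  # The series is not listed in the prerequisites.
```

For example, `meets_minimum_revision("8.0.1.1.30.2")` returns `True`, and `meets_minimum_revision("5.6.1.0.41")` returns `False`. The check covers only the engine version. The Enterprise Edition requirement must be verified separately.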

Precautions

Disaster recovery drills entail risks such as transient disconnections and data loss. To prevent your business from being affected, we recommend that you clone the cluster and perform the drill on the clone. You can also use the intelligent stress testing feature to simulate production workloads while you perform fault drills.

Note

Notes on data loss risks:

  • If you perform a disaster recovery drill in the zone dimension and RPO is less than 60, data loss may occur.

  • No data loss occurs if all of the following conditions are met: the drill is performed in the node dimension, the cluster contains hot standby nodes or read-only nodes, and RPO is 0.

Procedure

  1. Log on to the PolarDB console.

  2. In the left-side navigation pane, click Clusters.

  3. In the upper-left corner, select the region where the cluster is deployed.

  4. Find the cluster and click its ID.

  5. In the left-side navigation pane, click Service Availability. On the page that appears, click the Drill tab.

  6. On the Drill page, select the zone or node where you want to perform the drill. Click Start Drill.

  7. In the Fault Injection Method dialog box, select Fault Injection By Node. Click OK.

  8. On the Drill page, you can view the status of the drill task and the phases in the Drill List section.

    • Node dimension

      image

    • Zone dimension

      image

Drill phases

Fault injection

Faults are injected based on the Drill Nodes and Fault Injection Method values that you selected.

Recovery phase

The following sections detail the recovery phase.

Node dimension

  • If you select the primary node to perform the drill and the cluster contains a hot standby node, the hot standby node is promoted to the primary node.

  • If you select the primary node to perform the drill and the cluster contains a read-only node but not a hot standby node, the read-only node is promoted to the primary node.

  • If you select the primary node to perform the drill and the cluster does not contain a hot standby node or a read-only node, the drill is performed in the zone dimension.

  • If you select a read-only node to perform the drill, a new read-only node is created.
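The node-dimension rules above amount to a small decision table. The following sketch restates them as a function so that the order of precedence is explicit. It is an illustration only: the function name, the role strings, and the returned descriptions are hypothetical and do not correspond to any PolarDB API.

```python
def node_drill_recovery(target_role: str, has_hot_standby: bool, has_read_only: bool) -> str:
    """Describe the recovery action for a node-dimension drill.

    target_role is the role of the node selected for the drill:
    "primary" or "read_only" (hypothetical labels for this sketch).
    """
    if target_role == "read_only":
        return "create a new read-only node"
    if target_role != "primary":
        raise ValueError(f"unknown node role: {target_role!r}")
    # The primary node was selected: a hot standby node takes precedence
    # over a read-only node as the promotion candidate.
    if has_hot_standby:
        return "promote the hot standby node to primary"
    if has_read_only:
        return "promote the read-only node to primary"
    # No candidate for promotion exists in the cluster.
    return "perform the drill in the zone dimension"
```

For example, `node_drill_recovery("primary", has_hot_standby=False, has_read_only=True)` returns `"promote the read-only node to primary"`.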

Zone dimension

If you select a zone to perform the drill, the drill is performed on all nodes in the zone, and your workloads are switched over to the secondary zone.

  • The following figure shows the architecture before the drill starts.

    image
  • The following figure shows the drill performed on all nodes in the zone.

    image
  • The primary node becomes available, as shown in the following figure.

    image

    The system determines whether to rebuild other nodes in the new primary zone (zone 2) based on the resources available in that zone. The following figure shows the case where other nodes are rebuilt in the new primary zone.

    image

Post-drill phase

This phase is triggered only for drills performed in the zone dimension. The time that this phase takes depends on the data volume and network latency. Your business is not affected in this phase.

  • Other nodes rebuilt in the new primary zone

    • Rebuild read-only nodes

      image
    • Switch back primary node

      image
  • Other nodes not rebuilt in the new primary zone

    • Rebuild read-only nodes

      image
    • Switch back primary node

      image
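The phases described in this topic can be summarized as follows. This is a minimal sketch with hypothetical names: it lists which phases a drill goes through for a given dimension, where the dimension is the one in which the drill is actually performed.

```python
def drill_phases(dimension: str) -> list:
    """Return the ordered phases of a drill in the given dimension ("node" or "zone")."""
    if dimension not in ("node", "zone"):
        raise ValueError(f"unknown drill dimension: {dimension!r}")
    phases = ["fault injection", "recovery"]
    if dimension == "zone":
        # The post-drill phase is triggered only for zone-dimension drills.
        phases.append("post-drill")
    return phases
```

For example, `drill_phases("zone")` returns `["fault injection", "recovery", "post-drill"]`, and `drill_phases("node")` returns only the first two phases.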