Use replication pairs to implement disaster recovery - Elastic Compute Service

If the primary disk in an activated replication pair fails, you can perform a failover on the replication pair to enable read and write permissions on the secondary disk and then attach the secondary disk to a temporary Elastic Compute Service (ECS) instance to ensure business continuity. After you repair the primary disk, perform a reverse replication on the replication pair to replicate data from the secondary disk to the primary disk for disaster recovery. After the reverse replication is complete, you can switch your business back to the primary disk. This topic describes how to implement disaster recovery for a single disk by using a replication pair.

Limits

If a replication pair is added to a replication pair-consistent group, you cannot separately perform a failover or reverse replication on the replication pair. You must manage the replication pairs in the replication pair-consistent group in a centralized manner. For more information, see Use replication pair-consistent groups to implement disaster recovery.

Prerequisites

Before you perform a reverse replication, the primary disk is detached from its ECS instance and is in the Unattached state. For information about how to detach a disk, see Detach a data disk. Alternatively, if the primary disk remains attached, the ECS instance to which it is attached is in the Stopped state.
Note
The reverse replication feature replicates data from the secondary disk to the primary disk. To ensure successful reverse replication, the primary disk must be read-only.
(Recommended) Snapshots are created for the disks to back up disk data. For information about how to create snapshots of disks to back up disk data, see Create a snapshot.
Note
You are charged for the created snapshots. For information about the billing of snapshots, see Snapshots.
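
The limit and prerequisites above can be read as a short checklist to run before a reverse replication. The following Python sketch only models that checklist and does not call any Alibaba Cloud API. The class, field names, and the "In Use" and "Running" labels are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PrimaryDiskState:
    """Hypothetical summary of what you would read off the console."""
    disk_status: str                # "Unattached", or an assumed label such as "In Use"
    instance_status: Optional[str]  # "Stopped", "Running", or None if the disk is detached
    in_consistent_group: bool       # pair belongs to a replication pair-consistent group
    has_snapshot: bool              # recommended, not required


def check_reverse_replication(state: PrimaryDiskState) -> Tuple[List[str], List[str]]:
    """Return (blockers, warnings) for a planned reverse replication."""
    blockers: List[str] = []
    warnings: List[str] = []

    # Limit: pairs in a consistent group cannot be operated on separately.
    if state.in_consistent_group:
        blockers.append("manage this pair through its replication pair-consistent group")

    # Prerequisite: the disk is Unattached, or its instance is Stopped.
    detached = state.disk_status == "Unattached"
    instance_stopped = state.instance_status == "Stopped"
    if not (detached or instance_stopped):
        blockers.append("detach the primary disk or stop its ECS instance")

    # Recommendation: back up the primary disk, because its data is overwritten.
    if not state.has_snapshot:
        warnings.append("create a snapshot of the primary disk first")

    return blockers, warnings


ready = PrimaryDiskState("Unattached", None, False, True)
busy = PrimaryDiskState("In Use", "Running", False, False)
print(check_reverse_replication(ready))
print(check_reverse_replication(busy))
```

Blockers and warnings are kept separate because a missing snapshot is only a recommendation, whereas the other two conditions prevent the operation.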

(Optional) Step 1: Perform a disaster recovery drill

After you activate a replication pair, the async replication feature continuously replicates data from the primary disk to the secondary disk. You can perform a disaster recovery drill to clone the data at the latest recovery point on the secondary disk to a new disk, called a drill disk, and use the drill disk to verify the integrity and correctness of applications at the secondary site. A disaster recovery drill does not affect the async replication feature, and a failure at the primary site does not affect the drill. However, a failure at the secondary site may cause the drill to encounter an exception.

Log on to the Elastic Block Storage (EBS) console.
In the left-side navigation pane, choose Enterprise-level Features > Async Replication.
In the top navigation bar, select the region and resource group to which the resource belongs.
Find the replication pair on which you want to perform a disaster recovery drill and click the ID of the replication pair. A disaster recovery drill is also referred to as a disaster recovery walkthrough.
In the Walkthrough section, click Create walkthroughs.
In the Create walkthroughs message, confirm the region, zone, category, and size of the drill disk and click OK.
After you create the disaster recovery drill, a pay-as-you-go disk that has the same category and size as the secondary disk is created in the zone in which the secondary disk resides. The new disk contains data at the latest recovery point on the secondary disk.
Note
- You can create multiple disaster recovery drills to back up data at different recovery points based on your business requirements.
- After the disaster recovery drill is complete, we recommend that you delete the drill in the Walkthrough section and release the drill disk at the earliest opportunity to reduce costs.
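
To show which attributes the drill disk inherits, the following Python sketch derives a drill disk description from the secondary disk. It is a plain data model and does not call any API. The `Disk` class, the `plan_drill_disk` helper, and the sample zone and category values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    disk_id: str
    zone: str
    category: str
    size_gib: int
    billing_method: str


def plan_drill_disk(secondary: Disk, drill_disk_id: str) -> Disk:
    """Describe the drill disk that a walkthrough would create."""
    return Disk(
        disk_id=drill_disk_id,
        zone=secondary.zone,            # same zone as the secondary disk
        category=secondary.category,    # same category
        size_gib=secondary.size_gib,    # same size
        billing_method="pay-as-you-go", # drill disks are pay-as-you-go
    )


# Sample values are placeholders, not taken from a real account.
secondary = Disk("disk-b", "cn-shanghai-b", "cloud_essd", 200, "pay-as-you-go")
drill = plan_drill_disk(secondary, "drill-disk-1")
print(drill)
```

Because each drill creates a separate billable disk, a list of such descriptions is also a convenient record of which drill disks to release after testing.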

Step 2: Perform a failover

Warning

The failover feature suspends the async replication feature. To prevent data loss, perform a failover only when the primary disk fails.

On the Async Replication page of the EBS console, select the region in which the secondary disk resides in the top navigation bar. Example: China (Beijing).
Find the replication pair that includes the faulty primary disk, open the more options menu in the Actions column, and choose Perform Failover.
In the message that appears, read the notes and click OK.
After a failover is performed, the status of the replication pair changes to Failed Over.
Attach the secondary disk to a temporary ECS instance and switch your business to the secondary disk.
For more information, see Create an instance on the Custom Launch tab and Attach a data disk.

Step 3: Perform a reverse replication

In the top navigation bar, select the region in which the secondary disk resides. Example: China (Beijing).
Find the replication pair on which you performed a failover, open the more options menu in the Actions column, and choose Perform Reverse Replication.
In the Perform Reverse Replication dialog box, read the notes and click Create Snapshot to create a snapshot for the primary disk.
Important
In reverse replication, the original data on the primary disk is overwritten by the data replicated from the secondary disk. To back up disk data and prevent data loss, we recommend that you create a snapshot for the primary disk. If you created a snapshot for the primary disk after the disk was repaired, you do not need to create another snapshot for the disk in this dialog box.
Click OK to replicate data from the secondary disk to the primary disk.
The status of the replication pair changes to Stopped. The primary and secondary roles of the disks in the replication pair are switched. In the replication pair list, you can view the primary disk in the Primary Disk/Region/Zone column and the secondary disk in the Secondary Disk/Region/Zone column.
Note
The original primary disk is automatically changed to the secondary disk, and the original secondary disk is automatically changed to the primary disk. Example:
- Before a reverse replication is performed, Disk A in the China (Beijing) region serves as the primary disk and Disk B in the China (Shanghai) region serves as the secondary disk.
- After a reverse replication is performed, Disk B in the China (Shanghai) region becomes the primary disk and Disk A in the China (Beijing) region becomes the secondary disk.
In the Actions column that corresponds to the replication pair on which you performed a reverse replication, click Activate. In the dialog box that appears, click OK to asynchronously replicate data from the original secondary disk to the original primary disk.
After the status of the replication pair changes to Normal, all data on the original secondary disk is replicated to the original primary disk.
(Optional) Restore the primary and secondary roles of the disks in the replication pair to the original roles.
In the previous steps of reverse replication, the primary and secondary roles of the disks in the replication pair are switched. If you want to restore the primary and secondary roles of the disks to the original roles, perform the following steps:
1. Find the replication pair that you want to manage and note the region shown in the Secondary Disk/Region/Zone column. In the top navigation bar, select that region.
2. Find the replication pair on which you performed a reverse replication, open the more options menu in the Actions column, and choose Perform Failover.
3. Open the more options menu in the Actions column again and choose Perform Reverse Replication.
4. After the reverse replication is complete, click Activate in the Actions column to re-activate the replication pair.
5. In the replication pair list, find the replication pair and view the disks in the Primary Disk/Region/Zone and Secondary Disk/Region/Zone columns to verify the result.
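
The failover, reverse replication, and activation steps can be viewed as a small state machine. The following Python sketch is a simplified model of the status changes and the role switch described in this topic. It does not call any API, it ignores intermediate statuses, and it assumes that an activated pair starts in the Normal status. The class and method names are hypothetical.

```python
from dataclasses import dataclass


class InvalidOperation(Exception):
    """Raised when an operation does not fit the current status in this model."""


@dataclass
class ReplicationPair:
    primary: str
    secondary: str
    status: str = "Normal"  # assumption: the pair is activated and healthy

    def failover(self) -> None:
        # Failover suspends async replication and makes the secondary disk writable.
        if self.status != "Normal":
            raise InvalidOperation(f"cannot fail over from {self.status}")
        self.status = "Failed Over"

    def reverse_replication(self) -> None:
        # Reverse replication switches the primary and secondary roles.
        if self.status != "Failed Over":
            raise InvalidOperation(f"cannot reverse from {self.status}")
        self.primary, self.secondary = self.secondary, self.primary
        self.status = "Stopped"

    def activate(self) -> None:
        # Activation resumes replication from the new primary to the new secondary.
        if self.status != "Stopped":
            raise InvalidOperation(f"cannot activate from {self.status}")
        self.status = "Normal"

    def switch_roles(self) -> None:
        """Run one full cycle: failover, reverse replication, activation."""
        self.failover()
        self.reverse_replication()
        self.activate()


pair = ReplicationPair(primary="Disk A (Beijing)", secondary="Disk B (Shanghai)")
pair.switch_roles()  # Disk B is now the primary disk
print(pair)
pair.switch_roles()  # a second cycle restores the original roles
print(pair)
```

In this model, restoring the original roles is simply the same cycle run a second time, which mirrors the optional steps above.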