This topic describes the basic capabilities and benefits of the Elastic Block Storage (EBS) async replication feature for Elastic Compute Service (ECS) disaster recovery.
Overview
Cloud Backup provides cross-region and cross-zone disaster recovery capabilities based on the async replication feature to meet various business requirements.
Async replication is implemented on disks without the need to install an agent on the protected instance.
If a fault occurs on the primary system, the business system is switched to the disaster recovery system. This effectively prevents system failures caused by regional disasters, ensures business availability, and meets the recovery point objective (RPO) and recovery time objective (RTO) goals of your business.
Async replication is a feature that protects data across regions or across zones within the same region based on the data replication capability of EBS. For more information, see Overview.
The following table describes the differences between continuous data replication (CDR) and async replication.
Item | CDR | Async replication |
Scenarios | Disaster recovery for a single virtual machine (VM). Use this replication technology if installing an agent on the protected system is acceptable. | Disaster recovery that requires consistency across a group of VMs. Use this replication technology if the protected system must not be modified. |
Intrusive to the system | Yes | No |
Replication implementation | An agent is installed on the operating system of the protected instance, so that Cloud Backup replicates data written to the disks and sends the data to a gateway in real time. The gateway stores the data in an Object Storage Service (OSS) bucket and then writes the data to the disk at the disaster recovery site. | Data is replicated by using the async replication and snapshot features. |
Recovery implementation | Supports multiple recovery points. A shadow ECS instance and a gateway server are created for the protected ECS instance at the disaster recovery site. Cloud Backup reads data from the OSS bucket to the shadow ECS instance, writes the data to the ECS instance at the disaster recovery site, and then creates a recovery point based on the snapshot mechanism. | Supports only a single recovery point. Cloud Backup creates a recovery point by replicating the snapshot to the disaster recovery site. |
Consistency group | Not supported | Supported |
Benefits of disaster recovery
Agentless replication
Async replication requires no agent, makes no changes to the protected system, works regardless of the operating system, and consumes no computing resources at the disaster recovery site.
Multi-VM consistency
ECS disaster recovery keeps data consistent across multiple VMs, which meets the strict consistency requirements of enterprise applications.
Ease of use
After you create a protection group for an application, you can add all the ECS instances of the application to the protection group and enable replication. You do not need to track which disks belong to which ECS instances, because Cloud Backup maps ECS instances and disks for you.
Terms
Term | Description |
site pair | Cross-region and cross-zone disaster recovery is implemented based on async replication. Async replication is used to replicate data from one site to another site across regions or across zones in a region. Therefore, you must pair two sites according to your business requirements. These two sites are referred to as a site pair. Protection groups must be created for the site pair. Disaster recovery is implemented only in the forward direction for the protection groups in a site pair. For example, disaster recovery is performed from Protection Group A to Protection Group B, and the forward protection is initiated from Region 1 to Region 2. Disaster recovery is performed from Protection Group C to Protection Group D, and the forward protection is initiated from Region 2 to Region 1. In this case, you must create two site pairs. A protection group can belong to only one site pair. Only one replication technology can be used for one site pair. |
protection group |
|
protected instance | An ECS instance or database that is protected by Cloud Backup. Database protection will be supported in the future. A protected instance has either a primary role or a secondary role. A primary instance runs services, and a secondary instance is used for disaster recovery. |
production site | The zone or region where your production business operates initially. |
disaster recovery site | The zone or region for disaster recovery of your production business. |
failover | The process of switching services to the disaster recovery site when a fault occurs at the production site. Failover is classified into planned failover and unplanned failover. The difference lies in whether the ECS instance at the production site fails during the switchover. |
failback | The process of switching services from the disaster recovery site to the production site when the fault at the production site is rectified. |
forward protection | The replication direction of the protection group and ECS instances. In forward protection, data and services are replicated from the production site to the disaster recovery site. |
reverse protection | The replication direction of the protection group and ECS instances. After a failover, the disaster recovery site (Site B) becomes the new production site, and the production site (Site A) becomes the new disaster recovery site. In this case, after the replication is started, data is replicated from Site B to Site A. The reverse protection takes effect on the site pair. After a failback, Site A becomes the production site and Site B becomes the disaster recovery site again. In this case, after the replication is started, data is replicated from Site A to Site B. The forward protection resumes on the site pair. |
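The direction changes described for forward protection and reverse protection can be pictured as a two-state model. The following Python sketch is illustrative only: the `SitePair` class, its field names, and the guard conditions are assumptions made for this example and are not part of any Cloud Backup API. It also simplifies the behavior by flipping the direction at the moment of failover or failback, whereas the terms above state that the new direction takes effect after replication is started.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    FORWARD = "forward"  # data is replicated from Site A to Site B
    REVERSE = "reverse"  # data is replicated from Site B to Site A


@dataclass
class SitePair:
    """Hypothetical model of a site pair; not a Cloud Backup API object."""
    site_a: str  # production site when the pair was created
    site_b: str  # disaster recovery site when the pair was created
    direction: Direction = Direction.FORWARD

    @property
    def production_site(self) -> str:
        return self.site_a if self.direction is Direction.FORWARD else self.site_b

    @property
    def disaster_recovery_site(self) -> str:
        return self.site_b if self.direction is Direction.FORWARD else self.site_a

    def failover(self) -> None:
        # Assumption: a failover is only meaningful under forward protection.
        if self.direction is not Direction.FORWARD:
            raise ValueError("failover requires forward protection")
        self.direction = Direction.REVERSE

    def failback(self) -> None:
        # Assumption: a failback is only meaningful under reverse protection.
        if self.direction is not Direction.REVERSE:
            raise ValueError("failback requires reverse protection")
        self.direction = Direction.FORWARD


pair = SitePair(site_a="Region 1", site_b="Region 2")
pair.failover()
print(pair.production_site, "->", pair.disaster_recovery_site)
pair.failback()
print(pair.production_site, "->", pair.disaster_recovery_site)
```

After `failover()`, the model reports Site B as the production site. After `failback()`, the roles return to their original assignment, which mirrors the resumption of forward protection.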
Architecture
The following figure shows the technical architecture of disaster recovery based on CDR and async replication.
Supported disaster recovery scenarios
Disaster recovery scenario | Type |
failover |
|
failback |
|
Disaster recovery process
To implement disaster recovery protection for critical applications in the Cloud Backup console, perform the following steps:
Step 1: Plan resources.
Before you perform disaster recovery, you must plan the required compute, network, and storage resources, including the number of servers, the storage capacity, and the virtual private clouds (VPCs).
Step 2: Create a disaster recovery site pair.
Create VPCs and vSwitches for the disaster recovery site, and configure CIDR blocks. For testing, you can create the VPCs and vSwitches with the default configurations, or configure the same VPC CIDR block and vSwitch CIDR block for the production site and the disaster recovery site. For an actual disaster recovery deployment, configure CIDR blocks as required.
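When you plan CIDR blocks for the two sites, a quick offline check can catch typos before you create any resources. The following sketch uses the Python standard library `ipaddress` module and assumes the usual requirement that a vSwitch CIDR block falls inside the CIDR block of its VPC. The function name and all CIDR values are examples chosen for illustration, not values from this topic.

```python
import ipaddress


def vswitch_fits_vpc(vpc_cidr: str, vswitch_cidr: str) -> bool:
    """Return True if the vSwitch CIDR block lies inside the VPC CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    vswitch = ipaddress.ip_network(vswitch_cidr)
    return vswitch.subnet_of(vpc)


# Example values only; replace them with your planned CIDR blocks.
production = {"vpc": "192.168.0.0/16", "vswitch": "192.168.1.0/24"}
disaster_recovery = {"vpc": "192.168.0.0/16", "vswitch": "192.168.1.0/24"}

for name, site in (("production", production),
                   ("disaster recovery", disaster_recovery)):
    ok = vswitch_fits_vpc(site["vpc"], site["vswitch"])
    print(f"{name}: vSwitch inside VPC = {ok}")

# Identical CIDR blocks on both sites, as in the test setup described above.
print("same CIDR blocks on both sites:", production == disaster_recovery)
```

`ipaddress.ip_network` raises `ValueError` for a malformed block or one with host bits set, so a typo such as `192.168.1.1/24` is reported immediately.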
Step 3: Configure network and security settings.
Create resource mappings, including the zone mapping, vSwitch mapping, and security group mapping.
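The three mapping types can be thought of as lookup tables from a production-site resource to its counterpart at the disaster recovery site. The following sketch shows one way to record planned mappings and to find the gaps before you add instances. The `ResourceMappings` class, the `missing_mappings` function, and every resource ID are hypothetical and do not correspond to a Cloud Backup API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceMappings:
    """Hypothetical container: production-site ID -> disaster-recovery-site ID."""
    zones: Dict[str, str] = field(default_factory=dict)
    vswitches: Dict[str, str] = field(default_factory=dict)
    security_groups: Dict[str, str] = field(default_factory=dict)


def missing_mappings(mappings: ResourceMappings, zone: str, vswitch: str,
                     security_group: str) -> List[str]:
    """List the mapping types that have no target for the given source IDs."""
    missing = []
    if zone not in mappings.zones:
        missing.append("zone")
    if vswitch not in mappings.vswitches:
        missing.append("vSwitch")
    if security_group not in mappings.security_groups:
        missing.append("security group")
    return missing


# All IDs below are made-up placeholders.
mappings = ResourceMappings(
    zones={"zone-prod-a": "zone-dr-a"},
    vswitches={"vsw-prod-1": "vsw-dr-1"},
    security_groups={},
)
print(missing_mappings(mappings, "zone-prod-a", "vsw-prod-1", "sg-prod-1"))
```

In this example the security group mapping is empty, so the check reports it as the only missing mapping type for the instance.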
Step 4: Create a protection group.
Step 5: Add protected instances.
Add instances to be protected.
Step 6: Start replication.
Start disaster recovery protection, which replicates data from the production site to the disaster recovery site.
You can perform a fault drill if the protection group is in Incremental Replication status or has a recovery point. For more information, see Fault drill.
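The fault drill precondition is a simple either-or rule, which the following sketch expresses as a predicate. The function and parameter names are assumptions for illustration. Only the status name `Incremental Replication` comes from this topic, and the other status string is a placeholder.

```python
def can_run_fault_drill(status: str, recovery_point_count: int) -> bool:
    """A drill is allowed in Incremental Replication status or with a recovery point."""
    return status == "Incremental Replication" or recovery_point_count > 0


print(can_run_fault_drill("Incremental Replication", 0))
print(can_run_fault_drill("Some Other Status", 1))
print(can_run_fault_drill("Some Other Status", 0))
```

Either condition alone is sufficient, so a protection group that has left the Incremental Replication status but still has a recovery point remains eligible.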
Step 7: Perform a failover.
Switch After Data Synchronization
During the failover, Cloud Backup stops the protected instances in the protection group, and performs the final data synchronization after all the protected instances are stopped. The failover starts after the data is synchronized. This ensures that the data at the disaster recovery site is the same as that at the production site. This type of failover applies to scenarios such as planned fault drills and business migration.
Switch Now
During the failover, Cloud Backup attempts to stop the protected instances in the protection group. Cloud Backup does not wait until all the protected instances are stopped or perform the final data synchronization. Some data may be lost within the recovery point objective (RPO) range. This type of failover applies to scenarios where a fault cannot be rectified within a short period of time at the production site and business must be immediately switched to the disaster recovery site.
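The difference between the two failover types comes down to whether Cloud Backup waits for the instances to stop and performs a final synchronization. The following sketch restates the two sequences side by side. It is a descriptive model only: the enum, the function names, and the step wording are assumptions for this example, and the code does not call any Cloud Backup API.

```python
from enum import Enum
from typing import List


class FailoverMode(Enum):
    SWITCH_AFTER_DATA_SYNCHRONIZATION = "Switch After Data Synchronization"
    SWITCH_NOW = "Switch Now"


def failover_plan(mode: FailoverMode) -> List[str]:
    """Return the ordered steps that the chosen failover type implies."""
    if mode is FailoverMode.SWITCH_AFTER_DATA_SYNCHRONIZATION:
        return [
            "stop all protected instances and wait until they are stopped",
            "perform the final data synchronization",
            "start the failover",
        ]
    return [
        "attempt to stop the protected instances without waiting",
        "start the failover",
    ]


def may_lose_data(mode: FailoverMode) -> bool:
    """Only Switch Now skips the final synchronization."""
    return mode is FailoverMode.SWITCH_NOW


for mode in FailoverMode:
    print(mode.value, "| may lose data:", may_lose_data(mode))
    for step in failover_plan(mode):
        print("  -", step)
```

Switch After Data Synchronization trades a longer switchover for identical data on both sites, whereas Switch Now favors speed and accepts possible data loss within the RPO range.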
Billing
If you use the async replication feature for disaster recovery, the following fees are incurred:
The fees for using Cloud Backup clients for ECS disaster recovery
This feature is in public preview. You are not charged for using Cloud Backup clients for ECS disaster recovery during the public preview.
The fees for using the pay-as-you-go ECS instances and disks created at the disaster recovery site are included in your ECS bills. For more information, see Pay-as-you-go.
The fees for the traffic generated during cross-region replication are included in your ECS bills. For more information, see Billing of disk disaster recovery.