A fault drill is an essential part of the overall disaster recovery process. This topic describes how to perform a fault drill for Elastic Compute Service (ECS) disaster recovery.
Benefits
A fault drill lets you run a protected ECS instance on the cloud to verify that your applications run as expected. A fault drill has the following benefits:
Allows you to easily check whether an application under disaster recovery protection can run on the disaster recovery site as expected.
Familiarizes you with the disaster recovery process and helps ensure that a smooth failover can be performed when the production site fails.
Prerequisites
A protection group is in the Incremental Replication state or has a recovery point. For information about how to create a protection group for cross-region and cross-zone disaster recovery, see Start replication for cross-zone disaster recovery and Start replication for cross-region disaster recovery.
If you use the Create Custom Drill Environment method, the virtual private cloud (VPC), vSwitch mappings, and security group mappings for the drill are created.
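The prerequisites above can be expressed as a simple readiness check. The following is a minimal sketch in plain Python, not a Cloud Backup API: the `ProtectionGroup` and `CustomDrillNetwork` classes, their fields, and the `missing_prerequisites` function are all hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field

# All names in this sketch are hypothetical. They model the documented
# prerequisites in plain Python and do not call any Cloud Backup API.


@dataclass
class ProtectionGroup:
    status: str                                    # for example "Incremental Replication"
    recovery_points: list = field(default_factory=list)


@dataclass
class CustomDrillNetwork:
    drill_vpc_id: str = ""
    vswitch_mappings: dict = field(default_factory=dict)
    security_group_mappings: dict = field(default_factory=dict)


def missing_prerequisites(group, custom_network=None):
    """Return a list of unmet prerequisites; an empty list means ready."""
    problems = []
    # The group must be replicating incrementally OR have a recovery point.
    if group.status != "Incremental Replication" and not group.recovery_points:
        problems.append("protection group has no recovery point and is not "
                        "in the Incremental Replication state")
    # The network checks apply only to the custom drill environment method.
    if custom_network is not None:
        if not custom_network.drill_vpc_id:
            problems.append("drill VPC is not specified")
        if not custom_network.vswitch_mappings:
            problems.append("vSwitch mappings are not configured")
        if not custom_network.security_group_mappings:
            problems.append("security group mappings are not configured")
    return problems
```

Passing `custom_network=None` models the automatic method, for which Cloud Backup creates the network resources itself, so only the protection group condition is checked.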
Procedure
Create a fault drill environment.
Perform the fault drill.
Verify applications or services.
Clear the drill environment.
Select a method to create a drill environment
Creation method | Scenario | Advantages | Disadvantages |
Automatically Create and Start Drill Environment | This method is suitable for scenarios where your services are independent and you can verify your services without the need to configure a public network or another network. Note: For example, an ECS application that provides internal services does not require the configurations of Server Load Balancer (SLB) instances, domain names, or security groups (open ports). | After you specify a protection group, Cloud Backup automatically creates the resources required for the drill environment, including the VPCs, vSwitch mappings, and security group mappings. | You cannot customize the name prefixes of the ECS instances that are newly created after the drill is performed. |
Create Custom Drill Environment | This method is suitable for scenarios where your services interact with other networks and additional network configurations are required for your service verification. The drill environment can be retained after it is configured. Note: For example, if multiple ECS instances provide services over SLB, you must configure SLB instances, domain names, and security groups (open ports) for your services. | | You must independently specify the drill VPC and configure vSwitch mappings and security group mappings. The drill may fail due to instance IP address conflicts. |
Automatically create and start a drill environment
Log on to the Cloud Backup console.
In the left-side navigation pane, choose .
If you are not using EBS Async Replication, click Switch to EBS Async Replication.
On the Site Pairs tab, click the site pair and then click the Fault Drill tab.
Note: You can also go to the Protection Group tab, select a protection group, and click Fault Drill in the Actions column.
Click Automatically Create and Start Drill Environment.
In the Start Drill dialog box, select a protection group from the Protection Group drop-down list and click Next.
Preview the resources of the protection group and click OK.
Note: If the instance type and operating system of the ECS instance at the disaster recovery site do not meet your requirements, you can select a suitable instance type and operating system based on the instance type and operating system of the ECS instance at the production site and the system prompt. To do so, perform the Change Instance Type and Modify Operating System operations. If the available instance families and operating systems do not meet your requirements, submit a ticket to contact Alibaba Cloud technical support.
Before you enable replication for disaster recovery, you can also perform the Modify User Data and Modify Disaster Recovery IP operations.
If Abnormal is displayed in the IP Address column, the IP address is already in use. In this case, you must remove the original drill ECS instance or change the vSwitch mapping in the network configuration.
In the Confirm to Start Drill dialog box, click OK to start the drill.
Important: Cloud Backup suspends the replication for the protection group and creates new drill disks based on the last recovery point. In most cases, replication of the protection group is automatically resumed in 5 minutes. After the drill disks are created, Cloud Backup creates a drill ECS instance at the disaster recovery site. When the protection group is in the drill state, you can verify your services.
During a protection group drill, the ECS instance at the disaster recovery site is automatically started and the ECS instance at the production site is not automatically stopped. We recommend that you evaluate your services and isolate production traffic by using security groups and network isolation to prevent service risks.
The status of the drill protection group changes to Initializing, Drill in Progress, and Drill Group Created in sequence.
After the drill ECS instance is started, verify your services.
Clear the drill environment.
Delete the fault drill group.
If you delete the fault drill group, resources that are created during the drill, such as the ECS instances, disks, elastic network interfaces (ENIs), snapshots, and images, are also deleted.
To delete the fault drill group, click Delete Fault Drill Group in the Actions column.
Note: You can also go to the Protection Group tab and delete the specified fault drill group in the Actions column. You can delete multiple fault drill groups at a time.
Delete the drill environment.
In the Drill Environment section, click Delete.
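The Abnormal state in the IP Address column, described in the procedure above, indicates that a planned drill IP address is already in use. The console performs this check itself. The sketch below only illustrates the idea with Python's standard `ipaddress` module, so that you can review a planned mapping before you start a drill. The function name `find_ip_conflicts` and the sample instance names and addresses are hypothetical.

```python
import ipaddress


def find_ip_conflicts(planned, in_use):
    """Return the sorted names of instances whose planned IP is unavailable.

    planned: dict mapping a (hypothetical) instance name to its planned IP.
    in_use:  iterable of IP strings already occupied in the drill vSwitch,
             for example by a drill ECS instance left over from an earlier drill.
    """
    used = {ipaddress.ip_address(ip) for ip in in_use}
    seen = set()
    conflicts = []
    for name, ip in planned.items():
        addr = ipaddress.ip_address(ip)
        # A conflict is an address that is already occupied, or one that an
        # earlier entry in the same plan has already claimed.
        if addr in used or addr in seen:
            conflicts.append(name)
        seen.add(addr)
    return sorted(conflicts)


if __name__ == "__main__":
    plan = {"web": "10.0.0.5", "db": "10.0.0.6"}
    print(find_ip_conflicts(plan, ["10.0.0.5"]))
```

For any instance reported by such a check, the remedies are the ones stated above: remove the original drill ECS instance or change the vSwitch mapping.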
Create a custom drill environment
Log on to the Cloud Backup console.
In the left-side navigation pane, choose .
If you are not using EBS Async Replication, click Switch to EBS Async Replication.
On the Site Pairs tab, click the site pair and then click the Fault Drill tab.
Note: You can also go to the Protection Group tab, select a protection group, and click Fault Drill in the Actions column.
Click Create Custom Drill Environment.
In the Create Drill Environment dialog box, specify ECS Instance Prefix (the name prefix for the ECS instances that the fault drill creates), select a VPC from the Drill VPC drop-down list, and then click OK.
Note: The value of ECS Instance Prefix is the prefix of the name of the newly created ECS instance. For example, if the name of the ECS instance used for the fault drill is ecs and the value of ECS Instance Prefix is Drill_test_20230925_, the name of the newly created ECS instance is Drill_test_20230925_ecs.
You must create the VPC for the drill environment at the disaster recovery site in advance.
Configure the network of the drill environment.
In the Drill Environment section, click Details next to the Drill Network Configuration parameter.
In the Drill Network Configuration dialog box, add the vSwitch mapping and security group mapping.
In the Drill Environment section, click Start Drill to start a drill.
In the Start Drill dialog box, select a protection group from the Protection Group drop-down list and click Next.
Preview the resources of the protection group and click OK.
Note: If the instance type and operating system of the ECS instance at the disaster recovery site do not meet your requirements, you can select a suitable instance type and operating system based on the instance type and operating system of the ECS instance at the production site and the system prompt. To do so, perform the Change Instance Type and Modify Operating System operations. If the available instance families and operating systems do not meet your requirements, submit a ticket to contact Alibaba Cloud technical support.
Before you enable replication for disaster recovery, you can also perform the Modify User Data and Modify Disaster Recovery IP operations.
If Abnormal is displayed in the IP Address column, the IP address is already in use. In this case, you must remove the original drill ECS instance or change the vSwitch mapping in the network configuration.
In the Confirm to Start Drill dialog box, click OK to start the drill.
Important: Cloud Backup suspends the replication for the protection group and creates new drill disks based on the last recovery point. In most cases, replication of the protection group is automatically resumed in 5 minutes. After the drill disks are created, Cloud Backup creates a drill ECS instance at the disaster recovery site. When the protection group is in the drill state, you can verify your services.
During a protection group drill, the ECS instance at the disaster recovery site is automatically started and the ECS instance at the production site is not automatically stopped. We recommend that you evaluate your services and isolate production traffic by using security groups and network isolation to prevent service risks.
The status of the drill protection group changes to Initializing, Drill in Progress, and Drill Group Created in sequence.
After the drill ECS instance is started, verify your services.
Clear the drill environment.
Delete the fault drill group.
If you delete the fault drill group, resources that are created during the drill, such as the ECS instances, disks, ENIs, snapshots, and images, are also deleted.
To delete the fault drill group, click Delete Fault Drill Group in the Actions column.
Note: You can also go to the Protection Group tab and delete the specified fault drill group in the Actions column. You can delete multiple fault drill groups at a time.
Delete the drill environment.
In the Drill Environment section, click Delete.
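The naming rule for ECS Instance Prefix in the custom procedure is plain concatenation: the prefix is placed in front of the original instance name. The following sketch shows that rule together with one possible way to build a date-stamped prefix. Both function names are hypothetical, and the date-stamp convention is an assumption inferred from the form of the sample prefix Drill_test_20230925_, not a documented requirement.

```python
import datetime


def make_drill_prefix(label, day):
    """Build a date-stamped prefix such as 'Drill_test_20230925_'.

    This convention is an assumption for illustration; any prefix that
    the console accepts can be used instead.
    """
    return f"{label}_{day:%Y%m%d}_"


def drill_instance_name(prefix, original_name):
    """Name of the drill instance: the prefix followed by the original name."""
    return f"{prefix}{original_name}"


if __name__ == "__main__":
    prefix = make_drill_prefix("Drill_test", datetime.date(2023, 9, 25))
    print(drill_instance_name(prefix, "ecs"))  # prints Drill_test_20230925_ecs
```

A date-stamped prefix makes it easier to tell which drill created an instance when several drill groups exist at the same time.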