Cloud Backup: Fault drill

Last Updated: May 31, 2024

A fault drill is an essential part of the overall disaster recovery process. This topic describes how to perform a fault drill for Elastic Compute Service (ECS) disaster recovery.

Benefits

A fault drill runs a protected ECS instance on the cloud so that you can verify whether your applications run as expected. A fault drill has the following benefits:

  • Allows you to easily check whether an application under disaster recovery protection can run on the disaster recovery site as expected.

  • Familiarizes you with the disaster recovery process and helps ensure a smooth failover when the production site fails.

Prerequisites

Procedure

  1. Create a fault drill environment.

  2. Perform the fault drill.

  3. Verify applications or services.

  4. Clear the drill environment.
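
The four steps above can be sketched as a small orchestration function. This is only an illustration of the ordering and of why cleanup should always run. The four callables are hypothetical placeholders for console actions or your own tooling, not Cloud Backup APIs.

```python
from typing import Callable, List


def run_fault_drill(
    create_environment: Callable[[], None],
    start_drill: Callable[[], None],
    verify_services: Callable[[], bool],
    clear_environment: Callable[[], None],
) -> bool:
    """Run the four drill phases in order and always clean up afterwards.

    All four callables are hypothetical stand-ins supplied by the caller.
    """
    create_environment()          # Step 1: create the drill environment
    try:
        start_drill()             # Step 2: perform the fault drill
        return verify_services()  # Step 3: verify applications or services
    finally:
        # Step 4: runs even if the drill or the verification raises,
        # so drill resources are not left behind.
        clear_environment()


# Usage with stand-in actions that only record what happened.
log: List[str] = []


def verify() -> bool:
    log.append("verify")
    return True


passed = run_fault_drill(
    lambda: log.append("create"),
    lambda: log.append("drill"),
    verify,
    lambda: log.append("clear"),
)
print(log, passed)  # ['create', 'drill', 'verify', 'clear'] True
```

The `try`/`finally` block reflects the intent of the last step: the drill environment is cleared whether or not verification succeeds.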

Select a method to create a drill environment

Automatically Create and Start Drill Environment

  • Scenario: This method is suitable for scenarios where your services are independent and you can verify your services without the need to configure a public network or another network. For example, an ECS application that provides internal services does not require the configuration of Server Load Balancer (SLB) instances, domain names, or security groups (open ports).

  • Advantages: Simple configurations. After you specify a protection group, Cloud Backup automatically creates the resources required for the drill environment, including the VPCs, vSwitch mappings, and security group mappings.

  • Disadvantages: You cannot customize the name prefixes of the ECS instances that are newly created after the drill is performed.

Create Custom Drill Environment

  • Scenario: This method is suitable for scenarios where your services interact with other networks and additional network configurations are required for your service verification. The drill environment can be retained after it is configured. For example, if multiple ECS instances provide services over SLB, you must configure SLB instances, domain names, and security groups (open ports) for your services.

  • Advantages: You can customize the name prefixes of the ECS instances that are newly created after the drill is performed to quickly identify the ECS instances for fault drills. You can also plan and create resources for the drill environment. For example, you can manually create a drill VPC or select a VPC at the disaster recovery site, and create vSwitch mappings and security group mappings.

  • Disadvantages: Complex configurations. You must independently specify the drill VPC and configure vSwitch mappings and security group mappings. The drill may fail due to instance IP address conflicts.
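
Because a custom drill environment can fail on IP address conflicts, it may help to check the planned addresses before you start the drill. The following is a minimal sketch that uses only the Python standard library. The instance names, the addresses, and the idea of exporting the addresses already in use in the drill vSwitch are assumptions for illustration. The function does not call any Cloud Backup or ECS API.

```python
import ipaddress
from typing import Dict, Iterable, List


def find_ip_conflicts(
    planned: Dict[str, str],
    in_use: Iterable[str],
    vswitch_cidr: str,
) -> List[str]:
    """Return a description of each problem found in the planned drill IPs.

    planned maps a (hypothetical) instance name to the IP address it should
    receive in the drill vSwitch; in_use lists addresses already taken there.
    """
    network = ipaddress.ip_network(vswitch_cidr)
    taken = {ipaddress.ip_address(ip) for ip in in_use}
    problems = []
    for name, ip_text in planned.items():
        ip = ipaddress.ip_address(ip_text)
        if ip not in network:
            problems.append(f"{name}: {ip} is outside {network}")
        elif ip in taken:
            problems.append(f"{name}: {ip} is already in use")
        taken.add(ip)  # also catches duplicates within the plan itself
    return problems


# Example values are made up for illustration.
issues = find_ip_conflicts(
    planned={"web-1": "192.168.10.5", "db-1": "192.168.10.6"},
    in_use=["192.168.10.6"],
    vswitch_cidr="192.168.10.0/24",
)
print(issues)  # ['db-1: 192.168.10.6 is already in use']
```

An empty result only means that the plan is consistent with the data you supplied. The console preview remains the authoritative check.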

Automatically create and start a drill environment

  1. Log on to the Cloud Backup console.

  2. In the left-side navigation pane, choose Disaster Recovery > ECS Disaster Recovery.

  3. If you are not using EBS Async Replication, click Switch to EBS Async Replication.

  4. On the Site Pairs tab, click the site pair and then click the Fault Drill tab.

    Note

    You can also go to the Protection Group tab, select a protection group, and click Fault Drill in the Actions column.

  5. Click Automatically Create and Start Drill Environment.

  6. In the Start Drill dialog box, select a protection group from the Protection Group drop-down list and click Next.

  7. Preview the resources of the protection group and click OK.

    Note
    • If the instance type or operating system of the ECS instance at the disaster recovery site does not meet your requirements, use the Change Instance Type and Modify Operating System operations to select a suitable instance type and operating system. Base your choice on the instance type and operating system of the ECS instance at the production site and on the system prompt. If the available instance families and operating systems do not meet your requirements, submit a ticket to contact Alibaba Cloud technical support.

    • Before you enable replication for disaster recovery, you can also perform the Modify User Data and Modify Disaster Recovery IP operations.

    • If Abnormal is displayed in the IP Address column, the IP address is already in use. In this case, you must remove the original drill ECS instance or change the vSwitch mapping in the network configuration.

  8. In the Confirm to Start Drill dialog box, click OK to start the drill.

    Important
    • Cloud Backup suspends replication for the protection group and creates new drill disks based on the last recovery point. In most cases, replication of the protection group automatically resumes within 5 minutes. After the drill disks are created, Cloud Backup creates a drill ECS instance at the disaster recovery site. When the protection group is in the drill state, you can verify your services.

    • During a protection group drill, the ECS instance at the disaster recovery site is automatically started, but the ECS instance at the production site is not automatically stopped. We recommend that you evaluate your services and use security groups and network isolation to keep production traffic away from the drill instance.

    The status of the drill protection group changes to Initializing, Drill in Progress, and Drill Group Created in sequence.

  9. After the drill ECS instance is started, verify your services.

  10. Clear the drill environment.

    1. Delete the fault drill group.

      If you delete the fault drill group, resources that are created during the drill, such as the ECS instances, disks, elastic network interfaces (ENIs), snapshots, and images, are also deleted.

      To delete the fault drill group, click Delete Fault Drill Group in the Actions column.

      Note

      You can also go to the Protection Group tab and delete the specified fault drill group in the Actions column. You can delete multiple fault drill groups at a time.

    2. Delete the drill environment.

      In the Drill Environment section, click Delete.
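
If you track a drill from a script instead of watching the console, the status sequence described above (Initializing, Drill in Progress, Drill Group Created) can be checked as follows. This is a sketch only. `get_status` is a hypothetical callable that you would implement yourself, and the timeout and polling interval are arbitrary assumptions rather than documented values.

```python
import time
from typing import Callable

# Status names as shown in the console, in the order they are expected.
DRILL_STATUSES = ["Initializing", "Drill in Progress", "Drill Group Created"]


def wait_until_drill_ready(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,   # assumed upper bound, not a documented limit
    interval_s: float = 10.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status until the drill group is created.

    Raises RuntimeError on an unknown or backwards status and TimeoutError
    if the final status is not reached in time.
    """
    deadline = time.monotonic() + timeout_s
    last_index = -1
    while True:
        status = get_status()
        if status not in DRILL_STATUSES:
            raise RuntimeError(f"unexpected drill status: {status!r}")
        index = DRILL_STATUSES.index(status)
        if index < last_index:
            raise RuntimeError(f"status moved backwards to {status!r}")
        last_index = index
        if status == DRILL_STATUSES[-1]:
            return  # the drill ECS instance can now be verified
        if time.monotonic() >= deadline:
            raise TimeoutError(f"drill still {status!r} after {timeout_s} s")
        sleep(interval_s)


# Usage with a canned status sequence and a no-op sleep.
canned = iter(["Initializing", "Drill in Progress", "Drill Group Created"])
wait_until_drill_ready(lambda: next(canned), sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the function testable without real waiting.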

Create a custom drill environment

  1. Log on to the Cloud Backup console.

  2. In the left-side navigation pane, choose Disaster Recovery > ECS Disaster Recovery.

  3. If you are not using EBS Async Replication, click Switch to EBS Async Replication.

  4. On the Site Pairs tab, click the site pair and then click the Fault Drill tab.

    Note

    You can also go to the Protection Group tab, select a protection group, and click Fault Drill in the Actions column.

  5. Click Create Custom Drill Environment.

  6. In the Create Drill Environment dialog box, specify the ECS Instance Prefix parameter, select a VPC from the Drill VPC drop-down list, and then click OK.

    Note
    • The value of ECS Instance Prefix is the prefix of the name of the newly created ECS instance. For example, if the name of the ECS instance used for the fault drill is ecs and the value of ECS Instance Prefix is Drill_test_20230925_, the name of the newly created ECS instance is Drill_test_20230925_ecs.

    • You must create the VPC for the drill environment at the disaster recovery site in advance.

  7. Configure the network of the drill environment.

    1. In the Drill Environment section, click Details next to the Drill Network Configuration parameter.

    2. In the Drill Network Configuration dialog box, add the vSwitch mapping and security group mapping.

  8. In the Drill Environment section, click Start Drill to start a drill.

  9. In the Start Drill dialog box, select a protection group from the Protection Group drop-down list and click Next.

  10. Preview the resources of the protection group and click OK.

    Note
    • If the instance type or operating system of the ECS instance at the disaster recovery site does not meet your requirements, use the Change Instance Type and Modify Operating System operations to select a suitable instance type and operating system. Base your choice on the instance type and operating system of the ECS instance at the production site and on the system prompt. If the available instance families and operating systems do not meet your requirements, submit a ticket to contact Alibaba Cloud technical support.

    • Before you enable replication for disaster recovery, you can also perform the Modify User Data and Modify Disaster Recovery IP operations.

    • If Abnormal is displayed in the IP Address column, the IP address is already in use. In this case, you must remove the original drill ECS instance or change the vSwitch mapping in the network configuration.

  11. In the Confirm to Start Drill dialog box, click OK to start the drill.

    Important
    • Cloud Backup suspends replication for the protection group and creates new drill disks based on the last recovery point. In most cases, replication of the protection group automatically resumes within 5 minutes. After the drill disks are created, Cloud Backup creates a drill ECS instance at the disaster recovery site. When the protection group is in the drill state, you can verify your services.

    • During a protection group drill, the ECS instance at the disaster recovery site is automatically started, but the ECS instance at the production site is not automatically stopped. We recommend that you evaluate your services and use security groups and network isolation to keep production traffic away from the drill instance.

    The status of the drill protection group changes to Initializing, Drill in Progress, and Drill Group Created in sequence.

  12. After the drill ECS instance is started, verify your services.

  13. Clear the drill environment.

    1. Delete the fault drill group.

      If you delete the fault drill group, resources that are created during the drill, such as the ECS instances, disks, ENIs, snapshots, and images, are also deleted.

      To delete the fault drill group, click Delete Fault Drill Group in the Actions column.

      Note

      You can also go to the Protection Group tab and delete the specified fault drill group in the Actions column. You can delete multiple fault drill groups at a time.

    2. Delete the drill environment.

      In the Drill Environment section, click Delete.
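
The ECS Instance Prefix naming rule used in the custom procedure can be illustrated with a short sketch. It assumes that the drill instance name is the prefix followed by the source instance name, as in the Drill_test_20230925_ example. The helper names and the date-based prefix pattern are hypothetical conventions, not product requirements.

```python
from datetime import date
from typing import Iterable, List


def dated_prefix(day: date, label: str = "Drill_test") -> str:
    """Build a prefix such as 'Drill_test_20230925_' (a hypothetical convention)."""
    return f"{label}_{day:%Y%m%d}_"


def drill_instance_name(prefix: str, source_name: str) -> str:
    """Name of the drill ECS instance: the prefix followed by the source name."""
    return f"{prefix}{source_name}"


def drill_instances(names: Iterable[str], prefix: str) -> List[str]:
    """Pick out the instances that belong to a drill by their name prefix."""
    return [name for name in names if name.startswith(prefix)]


prefix = dated_prefix(date(2023, 9, 25))
print(prefix)                              # Drill_test_20230925_
print(drill_instance_name(prefix, "ecs"))  # Drill_test_20230925_ecs
print(drill_instances(["ecs", "Drill_test_20230925_ecs"], prefix))
# ['Drill_test_20230925_ecs']
```

Including the drill date in the prefix makes it easier to tell apart the instances of successive drills before you delete the fault drill group.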