MaxCompute zone-disaster recovery protects your business against unexpected failures, such as carrier network failures, data center power outages, data center facility failures, and cluster failures. You can enable multi-zone storage disaster recovery and multi-zone high-availability computing to significantly reduce business downtime and meet business assurance and industry compliance requirements.
Feature introduction
To use this feature, click the application link and fill in the application form for trial use of new features to request that storage disaster recovery be enabled. For more information, see Apply for trial use of new features.
MaxCompute zone-disaster recovery extends the availability of data storage and computing services from a single zone to three zones in the same region and utilizes physical isolation and low-latency network connectivity of the three zones to provide real-time data synchronization and fault isolation capabilities across data centers. This prevents business systems from being interrupted due to failures in a single data center and improves the risk resistance capability of your business.
MaxCompute zone-disaster recovery includes multi-zone storage disaster recovery and multi-zone high-availability computing. The following content describes the details.
Multi-zone storage disaster recovery: stores the existing data of a project redundantly across three zones and writes incremental data to the storage services in all three zones simultaneously. Without this capability, the data of a project is kept in local storage in a single zone. If a zone-level failure occurs, multi-zone storage disaster recovery ensures that data read and write services are not interrupted and that no data is lost. This helps you achieve a recovery point objective (RPO) of zero for data.
Multi-zone high-availability computing: binds multi-zone high-availability computing resources to a project for which storage disaster recovery is enabled to implement zone-disaster recovery for data storage and computing. You can reserve sufficient multi-zone high-availability computing resources in multiple zones. If a zone-level failure occurs, computing resources are automatically switched from the faulty zone to a zone that can normally provide services.
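The switchover behavior described above can be pictured with a small toy model. This is only an illustrative sketch, not how MaxCompute implements failover: the `Zone` class, the `pick_serving_zone` function, and the zone names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A hypothetical zone with a simple health flag."""
    name: str
    healthy: bool = True


def pick_serving_zone(zones, current):
    """Return the name of the zone that should serve computing requests.

    Keeps the current zone if it is healthy. Otherwise returns the first
    healthy zone in the list, or raises if no healthy zone remains.
    """
    by_name = {zone.name: zone for zone in zones}
    if by_name[current].healthy:
        return current
    for zone in zones:
        if zone.healthy:
            return zone.name
    raise RuntimeError("no healthy zone available")


zones = [Zone("zone-a"), Zone("zone-b"), Zone("zone-c")]
zones[0].healthy = False  # simulate a zone-level failure in zone-a
serving = pick_serving_zone(zones, current="zone-a")
print(serving)
```

In this model the project keeps running as long as at least one zone with reserved resources stays healthy, which is why the text recommends reserving sufficient resources in multiple zones.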
Disaster recovery guidelines
After zone-disaster recovery is enabled, the following recovery operations are performed when a zone-level failure occurs:
You are notified of the failure information from Alibaba Cloud MaxCompute.
The server immediately allocates computing resources from the zone that can normally provide services. The system checks the integrity and availability of data such as tables, partitions, and permissions in the project.
If a job that is submitted by a client fails to run, you must submit the job again. You do not need to modify the configurations for accessing MaxCompute, such as the endpoint, authentication information, project name, and quota name.
After the job resumes running, you must continue to monitor the upper-layer business operations to ensure that the business is properly running.
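On the client side, the resubmission step above can be automated with a small retry wrapper. The following is a minimal sketch under stated assumptions: `submit_job` is a hypothetical zero-argument callable that wraps whatever client you use to submit the job and raises an exception when the job fails. The endpoint, authentication information, project name, and quota name stay unchanged between attempts, as described above.

```python
import time


def run_with_resubmit(submit_job, max_attempts=3, wait_seconds=0.0):
    """Call submit_job() until it succeeds or max_attempts is reached.

    submit_job is a hypothetical callable that submits the job with the
    same configuration every time and raises an exception on failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_job()
        except Exception as err:  # narrow this to your client's error type
            last_error = err
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


# Usage with a stand-in job that fails twice before succeeding.
attempts = {"count": 0}


def flaky_job():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("zone unavailable")
    return "done"


result = run_with_resubmit(flaky_job)
```

A bounded number of attempts keeps a persistent failure visible instead of retrying forever. You still need to monitor the upper-layer business after the job succeeds.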
Scenarios
Finance
Financial services provided by banks require constant analysis and processing of business transaction data and need to prevent business interruptions caused by data center failures.
Critical infrastructure
Data analysis systems in industries such as power supply, water utility, and transportation need to prevent interruptions of key information services that are critical to people's livelihoods when data centers fail.
Benefits
Redundant data backup
Reduced business downtime
Compliance with industry standards
Better customer experience for upper-layer business
Limits
Zone-disaster recovery is supported only in the China (Shenzhen), China East 2 Finance, and China (Hong Kong) regions.
Only metadata, user permissions, package-based permissions, common tables, and Delta tables in projects support storage disaster recovery. Resources do not support storage disaster recovery.
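The two limits above can be captured as a simple pre-check. This is an illustrative sketch only: the region names and object types are copied from the limits above, while the function names and the string labels used as keys are hypothetical and are not MaxCompute API identifiers.

```python
# Regions listed above as supporting zone-disaster recovery.
SUPPORTED_REGIONS = {
    "China (Shenzhen)",
    "China East 2 Finance",
    "China (Hong Kong)",
}

# Object types listed above as covered by storage disaster recovery.
COVERED_OBJECT_TYPES = {
    "metadata",
    "user permissions",
    "package-based permissions",
    "common table",
    "Delta table",
}


def region_supported(region):
    """Return True if zone-disaster recovery is available in the region."""
    return region in SUPPORTED_REGIONS


def covered_by_storage_dr(object_type):
    """Return True if the object type is covered by storage disaster recovery."""
    return object_type in COVERED_OBJECT_TYPES


print(region_supported("China (Hong Kong)"))
print(covered_by_storage_dr("resource"))
```

A check like this can flag projects that rely heavily on resources, because resources are not covered by storage disaster recovery.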
Billing
After storage disaster recovery is implemented, MaxCompute is billed based on the storage disaster recovery mode. For more information about the billing of storage disaster recovery, see Storage disaster recovery fee (pay-as-you-go).
To implement multi-zone high-availability computing, you must purchase multi-zone high-availability computing resources. For more information about the billing of multi-zone high-availability computing resources, see Computing fees (subscription).
Instructions
To implement zone-disaster recovery for storage and computing, you must enable multi-zone storage disaster recovery and multi-zone high-availability computing.
Enable multi-zone storage disaster recovery
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, go to the Intra-zone Disaster Recovery Management page.
On the Intra-zone Disaster Recovery Management page, click Enable Intra-zone Disaster Recovery.
In the dialog box that appears, select the MaxCompute project for which you want to implement storage disaster recovery and select the check box.
Click OK.
After you enable multi-zone storage disaster recovery for a project, the system starts to prepare for storage disaster recovery for the project data. In the preparation process, the project data that is stored in a single zone is migrated to the other two zones for multi-zone storage disaster recovery. The data preparation process takes about two days. After the data preparation is complete, the project has the storage disaster recovery capability.
Note: During the preparation for storage disaster recovery, the running of jobs is not affected and the business is not interrupted.
During the preparation for storage disaster recovery, if a task is writing data to historical table partitions in streaming mode, the preparation task does not start but waits until the streaming write operations are complete and submitted. We recommend that you write data to new partitions on a daily or weekly basis to ensure that storage disaster recovery is implemented for all tables and partitions.
The local backup data and time travel query data that are generated before storage disaster recovery is enabled are retained in the original zone for local storage. The local backup data and time travel query data that are generated after storage disaster recovery is enabled are distributed to three zones for redundant storage.
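The placement rule for local backup data and time travel query data can be expressed as a comparison of two timestamps. This is a minimal sketch: the function name and the returned labels are hypothetical, and treating data generated at exactly the enablement time as "after" is an assumption, because the text does not cover that boundary case.

```python
from datetime import datetime


def backup_storage_location(generated_at, dr_enabled_at):
    """Describe where backup or time travel data is kept.

    Data generated before storage disaster recovery was enabled stays in
    the original zone. Data generated afterwards is stored redundantly
    in three zones. The equality case is treated as "after" (assumption).
    """
    if generated_at < dr_enabled_at:
        return "original zone (local storage)"
    return "three zones (redundant storage)"


enabled = datetime(2024, 6, 1, 12, 0)
older = backup_storage_location(datetime(2024, 5, 20, 8, 0), enabled)
newer = backup_storage_location(datetime(2024, 6, 3, 8, 0), enabled)
print(older)
print(newer)
```

In practice this means that backups taken before enablement do not gain zone-level protection automatically.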
Enable multi-zone high-availability computing
To enable multi-zone high-availability computing, you must purchase multi-zone high-availability computing resources and configure the multi-zone high-availability computing resources in the default computing quota of the desired project.
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, go to the Quotas page.
On the Quotas page, click New Quota.
On the resource purchase page, configure the parameters. The following table describes the key parameters.
| Parameter | Description |
| --- | --- |
| Specifications Type | Select Multi-zone HA Computing Resource. |
| Multi-zone HA CU | Select the number of compute units (CUs) that you want to purchase. Note: You must purchase at least 50 CUs. If you already have CUs, you can purchase additional CUs in increments of 1 CU. |
Click Buy Now. Read the terms of service and then complete the payment.
After multi-zone high-availability computing resources are purchased, you can view the created multi-zone high-availability computing resources on the Quotas page.
Configure the multi-zone high-availability computing resources in the default computing quota of the desired project.
In the left-side navigation pane, go to the project list page.
Find the desired project, and click Manage in the Actions column.
On the Parameter Configuration tab, click Edit in the Basic Information section.
Set the Default Quota parameter to the multi-zone high-availability computing resources, and click Submit.
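The CU purchase rule from the note in the purchase step can be checked before you place an order. This sketch reflects one reading of that note: a first purchase needs at least 50 CUs, and later purchases can add any whole number of CUs. The function and constant names are hypothetical, and the purchase page remains the authority on what is accepted.

```python
MIN_INITIAL_CUS = 50  # minimum for a first purchase, per the note


def is_valid_cu_purchase(new_cus, existing_cus=0):
    """Return True if the requested number of new CUs looks purchasable.

    Assumes a first purchase needs at least MIN_INITIAL_CUS and that
    additional purchases only need to be a positive whole number.
    """
    if isinstance(new_cus, bool) or not isinstance(new_cus, int):
        return False
    if new_cus < 1:
        return False
    if existing_cus == 0:
        return new_cus >= MIN_INITIAL_CUS
    return True


print(is_valid_cu_purchase(50))
print(is_valid_cu_purchase(49))
print(is_valid_cu_purchase(1, existing_cus=50))
```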
Related operations
Disable disaster recovery
To disable disaster recovery for a project, find the project on the Intra-zone Disaster Recovery Management page, and click Disable Disaster Recovery in the Actions column. Then, enter the project name as prompted, and click OK.
After disaster recovery is disabled, project data is redistributed to a single zone for local storage.
A project loses the disaster recovery capability immediately after you disable disaster recovery. We recommend that you do not disable disaster recovery unless necessary.