Well-Architected Framework: Best Practices for Storage Data Disaster Recovery

Last Updated: Oct 27, 2023

Best Practices for Storage Disaster Recovery

Redundant Storage Technologies

Alibaba Cloud Object Storage Service (OSS) provides two types of storage redundancy: local redundancy and intra-city redundancy. Together they cover redundancy mechanisms from a single availability zone (AZ) to multiple AZs, ensuring data durability and availability. Local redundancy storage keeps redundant copies of user data on multiple devices in multiple facilities within the same AZ, which protects durability and availability against hardware failures. Intra-city redundancy storage keeps redundant copies of user data in multiple AZs within the same region, so data remains accessible even when one AZ is unavailable.

Cross-Region Replication

Alibaba Cloud OSS provides the Cross-Region Replication (CRR) feature, which automatically and asynchronously replicates files (objects) from a source OSS bucket in one data center (region) to a target bucket in another region, achieving cross-region disaster recovery. Users with extremely high requirements for data security and availability can maintain a copy of all written data in another data center, in preparation for a major disaster (such as an earthquake or tsunami) that could destroy one OSS data center.

Block Storage cloud disks can asynchronously replicate data from one cloud disk to another cloud disk in a different region, or in another availability zone within the same region. This provides disaster recovery backup for stored data. With this feature, users can establish disaster recovery capabilities for critical businesses, protect database data, and improve business continuity.
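Because the replication described above is asynchronous, writes made shortly before a disaster may not yet have reached the target. The following is a minimal sketch of that effect, not a model of how OSS or cloud disks actually replicate. It assumes a constant replication lag, and the function names `replicated_writes` and `lost_writes` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Tuple

# A write is (timestamp in seconds, object key). Both are illustrative values.
Write = Tuple[float, str]


def replicated_writes(writes: List[Write], lag_seconds: float,
                      failure_time: float) -> List[Write]:
    """Writes whose asynchronous copy finished before the source failed.

    Assumption: every write reaches the target exactly `lag_seconds`
    after it was accepted by the source.
    """
    return [w for w in writes if w[0] + lag_seconds <= failure_time]


def lost_writes(writes: List[Write], lag_seconds: float,
                failure_time: float) -> List[Write]:
    """Writes accepted by the source but not yet copied when it failed."""
    return [
        w for w in writes
        if w[0] <= failure_time and w[0] + lag_seconds > failure_time
    ]


if __name__ == "__main__":
    writes = [(0.0, "a"), (30.0, "b"), (55.0, "c")]
    # The source region fails at t=60 with a 10-second replication lag.
    print(replicated_writes(writes, 10.0, 60.0))  # [(0.0, 'a'), (30.0, 'b')]
    print(lost_writes(writes, 10.0, 60.0))        # [(55.0, 'c')]
```

In this toy model the amount of data at risk grows with the replication lag, which is why the lag is worth monitoring when asynchronous replication is the basis of a disaster recovery plan.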

Version Control

Version control is a bucket-level data protection feature. Once version control is enabled, objects that are overwritten or deleted are retained as historical versions. If an object is accidentally overwritten or deleted, it can be restored to any of its historical versions in the bucket.
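The behavior described above can be illustrated with a small in-memory model. This is a sketch of the concept only and does not use the OSS SDK: the `VersionedBucket` class, its methods, and the integer version IDs are hypothetical, and a delete is assumed to be recorded as a marker rather than removing data.

```python
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy model of a bucket with version control enabled (not the OSS API)."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data=None is a delete marker.
        self._history: Dict[str, List[Tuple[int, Optional[bytes]]]] = {}
        self._next_id = 1

    def _append(self, key: str, data: Optional[bytes]) -> int:
        version_id = self._next_id
        self._next_id += 1
        self._history.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key: str, data: bytes) -> int:
        """Write an object; any earlier version is kept as history."""
        return self._append(key, data)

    def delete(self, key: str) -> int:
        """Record a delete marker instead of discarding earlier versions."""
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[int] = None) -> Optional[bytes]:
        """Return the current version, or a specific historical version."""
        versions = self._history.get(key, [])
        if not versions:
            return None
        if version_id is None:
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id:
                return data
        return None

    def restore(self, key: str, version_id: int) -> int:
        """Make a historical version current by writing it as a new version."""
        data = self.get(key, version_id)
        if data is None:
            raise KeyError(f"no restorable version {version_id} for {key!r}")
        return self.put(key, data)


if __name__ == "__main__":
    bucket = VersionedBucket()
    v1 = bucket.put("report.csv", b"first draft")
    bucket.put("report.csv", b"overwritten by mistake")
    bucket.delete("report.csv")
    print(bucket.get("report.csv"))  # None: the current version is a delete marker
    bucket.restore("report.csv", v1)
    print(bucket.get("report.csv"))  # b'first draft'
```

Restoring here writes the old content as a new version, so the mistaken overwrite and the delete both remain in the history.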

Scheduled Backup

Object Storage Service (OSS) data can be backed up regularly to Alibaba Cloud's Cloud Backup service using the scheduled backup feature. If an object is accidentally lost, it can be recovered from Cloud Backup. File Storage NAS is also seamlessly integrated with Cloud Backup. When backing up General-purpose NAS, Cloud Backup does not take file system snapshots; instead, it uses an efficient file system scanning mechanism. By configuring backup policies that create multiple backup copies of data, users can restore files promptly in the event of data loss or damage.

Consistent Replication Data Verification

Consistency replica groups for block storage cloud disks centrally manage the asynchronous replication of multiple cloud disks in disaster recovery scenarios. They also ensure that all data in the same replica group can be restored to the same point in time, enabling instance-level or multi-instance-level disaster recovery protection.
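To see why restoring every disk in a group to the same point in time matters, consider the sketch below. It is a simplified illustration, not a description of how the service implements replica groups: recovery points are assumed to be plain integer timestamps, and `latest_per_disk` and `latest_common_point` are hypothetical helper names.

```python
from typing import Dict, List, Optional


def latest_per_disk(points: Dict[str, List[int]]) -> Dict[str, int]:
    """Latest recovery point of each disk when disks are restored independently."""
    return {disk: max(ts) for disk, ts in points.items() if ts}


def latest_common_point(points: Dict[str, List[int]]) -> Optional[int]:
    """Latest recovery point that exists on every disk in the group."""
    if not points:
        return None
    common = set.intersection(*(set(ts) for ts in points.values()))
    return max(common) if common else None


if __name__ == "__main__":
    # Hypothetical recovery points (timestamps) available on the replica side.
    replica_points = {
        "data-disk": [100, 200, 300],
        "log-disk": [100, 200],
    }
    # Independent restore: the disks end up at different points in time.
    print(latest_per_disk(replica_points))      # {'data-disk': 300, 'log-disk': 200}
    # Group restore: both disks return to the same point in time.
    print(latest_common_point(replica_points))  # 200
```

In the independent case the data disk would be ahead of the log disk, which can leave an application spanning both disks in an inconsistent state. Restoring to a shared point avoids that mismatch.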

Best Practices for Database Disaster Recovery

Database Backup

Alibaba Cloud databases support backup and recovery. Most common database products have automatic backup enabled by default; only a few products designed for massive data (such as ClickHouse and Lindorm) require backup and recovery to be enabled manually. From a backup, a database instance can be restored to an availability zone in the same region or in a different region. Additionally, Alibaba Cloud provides Database Backup Service (DBS) for customized management of backup and recovery strategies, satisfying the basic requirements of database disaster recovery.

Disk Redundancy

The cloud disk editions of Alibaba Cloud databases build on the storage capabilities of Alibaba Cloud cloud disks, which guarantee data reliability through multi-replica redundancy. High-availability editions of databases additionally provide redundancy across primary and secondary nodes.

Intra-city Disaster Recovery

Apart from basic editions that run in only one availability zone, Alibaba Cloud database services provide high availability through primary and secondary nodes. Data is replicated between the primary and secondary nodes in real time. The background control system detects node exceptions in near real time and triggers a high-availability switchover when an exception is detected. Users can choose multi-availability-zone deployment to achieve intra-city disaster recovery for their database products.

Inter-city Disaster Recovery and Global Active-Active

Inter-city disaster recovery can be achieved by backing up databases and recovering them in different regions. However, this approach has poor timeliness. The cost of long-distance transmission is now low enough to support real-time transmission of large data volumes, so real-time transmission has become the main approach for inter-city disaster recovery. Alibaba Cloud Data Transmission Service (DTS) supports real-time synchronization of mainstream relational databases (such as Alibaba Cloud RDS and PolarDB). By using the low-latency internal network for efficient cross-region transmission, it provides stable database disaster recovery between regions. Some services natively support bidirectional data synchronization; combined with a multi-write business design, this enables cost-effective disaster recovery. Data synchronization also supports Redis, MongoDB, PolarDB-X, and other products, while data migration supports even more products, such as DB2, Teradata, and HBase. In addition, PolarDB's Global Database Network (GDN) synchronizes data between multiple clusters in different regions within the same country, providing inter-city disaster recovery capabilities.

Flashback Query

In addition to real-time data synchronization, disaster recovery planning must also handle anomalies caused by human error, such as accidentally deleted data. With traditional methods, the only option is to restore a backup to a point before the faulty operation and then inspect and repair the data. Alibaba Cloud PolarDB provides a flashback query capability, which lets users quickly locate the anomaly with a point-in-time flashback query after a faulty operation and quickly restore the deleted data, greatly improving fault recovery efficiency.