Tair (Redis OSS-compatible) is a high-performance key-value database service that you can use to store large volumes of important data in various scenarios. This topic describes the disaster recovery solutions provided by Tair (Redis OSS-compatible).
Evolution of disaster recovery solutions
Instances can fail for several reasons, such as device or power failures in data centers. In these cases, disaster recovery helps ensure data consistency and service availability.
Disaster recovery solution | Protection level | Description |
Single-zone HA solution | ★★★☆☆ | The master and replica nodes are deployed on different machines in the same zone. If the master node fails, the high availability (HA) system performs a failover to prevent service interruption caused by a single point of failure (SPOF). |
Zone-disaster recovery solution | ★★★★☆ | The master and replica nodes are deployed in two different zones of the same region. If the zone in which the master node resides is disconnected due to factors such as a power or network failure, the HA system performs a failover to ensure continuous availability of the entire instance. |
Cross-region disaster recovery solution | ★★★★★ | In the architecture of Global Distributed Cache, a distributed instance consists of multiple child instances that synchronize data in real time by using synchronization channels. The channel manager monitors the health status of child instances and handles exceptions that occur on child instances, such as a switchover between the primary and secondary databases. Global Distributed Cache is suitable for scenarios such as geo-disaster recovery, active geo-redundancy, nearby application access, and load balancing. |
Single-zone HA solution
All instances support a single-zone HA architecture. The HA system monitors the health status of master and replica nodes and performs failovers to prevent service interruption caused by SPOFs.
Deployment architecture | Description |
Standard master-replica | A standard master-replica instance runs in a master-replica architecture. If the HA system detects a failure on the master node, the system switches the workloads from the master node to the replica node and the replica node assumes the role of the master node. After recovery, the original master node works as the replica node. |
Cluster multi-replica | On a cluster multi-replica instance, data is stored on data shards. Each data shard consists of a master node and multiple replica nodes. The master and replica nodes are deployed on different machines in an HA architecture. If the master node fails, the HA system performs a master-replica switchover to ensure high service availability. |
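The failover behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Tair HA system: the `Node` and `HaMonitor` classes and the `max_failed_checks` threshold are hypothetical names introduced for illustration. It shows a replica being promoted after repeated failed health checks, and the original master returning as a replica after recovery.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str          # "master" or "replica"
    healthy: bool = True


class HaMonitor:
    """Hypothetical HA monitor for one master-replica pair (illustration only)."""

    def __init__(self, master: Node, replica: Node, max_failed_checks: int = 3):
        self.master = master
        self.replica = replica
        self.max_failed_checks = max_failed_checks  # assumed threshold
        self.failed_checks = 0

    def check(self) -> bool:
        """Run one health check; return True if a failover was performed."""
        if self.master.healthy:
            self.failed_checks = 0
            return False
        self.failed_checks += 1
        if self.failed_checks < self.max_failed_checks or not self.replica.healthy:
            return False
        # Promote the replica; the old master rejoins as a replica after recovery.
        self.master.role, self.replica.role = "replica", "master"
        self.master, self.replica = self.replica, self.master
        self.failed_checks = 0
        return True


node_a = Node("node-a", "master")
node_b = Node("node-b", "replica")
monitor = HaMonitor(node_a, node_b)

node_a.healthy = False                      # simulate a machine failure
results = [monitor.check() for _ in range(3)]
node_a.healthy = True                       # the original master recovers
print(results, monitor.master.name, node_a.role)
```

Requiring several consecutive failed checks before promotion is a common way to avoid failing over on a single transient error; the real detection policy of the HA system is not described in this topic.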
Zone-disaster recovery solution
Tair standard instances and cluster instances support zone-disaster recovery across two data centers. If your workloads are deployed in a single region and require disaster recovery, you can select the zones that support zone-disaster recovery when you create a Tair instance. For more information about how to create a Tair instance, see Step 1: Create an instance.
After you create a zone-disaster recovery instance, a replica node that has the same specifications as the master node is deployed in a different zone from the master node. The master node synchronizes data to the replica node over a dedicated channel.
If a power failure or network error occurs on the master node, the replica node assumes the role of the master node. The system calls an API operation on the configuration server to update routing information for proxy nodes.
Tair also provides an optimized Redis synchronization mechanism. Similar to global transaction identifiers (GTIDs) of MySQL, Tair uses global operation identifiers (OpIDs) to indicate synchronization offsets and runs lock-free threads in the background to search for OpIDs. The system asynchronously synchronizes append-only file (AOF) binary logs (binlogs) from the master node to the replica node. You can throttle synchronization to ensure optimal service performance.
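The idea of using an operation identifier as a synchronization offset can be sketched as follows. This is a simplified model, not the Tair implementation: the `LogEntry`, `Master`, `Replica`, and `sync_round` names are hypothetical, the log is an in-memory list rather than an AOF binlog, and throttling is modeled as a fixed batch size per round. The lock-free background search mentioned above is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    op_id: int        # monotonically increasing operation identifier
    command: tuple    # for example ("SET", "k", "v")


@dataclass
class Master:
    log: list = field(default_factory=list)
    next_op_id: int = 1

    def write(self, *command):
        self.log.append(LogEntry(self.next_op_id, command))
        self.next_op_id += 1

    def entries_after(self, op_id: int, limit: int):
        """Return at most `limit` entries whose OpID is greater than `op_id`."""
        pending = [e for e in self.log if e.op_id > op_id]
        return pending[:limit]


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    last_op_id: int = 0   # synchronization offset

    def apply(self, entries):
        for entry in entries:
            name, key, *args = entry.command
            if name == "SET":
                self.data[key] = args[0]
            elif name == "DEL":
                self.data.pop(key, None)
            self.last_op_id = entry.op_id


def sync_round(master, replica, batch_size=2):
    """One throttled round: ship at most `batch_size` entries."""
    batch = master.entries_after(replica.last_op_id, batch_size)
    replica.apply(batch)
    return len(batch)


master, replica = Master(), Replica()
master.write("SET", "a", "1")
master.write("SET", "b", "2")
master.write("DEL", "a")

rounds = 0
while sync_round(master, replica) > 0:
    rounds += 1
print(rounds, replica.last_op_id, replica.data)
```

Because the replica records the last OpID it applied, synchronization can resume from that offset after an interruption instead of restarting from the beginning of the log.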
Cross-region disaster recovery solution
As your business expands into multiple regions, cross-region and long-distance access can result in high latency and degrade the user experience. The Global Distributed Cache feature of Alibaba Cloud Tair can help reduce the latency caused by cross-region access. The feature has the following benefits:
Allows you to directly create child instances or specify the child instances that must be synchronized without the need to build redundancy into your application. This significantly reduces the complexity of application design and allows you to focus on application development.
Provides the geo-replication capability to implement geo-disaster recovery or active geo-redundancy.
This feature applies to cross-region data synchronization scenarios and global business deployment in industries such as multimedia, gaming, and e-commerce. For more information, see Overview of Global Distributed Cache for Tair.
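Nearby access with fallback to another region can be sketched from the application side as follows. Everything in this example is an assumption made for illustration: the region names, the latency figures, and the `ChannelManager` and `pick_child_instance` names are hypothetical and do not correspond to any Tair API. The sketch only shows the general pattern of preferring the closest healthy child instance.

```python
# Hypothetical region names and latencies; none of these come from Tair.
LATENCY_MS = {
    "client-eu": {"eu-central": 5, "us-east": 90, "ap-southeast": 160},
}


class ChannelManager:
    """Toy stand-in for a component that tracks child-instance health."""

    def __init__(self, regions):
        self.health = {region: True for region in regions}

    def mark(self, region, healthy):
        self.health[region] = healthy

    def healthy_regions(self):
        return [r for r, ok in self.health.items() if ok]


def pick_child_instance(client, manager):
    """Return the healthy child instance with the lowest latency for `client`."""
    candidates = manager.healthy_regions()
    if not candidates:
        raise RuntimeError("no healthy child instance available")
    return min(candidates, key=lambda region: LATENCY_MS[client][region])


manager = ChannelManager(["eu-central", "us-east", "ap-southeast"])
first_choice = pick_child_instance("client-eu", manager)
manager.mark("eu-central", False)            # simulate a regional outage
fallback = pick_child_instance("client-eu", manager)
print(first_choice, fallback)
```

In this model the application keeps serving requests from the next-closest region while the nearest child instance is unavailable, which is the behavior that geo-disaster recovery aims for.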