As a high-performance in-memory database, Tair is often used to store large amounts of important data. To protect this data, Tair provides several disaster recovery solutions.
Tair disaster recovery solutions
A Tair instance may fail for unexpected reasons, such as a device failure or a power failure in a data center. In this case, disaster recovery can help ensure data consistency and service availability. Tair provides multiple disaster recovery solutions for different use cases.
Disaster recovery solution | Protection level | Description |
Single-zone HA solution | ★★★☆☆ | The master node and replica node are deployed on different devices within the same zone. If the master node fails, the High Availability (HA) system performs a failover to prevent service interruption caused by a single point of failure (SPOF). |
Zone-disaster recovery solution | ★★★★☆ | The master node and replica node are deployed in two different zones within the same region. If the zone in which the master node resides is disconnected due to force majeure factors such as a power or network failure, the high-availability system performs a failover to ensure continuous availability of the entire instance. |
Cross-region disaster recovery solution | ★★★★★ | In the architecture of Global Distributed Cache, a distributed instance consists of multiple child instances that synchronize data among each other in real time by using synchronization channels. The channel manager monitors the health status of child instances and handles exceptions that occur on child instances, such as a switchover between the primary and secondary databases. Global Distributed Cache is suitable for scenarios such as geo-disaster recovery, active geo-redundancy, nearby application access, and load balancing. |
Single-zone HA solution
Tair instances support a single-zone HA architecture. The HA system monitors the health status of the master node and replica node and performs failovers to prevent service interruption caused by SPOFs.
Architecture | Description |
Standard master-replica | A standard master-replica instance runs in a master-replica architecture. If the HA system detects a failure on the master node, the system switches the workloads from the master node to the replica node and the replica node takes over the role of the master node. The original master node works as the replica node after recovery. |
Cluster master-replica | In a master-replica cluster instance, data is stored on data shards. Each data shard consists of a master node and a replica node. The master node and replica node are deployed on different machines in an HA architecture. If the master node fails, the HA system performs a failover to ensure high service availability. For more information about the components of a cluster instance, see Cluster master-replica instances. |
Read/write splitting | For more information, see Read/write splitting architecture. |
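The failover behavior described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Tair HA system: the `Node` and `HaMonitor` names, and the rule that a failover happens after three consecutive failed health checks, are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str            # "master" or "replica"
    healthy: bool = True


class HaMonitor:
    """Toy HA monitor for one master-replica pair (illustrative only)."""

    def __init__(self, master: Node, replica: Node, max_failed_checks: int = 3):
        self.master = master
        self.replica = replica
        self.max_failed_checks = max_failed_checks  # assumed threshold
        self.failed_checks = 0

    def check(self) -> bool:
        """Run one health check; return True if a failover was performed."""
        if self.master.healthy:
            self.failed_checks = 0
            return False
        self.failed_checks += 1
        # Wait for enough consecutive failures, and never promote a sick replica.
        if self.failed_checks < self.max_failed_checks or not self.replica.healthy:
            return False
        # Swap roles: the replica takes over, and the original master
        # serves as the replica once it recovers.
        self.master.role, self.replica.role = "replica", "master"
        self.master, self.replica = self.replica, self.master
        self.failed_checks = 0
        return True


node_a = Node("node-a", "master")
node_b = Node("node-b", "replica")
monitor = HaMonitor(node_a, node_b)

node_a.healthy = False                            # simulate a master failure
results = [monitor.check() for _ in range(3)]     # third check triggers failover
```

After the third failed check, `monitor.master` refers to `node-b`, and `node-a` holds the replica role until it recovers.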
Zone-disaster recovery solution
Tair standard instances and cluster instances support zone-disaster recovery across two data centers within a single region. If your workloads are deployed in a single region and have high requirements for disaster recovery, you can select the zones that support zone-disaster recovery when you create a Tair instance. For more information about how to create a Tair instance, see Create an instance.
When you create a zone-disaster recovery instance, the master node and replica node with the same specifications are deployed in different zones. The master node synchronizes data to the replica node through a dedicated channel.
If a power failure or network error affects the master node, the replica node takes over the role of the master node. The system calls an API operation on the configuration server to update the routing information for proxy servers. In addition, Tair provides an optimized Redis synchronization mechanism. Similar to the global transaction identifiers (GTIDs) of MySQL, Tair uses global operation identifiers (OpIDs) to indicate synchronization offsets and runs lock-free threads in the background to search for OpIDs. The system asynchronously synchronizes AOF binary logs (binlogs) from the master node to the replica node. You can throttle synchronization so that it does not degrade the service performance of Tair.
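The idea of using an operation identifier as a synchronization offset, combined with throttling, can be sketched as follows. This is a simplified model under stated assumptions and does not reflect the actual Tair implementation or log format: the `Master` and `Replica` classes, the in-memory log of `(op_id, key, value)` tuples, and the `max_ops` batch limit used as a stand-in for throttling are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)   # entries: (op_id, key, value)
    next_op_id: int = 1

    def write(self, key, value):
        """Apply a write locally and append it to the log with a new OpID."""
        self.data[key] = value
        self.log.append((self.next_op_id, key, value))
        self.next_op_id += 1

    def entries_after(self, op_id, limit):
        """Return at most `limit` log entries whose OpID is greater than `op_id`."""
        return [entry for entry in self.log if entry[0] > op_id][:limit]


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    last_op_id: int = 0    # synchronization offset: last OpID applied

    def sync_once(self, master, max_ops):
        """Pull one throttled batch from the master; return how many ops were applied."""
        batch = master.entries_after(self.last_op_id, max_ops)
        for op_id, key, value in batch:
            self.data[key] = value
            self.last_op_id = op_id
        return len(batch)


master = Master()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    master.write(key, value)

replica = Replica()
batch_sizes = []
while True:
    applied = replica.sync_once(master, max_ops=2)   # at most 2 ops per round
    if applied == 0:
        break
    batch_sizes.append(applied)
```

Because the replica records the last OpID it applied, it can resume from that offset after an interruption instead of copying the full dataset again, and the batch limit caps how much work each synchronization round performs.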
Cross-region disaster recovery solution
If your business is expanding rapidly into more regions, cross-region, long-distance access can cause high latency and degrade the user experience. The Global Distributed Cache feature of Alibaba Cloud can help you reduce the latency caused by cross-region access. Global Distributed Cache for Tair has the following benefits:
You can directly create child instances or specify the child instances that need to be synchronized without implementing redundancy in your application logic. This greatly reduces the complexity of application design and allows you to focus on application development.
The geo-replication capability lets you implement geo-disaster recovery or active geo-redundancy.
This feature applies to cross-region data synchronization scenarios and global business deployment in industries such as multimedia, gaming, and e-commerce. For more information, see Overview.
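On the application side, nearby access with a fallback to another region might look like the following sketch. It is an assumption-laden illustration rather than a Tair API: the `ChildInstance` type, the `pick_child` helper, the `example.com` endpoints, and the fallback order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildInstance:
    region: str
    endpoint: str          # hypothetical endpoint, for illustration only
    healthy: bool = True


def pick_child(children, app_region, fallback_regions):
    """Prefer the healthy child in the application's region, then the fallbacks in order."""
    healthy = {child.region: child for child in children if child.healthy}
    for region in [app_region, *fallback_regions]:
        if region in healthy:
            return healthy[region]
    raise RuntimeError("no healthy child instance available")


children = [
    ChildInstance("cn-hangzhou", "tair-hz.example.com"),
    ChildInstance("us-west-1", "tair-usw.example.com", healthy=False),
    ChildInstance("ap-southeast-1", "tair-sg.example.com"),
]

# An application in cn-hangzhou uses its local child instance.
local = pick_child(children, "cn-hangzhou", ["ap-southeast-1", "us-west-1"])

# An application in us-west-1 falls back because its local child is unhealthy.
fallback = pick_child(children, "us-west-1", ["ap-southeast-1", "cn-hangzhou"])
```

Because child instances synchronize data with each other, an application that falls back to another region can keep reading and writing there, at the cost of higher latency.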