
PolarDB-X Disaster Recovery Capabilities: Cross-AZ, Cross-Region, and Cross-Cloud

The article discusses the evolution and expansion of PolarDB-X and the introduction of its Global Database Network (GDN) product feature.

By Ziyang

The development of things generally follows a pattern from simple to complex, from micro to macro, from local to global, and from specific to general. Databases are no different. As a distributed database, PolarDB-X has evolved along the same path for more than a decade and has become a significant player in the distributed database field.

In this article, we will explore the dimension of space and discuss how PolarDB-X has continuously developed and expanded its cross-space deployment capabilities. We will also cover its high availability and disaster recovery capabilities across different spatial scopes. Additionally, we will introduce GDN (Global Database Network), the latest product feature of PolarDB-X.

The Space Philosophy of PolarDB-X

Before delving into the high availability and disaster recovery capabilities, let's first discuss the space philosophy of PolarDB-X. In database systems, many benefits can be achieved by increasing space usage, transforming space forms, adjusting the spatial distribution, or extending the spatial scope:

  1. Space for Time: The most typical example is database indexing. By increasing the space occupied by indexes, query times can be reduced. In PolarDB-X, there are various types of indexes: Local Indexes at the partition level, GSI (Global Secondary Index) at the table level, and CCI (Columnar Clustered Index) for analytical query scenarios.
  2. Space for Isolation: The most typical example is read-write splitting. By adding redundant data through data synchronization, read and write operations can be isolated. PolarDB-X naturally supports a read-write separation architecture, allowing the addition of one or more read-only instances to the primary instance. Write traffic is directed to the primary instance, while read traffic is directed to the read-only instances, reducing the load on the primary instance and ensuring fast response times for read operations.
  3. Space for Cost Reduction: The most typical example is the separation of hot and cold data. By transforming the storage form of cold data, storage costs can be reduced while the data remains accessible. PolarDB-X natively supports archiving cold data, moving it from online tables to low-cost storage media (such as OSS) with format conversion and compression. This significantly reduces costs while still allowing real-time access to the cold data.
  4. Space for Security: The most typical examples are backups and encryption. By retaining full and incremental backup data, rapid data recovery can be achieved. Data encryption provides better security, though it increases data storage space. PolarDB-X supports multi-dimensional backup and recovery capabilities, including whole-instance consistent backup and recovery based on PITR, SQL flashback based on binlogs, flashback queries based on undo log, snapshot queries and recovery based on columnar storage, and TDE data encryption and full-encryption security.
  5. Space for ONLINE: The most typical example is ONLINE DDL. By allocating temporary space and using a series of algorithms, DDL changes can be made without locking, keeping the business always online. Based on the ideas from the paper "Online, Asynchronous Schema Change in F1," PolarDB-X, combined with distributed transaction capabilities, supports various types of ONLINE DDL changes through data copying, data backfilling, and smooth metadata switching. Notably, the OMC (Online Modify Column) capability offers stronger online capabilities compared with native MySQL.
  6. Space for Availability: The most typical example is data replicas. By replicating data across multiple copies and using a consensus protocol to ensure data consistency, high availability with RPO=0 can be achieved. The data nodes of PolarDB-X use the self-developed X-Paxos protocol to achieve strong data consistency and high availability and ensure RTO<20s, meeting financial-grade data consistency requirements.

PolarDB-X Disaster Recovery Capabilities

When we talk about high availability and disaster recovery, we are largely discussing data replication or data copy technologies. Whether using standard asynchronous master-slave replication, semi-synchronous master-slave replication, or strong synchronous replication based on Paxos or Raft, these methods extend the spatial distribution of data to achieve high availability and disaster recovery. Therefore, they have a relative spatial scope, and different disaster recovery solutions can be applied across racks, data centers (a data center is referred to as an AZ in this article), regions, and clouds. Generally, the larger the spatial coverage, the stronger the disaster recovery capability, but the complexity also increases. Currently, the common high-availability/disaster recovery architectures in the database industry include:

  1. Single data center: Within a single data center, asynchronous replication or strong synchronous replication based on Paxos/Raft can be performed. This setup can handle failures of a single data copy or failures at the rack level.
  2. Multiple data centers in the same region: Typically, this involves three data centers within the same region, with one data copy per data center. Replication can be either asynchronous or strong synchronous based on Paxos/Raft. This architecture can handle disaster recovery at the data center level but is unable to meet multi-region multi-active requirements.
  3. Three data centers across two regions: A two-region replication architecture based on Paxos/Raft, consisting of a primary region and a remote disaster recovery region. The primary region handles most of the traffic, while the remote disaster recovery region is primarily for disaster recovery and does not provide multi-active services under normal circumstances. In this architecture, the primary region has two data centers, each with two copies, and the disaster recovery region has one copy. Majority-based strong synchronization is achieved within the primary region. Compared with the architecture of multiple data centers in the same region based on Paxos/Raft, this architecture offers similar performance and also geo-disaster recovery capability. When the primary region becomes unavailable, the system can switch to the disaster recovery region, but it cannot guarantee RPO=0.
  4. Five data centers across three regions: A multi-region replication architecture based on Paxos/Raft, spanning three regions. Region A has two data centers, Region B has two data centers, and Region C has one data center, with each data center having one data copy. You can designate Region A or B as the primary region. In this architecture, majority-based strong synchronization requires cross-regional communication, which can significantly impact performance. However, the advantage is that a disaster in any one region will not affect the system's availability, and it can ensure RPO=0.
  5. Global Database: An architecture for building globally distributed multi-active systems, where write operations occur in a central location, and read capabilities are provided locally in each region. This can support one-primary-multiple-secondary, multiple-primary-multiple-secondary, and other deployment models. The key feature of this architecture is its flexibility and agility, allowing the creation of single-active, dual-active, or multi-active disaster recovery configurations as needed. Generally, this uses asynchronous replication. PolarDB-X's GDN is an example of this architecture, which will be elaborated later.
  6. Geo-Partitioning: A partitioning architecture based on geographic attributes, providing local read-write capabilities based on user geographic attributes. This solution requires support for geographic space attributes at the database kernel level. When creating databases and tables, you can specify regional dimension information, such as the region or the primary region a table needs to cover. You can even specify different primary regions for different partitions of the same table. Compared with the previous architectures, which mainly achieve disaster recovery by adjusting the deployment of database instances, this architecture allows for more granular control. A single set of database instances can simultaneously meet the disaster recovery requirements for single AZ, multiple AZs, and multiple regions, with control down to the partition level. However, it requires a higher cost for business adaptation, and cross-regional strong synchronization can be performance-intensive.

The following table summarizes and compares the disaster recovery architectures listed above:

Architecture | Disaster Recovery Scope | Minimum Data Centers | Data Replication | Advantages and Disadvantages
Single Data Center | Rack / single-replica failure | One | Asynchronous | Good performance, but RPO > 0
Single Data Center | Rack / single-replica failure | One | Synchronous | Slight performance loss, but RPO = 0 (in case of minority failure)
Multiple Data Centers in the Same Region | Single data center level | Two | Asynchronous | Good performance, but RPO > 0
Multiple Data Centers in the Same Region | Single data center level | Three | Synchronous | More widely used; average RT increase of approximately 1 ms
Three Data Centers across Two Regions | Data center, region | Three data centers in two regions | Synchronous | More widely used; average RT increase of approximately 1 ms
Five Data Centers across Three Regions | Data center, region | Five data centers in three regions | Synchronous | Higher data center construction costs; average RT increase of approximately 5-10 ms, depending on the physical distance between regions
Geo-Partitioning | Data center, region | Three data centers in three regions | Synchronous | Requires business adaptation (a region attribute is added to table partitions); average RT increase of approximately 5-10 ms, depending on the physical distance between regions
Global Database | Data center, region | Two data centers in two regions | Asynchronous | More widely used; local reads with remote writes; suitable for cross-region disaster recovery scenarios with many reads and few writes

The data nodes of PolarDB-X use a multi-replica architecture, such as three or five replicas. To ensure strong consistency among replicas (RPO=0), PolarDB-X uses the majority-based replication protocol of Paxos. Each write must be acknowledged by more than half of the nodes, so even if one node fails, the cluster can still provide services normally. In addition, PolarDB-X offers data replication capabilities that are highly compatible with the MySQL master-slave replication protocol, based on CDC Binlog. This allows for data synchronization between two independent PolarDB-X instances. Based on these fundamental capabilities, PolarDB-X supports various disaster recovery architectures, including:

Single data center (Paxos 3 Replicas): Can handle the failure of a single minority node.

[Figure: Single data center deployment (Paxos 3 replicas)]

Three data centers in the same region (Paxos 3 Replicas): Can handle the failure of a single data center.

[Figure: Three data centers in the same region (Paxos 3 replicas)]

Three data centers across two regions (Paxos 5 Replicas): Can handle regional failures.

[Figure: Three data centers across two regions (Paxos 5 replicas)]

GDN multi-active (asynchronous master-slave replication): Can handle regional failures and provide local read capabilities.

[Figure: GDN multi-active deployment (asynchronous master-slave replication)]

Detailed Explanation of PolarDB-X GDN

The disaster recovery solutions of PolarDB-X based on Paxos have been detailed in our previous articles.

Now, let's unveil the details of PolarDB-X GDN.

What is GDN?

Global Database Network (GDN) is a network that consists of multiple PolarDB-X instances distributed across multiple regions within the same country. Its basic form is illustrated in the following figure, where multiple PolarDB-X instances form a cross-region high-availability and disaster recovery cluster with one primary instance and multiple secondary instances.

[Figure: Basic form of GDN: one primary instance and multiple secondary instances across regions]

With GDN, two core objectives can be achieved:

Active Geo-redundancy

If you deploy applications in multiple regions but deploy databases only in the primary region, applications outside the primary region must access a database that may be geographically distant, resulting in high latency and poor performance. GDN replicates data across regions with low latency and provides cross-region read/write splitting, so applications can read from the nearest database while cross-region data is kept in sync within seconds.

Geo-disaster Recovery

GDNs support geo-disaster recovery regardless of whether applications are deployed in the same region. If a fault occurs in the region where the primary instance is deployed, you need only to manually switch over the service to a secondary instance. The switchover between the primary and secondary instances can be completed within 120 seconds.

The primary-secondary mode is just the basic form of GDN, which requires minimal modifications on the application side to integrate with GDN. A more advanced form of GDN is the dual-primary mode or even multi-primary mode. When an application needs to build a multi-region, multi-active, and multi-write disaster recovery architecture, the common approach involves first making cellular modifications on the business side. Then, on the database side, it involves setting up bidirectional synchronization links, configuring loop prevention strategies for synchronization, and defining conflict resolution policies when write conflicts occur. With the advanced forms of GDN, combined with the partitioning capabilities of PolarDB-X, achieving multi-active and multi-write disaster recovery becomes simpler. For example:

1. PolarDB-X offers a variety of partition types: Key, List, and Range partitions, and it supports subpartitioning by combining them. It even allows users to submit custom UDFs as partition functions. By aligning the cell routing rules with the partitioning rules of PolarDB-X's partitioned tables, the multi-write architecture becomes simpler, more flexible, and safer: in each region (cell), the aligned partitions are set to read-write and the unaligned partitions to read-only (write-prohibited). This makes traffic switching more flexible, allowing a cell to be switched over as a whole or specific partitions to be switched to another cell. Furthermore, with write prohibition enforced on the database side, even if routing errors occur on the application side, PolarDB-X acts as the final gatekeeper and rejects illegal writes, avoiding write conflicts. This is difficult to achieve with non-distributed databases and non-partitioned tables; of course, PolarDB-X can also support the traditional dual-write architecture, and the two approaches are not mutually exclusive. (A sketch of the partition-alignment idea is shown after this list.)

[Figure: Multi-active, multi-write architecture with aligned cell routing and partitioning rules]

2. In a multi-active and multi-write architecture, bidirectional replication links need to be configured between different regions. The biggest challenge with bidirectional replication links is preventing replication loops. PolarDB-X includes a lightweight loop prevention mechanism based on server_id, which is simple and easy to use. More details will be provided in the following section.
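
The idea in the first point can be illustrated with a minimal sketch. The table name, columns, and region values below are hypothetical, and the statement that marks individual partitions as write-prohibited is version-specific, so it is only described in a comment:

# Hypothetical example: route orders by region with LIST partitions so that each
# cell (region) owns exactly the partitions that match its routing rule.
CREATE TABLE orders (
  order_id BIGINT NOT NULL,
  region   VARCHAR(16) NOT NULL,   # cell routing key, e.g. 'hangzhou' or 'shanghai'
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id, region)
)
PARTITION BY LIST COLUMNS (region) (
  PARTITION p_hangzhou VALUES IN ('hangzhou'),
  PARTITION p_shanghai VALUES IN ('shanghai')
);

# In the Hangzhou cell, p_hangzhou is read-write and p_shanghai is read-only
# (write-prohibited); the Shanghai cell mirrors this. Even if the application
# routes a write to the wrong cell, the write-prohibited partition rejects it.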

GDN Technical Architecture

The technical foundation of GDN is real-time data replication. PolarDB-X not only supports CDC Binlog, which is fully compatible with MySQL binlog, but also offers a data replication product called CDC Replica, which is highly compatible with MySQL Replication. Based on these two core capabilities, the GDN replication network can be quickly established.

[Figure: GDN technical architecture]

Note:

  1. CDC provides two types of binlog services: single-stream binlogs and multi-stream binlogs, which can exist at the same time. For more information, please refer to Log Service.
  2. In GDN scenarios, you can choose between single-stream or multi-stream replication based on actual conditions. If the single-stream replication link reaches a performance bottleneck, consider using the multi-stream replication mode. The specific bottleneck varies depending on the scenario.
  3. For a detailed introduction to Replica master-slave replication, please refer to the official documentation: Replicate Data from PolarDB-X to PolarDB-X and Replication.

Core Capabilities of GDN:

Diverse DML Replication:

DML operations trigger data changes and record the change details in binlogs in the form of events. By replaying the change events in the binlog, the modified data can be replicated to the downstream. For DML-based data replication, GDN supports various optional strategies to meet different replication needs, balancing performance and consistency.

TRANSACTION (performance: moderate; transaction consistency: high)
  1. This policy features serial replication and ensures transactional integrity. It is suitable for scenarios with high requirements for transactional consistency, such as the financial sector, where an RPO > 0 can be tolerated but the integrity of transactions cannot be compromised.
  2. Transactional integrity can only be guaranteed in a single-stream binlog replication scenario, because multi-stream binlog, by its implementation principle, must break transactional integrity to achieve higher parallelism.
  3. Parallel replication between transactions without data conflicts is not supported.

SERIAL (performance: moderate; transaction consistency: low)
  1. This policy features serial replication but does not guarantee transactional integrity. It is suitable for scenarios where transactional consistency is not a high priority but there is a strong requirement for serialization, such as when there are foreign key constraints or business sequence dependencies between tables.
  2. Under this policy, each change is processed in auto-commit mode, so each transaction is a single-node transaction and does not trigger a distributed transaction. Compared with the TRANSACTION policy, it therefore generally offers better performance, unless each transaction in the binlog contains only one data change, in which case the difference between the two policies is minimal.

SPLIT (performance: good; transaction consistency: low)
  1. This policy features parallel replication but does not guarantee transactional integrity. It is suitable for scenarios where there are no dependency requirements between data, as long as eventual consistency is maintained.
  2. Parallel replication is performed at the row level: data from different tables and with different primary keys is dispatched to separate execution threads for fully parallel writes.
  3. This is the default policy for GDN.

MERGE (performance: excellent; transaction consistency: low)
  1. This policy uses techniques such as change compression and batch writes for parallel replication, but it guarantees neither the type of each change nor the integrity of the change transactions. It is suitable for scenarios where there are no dependency requirements between data and the change type is not a concern, as long as eventual consistency is maintained.
  2. Parallel replication is performed at the row level, and data is batched by table for fully parallel, batched writes.

Integrated DDL Replication

Efficient DML replication can ensure the consistency of table content between upstream and downstream clusters in GDN. On the other hand, DDL replication is the core capability that ensures schema consistency between the primary and secondary clusters in GDN. Schema consistency between GDN's primary and secondary clusters is not limited to table structure consistency but extends to all database objects. This ensures that the secondary cluster can handle application traffic during a failover. Otherwise, even the loss of a single index could lead to severe performance issues.

PolarDB-X supports various types of database objects, including common MySQL-compatible tables, columns, indexes, views, and functions. Additionally, it supports custom object types such as TableGroup, Sequence, GSI, CCI, and Java Function, along with many custom extension syntaxes such as TTL, Locality, OMC, Local Index, Move Partition, and Alter Index.

DDL replication is a challenging capability. It is not simply a matter of extracting DDL SQL from the primary cluster's binlog and executing it on the secondary cluster. It involves multiple challenges related to compatibility, consistency, stability, and availability. PolarDB-X's built-in native replication capability encapsulates these complexities at the kernel level, providing an efficient and easy-to-use DDL replication solution. This can be summarized in the following aspects:

  1. High Compatibility: Compared with using external synchronization tools for data replication between PolarDB-X instances, integrated native replication has no SQL compatibility issues and no concerns about cross-product or cross-version compatibility, making data replication more reliable and stable. In addition, native kernel-level replication makes it easier to identify SQL semantics and pass through additional information for targeted processing and optimization. For example, it can recognize partition-change DDL and adjust the WHERE conditions of the related DML SQL in time, recognize locality-related DDL and remove or replace the DN information in the SQL with the corresponding DN IDs of the target instance, and recognize columnar-storage DDL and perform targeted SQL rewrites.
  2. Asynchronous DDL Execution: Time-consuming DDL operations can block the data replication pipeline and cause significant data delays, which is a common issue in data replication. In distributed database scenarios, performing DDLs on very large tables can take even longer, leading to more severe blocking and delay issues. In general, many types of DDL operations can be executed asynchronously, such as analyze table, alter table add index, and alter table change partition definition. PolarDB-X can implement self-contained and efficient asynchronous DDL processing at the kernel level (beta feature). It can even support two-phase DDL processing that is coordinated between the primary and secondary clusters, achieving "zero-blocking" for most DDL replications (WIP).
  3. Consistent DDL Replication: When building a GDN data replication pipeline based on multi-stream binlog, DDL replication must consider the coordination and consistency between different replication links. Each replication link must wait for other links after receiving a DDL SQL statement. Only when all links have received the DDL SQL can the DDL operation be replicated to the secondary cluster. Otherwise, this can lead to inconsistencies between DML traffic and the schema, causing anomalies or data errors. GDN provides a Distributed DDL Replication Engine that uses an efficient multi-path coordination algorithm to achieve consistent DDL replication in a distributed environment.

[Figure: Distributed DDL replication in GDN]

Lightweight Bidirectional Replication

PolarDB-X provides a lightweight bidirectional replication capability that is compatible with MySQL and can filter loop traffic based on server_id, as shown in the following figure:

[Figure: Bidirectional replication with server_id-based loop traffic filtering]

The advantages of this capability can be summarized in two aspects:

High Performance

Filtering traffic directly based on the server_id in the binlog event header is more lightweight compared with external synchronization tools, which typically use transaction tables for traffic filtering. This method incurs no additional performance overhead and offers better scalability. Transaction tables are intrusive to the user's database and require DML operations on the transaction table for each synchronized transaction, leading to write amplification.

Easy to Use

To quickly set up a bidirectional replication architecture with loop traffic filtering, simply execute a CHANGE MASTER statement on each of the two PolarDB-X instances.

Syntax:
CHANGE MASTER TO option [, option] ... [ channel_option ]

option: {
  IGNORE_SERVER_IDS = (server_id_list)
}

Example:
R1: CHANGE MASTER ... , IGNORE_SERVER_IDS = (100) , ...
R2: CHANGE MASTER ... , IGNORE_SERVER_IDS = (200) , ...
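
For illustration, a fuller version of the two statements might look like the following. The endpoints and credentials are hypothetical, the connection options follow the MySQL-compatible CHANGE MASTER syntax, and we assume R1 uses server_id 100 and R2 uses server_id 200:

# On R1 (server_id = 100): replicate from R2 and filter out events that originated on R1.
CHANGE MASTER TO
  MASTER_HOST = 'r2.example.com',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '******',
  IGNORE_SERVER_IDS = (100);
START SLAVE;

# On R2 (server_id = 200): replicate from R1 and filter out events that originated on R2.
CHANGE MASTER TO
  MASTER_HOST = 'r1.example.com',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '******',
  IGNORE_SERVER_IDS = (200);
START SLAVE;

Each instance filters out binlog events whose server_id equals its own, so changes that have already made a round trip are not applied again, which breaks the replication loop.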

Comprehensive Conflict Detection

In data replication paths, especially in traditional multi-active, multi-write architectures where cellular rules and table partitioning rules are not aligned, conflicts may occur due to bugs or unexpected situations, such as both sides modifying the same primary key or unique key. A conflict detection mechanism is needed to mitigate these risks. PolarDB-X's data replication includes a built-in conflict detection mechanism that supports the following types of conflicts:

Uniqueness conflicts caused by INSERT operations

When an INSERT statement violates a uniqueness constraint during synchronization, such as when two PolarDB-X instances simultaneously or nearly simultaneously insert a record with the same primary key value, the INSERT will fail on the other end because a record with the same primary key already exists.

Inconsistent records caused by UPDATE operations

If the records to be updated do not exist in the destination instance, PolarDB-X can automatically convert the UPDATE operation into an INSERT operation (not enabled by default). However, uniqueness conflicts may still occur: the primary keys or unique keys of the records to be inserted may conflict with those of existing records in the destination instance.

Non-existent records to be deleted

The records to be deleted do not exist in the destination instance. In this case, the DELETE operation is ignored regardless of the conflict resolution policy that you specify.

For data conflicts, PolarDB-X provides configuration parameters to control the resolution strategy. These can be adjusted using the CHANGE MASTER command during the creation or maintenance of the replication path.

CHANGE MASTER TO option [, option] ... [ channel_option ]

option: {
   CONFLICT_STRATEGY = {OVERWRITE|INTERRUPT|IGNORE}
}

OVERWRITE: Replaces the conflicting data with a REPLACE INTO statement, which is the default resolution strategy.

INTERRUPT: Interrupts the replication path.

IGNORE: Ignores the conflict.
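
For example, based on the syntax above, the default strategy can be set explicitly when creating or maintaining the replication path (a minimal usage sketch):

# Keep conflicting rows rewritten via REPLACE INTO (the default OVERWRITE behavior).
CHANGE MASTER TO CONFLICT_STRATEGY = OVERWRITE;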

It is important to note that conflict resolution is a best-effort mechanism. In a multi-active, multi-write architecture, it is challenging to completely prevent data conflicts due to factors such as differences in system time between the two ends of the synchronization and the presence of synchronization delays. Therefore, it is crucial to make corresponding modifications at the business level to ensure that records with the same primary key, business key, or unique key are only updated on one of the bidirectionally synchronized PolarDB-X instances. Aligning cellular rules with table partitioning rules and relying on the database kernel to enforce partition write prohibitions is the simplest and most effective way to avoid data conflicts. In addition, native GDN capabilities make this even simpler.

Snapshot-level Data Reconciliation

GDN provides native data reconciliation capabilities with the following advantages of its built-in data verification:

1. Support for multiple verification modes:

Direct Verification: This mode verifies data based on the latest data in the primary and secondary instances. In scenarios with frequent incremental changes, this method may report false differences and require multiple rounds of re-verification or manual checking.

Snapshot Verification: PolarDB-X supports TSO-based snapshot queries. The global binlog ensures linearly consistent sync points. GDN leverages these capabilities to build a TSO-mapping that ensures data consistency between the primary and secondary clusters. It then performs data verification based on consistent snapshots, which is error-free and more user-friendly.

[Figure: Snapshot verification based on consistent TSO snapshots]

2. Low resource consumption and fast verification:

GDN's data verification uses a combination of checksum verification and detailed row-by-row comparison. First, it divides the data into segments through a sampling algorithm and compares the checksums of each segment. If the checksums do not match, it switches to a detailed row-by-row comparison. Compared with traditional one-by-one row comparisons, this method can achieve up to 10 times better performance and lower resource consumption.
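
As a simplified illustration of the checksum-then-compare idea (this is not the actual GDN implementation; the table, columns, and segment boundaries are hypothetical), the same query can be run on the primary and secondary instances and the results compared; only segments whose checksums differ need the detailed row-by-row comparison:

# Run on both instances and compare the results: a mismatch means the
# segment [1, 100000) must be compared row by row.
SELECT COUNT(*) AS row_count,
       BIT_XOR(CRC32(CONCAT_WS('#', id, col_a, col_b))) AS segment_checksum
FROM   t_order
WHERE  id >= 1 AND id < 100000;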

For detailed usage instructions, please refer to the official documentation: Verify Data in the Primary and Secondary Instances - Cloud-Native Database PolarDB (PolarDB) - Alibaba Cloud Help Center.

Easy Data Correction

Data correction can be performed in two scenarios: Regular online data correction and data correction after a failover. For the former, GDN provides a data correction solution based on shared locks, primarily suitable for unidirectional master-slave replication. The implementation is straightforward: Based on the results of data verification, obtain the diff data; group the diff data into batches; for each batch, acquire a shared lock on the data in the secondary instance; retrieve the latest values of the data from the primary instance; and write the updated values to the secondary instance to complete the data correction (Beta version).

[Figure: Online data correction based on shared locks]

For the latter, if a failover is performed using an emergency forced switch, it is possible that some binlogs from the primary instance have not been replicated to the secondary instance. This means that after the switch, the new primary instance may be missing some data. If the original primary instance cannot be recovered, the corresponding data will ultimately be lost. However, if the original primary instance is eventually restored after the disaster recovery, there is an opportunity to compensate the missing binlog data to the new primary instance.

How to compensate? Do you need to identify and manually handle each entry one by one? You can do it this way, of course. But it does not have to be that cumbersome. PolarDB-X provides a full mirror matching capability that makes data compensation easier. You can enable full mirror matching on the original replication link before the switch and then consume the remaining binlog data for automatic compensation:

For Insert events in the binlog, when replaying to the new primary instance, use the INSERT IGNORE strategy.

For Delete events in the binlog, when replaying to the new primary instance, append all column values to the WHERE condition. If they do not match, the event is automatically skipped.

For Update events in the binlog, when replaying to the new primary instance, append all column values to the WHERE condition. If they do not match, the event is automatically skipped.
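
These three rules can be illustrated with plain SQL (the table and columns are hypothetical; in practice the replay is performed by the replication component, not hand-written statements):

# Insert event: skip the row if the new primary already has it.
INSERT IGNORE INTO t_order (id, status, amount) VALUES (1001, 'PAID', 99.00);

# Update event: the WHERE clause carries the full before-image of the row; if any
# column no longer matches, zero rows are affected and the event is skipped.
UPDATE t_order SET status = 'SHIPPED'
WHERE  id = 1001 AND status = 'PAID' AND amount = 99.00;

# Delete event: likewise matched against the full before-image.
DELETE FROM t_order
WHERE  id = 1001 AND status = 'SHIPPED' AND amount = 99.00;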

Of course, this method is not foolproof. If certain columns of a record have been changed back and forth, the compensated data might not meet expectations. However, such cases are rare, and in most scenarios, the full mirror matching capability can achieve automatic compensation.

Future versions of GDN are planning more advanced automatic compensation capabilities. Building on the full mirror matching feature, these capabilities will combine the binlogs from the new primary instance to accurately determine if a particular piece of data has been modified after the failover. This will help in precisely identifying whether there are double write conflicts between the data to be compensated and the target data.

You can enable or disable the full mirror matching feature with a single SQL command.

CHANGE MASTER TO option [, option] ... [ channel_option ]

option: {
   COMPARE_ALL = {true|false}
}

Fast Failover

For the public cloud offering, the PolarDB-X console provides a visualized interface for switching GDN primary and secondary instances. There are two types of switching methods:

Regular Switch (RPO=0): This is a planned switch used when all instances within the cluster and the data replication links between them are operating normally. When performing a planned switch, all instances in the cluster will be set to read-only, and the duration of this read-only period depends on the time it takes to eliminate data replication lag. A planned switch does not result in data loss.

Forced Switch (RPO>=0): When the primary instance experiences an unrecoverable failure within a short period, a forced emergency switch can be considered to prioritize business continuity. During an emergency switch, the selected secondary instance will immediately be promoted to the primary instance. Any uncommitted transactions from the previous primary instance that have not been replicated to the secondary instance will be lost. The amount of data loss depends on the replication lag.

PolarDB-X also provides convenient atomic APIs. In scenarios where you are building your own GDN based on PolarDB-X, you can combine and call these atomic APIs to create your own switch task flow.

# To disable/enable writes for the entire PolarDB-X instance
alter instance set read_only = {true|false}

# show full master status can be used to view the latest TSO of the binlog on the primary instance 
mysql> show full master status ;
+---------------+-----------+--------------------------------------------------------+-------------+-----------+-----------+-------------+-------------+-------------+--------------+------------+---------+
| FILE          | POSITION  | LASTTSO                                                | DELAYTIMEMS | AVGREVEPS | AVGREVBPS | AVGWRITEEPS | AVGWRITEBPS | AVGWRITETPS | AVGUPLOADBPS | AVGDUMPBPS | EXTINFO |
+---------------+-----------+--------------------------------------------------------+-------------+-----------+-----------+-------------+-------------+-------------+--------------+------------+---------+
| binlog.006026 | 221137311 | 721745002099847993617485152365471498250000000000000000 |         211 |         0 |         0 |           0 |           0 |           0 |            0 |          0 |         |
+---------------+-----------+--------------------------------------------------------+-------------+-----------+-----------+-------------+-------------+-------------+--------------+------------+---------+

# show slave status can be used to view the latest TSO consumed by the secondary instance
mysql> show slave status \G
*************************** 1. row ***************************
                  ...,...
                  Exec_Master_Log_Tso: 721745002099847993617485152365471498250000000000000000
                  ...,...
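
As a rough sketch, these atomic APIs can be combined into a planned (RPO = 0) switch flow like the one below; the waiting logic and the final traffic switch depend on your own tooling, and re-pointing the old primary uses the CHANGE MASTER capabilities described earlier:

# 1. On the current primary P: block new writes so no further binlog is produced.
alter instance set read_only = true;

# 2. On P: record the latest binlog TSO.
show full master status;     # note the LASTTSO value

# 3. On the secondary S: poll until replication catches up with the recorded TSO.
show slave status \G         # wait until Exec_Master_Log_Tso >= LASTTSO

# 4. On S: open the instance for writes, making it the new primary.
alter instance set read_only = false;

# 5. Re-point P as a secondary of S (CHANGE MASTER ... / START SLAVE) and switch
#    application traffic to S.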

Excellent Performance

PolarDB-X GDN supports the construction of a distributed data replication pipeline with multi-stream parallel replication capabilities based on CDC multi-stream binlog. It offers powerful scalability, allowing you to adjust the level of parallelism based on actual load conditions. This ensures low-latency data replication even in cross-region high network latency scenarios, thereby keeping the RPO close to zero. Of course, achieving the best possible performance for single-stream replication is also our goal. Below is the performance data for single-stream replication in TPCC and Sysbench scenarios. We are continuously working to optimize these further.

Test environment: PolarDB-X Instance Configuration (Dedicated Type)

• CN: 16core 128G * 4

• DN: 16core 128G * 4 (Version 5.7)

• CDC: 32core 64G * 2

• Number of Multi-Streams: 6

Test data:

Test case | Test metrics | Single-stream RPS | Multi-stream RPS | Latency
Sysbench_oltp_write_only (32 tables, 256 threads, 10,000,000 rows per table) | TPS: 3w/s; QPS: 12w/s | 4w/s | 12w/s | Multi-stream: < 2s
TPCC transaction stress testing (1,000 warehouses, 500 concurrency) | tpmC: 30w | 10w/s | 20w/s | Multi-stream: < 2s

PolarDB-X GDN Cross-cloud Deployment in the Future

[Figure: PolarDB-X GDN cross-cloud deployment based on MyBase]

Alibaba Cloud provides the MyBase product, which allows you to manage database instances deployed across multiple cloud providers, as well as in IDCs, within a single MyBase space. Through the multi-cloud management capabilities provided by MyBase, you can achieve cross-cloud disaster recovery deployment for databases. The cross-cloud GDN capability of PolarDB-X based on MyBase is on the way. For readers interested in multi-cloud disaster recovery, stay tuned.

Summary

PolarDB-X's disaster recovery capabilities are always a work in progress. From cross-AZ to cross-region, and now to cross-cloud, we have taken a significant step forward. However, this is just a small step on the continuous journey of improvement. With ongoing efforts, we believe that PolarDB-X's disaster recovery capabilities will keep improving, and that it will evolve into a more native global database with global disaster recovery capabilities.
