By Guxu
PolarDB-X, a cloud-native distributed relational database, distributes the data of a logical table across multiple data nodes according to sharding rules, and it supports changing those rules to repartition the data. Data repartition is one of the core capabilities of a distributed database. As business grows, repartition can spread data across more nodes to achieve horizontal scale-out. When business rules change dramatically, it can redistribute data according to new sharding rules, which improves query performance and better fits the new rules. The following figure illustrates how a non-partitioned table can be changed online into a sharded table with a single DDL statement. For more information, see Alibaba Cloud documentation.
Back in the heyday of distributed database middleware, developers tried various ways to solve the problem of data repartition. Without exception, these solutions were complex and risky. Developers had to perform repartition during low-traffic periods in the middle of the night, and the whole process required detailed plans, justifications, and rollback strategies. Because the systems were distributed, the transition period of switching from old to new data forced a trade-off between data consistency and availability. As a result, many solutions involved a write-stop phase, a data verification phase, or both.
After thoroughly evaluating the technical details of data repartition in distributed databases, PolarDB-X implements repartition with strong data consistency, high availability, and transparency to businesses, and exposes it through a simple DDL statement.
This topic discusses data repartition at table level. For example, you can repartition the data in a sharded table to different data nodes based on a new rule. However, the repartition process takes time, during which new incremental data enters the system. Once all data has been repartitioned, old and new data must be switched, and finally, dual writing should stop and all old data should be deleted. A careful study of this process yields three key sub-issues, against which we will evaluate these implementations:
A traditional repartition process is as follows:
However, upon further examination, we find that data consistency cannot be guaranteed in many steps:
In addition to data consistency, there are still many cumbersome but important issues:
PolarDB-X is a distributed database that separates storage from compute. Therefore, its architecture includes Compute Node (CN), Data Node (DN), and Global Meta Service (GMS). CNs are responsible for SQL parsing, optimization, and execution, DNs manage data storage, and GMS stores metadata. For performance reasons, each CN caches a copy of the metadata.
In distributed incremental data dual-writing scenarios, the two ends of dual writing are often located on different data nodes, making standalone transactions unavailable. As previously discussed, none of XA transactions, binlog synchronization, or triggers can guarantee strong consistency between the two ends of dual writing. PolarDB-X instead uses its built-in TSO-based distributed transactions to implement incremental data synchronization, so the data remains strongly consistent no matter when read traffic is switched during the repartition process.
Note that if the shard key column value of a row is modified, the row might be routed to another data node. In that case, what is actually executed is a delete operation on the original data node plus an insert operation on the new data node. Therefore, during data repartition, because the sharding rules before and after differ, an update operation on a single row could become a distributed transaction involving four data nodes (and even more if there are global secondary indexes). PolarDB-X handles all of these issues, allowing users to use it like a standalone database without being aware of them.
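To make the four-node case concrete, the following minimal sketch expands one logical UPDATE into physical operations while dual writing is active. The table names, column names, modulo routing, and `DN` labels are assumptions made for illustration and are not PolarDB-X internals.

```python
def plan_update(old_row, new_row, rules):
    """Expand one logical UPDATE into per-node operations during dual writing.

    rules maps a table name to (shard key column, node count). Routing by
    modulo is an assumption for this sketch, not the real sharding function.
    """
    ops = []
    for table, (key, nodes) in rules.items():
        src = old_row[key] % nodes
        dst = new_row[key] % nodes
        if src == dst:
            # The row stays on the same node: a plain update is enough.
            ops.append((f"DN{src}", table, "update"))
        else:
            # The row moves: delete on the old node, insert on the new one.
            ops.append((f"DN{src}", table, "delete"))
            ops.append((f"DN{dst}", table, "insert"))
    return ops


# Hypothetical rules: the source table is sharded by order_id,
# the target table (new rule) by buyer_id, both over four data nodes.
RULES = {"source": ("order_id", 4), "target": ("buyer_id", 4)}

before = {"order_id": 1, "buyer_id": 2}
after = {"order_id": 4, "buyer_id": 7}

ops = plan_update(before, after, RULES)
touched = sorted({node for node, _, _ in ops})
print(ops)
print(touched)  # ['DN0', 'DN1', 'DN2', 'DN3']
```

In this sketch the update changes both shard keys, so it deletes and inserts in each table and touches four distinct data nodes, all of which would have to commit within one distributed transaction.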
PolarDB-X performs existing data synchronization in segments. For each segmented synchronization, PolarDB-X attempts to obtain the S lock of the source data within a TSO transaction before writing to the destination. If the destination segment contains the same data, it indicates that the data has been synchronized during the incremental dual-writing phase and can be ignored. However, distributed transactions, like standalone transactions, can cause deadlocks. When adding an S lock to a segment of the original table during the existing data synchronization, a large volume of business update traffic may lead to distributed deadlocks. Therefore, PolarDB-X provides a distributed deadlock detection module to address this issue. After the deadlock is released, the existing data synchronization module retries the operation.
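The segment-by-segment loop described above can be sketched as follows. This is a simplified in-memory model under stated assumptions: tables are plain dictionaries, `DeadlockError` is a hypothetical stand-in for the error a deadlock victim receives, and the locking and the distributed transaction are only indicated in comments.

```python
class DeadlockError(Exception):
    """Hypothetical stand-in for the error a deadlock victim receives."""


def copy_segment(source, target, segment):
    """Copy one segment of keys from source to target.

    A real system would run this inside one distributed transaction while
    holding shared locks on the source rows; this sketch only models the logic.
    """
    copied = 0
    for key in segment:
        if key in target:
            continue  # already written by incremental dual writing
        target[key] = source[key]
        copied += 1
    return copied


def backfill(source, target, segment_size, copy=copy_segment, max_retries=3):
    """Copy existing rows in segments, retrying a segment after a deadlock."""
    keys = sorted(source)
    copied = 0
    for start in range(0, len(keys), segment_size):
        segment = keys[start:start + segment_size]
        for attempt in range(max_retries + 1):
            try:
                copied += copy(source, target, segment)
                break
            except DeadlockError:
                if attempt == max_retries:
                    raise
    return copied


attempts = []


def flaky_copy(source, target, segment):
    """Simulate a deadlock on the second attempt, before anything is written."""
    attempts.append(tuple(segment))
    if len(attempts) == 2:
        raise DeadlockError("chosen as deadlock victim")
    return copy_segment(source, target, segment)


source = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
target = {2: "b"}  # row 2 already arrived through dual writing
copied = backfill(source, target, segment_size=2, copy=flaky_copy)
print(copied, attempts)
```

Keeping segments small limits how long the shared locks are held, which in turn limits how much business traffic each segment can block.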
First, let's look at how the Orphan Data Anomaly mentioned earlier occurs. When incremental data dual writing is initiated, the metadata cached in the memory of the PolarDB-X compute nodes is not refreshed simultaneously but one node after another. Therefore, there is always a period when some compute nodes have started dual writing while others have not. This leads to the following situation:
This issue has been discussed in detail in Google's paper Online, Asynchronous Schema Change in F1. PolarDB-X introduces Online Schema Change [8] to address such issues. For more information, please refer to the previous articles. For repartition, PolarDB-X introduces the states as shown in the following figure to ensure that any two adjacent states are compatible and avoid data consistency issues. Specifically, let's look at some of the most critical states:
• target_delete_only and target_write_only: As mentioned above, when there are multiple compute nodes, directly enabling incremental dual writing causes the Orphan Data Anomaly. Therefore, before dual writing is enabled, all compute nodes should first reach the target_delete_only state and only then the target_write_only state (which is the dual-writing state). In the target_delete_only state, a compute node executes only delete statements against the new table (update statements are converted to delete statements before execution). For example, in the above figure, CN1 reaches the target_delete_only state first, so even though dual writing is not yet enabled on it, it can still delete the row with id=3 from the new table to keep the data consistent.
• source_delete_only and source_absent: As discussed earlier, directly stopping dual writing of the old table can cause data inconsistency. Therefore, PolarDB-X introduces the source_delete_only state before source_absent (the state when dual writing is stopped). It also ensures that the Orphan Data Anomaly issue will not occur when the old table is offline.
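The effect of the intermediate target_delete_only state can be reproduced with a toy simulation. The state names follow the text above, while the `ComputeNode` class, the extra `absent` state name for a node that does not yet see the new table, and the dictionary-backed tables are assumptions made for this sketch.

```python
ABSENT = "absent"  # assumed name: node does not see the target table yet
DELETE_ONLY = "target_delete_only"
WRITE_ONLY = "target_write_only"


class ComputeNode:
    """A compute node acting on its own cached view of the target table state."""

    def __init__(self, state):
        self.state = state

    def insert(self, source, target, key, value):
        source[key] = value
        if self.state == WRITE_ONLY:
            target[key] = value  # dual writing

    def delete(self, source, target, key):
        source.pop(key, None)
        if self.state in (DELETE_ONLY, WRITE_ONLY):
            target.pop(key, None)  # deletes are already propagated


def run(writer_state, lagging_state):
    """One node inserts id=3, then a node with older metadata deletes it."""
    source, target = {}, {}
    writer = ComputeNode(writer_state)
    lagging = ComputeNode(lagging_state)
    writer.insert(source, target, 3, "row")
    lagging.delete(source, target, 3)
    return source, target


# Two-state jump: the lagging node knows nothing about the target table.
unsafe = run(WRITE_ONLY, ABSENT)
# With the intermediate state, adjacent states differ by one step only.
safe = run(WRITE_ONLY, DELETE_ONLY)

print(unsafe)  # ({}, {3: 'row'})  orphan row left in the target table
print(safe)    # ({}, {})
```

Because nodes are only ever one state apart, a node that is already writing can coexist with a node that only deletes, and no row can survive in the new table after it has been removed from the old one.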
PolarDB-X provides users with the ability to change sharding rules (that is, repartition) through DDL. However, the DDL itself must also satisfy ACID, and because repartitioning data may take a long time, system failures due to power outages or other causes must be expected during execution.
PolarDB-X also implements a stable DDL execution framework that divides DDL into many steps, each of which is idempotent. This ensures that the DDL task can be interrupted at any time and then resumed or rolled back. By linking all steps through DDL and excluding all manual operations, developers no longer need to design repartition plans or manually perform database operations in the middle of the night.
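The value of idempotent steps is easiest to see when a crash happens after a step has run but before its completion is recorded. The sketch below models only the resume path; the step names, the `Crash` exception, and the in-memory `done` list (standing in for persisted job progress) are hypothetical and do not describe the actual PolarDB-X framework.

```python
class Crash(Exception):
    """Simulated failure such as a power outage."""


def run_job(steps, done, crash_after=None):
    """Run steps in order, skipping those already recorded in done.

    done stands in for durably persisted progress. A crash may happen after
    a step has executed but before it is recorded, so every step must be
    idempotent to be safely executed again on resume.
    """
    for name, action in steps:
        if name in done:
            continue
        action()
        if name == crash_after:
            raise Crash(name)  # progress for this step was never recorded
        done.append(name)


state = set()  # set.add is idempotent: repeating a step changes nothing
steps = [
    ("create_target_table", lambda: state.add("target_table")),
    ("enable_dual_write", lambda: state.add("dual_write")),
    ("backfill", lambda: state.add("backfilled")),
    ("switch_reads", lambda: state.add("reads_on_target")),
]

done = []
try:
    run_job(steps, done, crash_after="backfill")
except Crash:
    pass
done_after_crash = list(done)

run_job(steps, done)  # resume: "backfill" runs a second time, harmlessly
print(done_after_crash)
print(done)
```

Rolling back would walk the recorded steps in reverse with a matching undo action for each, which is omitted here to keep the sketch short.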
After MySQL introduced Online DDL capabilities in version 5.6, DDL could run concurrently with read and write transactions far better than in earlier versions. The basic principle of Online DDL is to obtain a metadata lock (MDL) only at critical moments, rather than holding one throughout the entire DDL process. When executing repartition, PolarDB-X also obtains MDLs in multiple stages, allowing higher concurrency for transactions. However, MDLs are fair locks and may cause metadata deadlocks.
Obtaining MDLs multiple times improves performance but increases the possibility of metadata deadlocks. Once a metadata deadlock occurs, all subsequent read and write transactions are blocked. The default MDL timeout in MySQL is one year, which poses a greater risk than ordinary data deadlocks. Therefore, PolarDB-X provides a distributed metadata deadlock detection module to release distributed metadata deadlocks at critical moments.
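The article does not describe how the detection module works internally. A common general technique for deadlock detection is to build a wait-for graph and search it for a cycle, which the following sketch illustrates. The graph contents and transaction names are invented for the example, and collecting lock waits from all nodes is assumed to have happened already.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    wait_for maps each transaction to the transactions it is waiting on.
    Uses depth-first search with white/gray/black coloring.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is on the current path: the path from nxt is a cycle.
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical waits: the DDL waits for trx1's MDL, trx1 waits for trx2,
# and trx2 is queued behind the DDL's pending exclusive MDL request.
waits = {
    "ddl": ["trx1"],
    "trx1": ["trx2"],
    "trx2": ["ddl"],
    "trx3": ["trx1"],
}

cycle = find_cycle(waits)
print(cycle)  # ['ddl', 'trx1', 'trx2']
print(find_cycle({"a": ["b"], "b": []}))  # None
```

Once a cycle is found, a detector would pick one participant as the victim and abort or retry it so that the remaining transactions can proceed. How the victim is chosen is a policy decision that this sketch leaves open.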
Flexible sharding rule change capability is crucial for distributed databases. PolarDB-X supports three types of tables: non-partitioned tables, broadcast tables, and sharded tables. With sharding rule change capabilities, users can convert data tables into any of these types to better adapt to business growth. In addition to sharding rule change capabilities, PolarDB-X also ensures strong data consistency, high availability, transparency to businesses, and ease of use. This article briefly discusses the various technical points used in PolarDB-X to implement sharding rule changes. As can be seen, integrating sharding rule change capabilities into the database kernel is necessary to solve many data consistency issues. This is also one of the distinguishing features of distributed databases compared with distributed database middleware.
The sharding rule change capability is just one of many features in PolarDB-X. For more information, please refer to other articles about PolarDB-X.