This topic describes the key technical principles of PolarDB-X.
Distributed linear scalability
PolarDB-X horizontally partitions table data across multiple data nodes (DNs) using partitioning functions. PolarDB-X supports common partitioning functions, such as hash and range partitioning.
In the following example, the shop table in the orders database is distributed across 12 partitions, from `orders_00` to `orders_11`, based on the hash value of each row's ID attribute. These partitions are evenly distributed across four data nodes. This data distribution is transparent to users. The distributed SQL layer of PolarDB-X automatically routes queries to the correct nodes and aggregates the results from different partitions and nodes.
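The routing described above can be illustrated with a minimal Python sketch. It is not the PolarDB-X implementation: the CRC32 hash, the round-robin placement, and the names `partition_for`, `node_for`, and `route` are assumptions chosen for illustration. Only the partition count, the node count, and the `orders_00` to `orders_11` naming come from the example.

```python
import zlib

NUM_PARTITIONS = 12   # orders_00 .. orders_11, as in the example
NUM_DATA_NODES = 4    # four data nodes, as in the example


def partition_for(row_id: int) -> int:
    """Map a row ID to a partition index with a stable hash.

    CRC32 is an assumption; the actual hash function is not specified here.
    """
    return zlib.crc32(str(row_id).encode("utf-8")) % NUM_PARTITIONS


def node_for(partition: int) -> int:
    """Place partitions round-robin so every node holds the same number (assumed policy)."""
    return partition % NUM_DATA_NODES


def route(row_id: int) -> tuple[str, int]:
    """Return (partition name, data node index) for a point lookup by ID."""
    partition = partition_for(row_id)
    return f"orders_{partition:02d}", node_for(partition)


def partitions_per_node() -> dict[int, list[str]]:
    """List which partitions each data node holds under the assumed placement."""
    layout: dict[int, list[str]] = {n: [] for n in range(NUM_DATA_NODES)}
    for partition in range(NUM_PARTITIONS):
        layout[node_for(partition)].append(f"orders_{partition:02d}")
    return layout
```

With this placement, each of the four nodes holds three partitions, and a lookup by ID touches exactly one partition on one node. A query without an ID filter would have to visit every partition and merge the results.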

Scale-out and migration
As your business and data volume grow, you may need to add more data nodes. When you add a new data node to an instance, PolarDB-X automatically initiates a scale-out task to rebalance the data.
In the following example, the orders table data is initially distributed across four data nodes. After the number of data nodes is increased from 4 to 6, PolarDB-X automatically initiates a scale-out task to migrate some partitions from the existing nodes to the new ones. This migration process runs in the background, uses idle resources, and does not affect your online business.
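The following Python sketch shows one way such a rebalancing plan could be computed for the 4-to-6 node example. The greedy strategy and the names `initial_layout` and `rebalance` are hypothetical and are not the actual PolarDB-X balancer. The sketch only illustrates that a small subset of partitions needs to move.

```python
def initial_layout() -> dict[str, list[str]]:
    """12 partitions spread evenly over 4 data nodes (node names are hypothetical)."""
    return {
        f"dn{n}": [f"orders_{p:02d}" for p in range(12) if p % 4 == n]
        for n in range(4)
    }


def rebalance(layout: dict[str, list[str]], new_nodes: list[str]) -> list[tuple[str, str, str]]:
    """Add empty nodes, then move partitions until node sizes differ by at most one.

    Returns the migration plan as (partition, source node, target node) tuples
    and updates `layout` in place.
    """
    for node in new_nodes:
        layout.setdefault(node, [])
    moves = []
    while True:
        busiest = max(layout, key=lambda n: len(layout[n]))
        idlest = min(layout, key=lambda n: len(layout[n]))
        if len(layout[busiest]) - len(layout[idlest]) <= 1:
            break  # balanced: no node holds two more partitions than another
        partition = layout[busiest].pop()
        layout[idlest].append(partition)
        moves.append((partition, busiest, idlest))
    return moves
```

Starting from three partitions per node, this plan moves four partitions in total, one from each existing node, and ends with two partitions on each of the six nodes. The other eight partitions stay where they are.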

High availability and disaster recovery
In a production environment, database instances are typically deployed with multiple replicas to ensure high availability and data durability. Modern databases often use a majority consensus replication protocol, such as Paxos, to ensure strong consistency between replicas. The protocol requires at least three nodes, and each write operation must be acknowledged by a majority of the nodes. This allows the instance to continue operating even if one node fails. PolarDB-X uses the X-Paxos replication protocol developed by Alibaba. X-Paxos is an enhanced version of Paxos that provides extensive optimizations for functionality and performance. This protocol has supported the Double 11 Shopping Festival for over a decade, demonstrating its stability and reliability.
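The majority rule can be expressed as simple arithmetic. The following Python sketch covers only the quorum calculation, not X-Paxos itself, and the function names are hypothetical.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Number of replicas that can fail while a majority is still reachable."""
    return replicas - majority(replicas)


def is_committed(acks: int, replicas: int) -> bool:
    """A write counts as committed once a majority has acknowledged it."""
    return acks >= majority(replicas)
```

With three replicas, a majority is two, so a write commits after two acknowledgments and the instance tolerates one failed node. With five replicas, a majority is three and two failures are tolerated.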
The Paxos replication protocol lets you deploy a PolarDB-X instance across multiple data centers to achieve data center-level disaster recovery. Common deployment methods include three data centers in the same city or three data centers across two regions. The latter is often used for hybrid cloud deployments. Because Paxos replication relies on a single leader, one of the three data centers typically functions as the primary data center and serves external traffic.
Distributed transactions
PolarDB-X natively supports distributed transactions and guarantees atomicity, consistency, isolation, and durability (ACID).
PolarDB-X uses a Timestamp Oracle (TSO) and multiversion concurrency control (MVCC) to ensure consistent snapshot reads. This prevents reading the intermediate state of a distributed transaction, such as a money transfer. As shown in the following figure, when a compute node (CN) commits a transaction, it obtains a timestamp from the TSO. Then, the CN commits the timestamp and data to the multiversion storage engine on a data node (DN). During a read operation, if a query involves data from multiple partitions, PolarDB-X retrieves a global timestamp to use as the read version. It then assesses the visibility of each row's version to ensure that only data from transactions committed before this global timestamp is read.
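The visibility rule can be sketched in Python as follows. This is a simplified model under stated assumptions, not the PolarDB-X storage engine: the classes `TimestampOracle` and `MultiVersionStore` are hypothetical, the TSO is a local counter, and the sketch ignores the prepare phase that a real distributed commit needs.

```python
import itertools


class TimestampOracle:
    """Hands out strictly increasing timestamps (a local counter stands in for the TSO)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_timestamp(self) -> int:
        return next(self._counter)


class MultiVersionStore:
    """One partition on a data node: each key keeps every committed version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value)

    def commit(self, key, value, commit_ts):
        self._versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        """Return the newest value committed before snapshot_ts, or None."""
        visible = [(ts, v) for ts, v in self._versions.get(key, []) if ts < snapshot_ts]
        return max(visible, key=lambda tv: tv[0])[1] if visible else None


def transfer_demo():
    """Transfer 30 from alice (partition A) to bob (partition B)."""
    tso = TimestampOracle()
    dn_a, dn_b = MultiVersionStore(), MultiVersionStore()
    initial_ts = tso.next_timestamp()
    dn_a.commit("alice", 100, initial_ts)
    dn_b.commit("bob", 0, initial_ts)

    snapshot_ts = tso.next_timestamp()  # a reader takes its snapshot first
    commit_ts = tso.next_timestamp()    # the transfer commits later
    dn_a.commit("alice", 70, commit_ts)
    # The reader arrives while only partition A has applied the transfer.
    total_during = dn_a.read("alice", snapshot_ts) + dn_b.read("bob", snapshot_ts)
    dn_b.commit("bob", 30, commit_ts)

    later_ts = tso.next_timestamp()
    total_after = dn_a.read("alice", later_ts) + dn_b.read("bob", later_ts)
    return total_during, total_after
```

In this model, the reader that took its snapshot before the transfer sees the old balances on both partitions, and a later reader sees the new balances on both. Neither observes a state in which the money has left one account without arriving in the other.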
Distributed transactions are a fundamental feature of distributed systems, and other PolarDB-X features build on them. For example, in a read/write splitting solution, transactional data versions are synchronized to learner replicas to ensure that read-only instances do not read stale data due to synchronization latency. In the global data change log, distributed transactions are sorted by timestamp. When performing a point-in-time recovery (PITR), PolarDB-X uses these timestamps to identify the globally consistent data version for a specific point in time.
Integrated centralized-distributed architecture
PolarDB-X supports an integrated centralized-distributed architecture.
This architecture allows PolarDB-X to combine the scalability and resilience of distributed databases with the manageability and performance of centralized databases. You can switch between centralized and distributed modes. In centralized mode, data nodes operate independently and are fully compatible with the single-node database model. When your business grows and requires a distributed system, you can upgrade to the distributed mode in place. During this upgrade, distributed components integrate with the existing data nodes without requiring data migration or application modifications.
To support this architecture, PolarDB-X instances are available in two editions: Standard Edition (centralized) and Enterprise Edition (distributed). You can upgrade a Standard Edition instance to an Enterprise Edition instance in place.
HTAP
PolarDB-X supports hybrid transactional and analytical processing (HTAP). This allows it to handle highly concurrent transactional requests and complex analytical queries simultaneously. Analytical queries operate on large datasets and involve complex computations, such as aggregating data over a specific period. Compared to simple queries, analytical queries take longer to execute and consume more computing resources, potentially taking several seconds or minutes to complete.
To enhance the performance of complex analytical queries, PolarDB-X uses In-Memory Column Index (IMCI) technology. This technology, combined with vectorized operators, significantly improves analytical processing performance.
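To give a rough idea of why a columnar layout with batch processing suits analytical queries, the following Python sketch aggregates an amount over a range of days. It illustrates the general technique only and is not IMCI. The column names, the batch size, and the function names are hypothetical.

```python
from array import array

BATCH_SIZE = 1024  # hypothetical batch size for batch-at-a-time processing


def to_columns(rows):
    """Convert (day, amount) rows into one contiguous array per column."""
    return {
        "day": array("i", (day for day, _ in rows)),
        "amount": array("d", (amount for _, amount in rows)),
    }


def sum_amount_between(columns, first_day, last_day):
    """Sum `amount` for rows whose `day` falls in [first_day, last_day].

    Only the two columns the query needs are scanned, one batch at a time.
    """
    days, amounts = columns["day"], columns["amount"]
    total = 0.0
    for start in range(0, len(days), BATCH_SIZE):
        day_batch = days[start:start + BATCH_SIZE]
        amount_batch = amounts[start:start + BATCH_SIZE]
        total += sum(
            amount
            for day, amount in zip(day_batch, amount_batch)
            if first_day <= day <= last_day
        )
    return total
```

A row-oriented scan would read every attribute of every row to answer the same query. The columnar version reads only the `day` and `amount` arrays, and processing them in batches is what allows a real engine to apply vectorized operators.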
Compatibility with the MySQL ecosystem
One of the core design goals of PolarDB-X is compatibility with MySQL and its ecosystem. This compatibility covers SQL syntax, transaction behavior, and data import and export. For more information, see Compatibility with MySQL.
PolarDB-X is compatible with the MySQL protocol. You can connect to PolarDB-X instances using common MySQL clients and drivers, such as Java Database Connectivity (JDBC), ODBC, and Golang drivers. PolarDB-X also supports MySQL protocol features such as SSL connections, prepared statements (Prepare), and data loading (Load).
PolarDB-X is compatible with various MySQL DML, DAL, and DDL syntaxes, including the following:
Most MySQL functions, including JSON, encryption, and decryption functions.
Views, common table expressions (CTEs), window functions, and analytical functions in MySQL 8.0.
Various MySQL data types, including precision for types such as TIMESTAMP and DECIMAL.
Common MySQL string character sets and collations.
Most information_schema views.
In addition, PolarDB-X is compatible with the MySQL binary logging replication protocol. You can treat a PolarDB-X cluster as a standard MySQL node and use another MySQL node as the synchronization source or destination for the PolarDB-X cluster. Because the binary logging format of PolarDB-X is the same as that of MySQL, it can also be used in change data capture (CDC) scenarios. For example, you can use tools such as Canal to synchronize data from PolarDB-X to other storage solutions.