PolarDB-X centralized-distributed integration

This document introduces the concepts related to the centralized-distributed integration of PolarDB-X.

Background information

You can select a centralized or distributed database based on your business requirements. For most small and medium-sized enterprises (SMEs), a centralized database can meet routine business requirements. A centralized database provides a moderate scale of resources, is cost-effective, and is relatively easy to maintain and operate. A distributed database delivers high performance and can manage complex business scenarios in an efficient manner. A distributed database can meet requirements for high throughput, extensive storage capacity, minimal latency, easy scalability, and robust availability. However, distributed databases are more expensive and have higher technical barriers and O&M costs, which may make them less suitable for SMEs.

Even so, SMEs can experience sudden business growth and then need a database that handles high concurrency and high throughput, which places requirements on database scalability. As the business grows, an enterprise that uses a centralized database may need distributed scaling to keep up.

To address this, PolarDB Distributed Edition (PolarDB-X) provides an integrated centralized-distributed architecture, which combines the availability and scalability of a distributed database with the features and performance of a centralized database in a single database system.

Features

In a centralized-distributed integrated database, the data nodes can run on their own in a centralized form that is fully compatible with a standalone database. When business growth requires distributed scaling, the architecture can be upgraded in place to a distributed form. The distributed components connect to the existing data nodes without data migration or application modifications, so you benefit from the availability and scalability of a distributed architecture.

Instance editions

The main product offerings of PolarDB-X are divided into Standard Edition (centralized architecture) and Enterprise Edition (distributed architecture). The technical architecture is as follows:

Standard Edition (centralized architecture)
PolarDB-X Standard Edition is the centralized form. It consists of multiple replicas of the data nodes (DNs) used in the distributed architecture. The minimum specification supported by Standard Edition is 2 cores and 4 GB of memory.
PolarDB-X Standard Edition uses the Paxos majority replication protocol. Compared with the primary-secondary replication of MySQL, this protocol ensures strong consistency between replicas and delivers financial-grade high availability (RPO = 0, RTO < 10 seconds). In addition, the self-developed Lizard distributed transaction engine provides more reliable high availability and about 35% higher performance than the native MySQL distributed engine.
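The majority rule behind this kind of replication can be illustrated with a small sketch. This is not the PolarDB-X implementation and not a full Paxos protocol (there is no leader election, no terms, and no log repair). It only shows the quorum condition: a write counts as committed once a strict majority of replicas acknowledge it. The `Replica` class and `replicate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one DN replica; not a PolarDB-X API."""
    name: str
    online: bool = True
    log: list = field(default_factory=list)

    def append(self, entry: str) -> bool:
        # An offline replica cannot acknowledge the write.
        if not self.online:
            return False
        self.log.append(entry)
        return True


def replicate(replicas: list, entry: str) -> bool:
    """Return True only if a strict majority of replicas acknowledged the entry."""
    acks = sum(1 for replica in replicas if replica.append(entry))
    return acks >= len(replicas) // 2 + 1


replicas = [Replica("dn-a"), Replica("dn-b"), Replica("dn-c")]

replicas[2].online = False
one_down = replicate(replicas, "txn-1")  # 2 of 3 replicas acknowledge

replicas[1].online = False
two_down = replicate(replicas, "txn-2")  # only 1 of 3 replicas acknowledges
```

With three replicas, the write still commits when one replica is down, which is why an acknowledged write survives the loss of any single replica. When two replicas are down, the write is refused rather than accepted on a minority. In this sketch the refused entry stays in the log of the surviving replica; a real protocol has to resolve such uncommitted entries, which is outside the scope of the example.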
Enterprise Edition (distributed architecture)
PolarDB-X Enterprise Edition is the distributed form. It includes the complete set of distributed components: compute nodes (CNs), data nodes (DNs), change data capture (CDC), columnar nodes (COLUMNAR), and the global meta service (GMS). PolarDB-X Enterprise Edition is highly compatible with the MySQL ecosystem, supports strongly consistent distributed transactions and distributed parallel queries, and supports distributed horizontal scaling. The technical architecture is as follows:

Upgrade an instance from Standard Edition to Enterprise Edition

As your business grows rapidly, PolarDB-X Standard Edition may run into the bottlenecks of a centralized database: large single tables that slow down queries, high-concurrency queries that keep the database under high load for extended periods, or analytical requirements that cannot be met. In these cases, vertical scaling can no longer meet the new requirements and also becomes less cost-effective.

PolarDB-X provides the ability to upgrade an instance from Standard Edition to Enterprise Edition, leveraging distributed features and HTAP functionality to solve problems encountered on centralized databases while maintaining the experience and performance of using a standalone MySQL database.

Note

Standard Edition and Enterprise Edition use the same set of DNs. The upgrade process directly adds distributed components such as CN, CDC, and GMS to the Standard Edition without requiring data migration. Whether switching or rolling back, there is only one copy of data, so you do not need to worry about data inconsistency.
Centralized tables are converted in place to distributed single tables. Combined with Online DDL capabilities, this provides distributed scalability.
After you upgrade an instance from Standard Edition to Enterprise Edition, the connection string remains unchanged and no business changes are required. During the upgrade, connections are interrupted only transiently, at the minute level.
Distribution is transparent: the Enterprise Edition is deeply compatible with the centralized MySQL ecosystem, and no application modifications are required.
To offset the additional performance overhead that distributed features introduce, PolarDB-X uses table groups and partition groups to keep related data on the same node. This optimizes distributed transactions and complex queries and brings performance in single-partition scenarios in line with a centralized architecture.
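The effect of table groups and partition groups can be sketched as follows. This is an illustrative model only: the catalog layout, the table names, and the use of CRC32 as the hash function are assumptions, not the actual PolarDB-X routing logic. The idea shown is that tables in the same table group share one partitioning scheme, and each partition group (the partitions with the same index across those tables) is placed on one DN, so rows with the same key end up together.

```python
import zlib

# Hypothetical catalog: each table group fixes a partition count, and each
# partition group (one index shared by all tables in the group) lives on one DN.
TABLE_GROUPS = {
    "tg_user": {"partitions": 4, "placement": ["dn-0", "dn-1", "dn-0", "dn-1"]},
    "tg_log": {"partitions": 2, "placement": ["dn-2", "dn-3"]},
}
TABLE_TO_GROUP = {"users": "tg_user", "orders": "tg_user", "audit_log": "tg_log"}


def dn_for(table: str, key: int) -> str:
    """Return the DN that stores the row of `table` with the given partition key."""
    group = TABLE_GROUPS[TABLE_TO_GROUP[table]]
    index = zlib.crc32(str(key).encode()) % group["partitions"]
    return group["placement"][index]


def dns_touched(tables: list, key: int) -> set:
    """Return the set of DNs a transaction visits when it touches `tables` for one key."""
    return {dn_for(table, key) for table in tables}


same_group = dns_touched(["users", "orders"], 42)
cross_group = dns_touched(["users", "audit_log"], 42)
```

For any key, `same_group` contains exactly one DN, so a transaction or join on `users` and `orders` for one user can run on a single node. `cross_group` contains two DNs in this model, because the two table groups are placed on different nodes, and such a transaction has to coordinate across them.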

Storage resource pools and elasticity specifications

To support linear distributed scaling, PolarDB-X applies the concepts of storage resource pools and Locality to centralized data, which allows distributed scaling on demand.

Storage resource pools: DNs are divided into non-overlapping resource pools. DNs can be added or removed at the level of an individual storage pool.
Locality: Objects in the database (databases, tables, partitions) are associated with different resource pools through the Locality property.
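The relationship between the two concepts can be modeled with a short sketch. The `StoragePools` class, its methods, and the pool, DN, and object names are all hypothetical and exist only to illustrate the rules described above; they do not correspond to PolarDB-X commands or APIs.

```python
class StoragePools:
    """Toy catalog of storage pools and Locality bindings (hypothetical names)."""

    def __init__(self):
        self.pools = {}     # pool name -> set of DN names
        self.locality = {}  # database object name -> pool name

    def add_dn(self, pool: str, dn: str) -> None:
        # A DN may belong to one pool only, which keeps pools non-overlapping.
        for name, members in self.pools.items():
            if dn in members and name != pool:
                raise ValueError(f"{dn} already belongs to pool {name}")
        self.pools.setdefault(pool, set()).add(dn)

    def remove_dn(self, pool: str, dn: str) -> None:
        self.pools[pool].discard(dn)

    def set_locality(self, obj: str, pool: str) -> None:
        # Bind a database, table, or partition to a pool through its Locality.
        if pool not in self.pools:
            raise KeyError(f"unknown pool {pool}")
        self.locality[obj] = pool

    def dns_for(self, obj: str) -> list:
        """Return the DNs that may hold data for the given object."""
        return sorted(self.pools[self.locality[obj]])


catalog = StoragePools()
catalog.add_dn("pool_1", "dn-0")
catalog.add_dn("pool_2", "dn-1")
catalog.add_dn("pool_2", "dn-2")
catalog.set_locality("tenant_a_db", "pool_1")
catalog.set_locality("orders", "pool_2")

catalog.add_dn("pool_1", "dn-3")  # scale out pool_1 only
```

After the last call, `tenant_a_db` can use `dn-0` and `dn-3`, while `orders` is still limited to `dn-1` and `dn-2`. Scaling one pool does not change where the objects bound to another pool are stored.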

Note

The following two typical scenarios illustrate how storage resource pools and Locality enable on-demand distributed scaling:

Scenario 1: If the original centralized business is a multi-tenant SaaS system, after upgrading from Standard Edition to Enterprise Edition, you can use vertical partitioning to distribute tenants across different storage resource pools. Each resource pool maintains the single-table form, achieving distributed scaling, as shown in storage resource pool 1 in the diagram above.
Scenario 2: If the original centralized e-commerce business experiences an increase in user volume and concurrency, you can horizontally split the centralized data across multiple DNs in a resource pool to reach your scaling goals, as shown in storage resource pool 3 in the diagram above.

After the upgrade to the distributed form, centralized data must be redistributed by using Online DDL capabilities, as shown in the following diagram:

Note

Multiple single tables from the original business: These can continue to maintain the single-table form, evolving into distributed vertical partitioning. After expanding the distributed nodes within a single storage pool, multiple single tables can be evenly distributed across multiple DNs in the storage pool.
Large tables from the original business: These can be changed online to distributed tables, evolving into distributed horizontal scaling. After you add distributed nodes to a single storage pool, the partitions of distributed tables are automatically rebalanced across the DNs.
Multiple tables from the original business: Large tables are changed online to distributed tables, while single tables remain single tables and are divided among multiple storage pools. The overall evolution combines distributed vertical and horizontal splitting and achieves linear scalability through resource expansion.
For each DN, due to different data distributions, the actual resource requirements also differ. You can individually upgrade or downgrade each DN node through data node management to achieve flexible elasticity specifications and improve overall resource utilization.
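The automatic data balancing mentioned above for distributed tables can be pictured with a simple greedy sketch. The actual PolarDB-X scheduler is not described in this document, so the following is an assumption-laden illustration: it balances by partition count only (a real scheduler may weigh data size or load), and the `rebalance` function and the DN and partition names are hypothetical.

```python
def rebalance(assignment: dict) -> list:
    """Even out partition counts across DNs and return the moves that were made.

    `assignment` maps a DN name to its list of partition names and is
    modified in place. Each move is (partition, source DN, target DN).
    """
    moves = []
    while True:
        busiest = max(assignment, key=lambda dn: len(assignment[dn]))
        idlest = min(assignment, key=lambda dn: len(assignment[dn]))
        # Stop once no DN holds more than one partition above any other.
        if len(assignment[busiest]) - len(assignment[idlest]) <= 1:
            return moves
        partition = assignment[busiest].pop()
        assignment[idlest].append(partition)
        moves.append((partition, busiest, idlest))


pool = {"dn-0": ["p0", "p1", "p2", "p3"], "dn-1": ["p4", "p5", "p6", "p7"]}
pool["dn-2"] = []  # a new DN is added to the storage pool
moves = rebalance(pool)
```

In this example, two partitions move to the new DN and the pool ends up with three, three, and two partitions per DN. Only the moved partitions are copied; the rest stay in place, which is the property that makes scaling out within a storage pool incremental.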