All Products
Search
Document Center

PolarDB:Transparent distribution

Last Updated:Jan 10, 2025

This topic describes what transparent distribution in PolarDB-X is and how transparent distribution works.

Background information

When an application that runs based on a conventional stand-alone MySQL database encounters a resource or performance bottleneck, users upgrade the database to a distributed database as a proved effective solution to meet their business requirements.

After the upgrade, however, users tend to suffer the following problems in their use of the distributed database:

  • How is data distributed across the database?

  • Which tables need to be horizontally split? How should a table be partitioned and what partition keys should be used? What is the proper number of partitions for a table?

  • Which data nodes house my data? How do I handle a resource imbalance?

To solve these problems, users need to understand how distributed databases work and the scenarios in which distributed databases can be used. The proper utilization of distributed databases is not easy.

To make distributed databases easier to use and upgrade from stand-alone databases, Alibaba Cloud provides the in-kernel transparent distribution feature in PolarDB-X 2.0.

Note

This feature only supports databases in AUTO mode.

What is transparent distribution?

The transparent distribution feature of PolarDB-X 2.0 provides users with default partitioning policies and default data distribution policies to allow users to connect applications to distributed databases for performance boosts without modifying the applications.

The core capability of transparent distribution boils down to helping users determine the partitioning scheme for a table and the distribution of the table data across PolarDB-X data nodes (DNs).

For users who switch from a conventional stand-alone MySQL database to a distributed PolarDB-X database, the SQL statements to create business tables fall into two categories:

  • Category one: The SQL statements do not explicitly use the MySQL partitioning syntax. Tables created by using these SQL statements can be called non-partitioned MySQL tables.

  • Category two: The SQL statements explicitly use the MySQL partitioning syntax, for example, SQL statements that carry Partition By Hashs(id). Tables created by using these SQL statements can be called partitioned MySQL tables.

If users create tables by using a category-two statement, the users are considered to have chosen a partitioning scheme for their business. The reason is that the statement explicitly uses the MySQL partitioning syntax. These tables are also considered as manually partitioned tables by PolarDB-X. Partitions in these tables are evenly distributed to DNs for balanced loads.

Tables created from category-one statements come in large numbers and are the focus of transparent distribution of PolarDB-X, which aims to help users choose a partitioning scheme and a data distribution scheme.

The transparent distribution feature of PolarDB-X provides three partitioning and data distribution schemes for tables created from category-one statements:

  • Non-partitioned table: In this scheme, MySQL tables stay non-partitioned in PolarDB-X, which assigns fixed DNs to the tables. This way, users use the tables in the same way as they use conventional stand-alone MySQL tables.

  • Sharded table: In this scheme, MySQL tables stay non-partitioned in PolarDB-X. However, the tables are automatically sharded and distributed to DNs.

  • Partitioned table: In this scheme, non-partitioned MySQL tables are automatically partitioned by PolarDB-X.

    • Primary tables are horizontally HASH partitioned based on primary keys.

    • Global indexes are used by default, and horizontal partitioning is performed based on index columns.

Therefore, the transparent distribution feature works in three modes based on the table partitioning and data distribution scheme:

  • Table sharding

  • Automatic partitioning

  • Manual partitioning (default)

Working modes of transparent distribution

You can choose a working mode to suit your business requirements based on your desired partitioning or data distribution scheme when you connect your application to a distributed database, especially when you connect an application to such a database for the first time.

The following table describes the working modes and their pros and cons:

Working mode

Description

Pros and cons

Table sharding

For all tables in a logical database:

  • If a logical table does not explicitly use the MySQL partitioning syntax, the table is not partitioned and is randomly distributed to DNs of PolarDB-X for load balancing.

  • If a logical table explicitly uses the MySQL partitioning syntax, the table is partitioned and each partition is evenly distributed to DNs.

Pros:

  • Easy to use with zero application modification.

  • Best SQL compatibility.

  • Maximum proximity to MySQL in SQL query performance.

Cons:

  • Unbalanced loads among DNs due to varying table access frequency and scanned rows.

Automatic partitioning

If tables in a logical database do not explicitly use the MySQL partitioning syntax:

  • The tables are automatically horizontally partitioned based on their primary keys.

  • Global indexes are automatically used, and horizontal partitioning is performed based on index columns.

Pros:

  • Easy to use with almost zero application modification.

  • Horizontal scalability.

Cons:

  • Compromised write throughput because multiple global indexes need to be maintained.

  • Unbalanced loads among DNs and poor scalability due to varying hot spots of global indexes.

Manual partitioning (default)

For all tables in a logical database:

  • If a logical table does not explicitly use the MySQL partitioning syntax, the table is regarded as non-partitioned tables and assigned to the same DN.

  • If a logical table explicitly uses the MySQL partitioning syntax, the table is partitioned and each partition is evenly distributed to DNs.

Pros:

  • Adaptability to different business scenarios.

  • Provides optimal performance, load balancing, and linear scalability.

Cons:

  • More difficult to understand and use.

  • Application modification required.

Note

The three working modes can be used in combination. For example, you can use manual partitioning in combination with table sharding or automatic partitioning to meet multiple requirements.

For more information about the suitable scenarios of the modes, see Best Practices.