Interpretation of PolarDB-X Data Distribution (1)

By Mengshi

The data of tables in PolarDB-X is distributed to each data node in the cluster. This article describes how data is distributed in PolarDB-X.

Data Node

Data nodes (DN) are developed based on the three-node (by default) MySQL cluster, and data is synchronized within the cluster by using the Paxos protocol.

Partition

In PolarDB-X, a partition is the smallest unit that data can be split.

A partition corresponds to a table (also known as a physical table) in a data node (MySQL). Note that the partition here is different from that of a partitioned table in MySQL.

Data is routed to different partitions based on specific algorithms.

Local Index

A local index refers to the index within the partition. It is on the data node and maintained by MySQL.

Global Index

A global index consists of a set of homogeneous partitions.

It contains two types of fields: key field and covering field. The key field is an index field, while the covering field includes all the other fields.

A global index has two attributes: partition key and partition algorithm. The partition key is the key field of the global index.

It will also create a local index on the key field to improve query efficiency in partitions.

For queries of the key field, compute nodes can directly locate the corresponding partition, and then locate the data row through the local index in the partition, which is more efficient.

The following figure shows an example of a global index.

Global Clustered Index

A global clustered index is a special global index.

The covering field of the global clustered index always contains all fields of the table except the key field of the index.

You can add or delete fields to a table, or modify the types of fields by executing DDL statements, and the covering field of the global clustered index will change at the same time.

The global clustered index can effectively avoid the cost of scanning the base table.

Index and Table

The local index and global index are collectively referred to as an index.

In PolarDB-X, a table consists of multiple primary and secondary indexes.

By default, the primary key is the global clustered index, and the secondary index is the global index whose covering field contains only the primary key.

The primary key is also known as the base table.

In most cases, PolarDB-X users can create indexes by using statements such as CREATE INDEX/ALTER TABLE ADD INDEX, and PolarDB-X will automatically create global indexes or local indexes.

For an operation that creates an index, PolarDB-X creates the following indexes:

• Create a global index if the type of the key field of the index is supported. As described in the preceding section, each partition of the global index also contains a local index.

• Create a local index on all clustered indexes (including the primary key).

For example, for the orders table:

create table orders (
   id bigint, 
   buyer_id varchar(128) comment 'buyer', 
   seller_id varchar(128) comment 'seller',
   primary key(id),
   index sdx(seller_id),
   index bdx(buyer_id)
)

The orders table contains three global indexes. The complete index information can be viewed by using SHOW FULL CREATE TABLE:

• PRIMARY KEY (the base table): Its key field is ID, its covering fields are buyer_id and seller_id, and its partition key is the key field ID. This parameter contains two local indexes the seller_id and buyer_id.

mysql> show full create table orders\G
*************************** 1. row ***************************
       TABLE: orders
CREATE TABLE: CREATE PARTITION TABLE `orders` (
        `id` bigint(20) NOT NULL,
        `buyer_id` varchar(128) DEFAULT NULL COMMENT 'buyer',
        `seller_id` varchar(128) DEFAULT NULL COMMENT 'seller',
        PRIMARY KEY (`id`),
        GLOBAL INDEX /* bdx_$d751 */ `bdx` (`buyer_id`) ...,
        GLOBAL INDEX /* sdx_$145a */ `sdx` (`seller_id`) ...,
        LOCAL KEY `_local_sdx` (`seller_id`),
        LOCAL KEY `_local_bdx` (`buyer_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4
...

• Secondary Index sdx: The key field is seller_id, the covering field is the primary key ID of the orders table, and the partition key is the key field seller_id. This parameter also contains the local index seller_id.

mysql> show full create table sdx_$145a\G
*************************** 1. row ***************************
       TABLE: sdx_$145a
CREATE TABLE: CREATE TABLE `sdx_$145a` (
        `id` bigint(20) NOT NULL,
        `seller_id` varchar(128) DEFAULT NULL COMMENT 'seller',
        PRIMARY KEY (`id`),
        LOCAL KEY `auto_shard_key_seller_id` USING BTREE (`seller_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4
...

• Secondary Index bdx: The key field is buyer_id, the covering field is the primary key ID of the orders table, and the partition key is the key field buyer_id. This parameter also contains the local index buyer_id.

mysql> show full create table bdx_$d751\G
*************************** 1. row ***************************
       TABLE: bdx_$d751
CREATE TABLE: CREATE TABLE `bdx_$d751` (
        `id` bigint(20) NOT NULL,
        `buyer_id` varchar(128) DEFAULT NULL COMMENT 'buyer',
        PRIMARY KEY (`id`),
        LOCAL KEY `auto_shard_key_buyer_id` USING BTREE (`buyer_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4
...

Each global index consists of its partitions, as shown in the following figure:

Partitioning Algorithms

Partition Key

Each global index is distributed according to the partition key. PolarDB-X partition keys support single-column and multiple-columns.

Global indexes can be created for the following field types:

• TINYINT/TINYINT UNSIGNED

• SMALLINT/SMALLINT UNSIGNED

• MEDIUMINT/MEDIUMINT UNSIGNED

• INT/INT UNSIGNED

• BIGINT/BIGINT UNSIGNED

• DATE

• DATETIME

• TIMESTAMP

• CHAR/VARCHAR

Note: Since the key field of the global index is the partition key, global indexes can be created for the preceding fields in PolarDB-X.

The value of the partition key can be modified. For example, the value of seller_id in a row can be changed from 1 to 2 in the preceding example.

The field definition of the partition key can also be modified. For example, the type of seller_id can be changed from varchar to bigint in the preceding example. However, this operation will trigger the data repartition of the global index.

Hash

Consistent hashing is supported in PolarDB-X, as shown in the following figure.

The consistent hashing algorithm will divide the range of hash values (Long.MIN to Long.MAX) into consecutive intervals that correspond to partitions.

When the partition to which a key belongs is calculated:

Calculate the hash value of the key field by using different hash functions based on the field type.
Locate the corresponding interval the hash value belongs to. This is a binary search process, and the partition corresponding to the interval is the partition to which the key belongs.

Consistent hashing also supports multiple fields. When some field values are missing, partition pruning is performed.

Consistent hashing allows you to split and merge partitions.

Note: The global index uses the consistent hashing partitioning algorithm by default.

Range

PolarDB-X supports partitioning by the range of data:

Different from consistent hashing, range partitioning directly divides the interval based on the range of keys.

Keys are compared in byte order to determine which interval they belong to.

Range partitioning also supports multiple fields. When multiple fields are compared with the range start point, the fields are compared in sequence. The N+1 field is compared only when the N field has the same value.

Therefore, if some field values are missing, range partitioning can still perform partition pruning based on the prefix, which does not result in a full partition scan.

Range partitioning also allows you to split and merge partitions.

List

List partitioning is a special type of range partitioning, with each partition containing only one value.

Join Group (WIP)

A join group consists of multiple tables. For tables in the same join group, the join operation attempts to perform pushdown optimization. If the pushdown is successful, the efficiency will be higher. For tables that are not in the same join group, the join operation does not perform pushdown optimization.

Notes:

• By default, a database contains only one join group to which all tables in the database belong.

• It is recommended that you divide the tables that cannot be joined into multiple join groups based on your business.

• The change of the join group (for example, changing the join group of a table from a to b) only involves the change of the metadata, at almost no cost.

Table Group and Partition Group

A table group is a set of global indexes.

Indexes in the same table group must have the same data distribution, including:

• The same number of partitions
• The same partitioning algorithm
• The same partition key (the data types of partition keys are the same while the names are unnecessarily the same)
• The same join group for the table to which the global index belongs

Note that the same data type means the same length, character set, and collation. For example, varchar(20) and varchar(30) are considered as different data types.

For global indexes in the same table group, partitions with the same interval (or subscript) form a partition group.

For example, for a table group that uses consistent hashing:

This table group contains three global indexes: idx1, idx2, and idx3. Each global index has 91 partitions from p0 to p90. The hash values of partitions with the same subscript have the same interval.

The table group is divided into 91 partition groups from Partition Group 0 to Partition Group 90.

PolarDB-X ensures that global indexes in the same table group are aligned in data distribution, which means:

• All partitions in the same partition group are on the same data node.

• Split, migrate, and merge operations are performed in units of partition groups.

For example:

In the above example, in the initial state, all the partitions of Partition Group 45, 46, and 47 are on DN1, and all the partitions of Partition Group 48 are on DN2.

When Partition Group 47 migrates to DN2, the three partitions in Partition Group 47 need to be migrated to DN2 together.

Table groups can be explicitly created or implicitly built in.

When you create a table or index, if you do not specify a table group, PolarDB-X will automatically search for an implicit built-in table group that matches the current join group (the default join group or the specified join group). If no implicit built-in table group is found, a new one will be created.

For example, when creating a table in an empty database:

CREATE TABLE t1 (id INT PRIMARY KEY);

The primary key of t1 creates a table group with the partition key INT, partitioning algorithm consistent hashing, and the number of partitions 64 (the default number of pre-sharding is 64). Assume that the table group is named tg0.

When creating the second table:

CREATE TABLE t2 (
    id INT PRIMARY KEY，
    c1 INT,
    c2 VARCHAR(64),
    INDEX idx1(c1),
    INDEX idx2(c2)
);

t2 contains three global indexes: Primary Key, idx1, and idx2.

For both the Primary Key and idx1, the partition key is INT, the partitioning algorithm is consistent hashing, and the number of partitions is 64. Therefore, both the Primary Key and idx1 are matched to the table group tg0 and added to tg0.

For idx2, as its type is VARCHAR(64), which does not match the existing table group, it will create a new table group.

Note: In general, users do not need to pay attention to the allocation of table groups, nor specify a table group for the global index.

Reasonable Division of Join Groups

With table groups and partition groups, it is possible to perform pushdown optimization for the Join operations on the global index key. For instance, in the example above:

idx1 join idx2 on idx1.key=idx2.key

idx1 and idx2 are joined by key. Since they belong to the same table group, idx1 and idx2 records with the same key value must be in the same partition group.

The partitions of idx1 and idx2 in the same partition group must be on the same DN.

Therefore, this Join operation can be pushed down to the DN for execution. That is, join two partitions on the same DN. This optimization is referred to as the Join pushdown.

Since data transmission across networks is avoided, Join pushdown can greatly improve the performance of Join.

However, changes to the global index in the table group such as migration and splitting must be performed synchronically based on the partition group. This will increase the cost of changes and the granularity of changes is larger than that of partition-level changes.

In the preceding example, assume that idx3 is placed in one table group while idx1 and idx2 are placed in another table group. We can perform more fine-grained migration operations, such as only migrating Partition 47 of idx3 to DN2, as shown in the figure:

When PolarDB-X distributes a table group to a global index, it searches for the join group of the table. Decreasing the size of a single join group and increasing the number of join groups can effectively reduce the size of a single table group and improve the efficiency of operations such as migration.

Therefore, we recommend that you divide the tables that will not be joined into different join groups according to your business.

For example, since user tables and address tables can be joined, place them in the same join group. Item tables and inventory tables can be joined, so they are placed in the same join group.

In this way, although the primary key <user id> of the user table and the primary key <item id> of the item table may have the same data type and the same partitioning algorithm, they are not scheduled to the same table group, which reduces the cost of migration.

Database

Each time a database (also known as a logical database) is created in PolarDB-X, a unique database (also known as a physical database) is also created on each DN involved.

Locality (WIP)

PolarDB-X locality is a property of a database, join group, table group, or partition group. It is used to mark which DN or DNs the objects belong to.

These objects, from low to high, are:

Partition Group
Table Group
Join Group
Database

If the locality property is not set for a low-level object, the locality property of the high-level object is inherited. If the locality property is set for a low-level object, the locality property of the high-level object is ignored.

For example:

CREATE DATABASE d1 LOCALITY='DN=DN1,DN2'

If you set locality to DN1 and DN2 for Database d1, it means that the data of the entire database is distributed across DN1 and DN2 by default.

Create a table group tg1 in d1:

CREATE TABLEGROUP tg1 LOCALITY='DN=DN1'

It means that tg1 is distributed only on DN1 by default.

Create a table group tg2 in d1:

CREATE TABLEGROUP tg2

tg2 is not specified with the locality attribute. Therefore, tg2 inherits the locality attribute of d1 and its data is distributed on DN1 and DN2.

Adjust the locality of a partition group of TG2:

ALTER TABLEGROUP tg2 ALTER PARTITION GROUP p0 SET LOCALITY='DN=DN3'

After setting the locality of the Partition Group p0 to DN3, the data of p0 will be migrated to DN3.

Note: If the DN of an object is set to a single DN by using the locality, the data of the object will not be scheduled in the cluster but will be fixed on the specified DN.

Two Special Distributions

Non-partitioned Table

A non-partitioned table is a table with one partition and no partition keys. Its global index contains only the primary key.

By default, all non-partitioned tables in a database are in the same table group which contains only one partition group. That is, all non-partitioned tables in a database are on the same DN by default.

We can create more table groups and specify different locality properties for these table groups to divide non-partitioned tables on different DNs.

Broadcast Table

A broadcast table does not have a partition key but has only one partition. All the DNs where the database resides contain a replica of this partition.

All broadcast tables in a database are in the same table group which contains only one partition group. This table group is not allowed or necessary to be modified.

When a new DN is added to the cluster, the table group to which the broadcast table belongs is replicated to the new DN. Then, the partition groups of other table groups can be migrated to the new DN.

To Be Continued

In the next article, we will share why it is designed in this way, and we will continue to discuss:

• Hash vs Range

• The importance of the global index & the global clustered index in distributed databases

• The reason why the local index is needed

• The reason why there are so many groups Partition Group, Table Group, and Join Group, and what we should focus on

Community

Interpretation of PolarDB-X Data Distribution (1)

Data Node

Partition

Local Index

Global Index

Global Clustered Index

Index and Table

Partitioning Algorithms

Partition Key

Hash

Range

List

Join Group (WIP)

Table Group and Partition Group

Reasonable Division of Join Groups

Database

Locality (WIP)

Two Special Distributions

Non-partitioned Table

Broadcast Table

To Be Continued

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for Xscale

PolarDB for PostgreSQL

PolarDB for MySQL

LedgerDB