×
Community Blog An Example of PolarDB-X range_hash Sharding and Analysis of Design Boundaries

An Example of PolarDB-X range_hash Sharding and Analysis of Design Boundaries

This short article offers an example of PolarDB-X sharding, design analysis, and improvement strategies.

By digoal

Use a Hash Function That Contains Double Fields for Sharding

A shard key must be of a character data type or a numeric data type.

Routing Method: Calculate a hash value using the last N characters of a shard key. Then, use a hash function to calculate a route. N specifies the third parameter for the hash function. For example, if the RANGE_HASH(COL1, COL2, N) function is used, COL1 is selected and truncated to obtain the last N characters for calculation. If COL1 does not exist, COL2 is selected for calculation.

Scenarios: Two shard keys are required, and the value of only one shard key is used for queries. For example, data in your database can be routed to eight physical database shards, and the following business requirements must be met:

Database sharding is performed on an order table based on buyer IDs and order IDs.

Only buyer IDs or order IDs are specified as the condition for queries.

In this case, you can execute the following DDL statement to create an order table:

create table test_order_tb (  
 id bigint not null auto_increment,  
 seller_id varchar(30) DEFAULT NULL,  
 order_id varchar(30) DEFAULT NULL,  
 buyer_id varchar(30) DEFAULT NULL,  
 create_time datetime DEFAULT NULL,  
 primary key(id)  
) ENGINE=InnoDB DEFAULT CHARSET=utf8   
dbpartition by RANGE_HASH(buyer_id, order_id, 10)   
tbpartition by RANGE_HASH(buyer_id, order_id, 10)   
tbpartitions 3;   

Description

The two shard keys cannot be modified.

If the two shard keys point to different database shards or table shards, data fails to be inserted.

Design Analysis of Two Restrictions on the Description Section

Why these two restrictions?

Let’s take range_hash(sell_id, order_id, 5) as an example:

1.  The two shard keys cannot be changed.

After the field values of a shard are modified, the new value is recomputed based on range_hash. The new value may not belong to the current sharded table. Then, rows may be migrated to other shards in the DN or other shards in other DNs. Cross-database transactions also involve distributed transactions.

We deduce that the current polardb-x version does not support it for the time being, so limit this operation first.

2.  When inserting data, if two shard keys point to different databases or tables, the insertion will fail.

As with the previous restriction, the routing logic becomes more complex if the two shard keys point to different databases or tables.

sell_id=?: It can be routed to a DN.

order_id=?: According to the logic of range_hash, when sell_id is not empty, the shard position is computed by sell_id. Such a tuple cannot decide which shard to put by using order_id value and can only be routed to all DNs and then summarized by order_id=?.

Therefore, the current version polardb-x made this restriction, requiring that all keys in range_hash must point to the same shard, regardless of whether the user checks sell_id=? or order_id=? You can figure out which shard should be routed to since they have been restricted to appear in the same shard.

However, this will conflict with the business. For example:

range_hash(sell_id, order_id, 5)

How can we ensure sell id is consistent with the top 5 order id (or the calculated hash value mod)?

If you are a business developer, do you want to follow the restrictions imposed by the database on your business?

Suggested Improvement Strategies

1.  The two shard keys cannot be changed.

It can be modified, but cross-database transactions need to be implemented because after modifying the partition key, this new record may be migrated to another DN.

2.  The key value can be different, but the original restriction can be retained.

For example, another parameter can be used to specify whether to allow modification. The optimizer can still use the previous optimization strategy under the restriction.

sell_id=?: It can be routed to a DN.

order_id=?: It routes to all DNs and then summarizes them.

Additional fields can be included depending on the specific needs and requirements.

PolarDB-X: https://www.alibabacloud.com/product/polardb-for-xscale

PolarDB-X Document: https://www.alibabacloud.com/help/en/polardb/latest/polardb-x-overview

0 1 0
Share on

digoal

282 posts | 25 followers

You may also like

Comments