ShardingSphere PostgreSQL Proxy-based Sharding

By Digoal

Background

PostgreSQL has some sharding plug-ins or mpp products that closely integrate with databases, such as Citus, PG-XC, PG-XL, PG-X2, AntDB, Greenplum, Redshift, Asterdata, pg_shardman, and PL/Proxy. Some of these databases are highly commercialized and are suitable for a broader range of scenarios. However, they are more moderate or scenario-oriented. PostgreSQL also has ShardingSphere PostgreSQL sharding, a middleware similar to MySQL's most popular sharding middleware.

ShardingSphere is suitable for services that involve complete sharding and extremely simple database logic. The best practices and problems of MySQL sharding also apply to ShardingSphere.

ShardingSphere supports multiple modes: Sharding-JDBC, Sharding-Proxy, and Sharding Sidecar. Select a mode according to the situation at hand.

Features

1) Data sharding

Database and table sharding
Read/write splitting
Sharding strategy customization
Decentralized distributed primary key

2) Distributed transaction

Unified transaction API
XA strong-consistency transaction
Flexible transaction

3) Database management

Dynamic configuration
Orchestration and management
Data masking
Observable tracing
Elastic scaling (planning)

Configuration Template

cd apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin  
cd conf

Example:

# ll  
total 24  
-rw-r--r-- 1 501 games 3019 Jul 30  2019 config-encrypt.yaml  
-rw-r--r-- 1 501 games 3582 Apr 22  2019 config-master_slave.yaml  
-rw-r--r-- 1 501 games 4278 Apr 22  2019 config-sharding.yaml  
-rw-r--r-- 1 501 games 1322 Apr 22  2019 logback.xml  
-rw-r--r-- 1 501 games 1918 Jul 30  2019 server.yaml

config-xxx.yaml

Data sharding example:

schemaName: #逻辑数据源名称  
  
dataSources: #数据源配置，可配置多个data_source_name  
  <data_source_name>: #与Sharding-JDBC配置不同，无需配置数据库连接池  
    url: #数据库url连接  
    username: #数据库用户名  
    password: #数据库密码  
    connectionTimeoutMilliseconds: 30000 #连接超时毫秒数  
    idleTimeoutMilliseconds: 60000 #空闲连接回收超时毫秒数  
    maxLifetimeMilliseconds: 1800000 #连接最大存活时间毫秒数  
    maxPoolSize: 65 #最大连接数  
  
shardingRule: #省略数据分片配置，与Sharding-JDBC配置一致

Read/write splitting example:

schemaName: #逻辑数据源名称  
dataSources: #省略数据源配置，与数据分片一致  
masterSlaveRule: #省略读写分离配置，与Sharding-JDBC配置一致

Common configuration example:

conf/server.yaml

Proxy属性  
#省略与Sharding-JDBC一致的配置属性  

props:  
  acceptor.size: #用于设置接收客户端请求的工作线程个数，默认为CPU核数*2  
  proxy.transaction.type: #默认为LOCAL事务，允许LOCAL，XA，BASE三个值，XA采用Atomikos作为事务管理器，BASE类型需要拷贝实现ShardingTransactionManager的接口的jar包至lib目录中  
  proxy.opentracing.enabled: #是否开启链路追踪功能，默认为不开启。详情请参见[链路追踪](/cn/features/orchestration/apm/)  
  check.table.metadata.enabled: #是否在启动时检查分表元数据一致性，默认值: false  
  proxy.frontend.flush.threshold: # 对于单个大查询,每多少个网络包返回一次  
  
  
权限验证  
用于执行登录Sharding Proxy的权限验证。配置用户名、密码、可访问的数据库后，必须使用正确的用户名、密码才可登录Proxy。  
  
authentication:  
  users:  
    root: # 自定义用户名  
      password: root # 自定义用户名  
    sharding: # 自定义用户名  
      password: sharding # 自定义用户名  
      authorizedSchemas: sharding_db, masterslave_db # 该用户授权可访问的数据库，多个用逗号分隔。缺省将拥有root权限，可访问全部数据库。

Test Example

Set up a test environment.

Java:

https://www.java.com/zh_CN/download/help/linux_x64rpm_install.xml#download    
yum install -y java

V12 database:

pg_hba.conf  
host all r1 0.0.0.0/0 md5

user: r1  
pwd: PW123321!  
schemaname: digoal

create role r1 login encrypted password 'PW123321!';  
create database db0 with owner r1;  
create database db1 with owner r1;  
create database db2 with owner r1;  
create database db3 with owner r1;

Primary database:

127.0.0.1:1921:db0  
127.0.0.1:1921:db1  
127.0.0.1:1921:db2  
127.0.0.1:1921:db3

Corresponding standby database with PostgreSQL physical stream replication:

127.0.0.1:1922:db0  
127.0.0.1:1922:db1  
127.0.0.1:1922:db2  
127.0.0.1:1922:db3

Common configuration:

vi /root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/server.yaml  
  
authentication:  
  users:  
    r1:  
      password: PW123321!  
      authorizedSchemas: db0,db1,db2,db3  
props:  
  executor.size: 16  
  sql.show: false

Test Configuration of Horizontal Sharding

vi /root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/config-sharding.yaml  
  
schemaName: sdb  
  
dataSources:  

  db0:   
    url: jdbc:postgresql://localhost:1921/db0  
    username: r1  
    password: PW123321!  
    connectionTimeoutMilliseconds: 30000  
    idleTimeoutMilliseconds: 60000  
    maxLifetimeMilliseconds: 1800000  
    maxPoolSize: 65 
 
  db1:  
    url: jdbc:postgresql://localhost:1921/db1  
    username: r1  
    password: PW123321!  
    connectionTimeoutMilliseconds: 30000  
    idleTimeoutMilliseconds: 60000  
    maxLifetimeMilliseconds: 1800000  
    maxPoolSize: 65  

  db2:  
    url: jdbc:postgresql://localhost:1921/db2  
    username: r1  
    password: PW123321!  
    connectionTimeoutMilliseconds: 30000  
    idleTimeoutMilliseconds: 60000  
    maxLifetimeMilliseconds: 1800000  
    maxPoolSize: 65  

  db3:  
    url: jdbc:postgresql://localhost:1921/db3  
    username: r1  
    password: PW123321!  
    connectionTimeoutMilliseconds: 30000  
    idleTimeoutMilliseconds: 60000  
    maxLifetimeMilliseconds: 1800000  
    maxPoolSize: 65  
  
shardingRule:  

  tables:  

    t_order:  
      actualDataNodes: db${0..3}.t_order${0..7}  
      databaseStrategy:  
        inline:  
          shardingColumn: user_id  
          algorithmExpression: db${user_id % 4}  
      tableStrategy:   
        inline:  
          shardingColumn: order_id  
          algorithmExpression: t_order${order_id % 8}  
      keyGenerator:  
        type: SNOWFLAKE  
        column: order_id  
    t_order_item:  
      actualDataNodes: db${0..3}.t_order_item${0..7}  
      databaseStrategy:  
        inline:  
          shardingColumn: user_id  
          algorithmExpression: db${user_id % 4}  
      tableStrategy:  
        inline:  
          shardingColumn: order_id  
          algorithmExpression: t_order_item${order_id % 8}  
      keyGenerator:  
        type: SNOWFLAKE  
        column: order_item_id  
  bindingTables:  
    - t_order,t_order_item  
  defaultTableStrategy:  
    none:  :

1) Start ShardingSphere and monitor port 8001.

cd bin  
./start.sh 8001

2) Check the log to confirm that ShardingSphere has started before you proceed.

Test

The schemaName of the corresponding sharding configuration for the database is connected.

psql -h 127.0.0.1 -p 8001 -U r1 sdb

3) In the target database, the URL is created automatically based on the configured policy.

create table t_order(order_id int8 primary key, user_id int8, info text, c1 int, crt_time timestamp);  
create table t_order_item(order_item_id int8 primary key, order_id int8, user_id int8, info text, c1 int, c2 int, c3 int, c4 int, c5 int, crt_time timestamp);

sdb=> \dt  
           List of relations  
 Schema |     Name      | Type  | Owner   
--------+---------------+-------+-------  
 public | t_order0      | table | r1  
 public | t_order1      | table | r1  
 public | t_order2      | table | r1  
 public | t_order3      | table | r1  
 public | t_order4      | table | r1  
 public | t_order5      | table | r1  
 public | t_order6      | table | r1  
 public | t_order7      | table | r1  
 public | t_order_item0 | table | r1  
 public | t_order_item1 | table | r1  
 public | t_order_item2 | table | r1  
 public | t_order_item3 | table | r1  
 public | t_order_item4 | table | r1  
 public | t_order_item5 | table | r1  
 public | t_order_item6 | table | r1  
 public | t_order_item7 | table | r1  
(16 rows)

Note: Ideally, you should set the ShardingSphere to false to know how it parses and routes SQL statements and configures sql.show. Otherwise, a large number of logs will be printed, causing performance problems.

vi /root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/server.yaml  
  
authentication:  
  users:  
    r1:  
      password: PW123321!  
      authorizedSchemas: db0,db1,db2,db3  
props:  
  executor.size: 16  
  sql.show: true  # 表示把解析的sql打印到日志

4) Test writes and queries.

sdb=> insert into t_order (user_id, info, c1, crt_time) values (0,'a',1,now());  
  
sdb=> insert into t_order (user_id, info, c1, crt_time) values (1,'b',2,now());  
  
sdb=> insert into t_order (user_id, info, c1, crt_time) values (2,'c',3,now());  
  
sdb=> insert into t_order (user_id, info, c1, crt_time) values (3,'c',4,now());  
  
  
sdb=> select * from t_order;  
      order_id      | user_id | info | c1 |          crt_time            
--------------------+---------+------+----+----------------------------  
 433352561047633921 |       0 | a    |  1 | 2020-02-09 19:48:21.856555  
 433352585668198400 |       1 | b    |  2 | 2020-02-09 19:48:27.726815  
 433352610813050881 |       2 | c    |  3 | 2020-02-09 19:48:33.721754  
 433352628370407424 |       3 | c    |  4 | 2020-02-09 19:48:37.907683  
(4 rows)  
  
sdb=> select * from t_order where user_id=1;  
      order_id      | user_id | info | c1 |          crt_time            
--------------------+---------+------+----+----------------------------  
 433352585668198400 |       1 | b    |  2 | 2020-02-09 19:48:27.726815  
(1 row)

5) Query ShardingSphere logs.

[root@iZbp135pwcjjoxqgfpw9k1Z logs]# pwd  
/root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/logs  
[root@iZbp135pwcjjoxqgfpw9k1Z logs]# ll  
total 216  
-rw-r--r-- 1 root root 214639 Feb  9 19:49 stdout.log

sdb=> insert into t_order (user_id, order_id, info, c1 , crt_time) values (1,1,md5(random()::text), random()*1000, now());  
  
sdb=> select * from t_order where user_id=1 and order_id=1;  
 order_id | user_id |               info               | c1  |          crt_time            
----------+---------+----------------------------------+-----+----------------------------  
        1 |       1 | e1475ff0d830df92ab4a920e9ae0606b | 694 | 2020-02-09 19:55:14.541556  
(1 row)  
  
sdb=> insert into t_order (user_id, order_id, info, c1 , crt_time) values (1,1,md5(random()::text), random()*1000, now()) on conflict (order_id) do update set info=excluded.info,c1=excluded.c1,crt_time=excluded.crt_time;  
  
sdb=> select * from t_order where user_id=1 and order_id=1;  
 order_id | user_id |               info               | c1  |          crt_time            
----------+---------+----------------------------------+-----+----------------------------  
        1 |       1 | b63a5a4385af47339708eb880c3e1bd1 | 570 | 2020-02-09 19:56:09.045826  
(1 row)

6) Use pgbench to conduct a stress test.

vi test.sql  
\set user_id random(1,100000000)  
\set order_id random(1,2000000000)  
\set order_item_id random(1,2000000000)  
insert into t_order (user_id, order_id, info, c1 , crt_time) values (:user_id, :order_id,random()::text, random()*1000, now()) on conflict (order_id) do update set info=excluded.info,c1=excluded.c1,crt_time=excluded.crt_time;   
insert into t_order_item (order_item_id, user_id, order_id, info, c1,c2,c3,c4,c5,crt_time) values (:order_item_id, :user_id,:order_id,random()::text, random()*1000,random()*1000,random()*1000,random()*1000,random()*1000, now()) on conflict(order_item_id) do nothing;  
  
pgbench -M simple -n -r -P 1 -f ./test.sql -c 24 -j 24  -h 127.0.0.1 -p 8001 -U r1 sdb -T 120  
progress: 1.0 s, 1100.9 tps, lat 21.266 ms stddev 6.349  
progress: 2.0 s, 1253.0 tps, lat 18.779 ms stddev 7.913  
progress: 3.0 s, 1219.0 tps, lat 20.083 ms stddev 13.212

Read/Write Splitting Test

Note that if the UDF executes internal write operations, it is routed to the read-only database, resulting in an error.

schemaName: msdb  
  
dataSources:  
  ds_master:  
    url: jdbc:postgresql://localhost:1921/db0  
    username: r1  
    password: PW123321!  
    connectionTimeoutMilliseconds: 30000  
    idleTimeoutMilliseconds: 60000  
    maxLifetimeMilliseconds: 1800000  
    maxPoolSize: 65  
  ds_slave0:  
    url: jdbc:postgresql://localhost:1922/db0  
    username: r1  
    password: PW123321!  
    connectionTimeoutMilliseconds: 30000  
    idleTimeoutMilliseconds: 60000  
    maxLifetimeMilliseconds: 1800000  
    maxPoolSize: 65  
  
masterSlaveRule:  
  name: ds_ms  
  masterDataSourceName: ds_master  
  slaveDataSourceNames:   
    - ds_slave0

The test also applies to ApsaraDB RDS for PostgreSQL.

Please refer to the ShardingSphere documentation to know more about its features and usage restrictions. For instance, you can view hang by executing the ‘explain’ statement and print the following result in logs:

line 1:0 mismatched input 'explain' expecting {SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, TRUNCATE, GRANT, REVOKE, SET, BEGIN, COMMIT, ROLLBACK, SAVEPOINT, CALL, START, RESET, SHOW}

The above part includes the SQL commands that ShardingSphere currently supports.

Besides proxy mode, ShardingSphere supports JDBC drive mode with better performance and customized sharding algorithms. You can refer to its official documentation for more details.

Reference

https://shardingsphere.apache.org/

Community

ShardingSphere PostgreSQL Proxy-based Sharding

Background

Features

Configuration Template

Test Example

Test Configuration of Horizontal Sharding

Read/Write Splitting Test

Reference

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL

Database for FinTech Solution