By Digoal
PostgreSQL has some sharding plug-ins or mpp products that closely integrate with databases, such as Citus, PG-XC, PG-XL, PG-X2, AntDB, Greenplum, Redshift, Asterdata, pg_shardman, and PL/Proxy. Some of these databases are highly commercialized and are suitable for a broader range of scenarios. However, they are more moderate or scenario-oriented. PostgreSQL also has ShardingSphere PostgreSQL sharding, a middleware similar to MySQL's most popular sharding middleware.
ShardingSphere is suitable for services that involve complete sharding and extremely simple database logic. The best practices and problems of MySQL sharding also apply to ShardingSphere.
ShardingSphere supports multiple modes: Sharding-JDBC, Sharding-Proxy, and Sharding Sidecar. Select a mode according to the situation at hand.
1) Data sharding
2) Distributed transaction
3) Database management
cd apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin
cd conf
Example:
# ll
total 24
-rw-r--r-- 1 501 games 3019 Jul 30 2019 config-encrypt.yaml
-rw-r--r-- 1 501 games 3582 Apr 22 2019 config-master_slave.yaml
-rw-r--r-- 1 501 games 4278 Apr 22 2019 config-sharding.yaml
-rw-r--r-- 1 501 games 1322 Apr 22 2019 logback.xml
-rw-r--r-- 1 501 games 1918 Jul 30 2019 server.yaml
config-xxx.yaml
Data sharding example:
schemaName: #逻辑数据源名称
dataSources: #数据源配置,可配置多个data_source_name
<data_source_name>: #与Sharding-JDBC配置不同,无需配置数据库连接池
url: #数据库url连接
username: #数据库用户名
password: #数据库密码
connectionTimeoutMilliseconds: 30000 #连接超时毫秒数
idleTimeoutMilliseconds: 60000 #空闲连接回收超时毫秒数
maxLifetimeMilliseconds: 1800000 #连接最大存活时间毫秒数
maxPoolSize: 65 #最大连接数
shardingRule: #省略数据分片配置,与Sharding-JDBC配置一致
Read/write splitting example:
schemaName: #逻辑数据源名称
dataSources: #省略数据源配置,与数据分片一致
masterSlaveRule: #省略读写分离配置,与Sharding-JDBC配置一致
Common configuration example:
conf/server.yaml
Proxy属性
#省略与Sharding-JDBC一致的配置属性
props:
acceptor.size: #用于设置接收客户端请求的工作线程个数,默认为CPU核数*2
proxy.transaction.type: #默认为LOCAL事务,允许LOCAL,XA,BASE三个值,XA采用Atomikos作为事务管理器,BASE类型需要拷贝实现ShardingTransactionManager的接口的jar包至lib目录中
proxy.opentracing.enabled: #是否开启链路追踪功能,默认为不开启。详情请参见[链路追踪](/cn/features/orchestration/apm/)
check.table.metadata.enabled: #是否在启动时检查分表元数据一致性,默认值: false
proxy.frontend.flush.threshold: # 对于单个大查询,每多少个网络包返回一次
权限验证
用于执行登录Sharding Proxy的权限验证。配置用户名、密码、可访问的数据库后,必须使用正确的用户名、密码才可登录Proxy。
authentication:
users:
root: # 自定义用户名
password: root # 自定义用户名
sharding: # 自定义用户名
password: sharding # 自定义用户名
authorizedSchemas: sharding_db, masterslave_db # 该用户授权可访问的数据库,多个用逗号分隔。缺省将拥有root权限,可访问全部数据库。
Set up a test environment.
Java:
https://www.java.com/zh_CN/download/help/linux_x64rpm_install.xml#download
yum install -y java
V12 database:
pg_hba.conf
host all r1 0.0.0.0/0 md5
user: r1
pwd: PW123321!
schemaname: digoal
create role r1 login encrypted password 'PW123321!';
create database db0 with owner r1;
create database db1 with owner r1;
create database db2 with owner r1;
create database db3 with owner r1;
Primary database:
127.0.0.1:1921:db0
127.0.0.1:1921:db1
127.0.0.1:1921:db2
127.0.0.1:1921:db3
Corresponding standby database with PostgreSQL physical stream replication:
127.0.0.1:1922:db0
127.0.0.1:1922:db1
127.0.0.1:1922:db2
127.0.0.1:1922:db3
Common configuration:
vi /root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/server.yaml
authentication:
users:
r1:
password: PW123321!
authorizedSchemas: db0,db1,db2,db3
props:
executor.size: 16
sql.show: false
vi /root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/config-sharding.yaml
schemaName: sdb
dataSources:
db0:
url: jdbc:postgresql://localhost:1921/db0
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
db1:
url: jdbc:postgresql://localhost:1921/db1
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
db2:
url: jdbc:postgresql://localhost:1921/db2
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
db3:
url: jdbc:postgresql://localhost:1921/db3
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
shardingRule:
tables:
t_order:
actualDataNodes: db${0..3}.t_order${0..7}
databaseStrategy:
inline:
shardingColumn: user_id
algorithmExpression: db${user_id % 4}
tableStrategy:
inline:
shardingColumn: order_id
algorithmExpression: t_order${order_id % 8}
keyGenerator:
type: SNOWFLAKE
column: order_id
t_order_item:
actualDataNodes: db${0..3}.t_order_item${0..7}
databaseStrategy:
inline:
shardingColumn: user_id
algorithmExpression: db${user_id % 4}
tableStrategy:
inline:
shardingColumn: order_id
algorithmExpression: t_order_item${order_id % 8}
keyGenerator:
type: SNOWFLAKE
column: order_item_id
bindingTables:
- t_order,t_order_item
defaultTableStrategy:
none: :
1) Start ShardingSphere and monitor port 8001.
cd bin
./start.sh 8001
2) Check the log to confirm that ShardingSphere has started before you proceed.
Test
The schemaName of the corresponding sharding configuration for the database is connected.
psql -h 127.0.0.1 -p 8001 -U r1 sdb
3) In the target database, the URL is created automatically based on the configured policy.
create table t_order(order_id int8 primary key, user_id int8, info text, c1 int, crt_time timestamp);
create table t_order_item(order_item_id int8 primary key, order_id int8, user_id int8, info text, c1 int, c2 int, c3 int, c4 int, c5 int, crt_time timestamp);
sdb=> \dt
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+-------
public | t_order0 | table | r1
public | t_order1 | table | r1
public | t_order2 | table | r1
public | t_order3 | table | r1
public | t_order4 | table | r1
public | t_order5 | table | r1
public | t_order6 | table | r1
public | t_order7 | table | r1
public | t_order_item0 | table | r1
public | t_order_item1 | table | r1
public | t_order_item2 | table | r1
public | t_order_item3 | table | r1
public | t_order_item4 | table | r1
public | t_order_item5 | table | r1
public | t_order_item6 | table | r1
public | t_order_item7 | table | r1
(16 rows)
Note: Ideally, you should set the ShardingSphere to false to know how it parses and routes SQL statements and configures sql.show. Otherwise, a large number of logs will be printed, causing performance problems.
vi /root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/conf/server.yaml
authentication:
users:
r1:
password: PW123321!
authorizedSchemas: db0,db1,db2,db3
props:
executor.size: 16
sql.show: true # 表示把解析的sql打印到日志
4) Test writes and queries.
sdb=> insert into t_order (user_id, info, c1, crt_time) values (0,'a',1,now());
sdb=> insert into t_order (user_id, info, c1, crt_time) values (1,'b',2,now());
sdb=> insert into t_order (user_id, info, c1, crt_time) values (2,'c',3,now());
sdb=> insert into t_order (user_id, info, c1, crt_time) values (3,'c',4,now());
sdb=> select * from t_order;
order_id | user_id | info | c1 | crt_time
--------------------+---------+------+----+----------------------------
433352561047633921 | 0 | a | 1 | 2020-02-09 19:48:21.856555
433352585668198400 | 1 | b | 2 | 2020-02-09 19:48:27.726815
433352610813050881 | 2 | c | 3 | 2020-02-09 19:48:33.721754
433352628370407424 | 3 | c | 4 | 2020-02-09 19:48:37.907683
(4 rows)
sdb=> select * from t_order where user_id=1;
order_id | user_id | info | c1 | crt_time
--------------------+---------+------+----+----------------------------
433352585668198400 | 1 | b | 2 | 2020-02-09 19:48:27.726815
(1 row)
5) Query ShardingSphere logs.
[root@iZbp135pwcjjoxqgfpw9k1Z logs]# pwd
/root/apache-shardingsphere-incubating-4.0.0-sharding-proxy-bin/logs
[root@iZbp135pwcjjoxqgfpw9k1Z logs]# ll
total 216
-rw-r--r-- 1 root root 214639 Feb 9 19:49 stdout.log
sdb=> insert into t_order (user_id, order_id, info, c1 , crt_time) values (1,1,md5(random()::text), random()*1000, now());
sdb=> select * from t_order where user_id=1 and order_id=1;
order_id | user_id | info | c1 | crt_time
----------+---------+----------------------------------+-----+----------------------------
1 | 1 | e1475ff0d830df92ab4a920e9ae0606b | 694 | 2020-02-09 19:55:14.541556
(1 row)
sdb=> insert into t_order (user_id, order_id, info, c1 , crt_time) values (1,1,md5(random()::text), random()*1000, now()) on conflict (order_id) do update set info=excluded.info,c1=excluded.c1,crt_time=excluded.crt_time;
sdb=> select * from t_order where user_id=1 and order_id=1;
order_id | user_id | info | c1 | crt_time
----------+---------+----------------------------------+-----+----------------------------
1 | 1 | b63a5a4385af47339708eb880c3e1bd1 | 570 | 2020-02-09 19:56:09.045826
(1 row)
6) Use pgbench to conduct a stress test.
vi test.sql
\set user_id random(1,100000000)
\set order_id random(1,2000000000)
\set order_item_id random(1,2000000000)
insert into t_order (user_id, order_id, info, c1 , crt_time) values (:user_id, :order_id,random()::text, random()*1000, now()) on conflict (order_id) do update set info=excluded.info,c1=excluded.c1,crt_time=excluded.crt_time;
insert into t_order_item (order_item_id, user_id, order_id, info, c1,c2,c3,c4,c5,crt_time) values (:order_item_id, :user_id,:order_id,random()::text, random()*1000,random()*1000,random()*1000,random()*1000,random()*1000, now()) on conflict(order_item_id) do nothing;
pgbench -M simple -n -r -P 1 -f ./test.sql -c 24 -j 24 -h 127.0.0.1 -p 8001 -U r1 sdb -T 120
progress: 1.0 s, 1100.9 tps, lat 21.266 ms stddev 6.349
progress: 2.0 s, 1253.0 tps, lat 18.779 ms stddev 7.913
progress: 3.0 s, 1219.0 tps, lat 20.083 ms stddev 13.212
Note that if the UDF executes internal write operations, it is routed to the read-only database, resulting in an error.
schemaName: msdb
dataSources:
ds_master:
url: jdbc:postgresql://localhost:1921/db0
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
ds_slave0:
url: jdbc:postgresql://localhost:1922/db0
username: r1
password: PW123321!
connectionTimeoutMilliseconds: 30000
idleTimeoutMilliseconds: 60000
maxLifetimeMilliseconds: 1800000
maxPoolSize: 65
masterSlaveRule:
name: ds_ms
masterDataSourceName: ds_master
slaveDataSourceNames:
- ds_slave0
The test also applies to ApsaraDB RDS for PostgreSQL.
Please refer to the ShardingSphere documentation to know more about its features and usage restrictions. For instance, you can view hang by executing the ‘explain’ statement and print the following result in logs:
line 1:0 mismatched input 'explain' expecting {SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP, TRUNCATE, GRANT, REVOKE, SET, BEGIN, COMMIT, ROLLBACK, SAVEPOINT, CALL, START, RESET, SHOW}
The above part includes the SQL commands that ShardingSphere currently supports.
Besides proxy mode, ShardingSphere supports JDBC drive mode with better performance and customized sharding algorithms. You can refer to its official documentation for more details.
User Preference Recommendation System - PostgreSQL Approximate Computing Application
PostgreSQL Quickly Converts All Native Partition Tables to Ordinary Tables
ApsaraDB - November 22, 2022
ApsaraDB - November 21, 2022
Alibaba Cloud Serverless - November 10, 2022
Alibaba Clouder - February 11, 2019
digoal - April 22, 2021
digoal - May 16, 2019
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by digoal