By digoal
This article describes the deployment of open-source PolarDB for PostgreSQL in an environment with shared storage and multiple compute nodes.
The shared storage uses Network Block Device (NBD). This environment is relatively easy to deploy: a single server is enough to build it, which lowers the threshold for learning PolarDB for PG. However, it is not recommended for production. In production, use a commercial SAN or commercial distributed shared block storage instead. The setup method is similar to this article; simply skip the NBD deployment and replace NBD with the other shared block device.
Environment: ECS instances, each with 8 cores and 32 GB of memory.
1 ECS for shared storage (NBD server), with 3 network shared disks: 172.25.9.67
1 ECS as the RW node: 172.25.9.68
2 ECS as RO nodes: 172.25.9.69, 172.25.9.70
The ECS instances must be able to reach the public network, because the installation process downloads packages (for example, via yum and cpan).
The open-source address of PolarDB for PostgreSQL: https://github.com/ApsaraDB/PolarDB-for-PostgreSQL
For the NBD-related steps, refer to Network Block Device for Testing RAC and Shared Storage Version of PolarDB for PostgreSQL.
Check the block devices used for NBD: vdb, vdc, and vdd
[root@iZbp10sz66ubwpqzg2ry5gZ ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 100G 0 disk
└─vda1 253:1 0 100G 0 part /
vdb 253:16 0 100G 0 disk
vdc 253:32 0 100G 0 disk
vdd 253:48 0 100G 0 disk
Write the NBD server configuration file. Note: configuration lines in nbd.conf must not have trailing spaces; otherwise, nbd-server fails to start.
# vi /root/nbd.conf
# This is a comment
[generic]
# The [generic] section is required, even if nothing is specified
# there.
# When either of these options are specified, nbd-server drops
# privileges to the given user and group after opening ports, but
# _before_ opening files.
# user = nbd
# group = nbd
listenaddr = 0.0.0.0
port = 1921
[export1]
exportname = /dev/vdb
readonly = false
multifile = false
copyonwrite = false
flush = true
fua = true
sync = true
[export2]
exportname = /dev/vdc
readonly = false
multifile = false
copyonwrite = false
flush = true
fua = true
sync = true
[export3]
exportname = /dev/vdd
readonly = false
multifile = false
copyonwrite = false
flush = true
fua = true
sync = true
Start nbd-server:
# nbd-server -C /root/nbd.conf
# netstat -anp|grep 1921
tcp 0 0 0.0.0.0:1921 0.0.0.0:* LISTEN 1296/nbd-server
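To have nbd-server start automatically after the storage server reboots, the same rc.local approach used later for the clients can be applied here as well; a minimal sketch:
vi /etc/rc.local
nbd-server -C /root/nbd.conf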
Please refer to Network Block Device for Testing RAC and Shared Storage Version of PolarDB for PostgreSQL.
After the server restarts, run modprobe nbd to load the kernel module and then attach the NBD devices:
nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0
nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1
nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
[root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0
Negotiation: ..size = 102400MB
bs=1024, sz=107374182400 bytes
[root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1
Negotiation: ..size = 102400MB
bs=1024, sz=107374182400 bytes
[root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
Negotiation: ..size = 102400MB
bs=1024, sz=107374182400 bytes
Append the attach commands to /etc/rc.local so the NBD devices are reconnected at boot:
vi /etc/rc.local
nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0
nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1
nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
# To detach a device, use nbd-client -d:
#nbd-client -d /dev/nbd0
#nbd-client -d /dev/nbd1
#nbd-client -d /dev/nbd2
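On systemd-based systems, the nbd kernel module can also be loaded automatically at boot so that the rc.local commands above find the nbd devices; a minimal sketch, assuming the standard modules-load.d mechanism is available on your distribution:
echo nbd > /etc/modules-load.d/nbd.conf
# After a reboot, confirm that the devices are attached:
lsblk | grep nbd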
https://github.com/ApsaraDB/PolarDB-for-PostgreSQL/blob/POLARDB_11_STABLE/README.md
https://github.com/ApsaraDB/PolarDB-FileSystem/blob/master/Readme.md
yum install -y cmake3 gcc gcc-c++ libaio-devel git unzip
ln -s /usr/bin/cmake3 /usr/bin/cmake
https://github.com/HardySimpson/zlog/releases
wget https://github.com/HardySimpson/zlog/archive/refs/tags/1.2.15.tar.gz
tar -zxvf 1.2.15.tar.gz
cd zlog-1.2.15/
make
make install
echo "/usr/local/lib" >> /etc/ld.so.conf
ldconfig
ldconfig -v|grep zlog
... ...
libzlog.so.1.2 -> libzlog.so.1.2
su - root
cd ~
# Note the branch. We recommend that you use the stable branch or the release branch. For more information, see the corresponding documentation.
git clone https://github.com/ApsaraDB/PolarDB-FileSystem
cd PolarDB-FileSystem
./autobuild.sh
... ...
[100%] Linking CXX executable ../../../bin/pfsdaemon
[100%] Built target pfsdaemon
~/PolarDB-FileSystem-master
end compile, binary's in ./bin, library's in ./lib
./install.sh
install pfsd success!
Different block devices are accessed under different paths, and multiple directories can also be created on the same block device. In this way, block devices map to directories, which in turn map to database tablespaces (see the sketch at the end of Block Device Initialization below).
Block Device Renaming
PFS only accepts block devices whose names start with specific prefixes. It is recommended that every node access the shared block devices under the same name, which can be done with a soft link.
Execute on the NBD client host:
ln -s /dev/nbd0 /dev/nvme0n1
ln -s /dev/nbd1 /dev/nvme0n2
ln -s /dev/nbd2 /dev/nvme0n3
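Symbolic links created under /dev do not survive a reboot, so they may need to be recreated at boot as well; a minimal sketch, reusing the /etc/rc.local approach from above:
vi /etc/rc.local
ln -s /dev/nbd0 /dev/nvme0n1
ln -s /dev/nbd1 /dev/nvme0n2
ln -s /dev/nbd2 /dev/nvme0n3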
Block Device Initialization
Note: Format the shared block devices with PFS on the RW node only:
pfs -C disk mkfs nvme0n1
pfs -C disk mkfs nvme0n2
pfs -C disk mkfs nvme0n3
...
pfs mkfs succeeds!
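To confirm that the file system was created, and as a sketch of the earlier point that multiple directories can live on one block device (the directory names below are examples only):
pfs -C disk ls /nvme0n1/
pfs -C disk mkdir /nvme0n1/dir1
pfs -C disk mkdir /nvme0n1/dir2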
Block Device Mounting
Start PFS and mount the shared disks on the RW node and on each RO node:
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n1
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n2
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n3
... ...
pfsdaemon nvme0n1 start success
pfsdaemon nvme0n2 start success
pfsdaemon nvme0n3 start success
To start pfsd automatically at boot, append the commands to /etc/rc.local:
vi /etc/rc.local
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n1
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n2
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n3
# To stop pfsd on a node:
# /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n1
# /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n2
# /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n3
su - root
yum install -y readline-devel zlib-devel perl-CPAN bison flex git
cpan -fi Test::More IPC::Run
useradd polardb
su - polardb
cd ~
wget https://github.com/ApsaraDB/PolarDB-for-PostgreSQL/archive/refs/heads/POLARDB_11_STABLE.zip
unzip POLARDB_11_STABLE.zip
cd PolarDB-for-PostgreSQL-POLARDB_11_STABLE
./polardb_build.sh --noinit --with-pfsd
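To confirm that the build produced the expected binaries (installed under $HOME/tmp_basedir_polardb_pg_1100_bld, which is the path used throughout the rest of this article), a quick check such as:
ls $HOME/tmp_basedir_polardb_pg_1100_bld/bin
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_config --version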
su - polardb
cd ~
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D primary -E UTF8 --lc-collate=C --lc-ctype=en_US.utf8 -U polardb
# shared storage initialization
su - root
pfs -C disk mkdir /nvme0n1/shared_data
# This step is also executed as root.
/home/polardb/tmp_basedir_polardb_pg_1100_bld/bin/polar-initdb.sh /home/polardb/primary/ /nvme0n1/shared_data/
Node Configuration:
su - polardb
cd ~/primary # Note
Open postgresql.conf and add the following configuration items:
listen_addresses = '0.0.0.0'
port = 5432
max_connections = 1000
unix_socket_directories = '., /tmp'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 8GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
parallel_leader_participation = off
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0ms
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
polar_hostid=1 # Note
polar_enable_shared_storage_mode=on
polar_disk_name='nvme0n1' # Note
polar_datadir='/nvme0n1/shared_data/' # Note
polar_vfs.localfs_mode=off
shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker'
polar_storage_cluster_name='disk'
synchronous_standby_names='replica1, replica2' # Note
Open pg_hba.conf and add the following configuration items:
host replication polardb 172.25.9.68/32 trust
host replication polardb 172.25.9.69/32 trust
host replication polardb 172.25.9.70/32 trust
host all all 0.0.0.0/0 md5
Start and Check:
su - polardb
Start:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/primary
Check:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
Configure environment variables for ease of use:
su - polardb
vi ~/.bashrc
# Append the following parameters:
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/home/polardb/primary
export LANG=en_US.utf8
export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=polardb
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
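To apply the new variables to the current session without logging out and back in again:
source ~/.bashrc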
Next, create the replication slots that the read-only nodes will use for physical streaming replication:
su - polardb
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select pg_create_physical_replication_slot('replica1');"
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select pg_create_physical_replication_slot('replica2');"
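To confirm that the slots were created, query pg_replication_slots (the same view is shown with replication activity later in this article):
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select slot_name, slot_type, active from pg_replication_slots;"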
Currently, mapping multiple block devices to different tablespaces is not supported. So how do you make use of multiple disks? LVM2 logical volumes can be used to manage them: configure LVM on the NBD server to combine the disks into one large volume (see the sketch below), using:
pvcreate
vgcreate
lvcreate
lvextend
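A minimal sketch of combining the three disks into one logical volume on the NBD server (the volume group and logical volume names are examples only; the resulting device would then be exported in nbd.conf instead of the individual disks):
pvcreate /dev/vdb /dev/vdc /dev/vdd
vgcreate vgdata /dev/vdb /dev/vdc /dev/vdd
lvcreate -l 100%FREE -n lvdata vgdata
# The combined device appears as /dev/vgdata/lvdata and can be grown later with lvextend after adding disks.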
Use PFS to view the contents of the shared storage after initialization:
[root@iZbp13tgwor95f2508zo4oZ ~]# pfs -C disk ls /nvme0n1/
File 1 4194304 Mon Nov 8 14:21:58 2021 .pfs-paxos
File 1 1073741824 Mon Nov 8 14:22:03 2021 .pfs-journal
Dir 1 1280 Mon Nov 8 15:23:05 2021 shared_data
total 2105344 (unit: 512Bytes)
[root@iZbp13tgwor95f2508zo4oZ ~]# pfs -C disk ls /nvme0n1/shared_data
Dir 1 512 Mon Nov 8 15:11:38 2021 base
Dir 1 7424 Mon Nov 8 15:11:39 2021 global
Dir 1 0 Mon Nov 8 15:11:39 2021 pg_tblspc
Dir 1 10368 Mon Nov 8 18:06:41 2021 pg_wal
Dir 1 896 Mon Nov 8 18:06:41 2021 pg_logindex
Dir 1 0 Mon Nov 8 15:11:39 2021 pg_twophase
Dir 1 512 Mon Nov 8 17:57:42 2021 pg_xact
Dir 1 0 Mon Nov 8 15:11:40 2021 pg_commit_ts
Dir 1 256 Mon Nov 8 15:11:40 2021 pg_multixact
Dir 1 512 Mon Nov 8 15:23:18 2021 polar_fullpage
total 0 (unit: 512Bytes)
su - polardb
Node Initialization:
cd ~
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D replica1 # Note
Node Configuration:
cd ~/replica1 # Note
Open postgresql.conf and add the following configuration items:
listen_addresses = '0.0.0.0'
port = 5432
max_connections = 1000
unix_socket_directories = '., /tmp'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 8GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
parallel_leader_participation = off
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0ms
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
polar_hostid=2 # Note
polar_enable_shared_storage_mode=on
polar_disk_name='nvme0n1' # Note
polar_datadir='/nvme0n1/shared_data/' # Note
polar_vfs.localfs_mode=off
shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker'
polar_storage_cluster_name='disk'
Open pg_hba.conf and add the following configuration items:
host replication polardb 172.25.9.68/32 trust
host replication polardb 172.25.9.69/32 trust
host replication polardb 172.25.9.70/32 trust
host all all 0.0.0.0/0 md5
Create recovery.conf and add the following configuration items:
polar_replica='on'
recovery_target_timeline='latest'
primary_slot_name='replica1' # Note
primary_conninfo='host=172.25.9.68 port=5432 user=polardb dbname=postgres application_name=replica1' # Note
Configure environment variables for ease of use:
su - polardb
vi ~/.bashrc
# Append the following parameters:
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/home/polardb/replica1 # Note
export LANG=en_US.utf8
export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=polardb
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
Node Initialization:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D replica2 # Note
Node Configuration:
cd ~/replica2 # Note
Open postgresql.conf and add the following configuration items:
listen_addresses = '0.0.0.0'
port = 5432
max_connections = 1000
unix_socket_directories = '., /tmp'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 8GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
parallel_leader_participation = off
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0ms
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
polar_hostid=3 # Note
polar_enable_shared_storage_mode=on
polar_disk_name='nvme0n1' # Note
polar_datadir='/nvme0n1/shared_data/' # Note
polar_vfs.localfs_mode=off
shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker'
polar_storage_cluster_name='disk'
Open pg_hba.conf and add the following configuration items:
host replication polardb 172.25.9.68/32 trust
host replication polardb 172.25.9.69/32 trust
host replication polardb 172.25.9.70/32 trust
host all all 0.0.0.0/0 md5
Create recovery.conf and add the following configuration items:
polar_replica='on'
recovery_target_timeline='latest'
primary_slot_name='replica2' # Note
primary_conninfo='host=172.25.9.68 port=5432 user=polardb dbname=postgres application_name=replica2' # Note
Configure environment variables for ease of use:
su - polardb
vi ~/.bashrc
# Append the following parameters:
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/home/polardb/replica2 # Note
export LANG=en_US.utf8
export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=polardb
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
Start and Check:
su - polardb
Start:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/replica1
Check:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
Start and Check:
su - polardb
Start:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/replica2
Check:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
Instance Check and Test
After the deployment, check and test the instance to verify that data written on the RW (master) node can be read from the RO nodes.
Log on to the RW master node, create a test table, and insert sample data:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -q -p 5432 -d postgres -c "create table t(t1 int primary key, t2 int);insert into t values (1, 1),(2, 3),(3, 3);"
Log on to any RO node and query the sample data that was just inserted:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -q -p 5432 -d postgres -c "select * from t;"
t1 | t2
----+----
1 | 1
2 | 3
3 | 3
(3 rows)
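As a further sanity check, a write attempted on an RO node should be rejected, because replica nodes are read-only; for example, the following is expected to fail on an RO node:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -q -p 5432 -d postgres -c "insert into t values (4, 4);"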
RW Replication Status Check:
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 17361
usesysid | 10
usename | polardb
application_name | replica1
client_addr | 172.25.9.69
client_hostname |
client_port | 56684
backend_start | 2021-11-08 15:34:04.711213+08
backend_xmin |
state | streaming
sent_lsn | 0/166A090
write_lsn | 0/166A090
flush_lsn | 0/166A090
replay_lsn | 0/166A090
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | sync
-[ RECORD 2 ]----+------------------------------
pid | 17363
usesysid | 10
usename | polardb
application_name | replica2
client_addr | 172.25.9.70
client_hostname |
client_port | 45858
backend_start | 2021-11-08 15:34:30.442495+08
backend_xmin |
state | streaming
sent_lsn | 0/166A090
write_lsn | 0/166A090
flush_lsn | 0/166A090
replay_lsn | 0/166A090
write_lag |
flush_lag |
replay_lag |
sync_priority | 2
sync_state | potential
postgres=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+----------
slot_name | replica1
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 17361
xmin |
catalog_xmin |
restart_lsn | 0/1669C78
confirmed_flush_lsn |
-[ RECORD 2 ]-------+----------
slot_name | replica2
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 17363
xmin |
catalog_xmin |
restart_lsn | 0/1669C78
confirmed_flush_lsn |
Stress Testing (before optimization):
pgbench -i -s 100
Read-only stress testing is run on all nodes at the same time:
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 120 -S
Read/write stress testing is run on the RW node:
pgbench -M prepared -n -r -P 1 -c 8 -j 8 -T 120
Query throughput increases roughly linearly with the number of nodes (because the data fits in memory).
RW Node:
Reading/Writing Stress Testing:
pgbench -M prepared -n -r -P 1 -c 8 -j 8 -T 1200
progress: 192.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 193.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 194.0 s, 559.0 tps, lat 58.408 ms stddev 456.270
progress: 195.0 s, 1616.7 tps, lat 4.959 ms stddev 18.420
progress: 196.0 s, 2153.2 tps, lat 3.709 ms stddev 1.102
progress: 197.0 s, 646.0 tps, lat 3.635 ms stddev 1.042
progress: 198.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 199.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 200.0 s, 283.1 tps, lat 104.779 ms stddev 595.861
progress: 201.0 s, 2214.0 tps, lat 3.620 ms stddev 1.123
progress: 202.0 s, 2153.0 tps, lat 3.709 ms stddev 1.096
progress: 203.0 s, 2377.8 tps, lat 3.369 ms stddev 0.977
progress: 204.0 s, 2313.3 tps, lat 3.460 ms stddev 0.987
progress: 205.0 s, 2329.9 tps, lat 3.429 ms stddev 0.976
progress: 206.0 s, 2283.7 tps, lat 3.508 ms stddev 1.052
progress: 207.0 s, 2098.3 tps, lat 3.809 ms stddev 1.895
progress: 208.0 s, 2340.1 tps, lat 3.417 ms stddev 0.977
progress: 209.0 s, 2052.9 tps, lat 3.902 ms stddev 6.712
progress: 210.0 s, 2346.1 tps, lat 3.408 ms stddev 0.965
progress: 211.0 s, 2316.9 tps, lat 3.452 ms stddev 0.994
RO Node 1, 2:
Read-Only Stress Testing:
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200 -S
progress: 254.0 s, 30921.5 tps, lat 1.034 ms stddev 0.892
progress: 255.0 s, 32351.0 tps, lat 0.990 ms stddev 1.660
progress: 256.0 s, 33540.0 tps, lat 0.953 ms stddev 0.859
progress: 257.0 s, 33027.0 tps, lat 0.971 ms stddev 1.040
progress: 258.0 s, 32791.1 tps, lat 0.976 ms stddev 0.631
progress: 259.0 s, 32839.6 tps, lat 0.975 ms stddev 1.837
progress: 260.0 s, 33539.4 tps, lat 0.954 ms stddev 0.527
progress: 261.0 s, 34344.5 tps, lat 0.932 ms stddev 0.984
progress: 262.0 s, 32383.9 tps, lat 0.988 ms stddev 0.618
progress: 263.0 s, 33186.0 tps, lat 0.964 ms stddev 0.512
progress: 264.0 s, 33253.3 tps, lat 0.962 ms stddev 0.497
progress: 265.0 s, 32584.0 tps, lat 0.982 ms stddev 0.466
progress: 266.0 s, 32959.8 tps, lat 0.967 ms stddev 1.310
progress: 267.0 s, 32392.0 tps, lat 0.991 ms stddev 0.701
progress: 268.0 s, 33307.8 tps, lat 0.961 ms stddev 0.505
progress: 269.0 s, 33255.8 tps, lat 0.962 ms stddev 0.589
progress: 270.0 s, 33994.6 tps, lat 0.941 ms stddev 0.449
progress: 271.0 s, 34127.2 tps, lat 0.937 ms stddev 1.441
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200 -S
progress: 253.0 s, 32289.2 tps, lat 0.991 ms stddev 0.621
progress: 254.0 s, 32778.7 tps, lat 0.976 ms stddev 0.658
progress: 255.0 s, 32314.9 tps, lat 0.984 ms stddev 1.320
progress: 256.0 s, 31984.6 tps, lat 1.006 ms stddev 1.159
progress: 257.0 s, 31889.4 tps, lat 1.004 ms stddev 0.679
progress: 258.0 s, 32466.4 tps, lat 0.986 ms stddev 0.565
progress: 259.0 s, 33572.2 tps, lat 0.953 ms stddev 0.517
progress: 260.0 s, 33482.7 tps, lat 0.956 ms stddev 0.532
progress: 261.0 s, 33047.4 tps, lat 0.968 ms stddev 0.487
progress: 262.0 s, 32951.0 tps, lat 0.971 ms stddev 0.634
progress: 263.0 s, 34039.1 tps, lat 0.940 ms stddev 1.068
progress: 264.0 s, 33919.8 tps, lat 0.944 ms stddev 0.936
progress: 265.0 s, 34062.7 tps, lat 0.940 ms stddev 0.648
progress: 266.0 s, 31726.4 tps, lat 1.009 ms stddev 0.567
progress: 267.0 s, 34335.7 tps, lat 0.932 ms stddev 1.252
progress: 268.0 s, 33604.4 tps, lat 0.952 ms stddev 0.571
progress: 269.0 s, 34043.3 tps, lat 0.940 ms stddev 0.673
progress: 270.0 s, 33909.3 tps, lat 0.944 ms stddev 0.547
The observed replication lag is low:
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 17361
usesysid | 10
usename | polardb
application_name | replica1
client_addr | 172.25.9.69
client_hostname |
client_port | 56684
backend_start | 2021-11-08 15:34:04.711213+08
backend_xmin |
state | streaming
sent_lsn | 2/456D3BF8
write_lsn | 2/456D3BF8
flush_lsn | 2/456D3BF8
replay_lsn | 2/456D3BF8
write_lag | 00:00:00.000147
flush_lag | 00:00:00.000147
replay_lag | 00:00:00.000244
sync_priority | 1
sync_state | sync
-[ RECORD 2 ]----+------------------------------
pid | 17363
usesysid | 10
usename | polardb
application_name | replica2
client_addr | 172.25.9.70
client_hostname |
client_port | 45858
backend_start | 2021-11-08 15:34:30.442495+08
backend_xmin |
state | streaming
sent_lsn | 2/456D3BF8
write_lsn | 2/456D3BF8
flush_lsn | 2/456D3BF8
replay_lsn | 2/456D3BF8
write_lag | 00:00:00.000517
flush_lag | 00:00:00.000517
replay_lag | 00:00:00.00052
sync_priority | 2
sync_state | potential
Analyze wait events:
The waits are concentrated on the WAL writer. Since the NBD server runs over TCP, high latency is expected. The intervals of 0 TPS indicate that I/O traffic is being throttled: this environment uses an NBD server built from an ECS instance plus a cloud disk, and both the ECS network layer and the cloud disk enforce traffic limits, so periods of 0 TPS can occur. Next time, a server with well-performing local SSDs could be used as the NBD server.
If the bottleneck is response time (RT) rather than I/O bandwidth (IOBW), group commit can improve performance. If I/O bandwidth itself is the bottleneck, there is no way to improve performance this way.
postgres=# select wait_event_type,wait_event,count(*) from pg_stat_activity group by 1,2 order by 3 desc;
wait_event_type | wait_event | count
-----------------+---------------------+-------
LWLock | WALWriteLock | 8
Activity | WalSenderMain | 2
Activity | CheckpointerMain | 1
IO | WALInitWrite | 1
Activity | AutoVacuumMain | 1
| | 1
IO | VFSFileOpen | 1
Activity | LogicalLauncherMain | 1
(8 rows)
On the NBD server, observe the network and disk throughput:
dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 2 80 17 0 0| 0 51M| 54M 33M| 0 0 | 61k 99k
0 2 79 18 0 0| 0 50M| 54M 35M| 0 0 | 63k 102k
0 1 89 10 0 0| 0 26M| 28M 36M| 0 0 | 45k 69k
0 1 92 7 0 0| 0 17M| 19M 31M| 0 0 | 36k 55k
0 1 92 7 0 0| 0 18M| 19M 28M| 0 0 | 36k 53k
iostat -x 1
avg-cpu: %user %nice %system %iowait %steal %idle
0.13 0.00 0.77 7.17 0.00 91.93
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdb 0.00 1142.00 0.00 6946.00 0.00 18460.00 5.32 0.55 0.08 0.00 0.08 0.14 99.20
vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Monitor IO Performance:
su - root
pfsadm mountstat nvme0n1
Enable group commit:
-- Sets the delay in microseconds between transaction commit and flushing WAL to disk.
postgres=# alter role polardb set commit_delay ='10';
ALTER ROLE
-- Sets the minimum number of concurrently open transactions required before commit_delay is applied.
postgres=# alter role polardb set commit_siblings =5;
ALTER ROLE
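Role-level settings only take effect for new sessions, so reconnect before re-running the test; the new values can be confirmed with, for example:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'show commit_delay;'
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'show commit_siblings;'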
Increase the read/write stress test to 32 concurrent connections:
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200
There is no improvement because the I/O bandwidth is already saturated.
After switching to an unlogged table, group commit with 32 concurrent connections does improve performance. (Note: Unlogged tables and temporary tables are not supported on RO nodes. Unlogged tables created on the master node cannot be queried on RO nodes.)
progress: 164.0 s, 4019.8 tps, lat 7.965 ms stddev 2.940
progress: 165.0 s, 1676.1 tps, lat 8.152 ms stddev 2.298
progress: 166.0 s, 4027.8 tps, lat 12.503 ms stddev 63.648
progress: 167.0 s, 4445.0 tps, lat 7.197 ms stddev 2.822
progress: 168.0 s, 4357.3 tps, lat 7.342 ms stddev 2.804
progress: 169.0 s, 4567.0 tps, lat 7.006 ms stddev 3.003
progress: 170.0 s, 4648.9 tps, lat 6.881 ms stddev 2.792
progress: 171.0 s, 4427.9 tps, lat 7.226 ms stddev 3.254
progress: 172.0 s, 4468.9 tps, lat 7.163 ms stddev 3.111
progress: 173.0 s, 4571.2 tps, lat 7.003 ms stddev 3.023
progress: 174.0 s, 4695.8 tps, lat 6.814 ms stddev 2.940
progress: 175.0 s, 4627.2 tps, lat 6.914 ms stddev 2.644
progress: 176.0 s, 4466.9 tps, lat 7.159 ms stddev 3.036
progress: 177.0 s, 4508.4 tps, lat 7.109 ms stddev 2.564
progress: 178.0 s, 4474.7 tps, lat 7.143 ms stddev 2.683
progress: 179.0 s, 4476.1 tps, lat 7.156 ms stddev 2.609
progress: 180.0 s, 4622.0 tps, lat 6.924 ms stddev 2.884
progress: 181.0 s, 4726.6 tps, lat 6.770 ms stddev 2.798
progress: 182.0 s, 4480.2 tps, lat 7.142 ms stddev 2.644
progress: 183.0 s, 2224.2 tps, lat 6.899 ms stddev 2.849
progress: 184.0 s, 3633.6 tps, lat 13.391 ms stddev 70.771
progress: 185.0 s, 4314.0 tps, lat 7.416 ms stddev 2.274