By digoal
Many options are available for storing data in the cloud: local SSD, cloud SSD, and ESSD at performance levels PL1, PL2, and PL3. How should you choose a storage type for a PostgreSQL database? What are the advantages and disadvantages of each option, and which one fits which scenario? Is the single I/O latency of cloud disks a concern for every kind of business, and how can it be mitigated?
First, test the performance. Then, analyze the similarities and differences between local SSD and cloud ESSD to determine the appropriate storage option for the database.
The test environment is as follows:
Disk: local SSD, 1.8 TB × 8
CPU: 32 cores, 64 threads (Intel Xeon Platinum 8163 @ 2.50 GHz)
Memory: 512 GB
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
Stepping: 4
CPU MHz: 2500.006
BogoMIPS: 5000.01
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 spec_ctrl intel_stibp
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 100G 0 disk
└─vda1 253:1 0 100G 0 part /
vdb 253:16 0 1.8T 0 disk
vdc 253:32 0 1.8T 0 disk
vdd 253:48 0 1.8T 0 disk
vde 253:64 0 1.8T 0 disk
vdf 253:80 0 1.8T 0 disk
vdg 253:96 0 1.8T 0 disk
vdh 253:112 0 1.8T 0 disk
vdi 253:128 0 1.8T 0 disk
Disk 1: ESSD PL3, 1.8 TB × 8
Disk 2: ESSD PL3, 20 TB
CPU: 26 cores, 52 threads (Intel Xeon Platinum 8269CY @ 2.50 GHz)
Memory: 192 GB
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 52
On-line CPU(s) list: 0-51
Thread(s) per core: 2
Core(s) per socket: 26
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Stepping: 7
CPU MHz: 2500.002
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-51
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 100G 0 disk
└─vda1 253:1 0 100G 0 part /
vdb 253:16 0 1.8T 0 disk
vdc 253:32 0 1.8T 0 disk
vdd 253:48 0 1.8T 0 disk
vde 253:64 0 1.8T 0 disk
vdf 253:80 0 1.8T 0 disk
vdg 253:96 0 1.8T 0 disk
vdh 253:112 0 1.8T 0 disk
vdi 253:128 0 1.8T 0 disk
vdj 253:144 0 19.5T 0 disk
vi /etc/sysctl.conf
# add by digoal.zhou
fs.aio-max-nr = 1048576
fs.file-max = 76724600
# Optional: kernel.core_pattern = /data01/corefiles/core_%e_%u_%t_%s.%p
# Create /data01/corefiles in advance with permission 777; if it is a symlink, set the target directory to 777 as well.
kernel.sem = 4096 2147483647 2147483646 512000
# Semaphores; check with ipcs -l or -u. Every 16 processes form a group, and each group needs 17 semaphores.
kernel.shmall = 107374182
# Limit on the total size of all shared memory segments (about 80% of memory is recommended), in pages.
kernel.shmmax = 274877906944
# Maximum size of a single shared memory segment (half of memory is recommended); versions later than 9.2 use much less shared memory. Unit: bytes.
kernel.shmmni = 819200
# Total number of shared memory segments that can be created; each PostgreSQL cluster needs at least 2.
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 262144
# The default setting of the socket receive buffer in bytes.
net.core.rmem_max = 4194304
# The maximum receive socket buffer size in bytes
net.core.wmem_default = 262144
# The default setting (in bytes) of the socket send buffer.
net.core.wmem_max = 4194304
# The maximum send socket buffer size in bytes.
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syncookies = 1
# Enable SYN cookies. When the SYN backlog queue overflows, cookies are used instead, which defends against small-scale SYN flood attacks.
net.ipv4.tcp_timestamps = 1
# Reduce TIME_WAIT sockets.
net.ipv4.tcp_tw_recycle = 0
# If set to 1, TIME-WAIT sockets of TCP connections are recycled quickly; this can break connections behind NAT, so keeping it off on servers is recommended.
net.ipv4.tcp_tw_reuse = 1
# Enable reuse: allow TIME-WAIT sockets to be reused for new TCP connections.
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 8192 65536 16777216
net.nf_conntrack_max = 1200000
net.netfilter.nf_conntrack_max = 1200000
vm.dirty_background_bytes = 409600000
# When dirty pages reach this amount, the background flusher (pdflush or its successor) automatically writes pages that have been dirty for more than (dirty_expire_centisecs/100) seconds to disk.
# The default is 10% of memory; on machines with large memory, specifying an absolute number of bytes is recommended.
vm.dirty_expire_centisecs = 3000
# Dirty pages older than this are flushed to disk; 3000 means 30 seconds.
vm.dirty_ratio = 95
# If background flushing is too slow and dirty pages exceed 95% of memory, user processes that write to disk (fsync, fdatasync, etc.) must flush dirty pages themselves.
# Setting this high effectively prevents user processes from flushing dirty pages; it is very useful when running multiple instances on one host with per-instance IOPS limited by cgroups.
vm.dirty_writeback_centisecs = 100
# Wake-up interval of the background flusher (pdflush or its successor); 100 means 1 second.
vm.swappiness = 0
# Do not use swap.
vm.mmap_min_addr = 65536
vm.overcommit_memory = 0
# Allow a small amount of memory overcommit when allocating. If set to 1, the kernel assumes there is always enough memory; 1 can be used in test environments with little memory.
vm.overcommit_ratio = 90
# When overcommit_memory = 2, this ratio is used to calculate how much memory may be committed.
vm.zone_reclaim_mode = 0
# Disable NUMA zone reclaim (or disable NUMA in the kernel boot configuration).
net.ipv4.ip_local_port_range = 40000 65535
# Range of local ports automatically assigned for TCP and UDP.
fs.nr_open=20480000
# Upper limit on the number of file handles a single process may open.
# Pay attention to the following parameters.
vm.extra_free_kbytes = 4096000
vm.min_free_kbytes = 6291456 # Recommended: reserve 1 GB of vm.min_free_kbytes per 32 GB of memory.
# On machines with little memory, do not set the two values above.
# vm.nr_hugepages = 66536
# Huge pages are recommended when shared_buffers exceeds 64 GB; see Hugepagesize in /proc/meminfo for the page size.
vm.lowmem_reserve_ratio = 1 1 1
# Recommended when memory exceeds 64 GB; otherwise keep the default 256 256 32.
# Apply the settings:
# sysctl -p
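After applying the settings with sysctl -p, a quick spot-check can confirm the values are in effect. This is a minimal sketch; the parameters listed are the ones set above.
# Spot-check a few of the values configured above (should print the numbers from sysctl.conf).
sysctl vm.swappiness vm.dirty_background_bytes kernel.shmmax fs.nr_open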
vi /etc/security/limits.conf
# If nofile needs to exceed 1048576, first set fs.nr_open in sysctl to a larger value and apply it before raising nofile.
# Comment out the other lines and add the following:
* soft nofile 1024000
* hard nofile 1024000
* soft nproc unlimited
* hard nproc unlimited
* soft core unlimited
* hard core unlimited
* soft memlock unlimited
* hard memlock unlimited
Also modify the following file if it exists:
/etc/security/limits.d/20-nproc.conf
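On CentOS 7 this file usually ships with a per-user process cap that overrides limits.conf. A sketch of the change, assuming the stock file caps non-root users at 4096:
# /etc/security/limits.d/20-nproc.conf
# Raise the default per-user process cap so it does not override the limits.conf settings above.
*          soft    nproc     unlimited
root       soft    nproc     unlimited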
echo never > /sys/kernel/mm/transparent_hugepage/enabled
Make the setting persistent so that it still takes effect after a reboot:
chmod +x /etc/rc.d/rc.local
vi /etc/rc.local
touch /var/lock/subsys/local
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
vi /etc/rc.local
touch /var/lock/subsys/local
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
Supported clocks:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
kvm-clock tsc acpi_pm
Modify clock:
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
Clock performance:
kvm-clock:
pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 38.05 ns
Histogram of timing durations:
< us % of total count
1 96.22568 75875976
2 3.76667 2970102
4 0.00055 435
8 0.00704 5550
16 0.00005 42
32 0.00000 1
64 0.00000 1
128 0.00000 0
256 0.00000 0
512 0.00000 0
1024 0.00000 1
tsc:
pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 31.93 ns
Histogram of timing durations:
< us % of total count
1 96.82838 90964258
2 3.16507 2973390
4 0.00078 731
8 0.00574 5394
16 0.00003 26
32 0.00000 2
TSC is slightly better.
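To keep tsc as the clock source across reboots, it can also be set on the kernel command line. This is a sketch, assuming a CentOS 7 host booted with GRUB2 and the default /boot/grub2/grub.cfg path:
# Persist the clock source via the kernel command line (optional).
# Append clocksource=tsc to GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the GRUB config.
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 clocksource=tsc"/' /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg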
systemctl enable irqbalance
systemctl start irqbalance
systemctl status irqbalance
ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 32
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 32
If multi-queue is not already at its maximum, set the number of channels to the pre-set maximum value of Combined:
ethtool -L eth0 combined 32
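If the pre-set maximum is not known in advance, it can be read from ethtool and applied in one step. A small sketch, assuming the NIC is eth0 and the driver reports the Combined row shown above:
# Read the pre-set maximum "Combined" channel count and apply it.
eth=eth0
max_combined=$(ethtool -l "$eth" | awk '/Pre-set maximums/{flag=1} flag && /Combined:/{print $2; exit}')
ethtool -L "$eth" combined "$max_combined"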
yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y centos-release-scl
yum install -y postgresql12*
yum install -y lvm2
parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdc mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdd mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vde mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdf mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdg mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdh mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdi mklabel gpt mkpart primary 1MiB 100%FREE
wipefs -f -a /dev/vd[b-i]1
pvcreate /dev/vd[b-i]1
vgcreate -A y -s 128M vgdata01 /dev/vd[b-i]1
lvcreate -A y -i 8 -I 8 -l 100%FREE -n lv01 vgdata01
[root@iZbp12984c7sv43o6lbbxoZ ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/vdb1 vgdata01 lvm2 a-- <1.75t 0
/dev/vdc1 vgdata01 lvm2 a-- <1.75t 0
/dev/vdd1 vgdata01 lvm2 a-- <1.75t 0
/dev/vde1 vgdata01 lvm2 a-- <1.75t 0
/dev/vdf1 vgdata01 lvm2 a-- <1.75t 0
/dev/vdg1 vgdata01 lvm2 a-- <1.75t 0
/dev/vdh1 vgdata01 lvm2 a-- <1.75t 0
/dev/vdi1 vgdata01 lvm2 a-- <1.75t 0
[root@iZbp12984c7sv43o6lbbxoZ ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv01 vgdata01 -wi-a----- <13.97t
mkfs.ext4 /dev/mapper/vgdata01-lv01 -m 0 -O extent,uninit_bg -E lazy_itable_init=1,stride=2,stripe_width=16 -b 4096 -T largefile -L lv01
vi /etc/fstab
LABEL=lv01 /data01 ext4 defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback 0 0
mkdir /data01
mount -a
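For reference, the stride and stripe_width values passed to mkfs.ext4 above follow from the LVM striping parameters. A sketch of the arithmetic, assuming the 8 KB LVM stripe size (-I 8), 8 striped disks (-i 8), and the 4 KB ext4 block size used here:
# Derive ext4 stride/stripe_width from the LVM layout.
lvm_stripe_kb=8      # lvcreate -I 8
ndisks=8             # lvcreate -i 8
block_kb=4           # mkfs.ext4 -b 4096
stride=$(( lvm_stripe_kb / block_kb ))   # 8 KB / 4 KB = 2 blocks
stripe_width=$(( stride * ndisks ))      # 2 * 8 = 16 blocks
echo "stride=${stride} stripe_width=${stripe_width}"   # matches -E stride=2,stripe_width=16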
parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdc mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdd mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vde mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdf mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdg mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdh mklabel gpt mkpart primary 1MiB 100%FREE
parted -a optimal -s /dev/vdi mklabel gpt mkpart primary 1MiB 100%FREE
wipefs -f -a /dev/vd[b-i]1
pvcreate /dev/vd[b-i]1
vgcreate -A y -s 128M vgdata01 /dev/vd[b-i]1
lvcreate -A y -i 8 -I 8 -l 100%FREE -n lv01 vgdata01
[root@iZbp1621kc3de3pm12a0exZ ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/vdb1 vgdata01 lvm2 a-- <1.76t 0
/dev/vdc1 vgdata01 lvm2 a-- <1.76t 0
/dev/vdd1 vgdata01 lvm2 a-- <1.76t 0
/dev/vde1 vgdata01 lvm2 a-- <1.76t 0
/dev/vdf1 vgdata01 lvm2 a-- <1.76t 0
/dev/vdg1 vgdata01 lvm2 a-- <1.76t 0
/dev/vdh1 vgdata01 lvm2 a-- <1.76t 0
/dev/vdi1 vgdata01 lvm2 a-- <1.76t 0
[root@iZbp1621kc3de3pm12a0exZ ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv01 vgdata01 -wi-a----- 14.06t
mkfs.ext4 /dev/mapper/vgdata01-lv01 -m 0 -O extent,uninit_bg -E lazy_itable_init=1,stride=2,stripe_width=16 -b 4096 -T largefile -L lv01
vi /etc/fstab
LABEL=lv01 /data01 ext4 defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback 0 0
mkdir /data01
mount -a
parted -a optimal -s /dev/vdj mklabel gpt mkpart primary 1MiB 100%FREE
mkfs.ext4 /dev/vdj1 -m 0 -O extent,uninit_bg -E lazy_itable_init=1 -b 4096 -T largefile -L vdj1
vi /etc/fstab
LABEL=vdj1 /data02 ext4 defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback 0 0
mkdir /data02
mount -a
mkdir /data01/pg12_1921
chown postgres:postgres /data01/pg12_1921
su - postgres
vi ~/.bash_profile
# Append the following:
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1921
export PGDATA=/data01/pg12_$PGPORT/pg_root
export LANG=en_US.utf8
export PGHOME=/usr/pgsql-12
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
initdb -D $PGDATA -U postgres -E SQL_ASCII --lc-collate=C --lc-ctype=en_US.utf8 --wal-segsize=1024
vi $PGDATA/postgresql.auto.conf
listen_addresses = '0.0.0.0'
# Listening port
port = 1921
max_connections = 1000
superuser_reserved_connections = 13
unix_socket_directories = '., /var/run/postgresql, /tmp'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 32GB
max_prepared_transactions = 1000
work_mem = 8MB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 512kB
effective_io_concurrency = 0
max_worker_processes = 128
max_parallel_maintenance_workers = 5
max_parallel_workers_per_gather = 0
parallel_leader_participation = off
max_parallel_workers = 40
backend_flush_after = 2MB
wal_level = replica
fsync = on
synchronous_commit = off
full_page_writes = on
wal_compression = on
wal_buffers = 16MB
wal_writer_delay = 10ms
wal_writer_flush_after = 1MB
checkpoint_timeout = 20min
max_wal_size = 128GB
min_wal_size = 16GB
checkpoint_completion_target = 0.2
archive_mode = on
archive_command = '/bin/date'
max_wal_senders = 32
wal_keep_segments = 0
wal_sender_timeout = 10s
max_replication_slots = 24
random_page_cost = 1.1
effective_cache_size = 128GB
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_line_prefix = '%m [%p] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
track_functions = pl
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
autovacuum_freeze_max_age = 800000000
autovacuum_multixact_freeze_max_age = 900000000
autovacuum_vacuum_cost_delay = 0ms
default_table_access_method = 'heap'
check_function_bodies = on
# Set the following two timeout parameters according to business requirements.
lock_timeout = 6s
idle_in_transaction_session_timeout = 60s
vacuum_freeze_table_age = 750000000
vacuum_multixact_freeze_table_age = 750000000
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'
Start database:
pg_ctl start
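A quick sanity check that the instance started with the intended settings; a minimal sketch using the environment variables exported above:
# Confirm the instance is up and a few key settings took effect.
pg_ctl status
psql -c "show shared_buffers" -c "show max_connections" -c "show synchronous_commit"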
mkdir /data01/pg12_1921
chown postgres:postgres /data01/pg12_1921
su - postgres
vi ~/.bash_profile
# Append the following:
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1921
export PGDATA=/data01/pg12_$PGPORT/pg_root
export LANG=en_US.utf8
export PGHOME=/usr/pgsql-12
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
initdb -D $PGDATA -U postgres -E SQL_ASCII --lc-collate=C --lc-ctype=en_US.utf8 --wal-segsize=1024
vi $PGDATA/postgresql.auto.conf
listen_addresses = '0.0.0.0'
# Listening port
port = 1921
max_connections = 1000
superuser_reserved_connections = 13
unix_socket_directories = '., /var/run/postgresql, /tmp'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 32GB
max_prepared_transactions = 1000
work_mem = 8MB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 512kB
effective_io_concurrency = 0
max_worker_processes = 128
max_parallel_maintenance_workers = 5
max_parallel_workers_per_gather = 0
parallel_leader_participation = off
max_parallel_workers = 40
backend_flush_after = 2MB
wal_level = replica
fsync = on
synchronous_commit = off
full_page_writes = on
wal_compression = on
wal_buffers = 16MB
wal_writer_delay = 10ms
wal_writer_flush_after = 1MB
checkpoint_timeout = 20min
max_wal_size = 128GB
min_wal_size = 16GB
checkpoint_completion_target = 0.2
archive_mode = on
archive_command = '/bin/date'
max_wal_senders = 32
wal_keep_segments = 0
wal_sender_timeout = 10s
max_replication_slots = 24
random_page_cost = 1.1
effective_cache_size = 128GB
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_line_prefix = '%m [%p] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
track_functions = pl
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
autovacuum_freeze_max_age = 800000000
autovacuum_multixact_freeze_max_age = 900000000
autovacuum_vacuum_cost_delay = 0ms
default_table_access_method = 'heap'
check_function_bodies = on
# Set the following two timeout parameters according to business requirements.
lock_timeout = 6s
idle_in_transaction_session_timeout = 60s
vacuum_freeze_table_age = 750000000
vacuum_multixact_freeze_table_age = 750000000
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'
Start database:
pg_ctl start
mkdir /data02/pg12_1922
chown postgres:postgres /data02/pg12_1922
su - postgres
export PGPORT=1922
export PGDATA=/data02/pg12_$PGPORT/pg_root
export PGHOST=$PGDATA
initdb -D $PGDATA -U postgres -E SQL_ASCII --lc-collate=C --lc-ctype=en_US.utf8 --wal-segsize=1024
vi $PGDATA/postgresql.auto.conf
listen_addresses = '0.0.0.0'
# Listening port
port = 1922
max_connections = 1000
superuser_reserved_connections = 13
unix_socket_directories = '., /var/run/postgresql, /tmp'
unix_socket_permissions = 0700
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 32GB
max_prepared_transactions = 1000
work_mem = 8MB
maintenance_work_mem = 2GB
dynamic_shared_memory_type = posix
vacuum_cost_delay = 0
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
bgwriter_flush_after = 512kB
effective_io_concurrency = 0
max_worker_processes = 128
max_parallel_maintenance_workers = 5
max_parallel_workers_per_gather = 0
parallel_leader_participation = off
max_parallel_workers = 40
backend_flush_after = 2MB
wal_level = replica
fsync = on
synchronous_commit = off
full_page_writes = on
wal_compression = on
wal_buffers = 16MB
wal_writer_delay = 10ms
wal_writer_flush_after = 1MB
checkpoint_timeout = 20min
max_wal_size = 128GB
min_wal_size = 16GB
checkpoint_completion_target = 0.2
archive_mode = on
archive_command = '/bin/date'
max_wal_senders = 32
wal_keep_segments = 0
wal_sender_timeout = 10s
max_replication_slots = 24
random_page_cost = 1.1
effective_cache_size = 128GB
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_line_prefix = '%m [%p] '
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
track_functions = pl
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 8
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
autovacuum_freeze_max_age = 800000000
autovacuum_multixact_freeze_max_age = 900000000
autovacuum_vacuum_cost_delay = 0ms
default_table_access_method = 'heap'
check_function_bodies = on
# Set the following two timeout parameters according to business requirements.
lock_timeout = 6s
idle_in_transaction_session_timeout = 60s
vacuum_freeze_table_age = 750000000
vacuum_multixact_freeze_table_age = 750000000
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'
Start database:
pg_ctl start
This test also includes the single I/O capacity and the synchronous commit latency of databases.
su - postgres
cd $PGDATA
postgres@iZbp12984c7sv43o6lbbxoZ-> pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 32804.010 ops/sec 30 usecs/op
fdatasync 28216.659 ops/sec 35 usecs/op
fsync 25325.493 ops/sec 39 usecs/op
fsync_writethrough n/a
open_sync 29137.619 ops/sec 34 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 14620.630 ops/sec 68 usecs/op
fdatasync 24263.659 ops/sec 41 usecs/op
fsync 21762.878 ops/sec 46 usecs/op
fsync_writethrough n/a
open_sync 13006.581 ops/sec 77 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 25182.114 ops/sec 40 usecs/op
2 * 8kB open_sync writes 13001.270 ops/sec 77 usecs/op
4 * 4kB open_sync writes 7233.661 ops/sec 138 usecs/op
8 * 2kB open_sync writes 3865.482 ops/sec 259 usecs/op
16 * 1kB open_sync writes 1979.762 ops/sec 505 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 24085.484 ops/sec 42 usecs/op
write, close, fsync 24072.174 ops/sec 42 usecs/op
Non-sync'ed 8kB writes:
write 192054.909 ops/sec 5 usecs/op
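The fdatasync line above (about 28,000 ops/sec, roughly 35 µs per call) is effectively the per-session ceiling for synchronous commits, because each commit must flush the WAL once. A minimal sketch of how to observe this from the database side, not part of the original test, assuming the pgbench tables initialized later in this article already exist:
# Compare single-session commit latency with synchronous_commit on vs. off.
# With on, each commit waits for one WAL flush (close to the fdatasync latency above);
# with off (as configured in postgresql.auto.conf), the flush happens in the background.
PGOPTIONS='-c synchronous_commit=on'  pgbench -M prepared -n -r -c 1 -j 1 -T 30
PGOPTIONS='-c synchronous_commit=off' pgbench -M prepared -n -r -c 1 -j 1 -T 30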
su - postgres
cd $PGDATA
postgres@iZbp1621kc3de3pm12a0exZ-> pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 6345.217 ops/sec 158 usecs/op
fdatasync 6087.221 ops/sec 164 usecs/op
fsync 3689.149 ops/sec 271 usecs/op
fsync_writethrough n/a
open_sync 3942.073 ops/sec 254 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 3020.085 ops/sec 331 usecs/op
fdatasync 4684.567 ops/sec 213 usecs/op
fsync 2785.027 ops/sec 359 usecs/op
fsync_writethrough n/a
open_sync 1814.915 ops/sec 551 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 2847.150 ops/sec 351 usecs/op
2 * 8kB open_sync writes 1920.576 ops/sec 521 usecs/op
4 * 4kB open_sync writes 1023.594 ops/sec 977 usecs/op
8 * 2kB open_sync writes 82.223 ops/sec 12162 usecs/op
16 * 1kB open_sync writes 63.008 ops/sec 15871 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 3677.066 ops/sec 272 usecs/op
write, close, fsync 3708.723 ops/sec 270 usecs/op
Non-sync'ed 8kB writes:
write 297195.065 ops/sec 3 usecs/op
su - postgres
export PGPORT=1922
export PGDATA=/data02/pg12_$PGPORT/pg_root
export PGHOST=$PGDATA
cd $PGDATA
postgres@iZbp1621kc3de3pm12a0exZ-> pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 6103.193 ops/sec 164 usecs/op
fdatasync 5901.509 ops/sec 169 usecs/op
fsync 3893.064 ops/sec 257 usecs/op
fsync_writethrough n/a
open_sync 3971.187 ops/sec 252 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 3010.795 ops/sec 332 usecs/op
fdatasync 5034.260 ops/sec 199 usecs/op
fsync 3648.353 ops/sec 274 usecs/op
fsync_writethrough n/a
open_sync 2033.980 ops/sec 492 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 3574.507 ops/sec 280 usecs/op
2 * 8kB open_sync writes 1996.539 ops/sec 501 usecs/op
4 * 4kB open_sync writes 1079.944 ops/sec 926 usecs/op
8 * 2kB open_sync writes 96.440 ops/sec 10369 usecs/op
16 * 1kB open_sync writes 75.738 ops/sec 13203 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 3716.489 ops/sec 269 usecs/op
write, close, fsync 3641.653 ops/sec 275 usecs/op
Non-sync'ed 8kB writes:
write 297784.864 ops/sec 3 usecs/op
yum install -y fio
yum install -y iotop
Test script:
vi fiotest
function RunFio
{
numjobs=$1 # number of test threads, e.g. 8
iodepth=$2 # upper limit on the number of in-flight I/Os, e.g. 64
bs=$3 # block size of a single I/O, e.g. 4K
rw=$4 # read/write pattern, e.g. randwrite
filename=$5 # test file name, e.g. /data01/test
ioengine=$6 # I/O engine: libaio, sync, etc.; see man fio
direct=$7 # whether to bypass the page cache; see man fio
nr_cpus=`cat /proc/cpuinfo |grep "processor" |wc -l`
if [ $nr_cpus -lt $numjobs ];then
echo "Numjobs is more than cpu cores, exit!"
exit -1
fi
let nu=$numjobs+1
cpulist=""
for ((i=1;i<10;i++))
do
list=`cat /sys/block/vdb/mq/*/cpu_list | awk '{if(i<=NF) print $i;}' i="$i" | tr -d ',' | tr '\n' ','`
if [ -z $list ];then
break
fi
cpulist=${cpulist}${list}
done
spincpu=`echo $cpulist | cut -d ',' -f 2-${nu}`
echo $spincpu
fio --ioengine=${ioengine} --runtime=60s --numjobs=${numjobs} --iodepth=${iodepth} --bs=${bs} --rw=${rw} --filename=${filename} --time_based=1 --direct=${direct} --name=test --group_reporting --cpus_allowed=$spincpu --cpus_allowed_policy=split --size=16G
}
# Set queue rq_affinity for the ESSD block devices, assuming 9 ESSD disks at vd[b-j].
echo 2 > /sys/block/vdb/queue/rq_affinity
echo 2 > /sys/block/vdc/queue/rq_affinity
echo 2 > /sys/block/vdd/queue/rq_affinity
echo 2 > /sys/block/vde/queue/rq_affinity
echo 2 > /sys/block/vdf/queue/rq_affinity
echo 2 > /sys/block/vdg/queue/rq_affinity
echo 2 > /sys/block/vdh/queue/rq_affinity
echo 2 > /sys/block/vdi/queue/rq_affinity
echo 2 > /sys/block/vdj/queue/rq_affinity
sleep 5
RunFio $1 $2 $3 $4 $5 $6 $7
# RunFio 16 64 8k randwrite /data01/test libaio 1
Or,
vi fiotest_1
function RunFio
{
numjobs=$1 # number of test threads, e.g. 8
iodepth=$2 # upper limit on the number of in-flight I/Os, e.g. 64
bs=$3 # block size of a single I/O, e.g. 4K
rw=$4 # read/write pattern, e.g. randwrite
filename=$5 # test file name, e.g. /data01/test
ioengine=$6 # I/O engine: libaio, sync, etc.; see man fio
direct=$7 # whether to bypass the page cache; see man fio
size=$8 # amount of data to write, e.g. 16G
nr_cpus=`cat /proc/cpuinfo |grep "processor" |wc -l`
if [ $nr_cpus -lt $numjobs ];then
echo "Numjobs is more than cpu cores, exit!"
exit -1
fi
let nu=$numjobs+1
cpulist=""
for ((i=1;i<10;i++))
do
list=`cat /sys/block/vdb/mq/*/cpu_list | awk '{if(i<=NF) print $i;}' i="$i" | tr -d ',' | tr '\n' ','`
if [ -z $list ];then
break
fi
cpulist=${cpulist}${list}
done
spincpu=`echo $cpulist | cut -d ',' -f 2-${nu}`
echo $spincpu
fio --ioengine=${ioengine} --runtime=60s --numjobs=${numjobs} --iodepth=${iodepth} --bs=${bs} --rw=${rw} --filename=${filename} --time_based=1 --direct=${direct} --name=test --group_reporting --cpus_allowed=$spincpu --cpus_allowed_policy=split --size=${size}
}
# Set queue rq_affinity for the ESSD block devices, assuming 9 ESSD disks at vd[b-j].
echo 2 > /sys/block/vdb/queue/rq_affinity
echo 2 > /sys/block/vdc/queue/rq_affinity
echo 2 > /sys/block/vdd/queue/rq_affinity
echo 2 > /sys/block/vde/queue/rq_affinity
echo 2 > /sys/block/vdf/queue/rq_affinity
echo 2 > /sys/block/vdg/queue/rq_affinity
echo 2 > /sys/block/vdh/queue/rq_affinity
echo 2 > /sys/block/vdi/queue/rq_affinity
echo 2 > /sys/block/vdj/queue/rq_affinity
sleep 5
RunFio $1 $2 $3 $4 $5 $6 $7 $8
# RunFio 16 64 8k randwrite /data01/test libaio 1 16G
. ./fiotest 32 8 8K randwrite /data01/test libaio 1
write: IOPS=131k, BW=1024MiB/s (1074MB/s)(29.0GiB/30003msec)
. ./fiotest 32 8 8K randread /data01/test libaio 1
read: IOPS=153k, BW=1193MiB/s (1250MB/s)(34.9GiB/30001msec)
. ./fiotest 32 8 8K write /data01/test libaio 1
write: IOPS=134k, BW=1050MiB/s (1101MB/s)(30.8GiB/30001msec)
. ./fiotest 32 8 8K read /data01/test libaio 1
read: IOPS=151k, BW=1182MiB/s (1240MB/s)(34.6GiB/30002msec)
. ./fiotest 32 8 8K randwrite /data01/test libaio 1
write: IOPS=106k, BW=825MiB/s (865MB/s)(24.2GiB/30003msec)
. ./fiotest 32 8 8K randread /data01/test libaio 1
read: IOPS=106k, BW=826MiB/s (866MB/s)(24.2GiB/30016msec)
. ./fiotest 32 8 8K write /data01/test libaio 1
write: IOPS=106k, BW=826MiB/s (866MB/s)(24.2GiB/30003msec)
. ./fiotest 32 8 8K read /data01/test libaio 1
read: IOPS=106k, BW=829MiB/s (869MB/s)(24.3GiB/30003msec)
. ./fiotest 32 8 8K randwrite /data02/test libaio 1
write: IOPS=106k, BW=826MiB/s (867MB/s)(24.2GiB/30003msec)
. ./fiotest 32 8 8K randread /data02/test libaio 1
read: IOPS=106k, BW=830MiB/s (870MB/s)(24.3GiB/30004msec)
. ./fiotest 32 8 8K write /data02/test libaio 1
write: IOPS=106k, BW=824MiB/s (865MB/s)(24.2GiB/30006msec)
. ./fiotest 32 8 8K read /data02/test libaio 1
read: IOPS=106k, BW=825MiB/s (866MB/s)(24.2GiB/30004msec)
su - postgres
pgbench -i -s 10000
1000000000 of 1000000000 tuples (100%) done (elapsed 1058.21 s, remaining 0.00 s)
checkpoint;
set max_parallel_workers_per_gather =16;
set enable_indexscan =on;
set enable_indexonlyscan =on;
set enable_bitmapscan =off;
set enable_seqscan=off;
explain (analyze) select count(*) from pgbench_accounts ;
set enable_indexscan =off;
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_seqscan=on;
explain (analyze) select count(*) from pgbench_accounts ;
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 10000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 75279853
latency average = 0.204 ms
latency stddev = 0.557 ms
tps = 627000.964875 (including connections establishing)
tps = 627126.432692 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.205 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 3073411
latency average = 4.997 ms
latency stddev = 25.946 ms
tps = 25595.742845 (including connections establishing)
tps = 25600.289357 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.001 \set bid random(1, 1 * :scale)
0.001 \set tid random(1, 10 * :scale)
0.001 \set delta random(-5000, 5000)
0.105 BEGIN;
3.458 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.339 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.220 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.206 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.196 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.488 END;
The write operation (data load) takes 1,058 seconds.
The read-only qps is 627,000 (the working set fits in the 512 GB of server memory, so reads are served almost entirely from cache).
The read/write tps is 25,595 and the read/write qps is 127,975 (each TPC-B transaction executes five SQL statements, so qps = 5 × tps).
su - postgres
pgbench -i -s 10000
1000000000 of 1000000000 tuples (100%) done (elapsed 948.07 s, remaining 0.00 s)
checkpoint;
set max_parallel_workers_per_gather =16;
set enable_indexscan =on;
set enable_indexonlyscan =on;
set enable_bitmapscan =off;
set enable_seqscan=off;
explain (analyze) select count(*) from pgbench_accounts ;
set enable_indexscan =off;
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_seqscan=on;
explain (analyze) select count(*) from pgbench_accounts ;
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 10000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 7802607
latency average = 1.968 ms
latency stddev = 9.345 ms
tps = 64979.905944 (including connections establishing)
tps = 64991.584022 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
1.970 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 3438342
latency average = 4.467 ms
latency stddev = 11.680 ms
tps = 28627.417199 (including connections establishing)
tps = 28633.295374 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.035 BEGIN;
3.967 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.084 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.117 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.084 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.071 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.109 END;
The write operation (data load) takes 948 seconds.
The read-only qps is 64,979 (the 192 GB of server memory gives a low cache hit rate, and the I/O response time is worse than that of the local disk).
The read/write tps is 28,627, and the read/write qps is 143,135.
su - postgres
export PGPORT=1922
export PGDATA=/data02/pg12_$PGPORT/pg_root
export PGHOST=$PGDATA
pgbench -i -s 10000
1000000000 of 1000000000 tuples (100%) done (elapsed 1082.59 s, remaining 0.00 s)
checkpoint;
set max_parallel_workers_per_gather =16;
set enable_indexscan =on;
set enable_indexonlyscan =on;
set enable_bitmapscan =off;
set enable_seqscan=off;
explain (analyze) select count(*) from pgbench_accounts ;
set enable_indexscan =off;
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_seqscan=on;
explain (analyze) select count(*) from pgbench_accounts ;
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 10000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 4473595
latency average = 3.433 ms
latency stddev = 5.682 ms
tps = 37253.374526 (including connections establishing)
tps = 37259.270387 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
3.434 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 3790414
latency average = 4.052 ms
latency stddev = 12.467 ms
tps = 31545.429236 (including connections establishing)
tps = 31550.985798 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.024 BEGIN;
3.700 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.053 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.077 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.062 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.081 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.052 END;
The write operation (data load) takes 1,082 seconds.
The read-only qps is 37,253 (the 192 GB of server memory gives a low cache hit rate, and the I/O response time is worse than that of the local disk).
The read/write tps is 31,545, and the read/write qps is 157,725.
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+---------+-------------
public | pgbench_accounts | table | postgres | 1251 GB |
public | pgbench_branches | table | postgres | 3576 kB |
public | pgbench_history | table | postgres | 0 bytes |
public | pgbench_tellers | table | postgres | 42 MB |
(4 rows)
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-----------------------+-------+----------+------------------+---------+-------------
public | pgbench_accounts_pkey | index | postgres | pgbench_accounts | 209 GB |
public | pgbench_branches_pkey | index | postgres | pgbench_branches | 4408 kB |
public | pgbench_tellers_pkey | index | postgres | pgbench_tellers | 43 MB |
(3 rows)
10000000000 of 10000000000 tuples (100%) done (elapsed 9486.77 s, remaining 0.00 s)
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 100000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 19059491
latency average = 0.806 ms
latency stddev = 14.075 ms
tps = 158451.015605 (including connections establishing)
tps = 158486.055663 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.813 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Total DISK READ : 4.56 G/s | Total DISK WRITE : 16.16 M/s
Actual DISK READ: 4.56 G/s | Actual DISK WRITE: 16.49 M/s
The read throughput peaked at about 7 GB/s.
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 2969763
latency average = 5.171 ms
latency stddev = 27.769 ms
tps = 24720.906138 (including connections establishing)
tps = 24727.358605 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.001 \set bid random(1, 1 * :scale)
0.001 \set tid random(1, 10 * :scale)
0.001 \set delta random(-5000, 5000)
0.088 BEGIN;
3.596 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.304 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.223 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.218 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.412 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.340 END;
10000000000 of 10000000000 tuples (100%) done (elapsed 8195.55 s, remaining 0.00 s)
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 100000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 2536629
latency average = 6.055 ms
latency stddev = 7.617 ms
tps = 21123.418940 (including connections establishing)
tps = 21126.649320 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
6.054 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Total DISK READ : 793.93 M/s | Total DISK WRITE : 1475.73 K/s
Actual DISK READ: 793.21 M/s | Actual DISK WRITE: 1351.46 K/s
The read throughput peaked at a little over 900 MB/s.
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 2071345
latency average = 7.415 ms
latency stddev = 9.884 ms
tps = 17243.124457 (including connections establishing)
tps = 17246.457208 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.019 BEGIN;
7.153 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.047 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.077 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.051 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.037 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.029 END;
10000000000 of 10000000000 tuples (100%) done (elapsed 7983.23 s, remaining 0.00 s)
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 100000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 2702617
latency average = 5.683 ms
latency stddev = 4.769 ms
tps = 22504.744988 (including connections establishing)
tps = 22508.795301 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
5.681 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
postgres=# select wait_event,count(*) from pg_stat_activity group by 1 order by 2 desc;
wait_event | count
---------------------+-------
DataFileRead | 128
BgWriterMain | 1
| 1
AutoVacuumMain | 1
CheckpointerMain | 1
LogicalLauncherMain | 1
WalWriterMain | 1
(7 rows)
pgbench -M prepared -n -r -P 3 -c 128 -j 128 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100000
query mode: prepared
number of clients: 128
number of threads: 128
duration: 120 s
number of transactions actually processed: 2052621
latency average = 7.482 ms
latency stddev = 5.452 ms
tps = 17087.070214 (including connections establishing)
tps = 17090.310318 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.020 BEGIN;
7.224 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.046 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.073 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.052 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.037 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.029 END;
The bandwidth and IOPS of cloud disks depend on the disk capacity as well as on the ECS instance type. Therefore, a 52-core (104-thread), 384 GB ECS instance with a 20 TB ESSD PL3 disk is tested additionally.
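For reference, the nominal IOPS figures used in the tables below are consistent with the published ESSD PL3 formula, min(1,800 + 50 × capacity_GiB, 1,000,000). The formula is quoted here as an assumption from Alibaba Cloud's ESSD documentation; a quick check:
# Nominal ESSD PL3 IOPS as a function of capacity (sketch).
pl3_iops() {
  local cap_gib=$1
  local iops=$(( 1800 + 50 * cap_gib ))
  (( iops > 1000000 )) && iops=1000000
  echo "$iops"
}
pl3_iops 1800    # 1.8 TB -> 91800   (matches the 91,800 nominal IOPS below)
pl3_iops 20480   # 20 TB  -> 1000000 (capped at 1,000,000)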
52-core (104-thread), 384 GB ECS with 20 TB ESSD PL3:
pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 6358.070 ops/sec 157 usecs/op
fdatasync 6221.143 ops/sec 161 usecs/op
fsync 3922.576 ops/sec 255 usecs/op
fsync_writethrough n/a
open_sync 4078.494 ops/sec 245 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 3237.264 ops/sec 309 usecs/op
fdatasync 5393.018 ops/sec 185 usecs/op
fsync 3499.650 ops/sec 286 usecs/op
fsync_writethrough n/a
open_sync 2117.949 ops/sec 472 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 3684.151 ops/sec 271 usecs/op
2 * 8kB open_sync writes 2083.492 ops/sec 480 usecs/op
4 * 4kB open_sync writes 1142.101 ops/sec 876 usecs/op
8 * 2kB open_sync writes 85.792 ops/sec 11656 usecs/op
16 * 1kB open_sync writes 69.520 ops/sec 14384 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 3878.116 ops/sec 258 usecs/op
write, close, fsync 3795.515 ops/sec 263 usecs/op
Non-sync'ed 8kB writes:
write 325926.283 ops/sec 3 usecs/op
1 billion:
pgbench -i -s 10000
1000000000 of 1000000000 tuples (100%) done (elapsed 1058.21 s, remaining 0.00 s)
checkpoint;
set max_parallel_workers_per_gather =16;
set enable_indexscan =on;
set enable_indexonlyscan =on;
set enable_bitmapscan =off;
set enable_seqscan=off;
explain (analyze) select count(*) from pgbench_accounts ;
set enable_indexscan =off;
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_seqscan=on;
explain (analyze) select count(*) from pgbench_accounts ;
pgbench -M prepared -n -r -P 3 -c 52 -j 52 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 10000
query mode: prepared
number of clients: 52
number of threads: 52
duration: 120 s
number of transactions actually processed: 106501930
latency average = 0.058 ms
latency stddev = 0.017 ms
tps = 887505.958088 (including connections establishing)
tps = 887576.878385 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.058 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
pgbench -M prepared -n -r -P 3 -c 52 -j 52 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: prepared
number of clients: 52
number of threads: 52
duration: 120 s
number of transactions actually processed: 9363834
latency average = 0.666 ms
latency stddev = 4.141 ms
tps = 78006.043823 (including connections establishing)
tps = 78014.124808 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.001 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.020 BEGIN;
0.417 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.039 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.053 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.046 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.046 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.044 END;
10 billion:
pgbench -i -s 100000
10000000000 of 10000000000 tuples (100%) done (elapsed 7789.99 s, remaining 0.00 s)
pgbench -M prepared -n -r -P 3 -c 52 -j 52 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 100000
query mode: prepared
number of clients: 52
number of threads: 52
duration: 120 s
number of transactions actually processed: 3833782
latency average = 1.627 ms
latency stddev = 1.566 ms
tps = 31946.758650 (including connections establishing)
tps = 31949.973049 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
1.628 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
pgbench -M prepared -n -r -P 3 -c 52 -j 52 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100000
query mode: prepared
number of clients: 52
number of threads: 52
duration: 120 s
number of transactions actually processed: 2287129
latency average = 2.728 ms
latency stddev = 2.637 ms
tps = 19056.932156 (including connections establishing)
tps = 19058.904144 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.016 BEGIN;
2.504 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.038 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.066 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.046 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.031 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.024 END;
Storage | Disk Configuration | Nominal IOPS of the Disk | CPU | Memory |
---|---|---|---|---|
ESSD PL3 | 20 TB | 1,000,000, 4 GB/s bandwidth | 52 cores 104 threads 2.5 GHz | 384 GB |
Test Case | 104-thread machine with 20 TB ESSD PL3 |
---|---|
fdatasync 8 KB (ops/sec) | 6,221 |
Test Case | 104-thread machine with 20 TB ESSD PL3 |
---|---|
8k random write | IOPS=167k, BW=1,303 MiB/s |
8k random read | IOPS=141k, BW=1,103 MiB/s |
8k sequential write | IOPS=158k, BW=1,231 MiB/s |
8K sequential read | IOPS=121k, BW=943 MiB/s |
Test Case | 104-thread machine with 20 TB ESSD PL3 |
---|---|
1 billion write | 1,058 seconds |
1 billion read-only | qps: 887,505 |
1 billion read and write | tps: 78,006, qps: 390,030 |
Test Case | 104-thread machine with 20 TB ESSD PL3 |
---|---|
10 billion write | 7,789 seconds |
10 billion read-only | qps: 31,946 |
10 billion read and write | tps: 19,056, qps: 95,280 |
26-core 52-thread machine with 192 GB memory and a single 1.8 TB ESSD PL3:
export PGPORT=1923
export PGDATA=/data03/pg12_$PGPORT/pg_root
export PGHOST=$PGDATA
initdb -D $PGDATA -U postgres -E SQL_ASCII --lc-collate=C --lc-ctype=en_US.utf8 --wal-segsize=1024
postgres@iZbp1621kc3de3pm12a0exZ-> pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 6455.059 ops/sec 155 usecs/op
fdatasync 6114.725 ops/sec 164 usecs/op
fsync 4182.573 ops/sec 239 usecs/op
fsync_writethrough n/a
open_sync 4318.160 ops/sec 232 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 3187.113 ops/sec 314 usecs/op
fdatasync 5170.527 ops/sec 193 usecs/op
fsync 3430.941 ops/sec 291 usecs/op
fsync_writethrough n/a
open_sync 2123.727 ops/sec 471 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 3902.495 ops/sec 256 usecs/op
2 * 8kB open_sync writes 2136.396 ops/sec 468 usecs/op
4 * 4kB open_sync writes 1138.692 ops/sec 878 usecs/op
8 * 2kB open_sync writes 90.899 ops/sec 11001 usecs/op
16 * 1kB open_sync writes 74.271 ops/sec 13464 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 4072.978 ops/sec 246 usecs/op
write, close, fsync 4136.551 ops/sec 242 usecs/op
Non-sync'ed 8kB writes:
write 296604.607 ops/sec 3 usecs/op
. ./fiotest 32 8 8K randwrite /data03/test libaio 1 16G
write: IOPS=91.9k, BW=718MiB/s (753MB/s)(42.1GiB/60019msec)
. ./fiotest 32 8 8K randread /data03/test libaio 1 16G
read: IOPS=92.2k, BW=720MiB/s (755MB/s)(42.2GiB/60003msec)
. ./fiotest 32 8 8K write /data03/test libaio 1 16G
write: IOPS=91.9k, BW=718MiB/s (753MB/s)(42.1GiB/60003msec)
. ./fiotest 32 8 8K read /data03/test libaio 1 16G
read: IOPS=92.0k, BW=719MiB/s (754MB/s)(42.2GiB/60026msec)
pgbench -i -s 10000
1000000000 of 1000000000 tuples (100%) done (elapsed 735.47 s, remaining 0.00 s)
checkpoint;
set max_parallel_workers_per_gather =16;
set enable_indexscan =on;
set enable_indexonlyscan =on;
set enable_bitmapscan =off;
set enable_seqscan=off;
explain (analyze) select count(*) from pgbench_accounts ;
set enable_indexscan =off;
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_seqscan=on;
explain (analyze) select count(*) from pgbench_accounts ;
pgbench -M prepared -n -r -P 3 -c 32 -j 32 -T 120 -S
transaction type: <builtin: select only>
scaling factor: 10000
query mode: prepared
number of clients: 32
number of threads: 32
duration: 120 s
number of transactions actually processed: 9111047
latency average = 0.421 ms
latency stddev = 1.461 ms
tps = 75913.573252 (including connections establishing)
tps = 75917.478749 (excluding connections establishing)
statement latencies in milliseconds:
0.000 \set aid random(1, 100000 * :scale)
0.421 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
pgbench -M prepared -n -r -P 3 -c 32 -j 32 -T 120
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10000
query mode: prepared
number of clients: 32
number of threads: 32
duration: 120 s
number of transactions actually processed: 4021731
latency average = 0.955 ms
latency stddev = 1.840 ms
tps = 33511.413835 (including connections establishing)
tps = 33513.166609 (excluding connections establishing)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.000 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.019 BEGIN;
0.756 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.036 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
0.046 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
0.039 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.032 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
0.025 END;
Storage | Disk Configuration | Nominal IOPS of the Disk | CPU | Memory |
---|---|---|---|---|
Local SSD | 1.8 TB * 8 | 240,000 for each disk | 32 cores 64 threads 2.5 GHz | 512 GB |
ESSD PL3 | 1.8 TB | 91,800 | 26 cores 52 threads 2.5 GHz | 192 GB |
ESSD PL3 | 1.8 TB * 8 | 91,800 for each disk | 26 cores 52 threads 2.5 GHz | 192 GB |
ESSD PL3 | 20 TB | 1,000,000 | 26 cores 52 threads 2.5 GHz | 192 GB |
ESSD PL3 | 20 TB | 1,000,000 | 52 cores 104 threads 2.5 GHz | 384 GB |
This test also includes the single I/O capacity and the synchronous commit latency of databases.
Test Case | Local SSD 1.8 TB * 8 | ESSD PL3 1.8 TB * 8 | (26-core) ESSD PL3 20 TB | (52-core) ESSD PL3 20 TB | (26-core) ESSD PL3 1.8 TB |
---|---|---|---|---|---|
fdatasync 8 KB (ops/sec) | 28,216 | 6,087 | 5,901 | 6,221 | 6,114 |
Test Case | Local SSD 1.8 TB * 8 | ESSD PL3 1.8 TB * 8 | (26-core) ESSD PL3 20 TB | (52-core) ESSD PL3 20 TB | (26-core) ESSD PL3 1.8 TB |
---|---|---|---|---|---|
8k random write | IOPS=131k, BW=1024 MiB/s | IOPS=106k, BW=825 MiB/s | IOPS=106k, BW=826 MiB/s | IOPS=167k, BW=1303 MiB/s | IOPS=91.9k, BW=718 MiB/s |
8k random read | IOPS=153k, BW=1193 MiB/s | IOPS=106k, BW=826 MiB/s | IOPS=106k, BW=830 MiB/s | IOPS=141k, BW=1103 MiB/s | IOPS=92.2k, BW=720 MiB/s |
8 KB sequential write | IOPS=134k, BW=1050 MiB/s | IOPS=106k, BW=826 MiB/s | IOPS=106k, BW=824 MiB/s | IOPS=158k, BW=1231 MiB/s | IOPS=91.9k, BW=718 MiB/s |
8 KB sequential read | IOPS=151k, BW=1182 MiB/s | IOPS=106k, BW=829 MiB/s | IOPS=106k, BW=825 MiB/s | IOPS=121k, BW=943 MiB/s | IOPS=92.0k, BW=719 MiB/s |
Test Case | Local SSD 1.8 TB * 8 | ESSD PL3 1.8 TB * 8 | (26-core) ESSD PL3 20 TB | (52-core) ESSD PL3 20 TB | (26-core) ESSD PL3 1.8 TB |
---|---|---|---|---|---|
1 billion write | 1,058 seconds | 948 seconds | 1,082 seconds | 1,058 seconds | 735 seconds |
1 billion read-only | qps: 627,000 (all reads hit the 512 GB of server memory) | qps: 64,979 (the 192 GB of server memory gives a low cache hit rate, and the I/O rt is worse than local disk) | qps: 37,253 (the 192 GB of server memory gives a low cache hit rate, and the I/O rt is worse than local disk) | qps: 887,505 (all reads hit the 384 GB of server memory) | qps: 75,913 (the 192 GB of server memory gives a low cache hit rate, and the I/O rt is worse than local disk) |
1 billion read and write | tps: 25,595; qps: 127,975 | tps: 28,627; qps: 143,135 | tps: 31,545; qps: 157,725 | tps: 78,006; qps: 390,030 | tps: 33,511; qps: 167,555 |
Test Case | Local SSD 1.8 TB * 8 | ESSD PL3 1.8TB * 8 | (26-core) ESSD PL3 20 TB | (52-core) ESSD PL3 20 TB |
---|---|---|---|---|
10 billion write | 9,486 seconds | 8,195 seconds | 7,983 seconds | 7,789 seconds |
10 billion read-only | qps: 158,451 | qps: 21,123 | qps: 22,504 | qps: 31,946 |
10 billion read and write | tps: 24,720; qps: 123,600 | tps: 17,243; qps: 86,215 | tps: 17,087; qps: 85,435 | tps: 19,056; qps: 95,280 |
The single I/O latency of the cloud disk is higher than that of the local SSD (about 164 µs vs. 30 µs for an 8 KB flush). What is the solution?
Parallelization is needed to compensate for the cloud disk I/O latency:
1) Partition tables as much as possible to improve vacuum throughput.
2) Adopt a 32 KB block size for analytical use cases to increase the size of a single I/O and therefore the throughput.
3) Enable read-ahead (prefetch) at the operating-system or database level; see the sketch after this list.
4) The IOPS and bandwidth of cloud disks are restricted not only by the disk size but also by the ECS instance type, so higher-specification ECS instances are recommended for I/O-sensitive use cases.
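A sketch of what item 3 can look like in practice, combining OS-level read-ahead on the striped logical volume with PostgreSQL-side prefetch. The device path and the values are illustrative examples, not measured recommendations:
# OS-level read-ahead on the logical volume (value in 512-byte sectors; 16384 = 8 MB).
blockdev --setra 16384 /dev/mapper/vgdata01-lv01
# Database-side prefetch for bitmap heap scans (the tests above used 0; a higher value suits ESSD).
psql -c "alter system set effective_io_concurrency = 256;"
psql -c "select pg_reload_conf();"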
Table creation, index builds, and queries can all be parallelized; see the parallel query features of PostgreSQL 12.
Enable backend flush to involve more processes in write-back, which improves throughput and reduces the checkpoint burden; see the sketch below.
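A minimal sketch of the backend-flush point, reusing the values already present in the test configuration above rather than new recommendations:
# Spread write-back across backends and the bgwriter (values as used in postgresql.auto.conf above).
psql -c "alter system set backend_flush_after = '2MB';"
psql -c "alter system set bgwriter_flush_after = '512kB';"
psql -c "select pg_reload_conf();"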