By digoal
The PolarDB for PostgreSQL project open-sourced the core storage architecture code on September 1, 2021. This architecture is similar to the Oracle RAC architecture: it supports multiple instances on a single shared storage volume, enabling one writer with many readers (write once, read many) and rapid read scaling. Currently, the open-source code can be quickly deployed on a stand-alone machine. The code of the distributed PolarDB File System (PFS) will be available soon; with PFS, you will be able to run PolarDB on distributed shared storage.
This is the industry's first open-source cloud-native database of its kind, and it contains a large number of core technologies developed in-house for Alibaba Cloud databases. In the future, features such as cross-machine parallel execution will be introduced to enhance analytical capabilities. The original distributed version continues to evolve on the distributed branch.
Stay tuned for more information: https://github.com/alibaba/PolarDB-for-PostgreSQL
Environment: CentOS 7.9, 16 cores, 64 GB memory, 2 TB SSD.
Please see Why Does PolarDB Solve the Performance Problem of FPW? for more information about machine deployment.
yum -y install coreutils glib2 lrzsz dstat sysstat e4fsprogs xfsprogs ntp readline-devel zlib-devel openssl-devel pam-devel libxml2-devel libxslt-devel python-devel tcl-devel gcc gcc-c++ make smartmontools flex bison perl-devel perl-ExtUtils* openldap-devel jadetex openjade bzip2 git iotop lvm2 perf centos-release-scl
rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
parted -a optimal -s /dev/vdb mklabel gpt mkpart primary 1MiB 100%
mkfs.ext4 /dev/vdb1 -m 0 -O extent,uninit_bg -b 4096 -L lv01
vi /etc/fstab
LABEL=lv01 /data01 ext4 defaults,noatime,nodiratime,nodelalloc,barrier=0,data=writeback 0 0
mkdir /data01
mount -a
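You can verify that the file system is mounted with the intended options (a quick check; the label and mount point come from the fstab entry above):
df -h /data01
mount | grep data01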
vi /etc/sysctl.conf
# add by digoal.zhou
fs.aio-max-nr = 1048576
fs.file-max = 76724600
# Optional: set the kernel.core_pattern parameter to /data01/corefiles/core_%e_%u_%t_%s.%p.
# Create the /data01/corefiles directory that stores core dumps with permission 777 before testing. If a symbolic link is used, grant permission 777 on the target directory as well.
kernel.sem = 4096 2147483647 2147483646 512000
# Specify the semaphore limits. You can run ipcs -l or ipcs -u to view the current limits and usage. Each group of 16 processes requires one semaphore set containing 17 semaphores.
kernel.shmall = 107374182
# Specify the total size of shared memory segments. Recommended value: 80% of the memory capacity. Unit: pages.
kernel.shmmax = 274877906944
# Specify the maximum size of a single shared memory segment. Recommended value: 50% of the memory capacity. Unit: bytes. In PostgreSQL versions later than 9.2, shared memory usage drops significantly.
kernel.shmmni = 819200
# Specify the total number of shared memory segments that can be generated. There are at least 2 shared memory segments within each PostgreSQL cluster.
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 262144
# The default size, in bytes, of the socket receive buffer.
net.core.rmem_max = 4194304
# The maximum size, in bytes, of the socket receive buffer.
net.core.wmem_default = 262144
# The default size, in bytes, of the socket send buffer.
net.core.wmem_max = 4194304
# The maximum size, in bytes, of the socket send buffer.
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_mem = 8388608 12582912 16777216
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syncookies = 1
# Enable SYN cookies. If the SYN backlog queue overflows, SYN cookies help defend against small-scale SYN flood attacks.
net.ipv4.tcp_timestamps = 1
# Enable TCP timestamps (required for tcp_tw_reuse); this helps reduce TIME-WAIT buildup.
net.ipv4.tcp_tw_recycle = 0
# If you set this parameter to 1, sockets in the TIME-WAIT state are recycled quickly. However, when network address translation (NAT) is used, TCP connections may fail. We recommend that you set this parameter to 0 on database servers.
net.ipv4.tcp_tw_reuse = 1
# Enable the reuse function. This function enables network sockets in the TIME-WAIT state to be reused over new TCP connections.
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_rmem = 8192 87380 16777216
net.ipv4.tcp_wmem = 8192 65536 16777216
net.nf_conntrack_max = 1200000
net.netfilter.nf_conntrack_max = 1200000
vm.dirty_background_bytes = 409600000
# When the volume of dirty pages reaches this limit, a background writeback process (for example, pdflush) is woken up to flush to disk the dirty pages generated more than n seconds ago, where n = dirty_expire_centisecs / 100.
# The default limit is 10% of the memory capacity. For machines with a large memory capacity, we recommend specifying the limit in bytes.
vm.dirty_expire_centisecs = 3000
# Specify the maximum period to retain dirty pages. Dirty pages are flushed to disks after the time period specified by this parameter elapses. The value 3000 indicates 30 seconds.
vm.dirty_ratio = 95
# If background flushing is too slow and dirty pages exceed 95% of memory, processes that write data must flush dirty pages to disk themselves (for example, via fsync or fdatasync).
# Set this parameter high enough to prevent user processes from being forced to flush dirty pages synchronously. This makes it possible to run multiple ApsaraDB RDS instances on a single server and use control groups (cgroups) to limit the IOPS of each instance.
vm.dirty_writeback_centisecs = 100
# Specify the time interval at which the background scheduling process (such as pdflush) flushes dirty pages to disks. The value 100 indicates 1 second.
vm.swappiness = 0
# Minimize swapping: a value of 0 makes the kernel avoid swap as much as possible.
vm.mmap_min_addr = 65536
vm.overcommit_memory = 0
# Control memory overcommit. The kernel can allow processes to allocate more virtual memory than is physically available. If you set this parameter to 1, the system always considers the available memory sufficient. If the test environment has little memory, we recommend that you set this parameter to 1.
vm.overcommit_ratio = 90
# Specify the memory capacity that can be allocated when the overcommit_memory parameter is set to 2.
vm.zone_reclaim_mode = 0
# Disable NUMA zone reclaim. Non-uniform memory access (NUMA) can also be disabled at the kernel level.
net.ipv4.ip_local_port_range = 40000 65535
# Specify the range of TCP and UDP port numbers that are automatically allocated locally.
fs.nr_open = 20480000
# Specify the maximum number of file handles a single process can open.
# Take note of the following parameters:
# vm.extra_free_kbytes = 4096000
# vm.min_free_kbytes = 2097152
# We recommend that you set vm.min_free_kbytes to 1 GB for every 32 GB of memory.
# If the physical host does not provide much memory, we recommend that you do not configure vm.extra_free_kbytes and vm.min_free_kbytes.
# vm.nr_hugepages = 66536
# If the size of the shared buffer exceeds 64 GB, we recommend that you use huge pages. You can check the huge page size via the Hugepagesize field in the /proc/meminfo file.
# vm.lowmem_reserve_ratio = 1 1 1
# If the memory capacity exceeds 64 GB, we recommend that you set this parameter. Otherwise, we recommend that you retain the default value 256 256 32.
sysctl -p
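After reloading, you can spot-check a few values to confirm that the settings took effect (a minimal sanity check using parameters from the file above):
sysctl -n kernel.shmmax
sysctl -n vm.dirty_background_bytes
sysctl -n net.ipv4.tcp_tw_reuse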
vi /etc/security/limits.d/20-nproc.conf
# If nofile needs to exceed 1048576, you must first set fs.nr_open in sysctl to a larger value; after that change takes effect, you can raise nofile further.
* soft nofile 1024000
* hard nofile 1024000
* soft nproc unlimited
* hard nproc unlimited
* soft core unlimited
* hard core unlimited
* soft memlock unlimited
* hard memlock unlimited
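The limits apply to new sessions. After logging in again, you can verify them in the current shell (the values should match the file above):
ulimit -n # max open files
ulimit -c # core file size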
chmod +x /etc/rc.d/rc.local
vi /etc/rc.local
touch /var/lock/subsys/local
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
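The rc.local change only takes effect at boot. To disable transparent huge pages immediately and confirm the active mode (shown in brackets in the output):
echo never > /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/enabled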
yum install -y readline-devel zlib-devel perl-CPAN bison flex git
cpan -fi Test::More IPC::Run
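You can verify that the Perl modules needed by the test harness were installed (a minimal check; the message is printed only if both modules load):
perl -MTest::More -MIPC::Run -e 'print "perl modules ok\n"'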
useradd polardb -d /data01/polardb
su - polardb
cd ~
git clone https://github.com/alibaba/PolarDB-for-PostgreSQL.git
cd PolarDB-for-PostgreSQL
./polardb_build.sh --withrep --repnum=2 --withstandby
PS: To change the lc_collate used by initdb, modify the polardb_build.sh script. Likewise, to change parameters in postgresql.conf, modify polardb_build.sh.
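Optionally, you can confirm that each node is running with pg_ctl (a sketch based on the data directories created by the build script, which are listed later in this article):
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl -D $HOME/tmp_primary_dir_polardb_pg_1100_bld status
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl -D $HOME/tmp_replica_dir_polardb_pg_1100_bld1 status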
Test:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select version();'
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select * from pg_replication_slots;'
After the database is started, the shared memory segments can be listed with ipcs:
[polardb@iZbp15sgewxe2ioglp30z4Z PolarDB-for-PostgreSQL]$ ipcs
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 16 polardb 600 56 15
0x0052e6a9 17 polardb 600 56 7
0x0052ea91 18 polardb 600 56 7
0x0052ee79 19 polardb 600 56 6
------ Semaphore Arrays --------
key semid owner perms nsems
Directory Explanation:
cd ~
[polardb@iZbp15sgewxe2ioglp30z4Z ~]$ du -sh *
968M PolarDB-for-PostgreSQL PolarDB Source Code
92M tmp_basedir_polardb_pg_1100_bld PolarDB Binary Software
840M tmp_datadir_polardb_pg_1100_bld Shared Data of RW RO Instance
24M tmp_primary_dir_polardb_pg_1100_bld Local Content of RW Instance
23M tmp_replica_dir_polardb_pg_1100_bld1 Local Content of RO Instance
23M tmp_replica_dir_polardb_pg_1100_bld2 Local Content of RO Instance
40M tmp_standby_datadir_polardb_pg_1100_bld Shared Data of Standby Instance
23M tmp_standby_dir_polardb_pg_1100_bld Local Content of Standby Instance
Configure environment variables for ease of use:
vi ~/.bashrc
# Append the following parameters:
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/data01/polardb/tmp_primary_dir_polardb_pg_1100_bld
export LANG=en_US.utf8
export PGHOME=/data01/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
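Apply the variables to the current session; with PATH and the PG* variables set, psql no longer needs the full binary path or connection flags (assuming the settings above):
source ~/.bashrc
psql -c 'select 1;'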
Query:
postgres=# select version();
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 11beta2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
postgres=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
standby1 | | physical | | | f | t | 25125 | | | 0/185DCB0 |
replica2 | | physical | | | f | t | 25112 | | | 0/185DB98 |
replica1 | | physical | | | f | t | 25098 | | | 0/185DB98 |
(3 rows)
Why is it 11beta2? PolarDB for PG was developed from the 11beta2 code base, and later versions will be merged in gradually. We are looking forward to PG 14.
Those who know Greenplum will remember how Greenplum advanced to version 9.4 step by step, passing through every PG version along the way.
Perform stress testing:
pgbench -i -s 100
pgbench -M prepared -n -r -P 1 -c 16 -j 16 -T 120
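While pgbench runs, you can watch replication progress on the primary (a minimal sketch; pg_stat_replication is standard PostgreSQL, and the walsenders correspond to the slots shown earlier):
psql -c 'select application_name, state, replay_lsn from pg_stat_replication;'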
Some parameters:
postgres=# select name,setting from pg_settings where name ~ 'polar';
name | setting
--------------------------------------------+------------------------------------------------------------
polar_bg_replay_batch_size | 20000
polar_bgwriter_batch_size_flushlist | 100
polar_bgwriter_max_batch_size | 5000
polar_bgwriter_sleep_lsn_lag | 100
polar_buffer_copy_lsn_lag_with_cons_lsn | 100
polar_buffer_copy_min_modified_count | 5
polar_check_checkpoint_legal_interval | 1
polar_clog_max_local_cache_segments | 128
polar_clog_slot_size | 128
polar_copy_buffers | 16384
polar_datadir | file-dio:///data01/polardb/tmp_datadir_polardb_pg_1100_bld
polar_disk_name | data01
polar_enable_async_ddl_lock_replay | on
polar_enable_parallel_bgwriter | on
polar_enable_redo_logindex | on
polar_fullpage_keep_segments | 16
polar_hostid | 2
polar_logindex_bloom_blocks | 1024
polar_logindex_mem_size | 512
polar_logindex_table_batch_size | 100
polar_logindex_unit_test | 0
polar_max_logindex_files | 80
polar_openfile_with_readonly_in_replica | off
polar_parallel_bgwriter_check_interval | 10
polar_parallel_bgwriter_delay | 10
polar_parallel_bgwriter_enable_dynamic | on
polar_parallel_bgwriter_workers | 5
polar_parallel_new_bgwriter_threshold_lag | 1024
polar_parallel_new_bgwriter_threshold_time | 10
polar_read_ahead_xlog_num | 200
polar_storage_cluster_name |
polar_streaming_xlog_meta | on
polar_version | 1.1.15
polar_vfs.localfs_mode | on
polar_vfs.max_direct_io_size | 1048576
polar_vfs.pfs_force_mount | on
polar_worker.enable_polar_worker | on
polar_worker.polar_worker_check_interval | 5
polar_worker.prealloc_wal_file_num | 2
polar_worker.xlog_temp_outdate_time | -1
polar_xlog_page_buffers | 1024
polar_xlog_queue_buffers | 512
(42 rows)
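Each of these GUCs can also be inspected individually with SHOW, for example to confirm which node you are connected to or which version you are running:
psql -c 'show polar_hostid;'
psql -c 'show polar_version;'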
https://github.com/alibaba/PolarDB-for-PostgreSQL
We provide a script that uses the default configuration to compile PolarDB. This section describes how to use it to deploy a PolarDB instance with storage on your local disks. System requirements: CentOS 7.5 and above. The following setup steps were tested on CentOS 7.5.
1. Download the source code of PolarDB from https://github.com/alibaba/PolarDB-for-PostgreSQL/tree/main.
2. Install dependent packages:
sudo yum install readline-devel zlib-devel perl-CPAN bison flex
sudo cpan -fi Test::More IPC::Run
3. Select the command according to your needs to start deployment.
Run the following command to compile only the source code (without creating any instance):
./polardb_build.sh --noinit
Run the following command to compile the source code and create an instance with one node (the primary node). The port number is 5432.
./polardb_build.sh
Run the following command to compile the source code and create an instance with two nodes: one primary (read-write) node and one read-only replica node:
./polardb_build.sh --withrep --repnum=1
Run the following command to compile the source code and create an instance with three nodes: one primary node, one read-only replica node, and one standby node:
./polardb_build.sh --withrep --repnum=1 --withstandby
Run the following command to compile the source code and create an instance with four nodes: one primary node, two read-only replica nodes, and one standby node:
./polardb_build.sh --withrep --repnum=2 --withstandby
4. After the deployment, please refer to the following steps to check and test the created instance to make sure it is deployed correctly.
Run the following commands to check the instance:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select version();'
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -c 'select * from pg_replication_slots;'
Run the following command to perform a full regression test:
./polardb_build.sh --withrep --repnum=1 --withstandby -r-check-all -e -r-contrib -r-pl -r-external -r-installcheck-all
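As a quick functional check of the write-once-read-many architecture, you can attempt a write on a read-only node and expect it to be rejected (a sketch; port 5433 for the first replica is an assumption based on the build script's default port assignment):
psql -p 5433 -c 'create table t_should_fail(id int);'
# Expected: ERROR: cannot execute CREATE TABLE in a read-only transaction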