This topic describes the parallel INSERT feature of PolarDB for PostgreSQL.
Prerequisites
A PolarDB for PostgreSQL cluster that runs one of the following versions is created:
PostgreSQL 11 with a minor engine version of 1.1.17 or later
PostgreSQL 14 with a minor engine version of 14.8.12.0 or later
You can execute one of the following statements to query the minor engine version of your PolarDB for PostgreSQL cluster:
PostgreSQL 11
show polar_version;
PostgreSQL 14
select version();
Overview
PolarDB for PostgreSQL provides the elastic parallel query (ePQ) feature that allows an SQL query to be run on multiple compute nodes in parallel. This improves the query performance of PolarDB for PostgreSQL. You can also use the ePQ feature to write data in parallel by starting multiple processes on the primary node of a cluster. This increases the execution speed of the INSERT
statement.
The parallel INSERT
feature is applicable to the SQL statements that both read and write data, such as the INSERT INTO ... SELECT ...
statement. For the SELECT
section of the SQL statement, ePQ starts multiple processes to perform the query in parallel. For the INSERT
section of the SQL statement, ePQ starts multiple processes on the primary node to write data in parallel. The Motion operator is used to transfer data between the read and write processes.
The following types of tables support the parallel INSERT
feature:
Standard tables
Partitioned tables
Part of external tables
The parallel INSERT
feature allows you to dynamically adjust the degree of parallelism (DOP) of parallel writes by adjusting the number of write processes. This way, the query performance can be increased by up to three times if the query speed is fast enough.
How it works
Parallel queries and parallel writes are performed simultaneously in the form of a pipeline. The following figure shows the execution process of parallel queries and parallel writes.When multiple
INSERT
operations are performed in parallel by using the ePQ feature, the following steps are performed:
The ePQ optimizer generates a hierarchy chart based on the syntax tree that is obtained from the query.
The ePQ executor distributes the hierarchy chart to each compute node and creates processes for parallel queries and parallel writes. Then, the processes are started.
The processes of parallel queries read data shards in parallel and send the data to the Motion operator.
The processes of parallel writes obtain the data from the Motion operator and write the data in parallel.
Parameters
polar_px_dop_per_node
: the DOP of parallel queries.polar_px_insert_dop_num
: the DOP of parallel writes.
If you gradually increase the DOP of parallel writes when the DOP of parallel queries is low, the execution of SQL statements gradually speeds up and then remains at a specific level because the query speed cannot keep up with the write speed.
If you gradually increase the DOP of parallel writes when the DOP of parallel queries is high, the execution of SQL statements gradually speeds up and then remains at a specific level. This is because parallel writes can be performed only on the primary node of a cluster. In this case, the write speed cannot keep up with the query speed due to heavy lock contention.
Examples
Create a table named
t1
and a table namedt2
, and insert data into thet1
table.CREATE TABLE t1 (id INT); CREATE TABLE t2 (id INT); INSERT INTO t1 SELECT generate_series(1,100000);
Enable the ePQ feature and the parallel
INSERT
feature.SET polar_enable_px TO ON; SET polar_px_enable_insert_select TO ON;
Insert all data of the
t1
table into thet2
table by executing theINSERT
statement. View the execution plan of parallelINSERT
operations.EXPLAIN INSERT INTO t2 SELECT * FROM t1; QUERY PLAN ----------------------------------------------------------------------------------------- Insert on t2 (cost=0.00..952.87 rows=33334 width=4) -> Result (cost=0.00..0.00 rows=0 width=0) -> PX Hash 6:3 (slice1; segments: 6) (cost=0.00..432.04 rows=100000 width=8) -> Partial Seq Scan on t1 (cost=0.00..431.37 rows=16667 width=4) Optimizer: PolarDB PX Optimizer (5 rows)
PX Hash 6:3
indicates that six processes are started to query data from thet1
table in parallel and three processes are started to write data to thet2
table in parallel.Specify the
polar_px_insert_dop_num
parameter to dynamically adjust the DOP of parallel writes. Then, view the execution plan of parallelINSERT
operations.SET polar_px_insert_dop_num TO 12; EXPLAIN INSERT INTO t2 SELECT * FROM t1; QUERY PLAN ------------------------------------------------------------------------------------------ Insert on t2 (cost=0.00..952.87 rows=8334 width=4) -> Result (cost=0.00..0.00 rows=0 width=0) -> PX Hash 6:12 (slice1; segments: 6) (cost=0.00..432.04 rows=100000 width=8) -> Partial Seq Scan on t1 (cost=0.00..431.37 rows=16667 width=4) Optimizer: PolarDB PX Optimizer (5 rows)
PX Hash 6:12
in the execution plan indicates that six processes are started to query data from thet1
table in parallel, which remains unchanged, and12
processes are started to write data to thet2
table in parallel.