Perform parallel INSERT operations

Updated at: 2023-12-07 06:00

This topic describes the parallel INSERT feature of PolarDB for PostgreSQL.

Prerequisites

A PolarDB for PostgreSQL cluster that runs one of the following versions is created:

  • PostgreSQL 11 with a minor engine version of 1.1.17 or later

  • PostgreSQL 14 with a minor engine version of 14.8.12.0 or later

Note

You can execute one of the following statements to query the minor engine version of your PolarDB for PostgreSQL cluster:

  • PostgreSQL 11

    show polar_version;
  • PostgreSQL 14

    select version();

Overview

PolarDB for PostgreSQL provides the elastic parallel query (ePQ) feature that allows an SQL query to be run on multiple compute nodes in parallel. This improves the query performance of PolarDB for PostgreSQL. You can also use the ePQ feature to write data in parallel by starting multiple processes on the primary node of a cluster. This increases the execution speed of the INSERT statement.

The parallel INSERT feature is applicable to the SQL statements that both read and write data, such as the INSERT INTO ... SELECT ... statement. For the SELECT section of the SQL statement, ePQ starts multiple processes to perform the query in parallel. For the INSERT section of the SQL statement, ePQ starts multiple processes on the primary node to write data in parallel. The Motion operator is used to transfer data between the read and write processes.

The following types of tables support the parallel INSERT feature:

  • Standard tables

  • Partitioned tables

  • Part of external tables

The parallel INSERT feature allows you to dynamically adjust the degree of parallelism (DOP) of parallel writes by adjusting the number of write processes. This way, the query performance can be increased by up to three times if the query speed is fast enough.

How it works

Parallel queries and parallel writes are performed simultaneously in the form of a pipeline. The following figure shows the execution process of parallel queries and parallel writes.Execution processWhen multiple INSERT operations are performed in parallel by using the ePQ feature, the following steps are performed:

  1. The ePQ optimizer generates a hierarchy chart based on the syntax tree that is obtained from the query.

  2. The ePQ executor distributes the hierarchy chart to each compute node and creates processes for parallel queries and parallel writes. Then, the processes are started.

  3. The processes of parallel queries read data shards in parallel and send the data to the Motion operator.

  4. The processes of parallel writes obtain the data from the Motion operator and write the data in parallel.

Parameters

  • polar_px_dop_per_node: the DOP of parallel queries.

  • polar_px_insert_dop_num: the DOP of parallel writes.

If you gradually increase the DOP of parallel writes when the DOP of parallel queries is low, the execution of SQL statements gradually speeds up and then remains at a specific level because the query speed cannot keep up with the write speed.

If you gradually increase the DOP of parallel writes when the DOP of parallel queries is high, the execution of SQL statements gradually speeds up and then remains at a specific level. This is because parallel writes can be performed only on the primary node of a cluster. In this case, the write speed cannot keep up with the query speed due to heavy lock contention.

Examples

  1. Create a table named t1 and a table named t2, and insert data into the t1 table.

    CREATE TABLE t1 (id INT);
    CREATE TABLE t2 (id INT);
    INSERT INTO t1 SELECT generate_series(1,100000);
  2. Enable the ePQ feature and the parallel INSERT feature.

    SET polar_enable_px TO ON;
    SET polar_px_enable_insert_select TO ON;
  3. Insert all data of the t1 table into the t2 table by executing the INSERT statement. View the execution plan of parallel INSERT operations.

    EXPLAIN INSERT INTO t2 SELECT * FROM t1;
                                           QUERY PLAN
    -----------------------------------------------------------------------------------------
     Insert on t2  (cost=0.00..952.87 rows=33334 width=4)
       ->  Result  (cost=0.00..0.00 rows=0 width=0)
             ->  PX Hash 6:3  (slice1; segments: 6)  (cost=0.00..432.04 rows=100000 width=8)
                   ->  Partial Seq Scan on t1  (cost=0.00..431.37 rows=16667 width=4)
     Optimizer: PolarDB PX Optimizer
    (5 rows)

    PX Hash 6:3 indicates that six processes are started to query data from the t1 table in parallel and three processes are started to write data to the t2 table in parallel.

  4. Specify the polar_px_insert_dop_num parameter to dynamically adjust the DOP of parallel writes. Then, view the execution plan of parallel INSERT operations.

    SET polar_px_insert_dop_num TO 12;
    EXPLAIN INSERT INTO t2 SELECT * FROM t1;
                                            QUERY PLAN
    ------------------------------------------------------------------------------------------
     Insert on t2  (cost=0.00..952.87 rows=8334 width=4)
       ->  Result  (cost=0.00..0.00 rows=0 width=0)
             ->  PX Hash 6:12  (slice1; segments: 6)  (cost=0.00..432.04 rows=100000 width=8)
                   ->  Partial Seq Scan on t1  (cost=0.00..431.37 rows=16667 width=4)
     Optimizer: PolarDB PX Optimizer
    (5 rows)

    PX Hash 6:12 in the execution plan indicates that six processes are started to query data from the t1 table in parallel, which remains unchanged, and 12 processes are started to write data to the t2 table in parallel.

  • On this page (1, O)
  • Prerequisites
  • Overview
  • How it works
  • Parameters
  • Examples
Feedback