All Products
Search
Document Center

PolarDB:Parallel writes and queries of OSS foreign tables

Last Updated:Apr 02, 2024

PolarDB for PostgreSQL supports the Elastic Parallel Query (ePQ) feature for parallel writes and queries of Object Storage Service (OSS) foreign tables.

Prerequisites

A PolarDB for PostgreSQL cluster that runs one of the following versions is created:

  • PostgreSQL 14 with a minor version of 14.9.13.0 or later

  • PostgreSQL 11 with a minor version of 1.1.35 or later

Note

You can execute one of the following statements to view the minor version of your PolarDB for PostgreSQL cluster:

  • PostgreSQL 14

    select version();
  • PostgreSQL 11

    show polar_version;

Background information

PolarDB for PostgreSQL allows you to create OSS foreign tables by using the oss_fdw extension. The foreign tables are physically stored in OSS and only the metadata of the foreign tables is stored in databases. This way, you can archive the historical data or cold data that is not frequently used in databases to OSS as foreign tables to reduce the storage costs of databases. The foreign data wrapper (FDW) standard used by PolarDB for PostgreSQL ensures that you can still read and write OSS foreign tables by executing standard SQL statements under specified conditions.

When you write data from a local table to an OSS foreign table for data archiving, PolarDB for PostgreSQL starts a process to write data to the OSS foreign table by default. In this case, a single process is used to upload data to OSS over the network. In scenarios in which a large amount of data needs to be archived, the single-process write mode is inefficient due to low network bandwidth utilization.

When you query historical archived data in an OSS foreign table, PolarDB for PostgreSQL starts a process to query full data in the OSS foreign table by default. In this case, a single process is used to download data from OSS over the network. In scenarios in which a large amount of archived data needs to be queried, the single-process query mode is inefficient due to low network bandwidth utilization.

To resolve the preceding issues, PolarDB for PostgreSQL supports ePQ for parallel writes and queries of OSS foreign tables.

  • The ePQ optimizer can generate execution plans for parallel data writes to an OSS foreign table in multiple processes. The ePQ optimizer starts multiple processes on the read/write nodes to write data to the OSS foreign table in parallel.

  • The ePQ optimizer can generate execution plans for parallel data queries of an OSS foreign table in multiple processes. The ePQ optimizer starts multiple processes on the read/write nodes to query data in the OSS foreign table in parallel.

This way, the single-process upload and download modes are optimized to the multi-process upload and download modes. This makes full use of the network bandwidth resources of OSS and improves the performance for cold data archiving and queries.

Procedure

Prepare data

  1. Create a local table named t_local and insert a specific amount of data into the table.

    CREATE TABLE t_local (id INT, age INT, msg TEXT, detail TEXT);
    
    INSERT INTO t_local
    SELECT
        random() * 1000000,
        random() * 10000,
        md5(random()::TEXT),
        md5(random()::TEXT)
    FROM generate_series(1, 2000000);
  2. Create an OSS foreign table named t_oss that has the same schema as the local table t_local.

    CREATE EXTENSION oss_fdw;
    
    CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw
    OPTIONS (host 'oss-cn-xxx.aliyuncs.com', bucket 'mybucket', id 'xxx', key 'xxx');
    
    CREATE FOREIGN TABLE t_oss (id INT, age INT, msg TEXT, detail TEXT)
    SERVER ossserver OPTIONS (dir 'archive/');

Write data to the OSS foreign table in parallel

  1. Run the following command to disable ePQ:

    SET polar_enable_px TO OFF;
  2. Run the following command to import data from the local table t_local to the OSS foreign table t_oss. The following sample code provides an example of the execution plan and shows the time consumed.

    EXPLAIN (COSTS OFF) INSERT INTO t_oss SELECT * FROM t_local;
            QUERY PLAN
    ---------------------------
     Insert on t_oss
       ->  Seq Scan on t_local
    (2 rows)
    
    INSERT INTO t_oss SELECT * FROM t_local;
    INSERT 0 2000000
    Time: 8861.708 ms (00:08.862)

    According to the preceding execution plan, the executor starts a process to scan the local table and write data to the OSS foreign table at the same time. The total time consumed is 8861.708 ms.

  3. Run the following commands to enable ePQ and the parallel INSERT feature, and set both the degree of parallelism of parallel queries in local tables and that of parallel writes to OSS foreign tables to 16:

    SET polar_enable_px TO ON;
    SET polar_px_enable_insert_select TO ON;
    SET polar_px_dop_per_node TO 16;
    SET polar_px_insert_dop_num TO 16;
  4. Run the following command again. The following sample code provides an example of the execution plan and shows the time consumed.

    EXPLAIN (COSTS OFF) INSERT INTO t_oss SELECT * FROM t_local;
                        QUERY PLAN
    ---------------------------------------------------
     Insert on t_oss
       ->  Result
             ->  PX Hash 32:16  (slice1; segments: 32)
                   ->  Partial Seq Scan on t_local
     Optimizer: PolarDB PX Optimizer
    (5 rows)
    
    INSERT INTO t_oss SELECT * FROM t_local;
    INSERT 0 2000000
    Time: 1321.212 ms (00:01.321)

    In the preceding execution plan, segments: 32 indicates that the executor starts 32 processes to execute slice1. Partial Seq Scan indicates that the local table is scanned in parallel. PX Hash 32:16 indicates that the Motion operator is used to distribute data to the 16 processes for parallel writes to the OSS foreign table. The total time consumed is 1321.212 ms, which is significantly reduced compared with data writes in single-process mode.

Query data in the OSS foreign table in parallel

  1. Run the following command to disable ePQ:

    SET polar_enable_px TO OFF;
  2. Run the following command to query full data in the OSS foreign table t_oss. The following sample code provides an example of the execution plan and shows the query result.

    EXPLAIN SELECT COUNT(*) FROM t_oss;
                                      QUERY PLAN
    -------------------------------------------------------------------------------
     Aggregate  (cost=1366687.96..1366687.97 rows=1 width=8)
       ->  Foreign Scan on t_oss  (cost=0.00..1334280.40 rows=12963024 width=0)
             Directory on OSS: archive/
             Number Of OSS file: 17
             Total size of OSS file: 297 MB
    (5 rows)
    
    SELECT COUNT(*) FROM t_oss;
      count
    ---------
     4000000
    (1 row)
    
    Time: 36230.325 ms (00:36.230)

    According to the preceding execution plan, the executor starts a process to scan 17 OSS objects in the OSS foreign table. The total size of the OSS objects is 297 MB. The total time consumed is 36230.325 ms.

  3. Run the following commands to enable ePQ and set the degree of parallelism of parallel queries to 8:

    SET polar_enable_px TO ON;
    SET polar_px_dop_per_node TO 8;
  4. Run the following command again. The following sample code provides an example of the execution plan and shows the query result.

    EXPLAIN SELECT COUNT(*) FROM t_oss;
                                          QUERY PLAN
    ---------------------------------------------------------------------------------------
     Aggregate  (cost=0.00..431.00 rows=1 width=8)
       ->  PX Coordinator 16:1  (slice1; segments: 16)  (cost=0.00..431.00 rows=1 width=1)
             ->  Partial Foreign Scan on t_oss  (cost=0.00..431.00 rows=1 width=1)
                   Directory on OSS: archive/
                   Number Of OSS file: 17
                   Total size of OSS file: 297 MB
     Optimizer: PolarDB PX Optimizer
    (7 rows)
    
    SELECT COUNT(*) FROM t_oss;
      count
    ---------
     4000000
    (1 row)
    
    Time: 18100.894 ms (00:18.101)

    In the preceding execution plan, segments: 16 indicates that the executor starts 16 processes to execute slice1. Partial Foreign Scan indicates that the OSS foreign table is scanned in parallel. The OSS objects are distributed to the 16 processes for parallel queries. PX Coordinator 16:1 indicates that the query results of all processes are aggregated to the process that initiates the data queries and returned by using the Motion operator. The total time consumed is 18100.894 ms, which is significantly reduced compared with data queries in single-process mode.