The elastic parallel query (ePQ) feature in PolarDB for PostgreSQL supports parallel writes to and queries on Object Storage Service (OSS) foreign tables.
Applicability
This feature is supported in the following versions of PolarDB for PostgreSQL:
PostgreSQL 14 with minor engine version 2.0.14.9.13.0 or later
PostgreSQL 11 with minor engine version 2.0.11.9.35.0 or later
You can view the minor engine version in the console or run the SHOW polardb_version; statement. If the version does not meet the requirements, you can upgrade the minor engine version.
Background information
PolarDB for PostgreSQL lets you create Object Storage Service (OSS) foreign tables using the oss_fdw extension. These foreign tables are physically stored in Alibaba Cloud OSS, while only their metadata is saved in the database. You can archive infrequently used historical data or cold data to OSS as foreign tables to reduce database storage costs. The foreign data wrapper (FDW) standard in PolarDB for PostgreSQL ensures that you can still perform limited reads and writes on OSS foreign tables using standard SQL.
When you archive data by writing it from a local table to an OSS foreign table, PolarDB for PostgreSQL starts a single process by default. This is essentially a single-process upload access mode. When you archive a large data volume, this single-process write mode is inefficient because it cannot fully utilize the high network bandwidth of OSS.
Similarly, when you query historical archived data in an OSS foreign table, PolarDB for PostgreSQL starts a single process by default to query the full data. This is a single-process download access mode. When you query a large volume of archived data, this single-process query mode is also inefficient because it cannot fully utilize the high network bandwidth of OSS.
The elastic parallel query (ePQ) feature in PolarDB for PostgreSQL supports parallel writes to and queries on OSS foreign tables:
The ePQ optimizer generates execution plans for parallel writes to OSS foreign tables. The ePQ executor then starts multiple processes on the read/write node to write data in parallel.
The ePQ optimizer generates execution plans for parallel queries on OSS foreign tables. The ePQ executor then starts multiple processes on multiple compute nodes to query data in parallel.
The parallel read and write access to OSS foreign tables provided by ePQ optimizes the database network access mode from a single-process upload/download to a multi-process upload/download. This change fully utilizes OSS network bandwidth resources and improves the performance of cold data archiving and queries.
Usage guide
Prepare data
Create a local table named
t_localand insert a large amount of data.CREATE TABLE t_local (id INT, age INT, msg TEXT, detail TEXT); INSERT INTO t_local SELECT random() * 1000000, random() * 10000, md5(random()::TEXT), md5(random()::TEXT) FROM generate_series(1, 2000000);Create an OSS foreign table named
t_ossthat has the same structure as the local tablet_local.CREATE EXTENSION oss_fdw; CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS (host 'oss-cn-xxx.aliyuncs.com', bucket 'mybucket', id 'xxx', key 'xxx'); CREATE FOREIGN TABLE t_oss (id INT, age INT, msg TEXT, detail TEXT) SERVER ossserver OPTIONS (dir 'archive/');
Write to the OSS foreign table in parallel
Run the following command to disable ePQ.
SET polar_enable_px TO OFF;Run the following command to import data from the local table
t_localto the OSS foreign tablet_oss. The execution plan and the time taken are as follows.EXPLAIN (COSTS OFF) INSERT INTO t_oss SELECT * FROM t_local; QUERY PLAN --------------------------- Insert on t_oss -> Seq Scan on t_local (2 rows) INSERT INTO t_oss SELECT * FROM t_local; INSERT 0 2000000 Time: 8861.708 ms (00:08.862)The execution plan shows that the executor starts one process to scan the local table and write to the OSS foreign table. The total execution time is 8861.708 ms.
Run the following commands to enable ePQ and the parallel
INSERTfeature. Set the degree of parallelism for scanning the local table to 16 and for writing to the OSS foreign table to 16.SET polar_enable_px TO ON; SET polar_px_enable_insert_select TO ON; SET polar_px_dop_per_node TO 16; SET polar_px_insert_dop_num TO 16;Run the command again. The execution plan and the time taken are as follows.
EXPLAIN (COSTS OFF) INSERT INTO t_oss SELECT * FROM t_local; QUERY PLAN --------------------------------------------------- Insert on t_oss -> Result -> PX Hash 32:16 (slice1; segments: 32) -> Partial Seq Scan on t_local Optimizer: PolarDB PX Optimizer (5 rows) INSERT INTO t_oss SELECT * FROM t_local; INSERT 0 2000000 Time: 1321.212 ms (00:01.321)The execution plan shows that the ePQ execution framework starts 32 processes (
segments: 32) to execute plan sliceslice1and scan the local table in parallel (Partial Seq Scan). Then, the Motion operator redistributes the data (PX Hash 32:16) to the 16 processes that write to the OSS foreign table in parallel. The total execution time is 1321.212 ms, which is a significant improvement over the single-process write.
Query the OSS foreign table in parallel
Run the following command to disable ePQ.
SET polar_enable_px TO OFF;Run the following command to query the total number of rows in the OSS foreign table
t_oss. The following shows the execution plan and results.EXPLAIN SELECT COUNT(*) FROM t_oss; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=1366687.96..1366687.97 rows=1 width=8) -> Foreign Scan on t_oss (cost=0.00..1334280.40 rows=12963024 width=0) Directory on OSS: archive/ Number Of OSS file: 17 Total size of OSS file: 297 MB (5 rows) SELECT COUNT(*) FROM t_oss; count --------- 4000000 (1 row) Time: 36230.325 ms (00:36.230)The execution plan shows that the executor starts one process to scan 17 OSS files for the foreign table, processing a total of 297 MB in 36,230.325 ms.
Run the following commands to enable ePQ and set the degree of parallelism for queries to 8.
SET polar_enable_px TO ON; SET polar_px_dop_per_node TO 8;Run the command again. The execution plan and results are as follows.
EXPLAIN SELECT COUNT(*) FROM t_oss; QUERY PLAN --------------------------------------------------------------------------------------- Aggregate (cost=0.00..431.00 rows=1 width=8) -> PX Coordinator 16:1 (slice1; segments: 16) (cost=0.00..431.00 rows=1 width=1) -> Partial Foreign Scan on t_oss (cost=0.00..431.00 rows=1 width=1) Directory on OSS: archive/ Number Of OSS file: 17 Total size of OSS file: 297 MB Optimizer: PolarDB PX Optimizer (7 rows) SELECT COUNT(*) FROM t_oss; count --------- 4000000 (1 row) Time: 18100.894 ms (00:18.101)The execution plan shows that the ePQ execution framework starts a total of 16 processes (
segments: 16) on two compute nodes to execute plan sliceslice1and scan the OSS foreign table in parallel (Partial Foreign Scan). These 16 processes divide the query task among themselves at the file granularity on OSS. The query results from all processes are gathered by the Motion operator (PX Coordinator 16:1) and sent to the process that initiated the query before being returned. The total execution time is 18100.894 ms, which is a significant improvement over the single-process query.