PolarDB for PostgreSQL supports the Multi-node Elastic Parallel Query (ePQ) feature. This feature can efficiently handle lightweight analytical queries and meet the increasing demand for hybrid transaction/analytical processing (HTAP).
Overview
When ePQ is used for queries, PolarDB for PostgreSQL uses the ePQ optimizer to generate an execution plan that can be executed by multiple compute nodes in parallel. The execution engine of ePQ coordinates the execution of the plan on multiple compute nodes, and uses the CPU, memory, and I/O bandwidth of multiple nodes to scan and compute data.
To achieve serverless elastic scaling, you can use Grand Unified Configuration (GUC) parameters to dynamically adjust the number of compute nodes that participate in ePQ (scale-out) and the degree of parallelism for each node (scale-up).
ePQ is suitable for complex online analytical processing (OLAP) queries that require a long execution duration but not for simple online transaction processing (OLTP) queries that require a short execution duration. In short queries, operations that are performed between compute nodes, such as connection establishment, data exchange, and connection shutdown can degrade query performance. PolarDB for PostgreSQL determines whether to use ePQ or single-node execution for queries based on the table size or the cost of an execution plan. This ensures that the query execution method with higher query performance is selected.
For more information about how ePQ works and performs, see PolarDB for PostgreSQL: ePQ architecture.
Enable ePQ
The following statements enable the ePQ feature for a table that is named t1
. If PolarDB PX Optimizer
is returned in the plan, ePQ is enabled.
SET polar_enable_px = 1;
EXPLAIN SELECT * FROM t1;
QUERY PLAN
-------------------------------------------------------------------------------
PX Coordinator 6:1 (slice1; segments: 6) (cost=0.00..431.00 rows=1 width=8)
-> Partial Seq Scan on t1 (cost=0.00..431.00 rows=1 width=8)
Optimizer: PolarDB PX Optimizer
(3 rows)
SELECT * FROM t1;
GUC parameters
Parameter | Description |
polar_enable_px | Specifies whether to enable the ePQ feature. Valid values:
Note If this parameter is set to on, queries are preferentially sent to the ePQ optimizer to generate execution plans that can be executed in parallel on multiple compute nodes. All ePQ-related GUC parameters take effect. |
polar_px_nodes | The names of the compute nodes that participate in ePQ. This parameter is empty by default, which indicates that all read-only nodes participate in ePQ. If you want only some read-only nodes to participate in ePQ, you can execute the following statements to obtain the names of the read-only nodes:
Set the polar_px_nodes parameter to the obtained node names and separate the names with commas (,).
|
polar_px_dop_per_node | The number of worker processes that participate in ePQ on each compute node in the current session. Default value: 3. Note As a general best practice, we recommend that you set this parameter to half the number of CPU cores of the current node. If the CPU load of the current node is high, you can decrement the value of this parameter until the CPU utilization drops to or below 80%. If the query performance is low, you can increment the value of this parameter. However, the CPU utilization must not exceed 80%. Otherwise, other background processes in the system may be slowed down. |
polar_px_max_workers_number | The maximum number of ePQ worker processes that can exist simultaneously on each compute node. Default value: 30. Note If the number of ePQ worker processes on a compute node exceeds this limit, a query error occurs.
In this case, you can increase the value of this parameter to prevent similar errors. If you set this parameter to an excessively large value, an excessive number of processes may exist on the node, increasing the risk of out of memory (OOM) errors. |
polar_px_wait_lock_timeout | The maximum amount of time that an ePQ process can block other processes using the same resources. Default value: 1800000. Unit: milliseconds (Half an hour). In most cases, ePQ processes are read-only queries that obtain shared locks on the tables being queried. However, some DDL operations require exclusive locks on the tables, causing the operations to be blocked by ePQ processes due to lock conflicts. After the DDL operations are blocked for the time specified by this parameter, ePQ queries are cancelled to free up resources for the processes performing the DDL operations. In most cases, ePQ is used to execute time-consuming analytical queries. Therefore, you must set the |
synchronous_commit | Specifies whether to ensure data consistency in elastic parallel queries. Valid values:
|
polar_px_min_pg_plan_cost | The minimum cost of the execution plan for which ePQ is enabled. Valid values: 0 to 999999999999. Default value: 50000. The query whose single-node execution plan cost is lower than this threshold does not use ePQ. |
polar_px_min_table_scan_size | The minimum size of the table for which ePQ is enabled. Valid values: 0 to 2147483647. Default value: 100 MB. If the size of all tables to be queried is less than this threshold, ePQ is not used. |
polar_px_force_use | Specifies whether to forcibly use ePQ for queries. Valid values: on: forcibly uses ePQ for queries. off (default): does not forcibly use ePQ for queries. |
Best practices
Allow specific tables to use ePQ
If you only want to perform ePQ on a specific table, you can set polar_px_enable_check_workers
to on. You also need to explicitly set the px_workers
option for the table on which you need to perform ePQ.
ALTER TABLE t1 SET (px_workers = 1);
Valid values for px_workers
:
-1
: disables ePQ on the table.0
(default): ignores ePQ on the table.1
: enables ePQ on the table.
Enable the ePQ feature
Global level
In the PolarDB console, set polar_enable_px
to on
to enable ePQ globally. By default, ePQ is used for both transactional and analytical queries.
Use the following sample SQL statement to view the execution plan. If PolarDB PX Optimizer
is returned in the plan, ePQ is enabled.
=> EXPLAIN SELECT * FROM t1;
QUERY PLAN
-------------------------------------------------------------------------------
PX Coordinator 6:1 (slice1; segments: 6) (cost=0.00..431.00 rows=1 width=8)
-> Partial Seq Scan on t1 (cost=0.00..431.00 rows=1 width=8)
Optimizer: PolarDB PX Optimizer
(3 rows)
Session level
Set polar_enable_px
to ON
within a session to enable ePQ for all queries executed within the session.
SET polar_enable_px = ON;
Database level and user level
After ePQ is enabled at the global or session level, the system automatically uses ePQ for all queries that qualify for parallel execution before using other execution methods. From a best practice perspective, ePQ is more suitable for long queries with heavy analytical workloads than for short transactional queries. For short queries, the overhead associated with establishing connections, exchanging data, and destroying connections between compute nodes during the use of ePQ may result in decreased query performance.
If ePQ is required for specific business needs, analytical SQL queries that require ePQ can be extracted from the main database and executed on a separate database configured for ePQ queries.
ALTER DATABASE ap_database SET polar_enable_px = ON;
Alternatively, you can use a specific account to execute analytical SQL queries that require ePQ.
ALTER ROLE ap_role SET polar_enable_px = ON;
Query level
If you want to use ePQ only for specific queries within a session, such as for nightly report tasks, you can use the pg_hint_plan
plug-in and SQL hints to enable ePQ for those queries. To make SQL hints work, make sure that the pg_hint_plan
plug-in has been added to the GUC parameter shared_preload_libraries
.
Prepend /*+ PX() */
to a query to enable ePQ for the query.
=> /*+ PX() */ EXPLAIN SELECT * FROM t1;
QUERY PLAN
----------------------------------------------------------------------------------
PX Coordinator 6:1 (slice1; segments: 6) (cost=0.00..431.03 rows=1000 width=8)
-> Partial Seq Scan on t1 (cost=0.00..431.00 rows=167 width=8)
Optimizer: PolarDB PX Optimizer
(3 rows)
Prepend /*+ NoPX() */
to a query to disable ePQ for the query.
=> /*+ NoPX() */ EXPLAIN SELECT * FROM t1;
QUERY PLAN
------------------------------------------------------
Seq Scan on t1 (cost=0.00..15.00 rows=1000 width=8)
(1 row)
Prepend /*+ PX(N) */
to a query to enable ePQ with a single-node parallelism of N
. In the following example, the value of N
is 6
.
=> /*+ PX(6) */ EXPLAIN SELECT * FROM t1;
QUERY PLAN
------------------------------------------------------------------------------------
PX Coordinator 12:1 (slice1; segments: 12) (cost=0.00..431.02 rows=1000 width=8)
-> Partial Seq Scan on t1 (cost=0.00..431.00 rows=84 width=8)
Optimizer: PolarDB PX Optimizer
(3 rows)