All Products
Search
Document Center

PolarDB:Best Practices

Last Updated:Apr 16, 2024

PolarDB for PostgreSQL supports the Multi-node Elastic Parallel Query (ePQ) feature. This feature can efficiently handle lightweight analytical queries and meet the increasing demand for hybrid transaction/analytical processing (HTAP).

Overview

When ePQ is used for queries, PolarDB for PostgreSQL uses the ePQ optimizer to generate an execution plan that can be executed by multiple compute nodes in parallel. The execution engine of ePQ coordinates the execution of the plan on multiple compute nodes, and uses the CPU, memory, and I/O bandwidth of multiple nodes to scan and compute data.

To achieve serverless elastic scaling, you can use Grand Unified Configuration (GUC) parameters to dynamically adjust the number of compute nodes that participate in ePQ (scale-out) and the degree of parallelism for each node (scale-up).

ePQ is suitable for complex online analytical processing (OLAP) queries that require a long execution duration but not for simple online transaction processing (OLTP) queries that require a short execution duration. In short queries, operations that are performed between compute nodes, such as connection establishment, data exchange, and connection shutdown can degrade query performance. PolarDB for PostgreSQL determines whether to use ePQ or single-node execution for queries based on the table size or the cost of an execution plan. This ensures that the query execution method with higher query performance is selected.

For more information about how ePQ works and performs, see PolarDB for PostgreSQL: ePQ architecture.

Enable ePQ

The following statements enable the ePQ feature for a table that is named t1. If PolarDB PX Optimizer is returned in the plan, ePQ is enabled.

SET polar_enable_px = 1;
EXPLAIN SELECT * FROM t1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 PX Coordinator 6:1  (slice1; segments: 6)  (cost=0.00..431.00 rows=1 width=8)
   ->  Partial Seq Scan on t1  (cost=0.00..431.00 rows=1 width=8)
 Optimizer: PolarDB PX Optimizer
(3 rows)

SELECT * FROM t1;

GUC parameters

Parameter

Description

polar_enable_px

Specifies whether to enable the ePQ feature. Valid values:

  • on

  • off (default)

Note

If this parameter is set to on, queries are preferentially sent to the ePQ optimizer to generate execution plans that can be executed in parallel on multiple compute nodes. All ePQ-related GUC parameters take effect.

polar_px_nodes

The names of the compute nodes that participate in ePQ. This parameter is empty by default, which indicates that all read-only nodes participate in ePQ.

If you want only some read-only nodes to participate in ePQ, you can execute the following statements to obtain the names of the read-only nodes:

=> CREATE EXTENSION polar_monitor;
CREATE EXTENSION

=> SELECT name,slot_name,type FROM polar_cluster_info WHERE type = 'Replica';
 name  | slot_name |  type
-------+-----------+---------
 node2 | replica1  | Replica
 node3 | replica2  | Replica
(2 rows)

Set the polar_px_nodes parameter to the obtained node names and separate the names with commas (,).

=> SET polar_px_nodes = 'node2,node3';
=> SHOW polar_px_nodes;
 polar_px_nodes
----------------
 node2,node3
(1 row)

polar_px_dop_per_node

The number of worker processes that participate in ePQ on each compute node in the current session. Default value: 3.

Note

As a general best practice, we recommend that you set this parameter to half the number of CPU cores of the current node. If the CPU load of the current node is high, you can decrement the value of this parameter until the CPU utilization drops to or below 80%. If the query performance is low, you can increment the value of this parameter. However, the CPU utilization must not exceed 80%. Otherwise, other background processes in the system may be slowed down.

polar_px_max_workers_number

The maximum number of ePQ worker processes that can exist simultaneously on each compute node. Default value: 30.

Note

If the number of ePQ worker processes on a compute node exceeds this limit, a query error occurs.

ERROR:  over px max workers, already xxx workers, max 30 workers

In this case, you can increase the value of this parameter to prevent similar errors. If you set this parameter to an excessively large value, an excessive number of processes may exist on the node, increasing the risk of out of memory (OOM) errors.

polar_px_wait_lock_timeout

The maximum amount of time that an ePQ process can block other processes using the same resources. Default value: 1800000. Unit: milliseconds (Half an hour).

In most cases, ePQ processes are read-only queries that obtain shared locks on the tables being queried. However, some DDL operations require exclusive locks on the tables, causing the operations to be blocked by ePQ processes due to lock conflicts. After the DDL operations are blocked for the time specified by this parameter, ePQ queries are cancelled to free up resources for the processes performing the DDL operations.

In most cases, ePQ is used to execute time-consuming analytical queries. Therefore, you must set the statement_timeout parameter to a value that allows for a reasonable response time. For example, the value 10,800,000 milliseconds, which is equivalent to 3 hours, may be appropriate for large and complex queries. If you use the default value 0 (unlimited time), the database connection used to execute the query may not be released for a long time due to the lengthy execution time of the query.

synchronous_commit

Specifies whether to ensure data consistency in elastic parallel queries. Valid values:

  • on: ensures data consistency in elastic parallel queries. The database waits for write-ahead logging (WAL) records to be written to disks before returning a successful transaction commit to the client.

  • off (default): does not ensure data consistency in elastic parallel queries.

polar_px_min_pg_plan_cost

The minimum cost of the execution plan for which ePQ is enabled. Valid values: 0 to 999999999999. Default value: 50000. The query whose single-node execution plan cost is lower than this threshold does not use ePQ.

polar_px_min_table_scan_size

The minimum size of the table for which ePQ is enabled. Valid values: 0 to 2147483647. Default value: 100 MB. If the size of all tables to be queried is less than this threshold, ePQ is not used.

polar_px_force_use

Specifies whether to forcibly use ePQ for queries. Valid values:

on: forcibly uses ePQ for queries.

off (default): does not forcibly use ePQ for queries.

Best practices

Allow specific tables to use ePQ

If you only want to perform ePQ on a specific table, you can set polar_px_enable_check_workers to on. You also need to explicitly set the px_workers option for the table on which you need to perform ePQ.

ALTER TABLE t1 SET (px_workers = 1);
Note

Valid values for px_workers:

  • -1: disables ePQ on the table.

  • 0 (default): ignores ePQ on the table.

  • 1: enables ePQ on the table.

Enable the ePQ feature

Global level

In the PolarDB console, set polar_enable_px to on to enable ePQ globally. By default, ePQ is used for both transactional and analytical queries.

Use the following sample SQL statement to view the execution plan. If PolarDB PX Optimizer is returned in the plan, ePQ is enabled.

=> EXPLAIN SELECT * FROM t1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 PX Coordinator 6:1  (slice1; segments: 6)  (cost=0.00..431.00 rows=1 width=8)
   ->  Partial Seq Scan on t1  (cost=0.00..431.00 rows=1 width=8)
 Optimizer: PolarDB PX Optimizer
(3 rows)

Session level

Set polar_enable_px to ON within a session to enable ePQ for all queries executed within the session.

SET polar_enable_px = ON;

Database level and user level

After ePQ is enabled at the global or session level, the system automatically uses ePQ for all queries that qualify for parallel execution before using other execution methods. From a best practice perspective, ePQ is more suitable for long queries with heavy analytical workloads than for short transactional queries. For short queries, the overhead associated with establishing connections, exchanging data, and destroying connections between compute nodes during the use of ePQ may result in decreased query performance.

If ePQ is required for specific business needs, analytical SQL queries that require ePQ can be extracted from the main database and executed on a separate database configured for ePQ queries.

ALTER DATABASE ap_database SET polar_enable_px = ON;

Alternatively, you can use a specific account to execute analytical SQL queries that require ePQ.

ALTER ROLE ap_role SET polar_enable_px = ON;

Query level

If you want to use ePQ only for specific queries within a session, such as for nightly report tasks, you can use the pg_hint_plan plug-in and SQL hints to enable ePQ for those queries. To make SQL hints work, make sure that the pg_hint_plan plug-in has been added to the GUC parameter shared_preload_libraries.

Prepend /*+ PX() */ to a query to enable ePQ for the query.

=> /*+ PX() */ EXPLAIN SELECT * FROM t1;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 PX Coordinator 6:1  (slice1; segments: 6)  (cost=0.00..431.03 rows=1000 width=8)
   ->  Partial Seq Scan on t1  (cost=0.00..431.00 rows=167 width=8)
 Optimizer: PolarDB PX Optimizer
(3 rows)

Prepend /*+ NoPX() */ to a query to disable ePQ for the query.

=> /*+ NoPX() */ EXPLAIN SELECT * FROM t1;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on t1  (cost=0.00..15.00 rows=1000 width=8)
(1 row)

Prepend /*+ PX(N) */ to a query to enable ePQ with a single-node parallelism of N. In the following example, the value of N is 6.

=> /*+ PX(6) */ EXPLAIN SELECT * FROM t1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 PX Coordinator 12:1  (slice1; segments: 12)  (cost=0.00..431.02 rows=1000 width=8)
   ->  Partial Seq Scan on t1  (cost=0.00..431.00 rows=84 width=8)
 Optimizer: PolarDB PX Optimizer
(3 rows)