Partition pruning

Updated at: 2024-05-17 03:12

PolarDB for PostgreSQL supports the partition pruning feature, which greatly accelerates queries on partitioned tables.

Overview

PolarDB for PostgreSQL provides the partition pruning feature. If partition pruning is enabled, the planner checks the definition of each partition to determine whether the scanning of a partition can be skipped because the partition does not contain rows that meet the condition specified by the WHERE clause. If a partition does not contain rows that meet the condition, the partition is excluded from an execution plan. Partition pruning dramatically reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and resource utilization.

PolarDB for PostgreSQL supports static and dynamic pruning. Static or dynamic pruning is performed based on the SQL statements that you execute.

  • Static pruning occurs during compilation. In this case, the partitions to be accessed are known before the execution. For example, if an SQL statement contains a WHERE clause that specifies a condition based on a constant value of the partition key, static pruning can be performed.

  • Dynamic pruning happens at run-time, meaning that the exact partitions to be accessed by a statement are not known beforehand. For example, if an SQL statement contains a WHERE clause that uses an operator or function, dynamic pruning is performed.

Partition pruning affects the statistics of the objects where pruning occurs and also affects the execution plan of a statement.

The system will limit the data scan only to the partitions that could contain the desired records. Both static pruning and dynamic pruning improve the query performance by excluding partitions from an execution plan.

Partition pruning and constraint exclusion

Partition pruning and constraint exclusion are different in the following aspects:

  • When the system performs a partition pruning, it takes the relationship between partitions into consideration. This is not the case for a constraint exclusion.

    For example, when looking for a specific record in a list-partitioned table, in accordance with partition pruning, the system knows the partition where the record is stored. However, during a constraint exclusion, the system must check the constraint defined for each partition.

  • Partition pruning happens early in the optimization stage to reduce the number of partitions that the planner needs to look at. However, constraint-based exclusion takes place in the later stage of optimization.

Partition pruning in difference stages

PolarDB for PostgreSQL classifies conditional expressions into three types: immutable, stable, and volatile. The three types of conditional expressions correspond to three types of pruning.

  • If a conditional expression is immutable, such as an expression based on a constant value, partition pruning occurs during the optimization.

  • If a conditional expression is stable, such as an expression that contains now(), partition pruning occurs when the executor is initialized.

  • If a conditional expression is volatile, such as an expression that contains random(), partition pruning occurs when the executor is run.

Partition pruning during the optimization

The following example shows how partitions are pruned during the optimization.

Create a table named measurement in a PolarDB for PostgreSQL cluster, and use logdate as the partition key to create the following four partitions: measurement_y2023q1, measurement_y2023q2, measurement_y2023q3, and measurement_y2023q4. The four partitions correspond to the four quarters of 2023.

CREATE TABLE measurement(
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2023q1 PARTITION OF measurement
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
CREATE TABLE measurement_y2023q2 PARTITION OF measurement
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
CREATE TABLE measurement_y2023q3 PARTITION OF measurement
    FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
CREATE TABLE measurement_y2023q4 PARTITION OF measurement
    FOR VALUES FROM ('2023-10-01') TO ('2024-04-01');

Query the measurement table based on the following condition: logdate >= DATE '2023-10-01'. As shown in the results of the EXPLAIN command, the partitions for the first, second, and third quarters are pruned by default. The system does not query data in the three partitions because their data does not meet the condition logdate >= DATE '2023-10-01'. This is the partition pruning during the optimization. The filter condition in the SQL statement is based on the partition key logdate, and DATE '2023-10-01' used in the conditional expression is static and immutable. This way, the conditional expression can be calculated during the optimization.

EXPLAIN SELECT * FROM measurement WHERE logdate >= DATE '2023-10-01';
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..34.09 rows=567 width=20)
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..31.25 rows=567 width=20)
         Filter: (logdate >= '01-OCT-23 00:00:00'::date)
(3 rows)

Partition pruning when the executor is initialized

In this example, the measurement table is used to show how partitions are pruned when the executor is initialized.

As shown in the following statement, the value in the conditional expression changes from a constant value to now(). As a result, the conditional expression becomes a stable expression that cannot be calculated during the optimization, but can be calculated when the executor is initialized. If the month is July of 2023, the two partitions for the first and second quarters are removed (Subplans Removed: 2). Only the partitions for the third and fourth quarters need to be scanned. This is the partition pruning that occurs when the executor is initialized. The filter condition is based on the partition key logdate, and now() used in the conditional expression is stable. The conditional expression can be calculated when the executor is initialized.

EXPLAIN SELECT * FROM measurement WHERE logdate >= now();
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..153.34 rows=2268 width=20)
   Subplans Removed: 2
   ->  Seq Scan on measurement_y2023q3  (cost=0.00..35.50 rows=567 width=20)
         Filter: (logdate >= now())
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..35.50 rows=567 width=20)
         Filter: (logdate >= now())
(6 rows)

Partition pruning when the executor is run

In this example, the measurement table is used to show how partitions are pruned when an executor is run.

As shown in the following statement, the value in the conditional expression changes from a constant value to (select to_date('2023-10-1')), which causes a volatile subjoin. As a result, the conditional expression cannot be calculated during the optimization or when the executor is initialized, but can be calculated only when the executor is run.

As shown in the results of the EXPLAIN ANALYZE command, the partitions for the first, second, and third quarters are marked with (never executed). This is the partition pruning that occurs when the executor is run. This type of partition pruning is applicable to volatile expressions, subqueries, subjoins, and expressions of join conditions. The filter condition is based on the partition key logdate, and the value in the conditional expression is (select to_date('2023-10-1')), which causes a volatile subjoin. The conditional expression can be calculated only when the executor is run.

EXPLAIN ANALYZE SELECT * FROM measurement WHERE logdate >= (select  to_date('2023-10-1'));
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.01..136.35 rows=2268 width=20) (actual time=0.067..0.068 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.051..0.053 rows=1 loops=1)
   ->  Seq Scan on measurement_y2023q1  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q2  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q3  (cost=0.00..31.25 rows=567 width=20) (never executed)
         Filter: (logdate >= $0)
   ->  Seq Scan on measurement_y2023q4  (cost=0.00..31.25 rows=567 width=20) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: (logdate >= $0)
  • On this page (1, T)
  • Overview
  • Partition pruning in difference stages
  • Partition pruning during the optimization
  • Partition pruning when the executor is initialized
  • Partition pruning when the executor is run
Feedback
phone Contact Us

Chat now with Alibaba Cloud Customer Service to assist you in finding the right products and services to meet your needs.

alicare alicarealicarealicare