This topic describes the correlated subquery pull-up feature and its use cases.
Prerequisites
The feature is supported in PolarDB for PostgreSQL clusters that run PostgreSQL 14 (revision version: 14.8.11.0 or later).
You can execute the following statement to view the revision version of your PolarDB for PostgreSQL cluster:
select version();
Background information
The PostgreSQL optimizer uses a SUBLINK
to represent a combination of a subquery and the related operator in an expression. The following types of SUBLINKs
are supported:
EXISTS_SUBLINK
: implements anEXISTS (SELECT ...)
subquery.ALL_SUBLINK
: implements anALL (SELECT ...)
subquery.ANY_SUBLINK
: implements anANY (SELECT ...)
orIN (SELECT ...)
subquery.
The optimizer usually attempts to pull up correlated subqueries that are used with the ANY
, IN
, EXISTS
, or NOT EXISTS
operator. This way, an optimized execution plan that uses a semi-join or anti-join can be generated for a correlated subquery and its outer query, which improves query performance. If an ANY_SUBLINK
subquery references the variables in the outer query, the subquery is not pulled up. As a result, the subquery is not optimized together with the outer query. The subquery can only be independently optimized, which results in a significant increase in SQL execution time.
PolarDB for PostgreSQL allows you to use a parameter to control the pull-up of correlated ANY_SUBLINK
subqueries. If you set this parameter to ON, the optimizer pulls up a correlated subquery that is used with the IN
or ANY
operator even if the subquery references the variables in the outer query. This increases the search scope of the optimizer and allows the optimizer to generate a better execution plan.
Usage
You can configure the polar_enable_pullup_with_lateral
parameter to specify whether to pull up correlated ANY_SUBLINK
subqueries. Valid values:
ON (default)
OFF
Examples
Prepare data
CREATE TABLE t1 (a INT, b INT);
INSERT INTO t1 SELECT i, 1 FROM generate_series(1, 100000) i;
CREATE TABLE t2 AS SELECT * FROM t1;
View the execution plan and time after the correlated subquery pull-up feature is disabled
=> SET polar_enable_pullup_with_lateral TO OFF;
=> EXPLAIN (COSTS OFF, ANALYZE)
SELECT * FROM t1
WHERE t1.a IN (SELECT a FROM t2 WHERE t2.b = t1.b AND t2.b = 1);
QUERY PLAN
---------------------------------------------------------------------------------
Seq Scan on t1 (actual time=67.631..1641827.119 rows=100000 loops=1)
Filter: (SubPlan 1)
SubPlan 1
-> Result (actual time=0.005..13.124 rows=50000 loops=100000)
One-Time Filter: (t1.b = 1)
-> Seq Scan on t2 (actual time=0.005..7.718 rows=50000 loops=100000)
Filter: (b = 1)
Planning Time: 0.145 ms
Execution Time: 1641847.702 ms
(9 rows)
View the execution plan and time after the correlated subquery pull-up feature is enabled
=> SET polar_enable_pullup_with_lateral TO ON;
=> EXPLAIN (COSTS OFF, ANALYZE)
SELECT * FROM t1
WHERE t1.a IN (SELECT a FROM t2 WHERE t2.b = t1.b AND t2.b = 1);
QUERY PLAN
----------------------------------------------------------------------------
Hash Semi Join (actual time=64.783..173.482 rows=100000 loops=1)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t1 (actual time=0.016..25.440 rows=100000 loops=1)
Filter: (b = 1)
-> Hash (actual time=64.550..64.551 rows=100000 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 2976kB
-> Seq Scan on t2 (actual time=0.010..30.330 rows=100000 loops=1)
Filter: (b = 1)
Planning Time: 0.195 ms
Execution Time: 178.050 ms
(10 rows)
As shown in the preceding example, the subquery and the outer query are optimized into a semi-join after the correlated subquery pull-up feature is enabled. The filter conditions in the subquery effectively filter the results of the outer query, and the execution time is significantly shortened. If the subquery is not pulled up, the rows to be returned to the outer query cannot be filtered by the subquery.