Unlock the Power of AI

1 million free tokens

88% Price Reduction

NaNDayNaN:NaN:NaN
Activate Now

Sublink pushdown

Updated at: 2025-04-18 10:45

PolarDB for PostgreSQL (Compatible with Oracle) allows you to rewrite queries to push down sublinks. This feature improves the execution efficiency of SQL statements that contain IN and ANY clauses.

Background information

In PostgreSQL, sublinks of the ANY type (IN and ANY clauses) are typically pulled up as semi-joins (semi join). However, if the linked table is a subquery that cannot be pulled up as a semi-join, PostgreSQL is unable to create a parameterized path for it. As a result, the subquery is executed independently. This impacts the SQL performance when the subquery involves a large amount of data.

For example, the subquery of the following SQL statement contains a GROUP BY clause and cannot be pulled up. The execution time primarily depends on scanning and sorting the t_big table. As the size of the t_big table grows, the execution time increases.

EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=0.55..59523.15 rows=10000 width=12) (actual time=0.064..1237.621 rows=2 loops=1)
   Merge Cond: (t_big.a = t_small.a)
   ->  GroupAggregate  (cost=0.42..46910.99 rows=1000000 width=12) (actual time=0.033..1113.615 rows=1000000 loops=1)
         Group Key: t_big.a
         ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31910.99 rows=1000000 width=8) (actual time=0.024..420.575 rows=1000000 loops=1)
   ->  Index Only Scan using t_small_a_idx on t_small  (cost=0.13..12.16 rows=2 width=4) (actual time=0.028..0.030 rows=2 loops=1)
         Heap Fetches: 2
 Planning Time: 0.256 ms
 Execution Time: 1237.700 ms
(9 rows)

If the sublink of the ANY type can be pushed down to the subquery, the index in the subquery can be used to improve the execution efficiency.

EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big WHERE a IN (SELECT a FROM t_small) GROUP BY a)v;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=17.96..17.99 rows=2 width=12) (actual time=0.061..0.064 rows=2 loops=1)
   Group Key: t_big.a
   ->  Sort  (cost=17.96..17.96 rows=2 width=8) (actual time=0.054..0.056 rows=2 loops=1)
         Sort Key: t_big.a
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8) (actual time=0.031..0.045 rows=2 loops=1)
               ->  Unique  (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.017 rows=2 loops=1)
                     ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.013..0.014 rows=2 loops=1)
                           Sort Key: t_small.a
                           Sort Method: quicksort  Memory: 25kB
                           ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
               ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1 loops=2)
                     Index Cond: (a = t_small.a)
 Planning Time: 0.527 ms
 Execution Time: 0.143 ms
(15 rows)

Prerequisites

Your PolarDB for PostgreSQL (Compatible with Oracle) cluster runs the following database engine version:

  • PolarDB for PostgreSQL (Compatible with Oracle) 2.0 whose revision version is 2.0.14.13.28.0 or later

  • PolarDB for PostgreSQL (Compatible with Oracle) 1.0 whose revision version is 2.0.11.15.44.0 or later

Note

You can view the revision version in the console or execute the SHOW polardb_version; statement to query the revision version. To upgrade the revision version, see Upgrade the version.

Scenarios

The sublink pushdown feature is suitable for subqueries with GROUP BY statements that are referenced by IN or ANY clauses, especially when the subqueries involve large tables. Pushing down the IN or ANY clause into the subquery allows for the use of the index of the large table, reducing the data access amount.

Limits

The following limitations apply to the sublink pushdown feature:

  • The IN or ANY clause must reference a subquery with a GROUP BY statement. Otherwise, open source PostgreSQL creates a parameterized path. The sublink pushdown feature is not used.

  • The columns in the IN or ANY clause must be included in the columns in the GROUP BY statement. Otherwise, the SQL statement after pushdown may not be equivalent to the original SQL statement.

  • The current query block must not contain outer joins. Otherwise, the SQL statement after pushdown is not equivalent to the original SQL statement.

  • Only one column can be referenced in the IN or ANY clause, such as a in (select a from t) or a = any(select a from t).

  • Only SELECT and CREATE TABLE AS statements are supported.

Usage

You can configure the following parameter to control the sublink pushdown feature:

Parameter

Description

Parameter

Description

polar_cbqt_pushdown_sublink

Enables or disables the sublink pushdown feature. Valid values:

  • OFF (default): disables the sublink pushdown feature.

  • ON: enables the sublink pushdown feature. The sublinks are pushed down based on CBQT.

  • FORCE: forcibly enables the sublink pushdown feature. The sublinks are pushed down regardless of CBQT. Use this parameter value in a hint instead of global settings.

Examples

Data preparation

CREATE TABLE t_small(a int);
CREATE TABLE t_big(a int, b int, c int);

CREATE INDEX ON t_big(a);

INSERT INTO t_big SELECT i, i, i FROM generate_series(1, 1000000)i;
INSERT INTO t_small VALUES(1), (1000000);

ANALYZE t_small, t_big;

Original query

In the original query plan, the join condition t_big.a = t_small.a cannot be pushed down as a parameterized path. A full table scan must be performed on the t_big table, which is inefficient.

EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

Sample result:

                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=1.46..59510.27 rows=10000 width=12) (actual time=0.049..1239.128 rows=2 loops=1)
   Merge Cond: (t_big.a = t_small.a)
   ->  GroupAggregate  (cost=0.42..46909.23 rows=1000000 width=12) (actual time=0.034..1113.324 rows=1000000 loops=1)
         Group Key: t_big.a
         ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..31909.23 rows=1000000 width=8) (actual time=0.025..412.650 rows=1000000 loops=1)
   ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.012..0.013 rows=2 loops=1)
         Sort Key: t_small.a
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
 Planning Time: 0.219 ms
 Execution Time: 1239.208 ms
(11 rows)

Enable sublink pushdown by enabling CBQT

After the CBQT and sublink pushdown features are enabled, the clause a in (select a from t_small) is pushed down to the subquery. A parameterized path is generated for the t_big table based on the join condition, which significantly reduces the amount of data to be scanned and the execution time.

Note

The cost of the original query plan must exceed the value of polar_cbqt_cost_threshold for the sublink pushdown feature to be used.

-- Enable CBQT
SET polar_enable_cbqt to on;

-- Enable sublink pushdown
SET polar_cbqt_pushdown_sublink to on;

EXPLAIN ANALYZE SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

Sample result:

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=17.96..17.99 rows=2 width=12) (actual time=0.056..0.059 rows=2 loops=1)
   Group Key: t_big.a
   ->  Sort  (cost=17.96..17.96 rows=2 width=8) (actual time=0.051..0.052 rows=2 loops=1)
         Sort Key: t_big.a
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8) (actual time=0.032..0.045 rows=2 loops=1)
               ->  Unique  (cost=1.03..1.04 rows=2 width=4) (actual time=0.014..0.018 rows=2 loops=1)
                     ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.014..0.015 rows=2 loops=1)
                           Sort Key: t_small.a
                           Sort Method: quicksort  Memory: 25kB
                           ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.007..0.008 rows=2 loops=1)
               ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=2)
                     Index Cond: (a = t_small.a)
 Planning Time: 0.518 ms
 Execution Time: 0.141 ms
(15 rows)

Enable sublink pushdown by using a hint

You can use a hint to enable the sublink pushdown feature for a SQL statement. As a result, the clause a in (select a from t_small) is pushed down to the subquery. A parameterized path is generated for the t_big table based on the join condition, which significantly reduces the amount of data to be scanned and the execution time.

-- The default value of polar_cbqt_pushdown_sublink parameter is off
SET polar_cbqt_pushdown_sublink to off;

EXPLAIN ANALYZE /*+ Set(polar_cbqt_pushdown_sublink force) */ SELECT * FROM (SELECT a, sum(b) b FROM t_big GROUP BY a)v WHERE a IN (SELECT a FROM t_small);

Sample result:

                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=17.96..17.99 rows=2 width=12) (actual time=0.073..0.076 rows=2 loops=1)
   Group Key: t_big.a
   ->  Sort  (cost=17.96..17.96 rows=2 width=8) (actual time=0.067..0.069 rows=2 loops=1)
         Sort Key: t_big.a
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=1.46..17.95 rows=2 width=8) (actual time=0.026..0.040 rows=2 loops=1)
               ->  Unique  (cost=1.03..1.04 rows=2 width=4) (actual time=0.011..0.015 rows=2 loops=1)
                     ->  Sort  (cost=1.03..1.03 rows=2 width=4) (actual time=0.010..0.011 rows=2 loops=1)
                           Sort Key: t_small.a
                           Sort Method: quicksort  Memory: 25kB
                           ->  Seq Scan on t_small  (cost=0.00..1.02 rows=2 width=4) (actual time=0.005..0.006 rows=2 loops=1)
               ->  Index Scan using t_big_a_idx on t_big  (cost=0.42..8.44 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=2)
                     Index Cond: (a = t_small.a)
 Planning Time: 0.788 ms
 Execution Time: 0.156 ms
(15 rows)

  • On this page (1)
  • Background information
  • Prerequisites
  • Scenarios
  • Limits
  • Usage
  • Examples
  • Data preparation
  • Original query
  • Enable sublink pushdown by enabling CBQT
  • Enable sublink pushdown by using a hint
Feedback
phone Contact Us

Chat now with Alibaba Cloud Customer Service to assist you in finding the right products and services to meet your needs.

alicare alicarealicarealicare