×
Community Blog PostgreSQL: Optimizing Single-column Conditions-based Queries

PostgreSQL: Optimizing Single-column Conditions-based Queries

This article discusses single-column conditions-based query optimization in PostgreSQL and talks about the table structure and query plan.

By Digoal

Background

There may be a need to search for a spatial object that intersects with any specified polygons and the like during spatial data query.

For example, in Cainiao and new retail businesses, points in several shopping mall polygons or several residential community polygons are queried.

The syntax of the SQL statements may be as follows:

select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);  

PostgreSQL supports spatial indexes as well as bitmapAnd and bitmapOr index scans. In other words, index scans can be performed regardless of the number of query conditions if the geo_point field includes an index.

How can this query method be optimized?

Before the optimization, let's see another example.

Single-column Conditions-based Queries

Single-column conditions-based queries are similar to the requirement described above. Therefore, one field specifies multiple query conditions. This is similar to matching one field with multiple polygons.

The table structure is as follows:

postgres=# \d+ a  
                                                Table "public.a"  
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 id       | integer                     |           | not null |         | plain    |              |   
 info     | text                        |           |          |         | extended |              |   
 crt_time | timestamp without time zone |           |          |         | plain    |              |   
Indexes:  
    "a_pkey" PRIMARY KEY, btree (id)  

According to the query requirement, the ID field specifies four conditions, and a result is returned when any of the conditions is met.

Based on statistics and cost-based optimization, the bitmapOr feature of PostgreSQL is adopted to reuse the single-column index multiple times and merge the data blocks found by all the specified conditions. Finally, a recheck is performed to obtain the desired result.

BitmapAnd and bitmapOr are unique features of PostgreSQL databases. You can use multiple indexes in a conditions-based query to scan data blocks for merging and deduplication.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <100 or id<200 or id<300 or id between 10 and 100 or id between 100 and 30000;  
                                                                QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.a  (cost=39.85..3118.98 rows=2610 width=44) (actual time=5.099..15.355 rows=3154 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: ((a.id < 100) OR (a.id < 200) OR (a.id < 300) OR ((a.id >= 10) AND (a.id <= 100)) OR ((a.id >= 100) AND (a.id <= 30000)))  
   Heap Blocks: exact=38  
   Buffers: shared hit=16 read=40  
   ->  BitmapOr  (cost=39.85..39.85 rows=2610 width=0) (actual time=5.089..5.089 rows=0 loops=1)  
         Buffers: shared hit=15 read=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.70 rows=8 width=0) (actual time=0.003..0.003 rows=1 loops=1)  
               Index Cond: (a.id < 100)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.77 rows=17 width=0) (actual time=0.001..0.001 rows=1 loops=1)  
               Index Cond: (a.id < 200)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.84 rows=26 width=0) (actual time=0.001..0.001 rows=1 loops=1)  
               Index Cond: (a.id < 300)  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..1.72 rows=8 width=0) (actual time=0.004..0.004 rows=0 loops=1)  
               Index Cond: ((a.id >= 10) AND (a.id <= 100))  
               Buffers: shared hit=3  
         ->  Bitmap Index Scan on a_pkey  (cost=0.00..29.55 rows=2551 width=0) (actual time=5.079..5.079 rows=3153 loops=1)  
               Index Cond: ((a.id >= 100) AND (a.id <= 30000))  
               Buffers: shared hit=3 read=3  
 Planning time: 0.251 ms  
 Execution time: 15.699 ms  
(24 rows)  

However, the foregoing four conditions can be merged into one.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <= 30000;  
                                                      QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.a  (cost=23.88..3006.82 rows=2560 width=44) (actual time=0.110..0.568 rows=3154 loops=1)  
   Output: id, info, crt_time  
   Recheck Cond: (a.id <= 30000)  
   Heap Blocks: exact=38  
   Buffers: shared hit=44  
   ->  Bitmap Index Scan on a_pkey  (cost=0.00..23.24 rows=2560 width=0) (actual time=0.102..0.102 rows=3154 loops=1)  
         Index Cond: (a.id <= 30000)  
         Buffers: shared hit=6  
 Planning time: 0.127 ms  
 Execution time: 0.835 ms  
(10 rows)  

After the conditions are merged, data is scanned based on a precise index, improving performance dozens of times.

This also applies to spatial data. It is more convenient to merge query conditions using st_union.

Optimization by Merging Spatial Data Query Conditions

select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);  

is optimized as follows:

select geo_point,* from table where ST_Within(geo_point, st_union(polygon_1,polygon_2,...polygon_n) );  

The performance has also improved significantly.

References

0 0 0
Share on

digoal

282 posts | 24 followers

You may also like

Comments

digoal

282 posts | 24 followers

Related Products