By Digoal
There may be a need to search for a spatial object that intersects with any specified polygons and the like during spatial data query.
For example, in Cainiao and new retail businesses, points in several shopping mall polygons or several residential community polygons are queried.
The syntax of the SQL statements may be as follows:
select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);
PostgreSQL supports spatial indexes as well as bitmapAnd and bitmapOr index scans. In other words, index scans can be performed regardless of the number of query conditions if the geo_point field includes an index.
How can this query method be optimized?
Before the optimization, let's see another example.
Single-column conditions-based queries are similar to the requirement described above. Therefore, one field specifies multiple query conditions. This is similar to matching one field with multiple polygons.
The table structure is as follows:
postgres=# \d+ a
Table "public.a"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | not null | | plain | |
info | text | | | | extended | |
crt_time | timestamp without time zone | | | | plain | |
Indexes:
"a_pkey" PRIMARY KEY, btree (id)
According to the query requirement, the ID field specifies four conditions, and a result is returned when any of the conditions is met.
Based on statistics and cost-based optimization, the bitmapOr feature of PostgreSQL is adopted to reuse the single-column index multiple times and merge the data blocks found by all the specified conditions. Finally, a recheck is performed to obtain the desired result.
BitmapAnd and bitmapOr are unique features of PostgreSQL databases. You can use multiple indexes in a conditions-based query to scan data blocks for merging and deduplication.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <100 or id<200 or id<300 or id between 10 and 100 or id between 100 and 30000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.a (cost=39.85..3118.98 rows=2610 width=44) (actual time=5.099..15.355 rows=3154 loops=1)
Output: id, info, crt_time
Recheck Cond: ((a.id < 100) OR (a.id < 200) OR (a.id < 300) OR ((a.id >= 10) AND (a.id <= 100)) OR ((a.id >= 100) AND (a.id <= 30000)))
Heap Blocks: exact=38
Buffers: shared hit=16 read=40
-> BitmapOr (cost=39.85..39.85 rows=2610 width=0) (actual time=5.089..5.089 rows=0 loops=1)
Buffers: shared hit=15 read=3
-> Bitmap Index Scan on a_pkey (cost=0.00..1.70 rows=8 width=0) (actual time=0.003..0.003 rows=1 loops=1)
Index Cond: (a.id < 100)
Buffers: shared hit=3
-> Bitmap Index Scan on a_pkey (cost=0.00..1.77 rows=17 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Index Cond: (a.id < 200)
Buffers: shared hit=3
-> Bitmap Index Scan on a_pkey (cost=0.00..1.84 rows=26 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Index Cond: (a.id < 300)
Buffers: shared hit=3
-> Bitmap Index Scan on a_pkey (cost=0.00..1.72 rows=8 width=0) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: ((a.id >= 10) AND (a.id <= 100))
Buffers: shared hit=3
-> Bitmap Index Scan on a_pkey (cost=0.00..29.55 rows=2551 width=0) (actual time=5.079..5.079 rows=3153 loops=1)
Index Cond: ((a.id >= 100) AND (a.id <= 30000))
Buffers: shared hit=3 read=3
Planning time: 0.251 ms
Execution time: 15.699 ms
(24 rows)
However, the foregoing four conditions can be merged into one.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where id <= 30000;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.a (cost=23.88..3006.82 rows=2560 width=44) (actual time=0.110..0.568 rows=3154 loops=1)
Output: id, info, crt_time
Recheck Cond: (a.id <= 30000)
Heap Blocks: exact=38
Buffers: shared hit=44
-> Bitmap Index Scan on a_pkey (cost=0.00..23.24 rows=2560 width=0) (actual time=0.102..0.102 rows=3154 loops=1)
Index Cond: (a.id <= 30000)
Buffers: shared hit=6
Planning time: 0.127 ms
Execution time: 0.835 ms
(10 rows)
After the conditions are merged, data is scanned based on a precise index, improving performance dozens of times.
This also applies to spatial data. It is more convenient to merge query conditions using st_union.
select geo_point,* from table where ST_Within(geo_point, polygon_1) or ST_Within(geo_point, polygon_2) or ... ST_Within(geo_point, polygon_n);
is optimized as follows:
select geo_point,* from table where ST_Within(geo_point, st_union(polygon_1,polygon_2,...polygon_n) );
The performance has also improved significantly.
PostgreSQL: Nearest Neighbor Query Performance on Billions of Geolocation Records
PostgreSQL: Database Requirement Analysis and Design for Logistics Tracking System
Alibaba Cloud Community - September 10, 2024
digoal - December 18, 2018
digoal - September 12, 2019
Alibaba Clouder - July 5, 2019
digoal - April 12, 2019
ApsaraDB - August 21, 2024
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMigrate your legacy Oracle databases to Alibaba Cloud to save on long-term costs and take advantage of improved scalability, reliability, robust security, high performance, and cloud-native features.
Learn MoreMigrating to fully managed cloud databases brings a host of benefits including scalability, reliability, and cost efficiency.
Learn MoreMore Posts by digoal