By digoal
Currently, GiST indexes are used in PostgreSQL for spatial objects. The following figure shows the tree structure of spatial data, in which each data entry is a bounding box.
If an object is a polygon, its bounding box is stored in the index.
A non-box object usually causes bounding box amplification.
If the input is a polygon, this polygon's bounding box is used as an input criterion. The query's main task is finding a branch of the bounding box meeting the operators' conditions from the PostgreSQL database based on the indexes.
In this case, if the invalid area is too large, I/O amplification and CPU usage amplification will occur during index scanning.
Optimization method: Split the input polygon to reduce invalid areas and then use the UNION ALL command to merge the results.
For a multipolygon object, the bounding box can be huge.
In the following figure, the multipolygon's bounding box that comprises three separate polygons on the map is enormous.
select * from tbl where st_contains(multi_polygon, pos);
When you run the preceding SQL statement to perform a GiST index-based search, all spatial objects in the bounding box are returned, instead of the spatial objects in these small polygons. Then "check filter" is used for filtering. This results in I/O and CPU usage amplification.
To sum it all up, split spatial objects and use the UNION ALL command to merge spatial search results to obtain the final result.
1) Create a test table.
create table tbl (id int, pos geometry);
2) Write 10 million spatial data points.
insert into tbl select id,
st_setsrid(
st_makepoint(
round((random()*(135.085831-73.406586)+73.406586)::numeric,6),
round((random()*(53.880950-3.408477)+3.408477)::numeric,6)
),
4326
) from generate_series(1,10000000) t(id);
3) Create spatial indexes.
create index idx_tbl_pos on tbl using gist(pos);
4) Construct a multipolygon with multiple polygons.
select st_union(
array[
st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(75.15 29.53,77 29,77.6 29.5, 75.15 29.53)')), 4326),
st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(125.15 29.53,125 29,125.6 29.5, 125.15 29.53)')), 4326)
]
);
5) Use the multipolygon as the input for the search.
explain (analyze,verbose,timing,costs,buffers) select * from tbl where st_contains(
st_union(
array[
st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(75.15 29.53,77 29,77.6 29.5, 75.15 29.53)')), 4326),
st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(125.15 29.53,125 29,125.6 29.5, 125.15 29.53)')), 4326)
]
),
pos
);
Bitmap Heap Scan on public.tbl (cost=156.65..13467.61 rows=3333 width=36) (actual time=41.020..5062.317 rows=2445 loops=1)
Output: id, pos
Recheck Cond: ('0106000020E610000002000000010300000001000000040000009A99999999C9524048E17A14AE873D4000000000004053400000000000003D4066666666666653400000000000803D409A99999999C9524048E17A14AE873D40010300000001000000040000009A99999999495F4048E17A14AE873D400000000000405F400000000000003D406666666666665F400000000000803D409A99999999495F4048E17A14AE873D40'::geometry ~ tbl.pos)
Filter: _st_contains('0106000020E610000002000000010300000001000000040000009A99999999C9524048E17A14AE873D4000000000004053400000000000003D4066666666666653400000000000803D409A99999999C9524048E17A14AE873D40010300000001000000040000009A99999999495F4048E17A14AE873D400000000000405F400000000000003D406666666666665F400000000000803D409A99999999495F4048E17A14AE873D40'::geometry, tbl.pos)
Rows Removed by Filter: 83589
Heap Blocks: exact=53874
Buffers: shared hit=823 read=53873 written=7641
-> Bitmap Index Scan on idx_tbl_pos (cost=0.00..155.82 rows=10000 width=0) (actual time=22.305..22.305 rows=86034 loops=1)
Index Cond: ('0106000020E610000002000000010300000001000000040000009A99999999C9524048E17A14AE873D4000000000004053400000000000003D4066666666666653400000000000803D409A99999999C9524048E17A14AE873D40010300000001000000040000009A99999999495F4048E17A14AE873D400000000000405F400000000000003D406666666666665F400000000000803D409A99999999495F4048E17A14AE873D40'::geometry ~ tbl.pos)
Buffers: shared hit=822
Planning time: 0.268 ms
Execution time: 5062.947 ms
(12 rows)
6) Split the multipolygon into multiple polygons for the search and use the UNION ALL command to merge the results.
explain (analyze,verbose,timing,costs,buffers)
select * from tbl where st_contains(st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(75.15 29.53,77 29,77.6 29.5, 75.15 29.53)')), 4326), pos)
union all
select * from tbl where st_contains(st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(125.15 29.53,125 29,125.6 29.5, 125.15 29.53)')), 4326), pos);
Append (cost=156.65..27001.89 rows=6666 width=36) (actual time=1.594..11.143 rows=2445 loops=1)
Buffers: shared hit=5230
-> Bitmap Heap Scan on public.tbl (cost=156.65..13467.61 rows=3333 width=36) (actual time=1.594..8.473 rows=2016 loops=1)
Output: tbl.id, tbl.pos
Recheck Cond: ('0103000020E610000001000000040000009A99999999C9524048E17A14AE873D4000000000004053400000000000003D4066666666666653400000000000803D409A99999999C9524048E17A14AE873D40'::geometry ~ tbl.pos)
Filter: _st_contains('0103000020E610000001000000040000009A99999999C9524048E17A14AE873D4000000000004053400000000000003D4066666666666653400000000000803D409A99999999C9524048E17A14AE873D40'::geometry, tbl.pos)
Rows Removed by Filter: 2172
Heap Blocks: exact=4083
Buffers: shared hit=4133
-> Bitmap Index Scan on idx_tbl_pos (cost=0.00..155.82 rows=10000 width=0) (actual time=1.001..1.001 rows=4188 loops=1)
Index Cond: ('0103000020E610000001000000040000009A99999999C9524048E17A14AE873D4000000000004053400000000000003D4066666666666653400000000000803D409A99999999C9524048E17A14AE873D40'::geometry ~ tbl.pos)
Buffers: shared hit=50
-> Bitmap Heap Scan on public.tbl tbl_1 (cost=156.65..13467.61 rows=3333 width=36) (actual time=0.429..2.227 rows=429 loops=1)
Output: tbl_1.id, tbl_1.pos
Recheck Cond: ('0103000020E610000001000000040000009A99999999495F4048E17A14AE873D400000000000405F400000000000003D406666666666665F400000000000803D409A99999999495F4048E17A14AE873D40'::geometry ~ tbl_1.pos)
Filter: _st_contains('0103000020E610000001000000040000009A99999999495F4048E17A14AE873D400000000000405F400000000000003D406666666666665F400000000000803D409A99999999495F4048E17A14AE873D40'::geometry, tbl_1.pos)
Rows Removed by Filter: 655
Heap Blocks: exact=1076
Buffers: shared hit=1097
-> Bitmap Index Scan on idx_tbl_pos (cost=0.00..155.82 rows=10000 width=0) (actual time=0.295..0.295 rows=1084 loops=1)
Index Cond: ('0103000020E610000001000000040000009A99999999495F4048E17A14AE873D400000000000405F400000000000003D406666666666665F400000000000803D409A99999999495F4048E17A14AE873D40'::geometry ~ tbl_1.pos)
Buffers: shared hit=21
Planning time: 0.247 ms
Execution time: 11.432 ms
(24 rows)
7) Write user-defined functions (UDFs) to simplify multiple UNION ALL commands.
create or replace function q_mp(VARIADIC arr geometry[]) returns setof record as $$
declare
sql text := '';
var geometry;
begin
foreach var in array arr loop
sql := sql || format(' select * from tbl where st_contains(''%s''::geometry, pos) union all', var);
end loop;
sql := rtrim(sql, 'union all');
return query execute sql;
end;
$$ language plpgsql strict;
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from q_mp
(
st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(75.15 29.53,77 29,77.6 29.5, 75.15 29.53)')), 4326),
st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(125.15 29.53,125 29,125.6 29.5, 125.15 29.53)')), 4326)
) as t(id int, pos geometry);
Function Scan on public.q_mp t (cost=0.25..10.25 rows=1000 width=36) (actual time=11.451..11.707 rows=2445 loops=1)
Output: id, pos
Function Call: q_mp(VARIADIC '{0103000020E610000001000000040000009A99999999C9524048E17A14AE873D4000000000004053400000000000003D4066666666666653400000000000803D409A99999999C9524048E17A14AE873D40:0103000020E610000001000000040000009A99999999495F4048E17A14AE873D400000000000405F400000000000003D406666666666665F400000000000803D409A99999999495F4048E17A14AE873D40}'::geometry[])
Buffers: shared hit=5230
Planning time: 0.095 ms
Execution time: 11.975 ms
(6 rows)
1) UDFs greatly simplify SQL statements and improve performance.
2) You can reduce the search criteria (or invalid areas in the bounding box) as much as possible to reduce the I/O and CPU usage and significantly improve the performance.
3) There may be another optimizable problem. If the spatial object is very large, the PGLZ compression method is used. If the CPU usage on compression and decompression has become a bottleneck, a non-compressed storage format can reduce CPU usage.
You can use perf to observe bottlenecks.
4) Custom compression methods: https://commitfest.postgresql.org/21/1294/
For more information on how to split a multigeometry (multipolygon) into multiple polygons, see PostGIS documentation: http://postgis.net/docs/manual-2.4/ST_GeometryN.html
digoal - January 21, 2021
digoal - December 16, 2020
Alibaba Clouder - May 20, 2020
digoal - December 18, 2020
afzaalvirgoboy - February 25, 2020
Alibaba Clouder - March 15, 2019
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by digoal
5374541210554523 January 3, 2022 at 11:52 am
请问第一张图是怎么做出来的?GiST R树索引就一层树吗?