By digoal
Determining the point-plane position relationships and searching for geometries such as points covered by an existing plane are common requirements in GIS-based geometric applications.
You can create a GiST index in your PostgreSQL database to accelerate such determinations, but creating an index is just the first step.
You must perform some optimization steps to achieve optimal performance, for example, to reduce the latency and CPU usage.
We have written an article covering this, describing the optimization of data access in a B-tree indexed database. If there is a poor linear correlation between the data storage sequence and the index sequence, then the I/O amplification problem occurs.
You must first understand the spatial index structure and then perform spatial aggregation to reduce the spatial scanning I/Os.
The following example shows how to optimize the searches based on the ST_Contains and ST_Within functions.
Assume that there is a table with 10 million data records of spatial geometries. We are to search this table for spatial geometries covered by a polygon. The problem is that this polygon is a long strip, as shown in the following figure. The bounding box that contains this polygon is large.
You can use the following snippet to build this polygon.
postgres=# select st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326);
st_setsrid
----------------------------
0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000
(1 row)
1) Create a table.
postgres=# create table e(id int8, pos geometry);
CREATE TABLE
2) Write the spatial test data (10 million random points) covering the range of 50W to 50E and 50S to 50N.
postgres=# insert into e select id, st_setsrid(st_makepoint(50-random()*100, 50-random()*100), 4326) from generate_series(1,10000000) t(id);
INSERT 0 10000000
3) Create a spatial index.
postgres=# create index idx_e on e using gist(pos);
CREATE INDEX
4) Search for geometries whose bounding boxes are covered by the bounding box of this polygon.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from e where pos @ st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326);
QUERY PLAN
-----------------------
Index Scan using idx_e on public.e (cost=0.42..12526.72 rows=10000 width=40) (actual time=0.091..39.449 rows=35081 loops=1)
Output: id, pos
Index Cond: (e.pos @ '0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry)
Buffers: shared hit=35323
Planning time: 0.108 ms
Execution time: 41.222 ms
(6 rows)
35,323 data blocks are searched, and 35,081 records are returned.
5) Search for geometries covered by this polygon.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from e where st_contains(st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326), pos);
QUERY PLAN
-----------------------
Index Scan using idx_e on public.e (cost=0.42..15026.72 rows=3333 width=40) (actual time=0.077..49.015 rows=8491 loops=1)
Output: id, pos
Index Cond: ('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry ~ e.pos)
Filter: _st_contains('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry, e.pos)
Rows Removed by Filter: 26590
Buffers: shared hit=35323
Planning time: 0.085 ms
Execution time: 49.460 ms
(8 rows)
35,081 records from 35,323 data blocks are searched, and 8,491 records are returned, with 26,590 records filtered out.
The difference is that Step 4 searches for geometries whose bounding boxes are covered by the polygon's bounding box, and Step 5 searches for geometries that are covered by the polygon. A spatial index is created based on the bounding box.
Although many data blocks are searched, not many records meet the conditions. You can perform spatial aggregation to reduce the number of blocks to be scanned.
6) Create a table and adjust the data storage sequence through spatial aggregation and then create a spatial index.
postgres=# create table f(like e);
CREATE TABLE
postgres=# insert into f select * from e order by st_geohash(pos,15);
INSERT 0 10000000
postgres=# create index idx_f on f using gist(pos);
CREATE INDEX
7) Perform Step 4 on the optimized table.
Search for geometries whose bounding boxes are covered by the bounding box of this polygon. Before optimization, 35,323 blocks need to be scanned. Now, only 1,648 blocks need to be scanned.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f where pos @ st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326);
QUERY PLAN
-----------------------
Index Scan using idx_f on public.f (cost=0.42..12526.72 rows=10000 width=40) (actual time=0.081..9.702 rows=35081 loops=1)
Output: id, pos
Index Cond: (f.pos @ '0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry)
Buffers: shared hit=1648
Planning time: 0.096 ms
Execution time: 11.404 ms
(6 rows)
8) Perform Step 5 on the optimized table.
Search for geometries covered by this polygon. Before optimization, 35,323 blocks need to be scanned. Now, only 1,648 blocks need to be scanned.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f where st_contains(st_setsrid(st_makepolygon(ST_GeomFromText('LINESTRING(0 0,1 0,1 2.5,6 2.5,6 4,7 4,7 5,5 5,5 3,0 3,0 0)')), 4326), pos);
QUERY PLAN
-----------------------
Index Scan using idx_f on public.f (cost=0.42..15026.72 rows=3333 width=40) (actual time=1.216..32.398 rows=8491 loops=1)
Output: id, pos
Index Cond: ('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry ~ f.pos)
Filter: _st_contains('0103000020E6100000010000000B00000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000044000000000000018400000000000000440000000000000184000000000000010400000000000001C4000000000000010400000000000001C40000000000000144000000000000014400000000000001440000000000000144000000000000008400000000000000000000000000000084000000000000000000000000000000000'::geometry, f.pos)
Rows Removed by Filter: 26590
Buffers: shared hit=1648
Planning time: 0.101 ms
Execution time: 32.837 ms
(8 rows)
Spatial aggregation reduces the number of data blocks to be scanned from 35,323 to 1,648.
Spatial aggregation solves the problem of I/O amplification. The other problem to tackle is bounding box expansion. Optimization concerning this problem is related to the spatial index's structure.
From the preceding example, you can see that a spatial index is created based on the bounding box. Therefore, when the valid area accounts only for a small portion of the entire bounding box, many invalid data points may be included. As a result, both I/O and CPU usage are amplified. Next, we show how to solve this problem.
In the following figure, the rectangle enclosed by the dashed lines is the polygon's bounding box. When you use the GiST index to search for geometries that meet the conditions of points contained by the polygon, all geometries that are contained in this bounding box will be returned.
Optimization ideas:
Split this polygon into four boxes to eliminate the bounding box amplification problem.
explain (analyze,verbose,timing,costs,buffers) select * from f where
st_contains(st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326), pos)
or
st_contains(st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326), pos)
or
st_contains(st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326), pos)
or
st_contains(st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326), pos);
explain (analyze,verbose,timing,costs,buffers) select * from f where
pos @ st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326)
or
pos @ st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326)
or
pos @ st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326)
or
pos @ st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326);
1) After optimizing the database by using the preceding two optimization methods in combination, search for geometries covered by the polygon's bounding box.
The number of data blocks scanned is reduced from 1,648 to 243.
explain (analyze,verbose,timing,costs,buffers) select * from f where
pos @ st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326)
or
pos @ st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326)
or
pos @ st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326)
or
pos @ st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326);
QUERY PLAN
-----------------------
Bitmap Heap Scan on public.f (cost=10000000690.01..10000037405.46 rows=39940 width=40) (actual time=1.502..2.329 rows=8491 loops=1)
Output: id, pos
Recheck Cond: ((f.pos @ '0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry) OR (f.pos @ '0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry) OR (f.pos @ '0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry) OR (f.pos @ '0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry))
Heap Blocks: exact=119
Buffers: shared hit=243
-> BitmapOr (cost=690.01..690.01 rows=40000 width=0) (actual time=1.483..1.483 rows=0 loops=1)
Buffers: shared hit=124
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.461..0.461 rows=3077 loops=1)
Index Cond: (f.pos @ '0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry)
Buffers: shared hit=37
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.423..0.423 rows=1991 loops=1)
Index Cond: (f.pos @ '0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry)
Buffers: shared hit=33
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.366..0.366 rows=2435 loops=1)
Index Cond: (f.pos @ '0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry)
Buffers: shared hit=31
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.232..0.232 rows=988 loops=1)
Index Cond: (f.pos @ '0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry)
Buffers: shared hit=23
Planning time: 0.104 ms
Execution time: 2.751 ms
(21 rows)
2) After optimizing the database using the preceding two optimization methods in combination, search for geometries covered by the polygon.
The number of data blocks scanned is reduced from 1,648 to 243.
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f where
st_contains(st_setsrid(st_makebox2d(st_makepoint(0,0), st_makepoint(1,3)), 4326), pos)
or
st_contains(st_setsrid(st_makebox2d(st_makepoint(1,2.5), st_makepoint(5,3)), 4326), pos)
or
st_contains(st_setsrid(st_makebox2d(st_makepoint(5,2.5), st_makepoint(6,5)), 4326), pos)
or
st_contains(st_setsrid(st_makebox2d(st_makepoint(6,4), st_makepoint(7,5)), 4326), pos);
QUERY PLAN
--------------------------------------------
Bitmap Heap Scan on public.f (cost=663.40..77378.85 rows=13327 width=40) (actual time=1.496..11.038 rows=8491 loops=1)
Output: id, pos
Recheck Cond: (('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry ~ f.pos) OR
('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry ~ f.pos) OR ('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry ~ f.pos) OR ('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry ~ f.pos))
Filter: ((('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry ~ f.pos) AND _st_contains('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry, f.pos)) OR (('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry ~ f.pos) AND _st_contains('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry, f.pos)) OR (('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry ~ f.pos) AND _st_contains('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry, f.pos)) OR (('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry ~ f.pos) AND _st_contains('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry, f.pos)))
Heap Blocks: exact=119
Buffers: shared hit=243
-> BitmapOr (cost=663.40..663.40 rows=40000 width=0) (actual time=1.472..1.472 rows=0 loops=1)
Buffers: shared hit=124
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.436..0.436 rows=3077 loops=1)
Index Cond: ('0103000020E610000001000000050000000000000000000000000000000000000000000000000000000000000000000840000000000000F03F0000000000000840000000000000F03F000000000000000000000000000000000000000000000000'::geometry ~ f.pos)
Buffers: shared hit=37
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.438..0.438 rows=1991 loops=1)
Index Cond: ('0103000020E61000000100000005000000000000000000F03F0000000000000440000000000000F03F00000000000008400000000000001440000000000000084000000000000014400000000000000440000000000000F03F0000000000000440'::geometry ~ f.pos)
Buffers: shared hit=33
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.365..0.365 rows=2435 loops=1)
Index Cond: ('0103000020E610000001000000050000000000000000001440000000000000044000000000000014400000000000001440000000000000184000000000000014400000000000001840000000000000044000000000000014400000000000000440'::geometry ~ f.pos)
Buffers: shared hit=31
-> Bitmap Index Scan on idx_f (cost=0.00..162.52 rows=10000 width=0) (actual time=0.234..0.234 rows=988 loops=1)
Index Cond: ('0103000020E6100000010000000500000000000000000018400000000000001040000000000000184000000000000014400000000000001C4000000000000014400000000000001C40000000000000104000000000000018400000000000001040'::geometry ~ f.pos)
Buffers: shared hit=23
Planning time: 0.163 ms
Execution time: 11.497 ms
(22 rows)
Optimization method 2 splits the polygon (in the shape of a long strip) into smaller boxes. This operation eliminates the large bounding box and reduces the number of blocks to 243.
The combination of the preceding two optimization methods achieves great results.
Read the following topic of the PostGIS document to know how to split geometries: http://postgis.net/docs/manual-2.4/ST_Split.html
-- this creates a geometry collection consisting of the 2 halves of the polygon
-- this is similar to the example we demonstrated in ST_BuildArea
SELECT ST_Split(circle, line)
FROM (SELECT
ST_MakeLine(ST_MakePoint(10, 10),ST_MakePoint(190, 190)) As line,
ST_Buffer(ST_GeomFromText('POINT(100 90)'), 50) As circle) As foo;
-- result --
GEOMETRYCOLLECTION(POLYGON((150 90,149.039264020162 80.2454838991936,146.193976625564 70.8658283817455,...), POLYGON(...)))
-- To convert to individual polygons, you can use ST_Dump or ST_GeometryN
SELECT ST_AsText((ST_Dump(ST_Split(circle, line))).geom) As wkt
FROM (SELECT
ST_MakeLine(ST_MakePoint(10, 10),ST_MakePoint(190, 190)) As line,
ST_Buffer(ST_GeomFromText('POINT(100 90)'), 50) As circle) As foo;
-- result --
wkt
---------------
POLYGON((150 90,149.039264020162 80.2454838991936,...))
POLYGON((60.1371179574584 60.1371179574584,58.4265193848728 62.2214883490198,53.8060233744357 ...))
SELECT ST_AsText(ST_Split(mline, pt)) As wktcut
FROM (SELECT
ST_GeomFromText('MULTILINESTRING((10 10, 190 190), (15 15, 30 30, 100 90))') As mline,
ST_Point(30,30) As pt) As foo;
wktcut
------
GEOMETRYCOLLECTION(
LINESTRING(10 10,30 30),
LINESTRING(30 30,190 190),
LINESTRING(15 15,30 30),
LINESTRING(30 30,100 90)
)
@, ~, ST_Contains, and ST_Within are operators and functions for determining the inclusion relationships between geometries. What are the differences between them?
@
A @ B
returns TRUE if A's bounding box is contained by B's.
~
~ is the opposite of @.
A ~ B
returns TRUE if A's bounding box contains B's.
ST_Contains(A, B)
returns true if and only if no points of B lie in the exterior of A and at least one point of the interior of B lies in the interior of A.
ST_Within is the opposite of ST_Contains.
ST_Within(A, B)
returns true if geometry A is completely inside geometry B.
@
and ~
operations are not directly on the geometric object, but on the bound box of A and B, that is to say, the BOX consisting of the lower left and upper right points of the object.
ST_Within and ST_Contains are for geometric objects, but from the point of view of GiST index search, BOUND BOX needs to be used to search first, and then judged by CPU calculation.
A @ Polygon,返回真
B @ Polygon,返回真
C @ Polygon,返回真
ST_Contains(Polygon, A),返回假
ST_Contains(Polygon, B),返回真
ST_Contains(Polygon, C),返回假
ST_Subdivide is also a split function that splits a spatial geometry into a collection of spatial geometries.
Two problems affect a database's performance in terms of spatial searches:
1) The spatial data is stored out of order. As a result, many data blocks must be scanned when you search for some data, which is not a problem during point queries.
2) A PostGIS GiST index uses a bounding box as the key. As a result, searches are performed based on the bounding boxes of geometries. Therefore, if the target geometry is an irregularly shaped long strip, there are many empty areas in the bounding box of the geometry — increasing the scanning range (if you are using the ST_Contains and ST_Within functions) as well as the CPU usage when you filter out the empty areas.
Optimization method 1: Resolve the I/O amplification problem through spatial aggregation.
Optimization method 2: Split the long-strip polygon to reduce scanning range amplification (if you are using the ST_Contains and ST_Within functions) caused by bounding box amplification.
These optimization methods are used on a table with 10 million data records.
The optimization result is verified by determining the point-plane position relationships. Precisely, the optimization performance is verified through searching for spatial geometries covered by a long-strip polygon. You can also replace the long-strip polygon with multiple disconnected polygons.
| Comparison Item | Before Optimization | Optimized with Method 1 (Spatial Aggregation) | Optimized with Methods 1 and 2 (Spatial Aggregation + Geometry Split) |
| --- | --- | --- | --- |
| Data blocks scanned | 35,323 | 1,648 | 243 |
| Invalid records filtered out | 26,590 | 26,590 | 0 |
Spatial Search: Geometry and GiST Combination Outperforms Geohash and B-tree
Use Geometry Split to Optimize PostgreSQL Performance in GiST Index-based Polygon Search
digoal - June 26, 2019
digoal - January 21, 2021
digoal - December 18, 2020
digoal - December 21, 2020
digoal - January 18, 2021
digoal - February 14, 2021
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by digoal