By digoal
create table aa (id int, c1 int, c2 int, c3 int[], c4 timestamp);
insert into aa select generate_series(1,10000000), random()*100, random()*1000, array[(random()*100)::int, (random()*1000)::int], clock_timestamp();
create index idx_c3 on aa using gin (c3);
create index idx_c4 on aa (c4);
explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
The statistics are not generated, so the database selects an incorrect execution plan (for example, the GIN index is not used) and uses the c4 sorting index, resulting in a slowdown.
postgres=> select * from pg_stats where attname='c4' and tablename='aa';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_
freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------
------+----------------------
(0 rows)
postgres=> select * from pg_stats where attname='c3' and tablename='aa';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_
freqs | elem_count_histogram
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------
------+----------------------
(0 rows)
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2039.27..2102.98 rows=10 width=52) (actual time=2382.421..2382.421 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=130422 read=27325
I/O Timings: read=116.959
-> Index Scan Backward using idx_c4 on public.aa (cost=0.43..318568.03 rows=50000 width=52) (actual time=2382.418..2382.419 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Filter: (aa.c3 @> '{-1}'::integer[])
Rows Removed by Filter: 10000000
Buffers: shared hit=130422 read=27325
I/O Timings: read=116.959
Planning Time: 2.304 ms
Execution Time: 2382.443 ms
(12 rows)
After statistics are generated, it is perfect:
postgres=> vacuum analyze aa;
VACUUM
-- 查询pg_stats已经有统计信息了
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2857.15..2857.17 rows=10 width=49) (actual time=0.017..0.017 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=3
-> Sort (cost=2856.35..2862.60 rows=2500 width=49) (actual time=0.016..0.016 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Sort Key: aa.c4 DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=3
-> Bitmap Heap Scan on public.aa (cost=28.17..2739.27 rows=2500 width=49) (actual time=0.013..0.013 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Recheck Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
-> Bitmap Index Scan on idx_c3 (cost=0.00..27.55 rows=2500 width=0) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
Planning Time: 0.197 ms
Execution Time: 0.045 ms
(17 rows)
There is also a more powerful method by forcibly using the index that orders by c4. In other words, forcibly execute a where condition in the c4 index. Then, common queries without the where condition cannot use this index.
create table aa (id int, c1 int, c2 int, c3 int[], c4 timestamp);
insert into aa select generate_series(1,10000000), random()*100, random()*1000, array[(random()*100)::int, (random()*1000)::int], clock_timestamp();
create index idx_c3 on aa using gin (c3);
create index idx_c4 on aa (c4) where c4 >= '1970-01-01';
-- 即使不收集统计信息, 也不使用c4索引了
explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c3 @> array[-1] order by c4 desc limit 10 offset 320;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43941.49..43942.69 rows=10 width=52) (actual time=22.029..22.029 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=3
-> Gather Merge (cost=43903.17..49889.91 rows=50000 width=52) (actual time=22.027..27.849 rows=0 loops=1)
Output: id, c1, c2, c3, c4
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=31
-> Sort (cost=42903.12..42934.37 rows=12500 width=52) (actual time=0.070..0.070 rows=0 loops=5)
Output: id, c1, c2, c3, c4
Sort Key: aa.c4 DESC
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Worker 2: Sort Method: quicksort Memory: 25kB
Worker 3: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=31
Worker 0: actual time=0.047..0.047 rows=0 loops=1
Buffers: shared hit=7
Worker 1: actual time=0.050..0.050 rows=0 loops=1
Buffers: shared hit=7
Worker 2: actual time=0.049..0.050 rows=0 loops=1
Buffers: shared hit=7
Worker 3: actual time=0.058..0.058 rows=0 loops=1
Buffers: shared hit=7
-> Parallel Bitmap Heap Scan on public.aa (cost=409.50..42317.72 rows=12500 width=52) (actual time=0.040..0.040 rows=0 loops=5)
Output: id, c1, c2, c3, c4
Recheck Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
Worker 0: actual time=0.013..0.013 rows=0 loops=1
Worker 1: actual time=0.013..0.013 rows=0 loops=1
Worker 2: actual time=0.012..0.012 rows=0 loops=1
Worker 3: actual time=0.020..0.020 rows=0 loops=1
-> Bitmap Index Scan on idx_c3 (cost=0.00..397.00 rows=50000 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (aa.c3 @> '{-1}'::integer[])
Buffers: shared hit=3
Planning Time: 0.162 ms
Execution Time: 27.878 ms
(38 rows)
-- 带where条件的可以走索引, 不带不走
explain (analyze,verbose,timing,costs,buffers) select * from aa where c4 >= '1970-01-01' order by c4 desc limit 10 offset 320;
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa where c4 >= '1970-01-01' order by c4 desc limit 10 offset 320;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=17.09..17.61 rows=10 width=52) (actual time=0.107..0.110 rows=10 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=5 read=4
I/O Timings: read=0.011
-> Index Scan Backward using idx_c4 on public.aa (cost=0.43..173457.83 rows=3333333 width=52) (actual time=0.021..0.084 rows=330 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=5 read=4
I/O Timings: read=0.011
Planning Time: 0.163 ms
Execution Time: 0.124 ms
(10 rows)
explain (analyze,verbose,timing,costs,buffers) select * from aa order by c4 desc limit 10 offset 320;
postgres=> explain (analyze,verbose,timing,costs,buffers) select * from aa order by c4 desc limit 10 offset 320;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=217794.43..217795.63 rows=10 width=49) (actual time=686.640..686.644 rows=10 loops=1)
Output: id, c1, c2, c3, c4
Buffers: shared hit=19346
-> Gather Merge (cost=217755.73..1426997.81 rows=9999950 width=49) (actual time=686.541..706.945 rows=330 loops=1)
Output: id, c1, c2, c3, c4
Workers Planned: 5
Workers Launched: 5
Buffers: shared hit=103313
-> Sort (cost=216755.65..221755.63 rows=1999990 width=49) (actual time=669.873..669.897 rows=291 loops=6)
Output: id, c1, c2, c3, c4
Sort Key: aa.c4 DESC
Sort Method: top-N heapsort Memory: 117kB
Worker 0: Sort Method: top-N heapsort Memory: 117kB
Worker 1: Sort Method: top-N heapsort Memory: 117kB
Worker 2: Sort Method: top-N heapsort Memory: 117kB
Worker 3: Sort Method: top-N heapsort Memory: 117kB
Worker 4: Sort Method: top-N heapsort Memory: 117kB
Buffers: shared hit=103313
Worker 0: actual time=653.603..653.630 rows=330 loops=1
Buffers: shared hit=16563
Worker 1: actual time=662.200..662.228 rows=330 loops=1
Buffers: shared hit=16725
Worker 2: actual time=667.136..667.163 rows=330 loops=1
Buffers: shared hit=16757
Worker 3: actual time=672.997..673.024 rows=330 loops=1
Buffers: shared hit=16895
Worker 4: actual time=676.989..677.017 rows=330 loops=1
Buffers: shared hit=17027
-> Parallel Seq Scan on public.aa (cost=0.00..123092.90 rows=1999990 width=49) (actual time=0.014..266.190 rows=1666667 loops=6)
Output: id, c1, c2, c3, c4
Buffers: shared hit=103093
Worker 0: actual time=0.011..265.481 rows=1602052 loops=1
Buffers: shared hit=16516
Worker 1: actual time=0.014..269.871 rows=1617766 loops=1
Buffers: shared hit=16678
Worker 2: actual time=0.013..273.440 rows=1620849 loops=1
Buffers: shared hit=16710
Worker 3: actual time=0.012..275.377 rows=1634256 loops=1
Buffers: shared hit=16848
Worker 4: actual time=0.026..276.290 rows=1648515 loops=1
Buffers: shared hit=16995
Planning Time: 0.161 ms
Execution Time: 707.004 ms
(43 rows)
Alibaba Clouder - January 9, 2018
digoal - September 6, 2019
digoal - June 26, 2019
ApsaraDB - August 8, 2023
ApsaraDB - July 4, 2022
digoal - June 26, 2019
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by digoal