PostgreSQL Optimized Case – Where a Field Range Order by B Field Sort Limit X

By digoal

Background Information

The goal of optimization is to minimize computing, IO, filter, recheck, etc. For example, the following query is difficult to optimize at first glance:

select * from tbl where ts between x and y   
order by z limit x;

The query condition of the ts field is a wide range with a large scale, which gives many results. A large number of sort records are displayed sorting by z.
Records that meet the ts criteria are returned in the z-field, and qualified results are ranked behind. Therefore, a large number of filters are required.

Because ts and z are two fields, index (ts,z) or index (z,ts) cannot be used to completely perform indexing.

Optimization

First, let’s understand the background. z is pk. z and ts have linear correlation (positive correlation or negative correlation), and both have directionality, such as increasing or decreasing. Therefore, this query can be modified.

select * from tbl where ts between x and y
and z?
order by z limit x;

n records are queried each time, and all records meeting the conditions of ts between x and y are successively filtered and processed.

Optimization Ideas:

1. Supplement the condition of z as the initial condition, so there is no need to filter too much.

select min(z), max(z) into id1,id2 from tbl where ts between x and y;  
  
select * from tbl where ts between x and y   
and z>=id1 and z<=id2   
order by z desc limit x;   
  
min(z) into id2

2. Each time a batch of records is found, the id is passed in as a new condition, so there is no need to filter too much every time.

select * from tbl where ts between x and y and z<id2 order by z desc limit x;   
  
min(z) into id2

Example

create table tbl (id int primary key, info text, crt_time timestamp);  
insert into tbl select generate_series(1,100000000), md5(random()::text) , clock_timestamp();

Original SQL statement:

postgres=# select min(crt_time), max(crt_time) from tbl;  
            min             |            max               
----------------------------+----------------------------  
 2020-07-10 14:04:46.600332 | 2020-07-10 14:08:06.737822  
(1 row)  
  
select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'  
order by id limit 100;  
  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'  
order by id limit 100;  
                                                                           QUERY PLAN                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.57..9.22 rows=100 width=45) (actual time=4003.046..4003.071 rows=100 loops=1)  
   Output: id, info, crt_time  
   Buffers: shared hit=288650 read=63193  
   ->  Index Scan using tbl_pkey on public.tbl  (cost=0.57..3208769.85 rows=37078282 width=45) (actual time=4003.045..4003.062 rows=100 loops=1)  
         Output: id, info, crt_time  
         Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))  
         Rows Removed by Filter: 29130419  
         Buffers: shared hit=288650 read=63193  
 Planning Time: 0.093 ms  
 Execution Time: 4003.096 ms  
(10 rows)  
  
-- 大量filter

Optimize the initial SQL:

select min(id),max(id) from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00';  
   min    |   max      
----------+----------  
 29130420 | 66423729  
(1 row)  
  
select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'  
and id>=29130420 and id<=66423729   
order by id desc limit 100;   
  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from tbl where crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'  
and id>=29130420 and id<=66423729   
order by id desc limit 100;   
                                                                           QUERY PLAN                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.57..10.57 rows=100 width=45) (actual time=0.022..0.051 rows=100 loops=1)  
   Output: id, info, crt_time  
   Buffers: shared hit=6  
   ->  Index Scan Backward using tbl_pkey on public.tbl  (cost=0.57..1374883.82 rows=13745329 width=45) (actual time=0.021..0.043 rows=100 loops=1)  
         Output: id, info, crt_time  
         Index Cond: ((tbl.id >= 29130420) AND (tbl.id <= 66423729))  
         Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))  
         Buffers: shared hit=6  
 Planning Time: 0.121 ms  
 Execution Time: 0.068 ms  
(10 rows)

Optimize SQL statements for batch query:

select * from tbl where   
crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'  
and id<66423630  
order by id desc limit 100;   
  
select * from tbl where   
crt_time between '2020-07-10 14:05:45' and '2020-07-10 14:07:00'  
and id<66423530  
order by id desc limit 100;

                                                                           QUERY PLAN                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.57..9.90 rows=100 width=45) (actual time=0.042..0.074 rows=100 loops=1)  
   Output: id, info, crt_time  
   Buffers: shared hit=8  
   ->  Index Scan Backward using tbl_pkey on public.tbl  (cost=0.57..2298576.97 rows=24640916 width=45) (actual time=0.041..0.065 rows=100 loops=1)  
         Output: id, info, crt_time  
         Index Cond: (tbl.id < 66423530)  
         Filter: ((tbl.crt_time >= '2020-07-10 14:05:45'::timestamp without time zone) AND (tbl.crt_time <= '2020-07-10 14:07:00'::timestamp without time zone))  
         Buffers: shared hit=8  
 Planning Time: 0.107 ms  
 Execution Time: 0.097 ms  
(10 rows)

It has 40000 times better performance.

Community

PostgreSQL Optimized Case – Where a Field Range Order by B Field Sort Limit X

Background Information

Optimization

Example

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL

Database for FinTech Solution