By digoal
When a query has multiple range conditions, the database can use only one of them to navigate a b-tree index; the other range conditions cannot converge precisely through a composite index.
In the following example, the predicates on both ts and col1 are range conditions:
create unlogged table tbl (id int, col1 text, ts timestamp);
insert into tbl select random()*1000000, random()::text, now()+(random()*1000||' hour')::interval from generate_series(1,10000000);
select * from tbl
where ts >= $1 and ts <= $2
and col1 like '0.1%';
select * from tbl
where ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00'
and col1 like '0.1%';
Normally, we create the following single-column indexes and let the planner choose one of them. If lc_collate=C, a plain b-tree index on col1 supports like prefix matching (otherwise, create it with the text_pattern_ops operator class):
postgres=# create index idx_tbl_2 on tbl(col1);
CREATE INDEX
Time: 5713.425 ms (00:05.713)
postgres=# create index idx_tbl_3 on tbl(ts);
CREATE INDEX
Time: 4155.860 ms (00:04.156)
Even a composite index on ts and col1 does not help: as explained earlier, the second range field cannot converge through the index.
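For reference, such a composite index would be created as follows (a sketch; idx_tbl_4 is a hypothetical name). The planner can use the leading ts range to navigate the b-tree, but col1 then only acts as an in-index filter rather than a precise search condition:

```sql
-- Hypothetical composite index: only the leading ts range
-- navigates the b-tree; the col1 condition degrades to a filter.
create index idx_tbl_4 on tbl (ts, col1);
```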
The query performance is listed below:
postgres=# explain select count(*) from tbl where col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=114785.25..114785.26 rows=1 width=8)
   ->  Index Scan using idx_tbl_2 on tbl  (cost=0.56..114353.77 rows=172590 width=0)
         Index Cond: ((col1 >= '0.1'::text) AND (col1 < '0.2'::text))
         Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))
(4 rows)
Time: 0.674 ms
postgres=# select count(*) from tbl where col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
count
--------
169344
(1 row)
Time: 4087.236 ms (00:04.087)
Since time advances in steps, it can be segmented by day with an expression index. This turns the leading index column into a small set of discrete values, so the index can be probed once per day segment (an index skip scan effect), and the second range condition then converges precisely within the index. Note that to_char is only STABLE, not IMMUTABLE, so it must be wrapped in IMMUTABLE functions before it can be used in an index expression:
create or replace function im_to_char (timestamp,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;
create or replace function im_to_char (timestamptz,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;
create or replace function im_to_char (date,text) returns text as $$
select to_char($1,$2);
$$ language sql strict immutable parallel safe;
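A quick sanity check of the wrapper (assuming the three im_to_char functions above have been created):

```sql
-- im_to_char should match to_char's output for the same arguments.
select im_to_char('2021-05-21 20:00:00'::timestamp, 'yyyymmdd');
-- returns 20210521
```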
create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1 text_pattern_ops);
If lc_collate=C, you can use:
create index idx_t_1 on tbl (im_to_char(ts,'yyyymmdd'), col1);
The rewritten SQL is listed below:
select * from tbl
where im_to_char(ts,'yyyymmdd') = any(array(
select im_to_char($1::date+i,'yyyymmdd') from generate_series(0,$2::date-$1::date) i
))
and col1 like '0.1%'
and ts >= $1 and ts <= $2;
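One way to run the parameterized form is via a prepared statement; a minimal sketch (the statement name q1 is arbitrary, and the im_to_char wrappers defined above are assumed to exist):

```sql
-- Prepare once; the day-segment array is derived from the two
-- timestamp parameters at execution time.
prepare q1(timestamp, timestamp) as
select count(*) from tbl
where im_to_char(ts,'yyyymmdd') = any(array(
  select im_to_char($1::date + i, 'yyyymmdd')
  from generate_series(0, $2::date - $1::date) i
))
and col1 like '0.1%'
and ts >= $1 and ts <= $2;

execute q1('2021-05-21 20:00:00', '2021-05-28 21:00:00');
```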
With concrete parameters substituted, the SQL looks like this:
select count(*) from tbl
where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])
and col1 like '0.1%'
and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
As the execution plan shows, the composite index is now used: the Index Cond covers both the day-segment condition and the like prefix range, while the exact time bounds and the like pattern are rechecked in the Filter:
postgres=# explain select count(*) from tbl where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528']) and col1 like '0.1%' and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=36081.60..36081.61 rows=1 width=8)
   ->  Index Scan using idx_t_1 on tbl  (cost=0.56..36064.35 rows=6900 width=0)
         Index Cond: ((im_to_char(ts, 'yyyymmdd'::text) = ANY ('{20210521,20210522,20210523,20210524,20210525,20210526,20210527,20210528}'::text[])) AND (col1 >= '0.1'::text) AND (col1 < '0.2'::text))
         Filter: ((col1 ~~ '0.1%'::text) AND (ts >= '2021-05-21 20:00:00'::timestamp without time zone) AND (ts <= '2021-05-28 21:00:00'::timestamp without time zone))
(4 rows)
Time: 0.883 ms
Time consumption drops from about 4 seconds to under 0.7 seconds:
postgres=# select count(*) from tbl
postgres-# where im_to_char(ts,'yyyymmdd') = any(array['20210521','20210522','20210523','20210524','20210525','20210526','20210527','20210528'])
postgres-# and col1 like '0.1%'
postgres-# and ts >= '2021-05-21 20:00:00' and ts <= '2021-05-28 21:00:00';
count
--------
169344
(1 row)
Time: 680.602 ms
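The day-segment array used in the rewritten query can also be produced directly with generate_series over dates; a minimal sketch (relying on the im_to_char wrapper defined earlier):

```sql
-- Enumerate the per-day segment values for the query window.
select im_to_char(d::timestamp, 'yyyymmdd')
from generate_series('2021-05-21'::date, '2021-05-28'::date, interval '1 day') d;
-- returns 20210521 through 20210528, one row per day
```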