By Digoal
Recently, a friend in the community of Alibaba Cloud developers complained that their database did not show very good performance when processing thousands of concurrent IP address library queries in MySQL. So I asked him for his IP address library data, SQL query statements, and the table structure in MySQL. I told him that I was going to transfer the data into PostgreSQL and perform stress testing to see the query performance in PostgreSQL.
This issue and its related requirement - the need to process thousands of concurrent IP address library queries in MySQL - is also common in many business scenarios, such as calculations involving age ranges, income ranges, frequent activity ranges, geo location chunks, geometric chunks, and line segments. In reality, all these scenarios have the same description dimension. It's scope. With the development of the Internet of Things (IoT), this type of query will become more and more common in the years to come.
However, a lack of good indexing mechanisms will lead to heavy CPU usage and performance bottlenecks. This article describes the range types introduced in PostgreSQL 9.2 and indexes for range types that can significantly improve query performance.
For a range type to be considered a range type, it must contain a subtype. For example, the range of int4 is called int4range. Next, to use int4range to represent 1, 2, 3, 4, and 5, we can use '(0,6)'::int4range, '[1,6)'::int4range, '[1,5]'::int4range or '(0,5]'::int4range
.
postgres=# select '(0,6)'::int4range;
int4range
-----------
[1,6)
(1 row)
postgres=# select '[1,6)'::int4range;
int4range
-----------
[1,6)
(1 row)
postgres=# select '[1,5]'::int4range;
int4range
-----------
[1,6)
(1 row)
postgres=# select '(0,5]'::int4range;
int4range
-----------
[1,6)
(1 row)
Next, sparse range types use [)
to represent a range of elements. In this, the bracket indicates inclusion and a parenthesis indicates exclusion. Sparse range types must have canonical functions defined, which converts the storage format to [). And for a sparse range type, we know what the previous value and the following value of a specific value. For example, the value 1 in an int range is preceded by 0 and followed by 2. If this is a numeric range, we do not the value before 1 (0.9999999999... till infinite) and the value after 1 (1.00000000... 1). However, continuous range types store accurate elements, for example:
postgres=# select '(0,5]'::numrange;
numrange
----------
(0,5]
(1 row)
postgres=# select '[0,5]'::numrange;
numrange
----------
[0,5]
(1 row)
postgres=# select '[0,5)'::numrange;
numrange
----------
[0,5)
(1 row)
postgres=# select '(0,5)'::numrange;
numrange
----------
(0,5)
(1 row)
PostgreSQL provides several useful features for range types, such as the inclusion, exclusion, and intersection. Specifically, PostgreSQL comes with the following built-in range types:
INT4RANGE — Range of INTEGER
INT8RANGE — Range of BIGINT
NUMRANGE — Range of NUMERIC
TSRANGE — Range of TIMESTAMP WITHOUT TIME ZONE
TSTZRANGE — Range of TIMESTAMP WITH TIME ZONE
DATERANGE — Range of DATE
The following can be found in the system table.
digoal=# select oid, typname from pg_type where typname ~ 'range';
oid | typname
-------+------------
3904 | int4range
3905 | _int4range
3906 | numrange
3907 | _numrange
3908 | tsrange
3909 | _tsrange
3910 | tstzrange
3911 | _tstzrange
3912 | daterange
3913 | _daterange
3926 | int8range
3927 | _int8range
3831 | anyrange
11026 | pg_range
Let's see which functions are related to anyrange
.
digoal=# select proname,proargtypes from pg_proc where proargtypes::text ~ '3831';
proname | proargtypes
-------------------------+----------------------
anyrange_out | 3831
range_out | 3831
range_send | 3831
lower | 3831 -- the lower bound of a range. Sparse types and continuous types are different in this regard. See the example section
upper | 3831 -- the upper bound of a range. Sparse types and continuous types are different in this regard. See the example section
isempty | 3831 -- indicates if a range does not contain any elements
lower_inc | 3831 -- indicates that the lower bound is inclusive. Sparse types and continuous types are different in this regard. See the example section
upper_inc | 3831 -- indicates that the upper bound is inclusive. Sparse types and continuous types are different in this regard. See the example section
lower_inf | 3831 -- indicates that the lower bound is an infinitely small value. (Note that this infinite value is never the value of the subtype of the range. It means that the lower bound is not defined. See the example section.)
upper_inf | 3831 -- indicates that the upper bound is an infinitely large value. (Note that this infinite value is never the value of the subtype of the range. It means that the upper bound is not defined. See the example section.)
range_eq | 3831 3831
range_ne | 3831 3831
range_overlaps | 3831 3831
range_contains_elem | 3831 2283
range_contains | 3831 3831
elem_contained_by_range | 2283 3831
range_contained_by | 3831 3831
range_adjacent | 3831 3831
range_before | 3831 3831
range_after | 3831 3831
range_overleft | 3831 3831
range_overright | 3831 3831
range_union | 3831 3831
range_intersect | 3831 3831
range_minus | 3831 3831
range_cmp | 3831 3831
range_lt | 3831 3831
range_le | 3831 3831
range_ge | 3831 3831
range_gt | 3831 3831
range_gist_consistent | 2281 3831 23 26 2281
range_gist_same | 3831 3831 2281
hash_range | 3831
The following are the related operators:
postgres=# select oprname from pg_operator where oprleft=3831 or oprright=3831;
oprname
---------
=
<>
<
<=
>=
>
&&
@>
@>
<@
<@
<<
>>
&<
&>
-|-
+
-
*
To understand the built-in range types and how they work. Follow these steps:
1. Create a test table.
digoal=# CREATE TABLE reservation ( room int, during TSRANGE );
2. Insert a range of test data of the subtype timestamp
.
digoal=# INSERT INTO reservation VALUES
digoal-# ( 1108, '[2010-01-01 14:30, 2010-01-01 15:30)' );
INSERT 0 1
Note that @>
determines the inclusivity.
digoal=# SELECT int4range(10, 20) @> 3;
? column?
----------
f
(1 row)
Also note that && judges
specifies whether two ranges have overlaps.
digoal=# SELECT numrange(11.1, 22.2) && numrange(20.0, 30.0);
? column?
----------
t
(1 row)
Note that upper
finds the upper bound, and that int8range
is also a sparse range.
digoal=# SELECT upper(int8range(15, 25));
upper
-------
25
(1 row)
3. Find the upper bound of the continuous range numrange
. The result is consistent with that of the previous range.
digoal=# SELECT upper(numrange(15, 25));
upper
-------
25
(1 row)
To put it simply, the output of a sparse range is an exclusive upper bound. In other words, they are the values of the upper and lower bounds in the [) range pattern.
digoal=# SELECT upper('(15,25)'::int8range);
upper
-------
25
(1 row)
digoal=# SELECT upper('(15,25]'::int8range);
upper
-------
26
(1 row)
For continuous ranges, the output depends on the range input.
digoal=# SELECT upper('(15,25]'::numrange);
upper
-------
25
(1 row)
digoal=# SELECT upper('(15,25)'::numrange);
upper
-------
25
(1 row)
An askerisk (*) returns the output of the intersection of two ranges.
digoal=# SELECT int4range(10, 20) * int4range(15, 25);
? column?
----------
[15,20)
(1 row)
The isempty
function indicates if a range is empty.
digoal=# SELECT isempty(numrange(1, 5));
isempty
---------
f
(1 row)
The following examples are used to explain infinitely large elements. Below, the following is a time range from now to an infinitely large value.
digoal=# SELECT '(now,)'::tsrange;
tsrange
---------------------------------
("2012-05-17 16:32:43.055233",)
(1 row)
And the following, here, is a time range from an infinitely small value to an infinitely large value.
digoal=# SELECT '(,)'::tsrange;
tsrange
---------
(,)
(1 row)
Last, the following shows a time range from an infinitely value to now.
digoal=# SELECT '(,now)'::tsrange;
tsrange
---------------------------------
(,"2012-05-17 16:32:55.800172")
(1 row)
Range values entered must follow one of the example formats below:
(lower-bound,upper-bound)
(lower-bound,upper-bound]
[lower-bound,upper-bound)
[lower-bound,upper-bound]
empty
For example, the following range is empty. It does not contain any elements.
digoal=# SELECT 'empty'::tsrange;
tsrange
---------
empty
(1 row)
Each range type has a constructor function with the same name as the range type. The constructor function can accept three arguments: lower bound
, upper bound
, and boundary pattern (with one of the strings "()", "(]", "[)", or "[]")
. For example, the constructor function of the int4range
type is also called int4range
.
digoal=# select int4range(1,2,'()');
int4range
-----------
empty
(1 row)
digoal=# select int4range(1,2,'(]');
int4range
-----------
[2,3)
(1 row)
digoal=# select int4range(null,2,'(]');
int4range
-----------
(,3)
(1 row)
Defining a new range type will also create a constructor function with the same name as that new range type. Consider the following example:
digoal=# create type iprange as range (subtype=inet);
CREATE TYPE
digoal=# select iprange('1.1.1.1'::inet,null);
iprange
------------
[1.1.1.1,)
(1 row)
Let's see the example syntax that creates a range type:
CREATE TYPE name AS RANGE (
SUBTYPE = subtype
[ , SUBTYPE_OPCLASS = subtype_operator_class ]
[ , COLLATION = collation ]
[ , CANONICAL = canonical_function ]
[ , SUBTYPE_DIFF = subtype_diff_function ]
)
In the above syntax, subtype_diff_function
improves the GiST index for better query performance, and canonical_function
defines sparse range types. Next, we can use a GiST index
on a range type to accelerate queries in some specific scenarios.
A GiST index can accelerate queries involving these range operators: =, &&, <@, @>, <<, >>, -|-, &<, and &>
Note that B-tree or hash indexes are not suitable for range types.
digoal=# CREATE INDEX reservation_idx ON reservation USING gist (during);
CREATE INDEX
digoal=# \d reservation
Table "public.reservation"
Column | Type | Modifiers
--------+---------+-----------
room | integer |
during | tsrange |
Indexes:
"reservation_idx" gist (during)
digoal=# insert into reservation values (1,'(,now)'::tsrange);
INSERT 0 1
digoal=# select * from reservation ;
room | during
------+-----------------------------------------------
1108 | ["2010-01-01 14:30:00","2010-01-01 15:30:00")
1 | (,"2012-05-17 16:49:13.40783")
(2 rows)
digoal=# explain select * from reservation where during @> '[now,now]'::tsrange;
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on reservation (cost=0.00.. 1.01 rows=1 width=36)
Filter: (during @> '["2012-05-17 16:50:18.794268","2012-05-17 16:50:18.794268"]'::tsrange)
(2 rows)
This example has a too small number of records and the query is performed without using the index. Next we can force a query with the index. Consider the following code.
digoal=# set enable_seqscan=off;
SET
digoal=# explain select * from reservation where during @> '[now,now]'::tsrange;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Index Scan using reservation_idx on reservation (cost=0.00.. 8.27 rows=1 width=36)
Index Cond: (during @> '["2012-05-17 16:50:59.716661","2012-05-17 16:50:59.716661"]'::tsrange)
(2 rows)
The EXCLUDE constraint is also mentioned in the PostgreSQL DBA2000 training material. The following is an example of the EXCLUDE constraint in a range:
digoal=# delete from reservation ;
DELETE 2
The following constraint prevents any overlapping time values from existing in the during field at the same time.
digoal=# ALTER TABLE reservation
digoal-# ADD EXCLUDE USING gist (during WITH &&);
NOTICE: ALTER TABLE / ADD EXCLUDE will create implicit index "reservation_during_excl" for table "reservation"
ALTER TABLE
digoal=# INSERT INTO reservation VALUES
( 1108, '[2010-01-01 11:30, 2010-01-01 13:00)' );
INSERT 0 1
The insert operation is a failure due to the time conflicts (or overlaps), indicating that the constraint is effective.
digoal=# INSERT INTO reservation VALUES
( 1108, '[2010-01-01 11:45, 2010-01-01 15:45)' );
ERROR: conflicting key value violates exclusion constraint "reservation_during_excl"
DETAIL: Key (during)=(["2010-01-01 11:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 13:00:00")).
STATEMENT: INSERT INTO reservation VALUES
( 1108, '[2010-01-01 11:45, 2010-01-01 15:45)' );
ERROR: conflicting key value violates exclusion constraint "reservation_during_excl"
DETAIL: Key (during)=(["2010-01-01 11:45:00","2010-01-01 15:45:00")) conflicts with existing key (during)=(["2010-01-01 11:30:00","2010-01-01 13:00:00")).
You can also install the btree_gist module to enhance the EXCLUDE constraint. Because currently I have not installed the btree_gist module, I have failed to use the GiST index on the column of the int range type.
A GiST index cannot be created on the column of the int range type. The following SQL statement returns an error:
digoal=# ALTER TABLE reservation
ADD EXCLUDE USING gist (room WITH =, during WITH &&);
ERROR: data type integer has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
STATEMENT: ALTER TABLE reservation
ADD EXCLUDE USING gist (room WITH =, during WITH &&);
ERROR: data type integer has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
Retry the preceding SQL statement after the btree_gist module is loaded. The statement runs successfully.
digoal=# create extension btree_gist;
CREATE EXTENSION
digoal=# ALTER TABLE reservation
ADD EXCLUDE USING gist (room WITH =, during WITH &&);
NOTICE: ALTER TABLE / ADD EXCLUDE will create implicit index "reservation_room_during_excl" for table "reservation"
ALTER TABLE
The constraint rejects records with the same room number and overlapping during field values. Consider the following example:
digoal=# CREATE TABLE room_reservation
digoal-# (
digoal(# room TEXT,
digoal(# during TSRANGE,
digoal(# EXCLUDE USING gist (room WITH =, during WITH &&)
digoal(# );
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "room_reservation_room_during_excl" for table "room_reservation"
CREATE TABLE
digoal=# INSERT INTO room_reservation VALUES
digoal-# ( '123A', '[2010-01-01 14:00, 2010-01-01 15:00)' );
INSERT 0 1
digoal=# INSERT INTO room_reservation VALUES
digoal-# ( '123A', '[2010-01-01 14:30, 2010-01-01 15:30)' );
ERROR: conflicting key value violates exclusion constraint "room_reservation_room_during_excl"
DETAIL: Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).
STATEMENT: INSERT INTO room_reservation VALUES
( '123A', '[2010-01-01 14:30, 2010-01-01 15:30)' );
ERROR: conflicting key value violates exclusion constraint "room_reservation_room_during_excl"
DETAIL: Key (room, during)=(123A, ["2010-01-01 14:30:00","2010-01-01 15:30:00")) conflicts with existing key (room, during)=(123A, ["2010-01-01 14:00:00","2010-01-01 15:00:00")).
digoal=# INSERT INTO room_reservation VALUES
digoal-# ( '123B', '[2010-01-01 14:30, 2010-01-01 15:30)' );
INSERT 0 1
btree_gist
supports the following types:
int2, int4, int8, float4, float8, numeric, timestamp with time zone, timestamp without time zone, time with time zone, time without time zone, date, interval, oid, money, char, varchar, text, bytea, bit, varbit, macaddr, inet, and cidr.
An Application Scenario of Range Types
Consider the following scenario, you want to use an IP address to locate the region where that specific IP address is. Now, assume that you will use iprange
to store IPs and their corresponding region names. And then assume that you need to find the region of an IP address that a user has submitted. To do this, you need to additionally follow these steps:
1. Create an iprange
.
digoal=# create type iprange as range (subtype=inet);
CREATE TYPE
2. Create a test table.
digoal=# create table ip_info (id serial primary key,iprange iprange,location text);
NOTICE: CREATE TABLE will create implicit sequence "ip_info_id_seq" for serial column "ip_info.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "ip_info_pkey" for table "ip_info"
CREATE TABLE
3. Create an EXCLUDE constraint. Note that I first need to install the btree_gist module because the GiST index used here is of type Text. Otherwise, the creation would fail.
digoal=# alter table ip_info add constraint ck_exclude_iprange exclude using gist(location with =, iprange with &&);
NOTICE: ALTER TABLE / ADD EXCLUDE will create implicit index "ck_exclude_iprange" for table "ip_info"
ALTER TABLE
4. Insert test data:
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.0'::inet,'192.168.1.10'::inet,'[]'),'Beijing');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.11'::inet,'192.168.1.20'::inet,'[]'),'Shanghai');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.21'::inet,'192.168.1.30'::inet,'[]'),'Nanjing');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.31'::inet,'192.168.1.40'::inet,'[]'),'Hangzhou');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.41'::inet,'192.168.1.50'::inet,'[]'),'Nanchang');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.51'::inet,'192.168.1.60'::inet,'[]'),'Guangzhou');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.61'::inet,'192.168.1.70'::inet,'[]'),'Chongqing');
INSERT 0 1
digoal=# insert into ip_info (iprange,location) values (iprange('192.168.1.71'::inet,'192.168.1.80'::inet,'[]'),'Hong Kong');
INSERT 0 1
5. View the current table structure.
digoal=# \d ip_info
Table "public.ip_info"
Column | Type | Modifiers
----------+---------+------------------------------------------------------
id | integer | not null default nextval('ip_info_id_seq'::regclass)
iprange | iprange |
location | text |
Indexes:
"ip_info_pkey" PRIMARY KEY, btree (id)
"ck_exclude_iprange" EXCLUDE USING gist (location WITH =, iprange WITH &&)
6. Perform a test query.
digoal=# select * from ip_info where iprange @> '192.168.1.1'::inet;
id | iprange | location
----+----------------------------+----------
1 | [192.168.1.0,192.168.1.10] | 北京
(1 row)
7. View the query execution plan.
digoal=# explain select * from ip_info where iprange @> '192.168.1.1'::inet;
QUERY PLAN
-----------------------------------------------------------------------------------
Index Scan using ck_exclude_iprange on ip_info (cost=0.00.. 8.27 rows=1 width=68)
Index Cond: (iprange @> '192.168.1.1'::inet)
(2 rows)
You can use features similar to thoses in the temporal module in versions of PostgreSQL earlier than version 9.1. For more information, see the following blogs:
Optimizing Internet of Vehicles Data with the Window Function
Alibaba Clouder - December 11, 2017
digoal - December 11, 2019
digoal - July 25, 2019
digoal - December 18, 2020
digoal - December 21, 2020
digoal - June 26, 2019
Provides secure and reliable communication between devices and the IoT Platform which allows you to manage a large number of devices on a single IoT Platform.
Learn MoreA cloud solution for smart technology providers to quickly build stable, cost-efficient, and reliable ubiquitous platforms
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreMigrate your Internet Data Center’s (IDC) Internet gateway to the cloud securely through Alibaba Cloud’s high-quality Internet bandwidth and premium Mainland China route.
Learn MoreMore Posts by digoal