Many businesses continuously write new data and, at the same time, update existing records. When a record is updated, its pre-update version must be saved to a historical table. This is similar to an audit requirement, in which records must be captured before and after each change.
However, this article is not about auditing or triggers. Instead, it focuses on how to automatically separate new data from old data. It suggests using the with syntax (common table expressions) so that the insert, the update, and the write to the historical table are completed together.
PostgreSQL allows all of this to be done in one SQL statement.
Create a current status table and a historical table using the commands below.
postgres=# create table tbl(id int primary key, price int);
CREATE TABLE
postgres=# create table tbl_history (id int not null, price int);
CREATE TABLE
Insert a record that does not exist, which will not trigger insertion into the historical table.
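In the statements below, $1 and $2 are placeholders for id and price. Replace them with the actual values before running each statement.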
id = $1 = 2
price = $2 = 7
postgres=# with old as (select * from tbl where id= $1),
postgres-# new as (insert into tbl values ($1, $2) on conflict (id) do update set price=excluded.price where tbl.price<>excluded.price returning *)
postgres-# insert into tbl_history select old.* from old,new where old.id=new.id;
INSERT 0 0
postgres=# select tableoid,ctid,* from tbl union all select tableoid,ctid,* from tbl_history ;
tableoid | ctid | id | price
----------+-------+----+-------
18243 | (0,1) | 2 | 7
(1 row)
Insert another record that does not exist. Again, nothing is written to the historical table.
id = $1 = 1
price = $2 = 1
postgres=# with old as (select * from tbl where id= $1),
new as (insert into tbl values ($1, $2) on conflict (id) do update set price=excluded.price where tbl.price<>excluded.price returning *)
insert into tbl_history select old.* from old,new where old.id=new.id;
INSERT 0 0
postgres=# select tableoid,ctid,* from tbl union all select tableoid,ctid,* from tbl_history ;
tableoid | ctid | id | price
----------+-------+----+-------
18243 | (0,1) | 2 | 7
18243 | (0,2) | 1 | 1
(2 rows)
Now, insert an existing record with data changes, which triggers insertion into the historical table.
id = $1 = 1
price = $2 = 2
postgres=# with old as (select * from tbl where id= $1),
new as (insert into tbl values ($1, $2) on conflict (id) do update set price=excluded.price where tbl.price<>excluded.price returning *)
insert into tbl_history select old.* from old,new where old.id=new.id;
INSERT 0 1
postgres=# select tableoid,ctid,* from tbl union all select tableoid,ctid,* from tbl_history ;
tableoid | ctid | id | price
----------+-------+----+-------
18243 | (0,1) | 2 | 7
18243 | (0,3) | 1 | 2
18251 | (0,1) | 1 | 1
(3 rows)
Inserting an existing record with identical data will not trigger insertion into the historical table.
id = $1 = 1
price = $2 = 2
postgres=# with old as (select * from tbl where id= $1),
new as (insert into tbl values ($1, $2) on conflict (id) do update set price=excluded.price where tbl.price<>excluded.price returning *)
insert into tbl_history select old.* from old,new where old.id=new.id;
INSERT 0 0
postgres=# select tableoid,ctid,* from tbl union all select tableoid,ctid,* from tbl_history ;
tableoid | ctid | id | price
----------+-------+----+-------
18243 | (0,1) | 2 | 7
18243 | (0,3) | 1 | 2
18251 | (0,1) | 1 | 1
(3 rows)
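The four cases above can be summarized in a small in-memory model. The Python sketch below only illustrates the intended semantics of the statement, not how PostgreSQL executes it. The function name upsert_with_history and the dict and list standing in for tbl and tbl_history are assumptions made for this example, and the model assumes that price is never NULL.

```python
def upsert_with_history(current, history, row_id, price):
    """Model of the WITH ... ON CONFLICT statement for one (id, price) pair.

    current: dict mapping id -> price, standing in for tbl
    history: list of (id, price) tuples, standing in for tbl_history
    Returns the number of rows added to history (0 or 1).
    """
    old_price = current.get(row_id)      # CTE "old": the row before the write
    if old_price is None:
        current[row_id] = price          # no conflict: plain insert
        return 0
    if old_price == price:
        return 0                         # conflict filter rejects the update
    current[row_id] = price              # conflict: update to the new price
    history.append((row_id, old_price))  # archive the pre-update row
    return 1


tbl, tbl_history = {}, []
results = [
    upsert_with_history(tbl, tbl_history, 2, 7),  # new id
    upsert_with_history(tbl, tbl_history, 1, 1),  # new id
    upsert_with_history(tbl, tbl_history, 1, 2),  # existing id, price changed
    upsert_with_history(tbl, tbl_history, 1, 2),  # existing id, same price
]
print(results)      # [0, 0, 1, 0]
print(tbl)          # {2: 7, 1: 2}
print(tbl_history)  # [(1, 1)]
```

The returned counts correspond to the INSERT 0 0, INSERT 0 0, INSERT 0 1, and INSERT 0 0 results shown in the psql sessions.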
The execution plan of the statement is shown below.
postgres=# explain with old as (select * from tbl where id= $1),
new as (insert into tbl values ($1, $2) on conflict (id) do update set price=excluded.price where tbl.price<>excluded.price returning *)
insert into tbl_history select old.* from old,new where old.id=new.id;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Insert on tbl_history  (cost=2.17..2.23 rows=1 width=8)
   CTE old
     ->  Index Scan using tbl_pkey on tbl  (cost=0.14..2.16 rows=1 width=8)
           Index Cond: (id = 1)
   CTE new
     ->  Insert on tbl tbl_1  (cost=0.00..0.01 rows=1 width=8)
           Conflict Resolution: UPDATE
           Conflict Arbiter Indexes: tbl_pkey
           Conflict Filter: (tbl_1.price <> excluded.price)
           ->  Result  (cost=0.00..0.01 rows=1 width=8)
   ->  Nested Loop  (cost=0.00..0.05 rows=1 width=8)
         Join Filter: (old.id = new.id)
         ->  CTE Scan on old  (cost=0.00..0.02 rows=1 width=8)
         ->  CTE Scan on new  (cost=0.00..0.02 rows=1 width=4)
(14 rows)
PostgreSQL versions earlier than 9.5 do not support insert on conflict. On those versions, a basic upsert can be written as follows.
id = $1 = 1
price = $2 = 2
with new as (update tbl set price=$2 where id=$1 and price<>$2)
insert into tbl select $1, $2 where not exists (select 1 from tbl where id=$1);
To also record the pre-update row in the historical table on versions earlier than 9.5, write the scenario in this article as follows:
id = $1 = 1
price = $2 = 2
with
old as (select * from tbl where id=$1),
new_upd as (update tbl set price=$2 where id=$1 and price<>$2 returning *),
new_ins as (insert into tbl select $1, $2 where not exists (select 1 from tbl where id=$1) returning *)
insert into tbl_history
select old.* from old left outer join new_upd on (old.id=new_upd.id) where new_upd.* is not null;
PostgreSQL also supports the Rule syntax. The rule created below implements this logic: if the record exists, update it; otherwise, insert it.
However, do not use exists directly in the rule condition when one statement may insert several rows. In that case the condition behaves as if it were an immutable function, and a volatile function should be used instead. For more information, see the precautions later in this article.
postgres=# create table d(id int primary key, info text, crt_time timestamp);
postgres=# create rule r1 as on insert to d where (exists (select 1 from d where d.id=NEW.id)) do instead update d set info=NEW.info,crt_time=NEW.crt_time where id=NEW.id;
postgres=# insert into d values (1,'test',now());
INSERT 0 1
postgres=# select * from d;
id | info | crt_time
----+------+----------------------------
1 | test | 2017-08-10 14:12:20.053353
(1 row)
postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# select * from d;
id | info | crt_time
----+---------+----------------------------
1 | test123 | 2017-08-10 14:12:26.964074
(1 row)
The following rule implements a different logic: if the record exists, ignore the insert; otherwise, insert it. This makes writes idempotent, which avoids problems when data is written again during resumable uploads.
postgres=# create table d(id int primary key, info text, crt_time timestamp);
postgres=# create rule r1 as on insert to d where (exists (select 1 from d where d.id=NEW.id)) do instead nothing;
CREATE RULE
postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# insert into d values (1,'test123',now());
INSERT 0 0
postgres=# insert into d values (0,'test123',now());
INSERT 0 1
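The idempotent behavior can be pictured with a short model. The Python sketch below illustrates the idea only. The names insert_ignore and replay are hypothetical, and a dict keyed by id stands in for table d. It does not reproduce how PostgreSQL rewrites the statement.

```python
def insert_ignore(table, row):
    """Model of the DO INSTEAD NOTHING rule: skip a row whose id already exists.

    table: dict mapping id -> row, standing in for table d
    Returns 1 if the row was inserted, 0 if it was ignored.
    """
    row_id = row[0]
    if row_id in table:
        return 0
    table[row_id] = row
    return 1


def replay(table, batch):
    """Send every row of a batch, one insert per row, and count the inserts."""
    return sum(insert_ignore(table, row) for row in batch)


d = {}
batch = [(1, "test123"), (0, "test123")]
first = replay(d, batch)        # initial upload
state_after_first = dict(d)
second = replay(d, batch)       # a resumed upload resends the same rows
print(first, second)            # 2 0
print(d == state_after_first)   # True
```

Because the second pass changes nothing, a client that is unsure how far an interrupted upload got can safely resend the whole batch.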
Run a performance stress test as shown below.
vi test.sql
\set id random(1,1000000)
insert into d values (:id, md5(random()::text), now());
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 1000
progress: 90.0 s, 132056.5 tps, lat 0.220 ms stddev 0.055
progress: 91.0 s, 131656.9 tps, lat 0.220 ms stddev 0.040
progress: 92.0 s, 134941.0 tps, lat 0.215 ms stddev 0.090
progress: 93.0 s, 134324.5 tps, lat 0.216 ms stddev 0.076
progress: 94.0 s, 136699.4 tps, lat 0.212 ms stddev 0.070
progress: 95.0 s, 139291.4 tps, lat 0.208 ms stddev 0.067
progress: 96.0 s, 136073.2 tps, lat 0.213 ms stddev 0.076
progress: 97.0 s, 135804.6 tps, lat 0.214 ms stddev 0.076
progress: 98.0 s, 146037.6 tps, lat 0.199 ms stddev 0.069
progress: 99.0 s, 129619.5 tps, lat 0.224 ms stddev 0.049
progress: 100.0 s, 129230.0 tps, lat 0.224 ms stddev 0.047
progress: 101.0 s, 131048.4 tps, lat 0.221 ms stddev 0.055
progress: 102.0 s, 128808.0 tps, lat 0.225 ms stddev 0.048
progress: 103.0 s, 128954.6 tps, lat 0.225 ms stddev 0.048
progress: 104.0 s, 131227.9 tps, lat 0.221 ms stddev 0.042
progress: 105.0 s, 129604.0 tps, lat 0.224 ms stddev 0.057
1) Using exists in the Rule syntax is only suitable for statements that insert a single row. For multi-row statements, use a volatile function to solve this problem. Otherwise, constraints are required to ensure uniqueness.
postgres=# create table e(id int, info text);
CREATE TABLE
postgres=# create rule r1 as on insert to e where exists (select 1 from e t1 where t1.id=NEW.id limit 1) do instead nothing;
CREATE RULE
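If one statement inserts multiple rows and some of them are duplicates, the rule condition is evaluated only once (similar to an immutable function), so the duplicates are inserted.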
postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 2
postgres=# select * from e;
id | info
----+------
1 | test
1 | test
(2 rows)
Solution: use a volatile function.
postgres=# drop rule r1 on e;
DROP RULE
CREATE OR REPLACE FUNCTION public.ff(integer)
RETURNS boolean
LANGUAGE sql
STRICT
volatile
AS $function$
select true from e where id=$1 limit 1;
$function$;
postgres=# create rule r1 as on insert to e where ff(NEW.id) do instead nothing;
CREATE RULE
postgres=# insert into e values (1,'test'),(1,'test');
INSERT 0 0
postgres=# insert into e values (2,'test'),(2,'test');
INSERT 0 1
postgres=# insert into e values (3,'test'),(3,'test');
INSERT 0 1
postgres=# truncate e;
TRUNCATE TABLE
postgres=# select * from b;
id | info
----+------
1 | a
1 | b
(2 rows)
postgres=# insert into e select * from b;
INSERT 0 1
postgres=# \d+ b
Table "public.b"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
info | text | | | | extended | |
2) Rules are not applied to copy statements. Hence, loading duplicate data with copy also causes problems.
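The difference between the exists-based rule and the volatile-function rule in precaution 1 can be pictured with a simplified model. The Python sketch below assumes that the exists check sees only the rows that were present before the statement started, and that the volatile function sees the rows the statement has added so far. This assumption is consistent with the outputs shown above, but the code is an illustration rather than a description of PostgreSQL's executor. Both function names are hypothetical.

```python
def insert_with_snapshot_check(table, rows):
    """Model of the exists-based rule: test every row against the state
    of the table as it was when the statement started."""
    ids_at_start = {existing[0] for existing in table}
    accepted = [row for row in rows if row[0] not in ids_at_start]
    table.extend(accepted)
    return len(accepted)


def insert_with_live_check(table, rows):
    """Model of the volatile-function rule: test each row against the
    current state of the table, including rows added by this statement."""
    count = 0
    for row in rows:
        if not any(existing[0] == row[0] for existing in table):
            table.append(row)
            count += 1
    return count


e = []
n1 = insert_with_snapshot_check(e, [(1, "test"), (1, "test")])  # both rows pass
n2 = insert_with_live_check(e, [(1, "test"), (1, "test")])      # id 1 already exists
n3 = insert_with_live_check(e, [(2, "test"), (2, "test")])      # second row is skipped
print(n1, n2, n3)  # 2 0 1
print(len(e))      # 3
```

The three counts correspond to the INSERT 0 2, INSERT 0 0, and INSERT 0 1 results in the psql session for table e.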
In any case, we recommend that every table requiring an upsert has a primary key.