In some business scenarios, data is either hot or cold. When the business needs only the data of the last day, week, or month, historical data can be discarded. For example, in some monitoring scenarios, only the monitoring data from the last week is retained and older data is discarded. The question is how to discard historical data, that is, how to implement rotation. Let's take a quick look at the two available options.
1) Delete data generated more than one week ago.
delete from table where crt_time<=now()-interval '7 day';
This method incurs additional overhead: the deletion writes redo (WAL) logs and leaves dead tuples behind for garbage collection. If a large amount of data is deleted, statistics must be collected again; stale statistics can mislead the planner, for example into poor merge join plans.
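As a minimal sketch of this approach (using the test table defined later in this article), the delete is typically followed by manual garbage collection and a statistics refresh:

```sql
-- Delete rows older than 7 days.
delete from test where crt_time <= now() - interval '7 day';
-- Reclaim the dead tuples left behind by the delete.
vacuum test;
-- Refresh planner statistics so row estimates stay accurate.
analyze test;
```

Even with this housekeeping, a large delete is far more expensive than truncating a partition, which is why the second option is preferred.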
2) Use partition tables in polling mode and use truncate to clear partitions.
This method avoids the problems caused by deletion. However, improper use can introduce new problems. Both truncate and rename table are Data Definition Language (DDL) operations; we recommend setting a lock timeout for them, otherwise a waiting DDL statement will block other SQL statements.
Switching table names also requires a temporary table name that is not currently in use.
The following section uses a partition table as an example to describe how to rotate data.
Assuming data is retained for one week, the following section describes how to rotate data according to the second method stated above.
Note: PostgreSQL supports DDL transactions.
There are nine tables: one primary table and eight partitions, of which seven correspond to the day-of-week (dow) values 0-6 and one is the default (swap) partition. The advantage of placing check constraints on the dow partitions is that the planner can filter partitions directly according to the constraints during queries.
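Note that this constraint-based pruning depends on the `constraint_exclusion` parameter, which in modern PostgreSQL versions defaults to `partition` (apply exclusion to inheritance children). A quick check, as a sketch:

```sql
-- 'partition' (the default) lets the planner use CHECK constraints
-- to skip inheritance children such as the dow partitions below.
show constraint_exclusion;
set constraint_exclusion = partition;
```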
Step 1) Use the following command to create the primary table.
create table test(id int primary key, info text, crt_time timestamp);
Step 2) Create partitions as shown below.
create table test0(like test including all, constraint ck check(extract(dow from crt_time)=0)) inherits(test);
create table test1(like test including all, constraint ck check(extract(dow from crt_time)=1)) inherits(test);
create table test2(like test including all, constraint ck check(extract(dow from crt_time)=2)) inherits(test);
create table test3(like test including all, constraint ck check(extract(dow from crt_time)=3)) inherits(test);
create table test4(like test including all, constraint ck check(extract(dow from crt_time)=4)) inherits(test);
create table test5(like test including all, constraint ck check(extract(dow from crt_time)=5)) inherits(test);
create table test6(like test including all, constraint ck check(extract(dow from crt_time)=6)) inherits(test);
create table test_def(like test including all) inherits(test);
Step 3) Select, update, and delete operations are performed directly on the primary table. Add a dow condition on crt_time so that the planner can filter partitions.
postgres=# explain select * from test where crt_time=now() and extract(dow from crt_time)=0;
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Append (cost=0.00..65.20 rows=3 width=44)
-> Seq Scan on test (cost=0.00..0.00 rows=1 width=44)
Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision))
-> Seq Scan on test0 (cost=0.00..32.60 rows=1 width=44)
Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision))
-> Seq Scan on test_def (cost=0.00..32.60 rows=1 width=44)
Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision))
(7 rows)
Step 4) Have the application derive the partition table name from the dow of crt_time and insert directly into that partition.
If the application cannot insert into the partitions directly, use a trigger or a rule, as shown in the example below.
create or replace function ins_tg() returns trigger as $$
declare
begin
case extract(dow from NEW.crt_time)
when 0 then insert into test0 values (NEW.*);
when 1 then insert into test1 values (NEW.*);
when 2 then insert into test2 values (NEW.*);
when 3 then insert into test3 values (NEW.*);
when 4 then insert into test4 values (NEW.*);
when 5 then insert into test5 values (NEW.*);
when 6 then insert into test6 values (NEW.*);
end case;
return null;
end;
$$ language plpgsql strict;
create trigger tg before insert on test for each row execute procedure ins_tg();
insert into test values (1,'test',now()+interval '1 day');
insert into test values (1,'test',now()+interval '2 day');
insert into test values (1,'test',now()+interval '3 day');
insert into test values (1,'test',now()+interval '4 day');
insert into test values (1,'test',now()+interval '5 day');
insert into test values (1,'test',now()+interval '6 day');
insert into test values (1,'test',now()+interval '7 day');
postgres=# select tableoid::regclass , * from test;
tableoid | id | info | crt_time
----------+----+------+----------------------------
test0 | 1 | test | 2017-03-26 14:40:48.066905
test1 | 1 | test | 2017-03-27 14:40:50.450942
test2 | 1 | test | 2017-03-28 14:40:52.271922
test4 | 1 | test | 2017-03-23 14:40:22.551928
test5 | 1 | test | 2017-03-24 14:40:24.643933
test6 | 1 | test | 2017-03-25 14:40:28.138913
test3 | 1 | test | 2017-03-22 14:40:20.586945
(7 rows)
Once each day, during an idle period, run the rotation in a single transaction (PostgreSQL DDL is transactional; if the transaction fails, retry it later). Complete the following operations in that transaction:
1) Calculate the dow partition for the next day.
2) Drop the constraint on test_def.
3) Truncate test_def.
4) Rename test_def to test_def_tmp (a table name that does not yet exist).
5) Rename the next day's partition to test_def.
6) Add the next day's constraint to test_def_tmp.
7) Rename test_def_tmp to the next day's partition.
Consider the following example.
1. Begin the transaction.
begin;
2. Set the lock timeout.
set lock_timeout = '60s';
3. Query tomorrow's dow.
select extract(dow from current_date+1);
 date_part
-----------
         3
(1 row)
4. Drop the test_def constraint.
alter table test_def drop constraint IF EXISTS ck;
5. Clear the test_def data.
truncate test_def;
6. Rename test_def.
alter table test_def rename to test_def_tmp;
7. Rename tomorrow's partition.
alter table test3 rename to test_def;
8. Add tomorrow's constraint to test_def_tmp.
alter table test_def_tmp add constraint ck check(extract(dow from crt_time)=3);
9. Rename test_def_tmp to tomorrow's partition.
alter table test_def_tmp rename to test3;
10. Commit or roll back.
commit;
If any step fails, roll back the transaction:
rollback;
Take the following precautions to avoid errors:
1) Set a lock timeout for the DDL statements.
2) Roll back the transaction on failure and retry later.
3) Make sure the intermediate table name does not already exist.
4) Use the same constraint name on every partition.
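The daily steps can also be consolidated into a single plpgsql function built on dynamic SQL. This is a sketch under the schema above; the function name rotate_partition is introduced here for illustration:

```sql
create or replace function rotate_partition() returns void as $$
declare
  d int := extract(dow from current_date + 1);  -- tomorrow's dow
begin
  -- set local limits the timeout to the enclosing transaction.
  execute 'set local lock_timeout = ''60s''';
  -- Recycle the swap partition.
  execute 'alter table test_def drop constraint if exists ck';
  execute 'truncate test_def';
  execute 'alter table test_def rename to test_def_tmp';
  -- Tomorrow's partition becomes the new swap partition.
  execute format('alter table test%s rename to test_def', d);
  -- The old swap partition takes over tomorrow's dow value.
  execute format('alter table test_def_tmp add constraint ck check(extract(dow from crt_time)=%s)', d);
  execute format('alter table test_def_tmp rename to test%s', d);
end;
$$ language plpgsql;
```

Schedule `select rotate_partition();` once per day during an idle period; all the DDL inside runs in one transaction, so a failure rolls everything back and the call can simply be retried.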
To summarize: deleting historical data with delete incurs additional overhead, including writing redo logs and garbage collection, and after a large deletion the statistics must be refreshed, or the planner may work with inaccurate information and produce poor merge join plans. Since PostgreSQL supports encapsulating DDL in transactions, partition tables used in polling mode with truncate are the better way to rotate data.