The PostgreSQL compression algorithm can be customized. For example, simple mean compression, maximum compression and minimum compression, or compression based on the revolving door compression algorithm.
Take the following table as an example
create table tbl (
id serial8 primary key, -- primary key
sid int, -- sensor ID
hid int, -- indicator D
val float8, -- collected value
ts timestamp -- acquisition time
);
create index idx_tbl on tbl(ts);
The 5-minute compression table will be like the following one.
create table tbl_5min (
id serial8 primary key, -- primary key
sid int, -- sensor ID
hid int, -- indicator ID
val float8, -- inheritance, average, easy to do ring analysis
ts timestamp, -- inheritance, start time, easy to do ring analysis
val_min float8, -- minimum
val_max float8, -- maximum
val_sum float8, -- and
val_count float8, -- number of acquisitions
ts_start timestamp, -- interval start time
ts_end timestamp -- interval end time
);
alter table tbl_5min inherit tbl;
The 30-minute compression table will be like the following one.
create table tbl_30min (
id serial8 primary key, -- primary key
sid int, -- sensor ID
hid int, -- indicator ID
val float8, -- inheritance, average, easy to do ring analysis
ts timestamp, -- inheritance, start time, easy to do ring analysis
val_min float8, -- minimum
val_max float8, -- maximum
val_sum float8, -- and
val_count float8, -- number of acquisitions
ts_start timestamp, -- interval start time
ts_end timestamp -- interval end time
);
alter table tbl_30min inherit tbl;
Then you can compress the table with the following statement.
5-minute compression statement
with tmp1 as (
delete from only tbl where ts <= now()-interval '1 day' returning *
)
insert into tbl_5min
(sid, hid, val, ts, val_min, val_max, val_sum, val_count, ts_start, ts_end)
select sid, hid, avg(val) as val, min(ts) as ts, min(val) as val_min, max(val) as val_max, sum(val) as val_sum, count(*) as val_count, min(ts) as ts_start, max(ts) as ts_end from
tmp1
group by sid, hid, substring(to_char(ts, 'yyyymmddhh24mi'), 1, 10) || lpad(((substring(to_char(ts, 'yyyymmddhh24mi'), 11, 2)::int / 5) * 5)::text, 2, '0');
30-minute compression statement
with tmp1 as (
delete from only tbl_5min where ts_start <= now()-interval '1 day' returning *
)
insert into tbl_30min
(sid, hid, val_min, val_max, val_sum, val_count, ts_start, ts_end)
select sid, hid, min(val_min) as val_min, max(val_max) as val_max, sum(val_sum) as val_sum, sum(val_count) as val_count, min(ts_start) as ts_start, max(ts_end) as ts_end from
tmp1
group by sid, hid, substring(to_char(ts_start, 'yyyymmddhh24mi'), 1, 10) || lpad(((substring(to_char(ts_start, 'yyyymmddhh24mi'), 11, 2)::int / 30) * 30)::text, 2, '0');
For the design, demo and more details, please go to PostgreSQL Time-Series Data Case: Automatic Compression over Time.
A good compression algorithm is extremely important when it comes to any IoT scenarios that involve continuous monitoring and large amounts of data points. So the question arises are there compression algorithms available that can help with all of this but also ensure that the data is not distorted in the process? If so, which algorithms are they?
To answer this question, in this blog, we will show you the Swinging Door Trending (SDT) algorithm and how you can use it in PostgreSQL for effective compression in IoT scenarios.
The Swinging Door Trending (SDT) algorithm is a linear trend compression algorithm. In essence, it replaces a series of continuous data points with a straight line determined by the start and end points.
TimescaleDB is based on the traditional relational database Postgres and therefore does not support dynamic schema. But it supports pre-aggregation and downsampling. After Facebook Gorilla is developed, time series compression draws lots of attention. Proper compression can be performed based on the features of time series data. For example, if timestamps are consecutive and data is stable, time series compression by using delta-of-delta or xor is much more efficient that general compression.
With the combination of time series compression and block compression, this volume of data can be compressed to 413 GB, achieving a compression ratio of around 15:1. Compared with block compression alone, the combination of time series compression and block compression increases the compression effectiveness by 40%, which is of great significance to storage cost reduction. Another advantage is that block compression effectiveness will not be affected because time series compression is done before block compression and the size of data after time series compression is already very small. In large query scenarios, RT is reduced by half. When we scan data for a long time range, data has already been highly compressed. Therefore, the I/O efficiency is very high and RT can be reduced by half.
Row-store with no compression is used by default when you create a table in AnalyticDB for PostgreSQL. To enable column-store and compression, you must specify the column-store and compression options when creating the table.
LZ4 is an advanced compression algorithm, with a higher compression ratio and execution speed. You will learn how to enable the LZ4 compression algorithm and how to enable the LZ4 compression algorithm for the entire database by default in this document.
ApsaraDB HybridDB for PostgreSQL is an online MPP (Massively Parallel Processing) data warehousing service based on the open source Greenplum Database.
ApsaraDB HybridDB provides online expansion and performance monitoring service to free your team from complicated MPP cluster operations and management (O&M). This enables database administrators, developers and data analysts to focus on upgrading enterprise productivity through SQL development.
Time Series Database (TSDB) supports high-speed data reading and writing. It offers high compression ratios for cost-efficient data storage. This service also supports visualization of precision reduction, interpolation, multi-metric aggregate computing, and query results.
To celebrate Alibaba Cloud Database Service's recognition as A Strong Performer by Forrester, we are now giving out attractive offers exclusively for our customers. Please refer to the Terms and Conditions section for details and rules of the offers.
2,599 posts | 762 followers
Followdigoal - May 16, 2019
Alibaba Clouder - August 2, 2019
digoal - May 17, 2021
digoal - January 30, 2022
digoal - July 24, 2019
Alibaba Clouder - July 31, 2019
2,599 posts | 762 followers
FollowAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreMore Posts by Alibaba Clouder