In the real world, much business data is time-series data: it is written sequentially along the time dimension and is frequently queried for statistics over time intervals.
Examples include a business's FEED data, time-series data generated by the Internet of Things (such as weather sensors and vehicle trajectories), and real-time data from the financial industry.
PostgreSQL UDFs and BRIN (block range index) indexes are well suited to processing time-series data. Specifically, see the two following examples.
In fact, the PostgreSQL ecosystem has produced a time-series plug-in named TimescaleDB, built specifically for processing time-series data. Its improvements include SQL optimizer enhancements (it supports "merge append", which makes aggregation over time shards very efficient), a rotate interface, and automatic sharding.
Investors have also taken an interest in TimescaleDB: it has already received USD 50 million in investment, which suggests that time-series databases will be popular with users in the future.
First, TimescaleDB shards data automatically, and the sharding is transparent to users. Write performance does not deteriorate when the amount of data becomes very large. (This mainly applies to disks with low IOPS. On disks with high IOPS, plain PostgreSQL also performs acceptably after a large amount of data has been written.)
Second, TimescaleDB improves the SQL optimizer and adds a "merge append" execution node. When "group by" is performed on small time intervals, it does not need to run a HASH or GROUP operation over the entire timestamp range; it computes on individual shards instead, which is very efficient.
Finally, TimescaleDB adds several APIs that make it efficient and easy for users to write, maintain, and query time-series data.
These APIs are as follows: http://docs.timescale.com/v0.8/api
Deploy TimescaleDB
Take CentOS 7.x x64 as an example.
1. Install PostgreSQL
Please see PostgreSQL on Linux Best Deployment Manual
export USE_NAMED_POSIX_SEMAPHORES=1
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10 --with-segsize=8 --with-wal-segsize=256
LIBS=-lpthread CFLAGS="-O3" make world -j 64
LIBS=-lpthread CFLAGS="-O3" make install-world
2. Install cmake3
(cmake3 is provided by the EPEL repository, which must be enabled first.)
yum install -y cmake3
ln -s /usr/bin/cmake3 /usr/bin/cmake
3. Compile TimescaleDB
git clone https://github.com/timescale/timescaledb/
cd timescaledb
git checkout release-0.8.0
or
wget https://github.com/timescale/timescaledb/archive/0.8.0.tar.gz
export PATH=/home/digoal/pgsql10/bin:$PATH
export LD_LIBRARY_PATH=/home/digoal/pgsql10/lib:$LD_LIBRARY_PATH
# Bootstrap the build system
./bootstrap
cd ./build && make
make install
[ 2%] Built target sqlupdatefile
[ 4%] Built target sqlfile
[100%] Built target timescaledb
Install the project...
-- Install configuration: "Release"
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb.control
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.8.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.1--0.8.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.1.0--0.2.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.2.0--0.3.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.3.0--0.4.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.0--0.4.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.1--0.4.2.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.2--0.5.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.5.0--0.6.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.0--0.6.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.0.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.1.sql
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.0--0.7.1.sql
-- Installing: /home/dege.zzz/pgsql10/lib/timescaledb.so
4. Configure postgresql.conf so that the timescaledb library is loaded automatically when the database starts
vi $PGDATA/postgresql.conf
shared_preload_libraries = 'timescaledb'
pg_ctl restart -m fast
5. Create the extension in each database that needs to use TimescaleDB
psql
psql (10.1)
Type "help" for help.
postgres=# create extension timescaledb ;
6. Parameters related to TimescaleDB
timescaledb.constraint_aware_append
timescaledb.disable_optimizations
timescaledb.optimize_non_hypertables
timescaledb.restoring
postgres=# show timescaledb.constraint_aware_append ;
timescaledb.constraint_aware_append
-------------------------------------
on
(1 row)
postgres=# show timescaledb.disable_optimizations ;
timescaledb.disable_optimizations
-----------------------------------
off
(1 row)
postgres=# show timescaledb.optimize_non_hypertables ;
timescaledb.optimize_non_hypertables
--------------------------------------
off
(1 row)
postgres=# show timescaledb.restoring ;
timescaledb.restoring
-----------------------
off
(1 row)
The first example uses real New York City taxicab data and follows the tutorial at http://docs.timescale.com/v0.8/tutorials/tutorial-hello-nyc
The data comes from the NYC Taxi and Limousine Commission trip records, http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
1. Download sample data
wget https://timescaledata.blob.core.windows.net/datasets/nyc_data.tar.gz
2. Extract
tar -zxvf nyc_data.tar.gz
3. Create the tables. This step uses the create_hypertable API to convert an ordinary table into a time-series storage table.
psql -f nyc_data.sql
Part of nyc_data.sql is shown below:
cat nyc_data.sql
-- Taxi ride data: includes duration, fare, distance, pickup and drop-off longitude/latitude, time, passenger count, and so on.
CREATE TABLE "rides"(
vendor_id TEXT,
pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
passenger_count NUMERIC,
trip_distance NUMERIC,
pickup_longitude NUMERIC,
pickup_latitude NUMERIC,
rate_code INTEGER,
dropoff_longitude NUMERIC,
dropoff_latitude NUMERIC,
payment_type INTEGER,
fare_amount NUMERIC,
extra NUMERIC,
mta_tax NUMERIC,
tip_amount NUMERIC,
tolls_amount NUMERIC,
improvement_surcharge NUMERIC,
total_amount NUMERIC
);
This statement converts the "rides" table into a time-series storage table (a hypertable):
SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);
Create the indexes:
CREATE INDEX ON rides (vendor_id, pickup_datetime desc);
CREATE INDEX ON rides (pickup_datetime desc, vendor_id);
CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
CREATE INDEX ON rides (passenger_count, pickup_datetime desc);
4. Import test data
psql -c "\COPY rides FROM nyc_data_rides.csv CSV"
COPY 10906858
5. Run some test SQL on the "rides" table now that it has been converted to a time-series storage table. Performance is better than on an ordinary PostgreSQL table.
What is the average fare per day for rides with two or more passengers?
-- Average fare amount of rides with 2+ passengers by day
SELECT date_trunc('day', pickup_datetime) as day, avg(fare_amount)
FROM rides
WHERE passenger_count > 1 AND pickup_datetime < '2016-01-08'
GROUP BY day ORDER BY day;
day | avg
--------------------+---------------------
2016-01-01 00:00:00 | 13.3990821679715529
2016-01-02 00:00:00 | 13.0224687415181399
2016-01-03 00:00:00 | 13.5382068607068607
2016-01-04 00:00:00 | 12.9618895561740149
2016-01-05 00:00:00 | 12.6614611935518309
2016-01-06 00:00:00 | 12.5775245695086098
2016-01-07 00:00:00 | 12.5868802584437019
(7 rows)
6. Some queries run more than 20 times faster
How many rides are there each day?
-- Total number of rides by day for first 5 days
SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides
GROUP BY day ORDER BY day
LIMIT 5;
day | count
--------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
(5 rows)
TimescaleDB adds the "merge append" execution optimization, so fine-grained aggregation over time shards is highly efficient. The more data there is, the larger the performance gain.
For example, TimescaleDB introduces a time-based "merge append" optimization to minimize the number of groups that must be processed to execute the following query (given its knowledge that time is already ordered).
For a 100M-row table, this results in query latency that is 396x faster than PostgreSQL (82 ms vs. 32566 ms).
SELECT date_trunc('minute', time) AS minute, max(usage_user)
FROM cpu
WHERE time < '2017-01-01'
GROUP BY minute
ORDER BY minute DESC
LIMIT 5;
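The idea behind this optimization can be illustrated outside the database. The following Python sketch is not TimescaleDB's implementation. It assumes each chunk can already deliver its rows in descending time order (for example, from an index scan), and the function name `merge_append_top_minutes` and the sample rows are hypothetical. Merging the ordered streams lets the aggregation stop as soon as the requested number of minute groups is complete, instead of hashing or sorting every row.

```python
import heapq
from datetime import datetime
from itertools import groupby


def merge_append_top_minutes(chunks, limit):
    """Return (minute, max value) pairs, newest minute first, for `limit` minutes.

    `chunks` is a list of lists of (timestamp, value) tuples. Each list must
    already be sorted by timestamp descending (a hypothetical stand-in for an
    ordered scan of one chunk).
    """
    # Lazily merge the per-chunk streams while preserving descending order.
    merged = heapq.merge(*chunks, key=lambda row: row[0], reverse=True)

    def minute_of(row):
        return row[0].replace(second=0, microsecond=0)

    result = []
    # groupby only compares adjacent rows, so no hash table over all rows is needed.
    for minute, rows in groupby(merged, key=minute_of):
        result.append((minute, max(value for _, value in rows)))
        if len(result) == limit:
            break  # stop early: older rows are never read
    return result


# Made-up sample data for two chunks.
chunk_a = [
    (datetime(2016, 12, 31, 23, 59, 30), 71.0),
    (datetime(2016, 12, 31, 23, 58, 10), 64.0),
]
chunk_b = [
    (datetime(2016, 12, 31, 23, 59, 5), 90.0),
    (datetime(2016, 12, 31, 23, 57, 0), 12.0),
]
top = merge_append_top_minutes([chunk_a, chunk_b], limit=2)
print(top)
```

With `limit=2`, the row at 23:57 is never consumed, which is the saving that ORDER BY ... LIMIT benefits from when the input is already ordered by time.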
7. Run some TimescaleDB-specific functions, such as time_bucket, which also benefit from the acceleration built into TimescaleDB.
Each 5-minute interval is a bucket, and the query returns the number of rides in each interval.
-- Number of rides by 5 minute intervals
-- (using the TimescaleDB "time_bucket" function)
SELECT time_bucket('5 minute', pickup_datetime) as five_min, count(*)
FROM rides
WHERE pickup_datetime < '2016-01-01 02:00'
GROUP BY five_min ORDER BY five_min;
five_min | count
---------------------+-------
2016-01-01 00:00:00 | 703
2016-01-01 00:05:00 | 1482
2016-01-01 00:10:00 | 1959
2016-01-01 00:15:00 | 2200
2016-01-01 00:20:00 | 2285
2016-01-01 00:25:00 | 2291
2016-01-01 00:30:00 | 2349
2016-01-01 00:35:00 | 2328
2016-01-01 00:40:00 | 2440
2016-01-01 00:45:00 | 2372
2016-01-01 00:50:00 | 2388
2016-01-01 00:55:00 | 2473
2016-01-01 01:00:00 | 2395
2016-01-01 01:05:00 | 2510
2016-01-01 01:10:00 | 2412
2016-01-01 01:15:00 | 2482
2016-01-01 01:20:00 | 2428
2016-01-01 01:25:00 | 2433
2016-01-01 01:30:00 | 2337
2016-01-01 01:35:00 | 2366
2016-01-01 01:40:00 | 2325
2016-01-01 01:45:00 | 2257
2016-01-01 01:50:00 | 2316
2016-01-01 01:55:00 | 2250
(24 rows)
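The bucketing itself is simple arithmetic: each timestamp is floored to the start of the interval that contains it. The following Python sketch shows that semantics only. The function below is a stand-in written for illustration, and the choice of the Unix epoch as the alignment origin is an assumption of this sketch, not a statement about how TimescaleDB aligns its buckets.

```python
from datetime import datetime, timedelta


def time_bucket(width, ts, origin=datetime(1970, 1, 1)):
    """Floor `ts` to the start of its `width`-sized bucket.

    The default `origin` is an assumption made for this sketch.
    """
    elapsed = ts - origin
    # timedelta // timedelta gives the whole number of buckets since origin.
    return origin + (elapsed // width) * width


five_min = timedelta(minutes=5)
bucket = time_bucket(five_min, datetime(2016, 1, 1, 0, 7, 42))
print(bucket)
```

Unlike date_trunc, the width is not limited to calendar units, so a width such as `timedelta(minutes=5)` or `timedelta(minutes=30)` works the same way.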
8. Run some statistical analysis SQL
Number of trips by rate type (for rides before 2016-01-08).
-- Join rides with rates to get more information on rate_code
SELECT rates.description, COUNT(vendor_id) as num_trips FROM rides
JOIN rates on rides.rate_code = rates.rate_code
WHERE pickup_datetime < '2016-01-08'
GROUP BY rates.description ORDER BY rates.description;
description | num_trips
-----------------------+-----------
JFK | 54832
Nassau or Westchester | 967
Newark | 4126
group ride | 17
negotiated fare | 7193
standard rate | 2266401
(6 rows)
Statistics for JFK and Newark rides in January 2016 (including trip count, average duration, average total and tip amounts, minimum, average, and maximum distance, and average number of passengers)
-- Analysis of all JFK and EWR rides in Jan 2016
SELECT rates.description, COUNT(vendor_id) as num_trips,
AVG(dropoff_datetime - pickup_datetime) as avg_trip_duration, AVG(total_amount) as avg_total,
AVG(tip_amount) as avg_tip, MIN(trip_distance) as min_distance, AVG(trip_distance) as avg_distance, MAX(trip_distance) as max_distance,
AVG(passenger_count) as avg_passengers
FROM rides
JOIN rates on rides.rate_code = rates.rate_code
WHERE rides.rate_code in (2,3) AND pickup_datetime < '2016-02-01'
GROUP BY rates.description ORDER BY rates.description;
description | num_trips | avg_trip_duration | avg_total | avg_tip | min_distance | avg_distance | max_distance | avg_passengers
-------------+-----------+-------------------+---------------------+--------------------+--------------+---------------------+--------------+--------------------
JFK | 225019 | 00:45:46.822517 | 64.3278115181384683 | 7.3334228220728027 | 0.00 | 17.2602816651038357 | 221.00 | 1.7333869584346211
Newark | 16822 | 00:35:16.157472 | 86.4633688027582927 | 9.5461657353465700 | 0.00 | 16.2706122934252764 | 177.23 | 1.7435501129473309
(2 rows)
9. Automatic data sharding and execution plans
postgres=# \d+ rides
Table "public.rides"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
vendor_id | text | | | | extended | |
pickup_datetime | timestamp without time zone | | not null | | plain | |
dropoff_datetime | timestamp without time zone | | not null | | plain | |
passenger_count | numeric | | | | main | |
trip_distance | numeric | | | | main | |
pickup_longitude | numeric | | | | main | |
pickup_latitude | numeric | | | | main | |
rate_code | integer | | | | plain | |
dropoff_longitude | numeric | | | | main | |
dropoff_latitude | numeric | | | | main | |
payment_type | integer | | | | plain | |
fare_amount | numeric | | | | main | |
extra | numeric | | | | main | |
mta_tax | numeric | | | | main | |
tip_amount | numeric | | | | main | |
tolls_amount | numeric | | | | main | |
improvement_surcharge | numeric | | | | main | |
total_amount | numeric | | | | main | |
Indexes:
"rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)
"rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)
"rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)
"rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)
Child tables: _timescaledb_internal._hyper_1_1_chunk,
_timescaledb_internal._hyper_1_2_chunk,
_timescaledb_internal._hyper_1_3_chunk,
_timescaledb_internal._hyper_1_4_chunk
The constraints on one of the chunks are as follows:
Check constraints:
"constraint_1" CHECK (pickup_datetime >= '2015-12-31 00:00:00'::timestamp without time zone AND pickup_datetime < '2016-01-30 00:00:00'::timestamp without time zone)
"constraint_2" CHECK (_timescaledb_internal.get_partition_hash(payment_type) >= 1073741823)
Inherits: rides
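The two constraints suggest how a row is routed to a chunk: constraint_1 bounds the time range and constraint_2 bounds the hash of the partitioning column. The Python sketch below reproduces that routing under stated assumptions. The 30-day interval and the boundary 1073741823 are taken from the constraints shown above, the alignment to the Unix epoch is an assumption that happens to match the 2015-12-31 boundary, and `partition_hash` is a hypothetical stand-in for `_timescaledb_internal.get_partition_hash`, whose real algorithm is not shown in this article.

```python
import zlib
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
CHUNK_INTERVAL = timedelta(days=30)  # matches the 30-day range in constraint_1
HASH_SPLIT = 1073741823              # boundary seen in constraint_2


def partition_hash(value):
    """Hypothetical stand-in for _timescaledb_internal.get_partition_hash."""
    return zlib.crc32(str(value).encode()) & 0x7FFFFFFF


def chunk_key(pickup_datetime, payment_type):
    """Return (range_start, range_end, hash_slice) for a row."""
    index = (pickup_datetime - EPOCH) // CHUNK_INTERVAL
    start = EPOCH + index * CHUNK_INTERVAL
    hash_slice = 0 if partition_hash(payment_type) < HASH_SPLIT else 1
    return start, start + CHUNK_INTERVAL, hash_slice


start, end, hash_slice = chunk_key(datetime(2016, 1, 15, 8, 30), payment_type=1)
print(start, end, hash_slice)
```

Two time ranges (the data covers January 2016, which spans the 2016-01-30 boundary) multiplied by two hash slices would give four chunks, which is consistent with the four child tables listed above.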
-- Peek behind the scenes
postgres=# select count(*) from rides;
count
----------
10906858
(1 row)
Time: 376.247 ms
postgres=# explain select count(*) from rides;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=254662.23..254662.24 rows=1 width=8)
-> Gather (cost=254661.71..254662.22 rows=5 width=8)
Workers Planned: 5
-> Partial Aggregate (cost=253661.71..253661.72 rows=1 width=8)
-> Append (cost=0.00..247468.57 rows=2477258 width=0)
-> Parallel Seq Scan on rides (cost=0.00..0.00 rows=1 width=0)
-> Parallel Seq Scan on _hyper_1_1_chunk (cost=0.00..77989.57 rows=863657 width=0)
-> Parallel Seq Scan on _hyper_1_2_chunk (cost=0.00..150399.01 rows=1331101 width=0)
-> Parallel Seq Scan on _hyper_1_3_chunk (cost=0.00..6549.75 rows=112675 width=0)
-> Parallel Seq Scan on _hyper_1_4_chunk (cost=0.00..12530.24 rows=169824 width=0)
(10 rows)
10. You can also check the shards directly
postgres=# select count(*) from _timescaledb_internal._hyper_1_1_chunk;
count
---------
3454961
(1 row)
The chunk metadata is kept in dedicated schemas:
postgres=# \dn
List of schemas
Name | Owner
-----------------------+----------
_timescaledb_cache | postgres
_timescaledb_catalog | postgres
_timescaledb_internal | postgres
public | postgres
(4 rows)
The TimescaleDB time-series plug-in can be combined with the PostGIS spatial plug-in, because PostgreSQL is very good at handling spatial data.
1. Create the PostGIS extension
create extension postgis;
2. Add a spatial type field
http://postgis.net/docs/manual-2.4/AddGeometryColumn.html
postgres=# SELECT AddGeometryColumn ('public','rides','pickup_geom',2163,'POINT',2);
addgeometrycolumn
--------------------------------------------------------
public.rides.pickup_geom SRID:2163 TYPE:POINT DIMS:2
(1 row)
postgres=# SELECT AddGeometryColumn ('public','rides','dropoff_geom',2163,'POINT',2);
addgeometrycolumn
---------------------------------------------------------
public.rides.dropoff_geom SRID:2163 TYPE:POINT DIMS:2
(1 row)
postgres=#
postgres=# \d+ rides
Table "public.rides"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
vendor_id | text | | | | extended | |
pickup_datetime | timestamp without time zone | | not null | | plain | |
dropoff_datetime | timestamp without time zone | | not null | | plain | |
passenger_count | numeric | | | | main | |
trip_distance | numeric | | | | main | |
pickup_longitude | numeric | | | | main | |
pickup_latitude | numeric | | | | main | |
rate_code | integer | | | | plain | |
dropoff_longitude | numeric | | | | main | |
dropoff_latitude | numeric | | | | main | |
payment_type | integer | | | | plain | |
fare_amount | numeric | | | | main | |
extra | numeric | | | | main | |
mta_tax | numeric | | | | main | |
tip_amount | numeric | | | | main | |
tolls_amount | numeric | | | | main | |
improvement_surcharge | numeric | | | | main | |
total_amount | numeric | | | | main | |
pickup_geom | geometry(Point,2163) | | | | main | |
dropoff_geom | geometry(Point,2163) | | | | main | |
Indexes:
"rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)
"rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)
"rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)
"rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)
Child tables: _timescaledb_internal._hyper_1_1_chunk,
_timescaledb_internal._hyper_1_2_chunk,
_timescaledb_internal._hyper_1_3_chunk,
_timescaledb_internal._hyper_1_4_chunk
3. Populate the geometry columns. (The source data actually stores each location as two separate fields, longitude and latitude. Strictly speaking, this update is optional: PostgreSQL supports expression indexes, so you can build a spatial expression index directly on those two fields.)
-- Generate the geometry points and write to table
-- (Note: These calculations might take a few mins)
UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163);
UPDATE rides SET dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);
vacuum full rides;
4. Examples of spatio-temporal analysis
How many rides were picked up within 400 meters of Times Square, at (lat, long) (40.7589,-73.9851), in each 30-minute interval?
-- Number of rides on New Years Eve originating within
-- 400m of Times Square, by 30 min buckets
-- Note: Times Square is at (lat, long) (40.7589,-73.9851)
SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min, COUNT(*) AS near_times_sq
FROM rides
WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400
AND pickup_datetime < '2016-01-01 14:00'
GROUP BY thirty_min ORDER BY thirty_min;
thirty_min | near_times_sq
---------------------+--------------
2016-01-01 00:00:00 | 74
2016-01-01 00:30:00 | 102
2016-01-01 01:00:00 | 120
2016-01-01 01:30:00 | 98
2016-01-01 02:00:00 | 112
2016-01-01 02:30:00 | 109
2016-01-01 03:00:00 | 163
2016-01-01 03:30:00 | 181
2016-01-01 04:00:00 | 214
2016-01-01 04:30:00 | 185
2016-01-01 05:00:00 | 158
2016-01-01 05:30:00 | 113
2016-01-01 06:00:00 | 102
2016-01-01 06:30:00 | 91
2016-01-01 07:00:00 | 88
2016-01-01 07:30:00 | 58
2016-01-01 08:00:00 | 72
2016-01-01 08:30:00 | 94
2016-01-01 09:00:00 | 115
2016-01-01 09:30:00 | 118
2016-01-01 10:00:00 | 135
2016-01-01 10:30:00 | 160
2016-01-01 11:00:00 | 212
2016-01-01 11:30:00 | 229
2016-01-01 12:00:00 | 244
2016-01-01 12:30:00 | 230
2016-01-01 13:00:00 | 235
2016-01-01 13:30:00 | 238
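The query combines a distance filter with time bucketing. The Python sketch below shows the same combination on a handful of made-up rows. It deliberately swaps in a different technique: it uses the haversine great-circle formula on raw latitude and longitude instead of PostGIS ST_Distance on points projected to SRID 2163, so its distances are approximations and will not match PostGIS exactly. The function names and sample rows are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

TIMES_SQUARE = (40.7589, -73.9851)  # (lat, long) from the query above
EARTH_RADIUS_M = 6371000.0          # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, long) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def rides_near(rides, center, radius_m, width):
    """Count rides within radius_m of center, per bucket of `width`.

    Buckets are aligned to midnight, so `width` should divide a day evenly.
    """
    counts = Counter()
    for ts, lat, lon in rides:
        if haversine_m(lat, lon, *center) < radius_m:
            midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
            counts[midnight + ((ts - midnight) // width) * width] += 1
    return dict(counts)


# Made-up sample rows: (pickup time, latitude, longitude).
sample = [
    (datetime(2016, 1, 1, 0, 10), 40.7590, -73.9850),  # a few meters away
    (datetime(2016, 1, 1, 0, 40), 40.7600, -73.9851),  # roughly 120 m north
    (datetime(2016, 1, 1, 0, 45), 40.7000, -73.9851),  # several km south
]
near = rides_near(sample, TIMES_SQUARE, 400, timedelta(minutes=30))
print(near)
```

Only the first two rows pass the 400 m filter, and they fall into the 00:00 and 00:30 buckets respectively.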
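Other sample datasets are available at http://docs.timescale.com/v0.8/tutorials/other-sample-datasets
They are not covered in detail here.
The commonly used TimescaleDB APIs are described below. For the full reference, see http://docs.timescale.com/v0.8/api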
create_hypertable()
Required Arguments
Name | Description |
main_table | Identifier of table to convert to hypertable |
time_column_name | Name of the column containing time values |
Optional Arguments
Name | Description |
partitioning_column | Name of an additional column to partition by. If provided, number_partitions must be set. |
number_partitions | Number of hash partitions to use for partitioning_column when this optional argument is supplied. Must be > 0. |
chunk_time_interval | Interval in event time that each chunk covers. Must be > 0. Default is 1 month. |
create_default_indexes | Boolean whether to create default indexes on time/partitioning columns. Default is TRUE. |
if_not_exists | Boolean whether to print warning if table already converted to hypertable or raise exception. Default is FALSE. |
partitioning_func | The function to use for calculating a value's partition. |
Both hash and interval partitioning are supported:
add_dimension()
Required Arguments
Name | Description |
main_table | Identifier of hypertable to add the dimension to. |
column_name | Name of the column to partition by. |
Optional Arguments
Name | Description |
number_partitions | Number of hash partitions to use on column_name. Must be > 0. |
interval_length | Interval that each chunk covers. Must be > 0. |
partitioning_func | The function to use for calculating a value's partition (see create_hypertable instructions). |
Drop chunks older than the specified time:
drop_chunks()
Required Arguments
Name | Description |
older_than | Timestamp of cut-off point for data to be dropped, i.e., anything older than this should be removed. |
Optional Arguments
Name | Description |
table_name | Hypertable name from which to drop chunks. If not supplied, all hypertables are affected. |
schema_name | Schema name of the hypertable from which to drop chunks. Defaults to public. |
cascade | Boolean on whether to CASCADE the drop on chunks, therefore removing dependent objects on chunks to be removed. Defaults to FALSE. |
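Because data is removed a whole chunk at a time, a chunk can only be dropped once its entire time range lies before the cut-off. The Python sketch below illustrates that selection rule as an assumption about the behavior, not as TimescaleDB's code. The function name `chunks_to_drop` and the catalog contents are hypothetical, with time ranges modeled on the CHECK constraint format shown earlier in this article (inclusive start, exclusive end).

```python
from datetime import datetime


def chunks_to_drop(chunks, older_than):
    """Return names of chunks whose whole time range lies before older_than.

    `chunks` maps a chunk name to its (range_start, range_end) pair, where
    range_end is exclusive.
    """
    return sorted(
        name for name, (_, range_end) in chunks.items()
        if range_end <= older_than
    )


# Hypothetical chunk catalog for illustration.
catalog = {
    "_hyper_1_1_chunk": (datetime(2015, 12, 31), datetime(2016, 1, 30)),
    "_hyper_1_3_chunk": (datetime(2016, 1, 30), datetime(2016, 2, 29)),
}
dropped = chunks_to_drop(catalog, older_than=datetime(2016, 2, 1))
print(dropped)
```

Under this rule the second chunk survives a cut-off of 2016-02-01 even though it holds two days of older data, which is why the chunk interval also determines the granularity of retention.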
set_chunk_time_interval()
Required Arguments
Name | Description |
main_table | Identifier of hypertable to update interval for. |
chunk_time_interval | Interval in event time that each new chunk covers. Must be > 0. |
first()
Required Arguments
Name | Description |
value | The value to return (anyelement) |
time | The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type) |
For example, find the earliest reported temperature value for each sensor.
SELECT device_id, first(temp, time)
FROM metrics
GROUP BY device_id;
This can also be done with recursive SQL. See:
Applications of PostgreSQL Recursive SQL - Geeks and Normal People
last()
Required Arguments
Name | Description |
value | The value to return (anyelement) |
time | The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type) |
For example, find the latest temperature value of each sensor in each 5-minute interval over the past day.
SELECT device_id, time_bucket('5 minutes', time) as interval,
last(temp, time)
FROM metrics
WHERE time > now () - interval '1 day'
GROUP BY device_id, interval
ORDER BY interval DESC;
This can also be done with recursive SQL. See:
Applications of PostgreSQL Recursive SQL - Geeks and Normal People
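The semantics of first() and last() can be summarized as: within each group, return the value from the row with the smallest (or largest) timestamp. The Python sketch below shows that behavior on made-up readings. It is an illustration of the semantics only, and the function name `first_last` and the device names are hypothetical.

```python
from datetime import datetime


def first_last(rows):
    """Return {device_id: (first_temp, last_temp)} ordered by reading time.

    `rows` is an iterable of (device_id, time, temp) tuples in any order.
    """
    earliest, latest = {}, {}
    for device_id, ts, temp in rows:
        # Keep the reading with the smallest timestamp per device.
        if device_id not in earliest or ts < earliest[device_id][0]:
            earliest[device_id] = (ts, temp)
        # Keep the reading with the largest timestamp per device.
        if device_id not in latest or ts > latest[device_id][0]:
            latest[device_id] = (ts, temp)
    return {d: (earliest[d][1], latest[d][1]) for d in earliest}


# Made-up readings for two hypothetical devices.
readings = [
    ("dev1", datetime(2017, 1, 1, 0, 5), 21.5),
    ("dev1", datetime(2017, 1, 1, 0, 0), 20.0),
    ("dev2", datetime(2017, 1, 1, 0, 3), 18.2),
    ("dev1", datetime(2017, 1, 1, 0, 9), 22.1),
]
summary = first_last(readings)
print(summary)
```

Note that the result depends on the time column, not on the order in which rows arrive.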
histogram()
Required Arguments
Name | Description |
value | A set of values to partition into a histogram |
min | The histogram's lower bound used in bucketing |
max | The histogram's upper bound used in bucketing |
nbuckets | The integer value for the number of histogram buckets (partitions) |
For example:
The battery-level range from 20 to 60 is divided into five buckets, and an array of 5 + 2 values is returned, each giving the number of records in one bucket. The first and last values count the records that fall below and above the range, respectively.
SELECT device_id, histogram(battery_level, 20, 60, 5)
FROM readings
GROUP BY device_id
LIMIT 10;
device_id | histogram
------------+------------------------------
demo000000 | {0,0,0,7,215,206,572}
demo000001 | {0,12,173,112,99,145,459}
demo000002 | {0,0,187,167,68,229,349}
demo000003 | {197,209,127,221,106,112,28}
demo000004 | {0,0,0,0,0,39,961}
demo000005 | {12,225,171,122,233,80,157}
demo000006 | {0,78,176,170,8,40,528}
demo000007 | {0,0,0,126,239,245,390}
demo000008 | {0,0,311,345,116,228,0}
demo000009 | {295,92,105,50,8,8,442}
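The layout of the returned array can be reproduced with a few lines of Python. This sketch illustrates the nbuckets + 2 layout described above and is not TimescaleDB's implementation. Treating a value exactly equal to the upper bound as out of range is an assumption of this sketch, and the sample battery levels are made up.

```python
def histogram(values, lo, hi, nbuckets):
    """Count values into nbuckets equal-width bins plus two overflow bins.

    Index 0 counts values below lo, index nbuckets + 1 counts values at or
    above hi, and indexes 1..nbuckets cover [lo, hi).
    """
    counts = [0] * (nbuckets + 2)
    width = (hi - lo) / nbuckets
    for v in values:
        if v < lo:
            counts[0] += 1
        elif v >= hi:
            counts[-1] += 1
        else:
            counts[int((v - lo) // width) + 1] += 1
    return counts


# Made-up battery levels; bins are <20, [20,28), [28,36), ..., [52,60), >=60.
levels = [5, 20, 27.9, 28, 44, 59, 60, 99]
result = histogram(levels, 20, 60, 5)
print(result)
```

Read this way, a row such as {0,0,0,0,0,39,961} in the output above means that almost all readings for that device were at or above the upper bound.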
time_bucket() is similar to date_trunc but more powerful: it can truncate timestamps to an arbitrary interval, and it is easy to use.
time_bucket()
Required Arguments
Name | Description |
bucket_width | A PostgreSQL time interval for how long each bucket is (interval) |
time | The timestamp to bucket (timestamp/timestamptz/date) |
Optional Arguments
Name | Description |
offset | The time interval to offset all buckets by (interval) |
hypertable_relation_size_pretty()
SELECT * FROM hypertable_relation_size_pretty('conditions');
table_size | index_size | toast_size | total_size
------------+------------+------------+------------
1171 MB | 1608 MB | 176 kB | 2779 MB
chunk_relation_size_pretty()
SELECT * FROM chunk_relation_size_pretty('conditions');
chunk_table | table_size | index_size | total_size
---------------------------------------------+------------+------------+------------
"_timescaledb_internal"."_hyper_1_1_chunk" | 28 MB | 36 MB | 64 MB
"_timescaledb_internal"."_hyper_1_2_chunk" | 57 MB | 78 MB | 134 MB
...
indexes_relation_size_pretty()
SELECT * FROM indexes_relation_size_pretty('conditions');
index_name_ | total_size
--------------------------------------+------------
public.conditions_device_id_time_idx | 1143 MB
public.conditions_time_idx | 465 MB
To dump the TimescaleDB metadata, download the script from https://raw.githubusercontent.com/timescale/timescaledb/master/scripts/dump_meta_data.sql and run:
psql [your connect flags] -d your_timescale_db < dump_meta_data.sql > dumpfile.txt
TimescaleDB is a very useful plug-in for processing time-series data. It hides the sharding logic (which is transparent to users) and provides many API functions and performance optimizations, which makes it a great fit for time-series scenarios.
Combined with the PostGIS plug-in, PostgreSQL becomes even more powerful for spatio-temporal processing.
PostgreSQL Time-Series Data Case: Automatic Compression over Time
PostgreSQL Time-Series Best Practices: Stock Exchange System Database