A downsampling query is an aggregate query that is performed based on a specified time interval. Downsampling queries are performed to decrease the sample rate in time series scenarios.
Applicable engines and versions
Downsampling queries are supported only by LindormTSDB 3.4.15 and later versions.
For more information about how to view and upgrade the LindormTSDB version of the Lindorm instance, see Release notes of LindormTSDB and Upgrade the minor engine version of a Lindorm instance.
Syntax
select_sample_by_statement ::= SELECT ( select_clause | '*' )
FROM table_identifier
WHERE where_clause
SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause ::= selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector ::= tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause ::= relation ( AND relation )* (OR relation)*
relation ::= ( field_identifier | tag_identifier ) operator term
operator ::= '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval ::= interval units | 0
offset_interval ::= interval units
Aggregate functions supported by downsampling queries
The SAMPLE BY
clause is executed to perform downsampling operations based on independent time series. For more information, see Data model.
The following table lists the functions supported by SAMPLE BY
.
Function | Description |
SUM | Returns the sum of values that are collected within each time window. For more information, see SUM. |
AVG | Returns the average value that is calculated based on values that are collected within each time window. For more information, see AVG. |
COUNT | Returns the number of values that are collected within each time window. For more information, see COUNT. |
MIN | Returns the minimum value that is collected within each time window. For more information, see MIN. |
MAX | Returns the maximum value that is collected within each time window. For more information, see MAX. |
FIRST | Returns the first value that is collected within each time window. For more information, see FIRST. |
LAST | Returns the last value that is collected within each time window. For more information, see LAST. |
PERCENTILE | Returns the specified percentile of values that are collected within each time window. For more information, see PERCENTILE. |
LATEST | Returns the latest values that are collected within a time window. For more information, see LATEST. |
RATE | Returns the change rate of the value in the current data point compared with the value of the previous data point. For more information, see RATE. |
DELTA | Returns the difference between the value of the current data point and the value in the previous data point. For more information, see DELTA. |
Examples
You do not need to specify a function for downsampling in the SELECT statement to query data from the tag column of a table. If you want to query data from non-tag columns, you must specify a function for downsampling.
The following statement is executed to query data from a table named sensor and the sample results that are returned:
SELECT * FROM sensor;
The following result is returned:
+-----------+----------+---------------------------+-------------+-----------+
| device_id | region | time | temperature | humidity |
+-----------+----------+---------------------------+-------------+-----------+
| F07A1260 | north-cn | 2021-01-01T09:00:00+08:00 | 0.000000 | 9.000000 |
| F07A1260 | north-cn | 2021-01-01T12:01:00+08:00 | 1.000000 | 45.000000 |
| F07A1260 | north-cn | 2021-01-01T14:03:00+08:00 | 2.000000 | 46.000000 |
| F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 10.000000 | 47.000000 |
| F07A1261 | north-cn | 2021-02-10T12:00:30+08:00 | 3.000000 | 40.000000 |
| F07A1261 | north-cn | 2021-03-01T12:01:00+08:00 | 4.000000 | 41.000000 |
| F07A1261 | north-cn | 2021-03-08T12:08:00+08:00 | 5.000000 | 42.000000 |
| F07A1261 | north-cn | 2021-05-01T13:00:00+08:00 | 6.000000 | 43.000000 |
+-----------+----------+---------------------------+-------------+-----------+
Sample downsampling queries and downsampling queries that include subqueries
Subqueries are not supported in downsampling queries. However, downsampling queries can be used as subqueries in other types of queries.
Example 1: Downsampling is performed at an interval that is specified based on the UTC time. The following statement is executed to aggregate time series based on an 8-hour interval and return the number of values that are collected in the humidity column within each time window:
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;
The following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 3 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+
Example 2: Downsampling is performed at an interval that is specified based on the UTC time and a specified time offset. The following statement specifies that time series are aggregated based on an 8-hour interval, the offset of the start time of each time window is 3 hours, and the number of values that are collected in the humidity column within each time window is calculated:
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;
The following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T03:00:00+08:00 | 1 | | F07A1260 | north-cn | 2021-01-01T11:00:00+08:00 | 2 | | F07A1260 | north-cn | 2021-01-01T19:00:00+08:00 | 1 | +-----------+----------+---------------------------+----------------+
Example 3: Downsampling is performed at an interval that is specified based on the UTC time. The following statement specifies that time series are aggregated based on a 24-hour interval, the offset of the start time of each time window is 16 hours (aligned to 00:00 in GMT+08:00), and the number of values that are collected in the humidity column within each time window is calculated:
SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h
The following result is returned:
+-----------+----------+---------------------------+----------------+ | device_id | region | time | count_humidity | +-----------+----------+---------------------------+----------------+ | F07A1260 | north-cn | 2021-01-01T00:00:00+08:00 | 4 | +-----------+----------+---------------------------+----------------+
Example 4: The SAMPLE BY clause cannot be used together with the GROUP BY clause, LIMIT OFFSET clause, or ORDER BY clause. You can include the GROUP BY clause, LIMIT OFFSET clause, and ORDER BY clause in SQL statements to specify subqueries.
SELECT device_id, max(avg_humidity) AS max_humidity FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;
The following result is returned:
+-----------+--------------+ | device_id | max_humidity | +-----------+--------------+ | F07A1261 | 43.000000 | | F07A1260 | 47.000000 | +-----------+--------------+
Example 5: Use the LIMIT OFFSET clause to specify the number of results that you want to obtain.
SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;
The following result is returned:
+-----------+----------+--------------+ | device_id | region | avg_humidity | +-----------+----------+--------------+ | F07A1261 | north-cn | 40.000000 | +-----------+----------+--------------+
Sample interpolation queries
Example 1: Interpolate a specified value based on a specified time interval.
SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;
The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
Example 2: Interpolate the value that is collected within the previous time window.
SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;
The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000 | | F07A1260 | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 | | F07A1260 | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 | +-----------+----------+---------------------------+-----------+
Examples of aggregation after downsampling
Example 1: Use the AVG function to perform downsampling at an interval of 2 hours and then use the
RATE
function to calculate the change rate of values between two data points.SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
The following result is returned:
+-----------+----------+---------------------------+---------------+ | device_id | region | time | rate_humidity | +-----------+----------+---------------------------+---------------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046 | +-----------+----------+---------------------------+---------------+
Example 2: Perform downsampling at an interval of 2 hours and then use the DELTA function to calculate the difference between the values of two data points.
SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;
The following result is returned:
+-----------+----------+---------------------------+-----------+ | device_id | region | time | humidity | +-----------+----------+---------------------------+-----------+ | F07A1260 | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 | | F07A1260 | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000 | | F07A1260 | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000 | +-----------+----------+---------------------------+-----------+
Interpolation
During downsampling, all time series are split based on a specified time interval, and values of data points in each time window are aggregated. If no value is collected within a time window, a specified fill policy can be used to interpolate a value for the data point. For example, the timestamps of values that are collected in a time series after downsampling is completed are: t + 0, t + 20, and t + 30. If you do not specify conditions for interpolation in the query statement, only three values are returned. If you specify 1 for interpolation, four values are returned. The value 1 is interpolated as the value of the data point at t + 10.
Functions for interpolation
Fill Policy | Interpolated value |
none | No values are interpolated. This is the default value. |
zero | Interpolates the value 0. |
linear | Performs linear interpolation. |
previous | Interpolates the previous value. |
near | Interpolates the adjacent value. |
after | Interpolates the next value. |
fixed | Interpolates a specified value. |