全部产品
Search
文档中心

云原生多模数据库 Lindorm:降采样查询

更新时间:Jan 16, 2024

降采样查询是在时间维度的聚合查询,是时序数据应用场景常用的降低采样率的查询。

引擎与版本

降采样查询仅支持时序引擎,且引擎版本需为3.4.15及以上版本。

说明

如何查看或升级时序引擎版本,请参见时序引擎版本说明升级小版本

语法

select_sample_by_statement ::=  SELECT ( select_clause | '*' )
                                FROM table_identifier
                                WHERE where_clause
                                SAMPLE BY time_interval [ OFFSET offset_interval ] [ FILL fill_option ]
select_clause              ::=  selector [ AS identifier ] ( ',' selector [ AS identifier ] )
selector                   ::=  tag_identifier | time | function_identifier '(' field_identifier [ ',' function_args ] ')'
where_clause               ::=  relation ( AND relation )* (OR relation)*
relation                   ::=  ( field_identifier | tag_identifier ) operator term
operator                   ::=  '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
time_interval              ::= interval units | 0
offset_interval            ::= interval units

降采样支持的聚合函数列表

SAMPLE BY是针对每条独立的时间线(关于时间线的概念,请参见数据模型)上的降采样操作。

SAMPLE BY支持的函数列表:

函数

说明

SUM

每个指定时间窗口内求和,具体请参见SUM函数

AVG

每个指定时间窗口内求均值,具体请参见AVG函数

COUNT

每个指定时间窗口内值个数,具体请参见COUNT函数

MIN

每个指定时间窗口内最小值,具体请参见MIN函数

MAX

每个指定时间窗口内最大值,具体请参见MAX函数

FIRST

每个指定时间窗口内第一个值,具体请参见FIRST函数

LAST

每个指定时间窗口内最后值,具体请参见LAST函数

PERCENTILE

每个指定时间窗口内求百分位,具体请参见PERCENTILE函数

LATEST

整个时间区间最新值,具体请参见LATEST函数

RATE

与前一行对应值的变化率,具体请参见RATE函数

DELTA

与前一行对应值的差,具体请参见DELTA函数

示例

SELECT查询指定的tag列无需指定降采样函数,其他field列必须指定降采样函数。

假定查询的sensor表数据如下:

SELECT * FROM sensor;

返回结果如下:

+-----------+----------+---------------------------+-------------+-----------+
| device_id |  region  |           time            | temperature | humidity  |
+-----------+----------+---------------------------+-------------+-----------+
| F07A1260  | north-cn | 2021-01-01T09:00:00+08:00 | 0.000000    | 9.000000  |
| F07A1260  | north-cn | 2021-01-01T12:01:00+08:00 | 1.000000    | 45.000000 |
| F07A1260  | north-cn | 2021-01-01T14:03:00+08:00 | 2.000000    | 46.000000 |
| F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 10.000000   | 47.000000 |
| F07A1261  | north-cn | 2021-02-10T12:00:30+08:00 | 3.000000    | 40.000000 |
| F07A1261  | north-cn | 2021-03-01T12:01:00+08:00 | 4.000000    | 41.000000 |
| F07A1261  | north-cn | 2021-03-08T12:08:00+08:00 | 5.000000    | 42.000000 |
| F07A1261  | north-cn | 2021-05-01T13:00:00+08:00 | 6.000000    | 43.000000 |
+-----------+----------+---------------------------+-------------+-----------+

降采样与子查询示例

说明

降采样查询中不支持嵌套子查询,但可以作为子查询被其他查询嵌套。

  • 示例1:默认UTC对齐降采样,时间线按照8h时间区间聚合分别求count。

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h;

    返回结果如下:

    +-----------+----------+---------------------------+----------------+
    | device_id |  region  |           time            | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 3              |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1              |
    +-----------+----------+---------------------------+----------------+
  • 示例2:默认UTC对齐降采样,指定窗口offset偏移,时间线按照8h时间区间聚合,开始窗口偏移3h,求count。

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 8h offset 3h;

    返回结果如下:

    +-----------+----------+---------------------------+----------------+
    | device_id |  region  |           time            | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T03:00:00+08:00 | 1              |
    | F07A1260  | north-cn | 2021-01-01T11:00:00+08:00 | 2              |
    | F07A1260  | north-cn | 2021-01-01T19:00:00+08:00 | 1              |
    +-----------+----------+---------------------------+----------------+
  • 示例3: 默认UTC对齐降采样,对齐到当地时间0点(例如东8区时间),时间线按照24h时间区间聚合,开始窗口偏移16h,求count。

    SELECT device_id,region,time,count(humidity) AS count_humidity FROM sensor WHERE device_id='F07A1260' sample by 24h offset 16h

    返回结果如下:

    +-----------+----------+---------------------------+----------------+
    | device_id | region   | time                      | count_humidity |
    +-----------+----------+---------------------------+----------------+
    | F07A1260  | north-cn | 2021-01-01T00:00:00+08:00 | 4              |
    +-----------+----------+---------------------------+----------------+
  • 示例4:sample by目前不支持与group by、limit offset、order by配合,但可以配合子查询使用。

    SELECT device_id, max(avg_humidity) AS max_humidity  FROM (SELECT device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) group by device_id;

    返回结果如下:

    +-----------+--------------+
    | device_id | max_humidity |
    +-----------+--------------+
    | F07A1261  | 43.000000    |
    | F07A1260  | 47.000000    |
    +-----------+--------------+
  • 示例5:配合limit offset限制结果条数。

    SELECT device_id,region, avg_humidity FROM (select device_id,region,time,avg(humidity) AS avg_humidity FROM sensor sample by 8h) limit 1 offset 1;

    返回结果如下:

    +-----------+----------+--------------+
    | device_id |  region  | avg_humidity |
    +-----------+----------+--------------+
    | F07A1261  | north-cn | 40.000000    |
    +-----------+----------+--------------+

降采样窗口插值示例

  • 示例1:固定值插值。

    SELECT * from (select device_id,region,time, avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill 1) order by device_id;

    返回结果如下:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
    | F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
    +-----------+----------+---------------------------+-----------+
  • 示例2:后值插值。

    SELECT * from (select device_id,region,time,avg(humidity) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h fill after) order by device_id;

    返回结果如下:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T08:00:00+08:00 | 9.000000  |
    | F07A1260  | north-cn | 2021-01-01T10:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 45.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 46.000000 |
    | F07A1260  | north-cn | 2021-01-01T16:00:00+08:00 | 47.000000 |
    | F07A1260  | north-cn | 2021-01-01T18:00:00+08:00 | 47.000000 |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 47.000000 |
    +-----------+----------+---------------------------+-----------+

降采样后再转换示例

  • 示例1:先按照2h窗口avg降采样,然后对降采样后的数据计算rate斜率。

    SELECT device_id,region,time,rate(avg(humidity)) AS rate_humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

    返回结果如下:

    +-----------+----------+---------------------------+---------------+
    | device_id |  region  |           time            | rate_humidity |
    +-----------+----------+---------------------------+---------------+
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 0.002500      |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 0.000139      |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 0.000046      |
    +-----------+----------+---------------------------+---------------+
  • 示例2:先按照2h窗口降采样,然后对降采样后的数据计算差值delta。

    SELECT device_id,region,time,delta(avg(humidity)) AS humidity FROM sensor WHERE device_id='F07A1260' sample by 2h;

    返回结果如下:

    +-----------+----------+---------------------------+-----------+
    | device_id |  region  |           time            | humidity  |
    +-----------+----------+---------------------------+-----------+
    | F07A1260  | north-cn | 2021-01-01T12:00:00+08:00 | 36.000000 |
    | F07A1260  | north-cn | 2021-01-01T14:00:00+08:00 | 1.000000  |
    | F07A1260  | north-cn | 2021-01-01T20:00:00+08:00 | 1.000000  |
    +-----------+----------+---------------------------+-----------+

插值

降采样先把所有时间线按照指定时间窗口切分,并把每个降采样区间内的数据做一次运算,降采样后如果某个精度区间没有值,插值可以指定在这个时间点填充具体的值。比如某条时间线降采样后的时间戳为:t+0, t+20, t+30,此时如果不指定插值,只有3个值,如果指定了插值为1,此时间线会有4个值,其中t+10时刻的值为1。

插值函数表:

Fill Policy

填充值

none

默认行为,不填值。

zero

固定填入0。

linear

线性填充值。

previous

之前的一个值。

near

邻近的一个值。

after

之后的一个值。

fixed

用指定的一个固定填充值。