全部產品
Search
文件中心

AnalyticDB:視窗函數

更新時間:Oct 01, 2024

視窗函數常用於計算分組排名,移動平均,累計和等複雜計算。本文介紹AnalyticDB for MySQL視窗函數的用法與樣本。

  • 彙總函式

  • 排序函數

    • CUME_DIST:返回一組數值中每個值的累計分布。

    • RANK:返回資料集中每個值的排名。

    • DENSE_RANK:返回一組數值中每個數值的排名。

    • NTILE:將每個視窗分區的資料分散到桶號從1到n的n個桶中。

    • ROW_NUMBER:根據行在視窗分區內的順序,為每行資料返回一個唯一的有序行號,行號從1開始。

    • PERCENT_RANK:返回資料集中每個資料的排名百分比,其結果由(r - 1) / (n - 1)計算得出。其中r為RANK()計算的當前行排名, n為當前視窗分區內總的行數。

  • 值函數

    • FIRST_VALUE:返回視窗分區第1行的值。

    • LAST_VALUE返回視窗分區最後1行的值。

    • LAG:返回視窗內距離當前行之前位移offset後的值。

    • LEAD:返回視窗內距離當前行位移offset後的值。

    • NTH_VALUE:返回視窗內位移指定offset後的值,位移量從1開始。

概述

視窗函數基於查詢結果的行資料進行計算,視窗函數運行在HAVING子句之後、 ORDER BY子句之前。視窗函數需要特殊的關鍵字OVER子句來指定視窗即觸發一個視窗函數。

AnalyticDB for MySQL支援三種類型的視窗函數:彙總函式、排序函數和值函數。

文法

function over ([partition by a] order by b RANGE|ROWS BETWEEN start AND end)                

視窗函數包含以下三個部分。

  • 分區規範(可選):用於將輸入行分散到不同的分區中,過程和GROUP BY子句的分散過程相似。

  • 排序規範:決定輸入資料行在視窗函數中執行的順序。

  • 視窗區間:指定計算資料的視窗邊界。

    視窗區間支援RANGEROWS兩種模式:

    • RANGE按照計算資料行值的範圍進行定義。

    • ROWS按照計算資料行的行數進行範圍定義。

    • RANGEROWS中可以使用BETWEEN start AND end指定邊界可取值。BETWEEN start AND end取值為:

      • CURRENT ROW,當前行。

      • N PRECEDING,前n行。

      • UNBOUNDED PRECEDING,直到第1行。

      • N FOLLOWING,後n行。

      • UNBOUNDED FOLLOWING,直到最後1行。

例如,以下查詢根據當前視窗的每行資料計算profit的部分總和。

select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA     |     50 |          50 |
| 2001 | USA     |   1500 |        1550 |
| 2000 | Germany |     75 |          75 |
| 2000 | Germany |     75 |         150 |
| 2001 | Germany |     79 |         229 |
| 2000 | Finland |   1500 |        1500 |
| 2001 | Finland |     10 |        1510 |        

而以下查詢只能計算出profit的總和。

select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| Germany |                                     229 |
| Germany |                                     229 |
| Germany |                                     229 |
| USA     |                                    1550 |
| USA     |                                    1550 |
| Finland |                                    1510 |
| Finland |                                    1510 |        

注意事項

邊界值的取值有如下要求:

  • start不能為UNBOUNDED FOLLOWING,否則提示Window frame start cannot be UNBOUNDED FOLLOWING錯誤。

  • end不能為UNBOUNDED PRECEDING,否則提示Window frame end cannot be UNBOUNDED PRECEDING錯誤。

  • startCURRENT ROW並且endN PRECEDING時,將提示Window frame starting from CURRENT ROW cannot end with PRECEDING錯誤。

  • startN FOLLOWING並且endN PRECEDING時,將提示Window frame starting from FOLLOWING cannot end with PRECEDING錯誤。

  • startN FOLLOWING並且endCURRENT ROW,將提示Window frame starting from FOLLOWING cannot end with CURRENT ROW錯誤。

當模式為RANGE時:

  • start或者endN PRECEDING時,將提示Window frame RANGE PRECEDING is only supported with UNBOUNDED錯誤。

  • start或者endN FOLLOWING時,將提示Window frame RANGE FOLLOWING is only supported with UNBOUNDED錯誤。

準備工作

本文中的視窗函數均以testwindow表為測試資料。

create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);        
insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2001,'Germany','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500);        
SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product    | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer   |   1500 |
| 2001 | Finland | Phone      |     10 |
| 2000 | Germany | Calculator |     75 |
| 2000 | Germany | Calculator |     75 |
| 2001 | Germany | Calculator |     79 |
| 2001 | USA     | Calculator |     50 |
| 2001 | USA     | Computer   |   1500 |        

彙總函式

所有彙總函式都可以通過添加OVER子句來作為視窗函數使用,彙總函式將基於當前滑動視窗內的資料行計算每一行資料。

例如,通過以下查詢迴圈顯示每個店員每天的訂單額總和。

SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey            

CUME_DIST

CUME_DIST()           
  • 命令說明:返回一組數值中每個值的累計分布。

    返回結果:在視窗分區中對視窗進行排序後的資料集,包括當前行和當前行之前的資料行數。排序中任何關聯值均會計算成相同的分布值。

  • 傳回值類型:DOUBLE。

  • 樣本:

    select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow;
    +------+---------+------------+--------+--------------------+
    | year | country | product    | profit | cume_dist          |
    +------+---------+------------+--------+--------------------+
    | 2001 | USA     | Calculator |     50 |                0.5 |
    | 2001 | USA     | Computer   |   1500 |                1.0 |
    | 2001 | Finland | Phone      |     10 |                0.5 |
    | 2000 | Finland | Computer   |   1500 |                1.0 |
    | 2000 | Germany | Calculator |     75 | 0.6666666666666666 |
    | 2000 | Germany | Calculator |     75 | 0.6666666666666666 |
    | 2001 | Germany | Calculator |     79 |                1.0 |                

RANK

RANK()            
  • 命令說明:返回資料集中每個值的排名。

    排名值是將當前行之前的行數加1,不包含當前行。因此,排序的關聯值可能產生順序上的空隙,而且這個排名會對每個視窗分區進行計算。

  • 傳回值類型:BIGINT。

  • 樣本:

    select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow;
    +------+---------+------------+--------+------+
    | year | country | product    | profit | rank |
    +------+---------+------------+--------+------+
    | 2001 | Finland | Phone      |     10 |    1 |
    | 2000 | Finland | Computer   |   1500 |    2 |
    | 2001 | USA     | Calculator |     50 |    1 |
    | 2001 | USA     | Computer   |   1500 |    2 |
    | 2000 | Germany | Calculator |     75 |    1 |
    | 2000 | Germany | Calculator |     75 |    1 |
    | 2001 | Germany | Calculator |     79 |    3 |                    

DENSE_RANK

DENSE_RANK()            
  • 命令說明:返回一組數值中每個數值的排名。

    DENSE_RANK()RANK()功能相似,但是DENSE_RANK()關聯值不會產生順序上的空隙。

  • 傳回值類型:BIGINT。

  • 樣本:

    select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow;
    +------+---------+------------+--------+------------+
    | year | country | product    | profit | dense_rank |
    +------+---------+------------+--------+------------+
    | 2001 | Finland | Phone      |     10 |          1 |
    | 2000 | Finland | Computer   |   1500 |          2 |
    | 2001 | USA     | Calculator |     50 |          1 |
    | 2001 | USA     | Computer   |   1500 |          2 |
    | 2000 | Germany | Calculator |     75 |          1 |
    | 2000 | Germany | Calculator |     75 |          1 |
    | 2001 | Germany | Calculator |     79 |          2 |                   

NTILE

NTILE(n)            
  • 命令說明:將每個視窗分區的資料分散到桶號從1nn個桶中。

    桶號值最多間隔1,如果視窗分區中的資料行數不能均勻地分散到每一個桶中,則剩餘值將從第1個桶開始,每1個桶分1行資料。例如,有6行資料和4個桶, 最終桶號值為1 1 2 2 3 4

  • 傳回值類型:BIGINT。

  • 樣本:

    select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow;
    +------+---------+------------+--------+--------+
    | year | country | product    | profit | ntile2 |
    +------+---------+------------+--------+--------+
    | 2001 | USA     | Calculator |     50 |      1 |
    | 2001 | USA     | Computer   |   1500 |      2 |
    | 2001 | Finland | Phone      |     10 |      1 |
    | 2000 | Finland | Computer   |   1500 |      2 |
    | 2000 | Germany | Calculator |     75 |      1 |
    | 2000 | Germany | Calculator |     75 |      1 |
    | 2001 | Germany | Calculator |     79 |      2 |                    

ROW_NUMBER

ROW_NUMBER()            
  • 命令說明:根據行在視窗分區內的順序,為每行資料返回一個唯一的有序行號,行號從1開始。

  • 傳回值類型:BIGINT。

  • 樣本:

    SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow;
    +------+---------+------------+--------+----------+
    | year | country | product    | profit | row_num1 |
    +------+---------+------------+--------+----------+
    | 2001 | USA     | Calculator |     50 |        1 |
    | 2001 | USA     | Computer   |   1500 |        2 |
    | 2000 | Germany | Calculator |     75 |        1 |
    | 2000 | Germany | Calculator |     75 |        2 |
    | 2001 | Germany | Calculator |     79 |        3 |
    | 2000 | Finland | Computer   |   1500 |        1 |
    | 2001 | Finland | Phone      |     10 |        2 |                    

PERCENT_RANK

PERCENT_RANK()            
  • 命令說明:返回資料集中每個資料的排名百分比,其結果由(r - 1) / (n - 1)計算得出。其中,rRANK()計算的當前行排名, n為當前視窗分區內總的行數。

  • 傳回值類型:DOUBLE。

  • 樣本:

    select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow;
    +------+---------+------------+--------+--------+
    | year | country | product    | profit | ntile3 |
    +------+---------+------------+--------+--------+
    | 2001 | Finland | Phone      |     10 |    0.0 |
    | 2000 | Finland | Computer   |   1500 |    1.0 |
    | 2001 | USA     | Calculator |     50 |    0.0 |
    | 2001 | USA     | Computer   |   1500 |    1.0 |
    | 2000 | Germany | Calculator |     75 |    0.0 |
    | 2000 | Germany | Calculator |     75 |    0.0 |
    | 2001 | Germany | Calculator |     79 |    1.0 |                    

FIRST_VALUE

FIRST_VALUE(x)        
  • 命令說明:返回視窗分區第一行的值。

  • 傳回值類型:與輸入參數類型相同。

  • 樣本:

    select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow;
    +------+---------+------------+--------+------------+
    | year | country | product    | profit | firstValue |
    +------+---------+------------+--------+------------+
    | 2000 | Germany | Calculator |     75 |         75 |
    | 2000 | Germany | Calculator |     75 |         75 |
    | 2001 | Germany | Calculator |     79 |         75 |
    | 2001 | USA     | Calculator |     50 |         50 |
    | 2001 | USA     | Computer   |   1500 |         50 |
    | 2001 | Finland | Phone      |     10 |         10 |
    | 2000 | Finland | Computer   |   1500 |         10 |                

LAST_VALUE

LAST_VALUE(x)            
  • 命令說明:返回視窗分區最後一行的值。LAST_VALUE預設統計範圍是 rows between unbounded preceding and current row,即取當前行資料與當前行之前的資料進行比較。如果像FIRST_VALUE那樣直接在每行資料中顯示最後一行資料,需要在 order by 條件的後面加上語句:rows between unbounded preceding and unbounded following。

  • 傳回值類型:與輸入參數類型相同。

  • 樣本1:

    select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow;
    +----------------+-------------------+-------------------+------------------+----------------------+
    | year           | country           | product           | profit           | firstValue           |
    +----------------+-------------------+-------------------+------------------+----------------------+
    |           2001 | USA               | Calculator        |               50 |                   50 |
    |           2001 | USA               | Computer          |             1500 |                 1500 |
    |           2001 | Finland           | Phone             |               10 |                   10 |
    |           2000 | Finland           | Computer          |             1500 |                 1500 |
    |           2000 | Germany           | Calculator        |               75 |                   75 |
    |           2000 | Germany           | Calculator        |               75 |                   75 |
    |           2001 | Germany           | Calculator        |               79 |                   79 |                 
  • 樣本2:

    select year,country,product,profit,last_value(profit) over (partition by country order by profitrows between unbounded preceding and unbounded following) as lastValue from testwindow;
    +------+---------+-----------+--------+-----------+
    | year | country | product   | profit | lastValue |
    +------+---------+-----------+--------+-----------+
    | 2001 | Finland | Phone     |   10   |   1500    |
    | 2000 | Finland | Computer  |  1500  |   1500    |
    | 2000 | Germany | Calculator|   75   |    79     |
    | 2000 | Germany | Calculator|   75   |    79     |
    | 2001 | Germany | Calculator|   79   |    79     |
    | 2001 | USA     | Calculator|   50   |   1500    |
    | 2001 | USA     | Computer  |  1500  |   1500    |
    +------+---------+-----------+--------+-----------+

LAG

LAG(x[, offset[, default_value]])           
  • 命令說明:返回視窗內距離當前行之前位移offset後的值。

    位移量起始值是0,也就是當前資料行。位移量可以是標量運算式,預設offset1

    如果位移量的值是null或者大於視窗長度,則返回default_value;如果沒有指定default_value,則返回null

  • 傳回值類型:與輸入參數類型相同。

  • 樣本:

    select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow;
    +------+---------+------------+--------+------+
    | year | country | product    | profit | lag  |
    +------+---------+------------+--------+------+
    | 2001 | USA     | Calculator |     50 | NULL |
    | 2001 | USA     | Computer   |   1500 |   50 |
    | 2000 | Germany | Calculator |     75 | NULL |
    | 2000 | Germany | Calculator |     75 |   75 |
    | 2001 | Germany | Calculator |     79 |   75 |
    | 2001 | Finland | Phone      |     10 | NULL |
    | 2000 | Finland | Computer   |   1500 |   10 |                    

LEAD

LEAD(x[,offset[, default_value]])            
  • 命令說明:返回視窗內距離當前行位移offset後的值。

    位移量offset起始值是0,也就是當前資料行。位移量可以是標量運算式,預設offset1

    如果位移量的值是null或者大於視窗長度,則返回default_value;如果沒有指定default_value,則返回null

  • 傳回值類型:與輸入參數類型相同。

  • 樣本:

    select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow;
    +------+---------+------------+--------+------+
    | year | country | product    | profit | lead |
    +------+---------+------------+--------+------+
    | 2000 | Germany | Calculator |     75 |   75 |
    | 2000 | Germany | Calculator |     75 |   79 |
    | 2001 | Germany | Calculator |     79 | NULL |
    | 2001 | Finland | Phone      |     10 | 1500 |
    | 2000 | Finland | Computer   |   1500 | NULL |
    | 2001 | USA     | Calculator |     50 | 1500 |
    | 2001 | USA     | Computer   |   1500 | NULL |                    

NTH_VALUE

NTH_VALUE(x, offset)            
  • 命令說明:返回視窗內位移指定offset後的值,位移量從1開始。

    如果位移量offsetnull或者大於視窗內值的個數,則返回null;如果位移量offset0或者負數,則系統提示報錯。

  • 傳回值類型:與輸入參數類型相同。

  • 樣本:

    select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow;
    +------+---------+------------+--------+-----------+
    | year | country | product    | profit | nth_value |
    +------+---------+------------+--------+-----------+
    | 2001 | Finland | Phone      |     10 |        10 |
    | 2000 | Finland | Computer   |   1500 |        10 |
    | 2001 | USA     | Calculator |     50 |        50 |
    | 2001 | USA     | Computer   |   1500 |        50 |
    | 2000 | Germany | Calculator |     75 |        75 |
    | 2000 | Germany | Calculator |     75 |        75 |
    | 2001 | Germany | Calculator |     79 |        75 |