視窗函數常用於計算分組排名,移動平均,累計和等複雜計算。本文介紹AnalyticDB for MySQL視窗函數的用法與樣本。
排序函數
CUME_DIST:返回一組數值中每個值的累計分布。
RANK:返回資料集中每個值的排名。
DENSE_RANK:返回一組數值中每個數值的排名。
NTILE:將每個視窗分區的資料分散到桶號從1到n的n個桶中。
ROW_NUMBER:根據行在視窗分區內的順序,為每行資料返回一個唯一的有序行號,行號從1開始。
PERCENT_RANK:返回資料集中每個資料的排名百分比,其結果由
(r - 1) / (n - 1)
計算得出。其中r為RANK()計算的當前行排名, n為當前視窗分區內總的行數。
值函數
FIRST_VALUE:返回視窗分區第1行的值。
LAST_VALUE返回視窗分區最後1行的值。
LAG:返回視窗內距離當前行之前位移offset後的值。
LEAD:返回視窗內距離當前行位移offset後的值。
NTH_VALUE:返回視窗內位移指定offset後的值,位移量從1開始。
概述
視窗函數基於查詢結果的行資料進行計算,視窗函數運行在HAVING
子句之後、 ORDER BY
子句之前。視窗函數需要特殊的關鍵字OVER
子句來指定視窗即觸發一個視窗函數。
AnalyticDB for MySQL支援三種類型的視窗函數:彙總函式、排序函數和值函數。
文法
function over ([partition by a] order by b RANGE|ROWS BETWEEN start AND end)
視窗函數包含以下三個部分。
分區規範(可選):用於將輸入行分散到不同的分區中,過程和
GROUP BY
子句的分散過程相似。排序規範:決定輸入資料行在視窗函數中執行的順序。
視窗區間:指定計算資料的視窗邊界。
視窗區間支援
RANGE
、ROWS
兩種模式:RANGE
按照計算資料行值的範圍進行定義。ROWS
按照計算資料行的行數進行範圍定義。RANGE
、ROWS
中可以使用BETWEEN start AND end
指定邊界可取值。BETWEEN start AND end
取值為:CURRENT ROW
,當前行。N PRECEDING
,前n
行。UNBOUNDED PRECEDING
,直到第1
行。N FOLLOWING
,後n
行。UNBOUNDED FOLLOWING
,直到最後1
行。
例如,以下查詢根據當前視窗的每行資料計算profit
的部分總和。
select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA | 50 | 50 |
| 2001 | USA | 1500 | 1550 |
| 2000 | Germany | 75 | 75 |
| 2000 | Germany | 75 | 150 |
| 2001 | Germany | 79 | 229 |
| 2000 | Finland | 1500 | 1500 |
| 2001 | Finland | 10 | 1510 |
而以下查詢只能計算出profit
的總和。
select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| Germany | 229 |
| Germany | 229 |
| Germany | 229 |
| USA | 1550 |
| USA | 1550 |
| Finland | 1510 |
| Finland | 1510 |
注意事項
邊界值的取值有如下要求:
start
不能為UNBOUNDED FOLLOWING
,否則提示Window frame start cannot be UNBOUNDED FOLLOWING
錯誤。end
不能為UNBOUNDED PRECEDING
,否則提示Window frame end cannot be UNBOUNDED PRECEDING
錯誤。start
為CURRENT ROW
並且end
為N PRECEDING
時,將提示Window frame starting from CURRENT ROW cannot end with PRECEDING
錯誤。start
為N FOLLOWING
並且end
為N PRECEDING
時,將提示Window frame starting from FOLLOWING cannot end with PRECEDING
錯誤。start
為N FOLLOWING
並且end
為CURRENT ROW
,將提示Window frame starting from FOLLOWING cannot end with CURRENT ROW
錯誤。
當模式為RANGE
時:
start
或者end
為N PRECEDING
時,將提示Window frame RANGE PRECEDING is only supported with UNBOUNDED
錯誤。start
或者end
為N FOLLOWING
時,將提示Window frame RANGE FOLLOWING is only supported with UNBOUNDED
錯誤。
準備工作
本文中的視窗函數均以testwindow
表為測試資料。
create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);
insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2001,'Germany','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500);
SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer | 1500 |
| 2001 | Finland | Phone | 10 |
| 2000 | Germany | Calculator | 75 |
| 2000 | Germany | Calculator | 75 |
| 2001 | Germany | Calculator | 79 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 1500 |
彙總函式
所有彙總函式都可以通過添加OVER
子句來作為視窗函數使用,彙總函式將基於當前滑動視窗內的資料行計算每一行資料。
例如,通過以下查詢迴圈顯示每個店員每天的訂單額總和。
SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey
CUME_DIST
CUME_DIST()
命令說明:返回一組數值中每個值的累計分布。
返回結果:在視窗分區中對視窗進行排序後的資料集,包括當前行和當前行之前的資料行數。排序中任何關聯值均會計算成相同的分布值。
傳回值類型:DOUBLE。
樣本:
select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow; +------+---------+------------+--------+--------------------+ | year | country | product | profit | cume_dist | +------+---------+------------+--------+--------------------+ | 2001 | USA | Calculator | 50 | 0.5 | | 2001 | USA | Computer | 1500 | 1.0 | | 2001 | Finland | Phone | 10 | 0.5 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2000 | Germany | Calculator | 75 | 0.6666666666666666 | | 2000 | Germany | Calculator | 75 | 0.6666666666666666 | | 2001 | Germany | Calculator | 79 | 1.0 |
RANK
RANK()
命令說明:返回資料集中每個值的排名。
排名值是將當前行之前的行數加1,不包含當前行。因此,排序的關聯值可能產生順序上的空隙,而且這個排名會對每個視窗分區進行計算。
傳回值類型:BIGINT。
樣本:
select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | rank | +------+---------+------------+--------+------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 3 |
DENSE_RANK
DENSE_RANK()
命令說明:返回一組數值中每個數值的排名。
DENSE_RANK()
與RANK()
功能相似,但是DENSE_RANK()
關聯值不會產生順序上的空隙。傳回值類型:BIGINT。
樣本:
select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | dense_rank | +------+---------+------------+--------+------------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 2 |
NTILE
NTILE(n)
命令說明:將每個視窗分區的資料分散到桶號從
1
到n
的n
個桶中。桶號值最多間隔
1
,如果視窗分區中的資料行數不能均勻地分散到每一個桶中,則剩餘值將從第1
個桶開始,每1
個桶分1
行資料。例如,有6行資料和4個桶, 最終桶號值為1 1 2 2 3 4
。傳回值類型:BIGINT。
樣本:
select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile2 | +------+---------+------------+--------+--------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 2 |
ROW_NUMBER
ROW_NUMBER()
命令說明:根據行在視窗分區內的順序,為每行資料返回一個唯一的有序行號,行號從
1
開始。傳回值類型:BIGINT。
樣本:
SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow; +------+---------+------------+--------+----------+ | year | country | product | profit | row_num1 | +------+---------+------------+--------+----------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 2 | | 2001 | Germany | Calculator | 79 | 3 | | 2000 | Finland | Computer | 1500 | 1 | | 2001 | Finland | Phone | 10 | 2 |
PERCENT_RANK
PERCENT_RANK()
命令說明:返回資料集中每個資料的排名百分比,其結果由
(r - 1) / (n - 1)
計算得出。其中,r
為RANK()
計算的當前行排名,n
為當前視窗分區內總的行數。傳回值類型:DOUBLE。
樣本:
select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile3 | +------+---------+------------+--------+--------+ | 2001 | Finland | Phone | 10 | 0.0 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2001 | USA | Calculator | 50 | 0.0 | | 2001 | USA | Computer | 1500 | 1.0 | | 2000 | Germany | Calculator | 75 | 0.0 | | 2000 | Germany | Calculator | 75 | 0.0 | | 2001 | Germany | Calculator | 79 | 1.0 |
FIRST_VALUE
FIRST_VALUE(x)
命令說明:返回視窗分區第一行的值。
傳回值類型:與輸入參數類型相同。
樣本:
select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | firstValue | +------+---------+------------+--------+------------+ | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 |
LAST_VALUE
LAST_VALUE(x)
命令說明:返回視窗分區最後一行的值。LAST_VALUE預設統計範圍是 rows between unbounded preceding and current row,即取當前行資料與當前行之前的資料進行比較。如果像FIRST_VALUE那樣直接在每行資料中顯示最後一行資料,需要在 order by 條件的後面加上語句:rows between unbounded preceding and unbounded following。
傳回值類型:與輸入參數類型相同。
樣本1:
select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow; +----------------+-------------------+-------------------+------------------+----------------------+ | year | country | product | profit | firstValue | +----------------+-------------------+-------------------+------------------+----------------------+ | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 1500 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 79 |
樣本2:
select year,country,product,profit,last_value(profit) over (partition by country order by profitrows between unbounded preceding and unbounded following) as lastValue from testwindow; +------+---------+-----------+--------+-----------+ | year | country | product | profit | lastValue | +------+---------+-----------+--------+-----------+ | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | Germany | Calculator| 75 | 79 | | 2000 | Germany | Calculator| 75 | 79 | | 2001 | Germany | Calculator| 79 | 79 | | 2001 | USA | Calculator| 50 | 1500 | | 2001 | USA | Computer | 1500 | 1500 | +------+---------+-----------+--------+-----------+
LAG
LAG(x[, offset[, default_value]])
命令說明:返回視窗內距離當前行之前位移
offset
後的值。位移量起始值是
0
,也就是當前資料行。位移量可以是標量運算式,預設offset
是1
。如果位移量的值是
null
或者大於視窗長度,則返回default_value
;如果沒有指定default_value
,則返回null
。傳回值類型:與輸入參數類型相同。
樣本:
select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lag | +------+---------+------------+--------+------+ | 2001 | USA | Calculator | 50 | NULL | | 2001 | USA | Computer | 1500 | 50 | | 2000 | Germany | Calculator | 75 | NULL | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 | | 2001 | Finland | Phone | 10 | NULL | | 2000 | Finland | Computer | 1500 | 10 |
LEAD
LEAD(x[,offset[, default_value]])
命令說明:返回視窗內距離當前行位移
offset
後的值。位移量
offset
起始值是0
,也就是當前資料行。位移量可以是標量運算式,預設offset
是1
。如果位移量的值是
null
或者大於視窗長度,則返回default_value
;如果沒有指定default_value
,則返回null
。傳回值類型:與輸入參數類型相同。
樣本:
select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lead | +------+---------+------------+--------+------+ | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 79 | | 2001 | Germany | Calculator | 79 | NULL | | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | NULL | | 2001 | USA | Calculator | 50 | 1500 | | 2001 | USA | Computer | 1500 | NULL |
NTH_VALUE
NTH_VALUE(x, offset)
命令說明:返回視窗內位移指定
offset
後的值,位移量從1
開始。如果位移量
offset
是null
或者大於視窗內值的個數,則返回null
;如果位移量offset
為0
或者負數,則系統提示報錯。傳回值類型:與輸入參數類型相同。
樣本:
select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow; +------+---------+------------+--------+-----------+ | year | country | product | profit | nth_value | +------+---------+------------+--------+-----------+ | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 |