You can use window functions to perform complex calculations, such as group rankings, moving averages, and cumulative sums. This topic describes the window function syntax and provides examples on how to use the window functions in AnalyticDB for MySQL.
Sorting functions
CUME_DIST: returns the cumulative distribution of each value in a set of values.
RANK: returns the rank of each value in a dataset.
DENSE_RANK: returns the rank of each value in a set of values.
NTILE: distributes data within each window partition into n buckets. Buckets are numbered from 1 to n.
ROW_NUMBER: returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from 1.
PERCENT_RANK: returns the ranking percentage of each value in a dataset in the
(r - 1)/(n - 1)
format. r is the rank of the current row calculated by using the RANK() function, and n is the total number of rows within the current window partition.
Value functions
FIRST_VALUE: returns the value of the first row within the window partition.
LAST_VALUE: returns the value of the last row within the window partition.
LAG: returns the value of the row that precedes the current row by offset rows in the window.
LEAD: returns the value of the row that follows the current row by offset rows in the window.
NTH_VALUE: returns the value of the row that is offset by the specified number of offset rows in the window. The offset starts from 1.
Overview
Window functions calculate an aggregate value based on row data from the query result. Window functions run after the HAVING
clause and before the ORDER BY
clause. A window function is triggered after you use an OVER
clause to specify a window.
AnalyticDB for MySQL supports three types of window functions: aggregate functions, sorting functions, and value functions.
Syntax
function over ([partition by a] order by b RANGE|ROWS BETWEEN start AND end)
A window function contains the following parts:
Partition rule: divides input rows into different partitions. The process is similar to the grouping process of the
GROUP BY
clause. The partition rule part is optional.Sorting rule: determines the order in which input rows are executed in the window function.
Window frame: specifies the boundary of the data on which the window function performs calculations.
A window frame supports the
RANGE
andROWS
modes:RANGE
defines the range of column values.ROWS
defines the number of rows relative to the current row.For
RANGE
andROWS
, you can useBETWEEN start AND end
to specify the boundary value. Valid values for the arguments inBETWEEN start AND end
:CURRENT ROW
: the current row.N PRECEDING
: the precedingn
rows.UNBOUNDED PRECEDING
: the rows from thefirst
row to the current row.N FOLLOWING
: the followingn
rows.UNBOUNDED FOLLOWING
: the rows from the current row to thelast
row.
For example, the following query calculates the partial sum of profit
based on each row of data in the current window:
select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA | 50 | 50 |
| 2001 | USA | 1500 | 1550 |
| 2000 | Germany | 75 | 75 |
| 2000 | Germany | 75 | 150 |
| 2001 | Germany | 79 | 229 |
| 2000 | Finland | 1500 | 1500 |
| 2001 | Finland | 10 | 1510 |
The following query can calculate only the total sum of profit
:
select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| Germany | 229 |
| Germany | 229 |
| Germany | 229 |
| USA | 1550 |
| USA | 1550 |
| Finland | 1510 |
| Finland | 1510 |
Usage notes
Make sure that the boundary values meet the following requirements:
start
cannot beUNBOUNDED FOLLOWING
. Otherwise, theWindow frame start cannot be UNBOUNDED FOLLOWING
error is returned.end
cannot beUNBOUNDED PRECEDING
. Otherwise, theWindow frame end cannot be UNBOUNDED PRECEDING
error is returned.When
start
isCURRENT ROW
andend
isN PRECEDING
, theWindow frame starting from CURRENT ROW cannot end with PRECEDING
error is returned.When
start
isN FOLLOWING
andend
isN PRECEDING
, theWindow frame starting from FOLLOWING cannot end with PRECEDING
error is returned.When
start
isN FOLLOWING
andend
isCURRENT ROW
, theWindow frame starting from FOLLOWING cannot end with CURRENT ROW
error is returned.
When the window frame is in RANGE
mode, the following rules apply:
When
start
orend
isN PRECEDING
, theWindow frame RANGE PRECEDING is only supported with UNBOUNDED
error is returned.When
start
orend
isN FOLLOWING
, theWindow frame RANGE FOLLOWING is only supported with UNBOUNDED
error is returned.
Preparations
In this topic, data from the testwindow
table is used in the examples of window functions.
create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);
insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2001,'Germany','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500);
SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer | 1500 |
| 2001 | Finland | Phone | 10 |
| 2000 | Germany | Calculator | 75 |
| 2000 | Germany | Calculator | 75 |
| 2001 | Germany | Calculator | 79 |
| 2001 | USA | Calculator | 50 |
| 2001 | USA | Computer | 1500 |
Aggregate functions
All aggregate functions can be used as window functions by adding an OVER
clause. An aggregate function calculates each row of data based on the rows within the current sliding window. For more information, see Aggregate functions.
For example, the following query produces a rolling sum of order prices by date for each clerk:
SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey
CUME_DIST
CUME_DIST()
Description: This function returns the cumulative distribution of each value in a set of values.
Return result: the dataset obtained after sorting within the window partition, including the current row and the number of data rows preceding the current row. Any associated values in the sorting are calculated to the same distribution value.
Data type of the return value: DOUBLE.
Example:
select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow; +------+---------+------------+--------+--------------------+ | year | country | product | profit | cume_dist | +------+---------+------------+--------+--------------------+ | 2001 | USA | Calculator | 50 | 0.5 | | 2001 | USA | Computer | 1500 | 1.0 | | 2001 | Finland | Phone | 10 | 0.5 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2000 | Germany | Calculator | 75 | 0.6666666666666666 | | 2000 | Germany | Calculator | 75 | 0.6666666666666666 | | 2001 | Germany | Calculator | 79 | 1.0 |
RANK
RANK()
Description: This function returns the rank of each value in a dataset.
The rank value is the number of rows preceding the current row plus one. The current row is not counted. Therefore, the associated values in the sorting may produce gaps in the sequence. The rank is calculated for each window partition.
Data type of the return value: BIGINT.
Example:
select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | rank | +------+---------+------------+--------+------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 3 |
DENSE_RANK
DENSE_RANK()
Description: This function returns the rank of each value in a set of values.
DENSE_RANK()
andRANK()
have similar features, but the associated values ofDENSE_RANK()
do not produce gaps in the sequence.Data type of the return value: BIGINT.
Example:
select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | dense_rank | +------+---------+------------+--------+------------+ | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 2 |
NTILE
NTILE(n)
Description: This function distributes data within each window partition into
n
buckets. Buckets are numbered from1
ton
.The maximum difference between bucket numbers is
1
. If the data rows within the window partition are not evenly distributed to each bucket, the remaining data is distributed from thefirst
bucket withone
row of data foreach
bucket. For example, if six rows and four buckets exist, rows are distributed to the buckets in the following manner:1, 1, 2, 2, 3, and 4
.Data type of the return value: BIGINT.
Example:
select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile2 | +------+---------+------------+--------+--------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2001 | Finland | Phone | 10 | 1 | | 2000 | Finland | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 1 | | 2001 | Germany | Calculator | 79 | 2 |
ROW_NUMBER
ROW_NUMBER()
Description: This function returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from
1
.Data type of the return value: BIGINT.
Example:
SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow; +------+---------+------------+--------+----------+ | year | country | product | profit | row_num1 | +------+---------+------------+--------+----------+ | 2001 | USA | Calculator | 50 | 1 | | 2001 | USA | Computer | 1500 | 2 | | 2000 | Germany | Calculator | 75 | 1 | | 2000 | Germany | Calculator | 75 | 2 | | 2001 | Germany | Calculator | 79 | 3 | | 2000 | Finland | Computer | 1500 | 1 | | 2001 | Finland | Phone | 10 | 2 |
PERCENT_RANK
PERCENT_RANK()
Description: This function returns the ranking percentage of each value in a dataset in the
(r - 1)/(n - 1)
format.r
is the rank of the current row calculated byRANK()
, andn
is the total number of rows within the current window partition.Data type of the return value: DOUBLE.
Example:
select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow; +------+---------+------------+--------+--------+ | year | country | product | profit | ntile3 | +------+---------+------------+--------+--------+ | 2001 | Finland | Phone | 10 | 0.0 | | 2000 | Finland | Computer | 1500 | 1.0 | | 2001 | USA | Calculator | 50 | 0.0 | | 2001 | USA | Computer | 1500 | 1.0 | | 2000 | Germany | Calculator | 75 | 0.0 | | 2000 | Germany | Calculator | 75 | 0.0 | | 2001 | Germany | Calculator | 79 | 1.0 |
FIRST_VALUE
FIRST_VALUE(x)
Description: This function returns the value of the first row within the window partition.
Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow; +------+---------+------------+--------+------------+ | year | country | product | profit | firstValue | +------+---------+------------+--------+------------+ | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 |
LAST_VALUE
LAST_VALUE(x)
Description: This function returns the value of the last row within the window partition. The default window frame of LAST_VALUE is rows between unbounded preceding and current row, which compares the data in the current row and that in all preceding rows. If you want LAST_VALUE to return the value of the last row, add the following clause after the ORDER BY clause: rows between unbounded preceding and unbounded following.
Data type of the return value: the same as the input argument type.
Example 1:
select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow; +----------------+-------------------+-------------------+------------------+----------------------+ | year | country | product | profit | firstValue | +----------------+-------------------+-------------------+------------------+----------------------+ | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 1500 | | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 79 |
Example 2:
select year,country,product,profit,last_value(profit) over (partition by country order by profitrows between unbounded preceding and unbounded following) as lastValue from testwindow; +------+---------+-----------+--------+-----------+ | year | country | product | profit | lastValue | +------+---------+-----------+--------+-----------+ | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | 1500 | | 2000 | Germany | Calculator| 75 | 79 | | 2000 | Germany | Calculator| 75 | 79 | | 2001 | Germany | Calculator| 79 | 79 | | 2001 | USA | Calculator| 50 | 1500 | | 2001 | USA | Computer | 1500 | 1500 | +------+---------+-----------+--------+-----------+
LAG
LAG(x[, offset[, default_value]])
Description: This function returns the value of the row that precedes the current row by
offset
rows in the window.The starting offset value is
0
, which specifies the current data row. The offset value can be a scalar expression. The defaultoffset
value is1
.If the offset value is
null
or is greater than the window length,default_value
is returned. If you do not specifydefault_value
,null
is returned.Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lag | +------+---------+------------+--------+------+ | 2001 | USA | Calculator | 50 | NULL | | 2001 | USA | Computer | 1500 | 50 | | 2000 | Germany | Calculator | 75 | NULL | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 | | 2001 | Finland | Phone | 10 | NULL | | 2000 | Finland | Computer | 1500 | 10 |
LEAD
LEAD(x[,offset[, default_value]])
Description: This function returns the value of the row that follows the current row by
offset
rows in the window.The starting
offset
value is0
, which specifies the current data row. The offset value can be a scalar expression. The defaultoffset
value is1
.If the offset value is
null
or is greater than the window length,default_value
is returned. If you do not specifydefault_value
,null
is returned.Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow; +------+---------+------------+--------+------+ | year | country | product | profit | lead | +------+---------+------------+--------+------+ | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 79 | | 2001 | Germany | Calculator | 79 | NULL | | 2001 | Finland | Phone | 10 | 1500 | | 2000 | Finland | Computer | 1500 | NULL | | 2001 | USA | Calculator | 50 | 1500 | | 2001 | USA | Computer | 1500 | NULL |
NTH_VALUE
NTH_VALUE(x, offset)
Description: This function returns the value of the row that is offset by the specified number of
offset
rows in the window. The offset starts from1
.If the
offset
value isnull
or is greater than the number of values in the window,null
is returned. If theoffset
value is0
or negative, an error is returned.Data type of the return value: the same as the input argument type.
Example:
select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow; +------+---------+------------+--------+-----------+ | year | country | product | profit | nth_value | +------+---------+------------+--------+-----------+ | 2001 | Finland | Phone | 10 | 10 | | 2000 | Finland | Computer | 1500 | 10 | | 2001 | USA | Calculator | 50 | 50 | | 2001 | USA | Computer | 1500 | 50 | | 2000 | Germany | Calculator | 75 | 75 | | 2000 | Germany | Calculator | 75 | 75 | | 2001 | Germany | Calculator | 79 | 75 |