Use AnalyticDB for MySQL window functions - AnalyticDB - Alibaba Cloud Documentation Center

You can use window functions to perform complex calculations, such as group rankings, moving averages, and cumulative sums. This topic describes the window function syntax and provides examples on how to use the window functions in AnalyticDB for MySQL.

Aggregate functions
Sorting functions
- CUME_DIST: returns the cumulative distribution of each value in a set of values.
- RANK: returns the rank of each value in a dataset.
- DENSE_RANK: returns the rank of each value in a set of values.
- NTILE: distributes data within each window partition into n buckets. Buckets are numbered from 1 to n.
- ROW_NUMBER: returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from 1.
- PERCENT_RANK: returns the ranking percentage of each value in a dataset in the (r - 1)/(n - 1) format. r is the rank of the current row calculated by using the RANK() function, and n is the total number of rows within the current window partition.
Value functions
- FIRST_VALUE: returns the value of the first row within the window partition.
- LAST_VALUE: returns the value of the last row within the window partition.
- LAG: returns the value of the row that precedes the current row by offset rows in the window.
- LEAD: returns the value of the row that follows the current row by offset rows in the window.
- NTH_VALUE: returns the value of the row that is offset by the specified number of offset rows in the window. The offset starts from 1.

Overview

Window functions calculate an aggregate value based on row data from the query result. Window functions run after the HAVING clause and before the ORDER BY clause. A window function is triggered after you use an OVER clause to specify a window.

AnalyticDB for MySQL supports three types of window functions: aggregate functions, sorting functions, and value functions.

Syntax

function over ([partition by a] order by b RANGE|ROWS BETWEEN start AND end)

A window function contains the following parts:

Partition rule: divides input rows into different partitions. The process is similar to the grouping process of the GROUP BY clause. The partition rule part is optional.
Sorting rule: determines the order in which input rows are executed in the window function.
Window frame: specifies the boundary of the data on which the window function performs calculations.
A window frame supports the RANGE and ROWS modes:
- RANGE defines the range of column values.
- ROWS defines the number of rows relative to the current row.
- For RANGE and ROWS, you can use BETWEEN start AND end to specify the boundary value. Valid values for the arguments in BETWEEN start AND end:
  - CURRENT ROW: the current row.
  - N PRECEDING: the preceding n rows.
  - UNBOUNDED PRECEDING: the rows from the first row to the current row.
  - N FOLLOWING: the following n rows.
  - UNBOUNDED FOLLOWING: the rows from the current row to the last row.

For example, the following query calculates the partial sum of profit based on each row of data in the current window:

select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA     |     50 |          50 |
| 2001 | USA     |   1500 |        1550 |
| 2000 | Germany |     75 |          75 |
| 2000 | Germany |     75 |         150 |
| 2001 | Germany |     79 |         229 |
| 2000 | Finland |   1500 |        1500 |
| 2001 | Finland |     10 |        1510 |

The following query can calculate only the total sum of profit:

select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| Germany |                                     229 |
| Germany |                                     229 |
| Germany |                                     229 |
| USA     |                                    1550 |
| USA     |                                    1550 |
| Finland |                                    1510 |
| Finland |                                    1510 |

Usage notes

Make sure that the boundary values meet the following requirements:

start cannot be UNBOUNDED FOLLOWING. Otherwise, the Window frame start cannot be UNBOUNDED FOLLOWING error is returned.
end cannot be UNBOUNDED PRECEDING. Otherwise, the Window frame end cannot be UNBOUNDED PRECEDING error is returned.
When start is CURRENT ROW and end is N PRECEDING, the Window frame starting from CURRENT ROW cannot end with PRECEDING error is returned.
When start is N FOLLOWING and end is N PRECEDING, the Window frame starting from FOLLOWING cannot end with PRECEDING error is returned.
When start is N FOLLOWING and end is CURRENT ROW, the Window frame starting from FOLLOWING cannot end with CURRENT ROW error is returned.

When the window frame is in RANGE mode, the following rules apply:

When start or end is N PRECEDING, the Window frame RANGE PRECEDING is only supported with UNBOUNDED error is returned.
When start or end is N FOLLOWING, the Window frame RANGE FOLLOWING is only supported with UNBOUNDED error is returned.

Preparations

In this topic, data from the testwindow table is used in the examples of window functions.

create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);

insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2001,'Germany','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500);

SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product    | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer   |   1500 |
| 2001 | Finland | Phone      |     10 |
| 2000 | Germany | Calculator |     75 |
| 2000 | Germany | Calculator |     75 |
| 2001 | Germany | Calculator |     79 |
| 2001 | USA     | Calculator |     50 |
| 2001 | USA     | Computer   |   1500 |

Aggregate functions

All aggregate functions can be used as window functions by adding an OVER clause. An aggregate function calculates each row of data based on the rows within the current sliding window. For more information, see Aggregate functions.

For example, the following query produces a rolling sum of order prices by date for each clerk:

SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey

CUME_DIST

CUME_DIST()

Description: This function returns the cumulative distribution of each value in a set of values.
Return result: the dataset obtained after sorting within the window partition, including the current row and the number of data rows preceding the current row. Any associated values in the sorting are calculated to the same distribution value.
Data type of the return value: DOUBLE.

Example:

select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow;
+------+---------+------------+--------+--------------------+
| year | country | product    | profit | cume_dist          |
+------+---------+------------+--------+--------------------+
| 2001 | USA     | Calculator |     50 |                0.5 |
| 2001 | USA     | Computer   |   1500 |                1.0 |
| 2001 | Finland | Phone      |     10 |                0.5 |
| 2000 | Finland | Computer   |   1500 |                1.0 |
| 2000 | Germany | Calculator |     75 | 0.6666666666666666 |
| 2000 | Germany | Calculator |     75 | 0.6666666666666666 |
| 2001 | Germany | Calculator |     79 |                1.0 |

RANK

RANK()

Description: This function returns the rank of each value in a dataset.
The rank value is the number of rows preceding the current row plus one. The current row is not counted. Therefore, the associated values in the sorting may produce gaps in the sequence. The rank is calculated for each window partition.
Data type of the return value: BIGINT.

Example:

select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow;
+------+---------+------------+--------+------+
| year | country | product    | profit | rank |
+------+---------+------------+--------+------+
| 2001 | Finland | Phone      |     10 |    1 |
| 2000 | Finland | Computer   |   1500 |    2 |
| 2001 | USA     | Calculator |     50 |    1 |
| 2001 | USA     | Computer   |   1500 |    2 |
| 2000 | Germany | Calculator |     75 |    1 |
| 2000 | Germany | Calculator |     75 |    1 |
| 2001 | Germany | Calculator |     79 |    3 |

DENSE_RANK

DENSE_RANK()

Description: This function returns the rank of each value in a set of values.
DENSE_RANK() and RANK() have similar features, but the associated values of DENSE_RANK() do not produce gaps in the sequence.
Data type of the return value: BIGINT.

Example:

select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow;
+------+---------+------------+--------+------------+
| year | country | product    | profit | dense_rank |
+------+---------+------------+--------+------------+
| 2001 | Finland | Phone      |     10 |          1 |
| 2000 | Finland | Computer   |   1500 |          2 |
| 2001 | USA     | Calculator |     50 |          1 |
| 2001 | USA     | Computer   |   1500 |          2 |
| 2000 | Germany | Calculator |     75 |          1 |
| 2000 | Germany | Calculator |     75 |          1 |
| 2001 | Germany | Calculator |     79 |          2 |

NTILE

NTILE(n)

Description: This function distributes data within each window partition into n buckets. Buckets are numbered from 1 to n.
The maximum difference between bucket numbers is 1. If the data rows within the window partition are not evenly distributed to each bucket, the remaining data is distributed from the first bucket with one row of data for each bucket. For example, if six rows and four buckets exist, rows are distributed to the buckets in the following manner: 1, 1, 2, 2, 3, and 4.
Data type of the return value: BIGINT.

Example:

select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow;
+------+---------+------------+--------+--------+
| year | country | product    | profit | ntile2 |
+------+---------+------------+--------+--------+
| 2001 | USA     | Calculator |     50 |      1 |
| 2001 | USA     | Computer   |   1500 |      2 |
| 2001 | Finland | Phone      |     10 |      1 |
| 2000 | Finland | Computer   |   1500 |      2 |
| 2000 | Germany | Calculator |     75 |      1 |
| 2000 | Germany | Calculator |     75 |      1 |
| 2001 | Germany | Calculator |     79 |      2 |

ROW_NUMBER

ROW_NUMBER()

Description: This function returns a unique and sequential number for each row based on the sequence of the row within the window partition, starting from 1.
Data type of the return value: BIGINT.

Example:

SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow;
+------+---------+------------+--------+----------+
| year | country | product    | profit | row_num1 |
+------+---------+------------+--------+----------+
| 2001 | USA     | Calculator |     50 |        1 |
| 2001 | USA     | Computer   |   1500 |        2 |
| 2000 | Germany | Calculator |     75 |        1 |
| 2000 | Germany | Calculator |     75 |        2 |
| 2001 | Germany | Calculator |     79 |        3 |
| 2000 | Finland | Computer   |   1500 |        1 |
| 2001 | Finland | Phone      |     10 |        2 |

PERCENT_RANK

PERCENT_RANK()

Description: This function returns the ranking percentage of each value in a dataset in the (r - 1)/(n - 1) format. r is the rank of the current row calculated by RANK(), and n is the total number of rows within the current window partition.
Data type of the return value: DOUBLE.

Example:

select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow;
+------+---------+------------+--------+--------+
| year | country | product    | profit | ntile3 |
+------+---------+------------+--------+--------+
| 2001 | Finland | Phone      |     10 |    0.0 |
| 2000 | Finland | Computer   |   1500 |    1.0 |
| 2001 | USA     | Calculator |     50 |    0.0 |
| 2001 | USA     | Computer   |   1500 |    1.0 |
| 2000 | Germany | Calculator |     75 |    0.0 |
| 2000 | Germany | Calculator |     75 |    0.0 |
| 2001 | Germany | Calculator |     79 |    1.0 |

FIRST_VALUE

FIRST_VALUE(x)

Description: This function returns the value of the first row within the window partition.
Data type of the return value: the same as the input argument type.

Example:

select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow;
+------+---------+------------+--------+------------+
| year | country | product    | profit | firstValue |
+------+---------+------------+--------+------------+
| 2000 | Germany | Calculator |     75 |         75 |
| 2000 | Germany | Calculator |     75 |         75 |
| 2001 | Germany | Calculator |     79 |         75 |
| 2001 | USA     | Calculator |     50 |         50 |
| 2001 | USA     | Computer   |   1500 |         50 |
| 2001 | Finland | Phone      |     10 |         10 |
| 2000 | Finland | Computer   |   1500 |         10 |

LAST_VALUE

LAST_VALUE(x)

Description: This function returns the value of the last row within the window partition. The default window frame of LAST_VALUE is rows between unbounded preceding and current row, which compares the data in the current row and that in all preceding rows. If you want LAST_VALUE to return the value of the last row, add the following clause after the ORDER BY clause: rows between unbounded preceding and unbounded following.
Data type of the return value: the same as the input argument type.

Example 1:

select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow;
+----------------+-------------------+-------------------+------------------+----------------------+
| year           | country           | product           | profit           | firstValue           |
+----------------+-------------------+-------------------+------------------+----------------------+
|           2001 | USA               | Calculator        |               50 |                   50 |
|           2001 | USA               | Computer          |             1500 |                 1500 |
|           2001 | Finland           | Phone             |               10 |                   10 |
|           2000 | Finland           | Computer          |             1500 |                 1500 |
|           2000 | Germany           | Calculator        |               75 |                   75 |
|           2000 | Germany           | Calculator        |               75 |                   75 |
|           2001 | Germany           | Calculator        |               79 |                   79 |

Example 2:

select year,country,product,profit,last_value(profit) over (partition by country order by profitrows between unbounded preceding and unbounded following) as lastValue from testwindow;
+------+---------+-----------+--------+-----------+
| year | country | product   | profit | lastValue |
+------+---------+-----------+--------+-----------+
| 2001 | Finland | Phone     |   10   |   1500    |
| 2000 | Finland | Computer  |  1500  |   1500    |
| 2000 | Germany | Calculator|   75   |    79     |
| 2000 | Germany | Calculator|   75   |    79     |
| 2001 | Germany | Calculator|   79   |    79     |
| 2001 | USA     | Calculator|   50   |   1500    |
| 2001 | USA     | Computer  |  1500  |   1500    |
+------+---------+-----------+--------+-----------+

LAG

LAG(x[, offset[, default_value]])

Description: This function returns the value of the row that precedes the current row by offset rows in the window.
The starting offset value is 0, which specifies the current data row. The offset value can be a scalar expression. The default offset value is 1.
If the offset value is null or is greater than the window length, default_value is returned. If you do not specify default_value, null is returned.
Data type of the return value: the same as the input argument type.

Example:

select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow;
+------+---------+------------+--------+------+
| year | country | product    | profit | lag  |
+------+---------+------------+--------+------+
| 2001 | USA     | Calculator |     50 | NULL |
| 2001 | USA     | Computer   |   1500 |   50 |
| 2000 | Germany | Calculator |     75 | NULL |
| 2000 | Germany | Calculator |     75 |   75 |
| 2001 | Germany | Calculator |     79 |   75 |
| 2001 | Finland | Phone      |     10 | NULL |
| 2000 | Finland | Computer   |   1500 |   10 |

LEAD

LEAD(x[,offset[, default_value]])

Description: This function returns the value of the row that follows the current row by offset rows in the window.
The starting offset value is 0, which specifies the current data row. The offset value can be a scalar expression. The default offset value is 1.
If the offset value is null or is greater than the window length, default_value is returned. If you do not specify default_value, null is returned.
Data type of the return value: the same as the input argument type.

Example:

select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow;
+------+---------+------------+--------+------+
| year | country | product    | profit | lead |
+------+---------+------------+--------+------+
| 2000 | Germany | Calculator |     75 |   75 |
| 2000 | Germany | Calculator |     75 |   79 |
| 2001 | Germany | Calculator |     79 | NULL |
| 2001 | Finland | Phone      |     10 | 1500 |
| 2000 | Finland | Computer   |   1500 | NULL |
| 2001 | USA     | Calculator |     50 | 1500 |
| 2001 | USA     | Computer   |   1500 | NULL |

NTH_VALUE

NTH_VALUE(x, offset)

Description: This function returns the value of the row that is offset by the specified number of offset rows in the window. The offset starts from 1.
If the offset value is null or is greater than the number of values in the window, null is returned. If the offset value is 0 or negative, an error is returned.
Data type of the return value: the same as the input argument type.

Example:

select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow;
+------+---------+------------+--------+-----------+
| year | country | product    | profit | nth_value |
+------+---------+------------+--------+-----------+
| 2001 | Finland | Phone      |     10 |        10 |
| 2000 | Finland | Computer   |   1500 |        10 |
| 2001 | USA     | Calculator |     50 |        50 |
| 2001 | USA     | Computer   |   1500 |        50 |
| 2000 | Germany | Calculator |     75 |        75 |
| 2000 | Germany | Calculator |     75 |        75 |
| 2001 | Germany | Calculator |     79 |        75 |