Window functions can be used to aggregate or compute data of a dynamically defined data subset. For example, you can use window functions to process time series data, calculate the rankings of specific data, and calculate the moving average of specific data. This topic describes the syntax and parameters of window functions that are supported by MaxCompute SQL. This topic also provides examples on how to use window functions to develop data.
The following table lists the window functions that are supported by MaxCompute SQL.
Function | Description |
Calculates the average value of data in a window. | |
Samples random rows of data. If true is returned, the specified row of data is sampled. | |
Calculates the number of rows in a window. | |
Calculates the cumulative distribution of data in a partition. | |
Calculates the percentile rank of a row in a group of rows. The ranks are consecutive. | |
Obtains the calculated result of the first row of data in the window to which the current row belongs. | |
Obtains the calculated result of the Nth row of data that precedes the current row at a given offset in a window. | |
Obtains the calculated result of the last row of data in the window to which the current row belongs. | |
Obtains the calculated result of the Nth row of data that follows the current row at a given offset in a window. | |
Calculates the maximum value in a window. | |
Calculates the median in a window. | |
Calculates the minimum value in a window. | |
Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the current row belongs. The group number ranges from 1 to N. | |
Obtains the calculated result of the Nth row of data in a window to which the current row belongs. | |
Calculates the percentile rank of a row in a group of rows. The function returns a percentage value. | |
Calculates the percentile rank of a row in a group of rows. The ranks may not be consecutive integers. | |
Calculates the sequence number of a row. The row number starts from 1. | |
Returns the population standard deviation of all the input values. This function is also called STDDEV_POP. | |
Returns the sample standard deviation of all the input values. | |
Calculates the sum of data in a window. |
Limits
Before you use window functions, take note of the following limits:
Window functions are supported only in
SELECT
statements.A window function cannot contain nested window functions or aggregate functions.
You cannot use window functions together with aggregate functions of the same level.
Syntax
Syntax of window functions:
<function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
<function_name>([distinct][<expression> [, ...]]) over <window_name>
function_name: a built-in window function, built-in aggregate function, or user-defined aggregate function (UDAF). For more information about built-in aggregate functions, see Aggregate functions. For more information about UDAFs, see Overview.
expression: the format of a window function. The format is subject to the function syntax.
windowing_definition: the definition of a window. For information about the syntax of windowing_definition, see windowing_definition.
window_name: the name of a window. You can use the
window
keyword to configure a window and use windowing_definition to specify the name of the window. Syntax of named_window_def:window <window_name> as (<window_definition>)
Position of named_window_def in an SQL statement:
select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]
windowing_definition
Syntax
-- partition_clause:
[partition by <expression> [, ...]]
-- orderby_clause:
[order by <expression> [asc|desc][nulls {first|last}] [, ...]]
[<frame_clause>]
If you use a window function in a SELECT statement, data is partitioned and sorted based on partition by
and order by
in windowing_definition when the window function is executed. If the SELECT statement does not include partition by
, only one partition exists and the partition contains all data. If the SELECT statement does not include order by
, data in a partition is arranged in a random order, and a data stream is generated. After the data stream is generated, a group of rows is extracted from the data stream based on frame_clause
in windowing_definition to create a window for the current row. The window function calculates the data included in the window to which the current row belongs.
partition by <expression> [, ...]: optional. This parameter specifies the partition information. If the values of partition key columns are the same for a group of rows, these rows are included in the same window. For more information about the format of partition by, see Table operations.
order by <expression> [asc|desc][nulls {first|last}] [, ...]: optional. This parameter specifies how to sort rows of data in a window.
NoteIf the values of the column that is specified in
order by
are the same, the sorting result may not be accurate. To reduce the random ordering of data, make sure that the values of the column that is specified inorder by
are unique.frame_clause: optional. This parameter is used to determine the data boundaries of a window. For more information about frame_clause, see frame_clause.
filter_clause
Syntax
FILTER (WHERE filter_condition)
filter_condition
is a Boolean expression, which is used in the same way as the WHERE clause in the select ... from ... where
statement.
If a FILTER clause is provided, only rows whose filter_condition
values are true are included in the window frame. For aggregate window functions such as COUNT, SUM, AVG, MAX, MIN, and WM_CONCAT, a value is still returned for each row. However, non-true values such as NULL and false are not included in the window frame of any row. NULL is processed in the same way as false.
Examples
Prepare data
-- Create a table. CREATE TABLE IF NOT EXISTS mf_window_fun(key BIGINT,value BIGINT) STORED AS ALIORC; -- Insert data into the table. insert into mf_window_fun values (1,100),(2,200),(1,150),(2,250),(3,300),(4,400),(5,500),(6,600),(7,700); -- Query data in the mf_window_fun table. select * from mf_window_fun; -- The following result is returned: +------------+------------+ | key | value | +------------+------------+ | 1 | 100 | | 2 | 200 | | 1 | 150 | | 2 | 250 | | 3 | 300 | | 4 | 400 | | 5 | 500 | | 6 | 600 | | 7 | 700 | +------------+------------+
Query the cumulative sum of rows whose values are greater than 100 in the window
select key,sum(value) filter(where value > 100) over (partition by key order by key) from mf_window_fun;
The following result is returned:
+------------+------------+ | key | _c1 | +------------+------------+ | 1 | NULL | -- Skipped | 1 | 150 | | 2 | 200 | | 2 | 450 | | 3 | 300 | | 4 | 400 | | 5 | 500 | | 6 | 600 | | 7 | 700 | +------------+------------+
The FILTER clause does not remove rows that do not meet the conditions specified by filter_condition from the query results, and considers that the rows do not exist during the calculation of the window function. If you want to remove a specific row, you still need to specify the row in the WHERE clause in the
select ... from ... where
statement. The window function value of this row is not 0 or NULL but the window function value of the previous row.The FILTER clause can be used only when the window function is an aggregate function such as COUNT, SUM, AVG, MAX, MIN, or WM_CONCAT. The FILTER clause cannot be used when the window function is a non-aggregate function such as RANK, ROW_NUMBER, or NTILE. Otherwise, syntax errors may occur.
If you want to use the FILTER syntax in a window function, you must add the
set odps.sql.window.function.newimpl=true;
configuration.
frame_clause
Syntax
-- Syntax 1
{ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
-- Syntax 2
{ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]
frame_clause is a closed interval that is used to determine the data boundaries of a window. The data boundaries are determined based on the rows that are specified by frame_start and frame_end.
ROWS|RANGE|GROUPS: required. ROWS, RANGE, and GROUPS indicate the types of frame_clause. The implementation rules of frame_start and frame_end vary based on the type of frame_clause.
ROWS: The data boundaries of a window are determined based on the number of rows.
RANGE: The data boundaries of a window are determined based on the comparison results of the values of the column that is specified in
order by
. In most cases,order by
is specified in windowing_definition. Iforder by
is not specified in windowing_definition, the values of the column that is specified inorder by
are the same for all rows in a partition. null values are considered equivalent.In a partition, rows that have the same value of the column specified in
order by
form a group. Iforder by
is not specified, all rows in the partition form a group. NULL values are considered equivalent.
frame_start and frame_end: the start and end rows of a window. frame_start is required. frame_end is optional. If frame_end is not specified, the default value CURRENT ROW is used.
The row specified by frame_start must precede or be the same as the row specified by frame_end. Compared with the row specified by frame_end, the row specified by frame_start is closer to the first row in a window after all data in the window is sorted based on the column that is specified in
order by
of windowing_definition. The following table describes the valid values and logic of frame_start and frame_end when the type of frame_clause is ROWS, RANGE, or GROUPS.frame_clause type
frame_start or frame_end value
Description
ROWS, RANGE, and GROUPS
UNBOUNDED PRECEDING
Indicates the first row of a partition. Rows are counted from 1.
UNBOUNDED FOLLOWING
Indicates the last row of a partition.
ROWS
CURRENT ROW
Indicates the current row. Each row of data corresponds to a result calculated by a window function. The current row indicates the row whose data is calculated by using a window function.
offset PRECEDING
Indicates the Nth row that precedes the current row at a given
offset
. For example, if0 PRECEDING
indicates the current row,1 PRECEDING
indicates the previous row.offset
must be a non-negative integer.offset FOLLOWING
Indicates the Nth row that follows the current row at a given
offset
. For example, if0 FOLLOWING
indicates the current row,1 FOLLOWING
indicates the next row.offset
must be a non-negative integer.RANGE
CURRENT ROW
If frame_start is set to CURRENT ROW, it indicates the first row that has the same value of the column specified in
order by
as the current row.If frame_end is set to CURRENT ROW, it indicates the last row that has the same value of the column specified in
order by
as the current row.
offset PRECEDING
The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by
order by
. For example, data in a window is sorted based on column X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:order by
is set to asc:frame_start indicates the first row that meets the following requirement:
Xc - Xi <= offset
.frame_end indicates the last row that meets the following requirement:
Xc - Xi >= offset
.
order by
is set to desc:frame_start indicates the first row that meets the following requirement:
Xi - Xc <= offset
.frame_end indicates the last row that meets the following requirement:
Xi - Xc >= offset
.
The column that is specified by
order by
can be of the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP.Syntax for
offset
of a date type:N
: indicates N days or N seconds. It must be a non-negative integer. For an offset of the DATETIME or TIMESTAMP type, it indicates N seconds. For an offset of the DATE type, it indicates N days.interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}
: indicates N years, months, days, hours, minutes, or seconds. For example,INTERVAL '3' YEAR
indicates 3 years.INTERVAL 'N-M' YEAR TO MONTH
: indicates N years and M months. For example,INTERVAL '1-3' YEAR TO MONTH
indicates 1 year and 3 months.INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND
: indicates D days, H hours, M minutes, S seconds, and N nanoseconds. For example,INTERVAL '1 2:3:4:5' DAY TO SECOND
indicates 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.
offset FOLLOWING
The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by
order by
. For example, data in a window is sorted based on column X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:order by
is set to asc:frame_start: indicates the first row that meets the following requirement:
Xi - Xc >= offset
.frame_end: indicates the last row that meets the following requirement:
Xi - Xc <= offset
.
order by
is set to desc:frame_start: indicates the first row that meets the following requirement:
Xc - Xi >= offset
.frame_end: indicates the last row that meets the following requirement:
Xc - Xi <= offset
.
GROUPS
CURRENT ROW
If frame_start is set to CURRENT ROW, it indicates the first row of the group to which the current row belongs.
If frame_end is set to CURRENT ROW, it indicates the last row of the group to which the current row belongs.
offset PRECEDING
If frame_start is set to offset PRECEDING, it indicates the first row of the Nth group that precedes the group of the current row at a given
offset
.If frame_end is set to offset PRECEDING, it indicates the last row of the Nth group that precedes the group of the current row at a given
offset
.
NoteYou cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.
offset FOLLOWING
If frame_start is set to offset FOLLOWING, it indicates the first row of the Nth group that follows the group of the current row at a given
offset
.If frame_end is set to offset FOLLOWING, the last row of the Nth group that follows the group of the current row at a specified
offset
is used.
NoteYou cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.
frame_exclusion: optional. This parameter is used to remove specific rows from a window. Valid values:
EXCLUDE NO OTHERS: No rows are excluded from the window.
EXCLUDE CURRENT ROW: The current row is excluded from the window.
EXCLUDE GROUP: An entire group of rows in a partition is excluded from the window. In the group, all rows have the same value of the column that is specified in
order by
as the current row.EXCLUDE TIES: An entire group of rows, except for the current row, are excluded from the window.
Default frame_clause
If you do not specify frame_clause, MaxCompute uses the default frame_clause to determine the data boundaries of a window. Values of the default frame_clause:
If
odps.sql.hive.compatible
is set to true, the following default frame_clause is used. This rule applies to most SQL systems.RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
If
odps.sql.hive.compatible
is set to false,order by
is specified, and one of the following window functions is used, the default frame_clause in ROWS mode is used: AVG, COUNT, MAX, MIN, STDDEV, STEDEV_POP, STDDEV_SAMP, and SUM.ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
Example of data boundaries of a window
In this example, a table named tbl contains three columns that are of the BIGINT type: pid, oid, and rid
. The tbl table contains the following data:
+------------+------------+------------+
| pid | oid | rid |
+------------+------------+------------+
| 1 | NULL | 1 |
| 1 | NULL | 2 |
| 1 | 1 | 3 |
| 1 | 1 | 4 |
| 1 | 2 | 5 |
| 1 | 4 | 6 |
| 1 | 7 | 7 |
| 1 | 11 | 8 |
| 2 | NULL | 9 |
| 2 | NULL | 10 |
+------------+------------+------------+
Windows in ROWS mode
windowing_definition 1
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
windowing_definition 2
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
windowing_definition 3
partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [2, 3, 4] | | 1 | NULL | 2 | [3, 4, 5] | | 1 | 1 | 3 | [4, 5, 6] | | 1 | 1 | 4 | [5, 6, 7] | | 1 | 2 | 5 | [6, 7, 8] | | 1 | 4 | 6 | [7, 8] | | 1 | 7 | 7 | [8] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [10] | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+
windowing_definition 4
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | [1] | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2, 3] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | [9] | +------------+------------+------------+--------+
windowing_definition 5
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | NULL | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+
windowing_definition 6
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [10] | +------------+------------+------------+--------+
The differences between EXCLUDE CURRENT ROW and EXCLUDE GROUP can be obtained based on the comparison between the
window
column values of rows with therid
column values of 2, 4, and 10 in Syntax 5 and Syntax 6. If frame_exclusion is set to EXCLUDE GROUP, the rows that have the samepid
column value in a partition are extracted when the rows have the sameoid
column value as the current row.
Windows in RANGE mode
windowing_definition 1
partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
If frame_end is set to CURRENT ROW, the last row that has the same value of the
oid
column inorder by
as the current row is obtained. Therefore, thewindow
column value of the row whoserid
column value is 1 is [1, 2].windowing_definition 2
partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [5, 6, 7, 8] | | 1 | 4 | 6 | [6, 7, 8] | | 1 | 7 | 7 | [7, 8] | | 1 | 11 | 8 | [8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
windowing_definition 3
partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | NULL | | 1 | 1 | 4 | NULL | | 1 | 2 | 5 | [3, 4] | | 1 | 4 | 6 | [3, 4, 5] | | 1 | 7 | 7 | [6] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
For the row whose value of
oid
inorder by
is null, if frame_start is set tooffset PRECEDING or offset FOLLOWING
, the row is the first row whose value of oid inorder by
is null. If frame_end is set to offset PRECEDING or offset FOLLOWING, the row is the last row whose value of oid inorder by
is null.
Windows in GROUPS mode
windowing_definition
partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [3, 4, 5, 6] | | 1 | 7 | 7 | [5, 6, 7] | | 1 | 11 | 8 | [6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
Sample data
This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample commands:
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp;
The emp.txt file contains the following sample data:
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
AVG
Syntax
double avg([distinct] double <expr>) over ([partition_clause] [orderby_clause] [frame_clause]) decimal avg([distinct] decimal <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the average value of expr in a window.
Parameters
expr: required. The expression that is used to calculate the returned result. A value of the DOUBLE or DECIMAL type.
If the input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If the input value is of another data type, an error is returned.
If the input value is null, the row that contains the value is not used for calculation.
If the distinct keyword is specified, the average value of distinct values is calculated.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
If the input value of expr is of the DECIMAL type, a value of the DECIMAL type is returned. If the input value is of another data type, a value of the DOUBLE type is returned. If the input value of expr is null, null is returned.
Examples
Example 1: Use the deptno column to define a window and calculate the average value of the sal column. The order by clause is not specified. This function returns the cumulative average value of the values from the first row to the last row in the current window. The current window includes the rows that have the same deptno column value. Sample statement:
select deptno, sal, avg(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2916.6666666666665 | -- This row is the first row of this window. The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 2450 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 10 | 1300 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 10 | 2450 | 2916.6666666666665 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 20 | 800 | 2175.0 | | 20 | 1100 | 2175.0 | | 20 | 2975 | 2175.0 | | 30 | 1500 | 1566.6666666666667 | | 30 | 950 | 1566.6666666666667 | | 30 | 1600 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 1250 | 1566.6666666666667 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
Example 2: Disable the Hive-compatible data type edition and use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statements:
-- Disable the Hive-compatible data type edition. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the first row of this window. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1683.3333333333333 | -- The return value is the cumulative average value of the values from the first row to the third row. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2500.0 | -- The return value is the cumulative average value of the values from the first row to the fifth row. | 10 | 5000 | 2916.6666666666665 | -- The return value is the cumulative average value of the values from the first row to the sixth row. | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 1968.75 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1100.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
Example 3: Enable the Hive-compatible data type edition and use the deptno column to define a window and calculate the average value of the sal column. ORDER BY is specified. This function returns the cumulative average value of the values from the first row to the row that has the same sal value as the current row in the current window. The average values for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statements:
-- Enable the Hive-compatible data type edition. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, avg(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300.0 | -- This row is the first row of this window. The average value for the first row is the cumulative average value of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 1300.0 | -- The return value is the cumulative average value of the values in the first and second rows. | 10 | 2450 | 1875.0 | -- The average value for the third row is the cumulative average value of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 1875.0 | -- The return value is the cumulative average value of the values from the first row to the fourth row. | 10 | 5000 | 2916.6666666666665 | | 10 | 5000 | 2916.6666666666665 | | 20 | 800 | 800.0 | | 20 | 1100 | 950.0 | | 20 | 2975 | 1625.0 | | 20 | 3000 | 2175.0 | | 20 | 3000 | 2175.0 | | 30 | 950 | 950.0 | | 30 | 1250 | 1150.0 | | 30 | 1250 | 1150.0 | | 30 | 1500 | 1237.5 | | 30 | 1600 | 1310.0 | | 30 | 2850 | 1566.6666666666667 | +------------+------------+------------+
CLUSTER_SAMPLE
Syntax
boolean cluster_sample(bigint <N>) OVER ([partition_clause]) boolean cluster_sample(bigint <N>, bigint <M>) OVER ([partition_clause])
Description
cluster_sample(bigint <N>)
: specifies that N random rows of data are sampled.cluster_sample(bigint <N>, bigint <M>)
: specifies that rows are sampled based on a specified ratio (M/N). The number of rows that are sampled is calculated by using the following formula:partition_row_count × M/N
.partition_row_count
specifies the number of rows in a partition.
Parameters
N: required. A constant of the BIGINT type. If N is set to null, null is returned.
M: required. A constant of the BIGINT type. If M is set to null, null is returned.
partition_clause: optional. For more information, see windowing_definition.
Return value
A value of the BOOLEAN type is returned.
Examples
Sample about 20% of data entries in each group. Sample statement:
select deptno, sal from ( select deptno, sal, cluster_sample(5, 1) over (partition by deptno) as flag from emp ) sub where flag = true;
The following result is returned:
+------------+------------+ | deptno | sal | +------------+------------+ | 10 | 1300 | | 20 | 3000 | | 30 | 950 | +------------+------------+
COUNT
Syntax
bigint count(*) over ([partition_clause] [orderby_clause] [frame_clause]) bigint count([distinct] <expr>[,...]) over ([partition_clause] [orderby_clause] [frame_clause])
Description
count(*)
: the total number of rows to return.count([distinct] <expr>[,...])
: When you calculate the number of rows, the rows whose expr is null are ignored. If multiple expr parameters exist, any rows whose expr is null are ignored. In addition, if the distinct keyword is specified, the number of rows after deduplication is calculated. Any rows whose expr is null are ignored.
Parameters
expr: required. This parameter specifies the column whose values you want to count. All data types are supported. If the value for a row is null, this row is not used for the calculation. If the DISTINCT keyword is specified, the count value of distinct values is obtained.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the BIGINT type is returned.
Sample statement
Example 1: Use the sal column to define a window. The order by clause is not specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statement:
select sal, count(sal) over (partition by sal) as count from emp;
The following result is returned:
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
Example 2: Disable the Hive-compatible data type edition and use the sal column to define a window. ORDER BY is specified. This function returns the cumulative count from the first row to the current row in the current window. The current window includes the rows that have the same sal value. Sample statements:
-- Disable the Hive-compatible data type edition. set odps.sql.hive.compatible=false; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;
The following result is returned:
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 1 | -- This row is the first row of this window. The cumulative count for the first row is 1. | 1250 | 2 | -- The cumulative count for the second row is 2. | 1300 | 1 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 1 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 1 | | 3000 | 2 | | 5000 | 1 | | 5000 | 2 | +------------+------------+
Example 3: Enable the Hive-compatible data type edition and use the sal column to define a window. ORDER BY is specified. This function returns the cumulative count from the first row to the last row in the current window. The current window includes the rows that have the same sal value. Sample statements:
-- Enable the Hive-compatible data type edition. set odps.sql.hive.compatible=true; -- Execute the following statement: select sal, count(sal) over (partition by sal order by sal) as count from emp;
The following result is returned:
+------------+------------+ | sal | count | +------------+------------+ | 800 | 1 | | 950 | 1 | | 1100 | 1 | | 1250 | 2 | -- This row is the first row of this window. The value of COUNT for the first row is the cumulative count for the second row because the two rows have the same sal value. | 1250 | 2 | -- The return value is the cumulative count from the first row to the second row in the current window. | 1300 | 2 | | 1300 | 2 | | 1500 | 1 | | 1600 | 1 | | 2450 | 2 | | 2450 | 2 | | 2850 | 1 | | 2975 | 1 | | 3000 | 2 | | 3000 | 2 | | 5000 | 2 | | 5000 | 2 | +------------+------------+
CUME_DIST
Syntax
double cume_dist() over([partition_clause] [orderby_clause])
Description
Calculates the cumulative distribution. The cumulative distribution indicates the ratio of rows whose values are greater than or equal to the values of the current row to all rows in a partition. The ratio is determined based on orderby_clause.
Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the DOUBLE type is returned. The return value is calculated by using the following formula:
row_number_of_last_peer/partition_row_count
.row_number_of_last_peer
indicates the value returned by ROW_NUMBER that corresponds to the last row of the group to which the current row belongs.partition_row_count
indicates the number of rows in a partition to which the current row belongs.Examples
Group all employees based on the deptno column and calculate the cumulative distribution of employees in each group by salary. Sample statement:
select deptno, ename, sal, concat(round(cume_dist() over (partition by deptno order by sal desc)*100,2),'%') as cume_dist from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | cume_dist | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 33.33% | | 10 | KING | 5000 | 33.33% | | 10 | CLARK | 2450 | 66.67% | | 10 | WELAN | 2450 | 66.67% | | 10 | TEBAGE | 1300 | 100.0% | | 10 | MILLER | 1300 | 100.0% | | 20 | SCOTT | 3000 | 40.0% | | 20 | FORD | 3000 | 40.0% | | 20 | JONES | 2975 | 60.0% | | 20 | ADAMS | 1100 | 80.0% | | 20 | SMITH | 800 | 100.0% | | 30 | BLAKE | 2850 | 16.67% | | 30 | ALLEN | 1600 | 33.33% | | 30 | TURNER | 1500 | 50.0% | | 30 | MARTIN | 1250 | 83.33% | | 30 | WARD | 1250 | 83.33% | | 30 | JAMES | 950 | 100.0% | +------------+------------+------------+------------+
DENSE_RANK
Syntax
bigint dense_rank() over ([partition_clause] [orderby_clause])
Description
Returns the rank of the current row in a partition based on the order specified by orderby_clause. The rank counts from 1. In a partition, rows with the same value of the column that is specified in
order by
have the same rank. The rank increases by 1 each time the value of the column that is specified inorder by
changes.Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the BIGINT type is returned. If orderby_clause is not specified, the values in the returned results are all 1.
Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:
select deptno, ename, sal, dense_rank() over (partition by deptno order by sal desc) as nums from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 2 | | 10 | WELAN | 2450 | 2 | | 10 | TEBAGE | 1300 | 3 | | 10 | MILLER | 1300 | 3 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 2 | | 20 | ADAMS | 1100 | 3 | | 20 | SMITH | 800 | 4 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 5 | +------------+------------+------------+------------+
FIRST_VALUE
Syntax
first_value(<expr>[, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the value of expr that corresponds to the first row of a window.
Parameters
expr: required. The expression that is used to calculate the returned result.
ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to True, a non-null value of expr that corresponds to the first row of a window is returned.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned.
Sample statement
Group all employees by department and return the first row of data in each group. Sample statement:
order by is not specified.
select deptno, ename, sal, first_value(sal) over (partition by deptno) as first_value from emp;
The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | first_value | +------------+------------+------------+-------------+ | 10 | TEBAGE | 1300 | 1300 | -- This row is the first row of this window. | 10 | CLARK | 2450 | 1300 | | 10 | KING | 5000 | 1300 | | 10 | MILLER | 1300 | 1300 | | 10 | JACCKA | 5000 | 1300 | | 10 | WELAN | 2450 | 1300 | | 20 | FORD | 3000 | 3000 | -- This row is the first row of this window. | 20 | SCOTT | 3000 | 3000 | | 20 | SMITH | 800 | 3000 | | 20 | ADAMS | 1100 | 3000 | | 20 | JONES | 2975 | 3000 | | 30 | TURNER | 1500 | 1500 | -- This row is the first row of this window. | 30 | JAMES | 950 | 1500 | | 30 | ALLEN | 1600 | 1500 | | 30 | WARD | 1250 | 1500 | | 30 | MARTIN | 1250 | 1500 | | 30 | BLAKE | 2850 | 1500 | +------------+------------+------------+-------------+
order by is specified.
select deptno, ename, sal, first_value(sal) over (partition by deptno order by sal desc) as first_value from emp;
The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | first_value | +------------+------------+------------+-------------+ | 10 | JACCKA | 5000 | 5000 | -- This row is the first row of this window. | 10 | KING | 5000 | 5000 | | 10 | CLARK | 2450 | 5000 | | 10 | WELAN | 2450 | 5000 | | 10 | TEBAGE | 1300 | 5000 | | 10 | MILLER | 1300 | 5000 | | 20 | SCOTT | 3000 | 3000 | -- This row is the first row of this window. | 20 | FORD | 3000 | 3000 | | 20 | JONES | 2975 | 3000 | | 20 | ADAMS | 1100 | 3000 | | 20 | SMITH | 800 | 3000 | | 30 | BLAKE | 2850 | 2850 | -- This row is the first row of this window. | 30 | ALLEN | 1600 | 2850 | | 30 | TURNER | 1500 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | JAMES | 950 | 2850 | +------------+------------+------------+-------------+
LAG
Syntax
lag(<expr>[,bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
Description
Returns the value of expr that precedes the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.
Parameters
expr: required. The expression that is used to calculate the returned result.
offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the previous row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.
default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is null. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned.
Examples
Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:
select deptno, ename, sal, lag(sal, 1) over (partition by deptno order by sal) as sal_new from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | NULL | | 10 | MILLER | 1300 | 1300 | | 10 | CLARK | 2450 | 1300 | | 10 | WELAN | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | JACCKA | 5000 | 5000 | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | 800 | | 20 | JONES | 2975 | 1100 | | 20 | SCOTT | 3000 | 2975 | | 20 | FORD | 3000 | 3000 | | 30 | JAMES | 950 | NULL | | 30 | MARTIN | 1250 | 950 | | 30 | WARD | 1250 | 1250 | | 30 | TURNER | 1500 | 1250 | | 30 | ALLEN | 1600 | 1500 | | 30 | BLAKE | 2850 | 1600 | +------------+------------+------------+------------+
LAST_VALUE
Syntax
last_value(<expr>[, <ignore_nulls>]) over([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the calculated result of the last row of data in a window using the expression that is specified by expr.
Parameters
expr: required. The expression that is used to calculate the returned result.
ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to true, a non-null value of expr that corresponds to the last row of a window is returned.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned.
Examples
Group all employees by department and return the last row of data in each group. Sample statement:
If order by is not specified, the rows from the first row to the last row belong to the current window. The value of the last row in the current window is returned.
select deptno, ename, sal, last_value(sal) over (partition by deptno) as last_value from emp;
The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | last_value | +------------+------------+------------+-------------+ | 10 | TEBAGE | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | MILLER | 1300 | 2450 | | 10 | JACCKA | 5000 | 2450 | | 10 | WELAN | 2450 | 2450 | -- This row is the last row of this window. | 20 | FORD | 3000 | 2975 | | 20 | SCOTT | 3000 | 2975 | | 20 | SMITH | 800 | 2975 | | 20 | ADAMS | 1100 | 2975 | | 20 | JONES | 2975 | 2975 | -- This row is the last row of this window. | 30 | TURNER | 1500 | 2850 | | 30 | JAMES | 950 | 2850 | | 30 | ALLEN | 1600 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | BLAKE | 2850 | 2850 | -- This row is the last row of the current window. +------------+------------+------------+-------------+
If order by is specified, the rows from the first row to the current row belong to the current window. The value of the current row in the current window is returned.
select deptno, ename, sal, last_value(sal) over (partition by deptno order by sal desc) as last_value from emp;
The following result is returned:
+------------+------------+------------+-------------+ | deptno | ename | sal | last_value | +------------+------------+------------+-------------+ | 10 | JACCKA | 5000 | 5000 | -- This row is the current row of this window. | 10 | KING | 5000 | 5000 | -- This row is the current row of this window. | 10 | CLARK | 2450 | 2450 | -- This row is the current row of this window. | 10 | WELAN | 2450 | 2450 | -- This row is the current row of this window. | 10 | TEBAGE | 1300 | 1300 | -- This row is the current row of this window. | 10 | MILLER | 1300 | 1300 | -- This row is the current row of this window. | 20 | SCOTT | 3000 | 3000 | -- This row is the current row of this window. | 20 | FORD | 3000 | 3000 | -- This row is the current row of this window. | 20 | JONES | 2975 | 2975 | -- This row is the current row of this window. | 20 | ADAMS | 1100 | 1100 | -- This row is the current row of this window. | 20 | SMITH | 800 | 800 | -- This row is the current row of this window. | 30 | BLAKE | 2850 | 2850 | -- This row is the current row of this window. | 30 | ALLEN | 1600 | 1600 | -- This row is the current row of this window. | 30 | TURNER | 1500 | 1500 | -- This row is the current row of this window. | 30 | MARTIN | 1250 | 1250 | -- This row is the current row of this window. | 30 | WARD | 1250 | 1250 | -- This row is the current row of this window. | 30 | JAMES | 950 | 950 | -- This row is the current row of this window. +------------+------------+------------+-------------+
LEAD
Syntax
lead(<expr>[, bigint <offset>[, <default>]]) over([partition_clause] orderby_clause)
Description
Returns the value of expr that corresponds to the Nth row following the current row at a given offset. The expression expr can be a column, a column operation, or a function operation.
Parameters
expr: required. The expression that is used to calculate the returned result.
offset: optional. The value is a constant of the BIGINT type and must be greater than or equal to 0. The value 0 indicates the current row, and the value 1 indicates the next row. Default value: 1. If the input value is of the STRING or DOUBLE type, it is implicitly converted into a value of the BIGINT type before calculation.
default: optional. The default value when the value of offset is out of the valid range. The value of this parameter must be a constant. The default value of this parameter is null. The value of this parameter must be of the same data type as the value of expr. If the value of expr is not a constant, the parameter value is determined based on the current row.
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned.
Examples
Group all employees based on the deptno column and calculate the value of sal for each employee at a given offset. Sample statement:
select deptno, ename, sal, lead(sal, 1) over (partition by deptno order by sal) as sal_new from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | 1300 | | 10 | MILLER | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | WELAN | 2450 | 5000 | | 10 | KING | 5000 | 5000 | | 10 | JACCKA | 5000 | NULL | | 20 | SMITH | 800 | 1100 | | 20 | ADAMS | 1100 | 2975 | | 20 | JONES | 2975 | 3000 | | 20 | SCOTT | 3000 | 3000 | | 20 | FORD | 3000 | NULL | | 30 | JAMES | 950 | 1250 | | 30 | MARTIN | 1250 | 1250 | | 30 | WARD | 1250 | 1500 | | 30 | TURNER | 1500 | 1600 | | 30 | ALLEN | 1600 | 2850 | | 30 | BLAKE | 2850 | NULL | +------------+------------+------------+------------+
MAX
Syntax
max(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the maximum value of expr in a window.
Parameters
expr: required. The expression that is used to calculate the maximum value. The input value can be of any data type other than BOOLEAN. If the value for a row is null, this row is not used for calculation.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same type as expr is returned.
Examples
Example 1: Use the deptno column to define a window and obtain the maximum value of the sal column. The order by clause is not specified. This function returns the maximum value of the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, max(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 5000 | -- This row is the first row of this window. The return value is the maximum value among the values from the first row to the sixth row. | 10 | 2450 | 5000 | -- The return value is the maximum value among the values from the first row to the sixth row. | 10 | 5000 | 5000 | -- The return value is the maximum value among the values from the first row to the sixth row. | 10 | 1300 | 5000 | | 10 | 5000 | 5000 | | 10 | 2450 | 5000 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 20 | 800 | 3000 | | 20 | 1100 | 3000 | | 20 | 2975 | 3000 | | 30 | 1500 | 2850 | | 30 | 950 | 2850 | | 30 | 1600 | 2850 | | 30 | 1250 | 2850 | | 30 | 1250 | 2850 | | 30 | 2850 | 2850 | +------------+------------+------------+
Example 2: Use the deptno column to define a window and obtain the maximum value of the sal column. The order by clause is specified. This function returns the maximum value among the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, max(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. | 10 | 1300 | 1300 | -- The return value is the maximum value among the values in the first and second rows. | 10 | 2450 | 2450 | -- The return value is the maximum value among the values from the first row to the third row. | 10 | 2450 | 2450 | -- The return value is the maximum value among the values from the first row to the fourth row. | 10 | 5000 | 5000 | | 10 | 5000 | 5000 | | 20 | 800 | 800 | | 20 | 1100 | 1100 | | 20 | 2975 | 2975 | | 20 | 3000 | 3000 | | 20 | 3000 | 3000 | | 30 | 950 | 950 | | 30 | 1250 | 1250 | | 30 | 1250 | 1250 | | 30 | 1500 | 1500 | | 30 | 1600 | 1600 | | 30 | 2850 | 2850 | +------------+------------+------------+
MEDIAN
Syntax
median(<expr>) over ([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the median of expr in a window.
Parameters
expr: required. The expression that is used to calculate the median. A value of the DOUBLE or DECIMAL type. The value of this parameter must be 1 to 255 digits in length.
If the input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
If the input value is null, null is returned.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the DOUBLE or DECIMAL type is returned. If the values of all expressions that are specified by expr are null, null is returned.
Examples
Use the deptno column to define a window and calculate the median value of the sal column. This function returns the median value of the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, median(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2450.0 | -- This row is the first row of this window. The return value is the median value of the values from the first row to the sixth row. | 10 | 2450 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 1300 | 2450.0 | | 10 | 5000 | 2450.0 | | 10 | 2450 | 2450.0 | | 20 | 3000 | 2975.0 | | 20 | 3000 | 2975.0 | | 20 | 800 | 2975.0 | | 20 | 1100 | 2975.0 | | 20 | 2975 | 2975.0 | | 30 | 1500 | 1375.0 | | 30 | 950 | 1375.0 | | 30 | 1600 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 1250 | 1375.0 | | 30 | 2850 | 1375.0 | +------------+------------+------------+
MIN
Syntax
min(<expr>) over([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the minimum value of expr in a window.
Parameters
expr: required. The expression that is used to calculate the minimum value. The input value can be of any data type other than BOOLEAN. If the value for a row is null, this row is not used for calculation.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned.
Examples
Example 1: Use the deptno column to define a window and obtain the minimum value of the sal column. The order by clause is not specified. This function returns the minimum value of the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, min(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. The return value is the minimum value among the values from the first row to the sixth row. | 10 | 2450 | 1300 | -- The return value is the minimum value among the values from the first row to the sixth row. | 10 | 5000 | 1300 | -- The return value is the minimum value among the values from the first row to the sixth row. | 10 | 1300 | 1300 | | 10 | 5000 | 1300 | | 10 | 2450 | 1300 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 30 | 1500 | 950 | | 30 | 950 | 950 | | 30 | 1600 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+
Example 2: Use the deptno column to define a window and obtain the minimum value of the sal column. The order by clause is specified. This function returns the minimum value among the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, min(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. | 10 | 1300 | 1300 | -- The return value is the minimum value among the values in the first and second rows. | 10 | 2450 | 1300 | -- The return value is the minimum value among the values from the first row to the third row. | 10 | 2450 | 1300 | | 10 | 5000 | 1300 | | 10 | 5000 | 1300 | | 20 | 800 | 800 | | 20 | 1100 | 800 | | 20 | 2975 | 800 | | 20 | 3000 | 800 | | 20 | 3000 | 800 | | 30 | 950 | 950 | | 30 | 1250 | 950 | | 30 | 1250 | 950 | | 30 | 1500 | 950 | | 30 | 1600 | 950 | | 30 | 2850 | 950 | +------------+------------+------------+
NTILE
Syntax
bigint ntile(bigint <N>) over ([partition_clause] [orderby_clause])
Description
Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the specified row belongs. If data in the partition cannot be split into N groups of equal size, one more row is preferentially allocated to the first M groups.
Parameters
N: required. This parameter specifies the number of splits. A value of the BIGINT type.
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the BIGINT type is returned.
Examples
Divide all employees into three groups based on the sal column in descending order and obtain the number of the group to which each employee belongs. Sample statement:
select deptno, ename, sal, ntile(3) over (partition by deptno order by sal desc) as nt3 from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nt3 | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 2 | | 10 | WELAN | 2450 | 2 | | 10 | TEBAGE | 1300 | 3 | | 10 | MILLER | 1300 | 3 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 2 | | 20 | ADAMS | 1100 | 2 | | 20 | SMITH | 800 | 3 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 1 | | 30 | TURNER | 1500 | 2 | | 30 | MARTIN | 1250 | 2 | | 30 | WARD | 1250 | 3 | | 30 | JAMES | 950 | 3 | +------------+------------+------------+------------+
NTH_VALUE
Syntax
nth_value(<expr>, <number> [, <ignore_nulls>]) over ([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the value of expr that corresponds to the Nth row in a window.
Parameters
expr: required. The expression that is used to calculate the returned result.
number: required. A value of the BIGINT type. The value must be an integer greater than or equal to 1. If the input value is 1, this function is equivalent to FIRST_VALUE.
ignore_nulls: optional. A value of the BOOLEAN type. This parameter specifies whether to ignore null values. Default value: false. If this parameter is set to True, a non-null value of expr that corresponds to the Nth row of a window is returned.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned.
Examples
Group all employees by department and return the sixth row of data in each group. Sample statement:
If order by is not specified, the rows from the first row to the last row belong to the current window. The value of the sixth row in the current window is returned. Sample statement:
select deptno, ename, sal, nth_value(sal,6) over (partition by deptno) as nth_value from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nth_value | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | 2450 | | 10 | CLARK | 2450 | 2450 | | 10 | KING | 5000 | 2450 | | 10 | MILLER | 1300 | 2450 | | 10 | JACCKA | 5000 | 2450 | | 10 | WELAN | 2450 | 2450 | -- This row is the sixth row of this window. | 20 | FORD | 3000 | NULL | | 20 | SCOTT | 3000 | NULL | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | NULL | | 20 | JONES | 2975 | NULL | -- This current window has less than six rows, and null is returned. | 30 | TURNER | 1500 | 2850 | | 30 | JAMES | 950 | 2850 | | 30 | ALLEN | 1600 | 2850 | | 30 | WARD | 1250 | 2850 | | 30 | MARTIN | 1250 | 2850 | | 30 | BLAKE | 2850 | 2850 | -- This row is the sixth row of the current window. +------------+------------+------------+------------+
If order by is specified, the rows from the first row to the current row belong to the current window. The value of the sixth row in the current window is returned.
select deptno, ename, sal, nth_value(sal,6) over (partition by deptno order by sal) as nth_value from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nth_value | +------------+------------+------------+------------+ | 10 | TEBAGE | 1300 | NULL | | 10 | MILLER | 1300 | NULL | -- This window has only two rows, and null is returned. | 10 | CLARK | 2450 | NULL | | 10 | WELAN | 2450 | NULL | | 10 | KING | 5000 | 5000 | | 10 | JACCKA | 5000 | 5000 | | 20 | SMITH | 800 | NULL | | 20 | ADAMS | 1100 | NULL | | 20 | JONES | 2975 | NULL | | 20 | SCOTT | 3000 | NULL | | 20 | FORD | 3000 | NULL | | 30 | JAMES | 950 | NULL | | 30 | MARTIN | 1250 | NULL | | 30 | WARD | 1250 | NULL | | 30 | TURNER | 1500 | NULL | | 30 | ALLEN | 1600 | NULL | | 30 | BLAKE | 2850 | 2850 | +------------+------------+------------+------------+
PERCENT_RANK
Syntax
double percent_rank() over([partition_clause] [orderby_clause])
Description
Calculates the percentile rank of the current row in a partition based on orderby_clause.
Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the DOUBLE type is returned. The valid value range is [0.0, 1.0]. The return value is calculated by using the following formula:
"(rank - 1)/(partition_row_count - 1)"
.rank
indicates the return value of the RANK function that corresponds to the current row.partition_row_count
indicates the number of rows in the partition to which the current row belongs. If the partition contains only one row of data, 0.0 is returned.Examples
Calculate the percentile rank of each employee in a group based on the sal column. Sample statement:
select deptno, ename, sal, percent_rank() over (partition by deptno order by sal desc) as sal_new from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | sal_new | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 0.0 | | 10 | KING | 5000 | 0.0 | | 10 | CLARK | 2450 | 0.4 | | 10 | WELAN | 2450 | 0.4 | | 10 | TEBAGE | 1300 | 0.8 | | 10 | MILLER | 1300 | 0.8 | | 20 | SCOTT | 3000 | 0.0 | | 20 | FORD | 3000 | 0.0 | | 20 | JONES | 2975 | 0.5 | | 20 | ADAMS | 1100 | 0.75 | | 20 | SMITH | 800 | 1.0 | | 30 | BLAKE | 2850 | 0.0 | | 30 | ALLEN | 1600 | 0.2 | | 30 | TURNER | 1500 | 0.4 | | 30 | MARTIN | 1250 | 0.6 | | 30 | WARD | 1250 | 0.6 | | 30 | JAMES | 950 | 1.0 | +------------+------------+------------+------------+
RANK
Syntax
bigint rank() over ([partition_clause] [orderby_clause])
Description
Returns the rank of the current row in a partition based on the order specified by orderby_clause. The rank counts from 1.
Parameters
partition_clause and orderby_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the BIGINT type is returned. The return value may be duplicate and inconsecutive. The return value is the sequence number of the first row in the group to which the current row belongs. The sequence number of the first row is calculated by using the
ROW_NUMBER()
function. If orderby_clause is not specified, the values in the returned results are all 1.Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the ordinal numbers of the employees in their own groups. Sample statement:
select deptno, ename, sal, rank() over (partition by deptno order by sal desc) as nums from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 1 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 3 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 5 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 1 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 4 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+
ROW_NUMBER
Syntax
row_number() over([partition_clause] [orderby_clause])
Description
Returns the sequence number of the current row in a partition. The sequence number counts from 1.
Parameters
For more information, see windowing_definition. frame_clause is not supported.
Return value
A value of the BIGINT type is returned.
Examples
Group all employees based on the deptno column. In each group, sort the employees in descending order based on the sal value to obtain the sequence numbers of the employees in their respective groups. Sample statement:
select deptno, ename, sal, row_number() over (partition by deptno order by sal desc) as nums from emp;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | JACCKA | 5000 | 1 | | 10 | KING | 5000 | 2 | | 10 | CLARK | 2450 | 3 | | 10 | WELAN | 2450 | 4 | | 10 | TEBAGE | 1300 | 5 | | 10 | MILLER | 1300 | 6 | | 20 | SCOTT | 3000 | 1 | | 20 | FORD | 3000 | 2 | | 20 | JONES | 2975 | 3 | | 20 | ADAMS | 1100 | 4 | | 20 | SMITH | 800 | 5 | | 30 | BLAKE | 2850 | 1 | | 30 | ALLEN | 1600 | 2 | | 30 | TURNER | 1500 | 3 | | 30 | MARTIN | 1250 | 4 | | 30 | WARD | 1250 | 5 | | 30 | JAMES | 950 | 6 | +------------+------------+------------+------------+
STDDEV
Syntax
double stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause]) decimal stddev|stddev_pop([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the population standard deviation of all input values. This function is also called STDDEV_POP.
Parameters
expr: required. The expression that is used to calculate the population standard deviation. A value of the DOUBLE or DECIMAL type.
If an input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
If the input value is null, the row that contains the value is not used for calculation.
If the distinct keyword is specified, the population standard deviation of distinct values is calculated.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are null, null is returned.
Examples
Example 1: Use the deptno column to define a window and calculate the population standard deviation of the sal column. The order by clause is not specified. This function returns the cumulative population standard deviation of the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, stddev(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1546.1421524412158 | -- This row is the first row of this window. The return value is the cumulative population standard deviation of the values from the first row to the sixth row. | 10 | 2450 | 1546.1421524412158 | -- The return value is the cumulative population standard deviation of the values from the first row to the sixth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 1300 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 10 | 2450 | 1546.1421524412158 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 20 | 800 | 1004.7387720198718 | | 20 | 1100 | 1004.7387720198718 | | 20 | 2975 | 1004.7387720198718 | | 30 | 1500 | 610.1001739241042 | | 30 | 950 | 610.1001739241042 | | 30 | 1600 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 1250 | 610.1001739241042 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+
Example 2: Disable the Hive-compatible data type edition and use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative population standard deviation of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:
-- Disable the Hive-compatible data type edition. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the first row of this window. | 10 | 1300 | 0.0 | -- The return value is the cumulative population standard deviation of the values in the first and second rows. | 10 | 2450 | 542.1151989096865 | -- The return value is the cumulative population standard deviation of the values from the first row to the third row. | 10 | 2450 | 575.0 | -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. | 10 | 5000 | 1351.6656391282572 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1024.2947268730811 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 150.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+
Example 3: Enable the Hive-compatible data type edition and use the deptno column to define a window and calculate the population standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative population standard deviation of the values from the first row to the row that has the same sal value as the current row in the current window. The population standard deviations for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statements:
-- Enable the Hive-compatible data type edition. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, stddev(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the first row of this window. The population standard deviation for the first row is the cumulative population standard deviation of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 0.0 | -- The return value is the cumulative population standard deviation of the values in the first and second rows. | 10 | 2450 | 575.0 | -- The population standard deviation for the third row is the cumulative population standard deviation of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 575.0 | -- The return value is the cumulative population standard deviation of the values from the first row to the fourth row. | 10 | 5000 | 1546.1421524412158 | | 10 | 5000 | 1546.1421524412158 | | 20 | 800 | 0.0 | | 20 | 1100 | 150.0 | | 20 | 2975 | 962.4188277460079 | | 20 | 3000 | 1004.7387720198718 | | 20 | 3000 | 1004.7387720198718 | | 30 | 950 | 0.0 | | 30 | 1250 | 141.4213562373095 | | 30 | 1250 | 141.4213562373095 | | 30 | 1500 | 194.8557158514987 | | 30 | 1600 | 226.71568097509268 | | 30 | 2850 | 610.1001739241042 | +------------+------------+------------+
STDDEV_SAMP
Syntax
double stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause]) decimal stddev_samp([distinct] <expr>) over([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the sample standard deviation of all the input values.
Parameters
expr: required. This parameter specifies the expression that is used to calculate the sample standard deviation. A value of the DOUBLE or DECIMAL type.
If an input value is of the STRING or BIGINT type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
If the input value is null, the row that contains the value is not used for calculation.
If the distinct keyword is specified, the sample standard deviation of distinct values is calculated.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
A value of the same data type as expr is returned. If the values of all expressions that are specified by expr are null, null is returned. If the window has only one row of data whose expr value is not null, 0 is returned.
Examples
Example 1: Use the deptno column to define a window and calculate the sample standard deviation of the sal column. The order by clause is not specified. This function returns the cumulative sample standard deviation of the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, stddev_samp(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1693.7138680032904 | -- This row is the first row of this window. The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 2450 | 1693.7138680032904 | -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 5000 | 1693.7138680032904 | -- The return value is the cumulative sample standard deviation of the values from the first row to the sixth row. | 10 | 1300 | 1693.7138680032904 | | 10 | 5000 | 1693.7138680032904 | | 10 | 2450 | 1693.7138680032904 | | 20 | 3000 | 1123.3320969330487 | | 20 | 3000 | 1123.3320969330487 | | 20 | 800 | 1123.3320969330487 | | 20 | 1100 | 1123.3320969330487 | | 20 | 2975 | 1123.3320969330487 | | 30 | 1500 | 668.331255192114 | | 30 | 950 | 668.331255192114 | | 30 | 1600 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 1250 | 668.331255192114 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+
Example 2: Use the deptno column to define a window and calculate the sample standard deviation of the sal column. The ORDER BY clause is specified. This function returns the cumulative sample standard deviation of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, stddev_samp(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 0.0 | -- This row is the first row of this window. | 10 | 1300 | 0.0 | -- The return value is the cumulative sample standard deviation of the values in the first and second rows. | 10 | 2450 | 663.9528095680697 | -- The return value is the cumulative sample standard deviation of the values from the first row to the third row. | 10 | 2450 | 663.9528095680696 | | 10 | 5000 | 1511.2081259707413 | | 10 | 5000 | 1693.7138680032904 | | 20 | 800 | 0.0 | | 20 | 1100 | 212.13203435596427 | | 20 | 2975 | 1178.7175234126282 | | 20 | 3000 | 1182.7536725793752 | | 20 | 3000 | 1123.3320969330487 | | 30 | 950 | 0.0 | | 30 | 1250 | 212.13203435596427 | | 30 | 1250 | 173.20508075688772 | | 30 | 1500 | 225.0 | | 30 | 1600 | 253.4758371127315 | | 30 | 2850 | 668.331255192114 | +------------+------------+------------+
SUM
Syntax
sum([distinct] <expr>) over ([partition_clause] [orderby_clause] [frame_clause])
Description
Returns the sum of expr in a window.
Parameters
expr: required. This parameter specifies the column whose sum you want to calculate. The column is of the DOUBLE, DECIMAL, or BIGINT type.
If an input value is of the STRING type, the value is implicitly converted into a value of the DOUBLE type before calculation. If it is of another data type, an error is returned.
If the input value is null, the row that contains the value is not used for calculation.
If the distinct keyword is specified, the sum of distinct values is calculated.
partition_clause, orderby_clause, and frame_clause: For more information about these parameters, see windowing_definition.
Return value
If the input values are of the BIGINT type, a value of the BIGINT type is returned.
If the input values are of the DECIMAL type, a value of the DECIMAL type is returned.
If input values are of the DOUBLE or STRING type, a value of the DOUBLE type is returned.
If input values are null, null is returned.
Examples
Example 1: Use the deptno column to define a window and calculate the sum of the sal column. The order by clause is not specified. This function returns the cumulative sum of the current window. The current window includes the rows that have the same deptno value. Sample statement:
select deptno, sal, sum(sal) over (partition by deptno) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 17500 | -- This row is the first row of this window. The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 2450 | 17500 | -- The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 5000 | 17500 | -- The return value is the cumulative sum of the values from the first row to the sixth row. | 10 | 1300 | 17500 | | 10 | 5000 | 17500 | | 10 | 2450 | 17500 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 20 | 800 | 10875 | | 20 | 1100 | 10875 | | 20 | 2975 | 10875 | | 30 | 1500 | 9400 | | 30 | 950 | 9400 | | 30 | 1600 | 9400 | | 30 | 1250 | 9400 | | 30 | 1250 | 9400 | | 30 | 2850 | 9400 | +------------+------------+------------+
Example 2: Disable the Hive-compatible data type edition and use the deptno column to define a window and calculate the sum of the sal column. The ORDER BY clause is specified. This function returns the cumulative sum of the values from the first row to the current row in the current window. The current window includes the rows that have the same deptno value. Sample statements:
-- Disable the Hive-compatible data type edition. set odps.sql.hive.compatible=false; -- Execute the following statement: select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 1300 | -- This row is the first row of this window. | 10 | 1300 | 2600 | -- The return value is the cumulative sum of the values in the first and second rows. | 10 | 2450 | 5050 | -- The return value is the cumulative sum of the values from the first row to the third row. | 10 | 2450 | 7500 | | 10 | 5000 | 12500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 7875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 2200 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+
Example 3: Enable the Hive-compatible data type edition and use the deptno column to define a window and calculate the sum of the sal column. The ORDER BY clause is specified. This function returns the cumulative sum of the values from the first row to the row that has the same sal value as the current row in the current window. The sum values for the rows that have the same sal value are the same. The current window includes the rows that have the same deptno value. Sample statements:
-- Enable the Hive-compatible data type edition. set odps.sql.hive.compatible=true; -- Execute the following statement: select deptno, sal, sum(sal) over (partition by deptno order by sal) from emp;
The following result is returned:
+------------+------------+------------+ | deptno | sal | _c2 | +------------+------------+------------+ | 10 | 1300 | 2600 | -- This row is the first row of this window. The sum for the first row is the cumulative sum of the values in the first and second rows because the two rows have the same sal value. | 10 | 1300 | 2600 | -- The return value is the cumulative sum of the values in the first and second rows. | 10 | 2450 | 7500 | -- The sum for the third row is the cumulative sum of the values from the first row to the fourth row because the third and fourth rows have the same sal value. | 10 | 2450 | 7500 | -- The return value is the cumulative sum of the values from the first row to the fourth row. | 10 | 5000 | 17500 | | 10 | 5000 | 17500 | | 20 | 800 | 800 | | 20 | 1100 | 1900 | | 20 | 2975 | 4875 | | 20 | 3000 | 10875 | | 20 | 3000 | 10875 | | 30 | 950 | 950 | | 30 | 1250 | 3450 | | 30 | 1250 | 3450 | | 30 | 1500 | 4950 | | 30 | 1600 | 6550 | | 30 | 2850 | 9400 | +------------+------------+------------+
References
If the preceding built-in functions do not meet your business requirements, MaxCompute also provides custom functions. For more information about UDFs, see Overview.
For more information about FAQ about SQL of MaxCompute, see the following topics:
For more information about common errors and FAQ about built-in functions of MaxCompute, see the following topics: