MaxCompute provides a large number of built-in functions to meet data processing requirements in most business scenarios. This topic describes the types of built-in functions that are provided by MaxCompute. This topic also describes how to use the built-in functions.
Background information
For more information about the precautions you must take into account when you use built-in functions provided by MaxCompute, see Precautions.
The following table describes the types of built-in functions that are provided by MaxCompute.
Type | Description |
Used to process data of a date type, such as DATE, DATETIME, or TIMESTAMP. For example, you can use these functions to add and subtract date values, calculate date value differences, extract date fields, obtain the current time, and convert date formats. | |
Used to process data of a numeric type, such as BIGINT, DOUBLE, DECIMAL, or FLOAT. For example, you can use these functions to convert numeral systems, perform mathematical operations, round values, and obtain random numbers. | |
Used to process the data of columns in a window. For example, you can use these functions to calculate the sum, maximum value, minimum value, average value, and median value of column data, sort column data, obtain the data of columns at a given offset, and sample column data. | |
Used to aggregate multiple input records into an output value. For example, you can use these functions to calculate the sum, maximum value, minimum value, and average value of data, aggregate parameters, and concatenate strings. | |
Used to process data of the STRING type. For example, you can use these functions to truncate strings, replace strings, search for strings, convert uppercase and lowercase letters, and convert string formats. | |
Used to process data of the MAP, ARRAY, STRUCT, or JSON type. For example, you can use these functions to deduplicate, aggregate, sort, and merge elements. | |
Used to encrypt and decrypt data of the STRING or BINARY type in a table. | |
Used to process data in other business scenarios. |
For more information about typical cases, error codes, and FAQs of built-in functions that are provided by MaxCompute, see Fix the precision issue of the ROUND function, Implement capabilities provided by the GROUP_CONCAT function, Common errors for built-in functions, and FAQ about built-in functions.
Precautions
When you use built-in functions that are provided by MaxCompute, take note of the following items:
For a built-in function, the types and number of input parameters and function format must meet the function syntax requirements. If the function syntax requirements are not met, MaxCompute cannot parse the built-in function and an error may occur when you execute the SQL statement in which the built-in function is called.
If the input parameters of a built-in function are of a type that is supported by the MaxCompute V2.0 data type edition, you must enable the MaxCompute V2.0 data type edition. The data types supported by the MaxCompute V2.0 data type edition include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY. If you do not enable the MaxCompute V2.0 data type edition, an error may occur when you execute the SQL statement in which the built-in function is called. You can enable the MaxCompute V2.0 data type edition at the session or project level.
Session level: Add
set odps.sql.type.system.odps2=true;
before the SQL statement in which a built-in function is called. Then, commit and execute them together. This configuration is valid for only the current SQL statement.Project level: The owner of a project can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. This configuration is valid for all the subsequent SQL statements.
setproject odps.sql.type.system.odps2=true;
If you enable the MaxCompute V2.0 data type edition for a project, some implicit conversions are disabled, such as the conversions from STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE, and DECIMAL to BIGINT. This may cause a loss of precision or errors. In this case, you can use the CAST function to forcefully convert the data types to resolve these issues. You can also disable the MaxCompute V2.0 data type edition.
If the name of a UDF is the same as that of a built-in function, the UDF is preferentially called. For example, if UDF CONCAT and built-in function CONCAT both exist in MaxCompute, the system automatically calls UDF CONCAT instead of the built-in function CONCAT. If you want to call the built-in function, you must add the symbol
::
before the built-in function. For example, you can useselect ::concat('ab', 'c');
.If the settings of global properties of MaxCompute projects are different, the execution results of built-in functions may be different. You can run the
setproject;
command to configure the global properties of a MaxCompute project.
For more information about the mappings between the built-in functions of MaxCompute and the built-in functions of open source systems, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.
Date functions
The following table describes date functions provided by MaxCompute SQL. You can select an appropriate date function based on your business requirements to complete date calculations and conversions.
Function | Description |
Changes a date value based on the time unit specified by datepart and the interval specified by delta. | |
Adds or subtracts a number of days to or from a date value based on the interval specified by delta. The DATE_ADD function is the inverse of the | |
Converts a date value into a string in a specified format. | |
Adds or subtracts a number of days to or from a date value based on the interval specified by delta. The DATE_SUB function is the inverse of the | |
Calculates the difference between two date values based on the time unit specified by datepart. | |
Returns a specified component of a date value based on the time unit specified by datepart. | |
Truncates a date value based on the time unit specified by datepart. | |
Converts a UNIX timestamp of the BIGINT type into a date value of the DATETIME type. | |
Returns the current system time as a date value. | |
Determines whether a date string can be converted into a date value in a specified format. | |
Returns the last day of the month in which a date value falls. | |
Converts a string into a date value in a specified format. | |
Converts a date value into a string in a specified format. | |
Converts a date value into a UNIX timestamp that is an integer. | |
Returns a number that represents the day of the week in which a date value falls. | |
Returns a number that represents the week of the year in which a date value falls. | |
Returns a date value that is obtained after a number of months are added to a specified date. | |
Returns the current timestamp. | |
Returns the time zone of the current system. | |
Returns the day in which a date value falls. | |
Returns the day component of a date value. | |
Returns the day of the week in which a date value falls. | |
Returns the sequence number of the day in the year. | |
Returns a specified component of a timestamp. | |
Converts a UTC timestamp into a timestamp for a specified time zone. | |
Returns the hour component of a date value. | |
Returns the last day of the month in which a date value falls. | |
Returns the minute component of a date value. | |
Returns the month in which a date value falls. | |
Returns the number of months between specified date values. | |
Returns the date of the first weekday that is later than a date value and matches the specified week. | |
Returns the quarter in which a date value falls. | |
Returns the second component of a date value. | |
Converts a date value into a UNIX timestamp that is accurate to the millisecond. | |
Returns the year in which a date value falls. |
Mathematical functions
The following table describes mathematical functions that are provided by MaxCompute SQL for you to use during development. You can select mathematical functions based on your business requirements to compute data or convert data types.
For more information about operators, such as the operator that is used to calculate remainders, see Arithmetic operators.
Function | Description |
Calculates the absolute value. | |
Calculates the arccosine. | |
Calculates the arcsine. | |
Calculates the arctangent. | |
Calculates the arctangent of expr1/expr2. | |
Rounds up a number and returns the nearest integer. | |
Converts a number from one number system to another. | |
Calculates the cosine. | |
Calculates the hyperbolic cosine. | |
Calculates the cotangent. | |
Calculates the exponential value. | |
Rounds down a number and returns the nearest integer. | |
Checks whether the value of an expression is NaN. | |
Calculates the natural logarithm. | |
Calculates the logarithm. | |
Returns the negative value of an expression. | |
Returns the value of an expression. | |
Calculates the nth power of a value. | |
Returns a random number. | |
Returns a value rounded to the specified decimal place. | |
Calculates the sine. | |
Calculates the hyperbolic sine. | |
Calculates the square root. | |
Calculates the tangent. | |
Calculates the hyperbolic tangent. | |
Truncates the input value to the specified decimal place. | |
Calculates the binary code. | |
Calculates the cube root. | |
Calculates the Pearson correlation coefficient. | |
Converts a radian value into a degree. | |
Calculates the value of e. | |
Calculates the factorial. | |
Converts a number into a string in the specified format. | |
Converts an integer or a string into a hexadecimal number. | |
Calculates the logarithm of a number with the base number of 2. | |
Calculates the logarithm of a number with the base number of 10. | |
Calculates the value of π. | |
Converts a degree into a radian value. | |
Returns the sign of the input value. | |
Shifts a value left by a specific number of places. | |
Shifts a value right by a specific number of places. | |
Shifts an unsigned value right by a specific number of places. | |
Converts a hexadecimal string into a string. | |
Returns the ID of the bucket into which the value of a specific expression falls. |
Window functions
The following table describes window functions that are provided by MaxCompute SQL for you to flexibly analyze and process data of specific columns in a window.
Function | Description |
Calculates the sequence number of a row. The row number starts from 1. | |
Calculates the rank of a row in an ordered group of rows. The ranks may not be consecutive integers. | |
Calculates the rank of a row in an ordered group of rows. The ranks are consecutive integers. | |
Calculates the percentile rank of a row in an ordered group of rows. | |
Calculates the cumulative distribution of data in a partition. | |
Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the current row belongs. The group number ranges from 1 to N. | |
Obtains the calculated result of the Nth row of data that precedes the current row at a given offset in a window. | |
Obtains the calculated result of the Nth row of data that follows the current row at a given offset in a window. | |
Obtains the calculated result of the first row of data in the window to which the current row belongs. | |
Obtains the calculated result of the last row of data in the window to which the current row belongs. | |
Obtains the calculated result of the Nth row of data in a window to which the current row belongs. | |
Samples random rows of data. If true is returned, the specified row of data is sampled. | |
Calculates the number of rows in a window. | |
Calculates the minimum value in a window. | |
Calculates the maximum value in a window. | |
Calculates the average value of data in a window. | |
Calculates the sum of data in a window. | |
Calculates the median in a window. | |
Returns the population standard deviation of all input values. This function is also called STDDEV_POP. | |
Returns the sample standard deviation of all input values. |
Syntax
Syntax of window functions:
<function_name>([distinct][<expression> [, ...]]) over (<window_definition>) <function_name>([distinct][<expression> [, ...]]) over <window_name>
function_name: the name of a built-in window function, aggregate function, or user-defined aggregate function (UDAF).
expression: the format of a window function. The format is subject to the function syntax.
windowing_definition: the definition of a window. For more information about the syntax, see windowing_definition.
window_name: the name of a window. You can use the
window
keyword to configure a window and use windowing_definition to specify the name of the window. Syntax of named_window_def:window <window_name> as (<window_definition>)
Position of named_window_def in an SQL statement:
select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]
windowing_definition
Syntax
--partition_clause: [partition by <expression> [, ...]] --orderby_clause: [order by <expression> [asc|desc][nulls {first|last}] [, ...]] [<frame_clause>]
If you use a window function in a SELECT statement, data is partitioned and sorted based on
PARTITION BY
andORDER BY
in windowing_definition when the window function is executed. If the SELECT statement does not includePARTITION BY
, only one partition exists and the partition contains all data. If the SELECT statement does not includeORDER BY
, data in a partition is arranged in a random order, and a data stream is generated. After the data stream is generated, a group of rows is extracted from the data stream based onframe_clause
in windowing_definition to create a window for the current row. The window function calculates the data included in the window to which the current row belongs.partition by <expression> [, ...]: optional. This parameter specifies the partition information. If the values of partition key columns are the same for a group of rows, these rows are included in the same window. For more information about the format of PARTITION BY, see Table operations.
order by <expression> [asc|desc][nulls {first|last}] [, ...]: optional. This parameter specifies how to sort rows of data in a window.
NoteIf the values of the column that is specified in
order by
are the same, the sorting result may not be accurate. To reduce the random ordering of data, make sure that the values of the column that is specified inorder by
are unique.frame_clause: optional. This parameter is used to determine the data boundaries of a window. The frame_clause section in this topic provides details about this parameter.
frame_clause
Syntax
-- Syntax 1 {ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>] -- Syntax 2 {ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]
frame_clause is a closed interval that is used to determine the data boundaries of a window. The data boundaries are determined based on the rows that are specified by frame_start and frame_end.
ROWS|RANGE|GROUPS: required. ROWS, RANGE, and GROUPS indicate the types of frame_clause. The implementation rules of frame_start and frame_end vary based on the type of frame_clause. Take note of the following points:
ROWS: The data boundaries of a window are determined based on the number of rows.
RANGE: The data boundaries of a window are determined based on the comparison results of the values of the column that is specified in
order by
. In most cases,order by
is specified in windowing_definition. Iforder by
is not specified in windowing_definition, the values of the column that is specified inorder by
are the same for all rows in a partition. NULL values are considered equivalent.GROUPS: In a partition, rows that have the same value of the column specified in
order by
form a group. Iforder by
is not specified, all rows in the partition form a group. NULL values are considered equivalent.
frame_start and frame_end: the start and end rows of a window. frame_start is required. frame_end is optional. If frame_end is not specified, the default value CURRENT ROW is used.
The row specified by frame_start must precede or be the same as the row specified by frame_end. Compared with the row specified by frame_end, the row specified by frame_start is closer to the first row in a window after all data in the window is sorted based on the column that is specified in
order by
of windowing_definition. The following table describes the valid values and logic of frame_start and frame_end when the type of frame_clause is ROWS, RANGE, or GROUPS.frame_clause type
Value of frame_start or frame_end
Description
ROWS, RANGE, and GROUPS
UNBOUNDED PRECEDING
Indicates the first row of a partition. Rows are counted from 1.
UNBOUNDED FOLLOWING
Indicates the last row of a partition.
ROWS
CURRENT ROW
Indicates the current row. Each row of data corresponds to a result calculated by a window function. The current row indicates the row whose data is calculated by using a window function.
offset PRECEDING
Indicates the Nth row that precedes the current row at a given
offset
. For example, if0 PRECEDING
indicates the current row,1 PRECEDING
indicates the previous row.offset
must be a non-negative integer.offset FOLLOWING
Indicates the Nth row that follows the current row at a given
offset
. For example, if0 FOLLOWING
indicates the current row,1 FOLLOWING
indicates the next row.offset
must be a non-negative integer.RANGE
CURRENT ROW
If frame_start is set to CURRENT ROW, it indicates the first row that has the same value of the column specified in
order by
as the current row.If frame_end is set to CURRENT ROW, it indicates the last row that has the same value of the column specified in
order by
as the current row.
offset PRECEDING
The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by
order by
. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:order by
is set to asc:frame_start indicates the first row that meets the following requirement:
Xc - Xi ≤ offset
.frame_end indicates the last row that meets the following requirement:
Xc - Xi ≥ offset
.
order by
is set to desc:frame_start indicates the first row that meets the following requirement:
Xi - Xc ≤ offset
.frame_end indicates the last row that meets the following requirement:
Xi - Xc ≥ offset
.
The column that is specified in
order by
can be of the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP.Syntax for
offset
of the DATE type:N
: indicates N days or N seconds. It must be a non-negative integer. For an offset of the DATETIME or TIMESTAMP type, it indicates N seconds. For an offset of the DATE type, it indicates N days.interval 'N' {YEAR\MONTH\DAY\HOUR\MINUTE\SECOND}
: indicates N years, months, days, hours, minutes, or seconds. For example,INTERVAL '3' YEAR
indicates 3 years.INTERVAL 'N-M' YEAR TO MONTH
: indicates N years and M months. For example,INTERVAL '1-3' YEAR TO MONTH
indicates 1 year and 3 months.INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND
: indicates D days, H hours, M minutes, S seconds, and N nanoseconds. For example,INTERVAL '1 2:3:4:5' DAY TO SECOND
indicates 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.
offset FOLLOWING
The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by
order by
. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end:order by
is set to asc:frame_start indicates the first row that meets the following requirement:
Xi - Xc ≥ offset
.frame_end indicates the last row that meets the following requirement:
Xi - Xc ≤ offset
.
order by
is set to desc:frame_start indicates the first row that meets the following requirement:
Xc - Xi >= offset
.frame_end indicates the last row that meets the following requirement:
Xc - Xi <= offset
.
GROUPS
CURRENT ROW
If frame_start is set to CURRENT ROW, it indicates the first row of the group to which the current row belongs.
If frame_end is set to CURRENT ROW, it indicates the last row of the group to which the current row belongs.
offset PRECEDING
If frame_start is set to offset PRECEDING, it indicates the first row of the Nth group that precedes the group of the current row at a given
offset
.If frame_end is set to offset PRECEDING, it indicates the last row of the Nth group that precedes the group of the current row at a given
offset
.
NoteYou cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.
offset FOLLOWING
If frame_start is set to offset FOLLOWING, it indicates the first row of the Nth group that follows the group of the current row at a given
offset
.If frame_end is set to offset FOLLOWING, it indicates the last row of the Nth group that follows the group of the current row at a given
offset
.
NoteYou cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNED PRECEDING.
frame_exclusion: optional. This parameter is used to remove specific rows from a window. Valid values:
EXCLUDE NO OTHERS: No rows are excluded from the window.
EXCLUDE CURRENT ROW: The current row is excluded from the window.
EXCLUDE GROUP: indicates that an entire group of rows in a partition is excluded from the window. In the group, all rows have the same value of the column that is specified in
order by
as the current row.EXCLUDE TIES: An entire group of rows, except for the current row, are excluded from the window.
Default frame_clause
If you do not specify frame_clause, MaxCompute uses the default frame_clause to determine the data boundaries of a window. Values of the default frame_clause:
If
odps.sql.hive.compatible
is set to true, the following default frame_clause is used. This rule applies to most SQL systems.RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
If
odps.sql.hive.compatible
is set to false,order by
is specified in windowing_definition, and one of the following window functions is used, the default frame_clause in ROWS mode is used: AVG, COUNT, MAX, MIN, STDDEV, STEDEV_POP, STDDEV_SAMP, and SUM.ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
Example of data boundaries of a window
In this example, a table named tbl contains three columns that are of the BIGINT type:
pid, oid, and rid
. The tbl table contains the following data:+------------+------------+------------+ | pid | oid | rid | +------------+------------+------------+ | 1 | NULL | 1 | | 1 | NULL | 2 | | 1 | 1 | 3 | | 1 | 1 | 4 | | 1 | 2 | 5 | | 1 | 4 | 6 | | 1 | 7 | 7 | | 1 | 11 | 8 | | 2 | NULL | 9 | | 2 | NULL | 10 | +------------+------------+------------+
You can replace ellipses (
...
) in the following SQL statements with windowing_definition to display the data in the windows in which each row of data is included.NoteIf a value in the window column in the returned result is NULL, no data is contained in the window.
Windows in ROWS mode
windowing_definition 1
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
windowing_definition 2
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
windowing_definition 3
partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between 1 FOLLOWING and 3 FOLLOWING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [2, 3, 4] | | 1 | NULL | 2 | [3, 4, 5] | | 1 | 1 | 3 | [4, 5, 6] | | 1 | 1 | 4 | [5, 6, 7] | | 1 | 2 | 5 | [6, 7, 8] | | 1 | 4 | 6 | [7, 8] | | 1 | 7 | 7 | [8] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [10] | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+
windowing_definition 4
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | [1] | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2, 3] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | [9] | +------------+------------+------------+--------+
windowing_definition 5
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE GROUP) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | NULL | | 1 | NULL | 2 | NULL | | 1 | 1 | 3 | [1, 2] | | 1 | 1 | 4 | [1, 2] | | 1 | 2 | 5 | [1, 2, 3, 4] | | 1 | 4 | 6 | [1, 2, 3, 4, 5] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7] | | 2 | NULL | 9 | NULL | | 2 | NULL | 10 | NULL | +------------+------------+------------+--------+
windowing_definition 6
partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE TIES) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1] | | 1 | NULL | 2 | [2] | | 1 | 1 | 3 | [1, 2, 3] | | 1 | 1 | 4 | [1, 2, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9] | | 2 | NULL | 10 | [10] | +------------+------------+------------+--------+
The differences between EXCLUDE CURRENT ROW and EXCLUDE GROUP can be obtained based on the comparison between the
window
column values of rows with therid
column values of 2, 4, and 10 in windowing_definition 5 and windowing_definition 6. If frame_exclusion is set to EXCLUDE GROUP, the rows that have the samepid
column value in a partition are extracted when the rows have the sameoid
column value as the current row.
Windows in RANGE mode
windowing_definition 1
partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between UNBOUNDED PRECEDING and CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [1, 2, 3, 4, 5, 6] | | 1 | 7 | 7 | [1, 2, 3, 4, 5, 6, 7] | | 1 | 11 | 8 | [1, 2, 3, 4, 5, 6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
If frame_end is set to CURRENT ROW, the last row that has the same value of the
oid
column inorder by
as the current row is obtained. Therefore, thewindow
column value of the row whoserid
column value is 1 is [1, 2].windowing_definition 2
partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between CURRENT ROW and UNBOUNDED FOLLOWING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | NULL | 2 | [1, 2, 3, 4, 5, 6, 7, 8] | | 1 | 1 | 3 | [3, 4, 5, 6, 7, 8] | | 1 | 1 | 4 | [3, 4, 5, 6, 7, 8] | | 1 | 2 | 5 | [5, 6, 7, 8] | | 1 | 4 | 6 | [6, 7, 8] | | 1 | 7 | 7 | [7, 8] | | 1 | 11 | 8 | [8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
windowing_definition 3
partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid RANGE between 3 PRECEDING and 1 PRECEDING) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | NULL | | 1 | 1 | 4 | NULL | | 1 | 2 | 5 | [3, 4] | | 1 | 4 | 6 | [3, 4, 5] | | 1 | 7 | 7 | [6] | | 1 | 11 | 8 | NULL | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
For the row whose value of
oid
inorder by
is NULL, if frame_start is set tooffset PRECEDING
or offset FOLLOWING, the row is the first row whose value of oid inorder by
is NULL only if theoffset
is not unbounded. If frame_end is set to offset PRECEDING or offset FOLLOWING, the row is the last row whose value of oid inorder by
is NULL only if the offset is not unbounded.
Windows in GROUPS mode
windowing_definition
partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW -- Sample SQL statement: select pid, oid, rid, collect_list(rid) over(partition by pid order by oid GROUPS between 2 PRECEDING and CURRENT ROW) as window from tbl;
The following result is returned:
+------------+------------+------------+--------+ | pid | oid | rid | window | +------------+------------+------------+--------+ | 1 | NULL | 1 | [1, 2] | | 1 | NULL | 2 | [1, 2] | | 1 | 1 | 3 | [1, 2, 3, 4] | | 1 | 1 | 4 | [1, 2, 3, 4] | | 1 | 2 | 5 | [1, 2, 3, 4, 5] | | 1 | 4 | 6 | [3, 4, 5, 6] | | 1 | 7 | 7 | [5, 6, 7] | | 1 | 11 | 8 | [6, 7, 8] | | 2 | NULL | 9 | [9, 10] | | 2 | NULL | 10 | [9, 10] | +------------+------------+------------+--------+
Aggregate functions
The following table describes the aggregate functions that are supported by MaxCompute SQL.
Function | Description |
Returns the average value of a column. | |
Returns the number of records that match the specified criteria. | |
Returns the number of records whose expr value is True. | |
Returns the maximum value of a column. | |
Returns the minimum value of a column. | |
Returns the median value of a column. | |
Returns the population standard deviation of all input values. | |
Returns the sample standard deviation of all input values. | |
Returns the sum of a column. | |
Concatenates strings with a specified delimiter. | |
Returns a random value from a specified column. | |
Returns the approximate number of distinct input values in a specified column. | |
Returns the column value of the row that corresponds to the maximum value of a specified column. | |
Returns the column value of the row that corresponds to the minimum value of a specified column. | |
Returns the column value of the row that corresponds to the maximum value of a specified column. | |
Returns the column value of the row that corresponds to the minimum value of a specified column. | |
Aggregates values from a specified column into an array. | |
Aggregates only distinct values from a specified column into an array. | |
Returns the approximate histogram of a specified column. | |
Returns approximate percentiles. This function applies to scenarios in which a large amount of data is calculated. | |
Aggregates input values based on the bitwise OR operation. | |
Aggregates input values based on the bitwise AND operation. | |
Returns a map that is created by using a and b. a is the key in the map. b is the value of the key in the map. | |
Returns a map that is created by using a and b. a is the key in the map. b is used to create an array, which is used as the value of the key in the map. | |
Returns a new map that is the union of all input maps. | |
Returns a new map that is the union of all input maps. The output map sums the values of the matching keys in all input maps. | |
Returns a map that contains the number of times each input value appears. |
Syntax
Syntax of an aggregate function:
<aggregate_name>(<expression>[,..]) [within group (order by <col1>[,<col2>…])] [filter (where <where_condition>)]
<aggregate_name>(<expression>[,..])
: a built-in aggregate function or a UDAF. The format of an aggregate function is based on its syntax.within group (order by <col1>[,<col2>…])
: If the syntax of an aggregate function includes this expression, the system automatically sorts the input data of<col1>[,<col2>…]
in ascending order. To sort the input data in descending order, use the expressionwithin group (order by <col1>[,<col2>…] [desc])
.Before you use this expression, take note of the following points:
You can use this expression only for WM_CONCAT, COLLECT_LIST, COLLECT_SET, and UDAFs.
If multiple aggregate functions of a SELECT statement include the expression
within group (order by <col1>[,<col2>…])
,order by <col1>[,<col2>…]
must be the same for these functions.If the parameters of an aggregate function include the DISTINCT keyword, columns with distinct values must be specified in the expression
order by <col1>[,<col2>…]
.
Examples:
-- Example 1: Sort the input data in ascending order and return the output data. select x, wm_concat(',', y) within group (order by y) from values('k', 1),('k', 3),('k', 2) as t(x, y) group by x; -- The following result is returned: +------------+------------+ | x | _c1 | +------------+------------+ | k | 1,2,3 | +------------+------------+ -- Example 2: Sort the input data in descending order and return the output data. select x, wm_concat(',', y) within group (order by y desc) from values('k', 1),('k', 3),('k', 2) as t(x, y) group by x; -- The following result is returned: +------------+------------+ | x | _c1 | +------------+------------+ | k | 3,2,1 | +------------+------------+
[filter (where <where_condition>)]
: If an aggregate function includes this expression, the aggregate function processes only the data that meets the condition specified by<where_condition>
. For more information about<where_condition>
, see WHERE clause (where_condition).Before you use this expression, take note of the following points:
Only built-in aggregate functions support this expression. UDAFs do not support this expression.
count(*)
does not support the[filter (where <where_condition>)]
expression. To add filter conditions tocount(*)
, you can use the COUNT_IF function.The COUNT_IF function does not support this expression.
Examples:
-- Example 1: Filter and aggregate data. select sum(x), sum(x) filter (where y > 1), sum(x) filter (where y > 2) from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y); -- The following result is returned: +------------+------------+------------+ | _c0 | _c1 | _c2 | +------------+------------+------------+ | 6 | 3 | 2 | +------------+------------+------------+ -- Example 2: Use multiple aggregate functions to filter and aggregate data. select count_if(x > 2), sum(x) filter (where y > 1), sum(x) filter (where y > 2) from values(null, 1),(1, 2),(2, 3),(3, null) as t(x, y); -- The following result is returned: +------------+------------+------------+ | _c0 | _c1 | _c2 | +------------+------------+------------+ | 1 | 3 | 2 | +------------+------------+------------+
Filter expressions
Limits
Only built-in aggregate functions of MaxCompute support filter expressions. UDAFs do not support filter expressions.
COUNT(*)
cannot be used with filter expressions. Use the COUNT_IF function with filter expressions.
Syntax
<aggregate_name>(<expression>[,...]) [filter (where <where_condition>)]
Description
All aggregate functions support filter expressions. If you specify a filter condition, only the row data that meets the filter condition can be passed to the related aggregate function for data processing.
Parameters
aggregate_name: required. The name of the aggregate function. Select an aggregate function that is described in this topic based on your business requirements.
expression: required. The parameters of the aggregate function that you select. Specify this parameter based on the description of the aggregate function that you select.
where_condition: optional. The filter condition. For more information about where_condition, see WHERE clause (where_condition).
Return value
For more information, see the description of the return value for each aggregate function.
Sample statement:
select sum(sal) filter (where deptno=10), sum(sal) filter (where deptno=20), sum(sal) filter (where deptno=30) from emp;
The following result is returned:
+------------+------------+------------+ | _c0 | _c1 | _c2 | +------------+------------+------------+ | 17500 | 10875 | 9400 | +------------+------------+------------+
String functions
The following table describes the string functions that are supported by MaxCompute SQL. For more information about the limits on string functions, see Limits on string functions.
Function | Description |
Returns the ASCII code of the first character in a specified string. | |
Calculates the number of characters of String A that appear in String B. | |
Converts an ASCII code into a character. | |
Concatenates all the specified strings and returns the final string. | |
Concatenates all input strings in an array by using a specified delimiter. | |
Decodes a string in the specified encoding format. | |
Encodes a string in the specified encoding format. | |
Returns the position of the specified string among multiple strings that are separated by commas (,). | |
Converts a number into a string in the specified format. | |
Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format. | |
Extracts a single string from a standard JSON string by using a specific method. | |
Converts a string into a string with words in title caps and separated by spaces. In title caps, the first letter of each word is capitalized, and the other letters are in lowercase. | |
Returns the position of String A in String B. | |
Determines whether a string can be converted from one character set to another character set. | |
Splits a string into key-value pairs, separates the key-value pairs, and then returns the value that corresponds to a specified key. | |
Splits a string into key-value pairs, separates the key-value pairs, and then returns the values that correspond to the keys. | |
Calculates the length of a string. | |
Calculates the length of a string in bytes. | |
Returns the position of a specified string in another string. | |
Removes the characters from the left side of a string. | |
Returns the MD5 value of a string. | |
Parses a URL and returns the specified part of the URL. | |
Parses a URL and returns multiple parts of the URL. | |
Returns the number of substrings that match a specified pattern from a specified position. | |
Splits a string into groups based on a specified pattern and returns the string in a specified group. | |
Returns the start or end position of a substring that starts at a specified position and matches a specified pattern for a specified number of times. | |
Uses a string to replace a substring of another string if the substring matches a specified pattern for a specified number of times. | |
Returns a substring in a string that matches a specified pattern for a specified number of times from a specified position. | |
Returns a string that repeats a specified string for a specified number of times. | |
Returns a string in reverse order. | |
Removes the characters from the right side of a string. | |
Generates a space string. | |
Uses a delimiter to split a string into substrings and returns a substring of the specified part of the string. | |
Returns a substring that has a specified length from a specified position of a string. The string is of the STRING type. | |
Returns a substring that has a specified length from a specified position of a string. The string is of the STRING or BINARY type. | |
Converts data of the BOOLEAN, BIGINT, DECIMAL, or DOUBLE type into the STRING type. | |
Converts data of a complex data type into a JSON string. | |
Converts uppercase letters in a string into lowercase letters. | |
Converts lowercase letters in a string into uppercase letters. | |
Removes the characters from both of the left and right sides of a string. | |
Converts an input string that is in the | |
Encodes the input string in the | |
Extracts strings from a standard JSON string based on a set of input keys. | |
Left pads a string to a specified length. | |
Right pads a string to a specified length. | |
Replaces a substring in String A that matches String B with another substring. | |
Converts a string into a string of the SOUNDEX type. | |
Truncates a string from a specified delimiter. | |
Replaces the part of String A that appears in String B with String C. | |
Finds all substrings that match the pattern of a regular expression in a string and returns the substrings as an array. |
Complex type functions
The following table describes the complex type functions that are supported by MaxCompute SQL. For more information about the limits on JSON functions, see Limits on JSON functions.
Function type | Function | Description |
ARRAY functions | Checks whether all elements in an array meet a specific condition. | |
Checks whether an element in an array meets a specific condition. | ||
Creates an array based on given values. | ||
Checks whether an array contains a given value. | ||
Removes duplicate elements from an array. | ||
Finds the elements that exist in Array A but do not exist in Array B and returns the elements as a new array without duplicates. | ||
Calculates the intersection of two arrays. | ||
Concatenates the elements in an array by using a delimiter. | ||
Returns the largest element in an array. | ||
Returns the smallest element in an array. | ||
Returns the position of the first occurrence of a given element in an array. | ||
Aggregates the elements in an array. | ||
Removes a given element from an array. | ||
Returns a new array in which a given element is repeated for a specified number of times. | ||
Sorts the elements in an array based on a comparator. | ||
Calculates the union of two arrays and returns the union as a new array without duplicates. | ||
Checks whether two arrays contain the same element. | ||
Merges multiple arrays. | ||
Concatenates multiple arrays or strings. | ||
Transposes one row of data into multiple rows. This function is a user-defined table-valued function (UDTF). | ||
Filters the elements in an array. | ||
Returns the element at a specific position in an array. | ||
Converts an array into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements. | ||
Returns the number of elements in an array. | ||
Copies the elements in an array from a specific position based on a specific length and returns the elements as a new array. | ||
Sorts the elements in an array. | ||
Transforms the elements in an array. | ||
Merges two arrays at the element level based on element positions and returns a new array. | ||
MAP functions | Transposes one row of data into multiple rows. This function is a UDTF. | |
Returns the value that meets a specific condition in a map. | ||
Creates a map based on given key-value pairs. | ||
Returns the union of multiple maps. | ||
Converts key-value pairs in a map into a struct array. | ||
Filters the elements in a map. | ||
Creates a map based on given arrays. | ||
Creates a map based on given struct arrays. | ||
Returns all keys in a map as an array. | ||
Returns all values in a map as an array. | ||
Merges two given maps into a single map. | ||
Returns the number of key-value pairs in a map. | ||
Transforms the keys in a map by using a given function. The values in the map are not changed. | ||
Transforms the values in a map by using a given function. The keys in the map are not changed. | ||
STRUCT functions | Obtains the value of a member variable in a struct. | |
Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row. | ||
Creates a struct based on a given value list. | ||
Creates a struct based on given name-value pairs. | ||
JSON functions | Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format. | |
Extracts a single string from a standard JSON string by using a specific method. | ||
Extracts strings from a standard JSON string based on a set of input keys. | ||
Converts data of a complex data type into a JSON string. | ||
Returns a JSON object that contains key-value pairs. | ||
Evaluates a possibly empty list of values and returns a JSON array that contains these values. | ||
Parses the value of json_path in a JSON expression. Note that an error is returned if the value of json_path is invalid. | ||
Determines whether the JSON value of json_path exists. | ||
Returns a JSON value in a format that is easy to read by adding line breaks and spaces. | ||
Returns the data type of a JSON value. | ||
Converts a value of the JSON data type into a value of the STRING data type. By default, JSON data is not automatically prettified. | ||
Converts a value of the STRING data type into a value of the JSON data type. If a non-JSON-formatted value is converted into a value of the STRING data type, an error is returned. | ||
Determines whether a string is in a valid JSON format. | ||
Supports conversion between basic data types and JSON data types. |
Encryption and decryption functions
The following table describes the encryption function and decryption function that are supported by MaxCompute SQL.
Function | Description |
Encrypts data in specified columns of a table by using a random key and returns the ciphertext of the BINARY type. | |
Decrypts the encrypted data in the specified columns and returns the plaintext of the BINARY type. |
Other functions
The following table describes other types of functions that are supported by MaxCompute SQL.
Function | Description |
Converts a binary value into a Base64-encoded string. | |
Returns the values that fall in or fall out of the specified range. | |
Returns values based on the computing result of an expression. | |
Converts the result of an expression into the specified data type. | |
Returns the first non-null value in the parameter list. | |
Uses the GZIP algorithm to compress input parameters of the STRING or BINARY type. | |
Calculates the cyclic redundancy check (CRC) value of a value that is of the STRING or BINARY type. | |
Implements the | |
Uses the GZIP algorithm to decompress input parameters of the BINARY type. | |
Returns an age in years based on the ID card number. | |
Returns the date of birth based on the ID card number. | |
Returns the gender based on the ID card number. | |
Obtains the ID of the current account. | |
Returns the maximum value of the input parameters. | |
Calculates a hash value based on the input parameters. | |
Checks whether a specified condition is true. | |
Returns the minimum value of the input parameters. | |
Returns the name of the largest level-1 partition in a partitioned table. | |
Checks whether the values of two input parameters are the same. | |
Specifies the return values of the parameters whose values are null. | |
Sorts the values of the input variables in ascending order and returns the value that is ranked at a specified position. | |
Checks whether a specified partition exists in a table. | |
Samples all column values that are read and filters out the rows that do not meet sampling conditions. | |
Calculates the SHA-1 hash value of a value that is of the STRING or BINARY type. | |
Calculates the SHA-1 hash value of a value that is of the STRING or BINARY type. | |
Calculates the SHA-2 hash value of a value that is of the STRING or BINARY type. | |
Determines the sign of a value. The sign indicates whether a value is positive or negative. | |
Splits a string with a specified delimiter and returns an array. | |
Splits a specified parameter group into a specified number of rows. | |
Splits a string with a specified delimiter and returns a key-value pair. | |
Checks whether a specified table exists. | |
Transposes one row of data into multiple rows. This function is a UDTF that transposes an array separated by fixed delimiters in a column into multiple rows. | |
Transposes one row of data into multiple rows. This function is a UDTF that transposes columns into rows. | |
Converts a Base64-encoded string into a binary value. | |
Returns a unique ID. This function is more efficient than the UUID function. | |
Returns a random ID. |
Limits on JSON functions
The development tools that are supported for the JSON type include odpscmd and MaxCompute Studio. External ecosystems such as DataWorks and Dataphin are not supported. If you need to use the JSON type together with an external ecosystem, you must check whether the external ecosystem is supported for the JSON type. The following table describes the items that you must take note of when you use odpscmd and MaxCompute Studio as development tools for the JSON type.
odpscmd
MaxCompute Studio
You must upgrade the client to the latest version. Otherwise, the
desc json_table
command cannot be used.You must set the
se_instance_tunnel
parameter in the conf\odps_config.ini file in the installation path of the client to false. Otherwise, an error is reported when you perform a query.
MaxCompute Studio allows you only to query JSON data. It does not allow you to upload or download JSON data.
If you want to use another type of engine, such as Hologres, to read data from a MaxCompute table, JSON data in the table cannot be read.
Columns of the JSON type cannot be added to a MaxCompute table.
You are not allowed to compare data of the JSON type with data of other types, execute SQL statements that contain the
ORDER BY
orGROUP BY
clause on data of the JSON type, or use columns of the JSON type asjoin
keys.The integer part and decimal part of a value of the JSON NUMBER type are stored by using the BIGINT type and the DOUBLE type separately. If the integer part exceeds the range that is supported by the BIGINT type, an integer overflow error occurs. Conversion of the decimal part to the DOUBLE type leads to precision loss.
Strings that can be used to generate JSON data do not include
\u0000
, which is the Unicode character for NULL.Java UDFs and Python UDFs do not support the JSON type.
Limits on string functions
The following functions support only English characters:
TRIM, RTRIM, and LTRIM: The value of the trimChars parameter can contain only English characters.
REVERSE: This function supports only English characters in the Hive-compatible data type edition.
SOUNDEX: This function supports only English characters.
TOLOWER: This function is used to convert English characters in a string into lowercase characters.
TOUPPER: This function is used to convert English characters in a string into uppercase characters.
INITCAP: This function is used to convert the first letter of each word in English in a string into an uppercase letter and the other letters of each word into lowercase characters.