Overview - MaxCompute - Alibaba Cloud Documentation Center

MaxCompute provides a large number of built-in functions to meet data processing requirements in most business scenarios. This topic describes the types of built-in functions that are provided by MaxCompute. This topic also describes how to use the built-in functions.

Background information

For more information about the precautions you must take into account when you use built-in functions provided by MaxCompute, see Precautions.

The following table describes the types of built-in functions that are provided by MaxCompute.

Type	Description
Date functions	Used to process data of a date type, such as DATE, DATETIME, or TIMESTAMP. For example, you can use these functions to add and subtract date values, calculate date value differences, extract date fields, obtain the current time, and convert date formats.
Mathematical functions	Used to process data of a numeric type, such as BIGINT, DOUBLE, DECIMAL, or FLOAT. For example, you can use these functions to convert numeral systems, perform mathematical operations, round values, and obtain random numbers.
Window functions	Used to process the data of columns in a window. For example, you can use these functions to calculate the sum, maximum value, minimum value, average value, and median value of column data, sort column data, obtain the data of columns at a given offset, and sample column data.
Aggregate functions	Used to aggregate multiple input records into an output value. For example, you can use these functions to calculate the sum, maximum value, minimum value, and average value of data, aggregate parameters, and concatenate strings.
String functions	Used to process data of the STRING type. For example, you can use these functions to truncate strings, replace strings, search for strings, convert uppercase and lowercase letters, and convert string formats.
Complex type functions	Used to process data of the MAP, ARRAY, STRUCT, or JSON type. For example, you can use these functions to deduplicate, aggregate, sort, and merge elements.
Encryption and decryption functions	Used to encrypt and decrypt data of the STRING or BINARY type in a table.
Other functions	Used to process data in other business scenarios.

For more information about typical cases, error codes, and FAQs of built-in functions that are provided by MaxCompute, see Fix the precision issue of the ROUND function, Implement capabilities provided by the GROUP_CONCAT function, Common errors for built-in functions, and FAQ about built-in functions.

Precautions

When you use built-in functions that are provided by MaxCompute, take note of the following items:

For a built-in function, the types and number of input parameters and function format must meet the function syntax requirements. If the function syntax requirements are not met, MaxCompute cannot parse the built-in function and an error may occur when you execute the SQL statement in which the built-in function is called.
If an input parameter of a built-in function is of a data type that is introduced in the MaxCompute V2.0 data type edition, you must enable the MaxCompute V2.0 data type edition. These data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY. Otherwise, an error may occur when you execute the SQL statement in which the built-in function is called. You can enable the MaxCompute V2.0 data type edition at the session or project level.
- Session level: Add set odps.sql.type.system.odps2=true; before the SQL statement in which a built-in function is called. Then, commit and execute them together. This configuration is valid for only the current SQL statement.
- Project level: The owner of a project can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. This configuration is valid for all the subsequent SQL statements.
```
setproject odps.sql.type.system.odps2=true;
```
If you enable the MaxCompute V2.0 data type edition for a project, some implicit conversions are disabled, such as the conversions from STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE, and DECIMAL to BIGINT. This may cause a loss of precision or errors. In this case, you can use the CAST function to forcefully convert the data types to resolve these issues. You can also disable the MaxCompute V2.0 data type edition.
If the name of a UDF is the same as that of a built-in function, the UDF is preferentially called. For example, if UDF CONCAT and built-in function CONCAT both exist in MaxCompute, the system automatically calls UDF CONCAT instead of the built-in function CONCAT. If you want to call the built-in function, you must add the symbol :: before the built-in function. For example, you can use select ::concat('ab', 'c');.
If the settings of global properties of MaxCompute projects are different, the execution results of built-in functions may be different. You can run the setproject; command to view the global properties of a MaxCompute project.
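
The following statements are a minimal sketch of the session-level setting, the explicit CAST workaround, and the :: prefix described in the preceding items. The literal values are illustrative only, and the last statement only matters if a UDF named concat exists in the project, which is an assumption made here.

```sql
-- Enable the MaxCompute V2.0 data type edition for this statement only.
-- Submit the SET command together with the query that follows it.
set odps.sql.type.system.odps2=true;
-- Convert STRING to BIGINT explicitly instead of relying on the
-- implicit conversion that the V2.0 data type edition disables.
select cast('100' as bigint) + 1 as explicit_cast;

-- Call the built-in CONCAT function even if a UDF with the same name exists.
select ::concat('ab', 'c');
```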

For more information about the mappings between the built-in functions of MaxCompute and the built-in functions of open source systems, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.

Date functions

The following table describes date functions provided by MaxCompute SQL. You can select an appropriate date function based on your business requirements to complete date calculations and conversions.

Function	Description
DATEADD	Changes a date value based on the time unit specified by datepart and the interval specified by delta.
DATE_ADD	Adds or subtracts a number of days to or from a date value based on the interval specified by delta. The DATE_ADD function is the inverse of the `DATE_SUB` function.
DATE_FORMAT	Converts a date value into a string in a specified format.
DATE_SUB	Adds or subtracts a number of days to or from a date value based on the interval specified by delta. The DATE_SUB function is the inverse of the `DATE_ADD` function.
DATEDIFF	Calculates the difference between two date values based on the time unit specified by datepart.
DATEPART	Returns a specified component of a date value based on the time unit specified by datepart.
DATETRUNC	Truncates a date value based on the time unit specified by datepart.
FROM_UNIXTIME	Converts a UNIX timestamp of the BIGINT type into a date value of the DATETIME type.
GETDATE	Returns the current system time as a date value.
ISDATE	Determines whether a date string can be converted into a date value in a specified format.
LASTDAY	Returns the last day of the month in which a date value falls.
TO_DATE	Converts a string into a date value in a specified format.
TO_CHAR	Converts a date value into a string in a specified format.
UNIX_TIMESTAMP	Converts a date value into a UNIX timestamp that is an integer.
WEEKDAY	Returns a number that represents the day of the week in which a date value falls.
WEEKOFYEAR	Returns a number that represents the week of the year in which a date value falls.
ADD_MONTHS	Returns a date value that is obtained after a number of months are added to a specified date.
CURRENT_TIMESTAMP	Returns the current timestamp.
CURRENT_TIMEZONE	Returns the time zone of the current system.
DAY	Returns the day in which a date value falls.
DAYOFMONTH	Returns the day component of a date value.
DAYOFWEEK	Returns the day of the week in which a date value falls.
DAYOFYEAR	Returns the sequence number of the day in the year.
EXTRACT	Returns a specified component of a timestamp.
FROM_UTC_TIMESTAMP	Converts a UTC timestamp into a timestamp for a specified time zone.
HOUR	Returns the hour component of a date value.
LAST_DAY	Returns the last day of the month in which a date value falls.
MINUTE	Returns the minute component of a date value.
MONTH	Returns the month in which a date value falls.
MONTHS_BETWEEN	Returns the number of months between specified date values.
NEXT_DAY	Returns the date of the first day that is later than a date value and matches the specified day of the week.
QUARTER	Returns the quarter in which a date value falls.
SECOND	Returns the second component of a date value.
TO_MILLIS	Converts a date value into a UNIX timestamp that is accurate to the millisecond.
YEAR	Returns the year in which a date value falls.
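
As a rough illustration of how several of these date functions combine, the following query uses GETDATE as its only input so that it does not depend on any table. The column aliases are hypothetical, and the datepart and format strings ('dd', 'mm', 'yyyy-mm-dd') are assumed to follow the conventions described in the reference topics for each function.

```sql
select getdate()                                              as now_ts,         -- current system time
       dateadd(getdate(), 7, 'dd')                            as one_week_later, -- add 7 days
       datediff(dateadd(getdate(), 7, 'dd'), getdate(), 'dd') as day_diff,       -- difference in days
       datetrunc(getdate(), 'mm')                             as month_start,    -- truncate to the month
       lastday(getdate())                                     as month_end,      -- last day of the month
       to_char(getdate(), 'yyyy-mm-dd')                       as date_string;    -- format as a string
```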

Mathematical functions

The following table describes mathematical functions that are provided by MaxCompute SQL for you to use during development. You can select mathematical functions based on your business requirements to compute data or convert data types.

Note

For more information about operators, such as the operator that is used to calculate remainders, see Arithmetic operators.

Function	Description
ABS	Calculates the absolute value.
ACOS	Calculates the arccosine.
ASIN	Calculates the arcsine.
ATAN	Calculates the arctangent.
ATAN2	Calculates the arctangent of expr1/expr2.
CEIL	Rounds up a number and returns the nearest integer.
CONV	Converts a number from one number system to another.
COS	Calculates the cosine.
COSH	Calculates the hyperbolic cosine.
COT	Calculates the cotangent.
EXP	Calculates the exponential value.
FLOOR	Rounds down a number and returns the nearest integer.
ISNAN	Checks whether the value of an expression is NaN.
LN	Calculates the natural logarithm.
LOG	Calculates the logarithm.
NEGATIVE	Returns the negative value of an expression.
POSITIVE	Returns the value of an expression.
POW	Calculates the nth power of a value.
RAND	Returns a random number.
ROUND	Returns a value rounded to the specified decimal place.
SIN	Calculates the sine.
SINH	Calculates the hyperbolic sine.
SQRT	Calculates the square root.
TAN	Calculates the tangent.
TANH	Calculates the hyperbolic tangent.
TRUNC	Truncates the input value to the specified decimal place.
BIN	Calculates the binary code.
CBRT	Calculates the cube root.
CORR	Calculates the Pearson correlation coefficient.
DEGREES	Converts a radian value into a degree.
E	Calculates the value of e.
FACTORIAL	Calculates the factorial.
FORMAT_NUMBER	Converts a number into a string in the specified format.
HEX	Converts an integer or a string into a hexadecimal number.
LOG2	Calculates the logarithm of a number with the base number of 2.
LOG10	Calculates the logarithm of a number with the base number of 10.
PI	Calculates the value of π.
RADIANS	Converts a degree into a radian value.
SIGN	Returns the sign of the input value.
SHIFTLEFT	Shifts a value left by a specific number of places.
SHIFTRIGHT	Shifts a value right by a specific number of places.
SHIFTRIGHTUNSIGNED	Shifts an unsigned value right by a specific number of places.
UNHEX	Converts a hexadecimal string into a string.
WIDTH_BUCKET	Returns the ID of the bucket into which the value of a specific expression falls.
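
The following query is a small sketch that calls a few of the mathematical functions on literal values. The aliases are hypothetical, and the exact return types depend on the data type edition that is in effect.

```sql
select abs(-5)             as abs_value,          -- absolute value
       ceil(1.1)           as rounded_up,         -- round up to an integer
       floor(1.9)          as rounded_down,       -- round down to an integer
       pow(2, 10)          as power_value,        -- 2 raised to the 10th power
       sqrt(16)            as square_root,        -- square root
       conv('1100', 2, 10) as binary_to_decimal;  -- convert from base 2 to base 10
```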

Window functions

The following table describes window functions that are provided by MaxCompute SQL for you to flexibly analyze and process data of specific columns in a window.

Function	Description
ROW_NUMBER	Calculates the sequence number of a row. The row number starts from 1.
RANK	Calculates the rank of a row in an ordered group of rows. The ranks may not be consecutive integers.
DENSE_RANK	Calculates the rank of a row in an ordered group of rows. The ranks are consecutive integers.
PERCENT_RANK	Calculates the percentile rank of a row in an ordered group of rows.
CUME_DIST	Calculates the cumulative distribution of data in a partition.
NTILE	Splits rows of data in a partition into N groups of equal size and returns the number of the group to which the current row belongs. The group number ranges from 1 to N.
LAG	Obtains the calculated result of the Nth row of data that precedes the current row at a given offset in a window.
LEAD	Obtains the calculated result of the Nth row of data that follows the current row at a given offset in a window.
FIRST_VALUE	Obtains the calculated result of the first row of data in the window to which the current row belongs.
LAST_VALUE	Obtains the calculated result of the last row of data in the window to which the current row belongs.
NTH_VALUE	Obtains the calculated result of the Nth row of data in a window to which the current row belongs.
CLUSTER_SAMPLE	Samples random rows of data. If true is returned, the specified row of data is sampled.
COUNT	Calculates the number of rows in a window.
MIN	Calculates the minimum value in a window.
MAX	Calculates the maximum value in a window.
AVG	Calculates the average value of data in a window.
SUM	Calculates the sum of data in a window.
MEDIAN	Calculates the median in a window.
STDDEV	Returns the population standard deviation of all input values. This function is also called STDDEV_POP.
STDDEV_SAMP	Returns the sample standard deviation of all input values.

Syntax
Syntax of window functions:
```
<function_name>([distinct][<expression> [, ...]]) over (<window_definition>)
<function_name>([distinct][<expression> [, ...]]) over <window_name>
```
- function_name: the name of a built-in window function, aggregate function, or user-defined aggregate function (UDAF).
- expression: the format of a window function. The format is subject to the function syntax.
- windowing_definition: the definition of a window, which appears as <window_definition> in the preceding syntax. For more information about the syntax, see windowing_definition.
- window_name: the name of a window. You can use the window keyword to define a named window: window_name specifies the name and windowing_definition specifies the definition. Syntax of named_window_def:
```
window <window_name> as (<window_definition>)
```
  Position of named_window_def in an SQL statement:
```
select ... from ... [where ...] [group by ...] [having ...] named_window_def [order by ...] [limit ...]
```
windowing_definition
Syntax
```
--partition_clause:
[partition by <expression> [, ...]]
--orderby_clause:
[order by <expression> [asc|desc][nulls {first|last}] [, ...]]
[<frame_clause>]
```
If you use a window function in a SELECT statement, data is partitioned and sorted based on PARTITION BY and ORDER BY in windowing_definition when the window function is executed. If the SELECT statement does not include PARTITION BY, only one partition exists and the partition contains all data. If the SELECT statement does not include ORDER BY, data in a partition is arranged in a random order, and a data stream is generated. After the data stream is generated, a group of rows is extracted from the data stream based on frame_clause in windowing_definition to create a window for the current row. The window function calculates the data included in the window to which the current row belongs.
- partition by <expression> [, ...]: optional. This parameter specifies the partition information. If the values of partition key columns are the same for a group of rows, these rows are included in the same window. For more information about the format of PARTITION BY, see Table operations.
- order by <expression> [asc|desc][nulls {first|last}] [, ...]: optional. This parameter specifies how to sort rows of data in a window.
  Note
  If multiple rows have the same values in the columns that are specified in order by, the relative order of these rows is not deterministic and the results may differ between runs. To obtain a stable order, make sure that the combination of values of the columns specified in order by is unique.
- frame_clause: optional. This parameter is used to determine the data boundaries of a window. The frame_clause section in this topic provides details about this parameter.
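
The following query is a minimal sketch of partition by and order by in practice. It assumes a table named tbl with BIGINT columns pid, oid, and rid, in which oid contains duplicate values within a partition. Adding rid as a second sort key is one way to make the row numbering stable, on the assumption that rid is unique.

```sql
select pid,
       oid,
       rid,
       -- oid alone may contain ties, so rid is added as a tiebreaker.
       row_number() over (partition by pid order by oid, rid) as row_num,
       -- Rows with the same oid receive the same rank. Gaps may follow ties.
       rank()       over (partition by pid order by oid)      as rnk,
       -- Rows with the same oid receive the same rank. No gaps follow ties.
       dense_rank() over (partition by pid order by oid)      as dense_rnk
from tbl;
```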

frame_clause

Syntax

-- Syntax 1 
{ROWS|RANGE|GROUPS} <frame_start> [<frame_exclusion>]
-- Syntax 2 
{ROWS|RANGE|GROUPS} between <frame_start> and <frame_end> [<frame_exclusion>]

frame_clause is a closed interval that is used to determine the data boundaries of a window. The data boundaries are determined based on the rows that are specified by frame_start and frame_end.

ROWS|RANGE|GROUPS: required. ROWS, RANGE, and GROUPS indicate the types of frame_clause. The implementation rules of frame_start and frame_end vary based on the type of frame_clause. Take note of the following points:
- ROWS: The data boundaries of a window are determined based on the number of rows.
- RANGE: The data boundaries of a window are determined based on the comparison results of the values of the column that is specified in order by. In most cases, order by is specified in windowing_definition. If order by is not specified in windowing_definition, all rows in a partition are considered to have the same order by value. NULL values are considered equivalent.
- GROUPS: In a partition, rows that have the same value of the column specified in order by form a group. If order by is not specified, all rows in the partition form a group. NULL values are considered equivalent.

frame_start and frame_end: the start and end rows of a window. frame_start is required. frame_end is optional. If frame_end is not specified, the default value CURRENT ROW is used.

The row specified by frame_start must precede or be the same as the row specified by frame_end. Compared with the row specified by frame_end, the row specified by frame_start is closer to the first row in a window after all data in the window is sorted based on the column that is specified in order by of windowing_definition. The following table describes the valid values and logic of frame_start and frame_end when the type of frame_clause is ROWS, RANGE, or GROUPS.

frame_clause type	Value of frame_start or frame_end	Description
ROWS, RANGE, and GROUPS	UNBOUNDED PRECEDING	Indicates the first row of a partition. Rows are counted from 1.
ROWS, RANGE, and GROUPS	UNBOUNDED FOLLOWING	Indicates the last row of a partition.
ROWS	CURRENT ROW	Indicates the current row. Each row of data corresponds to a result calculated by a window function. The current row indicates the row whose data is calculated by using a window function.
	offset PRECEDING	Indicates the Nth row that precedes the current row at a given `offset`. For example, `0 PRECEDING` indicates the current row and `1 PRECEDING` indicates the previous row. `offset` must be a non-negative integer.
	offset FOLLOWING	Indicates the Nth row that follows the current row at a given `offset`. For example, `0 FOLLOWING` indicates the current row and `1 FOLLOWING` indicates the next row. `offset` must be a non-negative integer.
RANGE	CURRENT ROW	If frame_start is set to CURRENT ROW, it indicates the first row that has the same value of the column specified in `order by` as the current row. If frame_end is set to CURRENT ROW, it indicates the last row that has the same value of the column specified in `order by` as the current row.
	offset PRECEDING	The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by `order by`. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end: If `order by` is set to asc: frame_start indicates the first row that meets the following requirement: `Xc - Xi ≤ offset`. frame_end indicates the last row that meets the following requirement: `Xc - Xi ≥ offset`. If `order by` is set to desc: frame_start indicates the first row that meets the following requirement: `Xi - Xc ≤ offset`. frame_end indicates the last row that meets the following requirement: `Xi - Xc ≥ offset`. The column that is specified in `order by` can be of the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, DATETIME, DATE, and TIMESTAMP. Syntax of `offset` when the column that is specified in `order by` is of a date or time type: `N`: indicates N days or N seconds. It must be a non-negative integer. If the column is of the DATETIME or TIMESTAMP type, it indicates N seconds. If the column is of the DATE type, it indicates N days. `interval 'N' {YEAR|MONTH|DAY|HOUR|MINUTE|SECOND}`: indicates N years, months, days, hours, minutes, or seconds. For example, `INTERVAL '3' YEAR` indicates 3 years. `INTERVAL 'N-M' YEAR TO MONTH`: indicates N years and M months. For example, `INTERVAL '1-3' YEAR TO MONTH` indicates 1 year and 3 months. `INTERVAL 'D[ H[:M[:S[:N]]]]' DAY TO SECOND`: indicates D days, H hours, M minutes, S seconds, and N nanoseconds. For example, `INTERVAL '1 2:3:4:5' DAY TO SECOND` indicates 1 day, 2 hours, 3 minutes, 4 seconds, and 5 nanoseconds.
	offset FOLLOWING	The rows that are specified by frame_start and frame_end are determined based on the sorting order that is specified by `order by`. For example, if data in a window is sorted by X, Xi indicates the X value that corresponds to the ith row, and Xc indicates the X value that corresponds to the current row. Positions of rows specified by frame_start and frame_end: If `order by` is set to asc: frame_start indicates the first row that meets the following requirement: `Xi - Xc ≥ offset`. frame_end indicates the last row that meets the following requirement: `Xi - Xc ≤ offset`. If `order by` is set to desc: frame_start indicates the first row that meets the following requirement: `Xc - Xi ≥ offset`. frame_end indicates the last row that meets the following requirement: `Xc - Xi ≤ offset`.
GROUPS	CURRENT ROW	If frame_start is set to CURRENT ROW, it indicates the first row of the group to which the current row belongs. If frame_end is set to CURRENT ROW, it indicates the last row of the group to which the current row belongs.
	offset PRECEDING	If frame_start is set to offset PRECEDING, it indicates the first row of the Nth group that precedes the group of the current row at a given `offset`. If frame_end is set to offset PRECEDING, it indicates the last row of the Nth group that precedes the group of the current row at a given `offset`. Note You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNDED PRECEDING.
	offset FOLLOWING	If frame_start is set to offset FOLLOWING, it indicates the first row of the Nth group that follows the group of the current row at a given `offset`. If frame_end is set to offset FOLLOWING, it indicates the last row of the Nth group that follows the group of the current row at a given `offset`. Note You cannot set frame_start to UNBOUNDED FOLLOWING, and you cannot set frame_end to UNBOUNDED PRECEDING.
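
To contrast the ROWS and RANGE rules in the preceding table, the following sketch computes two sums over the same ordering. The orders table and its columns (customer_id, order_time of the DATETIME type, and amount) are hypothetical and are used only for illustration. The interval literal follows the `interval 'N' DAY` form described in the table.

```sql
select customer_id,
       order_time,
       amount,
       -- RANGE: all rows whose order_time is within 7 days before the
       -- order_time of the current row, up to the last peer of the current row.
       sum(amount) over (partition by customer_id order by order_time
                         range between interval '7' day preceding and current row) as amount_last_7_days,
       -- ROWS: the current row and the two rows that physically precede it.
       sum(amount) over (partition by customer_id order by order_time
                         rows between 2 preceding and current row) as amount_last_3_orders
from orders;
```

In ROWS mode the frame is a fixed number of rows. In RANGE mode the number of rows in the frame varies with the values of the column specified in order by.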

frame_exclusion: optional. This parameter is used to remove specific rows from a window. Valid values:
- EXCLUDE NO OTHERS: No rows are excluded from the window.
- EXCLUDE CURRENT ROW: The current row is excluded from the window.
- EXCLUDE GROUP: indicates that an entire group of rows in a partition is excluded from the window. In the group, all rows have the same value of the column that is specified in order by as the current row.
- EXCLUDE TIES: The rows that have the same value of the column specified in order by as the current row are excluded from the window, but the current row is retained.
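
As a sketch of frame_exclusion, the following query collects the rid values of the rows directly before and after the current row and leaves out the current row itself. It assumes a table named tbl with BIGINT columns pid, oid, and rid. The alias neighbors is hypothetical.

```sql
select pid,
       oid,
       rid,
       collect_list(rid) over (partition by pid order by oid
                               rows between 1 preceding and 1 following
                               exclude current row) as neighbors
from tbl;
```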

Default frame_clause

If you do not specify frame_clause, MaxCompute uses the default frame_clause to determine the data boundaries of a window. Values of the default frame_clause:

If odps.sql.hive.compatible is set to true, the following default frame_clause is used. This rule applies to most SQL systems.
```
RANGE between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
```
If odps.sql.hive.compatible is set to false, order by is specified in windowing_definition, and one of the following window functions is used, the default frame_clause in ROWS mode is used: AVG, COUNT, MAX, MIN, STDDEV, STDDEV_POP, STDDEV_SAMP, and SUM.
```
ROWS between UNBOUNDED PRECEDING and CURRENT ROW EXCLUDE NO OTHERS
```
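
Because the default frame depends on odps.sql.hive.compatible, one way to keep results independent of that setting is to write frame_clause explicitly. The following sketch assumes a table named tbl with BIGINT columns pid, oid, and rid. The aliases are hypothetical.

```sql
select pid,
       oid,
       rid,
       -- Running total that counts physical rows up to the current row.
       sum(rid) over (partition by pid order by oid
                      rows between unbounded preceding and current row) as running_sum_rows,
       -- Running total that also includes every row with the same oid as the current row.
       sum(rid) over (partition by pid order by oid
                      range between unbounded preceding and current row) as running_sum_range
from tbl;
```

The two columns can differ only for rows that share an oid value with other rows in the same partition.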

Example of data boundaries of a window

In this example, a table named tbl contains three columns that are of the BIGINT type: pid, oid, and rid. The tbl table contains the following data:

+------------+------------+------------+
| pid        | oid        | rid        |
+------------+------------+------------+
| 1          | NULL       | 1          |
| 1          | NULL       | 2          |
| 1          | 1          | 3          |
| 1          | 1          | 4          |
| 1          | 2          | 5          |
| 1          | 4          | 6          |
| 1          | 7          | 7          |
| 1          | 11         | 8          |
| 2          | NULL       | 9          |
| 2          | NULL       | 10         |
+------------+------------+------------+

You can replace ellipses (...) in the following SQL statements with windowing_definition to display the data in the windows in which each row of data is included.

Note

If a value in the window column in the returned result is NULL, no data is contained in the window.

Windows in ROWS mode

windowing_definition 1

partition by pid order by oid ROWS between UNBOUNDED PRECEDING and CURRENT ROW
-- Sample SQL statement: 
select pid,
       oid,
       rid,
       collect_list(rid) over(partition by pid order by oid
                              ROWS between UNBOUNDED PRECEDING and CURRENT ROW) as window
from tbl;