MaxCompute provides a large number of built-in functions to meet data processing requirements in most business scenarios. This topic describes the types of built-in functions that are provided by MaxCompute. This topic also describes how to use the built-in functions.
Background information
The following table describes the types of built-in functions that are provided by MaxCompute.
Type | Description |
Used to process data of a date type, such as DATE, DATETIME, or TIMESTAMP. For example, you can use these functions to add and subtract date values, calculate date value differences, extract date fields, obtain the current time, and convert date formats. | |
Used to process data of a numeric type, such as BIGINT, DOUBLE, DECIMAL, or FLOAT. For example, you can use these functions to convert numeral systems, perform mathematical operations, round values, and obtain random numbers. | |
Used to process the data of columns in a window. For example, you can use these functions to calculate the sum, maximum value, minimum value, average value, and median value of column data, sort column data, obtain the data of columns at a given offset, and sample column data. | |
Used to aggregate multiple input records into one output value. For example, you can use these functions to calculate the sum, maximum value, minimum value, and average value of data, aggregate parameters, and concatenate strings. | |
Used to process data of the STRING type. For example, you can use these functions to truncate strings, replace strings, search for strings, convert uppercase and lowercase letters, and convert string formats. | |
Used to process data of the MAP, ARRAY, STRUCT, or JSON type. For example, you can use these functions to deduplicate, aggregate, sort, and merge elements. For more information about the limits on JSON functions, see Limits on JSON functions. | |
Used to process table data of the STRING and BINARY types. For example, you can use these functions to encrypt and decrypt data. | |
Used to process data in other business scenarios. |
For more information about the mappings between the built-in functions of MaxCompute and the built-in functions of open source systems, see Mappings between built-in functions of MaxCompute and built-in functions of Hive, MySQL, and Oracle.
Precautions
When you use built-in functions that are provided by MaxCompute, take note of the following items:
For a built-in function, the types and number of input parameters and function format must meet the function syntax requirements. If the function syntax requirements are not met, MaxCompute cannot parse the built-in function and an error may occur when you execute the SQL statement in which the built-in function is called.
If the input parameters of a built-in function are of a type that is supported by the MaxCompute V2.0 data type edition, you must enable the MaxCompute V2.0 data type edition. The data types supported by the MaxCompute V2.0 data type edition include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY. If you do not enable the MaxCompute V2.0 data type edition, an error may occur when you execute the SQL statement in which the built-in function is called. You can enable the MaxCompute V2.0 data type edition at the session or project level.
Session level: Add
set odps.sql.type.system.odps2=true;
before the SQL statement in which a built-in function is called. Then, commit and execute them together. This configuration is valid for only the current SQL statement.Project level: The owner of a project can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. This configuration is valid for all the subsequent SQL statements.
setproject odps.sql.type.system.odps2=true;
If you enable the MaxCompute V2.0 data type edition for a project, some implicit conversions are disabled, such as the conversions from STRING to BIGINT, STRING to DATETIME, DOUBLE to BIGINT, DECIMAL to DOUBLE, and DECIMAL to BIGINT. This may cause a loss of precision or errors. In this case, you can use the CAST function to forcefully convert the data types to resolve these issues. You can also disable the MaxCompute V2.0 data type edition.
If the name of a UDF is the same as that of a built-in function, the UDF is preferentially called. For example, if UDF CONCAT and built-in function CONCAT both exist in MaxCompute, the system automatically calls UDF CONCAT instead of the built-in function CONCAT. If you want to call the built-in function, you must add the symbol
::
before the built-in function. For example, you can useselect ::concat('ab', 'c');
.If the settings of global properties of MaxCompute projects are different, the execution results of built-in functions may be different. You can run the
setproject;
command to configure the global properties of a MaxCompute project.
Limits on JSON functions
The development tools that are supported for the JSON type include odpscmd and MaxCompute Studio. External ecosystems such as DataWorks and Dataphin are not supported. If you need to use the JSON type together with an external ecosystem, you must check whether the external ecosystem is supported for the JSON type. The following table describes the items that you must take note of when you use odpscmd and MaxCompute Studio as development tools for the JSON type.
odpscmd
MaxCompute Studio
You must upgrade the client to the latest version. Otherwise, the
desc json_table
command cannot be used.You must set the
use_instance_tunnel
parameter in the conf\odps_config.ini file in the installation path of the client to false. Otherwise, an error is reported when you perform a query.
MaxCompute Studio allows you only to query JSON data. It does not allow you to upload or download JSON data.
If you want to use another type of engine, such as Hologres, to read data from a MaxCompute table, JSON data in the table cannot be read.
Columns of the JSON type cannot be added to a MaxCompute table.
You are not allowed to compare data of the JSON type with data of other types, execute SQL statements that contain the
ORDER BY
orGROUP BY
clause on data of the JSON type, or use columns of the JSON type asjoin
keys.The integer part and decimal part of a value of the JSON NUMBER type are stored by using the BIGINT type and the DOUBLE type separately. If the integer part exceeds the range that is supported by the BIGINT type, an integer overflow error occurs. Conversion of the decimal part to the DOUBLE type leads to precision loss.
Strings that can be used to generate JSON data do not include
\u0000
, which is the Unicode character for NULL.Java UDFs and Python UDFs do not support the JSON type.
Clustered tables cannot store JSON data.
Limits on string functions
The following functions support only English characters:
TRIM, RTRIM, and LTRIM: The value of the trimChars parameter can contain only English characters.
REVERSE: This function supports only English characters in the Hive-compatible data type edition.
SOUNDEX: This function supports only English characters.
TOLOWER: This function is used to convert English characters in a string into lowercase characters.
TOUPPER: This function is used to convert English characters in a string into uppercase characters.
INITCAP: This function is used to convert the first letter of each word in English in a string into an uppercase letter and the other letters of each word into lowercase characters.