All Products
Search
Document Center

MaxCompute:Complex type functions

Last Updated:Dec 09, 2024

You can use complex type functions in MaxCompute SQL to process data of complex data types, such as ARRAY, MAP, STRUCT, and JSON. This topic describes the syntax and parameters of complex type functions that are supported by MaxCompute SQL, and provides examples on how to use complex type functions. This topic guides you through data development by using complex type functions.

The following table describes the complex type functions that are supported by MaxCompute SQL. For more information about the limits on JSON functions, see Limits.

Function type

Function

Description

ARRAY functions

ALL_MATCH

Checks whether all elements in an array meet a specific condition.

ANY_MATCH

Checks whether an element in an array meets a specific condition.

ARRAY

Creates an array based on given values.

ARRAY_CONTAINS

Checks whether an array contains a given value.

ARRAY_DISTINCT

Removes duplicate elements from an array.

ARRAY_EXCEPT

Finds the elements that exist in Array A but do not exist in Array B and returns the elements as a new array without duplicates.

ARRAY_INTERSECT

Calculates the intersection of two arrays.

ARRAY_JOIN

Concatenates the elements in an array by using a delimiter.

ARRAY_MAX

Returns the largest element in an array.

ARRAY_MIN

Returns the smallest element in an array.

ARRAY_NORMALIZE

Returns an array whose elements are normalized based on the specified p norm.

ARRAY_POSITION

Returns the position of the first occurrence of a given element in an array.

ARRAY_REDUCE

Aggregates the elements in an array.

ARRAY_REMOVE

Removes a given element from an array.

ARRAY_REPEAT

Returns a new array in which a given element is repeated several times.

ARRAY_SORT

Sorts the elements in an array based on a comparator.

ARRAY_UNION

Calculates the union of two arrays and returns the union as a new array without duplicates.

ARRAYS_OVERLAP

Checks whether two arrays contain the same element.

ARRAYS_ZIP

Merges multiple arrays.

COMBINATIONS

Returns an array of n-element subsets based on the elements in an input array.

CONCAT

Concatenates multiple arrays or strings.

EXPLODE

Transposes one row of data into multiple rows. This function is a user-defined table-valued function (UDTF).

FILTER

Filters the elements in an array.

FLATTEN

Converts multiple arrays of the ARRAY data type into a single array.

INDEX

Returns the element at a specific position in an array.

NGRAMS

Returns an n-gram array of the elements of the specified array.

POSEXPLODE

Converts an array into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements.

REVERSE

Returns an array in reverse order of the elements of a specific array.

SEQUENCE

Returns an array that contains the specified elements based on expressions.

SHUFFLE

Returns elements of an array in random order.

SIZE

Returns the number of elements in an array.

SLICE

Copies the elements in an array from a specific position based on a specific length and returns the elements as a new array.

SORT_ARRAY

Sorts the elements in an array.

SPLIT

Splits a string with a specified delimiter and returns an array.

TRANSFORM

Transforms the elements in an array.

ZIP_WITH

Merges two arrays at the element level based on element positions and returns a new array.

MAP functions

EXPLODE

Transposes one row of data into multiple rows. This function is a UDTF.

INDEX

Returns the value that meets a specific condition in a map.

MAP

Creates a map based on given key-value pairs.

MAP_CONCAT

Returns the union of multiple maps.

MAP_ENTRIES

Converts key-value pairs in a map into a struct array.

MAP_FILTER

Filters the elements in a map.

MAP_FROM_ARRAYS

Creates a map based on given arrays.

MAP_FROM_ENTRIES

Creates a map based on given struct arrays.

MAP_KEYS

Returns all keys in a map as an array.

MAP_VALUES

Returns all values in a map as an array.

MAP_ZIP_WITH

Merges two given maps into a single map.

MULTIMAP_FROM_ENTRIES

Returns a map that consists of the keys in a struct array and an array containing all values.

SIZE

Returns the number of key-value pairs in a map.

TRANSFORM_KEYS

Transforms the keys in a map by using a given function. The values in the map are not changed.

TRANSFORM_VALUES

Transforms the values in a map by using a given function. The keys in the map are not changed.

STRUCT functions

FIELD

Obtains the value of a member variable in a struct.

INLINE

Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.

NAMED_STRUCT

Creates a struct based on given name-value pairs.

STRUCT

Creates a struct based on a given value list.

JSON functions

FROM_JSON

Returns data of the ARRAY, MAP, or STRUCT type based on a given JSON string and a given output format.

GET_JSON_OBJECT

Extracts a single string from a standard JSON string by using a specific method.

JSON_TUPLE

Extracts strings from a standard JSON string based on a set of input keys.

TO_JSON

Converts data of a complex data type into a JSON string.

JSON_OBJECT

Returns a JSON object that contains key-value pairs.

JSON_ARRAY

Evaluates a possibly empty list of values and returns a JSON array that contains these values. Evaluates a possibly empty list of values and returns a JSON array that contains these values.

JSON_EXPLODE

Expands each element in a JSON array or JSON object into multiple output rows.

JSON_EXTRACT

Parses the value of json_path in a JSON expression. Note that an error is returned if the value of json_path is invalid.

JSON_EXISTS

Determines whether the JSON value of json_path exists.

JSON_PRETTY

Returns a JSON value in a format that is easy to read by adding line breaks and spaces.

JSON_TYPE

Returns the data type of a JSON value.

JSON_FORMAT

Converts a value of the JSON data type into a value of the STRING data type. By default, JSON data is not automatically prettified.

JSON_PARSE

Converts a value of the STRING data type into a value of the JSON data type. If a non-JSON-formatted value is converted into a value of the STRING data type, an error is returned.

JSON_VALID

Determines whether a string is in a valid JSON format.

CAST

Supports conversion between basic data types and JSON data types.

Note

In the following examples, the combination of a hyphen and a closing angle bracket (->) is used. For more information about how to use the combination of a hyphen and a closing angle bracket (->) in Lambda functions, see Lambda functions.

ALL_MATCH

  • Syntax

    boolean all_match(array<T> <a>, function<T, boolean> <predicate>)
  • Description

    Checks whether all elements in Array a meet the predicate condition.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.

  • Return value

    A value of the BOOLEAN type is returned. The return value varies based on the following rules:

    • If all elements in Array a meet the predicate condition or the array is empty, True is returned.

    • If one or more elements in Array a do not meet the predicate condition, false is returned.

    • If an element in Array a is null and other elements meet the predicate condition, null is returned.

  • Examples

    • Example 1: Check whether all elements in array(4, 5, 6) meet the x x > 3 condition. This condition specifies that all elements are greater than 3. Sample statement:

      -- The return value is true. 
      select all_match(array(4, 5, 6), x -> x>3);
    • Example 2: The array is empty. Sample statement:

      -- The return value is true. 
      select all_match(array(), x -> x>3);
    • Example 3: Check whether all elements in array(1, 2, -10, 100, -30) meet the x-> x > 3 condition. Sample statement:

      -- The return value is false. 
      select all_match(array(1, 2, -10, 100, -30), x -> x>3);
    • Example 4: Check whether all elements in array(10, 100, 30, null) meet the x-> x > 3 condition. Sample statement:

      -- The return value is null. 
      select all_match(array(10, 100, 30, null), x -> x>3);

ANY_MATCH

  • Syntax

    boolean any_match(array<T> <a>, function<T, boolean> <predicate>)
  • Description

    Checks whether an element in Array a meets the predicate condition.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to determine whether all elements in Array a meet requirements. The data type of the input parameter must be the same as the data type of the elements in Array a.

  • Return value

    A value of the BOOLEAN type is returned. The return value varies based on the following rules:

    • If one or more elements in Array a meet the predicate condition, true is returned.

    • If no elements in Array a meet the predicate condition or the array is empty, false is returned.

    • If an element in Array a is null and other elements do not meet the predicate condition, null is returned.

  • Examples

    • Example 1: Check whether an element in array(1, 2, -10, 100, -30) meets the x-> x > 3 condition. Sample statement:

      -- The return value is true. 
      select any_match(array(1, 2, -10, 100, -30), x-> x > 3);
    • Example 2: The array is empty. Sample statement:

      -- The return value is false. 
      select any_match(array(), x-> x > 3);
    • Example 3: Check whether an element in array(1, 2, -10, -20, -30) meets the x-> x > 3 condition. Sample statement:

      -- The return value is false. 
      select any_match(array(1, 2, -10, -20, -30), x-> x > 3);
    • Example 4: Check whether an element in array(1, 2, null, -10) meets the x-> x > 3 condition. Sample statement:

      -- The return value is null. 
      select any_match(array(1, 2, null, -10), x-> x > 3);

ARRAY

  • Syntax

    array array(<value>,<value>[, ...])
  • Description

    Creates an array based on given values.

  • Parameters

    value: required. A value of any data type. All values must be of the same data type.

  • Return value

    A value of the ARRAY type is returned.

  • Examples

    Create an array based on the data in the t_table table that contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) columns. Data in the table:

    +------------+----+----+------------+------------+
    | c1         | c2 | c3 | c4         | c5         |
    +------------+----+----+------------+------------+
    | 1000       | k11 | k21 | 86         | 15         |
    | 1001       | k12 | k22 | 97         | 2          |
    | 1002       | k13 | k23 | 99         | 1          |
    +------------+----+----+------------+------------+

    Sample statement:

    -- Create an array based on the data in the c2, c4, c3, and c5 columns. 
    select array(c2,c4,c3,c5) from t_table;
    
    -- The following result is returned: 
    +------+
    | _c0  |
    +------+
    | [k11, 86, k21, 15] |
    | [k12, 97, k22, 2] |
    | [k13, 99, k23, 1] |
    +------+

ARRAY_CONTAINS

  • Syntax

    boolean array_contains(array<T> <a>, value <v>)
  • Description

    Checks whether Array a contains Element v.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • v: required. This parameter specifies the element that you want to check. The value of v must be of the same data type as the elements in Array a.

  • Return value

    A value of the BOOLEAN type is returned.

  • Examples

    The t_table_array table contains the c1 (BIGINT) and t_array (ARRAY<STRING>) columns. Data in the table:

    +------------+---------+
    | c1         | t_array |
    +------------+---------+
    | 1000       | [k11, 86, k21, 15] |
    | 1001       | [k12, 97, k22, 2] |
    | 1002       | [k13, 99, k23, 1] |
    +------------+---------+

    Sample statement:

    -- Check whether the t_array column contains the value 1. 
    select c1, array_contains(t_array,'1') from t_table_array;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | false |
    | 1001       | false |
    | 1002       | true |
    +------------+------+

ARRAY_DISTINCT

  • Syntax

    array<T> array_distinct(array<T> <a>)
  • Description

    Removes duplicate elements from Array a.

  • Parameters

    a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

  • Return value

    A value of the ARRAY type is returned. The return value varies based on the following rules:

    • The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.

    • If an element in Array a is null, the null element is involved in the operation.

    • If the input array is empty, an empty array is returned.

  • Examples

    • Example 1: Remove duplicate elements from array(10, 20, 30, 30, 20, 10). Sample statement:

      -- The return value is [10,20,30]. 
      select array_distinct(array(10, 20, 30, 30, 20, 10));
    • Example 2: Remove duplicate elements from array(10, 20, 20, null, null, 30, 20, null). Sample statement:

      -- The return value is [10,20,null,30]. 
      select array_distinct(array(10, 20, 20, null, null, 30, 20, null)); 
    • Example 3: Remove duplicate elements from an empty array. Sample statement:

      -- The return value is []. 
      select array_distinct(array());

ARRAY_EXCEPT

  • Syntax

    array<T> array_except(array<T> <a>, array<T> <b>)
  • Description

    Finds the elements that exist in Array a but do not exist in Array b and returns the elements as a new array without duplicates.

  • Parameters

    a and b: required. These parameters specify arrays. T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

  • Return value

    A value of the ARRAY type is returned. The return value varies based on the following rules:

    • The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.

    • If an element in an array is null, the null element is involved in the operation.

    • If one of the input arrays is empty, a new array is returned based on the non-empty array without duplicates.

    • If the two input arrays are empty, an empty array is returned.

  • Examples

    • Example 1: Find the elements that exist in array(1, 1, 3, 3, 5, 5) but do not exist in array(1, 1, 2, 2, 3, 3) and return the elements as a new array without duplicates. Sample statement:

      -- The return value is [5]. 
      select array_except(array(1, 1, 3, 3, 5, 5), array(1, 1, 2, 2, 3, 3));
    • Example 2: Find the elements that exist in array(1, 1, 3, 3, 5, 5, null, null) but do not exist in array(1, 1, 2, 2, 3, 3) and return the elements as a new array without duplicates. Sample statement:

      -- The return value is [5,null]. 
      select array_except(array(1, 1, 3, 3, 5, 5, null, null), array(1, 1, 2, 2, 3, 3));
    • Example 3: Find the elements that exist in an array but do not exist in another array and return the elements as a new array without duplicates. One of the input arrays is empty. Sample statement:

      -- The return value is [2,1]. 
      select array_except(array(2, 1, 1, 2), cast(array() as array<int>)); 
    • Example 4: Find the elements that exist in an array but do not exist in another array and return the elements as a new array without duplicates. The two input arrays are empty. Sample statement:

      -- The return value is []. 
      select array_except(cast(array() as array<int>), cast(array() as array<int>));

ARRAY_INTERSECT

  • Syntax

    array<T> array_intersect(array<T> <a>, array<T> <b>) 
  • Description

    Calculates the intersection of Array a and Array b and removes duplicate elements.

  • Parameters

    a and b: required. These parameters specify arrays. T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

  • Return value

    A value of the ARRAY type is returned. The return value varies based on the following rules:

    • If an element in an array is null, the null element is involved in the operation.

    • The returned array has no duplicate elements, and the elements are sorted in the same order as Array a.

    • If Array a or Array b is null, null is returned.

  • Examples

    • Example 1: Calculate the intersection of array(1, 2, 3) and array(1, 3, 5), and remove duplicate elements. Sample statement:

      -- The return value is [1,3]. 
      select array_intersect(array(1, 2, 3), array(1, 3, 5));
    • Example 2: Calculate the intersection of array(10, 20, 20, 30, 30, null, null) and array(30, 30, 20, 20, 40, null, null), and remove duplicate elements. Sample statement:

      -- The return value is [20,30,null]. 
      select array_intersect(array(10, 20, 20, 30, 30, null, null), array(30, 30, 20, 20, 40, null, null)); 

ARRAY_JOIN

  • Syntax

    array_join(array<T> <a>, <delimiter>[, <nullreplacement>])
  • Description

    Concatenates the elements in Array a by using a delimiter. If the array contains a null element, use nullreplacement to specify the string that you want to use to replace the null element in the results. If you do not configure nullreplacement, the null element is ignored.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array.

      Note

      If the elements in the array are not of the STRING type, MaxCompute converts the data type of the elements into STRING.

    • delimiter: required. A value of the STRING type. This parameter specifies the string that is used to separate the concatenated elements in Array a.

    • nullreplacement: optional. This parameter specifies the string that is used to replace null elements.

  • Return value

    A value of the STRING type is returned.

  • Examples

    -- The return value is 10,20,20,30. 
    select array_join(array(10, 20, 20, null, null, 30), ",");
    -- The return value is 10##20##20##null##null##30. 
    select array_join(array(10, 20, 20, null, null, 30), "##", "null");

ARRAY_MAX

  • Syntax

    T array_max(array<T> <a>) 
  • Description

    Returns the largest element in Array a.

  • Parameters

    a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array.

    The following data types are supported:

    • TINYINT, SMALLINT, INT, and BIGINT

    • FLOAT and DOUBLE

    • BOOLEAN

    • DECIMAL and DECIMALVAL

    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

    • STRING, BINARY, VARCHAR, and CHAR

    • ARRAY, STRUCT, and MAP

  • Return value

    The largest element in Array a is returned. The return value varies based on the following rules:

    • If Array a is null, null is returned.

    • If an element in Array a is null, the null element is not involved in the operation.

  • Examples

    -- The return value is 20. 
    select array_max(array(1, 20, null, 3));

ARRAY_MIN

  • Syntax

    T array_min(array<T> <a>) 
  • Description

    Returns the smallest element in Array a.

  • Parameters

    a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array.

    The following data types are supported:

    • TINYINT, SMALLINT, INT, and BIGINT

    • FLOAT and DOUBLE

    • BOOLEAN

    • DECIMAL and DECIMALVAL

    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

    • STRING, BINARY, VARCHAR, and CHAR

    • ARRAY, STRUCT, and MAP

  • Return value

    The smallest element in Array a is returned. The return value varies based on the following rules:

    • If Array a is null, null is returned.

    • If an element in Array a is null, the null element is not involved in the operation.

  • Examples

    -- The return value is 1. 
    select array_min(array(1, 20, null, 3));

ARRAY_NORMALIZE

  • Syntax

    array_normalize(array, p)
  • Description

    Returns an array whose elements are normalized based on the specified p norm.

    This function is equivalent to TRANSFORM(array, v -> v / REDUCE(array, 0, (a, v) -> a + POW(ABS(v), p), a -> POW(a, 1 / p)). However, the REDUCE function is called only once.

  • Parameters

    array: an input array. The elements in the array can be of only the FLOAT and DOUBLE data types.

    p: the p norm of the array.

  • Return value

    An array whose elements are normalized based on the specified p norm is returned.

    • If the array is null or contains null elements, null is returned.

    • If the p=0 condition is met, the original array is returned. If the p<0 condition is met, an error is returned.

  • Examples

    SELECT  array_normalize(array(10.0, 20.0, 50.0), 1.0);

    The following result is returned:

    [0.125, 0.25, 0.625]

ARRAY_POSITION

  • Syntax

    bigint array_position(array<T> <a>, T <element>)
  • Description

    Returns the position of the first occurrence of a given element in Array a. The position numbers of elements are counted from left to right and start from 1.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The following data types are supported:

      • TINYINT, SMALLINT, INT, and BIGINT

      • FLOAT and DOUBLE

      • BOOLEAN

      • DECIMAL and DECIMALVAL

      • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

      • STRING, BINARY, VARCHAR, and CHAR

      • ARRAY, STRUCT, and MAP

    • element: required. The element whose position you want to query. The data type of this parameter must be the same as the data type of the elements in Array a.

  • Return value

    A value of the BIGINT type is returned. The return value varies based on the following rules:

    • If Array a or element is null, null is returned.

    • If the specific element is not found, 0 is returned.

  • Examples

    • Example 1: Return the position of the first occurrence of 1 in array(3, 2, 1). Sample statement:

      -- The return value is 3. 
      select array_position(array(3, 2, 1), 1);
    • Example 2: The specified element is null. Sample statement:

      -- The return value is null. 
      select array_position(array(3, 1, null), null);

ARRAY_REDUCE

  • Syntax

    R array_reduce(array<T> <a>, buf <init>, function<buf, T, buf> <merge>, function<buf, R> <final>)
  • Description

    Aggregates the elements in Array a.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • init: required. The initial value of the intermediate result that is used to aggregate elements.

    • merge: required. A built-in function, user-defined function, or expression that is used to perform an operation on each element in Array a and the intermediate result. This function or expression uses the elements of Array a and the init parameter as input parameters.

    • final: required. A built-in function, user-defined function, or expression that is used to convert the intermediate result into the final result. This function or expression uses the result of merge as the input parameter. R specifies the data type of the output.

  • Return value

    The data type of the return value is the same as the data type that is specified for final.

  • Examples

    -- The return value is 6. 
    select array_reduce(array(1, 2, 3), 0, (buf, e)->buf + e, buf->buf);
    -- The return value is 2.5. 
    select array_reduce(array(1, 2, 3, 4), named_struct('sum', 0, 'count', 0), (buf, e)->named_struct('sum', buf.sum + e, 'count', buf.count + 1), buf -> buf.sum / buf.count);

ARRAY_REMOVE

  • Syntax

    array<T> array_remove(array<T> <a>, T <element>)
  • Description

    Removes a given element from Array a.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The following data types are supported:

      • TINYINT, SMALLINT, INT, and BIGINT

      • FLOAT and DOUBLE

      • BOOLEAN

      • DECIMAL and DECIMALVAL

      • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

      • STRING, BINARY, VARCHAR, and CHAR

      • ARRAY, STRUCT, and MAP

    • element: required. The element that you want to remove. The data type of this parameter must be the same as the data type of the elements in Array a.

  • Return value

    A value of the ARRAY type is returned. The return value varies based on the following rules:

    • If an element in Array a is null, the null element is not involved in the operation.

    • If Array a or element is null, null is returned.

    • If Array a does not contain the specific element, Array a is returned.

  • Examples

    • Example 1: Remove 1 from array(3, 2, 1). Sample statement:

      -- The return value is [3,2]. 
      select array_remove(array(3, 2, 1), 1);
    • Example 2: The specified element is null. Sample statement:

      -- The return value is null. 
      select array_remove(array(3, 1, null), null);
    • Example 3: Remove 2 from array(3, 1, null). Sample statement:

      -- The return value is [3,1,null]. 
      select array_remove(array(3, 1, null), 2);

ARRAY_REPEAT

  • Syntax

    array<T> array_repeat(T <element>, int <count>)
  • Description

    Returns a new array in which element t is repeated count times.

  • Parameters

    • t: required. This parameter specifies the element that you want to repeat. The following data types are supported:

      • TINYINT, SMALLINT, INT, and BIGINT

      • FLOAT and DOUBLE

      • BOOLEAN

      • DECIMAL and DECIMALVAL

      • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

      • STRING, BINARY, VARCHAR, and CHAR

      • ARRAY, STRUCT, and MAP

    • count: required. This parameter specifies the number of repetitions. A value of the INT type is required. The value must be greater than or equal to 0.

  • Return value

    A value of the ARRAY type is returned. The return value varies based on the following rules:

    • If the value of count is null, null is returned.

    • If the value of count is less than 0, an empty array is returned.

  • Examples

    • Example 1: Repeat 123 twice and return the new array. Sample statement:

      -- The return value is [123, 123]. 
      select array_repeat('123', 2);
    • Example 2: The value of count is null. Sample statement:

      -- The return value is null. 
      select array_repeat('123', null);
    • Example 3: The value of count is less than 0. Sample statement:

      -- The return value is []. 
      select array_repeat('123', -1);

ARRAY_SORT

  • Syntax

    array<T> array_sort(array<T> <a>, function<T, T, bigint> <comparator>)
  • Description

    Sorts the elements in Array a based on a comparator.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • comparator: required. A built-in function, user-defined function (UDF), or expression that is used to compare two elements in the array.

      Processing logic of comparator(a, b): If a is equal to b, 0 is returned. If a is less than b, a negative integer is returned. If a is greater than b, a positive integer is returned. If comparator(a, b) returns null, an error is returned.

      Important

      Comparison functions in ARRAY_SORT must be self-consistent. Examples:

      • If compare(a, b) > 0 is met, compare(b, a) < 0 must be met.

      • If compare(a, b) = 0 is met, compare(b, a) = 0 must be met.

      • If compare(a, b) < 0 is met, compare(b, a) > 0 must be met.

      Examples of functions that are not self-consistent:

      • (left, right) -> CASE WHEN left <= right THEN -1L ELSE 0L END:

        If you configure a = 1 and b = 1, the compare(a, b) function returns -1, and the compare(b, a) function also returns -1. The comparison results conflict with each other, and the functions are not self-consistent.

      • (left, right) -> CASE WHEN left < right THEN - 1L WHEN left = right THEN 0L ELSE 1L END:

        If you configure a = NULL and b = 1, the compare(a, b) function returns 1 and the compare(b, a) function also returns 1. The comparison results conflict with each other, and the functions are not self-consistent.

  • Return value

    A value of the ARRAY type is returned.

  • Examples

    • Example 1: Sort the elements in array(5,6,1).

      SELECT array_sort(array(5,6,1), (left,right) -> CASE WHEN left < right THEN -1L WHEN left > right THEN 1L ELSE 0L END);
      
      -- The following result is returned:
      +------------+
      | _c0        |
      +------------+
      | [1,5,6]    |
      +------------+
    • Example 2:

      SELECT array_sort(a, (a,b)-> CASE WHEN a.a>b.a THEN 1L WHEN a.a<b.a THEN -1L ELSE 0L END) 
      FROM VALUES ( 
        array(named_struct('a', 1, 'b', 10),
              named_struct('a', 3, 'b', 11),
              named_struct('a', 2, 'b', 12)))
        AS t(a);
        
      -- The following result is returned:
      +------+
      | _c0 |
      +------+
      | [{a:1, b:10}, {a:2, b:12}, {a:3, b:11}] |
      +------+

ARRAY_UNION

  • Syntax

    array<T> array_union(array<T> <a>,  array<T> <b>)
  • Description

    Calculates the union of Array a and Array b, and removes duplicate elements.

  • Parameters

    a and b: required. These parameters specify arrays. T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

    The following data types are supported:

    • TINYINT, SMALLINT, INT, and BIGINT

    • FLOAT and DOUBLE

    • BOOLEAN

    • DECIMAL and DECIMALVAL

    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

    • STRING, BINARY, VARCHAR, and CHAR

    • ARRAY, STRUCT, and MAP

  • Return value

    A value of the ARRAY type is returned. If Array a or Array b is null, null is returned.

  • Examples

    • Example 1: Calculate the union of array(1, 2, 3) and array(1, 3, 5), and remove duplicate elements. Sample statement:

      -- The return value is [1,2,3,5]. 
      select array_union(array(1, 2, 3), array(1, 3, 5));
    • Example 2: Calculate the union of two arrays and remove duplicate elements. One of the arrays is null. Sample statement:

      -- The return value is null. 
      select array_union(array(1, 2, 3), null);

ARRAYS_OVERLAP

  • Syntax

    boolean arrays_overlap(array<T> <a>,  array<T> <b>)
  • Description

    Checks whether Array a and Array b contain the same element.

  • Parameters

    a and b: required. These parameters specify arrays. T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type.

    The following data types are supported:

    • TINYINT, SMALLINT, INT, and BIGINT

    • FLOAT and DOUBLE

    • BOOLEAN

    • DECIMAL and DECIMALVAL

    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

    • STRING, BINARY, VARCHAR, and CHAR

    • ARRAY, STRUCT, and MAP

  • Return value

    A value of the BOOLEAN type is returned. The return value varies based on the following rules:

    • If Array a contains at least one element that is in Array b and is not null, true is returned.

    • If Array a and Array b do not contain the same element, both of the arrays are not empty, and one or both of the arrays contain a null element, null is returned.

    • If Array a and Array b do not contain the same element, and both of the arrays are not empty and do not contain a null element, false is returned.

  • Examples

    • Example 1: Check whether array(1, 2, 3) and array(3, 4, 5) contain the same element. Sample statement:

      -- The return value is true. 
      select arrays_overlap(array(1, 2, 3), array(3, 4, 5));
    • Example 2: Check whether array(1, 2, 3) and array(6, 4, 5) contain the same element. Sample statement:

      -- The return value is false. 
      select arrays_overlap(array(1, 2, 3), array(6, 4, 5));
    • Example 3: Check whether two arrays contain the same element. One of the arrays contains the null element. Sample statement:

      -- The return value is null. 
      select arrays_overlap(array(1, 2, 3), array(5, 4, null));

ARRAYS_ZIP

  • Syntax

    array<struct<T, U, ...>> arrays_zip(array<T> <a>, array<U> <b>[, ...])
  • Description

    Merges multiple given arrays and returns a struct array, in which the Nth struct contains all the Nth elements of the input arrays.

  • Parameters

    a and b: required. These parameters specify arrays. T in array<T> and U in array<U> specify the data types of the elements in the arrays. The elements can be of any data type.

    The following data types are supported:

    • TINYINT, SMALLINT, INT, and BIGINT

    • FLOAT and DOUBLE

    • BOOLEAN

    • DECIMAL and DECIMALVAL

    • DATE, DATETIME, TIMESTAMP, IntervalDayTime, and IntervalYearMonth

    • STRING, BINARY, VARCHAR, and CHAR

    • ARRAY, STRUCT, and MAP

  • Return value

    A value of the ARRAY type is returned. The return value varies based on the following rules:

    • The Nth struct in the generated struct array contains all the Nth elements of the input arrays. If an array contains less than N elements, null is used as the Nth element of the array.

    • If one or more input arrays are null, null is returned.

  • Examples

    • Example 1: Merge array(1, 2, 3) and array(2, 3, 4) into a struct array. Sample statement:

      -- The return value is [{0:1, 1:2}, {0:2, 1:3}, {0:3, 1:4}]. 
      select arrays_zip(array(1, 2, 3), array(2, 3, 4));
    • Example 2: Merge array(1, 2, 3) and array(4, 5) into a struct array. Sample statement:

      -- The return value is [{0:1, 1:4}, {0:2, 1:5}, {0:3, 1:null}]. 
      select arrays_zip(array(1, 2, 3), array(4, 5));

COMBINATIONS

  • Syntax

    combinations(array(T), n)
  • Description

    Returns an array of n-element subsets based on the elements in an input array.

  • Parameters

    array: an input array.

    n: the number of elements in each subset.

  • Return value

    Returns an array of n-element subsets based on the elements in an input array.

    • If the input array has no duplicate elements, an array of n-element subsets is returned. The subsets are deterministic and listed in random order. Elements in a subset are deterministic and listed in random order.

    • By default, the maximum value of n is 5. You can specify the odps.sql.max.combination.length parameter to change the maximum value. By default, the total number of generated subsets cannot exceed 100,000. You can specify the odps.sql.max.combinations parameter to change the total number.

    • If the value of n is greater than the number of elements in the input array, an empty array is returned.

  • Examples

    • SELECT combinations(array('foo', 'bar', 'boo'),2);

      The following result is returned:

      [['foo', 'bar'], ['foo', 'boo']['bar', 'boo']]
    • SELECT combinations(array(1,2,3,4,5),3);

      The following result is returned:

      [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4], [1, 2, 5], [1, 3, 5], [2, 3, 5], [1, 4, 5], [2, 4, 5], [3, 4, 5]]
    • SELECT combinations(array(1,2,2),2);

      The following result is returned:

      [[1,2], [1,2], [2,2]]	

CONCAT

  • Syntax

    array<T> concat(array<T> <a>, array<T> <b>[,...])
    string concat(string <str1>, string <str2>[,...])
  • Description

    • Arrays as inputs: Concatenates all elements of multiple arrays and returns a new array.

    • Strings as inputs: Concatenates multiple strings and returns a new string.

  • Parameters

    • a and b: required. These parameters specify arrays. T in array<T> specifies the data type of the elements in the arrays. The elements can be of any data type. The elements in Array a and the elements in Array b must be of the same data type. The null elements are also involved in the operation.

    • str1 and str2: required. Values of the STRING type. If the input values are of the BIGINT, DOUBLE, DECIMAL, or DATETIME type, they are implicitly converted into values of the STRING type before calculation. If the input values are of other data types, an error is returned.

  • Return value

    • A value of the ARRAY type is returned. If one of the input arrays is null, null is returned.

    • A value of the STRING type is returned. If no input parameters are configured or an input parameter is set to null, null is returned.

  • Examples

    • Example 1: Concatenate all elements of array(10, 20) and array(20, -20). Sample statement:

      -- The return value is [10, 20, 20, -20]. 
      select concat(array(10, 20), array(20, -20));
    • Example 2: One of the input arrays contains a null element. Sample statement:

      -- The return value is [10, null, 20, -20]. 
      select concat(array(10, null), array(20, -20));
    • Example 3: One of the input arrays is null. Sample statement:

      -- The return value is null. 
      select concat(array(10, 20), null);
    • Example 4: Concatenate strings aabc and abcde. Sample statement:

      -- The return value is aabcabcde. 
      select concat('aabc','abcde');
    • Example 5: The input is empty. Sample statement:

      -- The return value is null. 
      select concat();
    • Example 6: One of the input strings is null. Sample statement:

      -- The return value is null. 
      select concat('aabc', 'abcde', null);

EXPLODE

  • Limits

    • A SELECT statement can include only one EXPLODE function. Only data in a column can be calculated by using the EXPLODE function.

    • This function cannot be used with the GROUP BY, CLUSTER BY, DISTRIBUTE BY, or SORT BY clause.

  • Syntax

    explode (<var>)
  • Description

    Transposes one row of data into multiple rows. This function is a UDTF.

    • If the parameter value is of the array<T> type, the array stored in the column is transposed into multiple rows.

    • If the parameter value is of the map<K, V> type, each key-value pair of the map stored in the column is transposed into one row with two columns. One column is used to store keys, and the other column is used to store values.

  • Parameters

    var: required. The value must be of the array<T> or map<K, V> type.

  • Return value

    Rows after transposition are returned.

  • Examples

    The t_table_map table contains the c1 (BIGINT) and t_map (MAP<STRING,BIGINT>) columns. Data in the table:

      +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+

    Sample statement:

    select explode(t_map) from t_table_map;
    -- The following result is returned: 
    +-----+------------+
    | key | value      |
    +-----+------------+
    | k11 | 86         |
    | k21 | 15         |
    | k12 | 97         |
    | k22 | 2          |
    | k13 | 99         |
    | k23 | 1          |
    +-----+------------+

FIELD

  • Syntax

    T field(struct <s>, string <fieldName>)
  • Description

    Obtains the value of a member variable in a struct.

  • Parameters

    • s: required. This parameter specifies a struct. The struct is in the format of {f1:T1, f2:T2[, ...]}. f1 and f2 specify member variables, T1 specifies the value of f1, and T2 specifies the value of f2.

    • fieldName: required. A value of the STRING type. This parameter specifies the member variable in the struct.

  • Return value

    A value of the specific member variable in the struct is returned.

  • Examples

    -- The return value is hello. 
    select field(named_struct('f1', 'hello', 'f2', 3), 'f1');

FILTER

  • Syntax

    array<T> filter(array<T> <a>, function<T,boolean> <func>)
  • Description

    Filters the elements in Array a by using func and returns a new array.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in Array a. The value must be of the same data type as the elements in Array a. The output result of the function or expression is of the BOOLEAN type.

  • Return value

    A value of the ARRAY type is returned.

  • Examples

    -- The return value is [2, 3]. 
    select filter(array(1, 2, 3), x -> x > 1);

FLATTEN

  • Syntax

    flatten(arrayOfArray)
  • Description

    Converts multiple arrays of the ARRAY data type into a single array.

  • Parameters

    arrayOfArray: arrays of the ARRAY data type.

  • Return value

    Multiple arrays of the ARRAY data type are converted into a single array in the order of the elements.

    • If the input value is null, null is returned.

    • If the input parameters are not arrays of the ARRAY data type, an error is returned.

  • Examples

    SELECT flatten(array(array(1, 2), array(3, 4)));

    The following result is returned:

    [1,2,3,4]

FROM_JSON

  • Syntax

    from_json(<jsonStr>, <schema>)
  • Description

    Returns data of the ARRAY, MAP, or STRUCT type based on JSON string jsonStr and output format schema.

  • Parameters

    • jsonStr: required. The JSON string that you entered.

    • schema: required. The schema of the JSON string. The value of this parameter must be in the same format as that in the statement for creating a table, such as array<bigint>, map<string, array<string>>, or struct<a:int, b:double, `C`:map<string,string>>.

      Note

      Keys in a struct are case-sensitive. You can also specify a struct in the format of a BIGINT, b DOUBLE, which is equivalent to STRUCT<a:BIGINT, b:DOUBLE>.

      The following table describes the mappings between JSON data types and MaxCompute data types.

      JSON data type

      MaxCompute data type

      OBJECT

      STRUCT, MAP, and STRING

      ARRAY

      ARRAY and STRING

      NUMBER

      TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, and STRING

      BOOLEAN

      BOOLEAN and STRING

      STRING

      STRING, CHAR, VARCHAR, BINARY, DATE, and DATETIME

      NULL

      All types

      Note

      The JSON string of the OBJECT and ARRAY types are parsed as much as possible. If the data type of the JSON string is not mapped to any MaxCompute data type, the JSON string is omitted. For ease of use, all JSON data types can be converted into the STRING data type supported by MaxCompute. When you convert a JSON string of the NUMBER type to a value of the FLOAT, DOUBLE, or DECIMAL type, the precision of the value cannot be ensured. We recommend you convert the JSON string to a value of the STRING type and then convert the obtained value to a value of the FLOAT, DOUBLE, or DECIMAL type.

  • Return value

    A value of the ARRAY, MAP, or STRUCT type is returned.

  • Examples

    • Example 1: Convert a specific JSON string into a value of a specific data type. Sample statement:

      -- The return value is {"a":1,"b":0.8}. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double');
      -- The return value is {"time":"26/08/2015"}. 
      select from_json('{"time":"26/08/2015"}', 'time string');
      -- The return value is {"a":1,"b":0.8}. 
      select from_json('{"a":1, "b":0.8}', 'a int, b double, c string');
      -- The return value is [1,2,3]. 
      select from_json('[1, 2, 3, "a"]', 'array<bigint>');
      -- The return value is {"d":"v","a":"1","b":"[1,2,3]","c":"{}"}. 
      select from_json('{"a":1,"b":[1,2,3],"c":{},"d":"v"}', 'map<string, string>');
    • Example 2: Use the map_keys and from_json functions to obtain all keys in a JSON string. You can also use JSON_KEYS for the same purpose. Sample statement:

      -- The return value is ["a","b"]. 
      select map_keys(from_json('{"a":1,"b":2}','map<string,string>'));

GET_JSON_OBJECT

Usage notes

This function is used to extract a single string from a standard JSON string by using JSONPath. This function supports input parameters of the following data types:

  • JSON types: If input parameters are of JSON types, standard JSONPath expressions are used.

  • STRING type: If input parameters are of the STRING type, the original JSONPath expressions are used.

The methods and usage notes for using this function vary based on input parameter data types. This topic describes how to use the GET_JSON_OBJECT function when input parameters are of the JSON and STRING types.

Note
  • The JSONPath expressions that are used when input parameters are of JSON types and those that are used when input parameters are of the STRING type comply with different rules. This may cause incompatibility issues.

  • The GET_JSON_OBJECT function does not support the syntax of JSONPath expressions.

Input parameters of JSON data types

  • Syntax

    string get_json_object(json <json>, string <json_path>)
  • Description

    Extracts a single string from a standard JSON string based on JSON PATH.

  • Parameters

    • json: required. A JSON string from which you want to extract a single string.

    • json_path: required. A JSONPath expression based on which you want to extract a single string.

  • Return value

    A value of a STRING type is returned.

  • Examples

    • Example 1: Extract the value that corresponds to the key a from a JSON string.

      select get_json_object(json '{"a":1, "b":2}', '$.a');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | 1   |
      +-----+
    • Example 2: Extract the value that corresponds to the key c from a JSON string.

      select get_json_object(json '{"a":1, "b":2}', '$.c');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | NULL |
      +-----+
    • Example 3: If an invalid JSON path is specified, the return value is NULL.

      select get_json_object(json '{"a":1, "b":2}', '$invalid_json_path');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | NULL |
      +-----+

Input parameter of the STRING data type

  • Syntax

    string get_json_object(string <json>, string <path>)
  • Description

    Extracts a single string from a standard JSON string based on path. The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the GET_JSON_OBJECT function with UDTFs. For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.

  • Parameters

    • json: required. A value of the STRING type. This parameter specifies a standard JSON object in the format of {Key:Value, Key:Value,...}. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction.

    • path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with $. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON-formatted data from OSS to MaxCompute. Meanings of different characters:

      • $: indicates the root node.

      • . or ['']: indicates a child node. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object contains a period (.), [''] can be used.

      • [] ([number]): indicates an array subscript, which starts from 0.

      • *: indicates the wildcard for []. If this character is used in the path parameter, an entire array is returned. An asterisk (*) cannot be escaped.

  • Limits

    Only MaxCompute V2.0 allows you to extract data by using [''] in the path parameter. To use [''], you must add the set odps.sql.udf.getjsonobj.new=true; statement before the statement that you want to execute.

  • Return value

    • If the json parameter is empty or invalid, null is returned.

    • If the format of json is valid and path exists, the related string is returned.

    • You can specify the flagodps.sql.udf.getjsonobj.new parameter for a session to determine how this function returns a value.

      • If you execute the set odps.sql.udf.getjsonobj.new=true; statement, this function retains the original strings when it returns a value.

        We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If a job in which this function escapes JSON reserved characters exists in a MaxCompute project, we recommend that you retain the original escape operation to prevent errors caused by lack of verification. The function complies with the following rules when it returns a value:

        • In this configuration, the return value is still a JSON string, which can be parsed as JSON data, without the need to use the REPLACE or REGEXP_REPLACE function to replace backslashes (\).

        • Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed.

          -- The return value is 1. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis are supported. However, DataWorks does not allow you to enter emojis. DataWorks allows you to enter only the encoded strings that correspond to emojis to MaxCompute by using a tool, such as Data Integration. DataWorks uses the GET_JSON_OBJECT function to process the data.

          -- The return value is an emoji. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order.

          -- The return value is {"b":"1","a":"2"}. 
          select get_json_object('{"b":"1","a":"2"}', '$');
      • If you execute the set odps.sql.udf.getjsonobj.new=false; statement, this function escapes JSON reserved characters when it returns a value. The function complies with the following rules when it returns a value:

        • JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as '\n' and '\"'.

        • Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed. Sample statement:

          -- The return value is null. 
          select get_json_object('{"a":"1","a":"2"}', '$.a');
        • The encoded strings that correspond to emojis cannot be parsed. Sample statement:

          -- The return value is null. 
          select get_json_object('{"a":"<Emoji>"}', '$.a');
        • The output results are displayed in alphabetical order. Sample statement:

          -- The return value is {"a":"2","b":"1"}. 
          select get_json_object('{"b":"1","a":"2"}', '$');
      Note

      For MaxCompute projects that were created on or after January 21, 2021, the GET_JSON_OBJECT function retains the original strings when it returns a value. For MaxCompute projects that are created before January 21, 2021, the GET_JSON_OBJECT function escapes JSON reserved characters when it returns a value. The following example helps you determine how the GET_JSON_OBJECT function returns a value in a MaxCompute project.

      select get_json_object('{"a":"[\\"1\\"]"}', '$.a');
      -- Return JSON reserved characters by using escape characters.
      [\"1\"]
      
      -- Return the original strings.
      ["1"]

      You can submit an application or search for the DingTalk group ID 11782920 to join the MaxCompute developer community DingTalk group and request MaxCompute technical support engineers to configure the GET_JSON_OBJECT function to retain original strings. This way, you do not need to frequently specify set odps.sql.udf.getjsonobj.new=false; for a session.

  • Examples

    • Example 1: Extract information from the JSON object src_json.json. Sample statement:

      -- The JSON string src_json.json contains the following content: 
      +----+
      json
      +----+
      {"store":
      {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
      "bicycle":{"price":19.95,"color":"red"}
      },
      "email":"amy@only_for_json_udf_test.net",
      "owner":"amy"
      }
      -- Extract the information of the owner field and return amy. 
      select get_json_object(src_json.json, '$.owner') from src_json;
      -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. 
      select get_json_object(src_json.json, '$.store.fruit[0]') from src_json;
      -- Extract the information of the non-existent field and return null. 
      select get_json_object(src_json.json, '$.non_exist_key') from src_json;
    • Example 2: Extract information from a JSON object of the ARRAY type. Sample statement:

      -- The return value is 2222. 
      select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]');
      -- The return value is ["h0","h1","h2"]. 
      set odps.sql.udf.getjsonobj.new=true;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
      -- The return value is ["h0","h1","h2"]. 
      set odps.sql.udf.getjsonobj.new=false;
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh');
      -- The return value is h1. 
      select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
    • Example 3: Extract information from a JSON object that includes a period (.). Sample statement:

      -- Create a table. 
      create table mf_json (id string, json string);
      -- Insert data into the table. The key in the data contains a period (.). 
      insert into table mf_json (id, json) values ("1", "{
      \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Insert data into the table. The key in the data does not contain a period (.). 
      insert into table mf_json (id, json) values ("2", "{
      \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
      \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
      -- Query the value of id in the JSON object whose key is China.beijing. 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. 
      select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1;
      -- Query the value of id in the JSON object whose key is China_beijing. 0 is returned. You can use one of the following statements: 
      select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2;
      select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2;
    • Example 4: The json parameter is empty or invalid. Sample statement:

      -- The return value is null. 
      select get_json_object('','$.array[1][1]');
      -- The return value is null. 
      select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
    • Example 5: Escape a JSON string. Sample statement:

      set odps.sql.udf.getjsonobj.new=true;
      -- The return value is "1". 
      select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); 
      -- The return value is '1'. 
      select get_json_object('{"a":"\'1\'","b":"2"}', '$.a'); 

INDEX

  • Syntax

    index(<var1>[<var2>])
  • Description

    • If var1 is of the map<K, V> type, this function obtains the element that is at position var2 in var1. The position numbers of elements are counted from left to right and start from 0.

    • If var1 is of the map<K, V> type, this function obtains the value whose key is var2 in var1.

    Note

    When you use this function, you must remove index and directly execute <var1>[<var2>]. Otherwise, an error is returned.

  • Parameters

    • var1: required. The value must be of the array<T> or map<K, V> type. T in array<T> specifies the data type of the elements in an array. The elements can be of any data type. K and V in map<K, V> specify the keys and values of a map.

    • var2: required.

      • If var1 is of the array<T> type, var2 must be of the BIGINT type and greater than or equal to 0.

      • If var1 is of the map<K, V> type, var2 must be of the same data type as K.

  • Return value

    • If var1 is of the array<T> type, a value of the data type that is specified by T is returned. The return value varies based on the following rules:

      • If the number of elements in var1 is less than var2, null is returned.

      • If var1 is null, null is returned.

    • If var1 is of the map<K, V> type, a value of the data type that is specified by V is returned. The return value varies based on the following rules:

      • If map<K, V> does not contain a key whose value is var2, null is returned.

      • If var1 is null, null is returned.

  • Examples

    • Example 1: var1 is of the array<T> type. Sample statement:

      -- The return value is c. 
      select array('a','b','c')[2];
    • Example 2: var1 is of the map<K, V> type. Sample statement:

      -- The return value is 1. 
      select str_to_map("test1=1,test2=2")["test1"];

INLINE

  • Syntax

    inline(array<struct<f1:T1, f2:T2[, ...]>>)
  • Description

    Expands a given struct array. Each array element corresponds to a row and each struct element corresponds to a column in each row.

  • Parameters

    f1:T1 and f2:T2: required. Values of any data type. f1 and f2 specify member variables, T1 specifies the value of f1, and T2 specifies the value of f2.

  • Return value

    The expanded data of the struct array is returned.

  • Examples

    The t_table table contains the t_struct (STRUCT<user_id:BIGINT,user_name:STRING,married:STRING,weight:DOUBLE>) fields. Data in the table:

    +----------+
    | t_struct |
    +----------+
    | {user_id:10001, user_name:LiLei, married:N, weight:63.5} |
    | {user_id:10002, user_name:HanMeiMei, married:Y, weight:43.5} |
    +----------+

    Sample statement:

    -- Expand the t_struct column. 
    select inline(array(t_struct)) from t_table;
    -- The following result is returned: 
    +------------+-----------+---------+------------+
    | user_id    | user_name | married | weight     |
    +------------+-----------+---------+------------+
    | 10001      | LiLei     | N       | 63.5       |
    | 10002      | HanMeiMei | Y       | 43.5       |
    +------------+-----------+---------+------------+

JSON_TUPLE

  • Syntax

    string json_tuple(string <json>, string <key1>, string <key2>,...)
  • Description

    Extracts strings from a standard JSON string based on a set of input keys, such as (key1,key2,...).

  • Parameters

    • json: required. A value of the STRING type. This parameter specifies a standard JSON string.

    • key: required. A value of the STRING type. This parameter is used to describe the path of a JSON object in the JSON string. The value cannot start with a dollar sign ($). You can enter multiple keys at a time. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object includes a period (.), [''] can be used.

  • Return value

    A value of the STRING type is returned.

    Note
    • If json is empty or invalid, null is returned.

    • If key is empty, invalid, or does not exist in the JSON string, null is returned.

    • If json is valid and key exists, the related string is returned.

    • This function can parse JSON data that contains Chinese characters.

    • This function can parse nested JSON data.

    • This function can parse JSON data that contains nested arrays.

    • The parsing action is equivalent to the execution of GET_JSON_OBJECT along with set odps.sql.udf.getjsonobj.new=true;. To obtain multiple objects from a JSON string, you must call the GET_JSON_OBJECT function multiple times. As a result, the JSON string is parsed multiple times. The JSON_TUPLE function allows you to enter multiple keys at a time and the JSON string is parsed only once. JSON_TUPLE is more efficient than GET_JSON_OBJECT.

    • JSON_TUPLE is a user-defined table-valued function (UDTF). If you want to select some columns from a table, use JSON_TUPLE together with the LATERAL VIEW clause.

JSON_OBJECT

  • Syntax

    json json_object(<key1>,<value1>[,<keyn>,<valuen>])
  • Description

    Returns a JSON object that contains key-value pairs.

  • Parameters

    • key: You must specify at least one key. The key must be of the STRING type.

    • value: You must specify at least one value. The value must be of the STRING, BIGINT, INT, or BOOLEAN type.

  • Return value

    The return value is of the JSON type.

  • Examples

    • Example 1: Generate a JSON object that contains only one key-value pair.

      -- Generate a JSON object.
      select json_object('a', 123);

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | {"a":123} |
      +-----+
    • Example 2: Generate a JSON object that contains multiple key-value pairs.

      -- Generate a JSON object.
      select json_object('a', 123,'b','hello');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | {"a":123,"b":"hello"} |
      +-----+

JSON_ARRAY

  • Syntax

    json json_array(<element>)
  • Description

    Evaluates a possibly empty list of values and returns a JSON array that contains these values.

  • Parameters

    element: required. The values in the list specified by this parameter must be of the STRING, BIGINT, BOOLEAN, or JSON type.

  • Return value

    The return value is of the JSON type.

  • Examples

    -- Generate a JSON array.
    select json_array('a', 45, true, 13, json '{"a":456}');

    The following result is returned:

    +-----+
    | _c0 |
    +-----+
    | ["a",45,true,13,{"a":456}] |
    +-----+

JSON_EXPLODE

  • Syntax

    JSON_EXPLODE(JSON <var>)
  • Description

    Supports expanding each element in a JSON array or JSON object into multiple output rows.

  • Parameters

    var: Required. Supports JSON_ARRAY or JSON_OBJECT types. Does not support JSON constants, JSON STRING, JSON NUMBER, JSON BOOLEAN, and NULL types.

  • Return value

    The function returns transformed rows in the following format:

    +-------+-------+
    |  KEY  | VALUE |
    |-------+-------|
    • For a JSON ARRAY type var, the outermost of the JSON ARRAY is expanded into multiple rows of JSON data, with KEY as NULL and VALUE as the array element.

    • For a JSON OBJECT type var, each KEY-VALUE pair of the outermost of the JSON OBJECT is expanded into rows with two columns: A STRING type KEY column is the KEY of the JSON OBJECT, and a JSON type VALUE column is the VALUE of the JSON OBJECT.

  • Examples

    Create a table named table_json and insert data into it, where the first row is a JSON OBJECT type and the second row is a JSON ARRAY type.

    -- Create table
    CREATE TABLE table_json(c1 json);
    -- Insert data
    INSERT INTO table_json(c1) SELECT JSON_OBJECT('a', 123,'b','hello');
    INSERT INTO table_json(c1) SELECT JSON_ARRAY(1, true, 2, json'{"a":456}');
    -- View table data
    SELECT * FROM table_json;

    The query returns the following result:

    +-----------------------------+
    | c1                            |
    +-------------------------------+
    | {"a":123,"b":"hello"}         |
    | [1,true,2,{"a":456}]          |
    +-------------------------------+

    The JSON_EXPLODE function expands each element in a JSON array or JSON object into multiple output rows. Below is a sample code:

    SELECT JSON_EXPLODE(table_json.c1) FROM table_json;

    The query returns the following result:

    +-----+------------+
    | key | value      |
    +-----+------------+
    | \N  | 1          |
    | \N  | true       |
    | \N  | 2          |
    | \N  | {"a":456}  |
    | a   | 123        |
    | b   | hello      |
    +-----+------------+
    Note

    When transforming JSON data, elements within the same JSON data maintain their original sequence. However, the order of multiple JSON data entries may vary.

JSON_EXTRACT

  • Syntax

    json json_extract(<json>, <json_path>)
  • Description

    Parses the value of json_path in a JSON expression. Note that an error is returned if the value of json_path is invalid.

  • Parameters

    • json: required. This parameter specifies the JSON expression that you want to process.

    • json_path: required. This parameter specifies the JSON path of the value that you want to be returned.

  • Return value

    The return value is of the JSON type.

  • Examples

    • Example 1: Obtain the value of the key a from a JSON object.

      select json_extract(json '{"a":1, "b":2}', '$.a');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | 1   |
      +-----+
    • Example 2: NULL is returned if the specified key does not exist in the JSON object.

      select json_extract(json '{"a":1, "b":2}', 'strict $.c');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | NULL |
      +-----+
    • Example 3: The error message Invalid argument - Param json path $invalid_json_path is invalid is returned if the JSON path that you specified is invalid.

      select json_extract(json '{"a":1, "b":2}', '$a');

      The following result is returned:

      -- An error message is returned.
      Invalid argument - Param json path $invalid_json_path is invalid

JSON_EXISTS

  • Syntax

    boolean json_exists(<json>, <json_path>)
  • Description

    Determines whether the JSON value of json_path exists.

  • Parameters

    • json: required. This parameter specifies the JSON expression that you want to process.

    • json_path: required. This parameter specifies the JSON path in which a specific JSON value is checked.

  • Return value

    The return value is true or false, which is of the BOOLEAN type.

  • Examples

    • Example 1: Check whether the value of the key a exists in a JSON object.

      select json_exists(json '{"a":1, "b":2}', '$.a');

      The following result is returned:

      +------+
      | _c0  |
      +------+
      | true |
      +------+
    • Example 2: Check whether the value of the key c exists in a JSON object.

      select json_exists(json '[1,2, {"a":34}]', '$[2].a');

      The following result is returned:

      +------+
      | _c0  |
      +------+
      | true |
      +------+
    • Example 3: Check whether values that are obtained based on a specific subscript exist.

      select json_exists(json '{"a":1, "b":2}', 'strict $.c');

      The following result is returned:

      +------+
      | _c0  |
      +------+
      | false |
      +------+

JSON_PRETTY

  • Syntax

    string json_pretty(<json>)
  • Description

    Returns a JSON value in a format that is easy to read by adding line breaks and spaces.

  • Parameters

    json: required. This parameter specifies the JSON expression that you want to process.

  • Return value

    The return value is of the STRING type.

  • Examples

    -- Beautify a JSON object.
    select json_pretty(json '{"a":1, "b":2}');

    The following result is returned:

    +-----+
    | _c0 |
    +-----+
    | {
        "a":1,
        "b":2
    } |
    +-----+

JSON_TYPE

  • Syntax

    string|number|boolean|null|object|array json_type(<json>)
  • Description

    Returns the data type of a JSON value.

  • Parameters

    json: required. The JSON expression that you want to process.

  • Return value

    A value of the STRING type is returned.

  • Examples

    • Example 1: Return the ARRAY type for a JSON value.

      select json_type(json '[{"a":1}, 23]');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | array |
      +-----+
    • Example 2: Return a numeric data type for a JSON value.

      select json_type(json '123');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | number |
      +-----+
    • Example 3: Return the STRING type for a JSON value.

      select json_type(json '"123"');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | string |
      +-----+

JSON_FORMAT

  • Syntax

    string json_format(<json>)
  • Description

    Converts a value of the JSON data type into a value of the STRING data type. By default, JSON data is not automatically prettified.

  • Parameters

    json: required. This parameter specifies the JSON data that you want to convert.

  • Return value

    The return value is of the STRING type.

  • Examples

    • Example 1: Convert JSON NUMBER data into strings.

      select json_format(json '123');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | 123 |
      +-----+
    • Example 2: Convert JSON STRING data into strings.

      select json_format(json '"123"');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | "123" |
      +-----+

JSON_PARSE

  • Syntax

    json json_parse(<string>)
  • Description

    Converts a value of the STRING data type into a value of the JSON data type. If a non-JSON-formatted value is converted into a value of the STRING data type, an error is returned.

  • Parameters

    string: required. This parameter specifies the string that you want to process.

  • Return value

    The return value is of the JSON type.

  • Examples

    • Example 1: Convert a string into data of the JSON type.

      select json_parse('{"a":1, "b":2}');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | {"a":1,"b":2} |
      +-----+
    • Example 2: Convert a string into data of the JSON type.

      select json_parse('"abc"');

      The following result is returned:

      +-----+
      | _c0 |
      +-----+
      | "abc" |
      +-----+
    • Example 3: Convert an invalid string into data of the JSON type. An error is reported.

      select json_parse('abc');

      The following result is returned:

      Invalid input syntax for type json, detail:Token "abc" is invalid.

JSON_VALID

  • Syntax

    boolean json_valid(<string>)
  • Description

    Determines whether a string is in a valid JSON format.

  • Parameters

    string: required. This parameter specifies the JSON string that you want to process.

  • Return value

    The return value is true or false, which is of the BOOLEAN type.

  • Examples

    • Example 1: Check whether "abc" is a valid JSON string.

      select json_valid('"abc"');

      The following result is returned:

      +------+
      | _c0  |
      +------+
      | true |
      +------+
    • Example 2: Check whether abc is a valid JSON string.

      select json_valid('abc');

      The following result is returned:

      +------+
      | _c0  |
      +------+
      | false |
      +------+

CAST

  • Syntax

    json/string/bigint/int/tinyint/smallint/double/float/boolean/sql-type 
    cast(json as string/
         string as json/
         json as bigint/
         bigint as json/
         json as int/
         int as json/
         json as tinyint/
         tinyint as json/
         json as smallint/
         smallint as json/
         json as double/
         double as json/
         json as float/
         float as json/
         boolean as json/
         json as boolean/
         null as json/
         json 'null' as ...
         )
  • Description

    Supports conversion between basic data types and JSON data types.

  • Parameters

    • The following data types are supported: JSON, STRING, BIGINT, INT, TINYINT, SMALLINT, DOUBLE, FLOAT, BOOLEAN, and SQL-TYPE.

    • To convert JSON data into data of the STRING data type, you must make sure the JSON data does not contain values of the ARRAY or OBJECT data type.

    • After you convert data of the STRING data type into JSON data, the output JSON data contains only values of the STRING data type. Take note of the following differences between the CAST function and the JSON_PARSE and JSON_FORMAT functions:

      • You can use the JSON_PARSE function to convert only valid JSON strings into JSON data, which can be a JSON object.

      • You can use the CAST function to convert any string into a JSON string that contains values of the STRING data type.

    • JSON 'null' values and common null values are converted into SQL null values.

  • Return value

    JSON data or data of a specific basic data type is returned.

  • Examples

    • Example 1: Perform conversion between the STRING data type and the JSON data type.

      -- Convert a JSON number into a value of the STRING type.
      select cast(json '123' as string);
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | 123 |
      +-----+
      
      -- Convert a JSON string into a value of the STRING type.
      select cast(json '"abc"' as string);
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | abc |
      +-----+
      
      -- Convert a JSON boolean into a value of the STRING type.
      select cast(json 'true' as string); 
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | TRUE |
      +-----+
      
      -- Convert JSON 'null' into a null value of the STRING type.
      select cast(json 'null' as string);
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | NULL |
      +-----+
      
      -- Convert a string into a value of a JSON type.
      select cast('{"a":2}' as json); 
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | "{\"a\":2}" |
      +-----+
      
      
      -- Incorrect sample statement. Convert JSON expressions of the ARRAY or OBJECT type into values of the STRING type. 
      select cast(json '{"a":2}' as string);
      -- An error is returned.
      FAILED: ODPS-0123091:Illegal type cast - Unsupported cast from json array/object to string
    • Example 2: Perform conversion between the NUMBER data type and the JSON data type.

      -- Convert a JSON number into a value of the BIGINT type.
      select cast(json '123' as bigint);
      -- The following result is returned:
      +------------+
      | _c0        |
      +------------+
      | 123        |
      +------------+
      
      -- Convert a JSON number into a value of the FLOAT type.
      select cast(json '"1.23"' as float);
      -- The following result is returned:
      +------+
      | _c0  |
      +------+
      | 1.23 |
      +------+
      
      -- Convert a JSON number into a value of the DOUBLE type.
      select cast(json '1.23' as double);
      -- The following result is returned:
      +------------+
      | _c0        |
      +------------+
      | 1.23       |
      +------------+
      
      -- Convert a value of the INT type into a JSON number.
      select cast(123 as json);
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | 123 |
      +-----+
      
      -- Convert a value of the FLOAT type into a JSON number.
      select cast(1.23 as json);
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | 1.23 |
      +-----+
    • Example 3: Perform conversion between the BOOLEAN data type and the JSON data type.

      -- Convert a value of the BOOLEAN type into the BIGINT type.
      select cast(true as json);
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | true |
      +-----+
      
      -- Convert a JSON boolean into a value of the BOOLEAN type.
      select cast(json 'false' as boolean);
      -- The following result is returned:
      +------+
      | _c0  |
      +------+
      | false |
      +------+
      
      -- Convert a JSON string into a value of the BOOLEAN type.
      select cast(json '"abc"' as boolean);
      -- The following result is returned:
      +------+
      | _c0  |
      +------+
      | true |
      +------+
      
      -- Convert a JSON array or JSON object into the BOOLEAN type.
      select cast(json '[1,2]' as boolean);
      -- An error is returned.
      Unsupported cast from json array/object to boolean
    • Example 4: Perform conversion between a null value and a value of the JSON data type.

      -- Convert a null value into a value of the STRING type.
      select json_type(cast(null as json));
      -- The following result is returned:
      +-----+
      | _c0 |
      +-----+
      | NULL |
      +-----+

MAP

  • Syntax

    map(K, V) map(K <key1>, V <value1>, K <key2>, V <value2>[, ...])
  • Description

    Creates a map based on given key-value pairs.

  • Parameters

    • key: required. All keys must be of the same data type after implicit conversions. Only basic data types are supported.

    • value: required. All values must be of the same data type or must be of the same data type after implicit conversions. Data types other than DECIMAL are supported.

  • Return value

    A value of the MAP type is returned.

    Note

    You can configure the odps.sql.map.key.dedup.policy parameter at the session level to specify the method that is used to process duplicate keys. Valid values:

    • exception: An error is returned.

    • last_win: The latter key overwrites the former key.

    If you do not specify the parameter, the default value last_win is used.

  • Examples

    • Example 1: No duplicate keys exist. The t_table table contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) columns. Data in the table:

      +------------+----+----+------------+------------+
      | c1         | c2 | c3 | c4         | c5         |
      +------------+----+----+------------+------------+
      | 1000       | k11 | k21 | 86         | 15         |
      | 1001       | k12 | k22 | 97         | 2          |
      | 1002       | k13 | k23 | 99         | 1          |
      +------------+----+----+------------+------------+

      Sample statement:

      -- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. 
      select map(c2,c4,c3,c5) from t_table;
      -- The following result is returned: 
      +------+
      | _c0  |
      +------+
      | {k11:86, k21:15} |
      | {k12:97, k22:2} |
      | {k13:99, k23:1} |
      +------+
    • Example 2: Duplicate keys exist. The t_table table contains the c1 (BIGINT), c2 (STRING), c3 (STRING), c4 (BIGINT), and c5 (BIGINT) columns. Data in the table:

      1000,'k11','k11',86,15
      1001,'k12','k22',97,2
      1002,'k13','k23',99,1
      1003,'k13','k24',100,1
      1004,'k12','k25',95,1

      Sample statement:

      -- Define a map based on the key-value pairs between the c2 and c4 columns, and between the c3 and c5 columns. 
      select map(c2,c4,c3,c5) from t_table;
      -- The following result is returned: 
      +------+
      | _c0  |
      +------+
      | {'k11':15} |
      | {'k12':97, 'k22':2} |
      | {'k13':99, 'k23':1} |
      | {'k13':100, 'k24':1} |
      | {'k12':95, 'k25':1} |
      +------+

MAP_CONCAT

  • Syntax

    map<K, V> map_concat([string <mapDupKeyPolicy>,] map<K, V> <a>, map<K, V> <b>[,...])
  • Description

    Calculates the union of multiple maps.

  • Parameters

    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:

      • exception: An error is returned.

      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note

      The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy is used.

    • a and b: required. These parameters specify maps. The keys of the maps must be of the same data type, and the values of the maps must be of the same data type. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A value of the MAP type is returned. The return value varies based on the following rules:

    • A map is null or the key of a map is null.

    • The data types of multiple maps are different.

  • Examples

    -- The return value is {1:a, 2:b, 3:c}. 
    select map_concat(map(1, 'a', 2, 'b'), map(3, 'c'));
    -- The return value is {1:a, 2:d, 3:c}. 
    select map_concat('last_win', map(1, 'a', 2, 'b'), map(3, 'c'), map(2, 'd'));

MAP_ENTRIES

  • Syntax

    array<struct<K, V>> map_entries(map<K, V> <a>): 
  • Description

    Converts key-value pairs in Map a into a struct array.

  • Parameters

    a: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A struct array is returned. If the input is null, null is returned.

  • Examples

    -- The return value is [{key:1, value:a}, {key:2, value:b}]. 
    select map_entries(map(1,  'a',  2,  'b'));

MAP_FILTER

  • Syntax

    map<K, V> map_filter(map<K, V> <input>, function <K, V, boolean> <predicate>)
  • Description

    Filters the elements in Map input and retains only the elements that meet the predicate condition.

  • Parameters

    • input: required. A value of the MAP type. K and V in map<K, V> specify the keys and values of a map.

    • predicate: required. This parameter specifies the built-in function, user-defined function, or expression that is used to filter the elements in the map. The predicate condition consists of two input parameters that correspond to the keys and values in input. The output result is of the BOOLEAN type.

  • Return value

    A value of the MAP type is returned.

  • Examples

    -- The return value is {-30:100, 20:50}. 
    select map_filter(map(10, -20, 20, 50, -30, 100, 21, null), (k, v) -> (k+v) > 10);

MAP_FROM_ARRAYS

  • Syntax

    map<K, V> map_from_arrays([string <mapDupKeyPolicy>,] array<K> <a>, array<V> <b>))
  • Description

    Creates a map based on Array a and Array b.

  • Parameters

    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:

      • exception: An error is returned.

      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note

      The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy is used.

    • a: required. This parameter specifies an array. This parameter corresponds to the key in the generated map. K in array<K> specifies the data type of the elements in the array. The elements can be of any data type.

    • b: required. This parameter specifies an array. This parameter corresponds to the value in the generated map. V in array<V> specifies the data type of the elements in the array. The elements can be of any data type.

  • Return value

    A value of the MAP type is returned. The return value varies based on the following rules:

    • If Array a or Array b is null, null is returned.

    • If Array a contains a null element or the two arrays are of different lengths, an error is returned.

  • Examples

    -- The return value is {1:2, 3:4}. 
    select map_from_arrays(array(1.0, 3.0), array('2', '4'));
    -- The return value is {1:2, 3:6}. 
    select map_from_arrays('last_win', array(1.0, 3.0, 3), array('2', '4', '6'));

MAP_FROM_ENTRIES

  • Syntax

    map<K, V> map_from_entries([string <mapDupKeyPolicy>,] array <struct<K, V> , struct<K, V>[,...]>)
  • Description

    Creates a map based on given struct arrays.

  • Parameters

    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:

      • exception: An error is returned.

      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note

      The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy is used.

    • Input parameters of this function are of the STRUCT type. K corresponds to the keys in the generated map. V corresponds to the values in the generated map. K and V in struct<K, V> specify the keys and values of a struct array.

  • Return value

    A value of the MAP type is returned. The return value varies based on the following rules:

    • If a struct array is null, null is returned.

    • If the number of fields in a struct array is not 2 or the key of a struct array is null, an error is returned.

  • Examples

    -- The return value is {1:a, 2:b}. 
    select map_from_entries(array(struct(1, 'a'), struct(2, 'b')));
    -- The return value is {1:a, 2:c}. 
    select map_from_entries(array(struct(1, 'a'), struct(2, 'b'), struct(2, 'c')));

MAP_KEYS

  • Syntax

    array<K> map_keys(map<K, V> <a>)
  • Description

    Returns all keys in Map a as an array.

  • Parameters

    a: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A value of the ARRAY type is returned. If the input map is null, null is returned.

  • Examples

    The t_table_map table contains the c1 (BIGINT) and t_map (MAP<STRING,BIGINT>) columns. Data in the table:

    +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+

    Sample statement:

    -- Return keys in the t_map column as an array. 
    select c1, map_keys(t_map) from t_table_map;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | [k11, k21] |
    | 1001       | [k12, k22] |
    | 1002       | [k13, k23] |
    +------------+------+

MAP_VALUES

  • Syntax

    array<V> map_values(map<K, V> <a>)
  • Description

    Returns all values in Map a as an array.

  • Parameters

    a: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A value of the ARRAY type is returned. If the input map is null, null is returned.

  • Examples

    The t_table_map table contains the c1 (BIGINT) and t_map (MAP<STRING,BIGINT>) columns. Data in the table:

    +------------+-------+
    | c1         | t_map |
    +------------+-------+
    | 1000       | {k11:86, k21:15} |
    | 1001       | {k12:97, k22:2} |
    | 1002       | {k13:99, k23:1} |
    +------------+-------+

    Sample statement:

    -- Return keys in the t_map column as an array. 
    select c1,map_values(t_map) from t_table_map;
    -- The following result is returned: 
    +------------+------+
    | c1         | _c1  |
    +------------+------+
    | 1000       | [86, 15] |
    | 1001       | [97, 2] |
    | 1002       | [99, 1] |
    +------------+------+

MAP_ZIP_WITH

  • Syntax

    <K, V1, V2, V3> map<K, V3> map_zip_with(map<K, V1> <input1>, map<K, V2> <input2>, function<K, V1, V2, V3> <func>)
  • Description

    Merges Map input1 and Map input2 into a new map. The keys of the new map are the union of the keys of the two input maps. The value of each key of the new map is calculated by using func.

  • Parameters

    • input1 and input2: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

    • func: required. func consists of three input parameters, which correspond to a key, the value that corresponds to the key in input1, and the value that corresponds to the key in input2. If the key does not exist in input1 or input2, null is used to replace the value that corresponds to the key in func.

  • Return value

    Data of the data type defined by func is returned.

  • Examples

    -- The return value is {1:[1, 1, 4], 2:[2, 2, 5], 3:[3, null, null], 4:[4, null, 7]}. 
    select map_zip_with(map(1, 1, 2, 2, 3, null), map(1, 4, 2, 5, 4, 7), (k, v1, v2) -> array(k, v1, v2));

MULTIMAP_FROM_ENTRIES

  • Syntax

    	multimap_from_entries(array<struct<K, V>>)
  • Description

    Returns a map that consists of the keys in a struct array and an array containing all values.

  • Parameters

    array<struct<K, V>>: a struct array that consists of key-value pairs.

  • Return value

    A map that consists of keys in a struct array and an array containing all values is returned. The map is in the map<K, array<V>> format.

    • Each key in the returned map can be associated with multiple values, and the associated values are in the same array.

    • If the array is null, null is returned.

    • If the number of fields in the struct array is not 2 or the keys contain null values, an error is returned.

  • Examples

    SELECT multimap_from_entries(array(struct(1, 'a'), 
                                         struct(2, 'b'), 
                                         struct(1, 'c')));

    The following result is returned:

    {1 : ['a', 'c'],  2: ['b']}

NAMED_STRUCT

  • Syntax

    struct named_struct(string <name1>, T1 <value1>, string <name2>, T2 <value2>[, ...])
  • Description

    Creates a struct based on given name-value pairs.

  • Parameters

    • value: required. A value of any data type.

    • name: required. The column name of the STRING type. This parameter is a constant.

  • Return value

    A value of the STRUCT type is returned. Columns are sequentially named as name1, name2, ....

  • Examples

    -- The return value is {user_id:10001, user_name:LiLei, married:F, weight:63.5}. 
    select named_struct('user_id',10001,'user_name','LiLei','married','F','weight',63.50);

NGRAMS

  • Syntax

    ngrams(array(T), n)
  • Description

    Returns an n-gram array of the elements of the specified array.

  • Parameters

    • array: an input array.

    • n: the number of elements in each subset.

  • Return value

    Returns an n-gram array of the elements of the specified array.

    If the n<=0 condition is met, an error is returned.

  • Examples

    • SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 2); 

      The following result is returned:

      [['foo', 'bar'], ['bar', 'baz'], ['baz', 'foo']]	
    • SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 3); 

      The following result is returned:

      [['foo', 'bar', 'baz'], ['bar', 'baz', 'foo']]
    • SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 4); 

      The following result is returned:

      [['foo', 'bar', 'baz', 'foo']] 
    • SELECT ngrams(array('foo', 'bar', 'baz', 'foo'), 5);

      The following result is returned:

      [['foo', 'bar', 'baz', 'foo']]
    • SELECT ngrams(array(1, 2, 3, 4), 2);

      The following result is returned:

      [[1, 2], [2, 3], [3, 4]]	

POSEXPLODE

  • Syntax

    posexplode(array<T> <a>)
  • Description

    Converts Array a into a table that has two columns. The first column lists the position of each element in the array, starting from 0. The second column lists the elements.

  • Parameters

    a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

  • Return value

    A table is returned.

  • Examples

    select posexplode(array('a','c','f','b'));
    -- The following result is returned: 
    +------------+------------+
    | pos        | val        |
    +------------+------------+
    | 0          | a          |
    | 1          | c          |
    | 2          | f          |
    | 3          | b          |
    +------------+------------+

REVERSE

  • Syntax

    array reverse(array <value>)
  • Description

    Returns an array in reverse order of the elements of a specific array.

  • Parameters

    value: an input array.

  • Return value

    An array in reverse order of the elements of an input array is returned. If the input value is null, null is returned.

  • Examples

    -- The return value is [3, 4, 1, 2].
    SELECT reverse(array(2, 1, 4, 3));

SEQUENCE

  • Syntax

    sequence(start, stop, [step]) -> array
  • Description

    Returns an array that contains the specified elements based on expressions.

  • Parameters

    • start: an expression that represents the beginning of a sequence of elements. The sequence contains start.

      • start and stop support the following integer types: TINYINT, SMALLINT, INT, and BIGINT. The preceding integer types correspond to the following types supported by step: TINYINT, SMALLINT, INT, and BIGINT.

      • start and stop support the following date and time types: DATE, DATETIME, and TIMESTAMP. The preceding date and time types correspond to the IntervalDayTime or IntervalYearMonth type supported by step.

    • stop: an expression that represents the end of a sequence of elements. The sequence contains stop.

    • step: optional. The step size of a sequence of elements.

      By default, if the value of start is less than or equal to the value of stop, the value of step is 1. If the value of start is greater than the value of stop, the value of step is -1.

      If the sequence of elements is of the time type, the default value of step is 1 or -1 in days. When you configure step, take note of the following points: If the value of start is greater than the value of stop, set step to a negative value. If the value of start is less than or equal to the value of stop, set step to a positive value. If the requirements are not met, an error is returned.

  • Return value

    An array that contains the specified elements based on expressions is returned.

    • If the value of start is greater than the value of stop and the value of step is positive, an error is returned. If the value of start is less than or equal to the value of stop and the value of step is negative, an error is returned.

    • By default, the SEQUENCE function can generate a maximum of 10,000 elements. You can configure the odps.sql.max.sequence.length parameter to change the maximum number of elements that can be generated.

  • Examples

    • SELECT sequence(1, 5);

      The following result is returned:

      [1, 2, 3, 4, 5]
    • SELECT sequence(5, 1);

      The following result is returned:

      [5, 4, 3, 2, 1] 
    • SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);

      The following result is returned:

      [2018-01-01,  2018-02-01,  2018-03-01]	

SHUFFLE

  • Syntax

    shuffle(array)
  • Description

  • Parameters

    array: an input array.

  • Return value

    Returns elements of an array in random order.

    • If the input value is null, null is returned.

    • The returned result of this function is undefined.

  • Examples

    • SELECT shuffle(array(1, 20, 3, 5));

      The following result is returned:

      [3,1,5,20]
    • SELECT shuffle(array(1, 20, null, 3));

      The following result is returned:

      [20,null,3,1]

SIZE

  • Syntax

    int size(array<T> <a>)
    int size(map<K, V> <b> )
  • Description

    • An array as the input: Calculates the number of elements in Array a.

    • A map as the input: Calculates the number of key-value pairs in Map b.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • b: required. This parameter specifies a map. K and V in map<K, V> specify the keys and values of a map.

  • Return value

    A value of the INT type is returned.

  • Examples

    • Example 1: Calculate the number of elements in array('a','b'). Sample statement:

      -- The return value is 2. 
      select size(array('a','b'));
    • Example 2: Calculate the number of key-value pairs in map('a',123,'b',456).

      -- The return value is 2. 
      select size(map('a',123,'b',456)); 

SLICE

  • Syntax

    array<T> slice(array<T> <a>, <start>, <length>)
  • Description

    Copies the elements in an array from the start position based on a specified length and returns the elements as a new array.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • start: required. This parameter specifies the position at which the function starts to copy elements from left to right. The minimum positive value of this parameter is 1. You can also set the start parameter to a negative value. In this case, the start position is counted from the end of the array, but the elements are still copied from left to right.

    • length: required. The number of elements in the returned array. The value must be greater than or equal to 0. If the value is greater than the length of the input array, this function returns a new array that consists of the elements from the start position to the end of the input array.

  • Return value

    A value of the ARRAY type is returned.

  • Examples

    • Example 1: Copy the elements in array(10, 20, 20, null, null, 30) from position 1 based on the length of 3. Sample statement:

      -- The return value is [10, 20, 20]. 
      select slice(array(10, 20, 20, null, null, 30), 1, 3);
    • Example 2: Copy the elements in array(10, 20, 20, null, null, 30) from position -2 based on the length of 2. Sample statement:

      -- The return value is [null, 30]. 
      select slice(array(10, 20, 20, null, null, 30), -2, 2);  
    • Example 3: Copy the elements in array(10, 20, 20, null, null, 30) from position 3 based on the length of 10. Sample statement:

      -- The return value is [20, null, null, 30]. 
      select slice(array(10, 20, 20, null, null, 30), 3, 10); 
    • Example 4: Copy the elements in array(10, 20, 20, null, null, 30) from position 3 based on the length of 0. Sample statement:

      -- The return value is []. 
      select slice(array(10, 20, 20, null, null, 30), 3, 0);

SORT_ARRAY

  • Syntax

    array<T> sort_array(array<T> <a>[, <isasc>])
  • Description

    Sorts the elements in an array.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • isasc: optional. This parameter specifies the order in which elements in the array are sorted. Valid values: true and false. The value true indicates the ascending order and false indicates the descending order. If you do not specify this parameter, the elements are sorted in ascending order.

  • Return value

    A value of the ARRAY type is returned. The value null is interpreted as the minimum value.

  • Examples

    • Example 1: Sort data in the t_array table that contains the c1 ARRAY<STRING>, c2 ARRAY<INT>, and c3 ARRAY<STRING> columns. Data in the table:

      +------------+---------+--------------+
      | c1         | c2      | c3           |
      +------------+---------+--------------+
      | [a, c, f, b]  | [4, 5, 7, 2, 5, 8]  |  [You, Me, Him] |
      +------------+---------+--------------+

      Sort data in each column of the table. Sample statement:

      -- The return value is [a, b, c, f] [2, 4, 5, 5, 7, 8] [Him, Me, You]. 
      select sort_array(c1),sort_array(c2),sort_array(c3) from t_array;
    • Example 2: Sort the elements in array(10, 20, 40, 30, 30, null, 50) in descending order. Sample statement:

      -- The return value is [50, 40, 30, 30, 20, 10, null]. 
      select sort_array(array(10, 20, 40, 30, 30, null, 50), false);

SPLIT

  • Syntax

    split(<str>, <pat>)
  • Description

    Returns an array after str is split with pat.

  • Parameters

    • str: required. A value of the STRING type. This parameter specifies the string that you want to split.

    • pat: required. A delimiter of the STRING type. Regular expressions are supported. For more information about regular expressions, see Regular expressions.

  • Return value

    An array is returned. The elements in the array are of the STRING type.

  • Examples

    -- The return value is [a, b, c]. 
    select split("a, b, c", ",");

STRUCT

  • Syntax

    struct struct(<value1>,<value2>[, ...])
  • Description

    Creates a struct based on a given value list.

  • Parameters

    value: required. A value of any data type.

  • Return value

    A value of the STRUCT type is returned. Columns are sequentially named as col1, col2, ....

  • Examples

    -- The return value is {col1:a, col2:123, col3:true, col4:56.9}. 
    select struct('a',123,'true',56.90);

TO_JSON

  • Syntax

    string to_json(<expr>)
  • Description

    Converts an expression specified by expr of a given complex data type into a JSON string.

  • Parameters

    expr: required. An expression of the ARRAY, MAP, or STRUCT type.

    Note

    If the input expression is of the STRUCT type (struct<key1:value1, key2:value2>), take note of the following points:

    • All keys are converted into lowercase letters when you convert the expression into a JSON string.

    • If a value is null, the key-value pair to which the value belongs is not included in the JSON string that is returned. For example, if value2 is null, key2:value2 is not included in the JSON string that is returned.

  • Return value

    A JSON string is returned.

  • Examples

    • Example 1: Convert an expression of a given complex data type into a JSON string. Sample statement:

      -- The return value is {"a":1,"b":2}. 
      select to_json(named_struct('a', 1, 'b', 2));
      -- The return value is {"time":"26/08/2015"}. 
      select to_json(named_struct('time', "26/08/2015"));
      -- The return value is [{"a":1,"b":2}]. 
      select to_json(array(named_struct('a', 1, 'b', 2)));
      -- The return value is {"a":{"b":1}}. 
      select to_json(map('a', named_struct('b', 1)));
      -- The return value is {"a":1}. 
      select to_json(map('a', 1));
      -- The return value is [{"a":1}]. 
      select to_json(array((map('a', 1))));
    • Example 2: The input expression is of the STRUCT type. Sample statement:

      -- The return value is {"a":"B"}. If the expression of the STRUCT type is converted into a JSON string, all keys are converted into lowercase letters. 
      select to_json(named_struct("A", "B"));
      -- The return value is {"k2":"v2"}. The key-value pair to which null belongs is not included in the JSON string that is returned. 
      select to_json(named_struct("k1", cast(null as string), "k2", "v2"));

TRANSFORM

  • Syntax

    array<R> transform(array<T> <a>, function<T, R> <func>)
  • Description

    Transforms the elements in Array a by using func and returns a new array.

  • Parameters

    • a: required. This parameter specifies an array. T in array<T> specifies the data type of the elements in the array. The elements can be of any data type.

    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the elements in Array a. The value must be of the same data type as the elements in Array a. R specifies the data type of the output results.

  • Return value

    A value of the ARRAY type is returned.

  • Examples

    -- The return value is [2, 3, 4]. 
    select transform(array(1, 2, 3), x -> x + 1);

TRANSFORM_KEYS

  • Syntax

    map<K2, V> transform_keys([string <mapDupKeyPolicy>,] map<K1, V> <input>, function<K1, V, K2> <func>)
  • Description

    Transforms the keys in Map input by using func. The values in the map are not changed.

  • Parameters

    • mapDupKeyPolicy: optional. A value of the STRING type. This parameter specifies the method that is used to process duplicate keys. Valid values:

      • exception: An error is returned.

      • last_win: The latter key overwrites the former key.

      You can also specify the odps.sql.map.key.dedup.policy parameter at the session level to configure the method that is used to process duplicate keys. For example, you can set odps.sql.map.key.dedup.policy to exception. If you do not specify this parameter, the default value last_win is used.

      Note

      The behavior implementation of MaxCompute is determined based on mapDupKeyPolicy. If you do not specify mapDupKeyPolicy, the value of odps.sql.map.key.dedup.policy is used.

    • input: required. This parameter specifies a map. K1 and V in map<K1, V> specify the keys and values of a map.

    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the keys. The function or expression consists of two input parameters that correspond to the keys and values in input. K2 specifies the data type of keys in the returned map.

  • Return value

    A value of the MAP type is returned. If one of the new keys is null, an error is returned.

  • Examples

    -- The return value is {-10:-20, 70:50, 71:101}. 
    select transform_keys(map(10, -20, 20, 50, -30, 101), (k, v) -> k + v);
    -- No error is returned. The returned result depends on the order of the elements in the input map. 
    select transform_keys("last_win", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v);
    -- An error is returned because duplicate keys exist. 
    select transform_keys("exception", map(10, -20, 20, 50, -30, 100), (k, v) -> k + v);

TRANSFORM_VALUES

  • Syntax

    map<K, V2> transform_values(map<K, V1> <input>, function<K, V1, V2> <func>)
  • Description

    Transforms the values in Map input by using func. The keys in the map are not changed.

  • Parameters

    • input: required. This parameter specifies a map. K and V1 in map<K, V1> specify the keys and values of the map.

    • func: required. This parameter specifies the built-in function, user-defined function, or expression that is used to transform the keys. The function or expression consists of two input parameters that correspond to the keys and values in input. V2 specifies the data type of values in the returned map.

  • Return value

    A value of the MAP type is returned.

  • Examples

    -- The return value is {-30:71, 10:-10, 20:NULL}. 
    select transform_values(map(10, -20, 20, null, -30, 101), (k, v) -> k + v);

ZIP_WITH

  • Syntax

    array<R> zip_with(array<T> <a>, array<S> <b>, function<T, S, R> <combiner>)
  • Description

    Merges Array a and Array b at the element level based on element positions and a combiner and returns a new array.

  • Parameters

    • a and b: required. This parameter specifies an array. T in array<T> and S in array<S> specify the data types of the elements in the arrays. The elements can be of any data type.

    • combiner: required. This parameter specifies the built-in function, user-defined function, or expression that is used to merge Array a and Array b at the element level. The combiner consists of two input parameters. One input parameter must be of the same data type as the elements in Array a. The other input parameter must be of the same data type as the elements in Array b.

  • Return value

    A value of the ARRAY type is returned. The return value varies based on the following rules:

    • The elements in the returned array are at the same positions as the elements in Array a and Array b.

    • If Array a and Array b have different lengths, the null elements are added to the shorter array before the arrays are merged.

  • Examples

    -- The return value is [2, 4, 6, null]. 
    select zip_with(array(1,2,3), array(1,2,3,4), (x,y) -> x + y);