Aggregates the values in a column specified by colname into an array. This function is an additional function of MaxCompute V2.0.
Usage notes
MaxCompute V2.0 provides additional functions. If the functions that you use involve new data types, you must enable the MaxCompute V2.0 data type edition. The new data types include TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, and BINARY.
Session level: To use the MaxCompute V2.0 data type edition, you must add
set odps.sql.type.system.odps2=true;
before the SQL statement that you want to execute, and commit and execute them together.Project level: The project owner can enable the MaxCompute V2.0 data type edition for the project based on the project requirements. The configuration takes effect after 10 to 15 minutes. Sample command:
setproject odps.sql.type.system.odps2=true;
For more information about
setproject
, see Project operations. For more information about the precautions that you must take when you enable the MaxCompute V2.0 data type edition at the project level, see Data type editions.
If you use an SQL statement that includes multiple aggregate functions and the project resources are insufficient, memory overflow may occur. We recommend that you optimize the SQL statement or purchase computing resources based on your business requirements.
Syntax
array collect_list(<colname>)
Parameters
colname: required. The name of a column, which can be of any data type.
Return value
A value of the ARRAY type is returned. If a value of the column specified by colname is null, the row that contains this value is not used for calculation.
Sample data
This section provides sample source data and examples for you to understand how to use the functions. Create a table named emp and insert the sample data into the table. Sample statement:
create table if not exists emp
(empno bigint,
ename string,
job string,
mgr bigint,
hiredate datetime,
sal bigint,
comm bigint,
deptno bigint);
tunnel upload emp.txt emp; -- Replace emp.txt with the actual path (path and name) to which you upload the data file.
The emp.txt file contains the following sample data:
7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
Examples
Example 1: Aggregate the values in the sal column into an array to obtain the salaries of all employees. Sample statement:
select collect_list(sal) from emp;
The following result is returned:
+------+ | _c0 | +------+ | [800, 1600, 1250, 2975, 1250, 2850, 2450, 3000, 5000, 1500, 1100, 950, 3000, 1300, 5000, 2450, 1300] | +------+
Example 2: Use this function with
GROUP BY
to group all employees based on the deptno column and aggregate the values in the sal column of each group into an array. Sample statement:select deptno, collect_list(sal) from emp group by deptno;
The following result is returned:
+------------+------+ | deptno | _c1 | +------------+------+ | 10 | [2450, 5000, 1300, 5000, 2450, 1300] | | 20 | [800, 2975, 3000, 1100, 3000] | | 30 | [1600, 1250, 1250, 2850, 1500, 950] | +------------+------+
Example 3: Use this function with
GROUP BY
to group all employees based on the deptno column and aggregate the values in the sal column of each group after duplicates are removed. Sample statement:select deptno, collect_list(distinct sal) from emp group by deptno;
The following result is returned:
+------------+------+ | deptno | _c1 | +------------+------+ | 10 | [1300, 2450, 5000] | | 20 | [800, 1100, 2975, 3000] | | 30 | [950, 1250, 1500, 1600, 2850] | +------------+------+
Example 4: Aggregate the values in the sal column into an array and concatenate the values in the returned array.
-- Use the COLLECT_LIST function. select array_join(collect_list(sal), ',') from emp; -- Use the WM_CONCAT function, which provides better performance. select wm_concat(',', sal) from emp;
The following result is returned:
+-----+ | _c0 | +-----+ | 800,1600,1250,2975,1250,2850,2450,3000,5000,1500,1100,950,3000,1300,5000,2450,1300 | +-----+
Example 5: Randomly aggregate values in the sal column into an array and select a value from the array.
-- Use the COLLECT_LIST function. select collect_list(sal)[0] from emp; -- Use the ANY_VALUE function, which provides better performance. select any_value(sal) from emp;
The following result is returned:
+------------+ | _c0 | +------------+ | 800 | +------------+
Example 6: Sort and concatenate values in the ename column.
-- Use the COLLECT_LIST function. select concat_ws(',', sort_array(collect_list(ename))) from emp; -- Use the WM_CONCAT function together with the WITHIN GROUP clause. select wm_concat(',', ename) WITHIN group (order by ename) from emp;
The following result is returned:
+-----+ | _c0 | +-----+ | ADAMS,ALLEN,BLAKE,CLARK,FORD,JACCKA,JAMES,JONES,KING,MARTIN,MILLER,SCOTT,SMITH,TEBAGE,TURNER,WARD,WELAN | +-----+
The COLLECT_LIST function can be used to aggregate all values in a group into an array. In specific scenarios, the COLLECT_LIST function can be replaced by other functions to reduce the overhead of the processing of complex data types. In Example 4, Example 5, and Example 6, the COLLECT_LIST function can be replaced by other functions for better computing performance.
Related functions
COLLECT_LIST is an aggregate function. For more information about the functions that are used to calculate the average value of multiple input records and to aggregate parameters, see Aggregate functions.