This topic describes how to perform the following operations on datasets of query results in MaxCompute: INTERSECT
, INTERSECT ALL
, INTERSECT DISTINCT
, UNION
, UNION ALL
, UNION DISTINCT
, EXCEPT
, EXCEPT ALL
, EXCEPT DISTINCT
, MINUS
, MINUS ALL
, and MINUS DISTINCT
.
Description
MaxCompute supports the following operations on datasets:
- INTERSECT: returns the intersection of two datasets. The intersection includes the values that are contained in both datasets.
- UNION: returns the union of two datasets. The union is a dataset that is obtained by combining the two datasets.
- EXCEPT and MINUS: returns distinct values from one of two datasets. These values are not contained in the other dataset.
Limits
The following limits are imposed on INTERSECT, UNION, EXCEPT, and MINUS:
- MaxCompute allows you to perform INTERSECT, UNION, EXCEPT, or MINUS operations on a maximum of 256 datasets at the same time. If the number of datasets exceeds 256, an error is returned.
- The number of columns in the left and right tables must be the same.
Usage notes
When you perform INTERSECT, UNION, EXCEPT, or MINUS operations on two datasets, take note of the following items:
- The results of INTERSECT, UNION, EXCEPT, or MINUS operations may not be sorted in a specific order.
- If data types of the datasets are not consistent, MaxCompute implicitly converts the data types before you perform INTERSECT, UNION, EXCEPT, or MINUS operations. For more information about implicit conversions, see Data types. To prevent compatibility issues, MaxCompute disables implicit conversions between data of the STRING type and data of other types for INTERSECT, UNION, EXCEPT, or MINUS operations.
INTERSECT
- Syntax
-- Obtain an intersection that contains duplicate values. <select_statement1> intersect all <select_statement2>; -- Obtain an intersection that does not contain duplicate values. The usage of
INTERSECT
is equivalent to that ofINTERSECT DISTINCT
. <select_statement1> intersect [distinct] <select_statement2>; - Parameters
- select_statement1 and select_statement2: required. These parameters specify the
SELECT
clauses. For more information about the syntax of the clauses, see SELECT syntax. - distinct: optional. This parameter is used to remove duplicate values from the intersection of two datasets.
- select_statement1 and select_statement2: required. These parameters specify the
- Examples
- Example 1: Obtain the intersection of two datasets. The intersection contains duplicate values. Sample statement:
select * from values (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) intersect all select * from values (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
The following result is returned:+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | +------------+------------+
- Example 2: Obtain the intersection of two datasets. The intersection does not contain duplicate values. Sample statements:
select * from values (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) intersect distinct select * from values (1, 2), (1, 2), (3, 4), (5, 7) t(a, b); -- The preceding statement is equivalent to the following statement: select distinct * from (select * from values (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) intersect all select * from values (1, 2), (1, 2), (3, 4), (5, 7) t(a, b)) t;
The following result is returned:+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 3 | 4 | +------------+------------+
- Example 1: Obtain the intersection of two datasets. The intersection contains duplicate values. Sample statement:
UNION
- Syntax
-- Obtain a union that contains duplicate values. <select_statement1> union all <select_statement2>; -- Obtain a union that does not contain duplicate values. <select_statement1> union [distinct] <select_statement2>;
- Usage notes
- If multiple
UNION ALL
operations exist, use parentheses () to specify the priorities of theUNION ALL
operations. - If
UNION
is followed by aCLUSTER BY
,DISTRIBUTE BY
,SORT BY
,ORDER BY
, orLIMIT
clause andodps.sql.type.system.odps2
is set to false, the clause works only on the lastselect_statement
ofUNION
. Ifodps.sql.type.system.odps2
is set to true, the clause works on the results of allUNION
operations.
- If multiple
- Parameters
- select_statement1 and select_statement2: required. These parameters specify the
SELECT
clauses. For more information about the syntax of the clauses, see SELECT syntax. - distinct: optional. This parameter is used to remove duplicate values from the union of two datasets.
- select_statement1 and select_statement2: required. These parameters specify the
- Examples
- Example 1: Obtain the union of two datasets. The union contains duplicate values. Sample statement:
The following result is returned:select * from values (1, 2), (1, 2), (3, 4) t(a, b) union all select * from values (1, 2), (1, 4) t(a, b);
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | | 1 | 2 | | 1 | 4 | +------------+------------+
- Example 2: Obtain the union of two datasets. The union does not contain duplicate values. Sample statements:
The following result is returned:select * from values (1, 2), (1, 2), (3, 4) t(a, b) union distinct select * from values (1, 2), (1, 4) t(a, b); -- The preceding statement is equivalent to the following statement: select distinct * from ( select * from values (1, 2), (1, 2), (3, 4) t(a, b) union all select * from values (1, 2), (1, 4) t(a, b));
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 4 | | 3 | 4 | +------------+------------+
- Example 3: Use parentheses () to specify the priorities of
UNION ALL
operations. Sample statement:
The following result is returned:select * from values (1, 2), (1, 2), (5, 6) t(a, b) union all (select * from values (1, 2), (1, 2), (3, 4) t(a, b) union all select * from values (1, 2), (1, 4) t(a, b));
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 5 | 6 | | 1 | 2 | | 1 | 2 | | 3 | 4 | | 1 | 2 | | 1 | 4 | +------------+------------+
- Example 4: Use
UNION
that is followed by aCLUSTER BY
,DISTRIBUTE BY
,SORT BY
,ORDER BY
, orLIMIT
clause in a SELECT statement and setodps.sql.type.system.odps2
to true. Sample statements:
The following result is returned:set odps.sql.type.system.odps2=true; select explode(array(3, 1)) as (a) union all select explode(array(0, 4, 2)) as (a) order by a limit 3;
+------------+ | a | +------------+ | 0 | | 1 | | 2 | +------------+
- Example 5: Use
UNION
that is followed by aCLUSTER BY
,DISTRIBUTE BY
,SORT BY
,ORDER BY
, orLIMIT
clause in a SELECT statement and setodps.sql.type.system.odps2
to false. Sample statements:
The following result is returned:set odps.sql.type.system.odps2=false; select explode(array(3, 1)) as (a) union all select explode(array(0, 4, 2)) as (a) order by a limit 3;
+------------+ | a | +------------+ | 3 | | 1 | | 0 | | 2 | | 4 | +------------+
- Example 1: Obtain the union of two datasets. The union contains duplicate values. Sample statement:
EXCEPT and MINUS
- Syntax
-- Obtain the supplementary set of two datasets. The supplementary set contains duplicate values. <select_statement1> except all <select_statement2>; <select_statement1> minus all <select_statement2>; -- Obtain the supplementary set of two datasets. The supplementary set does not contain duplicate values. <select_statement1> except [distinct] <select_statement2>; <select_statement1> minus [distinct] <select_statement2>;
Note The usage ofEXCEPT
is equivalent to that ofMINUS
. - Parameters
- select_statement1 and select_statement2: required. These parameters specify the
SELECT
clauses. For more information about the syntax of the clauses, see SELECT syntax. - distinct: optional. This parameter is used to remove duplicate values from the supplementary set of two datasets.
- select_statement1 and select_statement2: required. These parameters specify the
- Examples
- Example 1: Obtain the supplementary set of two datasets. The supplementary set contains duplicate values. Sample statements:
select * from values (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) except all select * from values (3, 4), (5, 6), (5, 6), (9, 10) t(a, b); -- The preceding statement is equivalent to the following statement: select * from values (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) minus all select * from values (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
The following result is returned:+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 1 | 2 | | 3 | 4 | | 7 | 8 | +------------+------------+
- Example 2: Obtain the supplementary set of two datasets. The supplementary set does not contain duplicate values. Sample statements:
The following result is returned:select * from values (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) except distinct select * from values (3, 4), (5, 6), (5, 6), (9, 10) t(a, b); -- The preceding statement is equivalent to the following statements: select * from values (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) minus distinct select * from values (3, 4), (5, 6), (5, 6), (9, 10) t(a, b); -- The preceding statements are equivalent to the following statement: select distinct * from values (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) except all select * from values (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
+------------+------------+ | a | b | +------------+------------+ | 1 | 2 | | 7 | 8 | +------------+------------+
- Example 1: Obtain the supplementary set of two datasets. The supplementary set contains duplicate values. Sample statements: