You can use a subquery if you want to perform a further query based on the result of a query. For example, you can use subqueries if you want to separately calculate aggregate values, check whether a record exists, filter data returned by a query, associate with an update or delete operation, simplify a JOIN operation to obtain a value, use a query result as a derived table for a main query, or use a query result as the basis for sorting or grouping and row-by-row comparison. This topic describes the definition and usage of subqueries that are supported in MaxCompute.
Description
Subqueries are nested inside a statement to perform complex data queries. MaxCompute supports the following types of subqueries:
A subquery in a
FROM
clause is used as a temporary table for complex computing or data conversions in queries.You can use IN subqueries in the
WHERE
clause if you want to match a group of values that are returned by a subquery. IN subqueries are applicable if you want to query data from rows that match specified conditions.NOT IN subqueries are used to exclude one collection from another. If you use a NOT IN subquery in the
WHERE
clause, rows that match the conditions of the subquery are removed from the result of the main query.EXISTS subqueries are used in main queries to check whether subqueries return any rows. EXISTS subqueries are applicable if you want to check whether a record exists in a subquery regardless of the returned content.
NOT EXISTS subqueries work opposite to EXISTS subqueries. Records in main queries are selected only if subqueries return no rows. NOT EXISTS subqueries are applicable if you want to query rows that are not matched in subqueries.
Scalar subqueries are subqueries that return exactly one column value from one row. In most cases, you can use scalar subqueries in SELECT statements or
WHERE
orHAVING
clauses. Scalar subqueries are applicable if you want to calculate a specific aggregate value or obtain a column value from a row.
Some subqueries such as SCALAR, IN, NOT IN, EXISTS, and NOT EXISTS can be converted into JOIN operations during execution. MAPJOIN is an efficient JOIN algorithm. If the result of the SUBQUERY is a small table, you can use a HINT in the subquery statement to explicitly specify the MAPJOIN algorithm.
Sample data
Sample source data is provided for you to better understand the examples in this topic. Sample statements:
-- Create a partitioned table named sale_detail.
create table if not exists sale_detail
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
-- Add partitions to the sale_detail table.
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
-- Insert data into the sale_detail table.
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
Query data in the sale_detail table. Sample statement:
set odps.sql.allow.fullscan=true;
select * from sale_detail;
-- The following result is returned:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
Basic subquery
The object of a common query is a destination table. You can also use a SELECT
statement as a query object. In this case, the SELECT statement is considered a subquery. A subquery in a FROM
clause can be used as a table. You can join
the subquery with other tables or subqueries. For more information about JOIN
operations, see JOIN.
Syntax
Syntax 1
select <select_expr> from (<select_statement>) [<sq_alias_name>];
Syntax 2
select (<select_statement>) from <table_name>;
Parameters
select_expr: required. The value of this parameter is in the format of
col1_name, col2_name, Regular expression,...
. This format indicates common columns or partition key columns that you want to query or regular expressions that are used for a query.select_statement: required. This parameter specifies a subquery clause. If you use Syntax 2, the subquery result can have only one row. For more information about the syntax, see SELECT syntax.
sq_alias_name: optional. This parameter specifies the alias of a subquery.
table_name: required. This parameter specifies the name of the table that you want to query.
Examples
Example 1: Use Syntax 1. Sample statements:
set odps.sql.allow.fullscan=true; select * from (select shop_name from sale_detail) a;
The following result is returned:
+------------+ | shop_name | +------------+ | s1 | | s2 | | s3 | | null | | s6 | | s7 | +------------+
Example 2: Use Syntax 2. Sample statements:
set odps.sql.allow.fullscan=true; select (select * from sale_detail where shop_name='s1') from sale_detail;
The following result is returned:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s1 | c1 | 100.1 | 2013 | china | | s1 | c1 | 100.1 | 2013 | china | | s1 | c1 | 100.1 | 2013 | china | | s1 | c1 | 100.1 | 2013 | china | | s1 | c1 | 100.1 | 2013 | china | +------------+-------------+-------------+------------+------------+
Example 3: Use Syntax 1. In this example, a subquery in a
FROM
clause is used as a table, and the subquery isjoined
with other tables or subqueries. Sample statements:-- Create a table and join the table with a subquery. create table shop as select shop_name,customer_id,total_price from sale_detail; select a.shop_name, a.customer_id, a.total_price from (select * from shop) a join sale_detail on a.shop_name = sale_detail.shop_name;
The following result is returned:
+------------+-------------+-------------+ | shop_name | customer_id | total_price | +------------+-------------+-------------+ | null | c5 | NULL | | s6 | c6 | 100.4 | | s7 | c7 | 100.5 | | s1 | c1 | 100.1 | | s2 | c2 | 100.2 | | s3 | c3 | 100.3 | +------------+-------------+-------------+
IN SUBQUERY
IN SUBQUERY
is used in a similar manner to LEFT SEMI JOIN.
Syntax
Syntax 1
select<select_expr1>from<table_name1>where<select_expr2>in(select<select_expr3>from<table_name2>); -- The preceding statement is equivalent to the following statement with LEFT SEMI JOIN: select<select_expr1>from<table_name1><alias_name1>leftsemijoin<table_name2><alias_name2>on<alias_name1>.<select_expr2>=<alias_name2>.<select_expr3>;
NoteIf
select_expr2
specifies partition key columns,select <select_expr2> from <table_name2>
is not converted intoLEFT SEMI JOIN
. A separate job is started to run a subquery. MaxCompute compares the subquery results with the columns that you specify inselect_expr2
in sequence. If the partitions of the table specified bytable_name1
contain the columns inselect_expr2
and these columns are not included in the results, MaxCompute does not read data from these partitions. This ensures that partition pruning is still valid.Syntax 2
MaxCompute supports
IN SUBQUERY
and correlated conditions.where <table_name2_colname> = <table_name1>.<colname>
is a correlated condition. MaxCompute V1.0 does not support expressions that reference source tables from both subqueries and main queries. MaxCompute V2.0 supports such expressions. These filter conditions are part of theON
condition inSEMI JOIN
operations.select<select_expr1>from<table_name1>where<select_expr2>in(select<select_expr3>from<table_name2>where <table_name1>.<col_name>=<table_name2>.<col_name>);
NoteMaxCompute supports
IN SUBQUERY
that does not serve as aJOIN
condition. For example, anon-WHERE
clause uses IN SUBQUERY, or aWHERE
clause uses IN SUBQUERY that cannot be converted into aJOIN
condition. In this case, IN SUBQUERY cannot be converted intoSEMI JOIN
. A separate job must be started to run a subquery. Correlated conditions are not supported.Syntax 3
IN SUBQUERY supports multi-column subqueries based on the preceding capabilities and limits. This rule also applies to PostgreSQL. If you use Syntax 3 for IN SUBQUERY, you do not need to split a query into subqueries. Multi-column subqueries reduce one JOIN operation and save computing resources. You can use multi-column subqueries in the following ways:
Use a simple SELECT statement in which you specify multiple columns for the
IN SUBQUERY
expression.Use aggregate functions for the
IN SUBQUERY
expression. For more information about aggregate functions, see Aggregate functions.Use constants for the
IN SUBQUERY
expression.
Parameters
select_expr1: required. The value of this parameter is in the format of
col1_name, col2_name, Regular expression,...
. This parameter specifies the common columns or partition key columns that you want to query or regular expressions that are used for a query.table_name1 and table_name2: required. The parameters specify the names of the tables.
select_expr2 and select_expr3: required. The parameters specify the names of the columns in the tables specified by table_name1 and table_name2. Columns in the two tables are mapped to each other.
col_name: required. This parameter specifies the name of a column in the table.
Usage notes
If you use the
IN SUBQUERY
expression, null values are automatically excluded from the returned result of the subquery.Examples
Example 1: Use Syntax 1. Sample statements:
set odps.sql.allow.fullscan=true; select * from sale_detail where total_price in (select total_price from shop);
The following result is returned:
+-----------+-------------+-------------+-----------+--------+ | shop_name | customer_id | total_price | sale_date | region | +-----------+-------------+-------------+-----------+--------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +-----------+-------------+-------------+-----------+--------+
Example 2: Use Syntax 2. Sample statements:
set odps.sql.allow.fullscan=true; select * from sale_detail where total_price in (select total_price from shop where customer_id = shop.customer_id);
The following result is returned:
+-----------+-------------+-------------+-----------+--------+ | shop_name | customer_id | total_price | sale_date | region | +-----------+-------------+-------------+-----------+--------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +-----------+-------------+-------------+-----------+--------+
Example 3: Multiple columns are specified in a SELECT statement for a subquery. Sample statements:
-- Sample data is reconstructed to help you understand this example. create table if not exists t1(a bigint,b bigint,c bigint,d bigint,e bigint); create table if not exists t2(a bigint,b bigint,c bigint,d bigint,e bigint); insert into table t1 values (1,3,2,1,1),(2,2,1,3,1),(3,1,1,1,1),(2,1,1,0,1),(1,1,1,0,1); insert into table t2 values (1,3,5,0,1),(2,2,3,1,1),(3,1,1,0,1),(2,1,1,0,1),(1,1,1,0,1); -- Scenario 1: The IN SUBQUERY expression is a simple SELECT statement in which you specify multiple columns. select a, b from t1 where (c, d) in (select a, b from t2 where e = t1.e); -- The following result is returned: +------------+------------+ | a | b | +------------+------------+ | 1 | 3 | | 2 | 2 | | 3 | 1 | +------------+------------+ -- Scenario 2: The IN SUBQUERY expression uses aggregate functions. select a, b from t1 where (c, d) in (select max(a), b from t2 where e = t1.e group by b having max(a) > 0); -- The following result is returned: +------------+------------+ | a | b | +------------+------------+ | 2 | 2 | +------------+------------+ -- Scenario 3: The IN SUBQUERY expression uses constants. select a, b from t1 where (c, d) in ((1, 3), (1, 1)); -- The following result is returned: +------------+------------+ | a | b | +------------+------------+ | 2 | 2 | | 3 | 1 | +------------+------------+
NOT IN SUBQUERY
NOT IN SUBQUERY
is used in a similar manner to LEFT ANTI JOIN. However, if the values of a row are NULL for a specified column in the table that you want to query, the value of the expression in NOT IN SUBQUERY
is NULL. In this case, the WHERE
condition is invalid, and no data is returned. This processing logic is different from that of LEFT ANTI JOIN
.
Syntax
Syntax 1
select <select_expr1> from <table_name1> where <select_expr2> not in (select <select_expr2> from <table_name2>); -- The preceding statement is equivalent to the following statement with LEFT ANTI JOIN. select <select_expr1> from <table_name1> <alias_name1> left anti join <table_name2> <alias_name2> on <alias_name1>.<select_expr1> = <alias_name2>.<select_expr2>;
NoteIf
select_expr2
specifies partition key columns,select <select_expr2> from <table_name2>
is not converted intoLEFT ANTI JOIN
. A separate job is started to run a subquery. MaxCompute compares the subquery results with the columns specified inselect_expr2
in sequence. If the partitions of the table specified bytable_name1
contain the columns inselect_expr2
and these columns are not included in the results, MaxCompute does not read data from these partitions. This ensures that partition pruning is still valid.Syntax 2
MaxCompute supports
NOT IN SUBQUERY
and correlated conditions.where <table_name2_colname> = <table_name1>.<colname>
is a correlated condition. MaxCompute V1.0 does not support expressions that reference source tables from both subqueries and main queries. MaxCompute V2.0 supports such expressions. These expressions are part of theON
condition inANTI JOIN
operations.select <select_expr1> from <table_name1> where <select_expr2> not in (select <select_expr2> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);
NoteMaxCompute supports
NOT IN SUBQUERY
that does not serve as aJOIN
condition. For example, anon-WHERE
clause uses NOT IN SUBQUERY, or aWHERE
clause uses NOT IN SUBQUERY that cannot be converted into aJOIN
condition. In this case, NOT IN SUBQUERY cannot be converted intoANTI JOIN
. A separate job must be started to run a subquery. Correlated conditions are not supported.Syntax 3
NOT IN SUBQUERY supports multi-column subqueries based on the preceding capabilities and limits. This rule also applies to PostgreSQL. If you use Syntax 3 for NOT IN SUBQUERY, you do not need to split a query into multiple subqueries. Multi-column subqueries reduce one JOIN operation and save computing resources. You can use multi-column subqueries in the following ways:
Use a simple SELECT statement in which you specify multiple columns for the
NOT IN SUBQUERY
expression.Use aggregate functions for the
NOT IN SUBQUERY
expression. For more information about aggregate functions, see Aggregate functions.Use constants for the
NOT IN SUBQUERY
expression.
Parameters
select_expr1: required. The value of this parameter is in the format of
col1_name, col2_name, Regular expression,...
. This parameter specifies the common columns or partition key columns that you want to query or regular expressions that are used for a query.table_name1 and table_name2: required. The parameters specify the names of the tables.
select_expr2 and select_expr3: required. The parameters specify the names of the columns in the tables specified by table_name1 and table_name2. Columns in the two tables are mapped to each other.
col_name: required. This parameter specifies the name of a column in the table.
Usage notes
If you use the
NOT IN SUBQUERY
expression, null values are automatically excluded from the returned result of the subquery.Examples
Example 1: Use Syntax 1. Sample statements:
-- Create a table named shop1 and insert data into the table. create table shop1 as select shop_name,customer_id,total_price from sale_detail; insert into shop1 values ('s8','c1',100.1); select * from shop1 where shop_name not in (select shop_name from sale_detail);
The following result is returned:
+------------+-------------+-------------+ | shop_name | customer_id | total_price | +------------+-------------+-------------+ | s8 | c1 | 100.1 | +------------+-------------+-------------+
Example 2: Use Syntax 2. Sample statements:
set odps.sql.allow.fullscan=true; select * from shop1 where shop_name not in (select shop_name from sale_detail where customer_id = shop1.customer_id);
The following result is returned:
+------------+-------------+-------------+ | shop_name | customer_id | total_price | +------------+-------------+-------------+ | s8 | c1 | 100.1 | +------------+-------------+-------------+
Example 3:
NOT IN SUBQUERY
does not serve as aJOIN
condition. Sample statements:set odps.sql.allow.fullscan=true; select * from shop1 where shop_name not in (select shop_name from sale_detail) and total_price < 100.3;
NOT IN SUBQUERY cannot be converted into
ANTI JOIN
. This is because theWHERE
clause includes anAND
operator. A separate job is started to run a subquery.The following result is returned:
+------------+-------------+-------------+ | shop_name | customer_id | total_price | +------------+-------------+-------------+ | s8 | c1 | 100.1 | +------------+-------------+-------------+
Example 4: If the values of a row in the table from which you want to query data are NULL, no data is returned. Sample statements:
-- Create a table named sale and insert data into the table. create table if not exists sale ( shop_name string, customer_id string, total_price double ) partitioned by (sale_date string, region string); alter table sale add partition (sale_date='2013', region='china'); insert into sale partition (sale_date='2013', region='china') values ('null','null',null),('s2','c2',100.2),('s3','c3',100.3),('s8','c8',100.8); set odps.sql.allow.fullscan=true; select * from sale where shop_name not in (select shop_name from sale_detail);
The following result is returned:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ +------------+-------------+-------------+------------+------------+
Example 5: Multiple columns are specified in a SELECT statement for a subquery. Sample statements:
-- Sample data is reconstructed to help you understand this example. The sample data is the same as that of IN SUBQUERY. create table if not exists t1(a bigint,b bigint,c bigint,d bigint,e bigint); create table if not exists t2(a bigint,b bigint,c bigint,d bigint,e bigint); insert into table t1 values (1,3,2,1,1),(2,2,1,3,1),(3,1,1,1,1),(2,1,1,0,1),(1,1,1,0,1); insert into table t2 values (1,3,5,0,1),(2,2,3,1,1),(3,1,1,0,1),(2,1,1,0,1),(1,1,1,0,1); -- Scenario 1: The NOT IN SUBQUERY expression is a simple SELECT statement in which you specify multiple columns. select a, b from t1 where (c, d) not in (select a, b from t2 where e = t1.e); -- The following result is returned: +------------+------------+ | a | b | +------------+------------+ | 2 | 1 | | 1 | 1 | +------------+------------+ -- Scenario 2: The NOT IN SUBQUERY expression uses aggregate functions. select a, b from t1 where (c, d) not in (select max(a), b from t2 where e = t1.e group by b having max(a) > 0); -- The following result is returned: +------------+------------+ | a | b | +------------+------------+ | 1 | 3 | | 3 | 1 | | 2 | 1 | | 1 | 1 | +------------+------------+ -- Scenario 3: The NOT IN SUBQUERY expression uses constants. select a, b from t1 where (c, d) not in ((1, 3), (1, 1)); -- The following result is returned: +------------+------------+ | a | b | +------------+------------+ | 1 | 3 | | 2 | 1 | | 1 | 1 | +------------+------------+
EXISTS SUBQUERY
When you use an EXISTS SUBQUERY
clause, if the subquery returns at least one row of data, True is returned. If the subquery does not return data, False is returned.
MaxCompute supports only the WHERE
subqueries that have correlated conditions. To use an EXISTS SUBQUERY
clause, you must convert this clause into LEFT SEMI JOIN
.
Syntax
select <select_expr> from <table_name1> where exists (select <select_expr> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);
Parameters
select_expr: required. The value of this parameter is in the format of
col1_name, col2_name, Regular expression,...
. This format indicates common columns or partition key columns that you want to query or regular expressions that are used for a query.table_name1 and table_name2: required. The parameters specify the names of the tables.
col_name: required. This parameter specifies the name of a column in the table.
Usage notes
If you use the
EXISTS SUBQUERY
clause, null values are automatically excluded from the returned result of the subquery.Examples
set odps.sql.allow.fullscan=true; select * from sale_detail where exists (select * from shop where customer_id = sale_detail.customer_id); -- The preceding statement is equivalent to the following statement: select * from sale_detail a left semi join shop b on a.customer_id = b.customer_id;
The following result is returned:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | null | c5 | NULL | 2014 | shanghai | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | +------------+-------------+-------------+------------+------------+
NOT EXISTS SUBQUERY
When you use a NOT EXISTS SUBQUERY
clause, if no data is returned, True is returned. Otherwise, False is returned.
MaxCompute supports only the WHERE
subqueries that have correlated conditions. To use a NOT EXISTS SUBQUERY
clause, you must convert this clause into LEFT ANTI JOIN
.
Syntax
select <select_expr> from <table_name1> where not exists (select <select_expr> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);
Parameters
select_expr: required. The value of this parameter is in the format of
col1_name, col2_name, Regular expression,...
. This format indicates common columns or partition key columns that you want to query or regular expressions that are used for a query.table_name1 and table_name2: required. The parameters specify the names of the tables.
col_name: required. This parameter specifies the name of a column in the table.
Usage notes
If you use the
NOT EXISTS SUBQUERY
clause, null values are automatically excluded from the returned result of the subquery.Examples
set odps.sql.allow.fullscan=true; select * from sale_detail where not exists (select * from shop where shop_name = sale_detail.shop_name); -- The preceding statement is equivalent to the following statement: select * from sale_detail a left anti join shop b on a.shop_name = b.shop_name;
The following result is returned:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ +------------+-------------+-------------+------------+------------+
SCALAR SUBQUERY
If the output result of a SCALAR SUBQUERY clause contains only one row and one column of data, the result can be used as a scalar for data computations.
All SCALAR SUBQUERY clauses whose output result contains only one row and one column of data can be rewritten based on the following syntax. If the output result of SCALAR SUBQUERY contains only one row of data and one MAX
or MIN
operator is nested outside SCALAR SUBQUERY, the result does not change.
Syntax
select <select_expr> from <table_name1> where (<select count(*) from <table_name2> where <table_name2_colname> = <table_name1>.<colname>) <Scalar operator> <scalar_value>; -- The preceding statement is equivalent to the following statement: select <table_name1>.<select_expr> from <table_name1> left semi join (select <colname>, count(*) from <table_name2> group by <colname> having count(*) <Scalar operator> <scalar_value>) <table_name2> on <table_name1>.<colname> = <table_name2>.<colname>;
NoteThe output result of
select count(*) from <table_name2> where <table_name2_colname> = <table_name1>.<colname>
is a row set. The output contains only one row and one column of data. In this case, the result can be used as a scalar. In practical application, SCALAR SUBQUERY is converted intoJOIN
as much as possible.The output result of SCALAR SUBQUERY can be used as a scalar only if you can confirm in the compilation phase that SCALAR SUBQUERY returns only one row and one column of data. If you cannot make this confirmation until the running phase, the compiler reports an error. The compiler can compile the statements that meet the following requirements:
The
SELECT
list of SCALAR SUBQUERY uses aggregate functions that are not included in the parameters of a specified user-defined table-valued function (UDTF).SCALAR SUBQUERY that uses aggregate functions does not include a
GROUP BY
clause.
SCALAR SUBQUERY also supports multi-column subqueries based on the preceding capabilities and limits.
A SELECT list is a SCALAR SUBQUERY expression in which you specify multiple columns. The expression must be an equality expression.
Columns in a SELECT list can be an expression of the BOOLEAN type. Only equivalent comparison is supported.
A
WHERE
clause supports multi-column comparison. Only equivalent comparison is supported.
Parameters
select_expr: required. The value of this parameter is in the format of
col1_name, col2_name, Regular expression,...
. This format indicates common columns or partition key columns that you want to query or regular expressions that are used for a query.table_name1 and table_name2: required. The parameters specify the names of the tables.
col_name: required. This parameter specifies the name of a column in the table.
Scalar operator: required. The scalar operator can be greater than (>), less than (<), equal to (=), greater than or equal to (>=), or less than or equal to (<=).
scalar_value: required. This parameter specifies a scalar value.
Limits
SCALAR SUBQUERY
can reference columns from main queries. IfSCALAR SUBQUERY
uses multiple-level nesting, only the outermost column can be referenced.-- Sample statement that you can execute: select * from t1 where (select count(*) from t2 where t1.a = t2.a) = 3; -- Sample statement that you cannot execute. This is because columns from main queries cannot be referenced in a SELECT statement for a subquery. select * from t1 where (select count(*) from t2 where (select count(*) from t3 where t3.a = t1.a) = 2) = 3;
SCALAR SUBQUERY
can be used only in aWHERE
clause.-- Scalar subqueries cannot be referenced in a SELECT statement for a subquery. select * from t1 where (select t1.b + count(*) from t2) = 3; -- The columns returned by the SELECT statement cannot reference outer columns. select(selectcount(t1.a)fromt2wheret2.a=t1.a)fromt1; select(selectt1.afromt2wheret2.a=t1.a)fromt1;
Examples
Example 1: Common usage. Sample statements:
set odps.sql.allow.fullscan=true; select * from shop where (select count(*) from sale_detail where sale_detail.shop_name = shop.shop_name) >= 1;
The following result is returned:
+------------+-------------+-------------+ | shop_name | customer_id | total_price | +------------+-------------+-------------+ | s1 | c1 | 100.1 | | s2 | c2 | 100.2 | | s3 | c3 | 100.3 | | null | c5 | NULL | | s6 | c6 | 100.4 | | s7 | c7 | 100.5 | +------------+-------------+-------------+
Example 2: Multiple columns are specified in a SELECT statement for a subquery. Sample statements:
-- Sample data is reconstructed to help you understand this example. create table if not exists ts(a bigint,b bigint,c double); create table if not exists t(a bigint,b bigint,c double); insert into table ts values (1,3,4.0),(1,3,3.0); insert into table t values (1,3,4.0),(1,3,5.0); -- Scenario 1: The SELECT statement contains a scalar subquery expression in which multiple columns are specified. The expression must be an equality expression. Sample statements that you cannot execute: select (select a, b from t where c > ts.c) as (a, b), a from ts; select (select a, b from t where c = ts.c) as (a, b), a from ts; -- The following result is returned: +------------+------------+------------+ | a | b | a2 | +------------+------------+------------+ | 1 | 3 | 1 | | NULL | NULL | 1 | +------------+------------+------------+ -- Scenario 2: The SELECT statement contains an expression of the BOOLEAN type. Only equivalent comparison is supported. Sample statements that you cannot execute: select (a,b) > (select a,b from ts where c = t.c) from t; select (a,b) = (select a,b from ts where c = t.c) from t; -- The following result is returned: +------+ | _c0 | +------+ | true | | false | +------+ -- Scenario 3: A WHERE clause supports multi-column comparison. Only equivalent comparison is supported. Sample statements that you cannot execute: select * from t where (a,b) > (select a,b from ts where c = t.c); select * from t where c > 3.0 and (a,b) = (select a,b from ts where c = t.c); -- The following result is returned: +------------+------------+------------+ | a | b | c | +------------+------------+------------+ | 1 | 3 | 4.0 | +------------+------------+------------+ select * from t where c > 3.0 or (a,b) = (select a,b from ts where c = t.c); -- The following result is returned: +------------+------------+------------+ | a | b | c | +------------+------------+------------+ | 1 | 3 | 4.0 | | 1 | 3 | 5.0 | +------------+------------+------------+
References
A large number of subqueries or improper use of subqueries may cause slow queries, especially in a big data environment. You can use temporary tables or materialized views instead of subqueries, or reconstruct multiple subqueries into a JOIN operation to improve query efficiency. For more information, see Recommendations and management of materialized views and JOIN.