Clauses in a SELECT
statement that is written in compliance with the SELECT
syntax of MaxCompute are executed in a different sequence from the clauses in a standard SELECT statement. This topic describes the sequence for executing clauses in a SELECT
statement of MaxCompute and provides examples for reference.
Sequence for executing clauses in a SELECT statement
The SELECT syntax includes the following clauses:
select
from
where
group by
having
window
qualify
order by
distribute by
sort by
limit
The following clauses cannot be used with DISTRIBUTE BY
or SORT BY
: ORDER BY
and GROUP BY
. You can execute clauses in a SELECT
statement in one of the following sequences:
Sequence 1:
FROM
>WHERE
>GROUP BY
>HAVING
>SELECT
>ORDER BY
>LIMIT
Sequence 2:
FROM
>WHERE
>SELECT
>DISTRIBUTE BY
>SORT BY
To prevent confusion, MaxCompute allows you to write a SELECT statement in the preceding sequences. The syntax of a SELECT statement can be changed to the following form:
from <table_reference>
[where <where_condition>]
[group by <col_list>]
[having <having_condition>]
[window <window_name> AS (<window_definition>)]
[qualify <expression>]
select [all | distinct] <select_expr>, <select_expr>, ...
[order by <order_condition>]
[distribute by <distribute_condition> [sort by <sort_condition>] ]
[limit <number>]
Sample data
Sample source data is provided for you to better understand the examples in this topic. Sample statements:
-- Create a partitioned table named sale_detail.
create table if not exists sale_detail
(
shop_name string,
customer_id string,
total_price double
)
partitioned by (sale_date string, region string);
-- Add partitions to the sale_detail table.
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
-- Insert data into the sale_detail table.
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
Query data in the sale_detail table. Sample statement:
set odps.sql.allow.fullscan=true;
select * from sale_detail;
-- The following result is returned:
+------------+-------------+-------------+------------+------------+
| shop_name | customer_id | total_price | sale_date | region |
+------------+-------------+-------------+------------+------------+
| s1 | c1 | 100.1 | 2013 | china |
| s2 | c2 | 100.2 | 2013 | china |
| s3 | c3 | 100.3 | 2013 | china |
| null | c5 | NULL | 2014 | shanghai |
| s6 | c6 | 100.4 | 2014 | shanghai |
| s7 | c7 | 100.5 | 2014 | shanghai |
+------------+-------------+-------------+------------+------------+
Examples
Example 1: Clauses in a SELECT statement are executed in Sequence 1.
NoteIf you execute the following statement to query data from a partitioned table, you must add
set odps.sql.allow.fullscan=true;
before the statement to enable a full table scan or specify partitions in the statement.-- Write a SELECT statement based on the SELECT syntax. set odps.sql.allow.fullscan=true; select region,max(total_price) from sale_detail where total_price > 100 group by region having sum(total_price)>300.5 order by region limit 5; -- Write a SELECT statement based on Sequence 1. The following statement is equivalent to the preceding statement. from sale_detail where total_price > 100 group by region having sum(total_price)>300.5 select region,max(total_price) order by region limit 5;
The following result is returned:
+------------+------------+ | region | _c1 | +------------+------------+ | china | 100.3 | +------------+------------+
Logic to execute clauses in the SELECT statement:
Retrieves the data that meets the condition (
WHERE total_price > 100
) from the sale_detail table (FROM sale_detail
).Groups the data obtained from Step a based on the values of the region column (
GROUP BY
).Retrieves the data of the groups whose sum of total_price is greater than 305 from the data obtained from Step b (
HAVING sum(total_price)>305
).Obtains the maximum value of the total_price column in each region (
SELECT region,max(total_price)
) from the data obtained from Step c.Sorts the data obtained from Step d based on the values of the region column (
ORDER BY region
).Displays the first five data records (
LIMIT 5
) of the data obtained from Step e.
Example 2: Clauses in a SELECT statement are executed in Sequence 2.
-- Write a SELECT statement based on the SELECT syntax. set odps.sql.allow.fullscan=true; select shop_name ,total_price ,region from sale_detail where total_price > 100.2 distribute by region sort by total_price; -- Write a SELECT statement based on Sequence 2. The following statement is equivalent to the preceding statement. from sale_detail where total_price > 100.2 select shop_name ,total_price ,region distribute by region sort by total_price;
The following result is returned:
+------------+-------------+------------+ | shop_name | total_price | region | +------------+-------------+------------+ | s3 | 100.3 | china | | s6 | 100.4 | shanghai | | s7 | 100.5 | shanghai | +------------+-------------+------------+
Logic to execute clauses in the SELECT statement:
Retrieves the data that meets the condition (
WHERE total_price > 100.2
) from the sale_detail table (FROM sale_detail
).Retrieves data from the data obtained from Step a based on the values of the shop name, total price, and region columns (
SELECT shop_name, total_price, region
).Performs hash partitioning on the data obtained from Step b based on the values of the region column (
DISTRIBUTE BY region
).Sorts the data obtained from Step c in ascending order based on the values of the total_price column (
SORT BY total_price
).