By Haiqing
MaxCompute (formerly known as ODPS) is a leading distributed big data processing platform developed by Alibaba Cloud. It is widely utilized, especially within the Alibaba Group, and supports the core businesses of multiple business units (BUs). ODPS V2.0, continuously optimizing performance, aims to enhance the user experience and expression capabilities of SQL, as well as improve the productivity of ODPS developers.
Building upon the SQL engine of ODPS V2.0, MaxCompute simplifies SQL compilation and enhances the language's expressiveness. We present a series of articles titled Unleash the Power of MaxCompute to explore the capabilities of MaxCompute (ODPS V2.0).
MaxCompute supports the use of the QUALIFY clause, which allows you to filter the results of window functions based on specified conditions. This is similar to how the HAVING clause is used to filter results after using aggregate functions and GROUP BY clauses.
QUALIFY [expression]
The QUALIFY clause filters the results of the Window function. The relationship between the Window function and the QUALIFY clause can be compared to the aggregate function + GROUP BY syntax and the HAVING syntax. A typical query statement is executed in the following sequence:
Generally, in a query statement, the QUALIFY clause is executed after the window function and is used to filter the data processed by the window function.
It is suitable for scenarios where the results of the window function need to be filtered. Before the QUALIFY clause is used, the SubQuery in the FROM statement and the WHERE clause were used together to filter the results. As shown in the following figure:
The QUALIFY clause is used when you need to filter the results of a window function. Before introducing the QUALIFY clause, a SubQuery in the FROM statement and the WHERE clause were typically used together to accomplish the filtering. This can be seen in the example below:
SELECT col1, col2
FROM
(
SELECT
t.a as col1,
sum(t.a) over (partition by t.b) as col2
FROM values (1, 2),(2,3),(2,2),(1,3),(4,2) t(a, b)
)
WHERE col2 > 4;
Query statement after rewriting:
SELECT
t.a as col1,
sum(t.a) over (partition by t.b) as col2
FROM values (1, 2),(2,3),(2,2),(1,3),(4,2) t(a, b)
QUALIFY col2 > 4;
You can filter the results of window functions without using aliases.
SELECT t.a as col1,
sum(t.a) over (partition by t.b) as col2
FROM values (1, 2),(2,3),(2,2),(1,3),(4,2) t(a, b)
QUALIFY sum(t.a) over (partition by t.b) > 4;
The QUALIFY, WHERE, and HAVING clauses are all used to filter data. They differ in the execution sequence. Therefore, the QUALIFY clause allows users to write some complex filters, such as:
SELECT *
FROM values (1, 2) t(a, b)
QUALIFY sum(t.a) over (partition by t.b) IN (SELECT a FROM t1)
The QUALIFY clause is executed after the window function takes effect. The following complex statement can help you know the execution sequence of the QUALIFY clause.
SELECT a, b, max(c)
FROM values (1, 2, 3),(1, 2, 4),(1, 3, 5),(2, 3, 6),(2, 4, 7),(3, 4, 8) t(a, b, c)
WHERE a < 3
GROUP BY a, b
HAVING max(c) > 5
QUALIFY sum(b) over (partition by a) > 3;
--+------------+------------+------------+
--| a | b | _c2 |
--+------------+------------+------------+
--| 2 | 3 | 6 |
--| 2 | 4 | 7 |
--+------------+------------+------------+
Take the row_number window function as an example. All employees are grouped by department (deptno) as a window column and sorted by salary (sal) in descending order to obtain the serial number of the employee in the group. To query the top 3 salaries of each department, you can refer to the following implementation:
• Data preparation
create table if not exists emp
(empno string,
ename string,
job string,
mgr string,
hiredate string,
sal string,
comm string,
deptno string);
insert into table emp values
('7369','SMITH','CLERK','7902','1980-12-17 00:00:00','800','','20')
,('7499','ALLEN','SALESMAN','7698','1981-02-20 00:00:00','1600','300','30')
,('7521','WARD','SALESMAN','7698','1981-02-22 00:00:00','1250','500','30')
,('7566','JONES','MANAGER','7839','1981-04-02 00:00:00','2975','','20')
,('7654','MARTIN','SALESMAN','7698','1981-09-28 00:00:00','1250','1400','30')
,('7698','BLAKE','MANAGER','7839','1981-05-01 00:00:00','2850','','30')
,('7782','CLARK','MANAGER','7839','1981-06-09 00:00:00','2450','','10')
,('7788','SCOTT','ANALYST','7566','1987-04-19 00:00:00','3000','','20')
,('7839','KING','PRESIDENT','','1981-11-17 00:00:00','5000','','10')
,('7844','TURNER','SALESMAN','7698','1981-09-08 00:00:00','1500','0','30')
,('7876','ADAMS','CLERK','7788','1987-05-23 00:00:00','1100','','20')
,('7900','JAMES','CLERK','7698','1981-12-03 00:00:00','950','','30')
,('7902','FORD','ANALYST','7566','1981-12-03 00:00:00','3000','','20')
,('7934','MILLER','CLERK','7782','1982-01-23 00:00:00','1300','','10')
,('7948','JACCKA','CLERK','7782','1981-04-12 00:00:00','5000','','10')
,('7956','WELAN','CLERK','7649','1982-07-20 00:00:00','2450','','10')
,('7956','TEBAGE','CLERK','7748','1982-12-30 00:00:00','1300','','10')
;
• Use SubQuery in the FROM statement and WHERE clause to implement filtering, as shown in the following picture:
SELECT a.*
FROM (
SELECT deptno
,ename
,sal
,ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC ) AS nums
FROM emp
) a
WHERE a.nums<=3
;
• The implementation using the QUALIFY clause is as follows:
SELECT deptno
,ename
,sal
,ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC ) AS nums
FROM emp
QUALIFY nums <= 3
;
The results are shown in the following figure. As you can see, using the QUALIFY clause makes the query statement more concise and easier to understand.
• The QUALIFY clause requires at least one window function to be specified in the SELECT statement. Otherwise, the following error is returned: FAILED: ODPS-0130071:[3,1] Semantic analysis exception - use QUALIFY clause without window function. Invalid sample code:
SELECT *
FROM values (1, 2) t(a, b)
QUALIFY a > 1;
• In a QUALIFY clause, you can use the alias of a column that is specified in the SELECT statement as a filter. Sample code:
SELECT
sum(t.a) over (partition by t.b) as c1
FROM values (1, 2) t(a, b)
QUALIFY c1 > 1;
137 posts | 19 followers
FollowAlibaba Cloud MaxCompute - February 18, 2024
Alibaba Cloud MaxCompute - January 29, 2024
Alibaba Cloud MaxCompute - January 22, 2024
Alibaba Cloud MaxCompute - January 22, 2024
Alibaba Cloud MaxCompute - January 29, 2024
Alibaba Cloud MaxCompute - February 18, 2024
137 posts | 19 followers
FollowAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreConduct large-scale data warehousing with MaxCompute
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreApsaraDB for HBase is a NoSQL database engine that is highly optimized and 100% compatible with the community edition of HBase.
Learn MoreMore Posts by Alibaba Cloud MaxCompute