×
Community Blog MaxCompute Unleashed - Part 11: QUALIFY Clause

MaxCompute Unleashed - Part 11: QUALIFY Clause

Part 11 of the "Unleash the Power of MaxCompute" series introduces the features and use of QUALIFY Clause.

By Haiqing

MaxCompute (formerly known as ODPS) is a leading distributed big data processing platform developed by Alibaba Cloud. It is widely utilized, especially within the Alibaba Group, and supports the core businesses of multiple business units (BUs). ODPS V2.0, continuously optimizing performance, aims to enhance the user experience and expression capabilities of SQL, as well as improve the productivity of ODPS developers.

Building upon the SQL engine of ODPS V2.0, MaxCompute simplifies SQL compilation and enhances the language's expressiveness. We present a series of articles titled Unleash the Power of MaxCompute to explore the capabilities of MaxCompute (ODPS V2.0).

MaxCompute supports the use of the QUALIFY clause, which allows you to filter the results of window functions based on specified conditions. This is similar to how the HAVING clause is used to filter results after using aggregate functions and GROUP BY clauses.

Introduction to the Features of the QUALIFY Clause

Syntax

QUALIFY [expression]

The QUALIFY clause filters the results of the Window function. The relationship between the Window function and the QUALIFY clause can be compared to the aggregate function + GROUP BY syntax and the HAVING syntax. A typical query statement is executed in the following sequence:

  1. FROM
  2. WHERE
  3. GROUP BY and Aggregate Function
  4. HAVING
  5. WINDOW
  6. QUALIFY
  7. DISTINCT
  8. ORDER BY
  9. LIMIT

Generally, in a query statement, the QUALIFY clause is executed after the window function and is used to filter the data processed by the window function.

Scenarios

It is suitable for scenarios where the results of the window function need to be filtered. Before the QUALIFY clause is used, the SubQuery in the FROM statement and the WHERE clause were used together to filter the results. As shown in the following figure:

The QUALIFY clause is used when you need to filter the results of a window function. Before introducing the QUALIFY clause, a SubQuery in the FROM statement and the WHERE clause were typically used together to accomplish the filtering. This can be seen in the example below:

SELECT col1, col2
FROM
(
SELECT
t.a as col1,
sum(t.a) over (partition by t.b) as col2
FROM values (1, 2),(2,3),(2,2),(1,3),(4,2) t(a, b)
)
WHERE col2 > 4;

Query statement after rewriting:

SELECT 
t.a as col1, 
sum(t.a) over (partition by t.b) as col2 
FROM values (1, 2),(2,3),(2,2),(1,3),(4,2)  t(a, b) 
QUALIFY col2 > 4;

You can filter the results of window functions without using aliases.

SELECT t.a as col1,
sum(t.a) over (partition by t.b) as col2
FROM values (1, 2),(2,3),(2,2),(1,3),(4,2) t(a, b)
QUALIFY sum(t.a) over (partition by t.b)  > 4;

The QUALIFY, WHERE, and HAVING clauses are all used to filter data. They differ in the execution sequence. Therefore, the QUALIFY clause allows users to write some complex filters, such as:

SELECT *
FROM values (1, 2) t(a, b)
QUALIFY sum(t.a) over (partition by t.b)  IN (SELECT a FROM t1)

The QUALIFY clause is executed after the window function takes effect. The following complex statement can help you know the execution sequence of the QUALIFY clause.

SELECT a, b, max(c)
FROM values (1, 2, 3),(1, 2, 4),(1, 3, 5),(2, 3, 6),(2, 4, 7),(3, 4, 8) t(a, b, c)
WHERE a < 3
GROUP BY a, b
HAVING max(c) > 5
QUALIFY sum(b) over (partition by a) > 3; 
--+------------+------------+------------+
--| a          | b          | _c2        |
--+------------+------------+------------+
--| 2          | 3          | 6          |
--| 2          | 4          | 7          |
--+------------+------------+------------+

Examples

Take the row_number window function as an example. All employees are grouped by department (deptno) as a window column and sorted by salary (sal) in descending order to obtain the serial number of the employee in the group. To query the top 3 salaries of each department, you can refer to the following implementation:

• Data preparation

create table if not exists emp
 (empno string,
  ename string,
  job string,
  mgr string,
  hiredate string,
  sal string,
  comm string,
  deptno string);
insert into table emp values
('7369','SMITH','CLERK','7902','1980-12-17 00:00:00','800','','20')
,('7499','ALLEN','SALESMAN','7698','1981-02-20 00:00:00','1600','300','30')
,('7521','WARD','SALESMAN','7698','1981-02-22 00:00:00','1250','500','30')
,('7566','JONES','MANAGER','7839','1981-04-02 00:00:00','2975','','20')
,('7654','MARTIN','SALESMAN','7698','1981-09-28 00:00:00','1250','1400','30')
,('7698','BLAKE','MANAGER','7839','1981-05-01 00:00:00','2850','','30')
,('7782','CLARK','MANAGER','7839','1981-06-09 00:00:00','2450','','10')
,('7788','SCOTT','ANALYST','7566','1987-04-19 00:00:00','3000','','20')
,('7839','KING','PRESIDENT','','1981-11-17 00:00:00','5000','','10')
,('7844','TURNER','SALESMAN','7698','1981-09-08 00:00:00','1500','0','30')
,('7876','ADAMS','CLERK','7788','1987-05-23 00:00:00','1100','','20')
,('7900','JAMES','CLERK','7698','1981-12-03 00:00:00','950','','30')
,('7902','FORD','ANALYST','7566','1981-12-03 00:00:00','3000','','20')
,('7934','MILLER','CLERK','7782','1982-01-23 00:00:00','1300','','10')
,('7948','JACCKA','CLERK','7782','1981-04-12 00:00:00','5000','','10')
,('7956','WELAN','CLERK','7649','1982-07-20 00:00:00','2450','','10')
,('7956','TEBAGE','CLERK','7748','1982-12-30 00:00:00','1300','','10')
;

• Use SubQuery in the FROM statement and WHERE clause to implement filtering, as shown in the following picture:

SELECT  a.*
FROM    (
          SELECT  deptno
                  ,ename
                  ,sal
                  ,ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC ) AS nums
          FROM    emp
      ) a
WHERE a.nums<=3
;

• The implementation using the QUALIFY clause is as follows:

SELECT  deptno
      ,ename
      ,sal
      ,ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC ) AS nums
FROM    emp
QUALIFY nums <= 3
;

The results are shown in the following figure. As you can see, using the QUALIFY clause makes the query statement more concise and easier to understand.

Notes

• The QUALIFY clause requires at least one window function to be specified in the SELECT statement. Otherwise, the following error is returned: FAILED: ODPS-0130071:[3,1] Semantic analysis exception - use QUALIFY clause without window function. Invalid sample code:

SELECT * 
FROM values (1, 2) t(a, b) 
QUALIFY a > 1;

• In a QUALIFY clause, you can use the alias of a column that is specified in the SELECT statement as a filter. Sample code:

SELECT 
sum(t.a) over (partition by t.b) as c1 
FROM values (1, 2) t(a, b) 
QUALIFY c1 > 1;
0 1 0
Share on

Alibaba Cloud MaxCompute

137 posts | 19 followers

You may also like

Comments

Alibaba Cloud MaxCompute

137 posts | 19 followers

Related Products