This topic describes the common use scenarios of MaxCompute SQL statements and how to write the statements.
Prepare a dataset
In this example, the emp and dept tables are used as the dataset. To create tables in a MaxCompute project and upload data to the tables, perform the following steps:
Create tables.
Execute the following statement to create the emp table:
CREATE TABLE IF NOT EXISTS emp ( EMPNO STRING, ENAME STRING, JOB STRING, MGR BIGINT, HIREDATE DATETIME, SAL DOUBLE, COMM DOUBLE, DEPTNO BIGINT);
Execute the following statement to create the dept table:
CREATE TABLE IF NOT EXISTS dept ( DEPTNO BIGINT, DNAME STRING, LOC STRING);
Download the data file of the emp table and the data file of the dept table.
Upload data to the tables. For more information, see Data upload scenarios and tools.
Examples
Example 1: Query all departments that have at least one employee.
We recommend that you use the JOIN clause to avoid large amount of data in the query. Execute the following SQL statement:
SELECT d.* FROM dept d JOIN ( SELECT DISTINCT deptno AS no FROM emp ) e ON d.deptno = e.no;
The following result is returned:
+------------+------------+------------+ | deptno | dname | loc | +------------+------------+------------+ | 10 | ACCOUNTING | NEW YORK | | 20 | RESEARCH | DALLAS | | 30 | SALES | CHICAGO | +------------+------------+------------+
Example 2: Query all employees who have higher salaries than SMITH.
The following sample code shows how to use MAPJOIN in the SQL statement:
SELECT /*+ MapJoin(a) */ e.empno , e.ename , e.sal FROM emp e JOIN ( SELECT MAX(sal) AS sal FROM `emp` WHERE `ENAME` = 'SMITH' ) a ON e.sal > a.sal;
The following result is returned:
+------------+------------+------------+ | empno | ename | sal | +------------+------------+------------+ | 7499 | ALLEN | 1600.0 | | 7521 | WARD | 1250.0 | | 7566 | JONES | 2975.0 | | 7654 | MARTIN | 1250.0 | | 7698 | BLAKE | 2850.0 | | 7782 | CLARK | 2450.0 | | 7788 | SCOTT | 3000.0 | | 7839 | KING | 5000.0 | | 7844 | TURNER | 1500.0 | | 7876 | ADAMS | 1100.0 | | 7900 | JAMES | 950.0 | | 7902 | FORD | 3000.0 | | 7934 | MILLER | 1300.0 | +------------+------------+------------+
Example 3: Query the names of all employees and the names of their immediate superiors.
The following sample code shows how to use EQUI JOIN in the SQL statement:
SELECT a.ename , b.ename FROM emp a LEFT OUTER JOIN emp b ON b.empno = a.mgr;
The following result is returned:
+------------+------------+ | ename | ename2 | +------------+------------+ | SMITH | FORD | | ALLEN | BLAKE | | WARD | BLAKE | | JONES | KING | | MARTIN | BLAKE | | BLAKE | KING | | CLARK | KING | | SCOTT | JONES | | KING | NULL | | TURNER | BLAKE | | ADAMS | SCOTT | | JAMES | BLAKE | | FORD | JONES | | MILLER | CLARK | +------------+------------+
Example 4: Query all jobs that have basic salaries higher than USD 1,500.
The following sample code shows how to use the HAVING clause in the SQL statement:
SELECT emp.`JOB` , MIN(emp.sal) AS sal FROM `emp` GROUP BY emp.`JOB` HAVING MIN(emp.sal) > 1500;
The following result is returned:
+------------+------------+ | job | sal | +------------+------------+ | MANAGER | 2450.0 | | ANALYST | 3000.0 | | PRESIDENT | 5000.0 | +------------+------------+
Example 5: Query the number of employees, the average salary, and the average length of service in each department.
The following sample code shows how to use built-in functions in the SQL statement:
SELECT COUNT(empno) AS cnt_emp , ROUND(AVG(sal), 2) AS avg_sal , ROUND(AVG(datediff(getdate(), hiredate, 'dd')), 2) AS avg_hire FROM `emp` GROUP BY `DEPTNO`;
The following result is returned:
+------------+------------+------------+ | cnt_emp | avg_sal | avg_hire | +------------+------------+------------+ | 5 | 2175.0 | 14886.2 | | 6 | 1566.67 | 15715.33 | | 3 | 2916.67 | 15606.33 | +------------+------------+------------+
Example 6: Query the names and the ranks of the top three employees who have the highest salaries in each department.
The following sample code shows how to query top N results:
SELECT * FROM ( SELECT deptno , ename , sal , ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums FROM emp ) emp1 WHERE emp1.nums < 4;
The following result is returned:
+------------+------------+------------+------------+ | deptno | ename | sal | nums | +------------+------------+------------+------------+ | 10 | KING | 5000.0 | 1 | | 10 | CLARK | 2450.0 | 2 | | 10 | MILLER | 1300.0 | 3 | | 20 | SCOTT | 3000.0 | 1 | | 20 | FORD | 3000.0 | 2 | | 20 | JONES | 2975.0 | 3 | | 30 | BLAKE | 2850.0 | 1 | | 30 | ALLEN | 1600.0 | 2 | | 30 | TURNER | 1500.0 | 3 | +------------+------------+------------+------------+
Example 7: Query the number of employees in each department and the proportion of clerks in each department.
SELECT deptno , COUNT(empno) AS cnt , ROUND(SUM(CASE WHEN job = 'CLERK' THEN 1 ELSE 0 END) / COUNT(empno), 2) AS rate FROM `EMP` GROUP BY deptno;
The following result is returned:
+------------+------------+------------+ | deptno | cnt | rate | +------------+------------+------------+ | 20 | 5 | 0.4 | | 30 | 6 | 0.17 | | 10 | 3 | 0.33 | +------------+------------+------------+
Precautions
When you use the GROUP BY clause, the SELECT list can consist of only aggregate functions and columns in the GROUP BY clause.
ORDER BY must be followed by LIMIT N.
The SELECT expression does not support subqueries. To use subqueries, rewrite the code to include a JOIN clause.
The JOIN clause does not support Cartesian products. You can use MAPJOIN as an alternative.
UNION ALL must be used with subqueries.
The subquery that is specified in the IN or NOT IN clause can contain only one column and return up to 1,000 rows. Otherwise, use the JOIN clause.