All Products
Search
Document Center

MaxCompute:Write MaxCompute SQL statements

Last Updated:Dec 18, 2024

This topic describes the common use scenarios of MaxCompute SQL statements and how to write the statements.

Prepare a dataset

In this example, the emp and dept tables are used as the dataset. To create tables in a MaxCompute project and upload data to the tables, perform the following steps:

  1. Create tables.

    • Execute the following statement to create the emp table:

      CREATE TABLE IF NOT EXISTS emp (
        EMPNO STRING,
        ENAME STRING,
        JOB STRING,
        MGR BIGINT,
        HIREDATE DATETIME,
        SAL DOUBLE,
        COMM DOUBLE,
        DEPTNO BIGINT);
    • Execute the following statement to create the dept table:

      CREATE TABLE IF NOT EXISTS dept (
        DEPTNO BIGINT,
        DNAME STRING,
        LOC STRING);
  2. Create the files emp.txt and dept.txt locally, with the following content.

    emp.txt

    7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
    7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
    7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
    7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
    7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
    7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
    7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
    7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
    7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
    7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
    7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
    7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
    7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
    7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10

    dept.txt

    10,ACCOUNTING,NEW YORK
    20,RESEARCH,DALLAS
    30,SALES,CHICAGO
    40,OPERATIONS,BOSTON
  3. Upload data to the tables. For more information, see Data upload scenarios and tools.

Examples

  • Example 1: Query all departments that have at least one employee.

    We recommend that you use the JOIN clause to avoid large amount of data in the query. Execute the following SQL statement:

    SELECT d.* 
    FROM dept d 
    JOIN (
        SELECT DISTINCT deptno AS no
        FROM emp
    ) e 
    ON d.deptno = e.no;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | dname      | loc        |
    +------------+------------+------------+
    | 10         | ACCOUNTING | NEW YORK   |
    | 20         | RESEARCH   | DALLAS     |
    | 30         | SALES      | CHICAGO    |
    +------------+------------+------------+
  • Example 2: Query all employees who have higher salaries than SMITH.

    The following sample code shows how to use MAPJOIN in the SQL statement:

    SELECT /*+ MapJoin(a) */ e.empno
        , e.ename
        , e.sal 
    FROM emp e 
    JOIN (
        SELECT MAX(sal) AS sal
        FROM `emp`
        WHERE `ENAME` = 'SMITH'
    ) a 
    ON e.sal > a.sal;

    The following result is returned:

    +------------+------------+------------+
    | empno      | ename      | sal        |
    +------------+------------+------------+
    | 7499       | ALLEN      | 1600.0     |
    | 7521       | WARD       | 1250.0     |
    | 7566       | JONES      | 2975.0     |
    | 7654       | MARTIN     | 1250.0     |
    | 7698       | BLAKE      | 2850.0     |
    | 7782       | CLARK      | 2450.0     |
    | 7788       | SCOTT      | 3000.0     |
    | 7839       | KING       | 5000.0     |
    | 7844       | TURNER     | 1500.0     |
    | 7876       | ADAMS      | 1100.0     |
    | 7900       | JAMES      | 950.0      |
    | 7902       | FORD       | 3000.0     |
    | 7934       | MILLER     | 1300.0     |
    +------------+------------+------------+
  • Example 3: Query the names of all employees and the names of their immediate superiors.

    The following sample code shows how to use EQUI JOIN in the SQL statement:

    SELECT a.ename
        , b.ename 
    FROM emp a 
    LEFT OUTER JOIN emp b 
    ON b.empno = a.mgr;

    The following result is returned:

    +------------+------------+
    | ename      | ename2     |
    +------------+------------+
    | SMITH      | FORD       |
    | ALLEN      | BLAKE      |
    | WARD       | BLAKE      |
    | JONES      | KING       |
    | MARTIN     | BLAKE      |
    | BLAKE      | KING       |
    | CLARK      | KING       |
    | SCOTT      | JONES      |
    | KING       | NULL       |
    | TURNER     | BLAKE      |
    | ADAMS      | SCOTT      |
    | JAMES      | BLAKE      |
    | FORD       | JONES      |
    | MILLER     | CLARK      |
    +------------+------------+
  • Example 4: Query all jobs that have basic salaries higher than USD 1,500.

    The following sample code shows how to use the HAVING clause in the SQL statement:

    SELECT emp.`JOB`
        , MIN(emp.sal) AS sal 
    FROM `emp`
    GROUP BY emp.`JOB`
    HAVING MIN(emp.sal) > 1500;

    The following result is returned:

    +------------+------------+
    | job        | sal        |
    +------------+------------+
    | MANAGER    | 2450.0     |
    | ANALYST    | 3000.0     |
    | PRESIDENT  | 5000.0     |
    +------------+------------+
  • Example 5: Query the number of employees, the average salary, and the average length of service in each department.

    The following sample code shows how to use built-in functions in the SQL statement:

    SELECT COUNT(empno) AS cnt_emp
        , ROUND(AVG(sal), 2) AS avg_sal
        , ROUND(AVG(datediff(getdate(), hiredate, 'dd')), 2) AS avg_hire 
    FROM `emp`
    GROUP BY `DEPTNO`;

    The following result is returned:

    +------------+------------+------------+
    | cnt_emp    | avg_sal    | avg_hire   |
    +------------+------------+------------+
    | 5          | 2175.0     | 14886.2    |
    | 6          | 1566.67    | 15715.33   |
    | 3          | 2916.67    | 15606.33   |
    +------------+------------+------------+
  • Example 6: Query the names and the ranks of the top three employees who have the highest salaries in each department.

    The following sample code shows how to query top N results:

    SELECT * 
    FROM (
      SELECT deptno
        , ename
        , sal
        , ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS nums 
      FROM emp 
    ) emp1 
    WHERE emp1.nums < 4;

    The following result is returned:

    +------------+------------+------------+------------+
    | deptno     | ename      | sal        | nums       |
    +------------+------------+------------+------------+
    | 10         | KING       | 5000.0     | 1          |
    | 10         | CLARK      | 2450.0     | 2          |
    | 10         | MILLER     | 1300.0     | 3          |
    | 20         | SCOTT      | 3000.0     | 1          |
    | 20         | FORD       | 3000.0     | 2          |
    | 20         | JONES      | 2975.0     | 3          |
    | 30         | BLAKE      | 2850.0     | 1          |
    | 30         | ALLEN      | 1600.0     | 2          |
    | 30         | TURNER     | 1500.0     | 3          |
    +------------+------------+------------+------------+
  • Example 7: Query the number of employees in each department and the proportion of clerks in each department.

    SELECT deptno
        , COUNT(empno) AS cnt
        , ROUND(SUM(CASE 
          WHEN job = 'CLERK' THEN 1
          ELSE 0
        END) / COUNT(empno), 2) AS rate 
    FROM `EMP`
    GROUP BY deptno;

    The following result is returned:

    +------------+------------+------------+
    | deptno     | cnt        | rate       |
    +------------+------------+------------+
    | 20         | 5          | 0.4        |
    | 30         | 6          | 0.17       |
    | 10         | 3          | 0.33       |
    +------------+------------+------------+

Precautions

  • When you use the GROUP BY clause, the SELECT list can consist of only aggregate functions and columns in the GROUP BY clause.

  • ORDER BY must be followed by LIMIT N.

  • The SELECT expression does not support subqueries. To use subqueries, rewrite the code to include a JOIN clause.

  • The JOIN clause does not support Cartesian products. You can use MAPJOIN as an alternative.

  • UNION ALL must be used with subqueries.

  • The subquery that is specified in the IN or NOT IN clause can contain only one column and return up to 1,000 rows. Otherwise, use the JOIN clause.