All Products
Search
Document Center

MaxCompute:SQL in script mode

最終更新日:May 24, 2024

If you want to run extract, transform, and load (ETL) tasks, automated periodic tasks, and complex query orchestration tasks for a large-scale dataset, you can use the script mode supported by the current SQL engine of MaxCompute. If you use the script mode to compile an SQL script file, all statements in the file are compiled at the same time. You do not need to compile each statement. After the script file is compiled, the script file is submitted to MaxCompute, and an execution plan is generated. This way, the statements in the script file are scheduled in one queue and are executed once. This allows you to fully utilize the resources in MaxCompute. This way, the work efficiency is improved and the flexibility and security of data processing and analysis workflow are enhanced.

Note
  • In script mode, you cannot estimate the expense of SQL statements by using the Cost SQL command that is described in Cost estimation. You can view your bill for the accurate expense in the Alibaba Cloud Management Console. For more information, see View billing details.

  • The number of tables that are referenced in a script file cannot exceed 10,000. If a view is included in a script file, the system calculates the number of tables that are referenced when the view is created. If a table is referenced multiple times in a script file, the number of times the table is referenced is counted into the number of tables that are referenced.

The script mode allows you to compile SQL statements similar to the way you compile SQL statements in a common programming language. You do not need to consider how to organize statements.

Syntax

--set
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=xxx;]
[...]
--ddl
create table table1 xxx;
[create table table2 xxx;]
[...]
--dml
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM table3
        [WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM table4
        [WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
        FROM @var1 join @var2 on ...;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
        SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM @var3;    
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1 
        UNION ALL | UNION 
        SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;    
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
        AS 
        SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM @var4;]
[...]

Syntax description

  • The script mode supports three types of statements: SET statements, DML statements, and some DDL statements. The DDL statements that are used to display results, such as DESC and SHOW, are not supported.

  • A complete script file must be a list of SET, DDL, and DML statements in sequence. A script file can contain multiple statements of each type, but statements of the same type must be placed together. You must place the three types of statements in the preceding sequence.

  • The at sign (@) declares a variable.

  • A script file can contain a maximum of one statement that is used to display results, such as a SELECT statement. If a script file contains more than one such a statement, an error is returned. We recommend that you do not include SELECT statements that are used to display results in the script file.

  • A script file can contain a maximum of one CREATE TABLE AS statement and must end with this statement. We recommend that you do not include the statements used to create tables and INSERT statements in the same script command.

  • In script mode, if a statement in the script file fails to be executed, all the other statements in the script file also fail.

  • In script mode, a job can be generated to process data only after all the input data is prepared and inserted.

  • In script mode, you cannot assign the value of a variable of the table type to a variable of the specified data type. Sample code:

    @a table (name string);
    @a:= select 'tom';
    @b string;
    @b:= select * from @a;
  • In script mode, if a table is written and then read, the following error is returned:

    INSERT OVERWRITE table src2 SELECT * FROM src WHERE key > 0;
    @a := SELECT * FROM src2;
    SELECT * FROM @a;

    To prevent this error, you can modify your script file based on the following sample script:

    @a := SELECT * FROM src WHERE key > 0;
    INSERT OVERWRITE table src2 SELECT * FROM @a;
    SELECT * FROM @a;
  • In script mode, the IF statement is supported.

    • The IF statement allows the system to automatically select the execution logic based on specific conditions. MaxCompute supports the following IF syntax:

      IF (condition) BEGIN
        statement 1
        statement 2
        ...
      END
      
      IF (condition) BEGIN
        statements
      END ELSE IF (condition2) BEGIN
        statements
      END ELSE BEGIN
        statements
      END
      Note
      • If the BEGIN and END conditional clauses contain only one statement, you can omit the BEGIN and END conditional clauses. This is similar to the code block { } in Java.

      • The IF statement cannot contain DDL statements, such as CREATE TABLE, ALTER TABLE, and TRUNCATE TABLE.

    • The IF statement can contain the following types of conditions:

      • Expressions of the BOOLEAN type. A BOOLEAN-type expression in the IF ELSE statement determines the branch to execute during code compilation. Sample code:

        @date := '20190101';
        @row TABLE(id STRING); -- Declare the row variable. The row type is TABLE and the fields in the table are of the STRING data type. 
        IF ( cast(@date  as bigint) % 2 == 0 ) BEGIN 
        @row  := SELECT id from src1; 
        END ELSE BEGIN
        @row  := SELECT id from src2; 
        END
        INSERT OVERWRITE TABLE dest SELECT * FROM @row; 
      • Scalar subqueries of the BOOLEAN type. A BOOLEAN-type scalar subquery in the IF ELSE statement determines the branch to execute during job running. Therefore, you must submit multiple jobs. Sample code:

        @i bigint;
        @t table(id bigint, value bigint);
        IF ((SELECT count(*) FROM src WHERE a = '5') > 1)  BEGIN
        @i := 1;
        @t := select @i, @i*2;
        END ELSE
        BEGIN
        @i := 2;
        @t := select @i, @i*2;
        END
        select id, value from @t; 
  • You can use the script mode in the following scenarios:

    • A single statement with nested subqueries needs to be rewritten, or a complex script must be split into multiple statements.

    • Data from different data sources are prepared at different time points, and the time gap is large. For example, the data from one data source is prepared at 01:00, and that from another data source is prepared at 07:00. In this case, the table variables are not suitable for packaging the statements of different types in a script file.

    • You must assign a constant value to a variable and execute the SELECT * FROM Variable statement to convert the variable into a scalar value. The scalar value is used together with other columns for calculation. The constant value can also be stored in a table that has only one row. The following statements show an example. For more information about the syntax of the SELECT * FROM Variable statement, see Subqueries.

      @a := SELECT 10; -- Assign the constant value 10 to @a. You can also execute the SELECT col1 FROM t1 statement to store the constant value to table t1 that has only one row. 
      @b := SELECT key,value+(SELECT * FROM @a) FROM t2 WHERE key >10000; -- Use the value of @a with value in table t2 for calculation. 
      SELECT * FROM @b;

Example

The following sample code provides an example on how to execute MaxCompute SQL statements in script mode:

CREATE TABLE IF NOT EXISTS dest(key STRING, value BIGINT) PARTITIONED BY (d STRING);
CREATE TABLE IF NOT EXISTS dest2(key STRING, value BIGINT) PARTITIONED  BY (d STRING);
@a := SELECT * FROM src WHERE value >0;
@b := SELECT * FROM src2 WHERE key is not null;
@c := SELECT * FROM src3 WHERE value is not null;
@d := SELECT a.key,b.value FROM @a LEFT OUTER JOIN @b ON a.key=b.key AND b.value>0;
@e := SELECT a.key,c.value FROM @a INNER JOIN @c ON a.key=c.key;
@f := SELECT * FROM @d UNION SELECT * FROM @e UNION SELECT * FROM @a;
INSERT OVERWRITE table dest PARTITION (d='20171111') SELECT * FROM @f;
@g := SELECT e.key,c.value FROM @e JOIN @c ON e.key=c.key;
INSERT OVERWRITE TABLE dest2 PARTITION (d='20171111') SELECT * FROM @g;

Required tools

Methods to use the script mode

  • Use MaxCompute Studio

    Before you use MaxCompute Studio, make sure that MaxCompute Studio is installed, project connections are added, and a MaxCompute SQL script file is created. For more information, see Install IntelliJ IDEA, Manage project connections, and Create a MaxCompute script module.

    After you compile and submit the script file, you can view the graphic execution plan. Although the script file contains multiple statements, the graphic execution plan is shown in a directed acyclic graph (DAG).

  • Use the MaxCompute client

    You must use the MaxCompute client of a version later than 0.27 to submit the script file. We recommend that you install the latest version of the MaxCompute client installation package. After you install the latest version of the package, use the -s option to submit the script file.

    Edit the source code file myscript.sql in script code and call odpscmd in the command-line interface (CLI) to run the following command. For more information about how to run the MaxCompute client by using the CLI, see Run the MaxCompute client.

    ..\bin>odpscmd -s myscript.sql
    Note

    Similar to the -f and -e options, the -s option is a command-line option for the MaxCompute client. The -s option is not a command in an interactive environment. If the MaxCompute client (odpscmd) is used in an interactive environment, the script mode and table variables are not supported.

  • Use the DataWorks console

    In the DataWorks console, you can create an ODPS Script node. The following figure shows how to create an ODPS Script node.脚本节点

    After the node is created, use the script mode to edit the script file and click the Run icon in the toolbar to submit the script file to MaxCompute. This way, you can use the LogView URL in the output to view the graphic execution plan and results.

  • Use the SDK for Java

    You can create an SQL script file by using the SDK for Java. For more information about the SDK for Java, see SDK for Java. The following sample code provides an example on how to create an SQL script file.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    import com.aliyun.odps.Instance;
    import com.aliyun.odps.Odps;
    import com.aliyun.odps.OdpsException;
    import com.aliyun.odps.account.Account;
    import com.aliyun.odps.account.AliyunAccount;
    import com.aliyun.odps.data.Record;
    import com.aliyun.odps.task.SQLTask;
    
    public class SdkTest {
    
      public static void main(String[] args) throws OdpsException {
    		// The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. If you use an AccessKey pair to call API operations, risks may occur. We recommend that you use a RAM user to call API operations or perform routine O&M. To create a RAM user, log on to the RAM console.
    		// In this example, the AccessKey ID and AccessKey secret are configured as environment variables. You can also save your AccessKey pair in the configuration file based on your business requirements.
    		// We recommend that you do not directly specify the AccessKey ID and AccessKey secret in code to prevent AccessKey pair leaks.
        Account account = new AliyunAccount(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"), System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
        Odps odps = new Odps(account);
        odps.setDefaultProject("your project_name");
        odps.setEndpoint("your end_point");
    
        String sqlScript = "@a := SELECT * FROM jdbc_test;\n"
                           + "SELECT * FROM @a;";
    
        // Make sure that you add the following configuration to the SQL script file.
        Map<String, String> hints = new HashMap<>();
        hints.put("odps.sql.submit.mode", "script");
    
        Instance instance = SQLTask.run(odps, "your project_name", sqlScript, hints, null);
        instance.waitForSuccess();
    
        List<Record> recordList = SQLTask.getResult(instance);
        for (Record record : recordList) {
          System.out.println(record.get(0));
          System.out.println(record.get(1));
        }
      }
    
    }