DataWorks provides the ODPS SQL node to periodically schedule MaxCompute SQL tasks and integrate them with other node types. MaxCompute SQL tasks use an SQL-like syntax and are suitable for distributed processing scenarios that involve large volumes of data (terabyte-scale) but do not require high real-time performance. This topic describes how to develop SQL tasks in DataWorks and provides important considerations.
Node overview
You can use ODPS SQL to process and query data in MaxCompute. It supports common SQL operations, such as SELECT, INSERT, UPDATE, and DELETE, along with MaxCompute-specific syntax and functions. You can write SQL-like statements to query and process data without having to write complex data processing logic. For more information about the SQL syntax, see SQL overview.
Limits
The following limits apply when you develop ODPS SQL nodes in DataWorks:
Category | Description |
Comments | Only single-line comments in the format of For more information, see MaxCompute SQL comments. The following limits also apply to comments.
|
SQL submission | In ODPS SQL, you cannot run SET or USE statements alone. They must be executed together with a specific SQL statement. |
SQL development | The SQL code cannot exceed 128 KB in size. The number of SQL commands cannot exceed 200. |
Query results | Only SQL statements that start with SELECT or WITH can output a formatted result set. The following limits apply to query results:
Note If you encounter these limits, you can download the query results to your local computer for viewing using one of the following methods:
|
Notes
Make sure that the account used to run the ODPS SQL task has the required permissions on the corresponding MaxCompute project. For more information, see DataWorks On MaxCompute access control and MaxCompute permissions.
MaxCompute SQL tasks require quota resources to run. If your task runs for a long time, go to the MaxCompute console to check the quota resource consumption and make sure that sufficient resources are available to run the task. For more information, see Computing resources - Quota management.
When you develop an ODPS SQL node task, you must enclose special parameters, such as an OSS address, in double quotation marks (""). Otherwise, the task may fail to parse and run.
In some extreme cases, such as an unexpected server power outage or an active/standby switchover, DataWorks may not be able to completely stop the related MaxCompute task processes. In this case, go to the computing resources of the corresponding project in MaxCompute to stop the job.
When you run statements that contain keywords such as SET or USE in different DataWorks environments, the execution order varies.
In DataStudio for Data Development: All statements that contain keywords such as SET or USE in the current task code are merged and used as preamble statements for all subsequent SQL statements.
In the scheduling environment: The statements are executed in the order they are written.
For example, assume that the code defined in the node is as follows.
SET a=b; CREATE TABLE name1(id string); SET c=d; CREATE TABLE name2(id string);The execution order in different environments is as follows:
SQL statement to execute
DataStudio for Data Development
Scheduling O&M
First SQL statement
SET a=b; SET c=d; CREATE TABLE name1(id string);SET a=b; CREATE TABLE name1(id string);Second SQL statement
SET a=b; SET c=d; CREATE TABLE name2(id string);SET c=d; CREATE TABLE name2(id string);
Edit code: Simple examples
SQL commands
MaxCompute SQL commands use a syntax similar to standard SQL. They support DDL, DML, and DQL statements, along with MaxCompute-specific commands. For more information about the syntax requirements and usage examples of each SQL command, see SQL overview. The following example shows how to develop and run a simple SQL command.
When you use new data types in MaxCompute V2.0 extension functions, you must add
SET odps.sql.type.system.odps2=true;before the SQL statement of the function and submit them together as a single job. This ensures that the new data types can be correctly processed. For more information about V2.0 data types, see Data types (V2.0).DataWorks provides scheduling parameters that allow you to dynamically pass parameters to code in scheduling scenarios. In an ODPS SQL node, you can define variables in the
${variable_name}format in your code. Then, on the Scheduling Configuration tab, in the Parameter section, you can assign values to these variables. For more information about the supported formats of scheduling parameters, see Supported formats of scheduling parameters.
Create a table
You can use the
CREATE TABLEstatement to create non-partitioned tables, partitioned tables, foreign tables, and clustered tables. For more information, see CREATE TABLE. The following code provides an SQL example:-- Create a partitioned table named students. CREATE TABLE if NOT EXISTS students ( id BIGINT, name STRING, age BIGINT, birth DATE) partitioned BY (gender STRING);Insert data
You can use the
INSERT INTOorINSERT OVERWRITEstatement to insert data into or overwrite data in a destination table. For more information, see Insert or overwrite data (INSERT INTO | INSERT OVERWRITE). The following code provides an SQL example:-- Insert data. INSERT INTO students PARTITION(gender='boy') VALUES (1,'Zhang San',15,DATE '2008-05-15') ; INSERT INTO students PARTITION(gender='boy') VALUES (2,'Li Si',17,DATE '2006-07-20') ; INSERT INTO students PARTITION(gender='girl') VALUES (3,'Li Xia',20,DATE '2003-04-20') ; INSERT INTO students PARTITION(gender='girl') VALUES (4,'Wang Lan',21,DATE '2002-01-08') ; INSERT INTO students PARTITION(gender='boy') VALUES (5,'Wang Wu',17,DATE '2006-09-12') ;ImportantAvoid using the
INSERT INTOstatement to insert data. This statement may cause unexpected data duplication. We recommend that you use theINSERT OVERWRITEstatement instead. For more information, see Insert or overwrite data.Query data
You can use the
SELECTstatement to perform operations such as nested queries, group queries, and sorting. For more information, see SELECT syntax. The following code provides an SQL example:-- (Optional) Enable full table scan at the project level. This operation requires high-level permissions. -- SETPROJECT odps.sql.allow.fullscan=true; -- Enable full table scan at the session level. This setting is valid only for the current session. SET odps.sql.allow.fullscan=true; -- Query the information of all male students and sort the results by ID in ascending order. SELECT * FROM students WHERE gender='boy' ORDER BY id;NoteBy default, Resource Access Management (RAM) users do not have permissions to query production tables. To obtain these permissions, you must request them from Security Center. For more information about the preset database permissions and access control for MaxCompute in DataWorks, see Details of MaxCompute database permission control. For more information about how to grant permissions using MaxCompute commands, see Manage user permissions using commands.
SQL functions
MaxCompute supports built-in functions and user-defined functions (UDFs) for data development and analysis. For more information about built-in functions, see Built-in functions. For more information about UDFs, see MaxCompute UDFs. The following examples show how to use SQL functions.
Built-in functions: Built-in functions are preset in MaxCompute and can be called directly. For example, based on the preceding example, you can use the
dateaddfunction to change the values in thebirthcolumn by a specified unit and interval. The following code provides a command example:-- Enable full table scan at the session level. This setting is valid only for the current session. SET odps.sql.allow.fullscan=true; SELECT id, name, age, birth, dateadd(birth,1,'mm') AS birth_dateadd FROM students;User-defined functions (UDFs): To use a UDF, you must write the function code, upload it as a resource, and register the function. For more information, see Create and use a UDF.
Runtime and results
The run results are displayed in a workbook. You can open the results in a workbook, or copy and paste the content into a local Excel file. For more information, see Task debugging process.
NoteDue to an adjustment in the China time zone information released by the International Organization for Standardization, when you run related SQL statements in DataWorks, a time difference may exist for certain periods: a difference of 5 minutes and 52 seconds for dates and times between 1900 and 1928, and a difference of 9 seconds for dates and times before 1900.
Operational log: You can click the Operational Log tab to view the log in Logview. For more information, see Use Logview V2.0 to view job running information.
Results:
Query the information of all male students and sort the results by ID in ascending order.
+------------+------------+------------+------------+------------+ | id | name | age | birth | gender | +------------+------------+------------+------------+------------+ | 1 | Zhang San | 15 | 2008-05-15 | boy | | 2 | Li Si | 17 | 2006-07-20 | boy | | 5 | Wang Wu | 17 | 2006-09-12 | boy | +------------+------------+------------+------------+------------+Change the values in the birth column by a specified unit and interval.
+------------+------------+------------+------------+---------------+ | id | name | age | birth | birth_dateadd | +------------+------------+------------+------------+---------------+ | 4 | Wang Lan | 21 | 2002-01-08 | 2002-02-08 | | 3 | Li Xia | 20 | 2003-04-20 | 2003-05-20 | | 2 | Li Si | 17 | 2006-07-20 | 2006-08-20 | | 1 | Zhang San | 15 | 2008-05-15 | 2008-06-15 | | 5 | Wang Wu | 17 | 2006-09-12 | 2006-10-12 | +------------+------------+------------+------------+---------------+
On the query results page, you can sort the results by a specified field and download the results. For more information, see Process query results.
Edit code: Advanced examples
The following topics provide more advanced examples of ODPS SQL tasks:
FAQ
Q: Why does my ODPS SQL task run for a long time or remain in the waiting state?
A: MaxCompute SQL tasks require quota resources to run. If your task runs for a long time, go to the MaxCompute console to check the quota resource consumption and make sure that sufficient resources are available to run the task. For more information, see Computing resources - Quota management.
Q: Why do I receive the error message "You have No privilege 'odps:xxxx' on xxxx" when I run an ODPS SQL task?
A: This error indicates that the account used to run the ODPS SQL task does not have the required permissions on the corresponding MaxCompute project. For more information, see MaxCompute permissions.