Create ODPS nodes in DataWorks

DataWorks provides multiple types of ODPS nodes that you can use to develop MaxCompute tasks based on your business requirements. DataWorks also provides a range of scheduling configurations so that you can flexibly configure the scheduling properties of a MaxCompute task. This topic describes how to create and manage an ODPS node.

Prerequisites

A workflow is created.
In DataStudio, development for every type of compute engine is organized in workflows, so you must create a workflow before you create a node. For more information, see Create a workflow.
A MaxCompute data source is added and associated with DataStudio.
Before you create an ODPS node to develop a MaxCompute task, you must add a MaxCompute project to your DataWorks workspace as a MaxCompute data source and associate the MaxCompute data source with DataStudio as an underlying engine for MaxCompute task development. For more information, see Add a MaxCompute data source and Environment preparation.
The RAM user that you want to use to develop a MaxCompute task is added to the workspace as a member and is assigned the Development or Workspace Administrator role. The Workspace Administrator role has extensive permissions. Assign this role to the RAM user only if necessary. For information about how to add a member to a workspace and assign a role to the member, see Add workspace members and assign roles to them.

Create an ODPS node

Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
On the DataStudio page, create a node. In this example, an ODPS SQL node is created.
1. In the Scheduled Workflow pane of the DataStudio page, find the created workflow, right-click the workflow name, and then choose Create Node > MaxCompute > ODPS SQL. You can also move the pointer over the Create icon in the Scheduled Workflow pane and select an ODPS node type to create a node of this type.
  Important
  If you cannot choose Create Node > MaxCompute > ODPS SQL on the DataStudio page, click Computing Resource in the left-side navigation pane of the DataStudio page to check whether a MaxCompute computing resource is associated with DataStudio. If no MaxCompute computing resource is associated with DataStudio, you must associate a MaxCompute computing resource with DataStudio and refresh the related page before you can create an ODPS node.
2. In the Create Node dialog box, configure the Name parameter and click Confirm. After the node is created, you can develop and configure a MaxCompute task based on the node.

Develop a MaxCompute task

DataWorks supports multiple types of ODPS nodes. You can develop MaxCompute tasks based on the node types.

Note

When you run a MaxCompute task, the system displays an estimated cost for reference only. The actual fees are those shown on your MaxCompute bill. For more information about billing, see Billable items and billing methods.
If an error is reported during cost estimation, the table may not exist or you may not have the required permissions. You can ignore the error for now, run the task on the node, and then resolve the problem based on the specific error message.

Node type	Use case	Task development guide
ODPS SQL	This type of node can be used to develop MaxCompute SQL tasks.	Develop a MaxCompute SQL task
SQL Snippet	This type of node can be used to develop MaxCompute SQL tasks. In real-world business scenarios, many SQL code processes are similar: their input tables or output tables have the same schema or compatible data types and differ only in table names. In this case, you can create a script template from such an SQL code process to reuse the SQL code. The script template abstracts the input tables into input parameters and the output tables into output parameters.	Overview of a script template
PyODPS 3	This type of node can be used to develop PyODPS tasks of MaxCompute. The underlying language version of a PyODPS 3 node is Python 3.	Develop a PyODPS 3 task
PyODPS 2	This type of node can be used to develop PyODPS tasks of MaxCompute. The underlying language version of a PyODPS 2 node is Python 2.	Develop a PyODPS 2 task
ODPS Spark	This type of node can be used to develop MaxCompute Spark tasks.	Develop a MaxCompute Spark task
ODPS Script	This type of node can be used to develop MaxCompute script tasks.	Develop a MaxCompute script task
ODPS MR	This type of node can be used to develop MaxCompute MapReduce tasks.	Develop a MaxCompute MapReduce task
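
The reuse idea behind the SQL Snippet node type can be illustrated with a minimal Python sketch. It is not the DataWorks implementation: it only shows how one SQL process with parameterized input and output tables can serve several tables that share a schema. The `@@{name}` placeholder syntax, the `render_template` helper, and all table and column names are assumptions made for illustration.

```python
import re

# Placeholder syntax assumed for illustration: @@{parameter_name}
PLACEHOLDER = re.compile(r"@@\{(\w+)\}")


def render_template(template: str, params: dict) -> str:
    """Replace every @@{name} placeholder with the matching value in params."""

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing template parameter: {name}")
        return params[name]

    return PLACEHOLDER.sub(substitute, template)


# One SQL process, written once. The table names are the only variable parts.
TEMPLATE = (
    "INSERT OVERWRITE TABLE @@{output_table}\n"
    "SELECT category, SUM(amount) AS total_amount\n"
    "FROM @@{input_table}\n"
    "GROUP BY category;"
)

# Hypothetical tables that share the same schema.
east_sql = render_template(
    TEMPLATE,
    {"input_table": "sales_east", "output_table": "sales_east_summary"},
)
west_sql = render_template(
    TEMPLATE,
    {"input_table": "sales_west", "output_table": "sales_west_summary"},
)

print(east_sql)
```

Each call produces a complete SQL statement for a different pair of tables, which is the kind of reuse that a script template is meant to provide.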

Develop a MaxCompute task: advanced capabilities

In addition to MaxCompute task development capabilities, DataWorks provides capabilities related to tables, resources, and functions for MaxCompute. You can use these capabilities to develop tasks more efficiently.

Table-related capabilities: You can quickly create a MaxCompute table, view information about a MaxCompute table, and manage a MaxCompute table by using entry points and features in the DataWorks console. For more information, see Create and manage MaxCompute tables and Manage tables.
Function- and resource-related capabilities:
- You can directly use built-in functions of MaxCompute when you develop a MaxCompute task in the DataWorks console. For information about how to view built-in functions of MaxCompute, see Use built-in functions.
- You can upload a user-defined function (UDF) to DataWorks as a resource and register the UDF in DataWorks. When you develop a MaxCompute task, you can directly call the UDF. For information about how to use a UDF, see Create and use MaxCompute resources and Create and use a MaxCompute function.
- You can upload a resource package that is developed on your on-premises machine to DataWorks or directly create a resource in DataWorks.
  DataWorks allows you to upload text files, Python code, and compressed packages such as .zip, .tgz, .tar.gz, .tar, and .jar packages to MaxCompute as different types of resources. When you create UDFs or run MapReduce tasks, you can reference these resources. For information about how to upload and use a resource, see Create and use MaxCompute resources.
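
As a rough illustration of what a Python resource for a UDF might contain, the following sketch follows the MaxCompute Python UDF convention of a class decorated with `annotate` from `odps.udf` that exposes an `evaluate` method. The class name `SafeLength` and its logic are hypothetical, and the fallback decorator exists only so that the file can be run locally where the MaxCompute runtime is not available.

```python
try:
    # Provided by the MaxCompute Python UDF runtime (and by the pyodps package).
    from odps.udf import annotate
except ImportError:
    # Local stand-in (assumption): a no-op decorator so the sketch runs anywhere.
    def annotate(signature):
        def decorator(cls):
            return cls
        return decorator


# The signature declares one STRING argument and a BIGINT return value.
@annotate("string->bigint")
class SafeLength(object):
    """Hypothetical UDF: return the length of a string, or NULL for NULL input."""

    def evaluate(self, value):
        # MaxCompute passes SQL NULL to a Python UDF as None.
        if value is None:
            return None
        return len(value)


# Quick local check before the file is uploaded as a resource.
print(SafeLength().evaluate("maxcompute"))
```

After a file like this is uploaded as a Python resource and a function is registered against the class, the function can be called by its registered name in MaxCompute SQL.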

Operations that you can perform after a task is developed

After you complete the development of a task by using the created node, you can perform the following operations:

Configure scheduling properties: You can configure properties for periodic scheduling of the node. If you want the system to periodically schedule and run the task, you must configure items for the node, such as rerun settings and scheduling dependencies. For more information, see Overview.
Debug the node: You can debug and test the code of the node to check whether the code logic meets your expectations. For more information, see Debugging procedure.
Deploy the node: After you complete all development operations, you can deploy the node. After the node is deployed, the system periodically schedules the node based on the scheduling properties of the node. For more information, see Deploy nodes.
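
The effect of scheduling dependencies can be pictured with a small Python sketch: a node runs only after all of the nodes it depends on have run. This is a conceptual illustration only, not how DataWorks schedules nodes internally. The node names are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) is used to compute a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: each key depends on the upstream nodes in its value set.
dependencies = {
    "ods_load": set(),
    "dwd_clean": {"ods_load"},
    "dws_aggregate": {"dwd_clean"},
    "ads_report": {"dws_aggregate", "dwd_clean"},
}

# static_order() yields every node after all of its upstream nodes.
run_order = list(TopologicalSorter(dependencies).static_order())

for position, node in enumerate(run_order, start=1):
    print(position, node)
```

If the dependencies formed a cycle, `TopologicalSorter` would raise `graphlib.CycleError`. This mirrors the reason a dependency graph must be acyclic: no valid run order exists for nodes that wait on each other.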

Manage a node

After a node is created, you can perform various operations on the node, such as modifying or deleting the node. You can also combine the node with other nodes into a node group and then reference the node group in other workflows. For more information about management operations that you can perform on a node, see Create and manage a node group.