Supported node types

Updated at: 2025-02-13 02:11

The DataStudio service of DataWorks allows you to create various types of nodes, such as data synchronization nodes, computing resource nodes, and general nodes, to meet your different data processing requirements. Computing resource nodes include ODPS SQL nodes, Hologres SQL nodes, and E-MapReduce (EMR) Hive nodes. General nodes include zero load nodes and Check nodes.

Important

If you cannot create a computing resource node, such as an ODPS SQL, a Hologres SQL, or an EMR Hive node, in DataStudio, click Computing Resource in the left-side navigation pane of the DataStudio page to check whether the corresponding computing resource is associated with DataStudio. If the computing resource is associated with DataStudio but you still cannot create a node of that type, refresh the page to update cached data or try opening the page in your browser's incognito mode.

Data synchronization nodes

| Type | Description | Node code | Task type (specified by TaskType) |
| --- | --- | --- | --- |
| Batch synchronization node | This type of node is used to periodically synchronize offline data and to synchronize data between heterogeneous data sources in complex scenarios. For information about the data source types that support batch synchronization, see Supported data source types and read and write operations. | 23 | DI2 |
| Real-time synchronization node | This type of node is used to synchronize incremental data in real time. A real-time synchronization node uses three basic plug-ins to read, convert, and write data. These plug-ins interact with each other based on an intermediate data format that is defined by the plug-ins. For information about the data source types that support real-time synchronization, see Supported data source types and read and write operations. | 900 | RI |

Note

In addition to the nodes that can be created on the DataStudio page, DataWorks also allows you to create different types of synchronization tasks in Data Integration. For example, you can create a task that synchronizes full data once and then synchronizes incremental data in real time, or a task that synchronizes all data in a database in offline mode. For more information, see Overview of the full and incremental synchronization feature. In most cases, the node code of a task that is created in Data Integration is 24.

Compute engine nodes

In a workflow, you can create nodes of a specific compute engine type, use the nodes to develop data, and issue the engine code to the corresponding compute engine to run.

| Compute engine integrated with DataWorks | Encapsulated engine capability | Node code | Task type (specified by TaskType) |
| --- | --- | --- | --- |
| MaxCompute | Develop a MaxCompute SQL task | 10 | ODPS_SQL |
| | Develop a MaxCompute Spark task | 225 | SPARK |
| | Develop a PyODPS 2 task | 221 | PY_ODPS |
| | Develop a PyODPS 3 task | 1221 | PY_ODPS3 |
| | Develop a MaxCompute script task | 24 | ODPS_SCRIPT |
| | Develop a MaxCompute MR task | 11 | ODPS_MR |
| | Reference a script template | 1010 | COMPONENT_SQL |
| E-MapReduce | Create an EMR Hive node | 227 | EMR_HIVE |
| | Create an EMR MR node | 230 | EMR_MR |
| | Create an EMR Spark SQL node | 229 | EMR_SPARK_SQL |
| | Create an EMR Spark node | 228 | EMR_SPARK |
| | Create an EMR Shell node | 257 | EMR_SHELL |
| | Create an EMR Presto node | 259 | EMR_PRESTO |
| | Create an EMR Spark Streaming node | 264 | SPARK_STREAMING |
| | Create an EMR Kyuubi node | 268 | EMR_KYUUBI |
| | Create an EMR Trino node | 267 | EMR_TRINO |
| CDH | Create a CDH Hive node | 270 | CDH_HIVE |
| | Create a CDH Spark node | 271 | CDH_SPARK |
| | Create a CDH MR node | 273 | CDH_MR |
| | Create a CDH Presto node | 278 | CDH_PRESTO |
| | Create a CDH Impala node | 279 | CDH_IMPALA |
| | Create a CDH Spark SQL node | 272 | CDH_SPARK_SQL |
| AnalyticDB for PostgreSQL | Create and use AnalyticDB for PostgreSQL nodes | - | - |
| AnalyticDB for MySQL | Create an AnalyticDB for MySQL node | - | - |
| Hologres | Create a Hologres SQL node | 1093 | HOLOGRES_SQL |
| | Create a node to synchronize schemas of MaxCompute tables with a few clicks | 1094 | HOLOGRES_SYNC_DDL |
| | Create a node to synchronize MaxCompute data with a few clicks | 1095 | HOLOGRES_SYNC_DATA |
| ClickHouse | Configure a ClickHouse SQL node | 1301 | CLICK_SQL |
| StarRocks | Configure a StarRocks node | 10004 | - |
| PAI | Use DataWorks tasks to schedule pipelines in Machine Learning Designer | 1117 | PAI_STUDIO |
| | Create and use a PAI DLC node | 1119 | PAI_DLC |
| Database | Create and use a MySQL node | 1000039 | - |
| | Configure an SQL Server node | 10001 | - |
| | Configure an Oracle node | 10002 | - |
| | Configure a PostgreSQL node | 10003 | - |
| | Configure a DRDS node | 10005 | - |
| | Configure a PolarDB for MySQL node | 10006 | - |
| | Configure a PolarDB for PostgreSQL node | 10007 | - |
| | Configure a Doris node | 10008 | - |
| | Configure a MariaDB node | 10009 | - |
| | Configure a Redshift node | 10011 | - |
| | Configure a SAP HANA node | - | - |
| | Configure a Vertica node | 10013 | - |
| | Configure a DM node | 10014 | - |
| | Configure a KingbaseES node | 10015 | - |
| | Configure an OceanBase node | 10016 | - |
| | Configure a Db2 node | 10017 | - |
| | Configure a GBase 8a node | - | - |
| Others | Create and use a Data Lake Analytics node | 1000023 | - |
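When you work with node codes and TaskType values programmatically, for example when sorting tasks exported from DataWorks by type, it can help to keep the mapping from the tables above in a small lookup table. The sketch below is a hypothetical convenience helper that copies a subset of the entries above; it is not part of any DataWorks SDK.

```python
# Hypothetical helper: map DataWorks node codes to TaskType strings.
# Values are copied from the tables above; only a subset is shown.
NODE_CODE_TO_TASK_TYPE = {
    10: "ODPS_SQL",        # MaxCompute SQL
    225: "SPARK",          # MaxCompute Spark
    221: "PY_ODPS",        # PyODPS 2
    1221: "PY_ODPS3",      # PyODPS 3
    227: "EMR_HIVE",       # EMR Hive
    1093: "HOLOGRES_SQL",  # Hologres SQL
    23: "DI2",             # Batch synchronization
    900: "RI",             # Real-time synchronization
}

def task_type_for(node_code: int) -> str:
    """Return the TaskType string for a node code, or '-' when the
    table defines no TaskType for that code (for example, StarRocks)."""
    return NODE_CODE_TO_TASK_TYPE.get(node_code, "-")

print(task_type_for(10))     # ODPS_SQL
print(task_type_for(10004))  # -
```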

General nodes

In a specific workflow, you can create a general node and use the node together with compute engine nodes to process complex logic.

| Scenario | Node type | Description | Node code | Task type (specified by TaskType) |
| --- | --- | --- | --- | --- |
| Business management | Zero load node | A zero load node is a control node that supports dry-run scheduling and does not generate data. In most cases, a zero load node serves as the root node of a workflow and allows you to easily manage nodes and workflows. | 99 | VIRTUAL_NODE |
| Event-based trigger | HTTP Trigger node | You can use this type of node if you want to trigger nodes in DataWorks to run after nodes in other scheduling systems finish running. **Note**: DataWorks no longer allows you to create cross-tenant collaboration nodes. If you have used a cross-tenant collaboration node in your business, we recommend that you replace it with an HTTP Trigger node, which provides the same capabilities. | 1114 | SCHEDULER_TRIGGER |
| | OSS object inspection node | You can use this type of node if you want to trigger a descendant node to run after Object Storage Service (OSS) objects are generated. | 239 | - |
| | FTP Check node | You can use this type of node if you want to trigger a descendant node to run after File Transfer Protocol (FTP) files are generated. **Note**: FTP Check nodes are no longer supported in DataWorks. We recommend that you replace FTP Check nodes with Check nodes to perform checks. | 1320 | FTP_CHECK |
| | Check node | You can use this type of node to check the availability of an object based on a check policy. If the running of a task depends on an object, configure the task as a descendant of a Check node that monitors the object. When the condition specified in the check policy is met, the task on the Check node succeeds and its descendant task is triggered to run. Supported objects: MaxCompute partitioned tables, FTP files, OSS objects, HDFS, and OSS-HDFS. | 241 | - |
| Data Quality | Data comparison node | You can use this type of node to compare data between different tables in multiple ways in a workflow. | 1331 | DATA_SYNCHRONIZATION_QUALITY_CHECK |
| Parameter value assignment and parameter passing | Assignment node | You can use this type of node to pass the output of the last row of the node's code to its descendant nodes through the outputs parameter. | 1100 | CONTROLLER_ASSIGNMENT |
| | Parameter node | You can use this type of node to aggregate parameters of its ancestor nodes and distribute parameters to its descendant nodes. | 1115 | PARAM_HUB |
| Control | For-each node | You can use this type of node to traverse the result set of an assignment node. | 1106 | CONTROLLER_TRAVERSE |
| | Do-while node | You can use this type of node to execute the logic of specific nodes in loops. You can also use this type of node together with an assignment node to pass the data generated in each loop to a descendant node of the assignment node. | 1103 | CONTROLLER_CYCLE |
| | Branch node | You can use this type of node to route results based on logical conditions. You can also use this type of node together with an assignment node. | 1101 | CONTROLLER_BRANCH |
| | Merge node | You can use this type of node to merge the status of its ancestor nodes and prevent dry run of its descendant nodes. | 1102 | CONTROLLER_JOIN |
| Others | Shell node | Shell nodes support the standard Shell syntax. The interactive syntax is not supported. | 6 | SHELL2 |
| | Function Compute node | You can use this type of node to periodically schedule and process event functions and to integrate and jointly schedule them with other types of nodes. | 1330 | FUNCTION_COMPUTE |
| | Data push node | You can use this type of node to obtain the data query results generated by other nodes and push the results to DingTalk groups, Lark groups, WeCom groups, or Microsoft Teams, so that group members receive the latest data at the earliest opportunity. | 1332 | DATA_PUSH |
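As the table notes, an assignment node passes only the last row of its code's output to descendant nodes through the outputs parameter. The convention can be sketched as follows; this is a hypothetical illustration of the described behavior, not DataWorks code.

```python
def last_row_output(stdout: str) -> str:
    """Mimic an assignment node's outputs parameter: only the last
    non-empty row of the node's output is passed to descendant nodes."""
    rows = [row for row in stdout.strip().splitlines() if row.strip()]
    return rows[-1] if rows else ""

# A query in an assignment node might print several rows; only the
# last row reaches the downstream node.
print(last_row_output("id,name\n1,alice\n2,bob"))  # 2,bob
```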
