The DataStudio service of DataWorks allows you to create various types of nodes, such as data synchronization nodes, computing resource nodes, and general nodes, to meet your different data processing requirements. Computing resource nodes include ODPS SQL nodes, Hologres SQL nodes, and E-MapReduce (EMR) Hive nodes. General nodes include zero load nodes and Check nodes.
If you cannot create a computing resource node, such as an ODPS SQL, Hologres SQL, or EMR Hive node, in DataStudio, click Computing Resource in the left-side navigation pane of the DataStudio page to check whether the corresponding computing resource is associated with DataStudio. If the computing resource is associated with DataStudio but you still cannot create a node of that type, refresh the page to update the cached data or try opening DataStudio in your browser's incognito mode.
Data synchronization nodes
| Type | Description | Node code | Task type (specified by TaskType) |
| --- | --- | --- | --- |
| Batch synchronization node | This type of node is used to periodically synchronize offline data and to synchronize data between heterogeneous data sources in complex scenarios. For information about the data source types that support batch synchronization, see Supported data source types and read and write operations. | 23 | DI2 |
| Real-time synchronization node | This type of node is used to synchronize incremental data in real time. A real-time synchronization node uses three basic plug-ins to read, convert, and write data. These plug-ins interact with each other based on an intermediate data format that is defined by the plug-ins. For information about the data source types that support real-time synchronization, see Supported data source types and read and write operations. | 900 | RI |

In addition to the nodes that can be created on the DataStudio page, DataWorks also allows you to create different types of synchronization tasks in Data Integration. For example, you can create a synchronization task in Data Integration that synchronizes full data once and then synchronizes incremental data in real time, or a task that synchronizes all data in a database in offline mode. For more information, see Overview of the full and incremental synchronization feature. In most cases, the node code of a task that is created in Data Integration is 24.
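The node codes and task types in the table identify synchronization tasks in DataWorks metadata. As a minimal illustration (not an official SDK call), a lookup table built from the rows above can translate a node code into its task type; the dictionary contents come from the table, and the function name is hypothetical.

```python
# Hypothetical helper: maps the synchronization node codes listed above
# to their task types. Values come from the table; the function itself
# is illustrative, not part of any DataWorks SDK.
SYNC_NODE_TYPES = {
    23: "DI2",   # batch synchronization node
    900: "RI",   # real-time synchronization node
}

def sync_task_type(node_code: int) -> str:
    """Return the task type for a synchronization node code, or 'UNKNOWN'."""
    return SYNC_NODE_TYPES.get(node_code, "UNKNOWN")

print(sync_task_type(23))   # DI2 (batch synchronization)
print(sync_task_type(900))  # RI (real-time synchronization)
```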
Compute engine nodes
In a specific workflow, you can create nodes of a specific compute engine type, use the nodes to develop data, and commit the code to the corresponding compute engine for execution.
| Compute engine integrated with DataWorks | Encapsulated engine capability | Node code | Task type (specified by TaskType) |
| --- | --- | --- | --- |
| MaxCompute | ODPS SQL node | 10 | ODPS_SQL |
| | ODPS Spark node | 225 | SPARK |
| | PyODPS 2 node | 221 | PY_ODPS |
| | PyODPS 3 node | 1221 | PY_ODPS3 |
| | ODPS Script node | 24 | ODPS_SCRIPT |
| | ODPS MR node | 11 | ODPS_MR |
| | SQL component node | 1010 | COMPONENT_SQL |
| E-MapReduce | EMR Hive node | 227 | EMR_HIVE |
| | EMR MR node | 230 | EMR_MR |
| | EMR Spark SQL node | 229 | EMR_SPARK_SQL |
| | EMR Spark node | 228 | EMR_SPARK |
| | EMR Shell node | 257 | EMR_SHELL |
| | EMR Presto node | 259 | EMR_PRESTO |
| | EMR Spark Streaming node | 264 | SPARK_STREAMING |
| | EMR Kyuubi node | 268 | EMR_KYUUBI |
| | EMR Trino node | 267 | EMR_TRINO |
| CDH | CDH Hive node | 270 | CDH_HIVE |
| | CDH Spark node | 271 | CDH_SPARK |
| | CDH MR node | 273 | CDH_MR |
| | CDH Presto node | 278 | CDH_PRESTO |
| | CDH Impala node | 279 | CDH_IMPALA |
| | CDH Spark SQL node | 272 | CDH_SPARK_SQL |
| AnalyticDB for PostgreSQL | - | - | - |
| AnalyticDB for MySQL | - | - | - |
| Hologres | Hologres SQL node | 1093 | HOLOGRES_SQL |
| | Create a node to synchronize schemas of MaxCompute tables with a few clicks | 1094 | HOLOGRES_SYNC_DDL |
| | Create a node to synchronize MaxCompute data with a few clicks | 1095 | HOLOGRES_SYNC_DATA |
| ClickHouse | ClickHouse SQL node | 1301 | CLICK_SQL |
| StarRocks | - | 10004 | - |
| PAI | Use DataWorks tasks to schedule pipelines in Machine Learning Designer | 1117 | PAI_STUDIO |
| | PAI DLC node | 1119 | PAI_DLC |
| Database | - | 1000039 | - |
| | - | 10001 | - |
| | - | 10002 | - |
| | - | 10003 | - |
| | - | 10005 | - |
| | - | 10006 | - |
| | - | 10007 | - |
| | - | 10008 | - |
| | - | 10009 | - |
| | - | 10011 | - |
| | - | - | - |
| | - | 10013 | - |
| | - | 10014 | - |
| | - | 10015 | - |
| | - | 10016 | - |
| | - | 10017 | - |
| | - | - | - |
| Others | - | 1000023 | - |
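When you work with these identifiers programmatically, the mapping usually runs the other way: from a TaskType string back to its node code. The following sketch builds such a reverse lookup from a few rows of the table above; the helper function is hypothetical and only illustrates how the documented codes line up, it is not a DataWorks API.

```python
# Hypothetical lookup built from the compute engine table above:
# TaskType string -> node code. Only a sample of rows with a documented
# TaskType is included; the helper is illustrative, not a DataWorks API.
COMPUTE_NODE_CODES = {
    "ODPS_SQL": 10,
    "PY_ODPS3": 1221,
    "EMR_HIVE": 227,
    "CDH_HIVE": 270,
    "HOLOGRES_SQL": 1093,
    "CLICK_SQL": 1301,
}

def node_code(task_type: str) -> int:
    """Return the node code for a TaskType, raising KeyError if unlisted."""
    try:
        return COMPUTE_NODE_CODES[task_type]
    except KeyError:
        raise KeyError(f"TaskType {task_type!r} is not in the table above")

print(node_code("ODPS_SQL"))  # 10
```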
General nodes
In a specific workflow, you can create a general node and use the node together with compute engine nodes to process complex logic.
| Scenario | Node type | Description | Node code | Task type (specified by TaskType) |
| --- | --- | --- | --- | --- |
| Business management | Zero load node | A zero load node is a control node that supports dry-run scheduling and does not generate data. In most cases, a zero load node serves as the root node of a workflow and allows you to easily manage nodes and workflows. | 99 | VIRTUAL_NODE |
| Event-based trigger | HTTP Trigger node | You can use this type of node if you want to trigger nodes in DataWorks to run after nodes in other scheduling systems finish running. Note: DataWorks no longer allows you to create cross-tenant collaboration nodes. If you have used a cross-tenant collaboration node in your business, we recommend that you replace it with an HTTP Trigger node, which provides the same capabilities. | 1114 | SCHEDULER_TRIGGER |
| | OSS object check node | You can use this type of node if you want to trigger a descendant node to run after Object Storage Service (OSS) objects are generated. | 239 | - |
| | FTP Check node | You can use this type of node if you want to trigger a descendant node to run after File Transfer Protocol (FTP) files are generated. Note: FTP Check nodes are no longer supported in DataWorks. We recommend that you replace FTP Check nodes with Check nodes to perform checks in DataWorks. | 1320 | FTP_CHECK |
| | Check node | You can use this type of node to check the availability of objects based on check policies. If the running of a task depends on an object, you can use a Check node to check the availability of the object and configure the task as a descendant task of the Check node. If the condition that is specified in the check policy is met, the task on the Check node is successfully run and its descendant task is triggered to run. | 241 | - |
| Data Quality | - | You can use this type of node to compare data between different tables in multiple ways in a workflow. | 1331 | DATA_SYNCHRONIZATION_QUALITY_CHECK |
| Parameter value assignment and parameter passing | Assignment node | You can use this type of node if you want to use the outputs parameter of an assignment node to pass the data from the output of the last row of the code for the assignment node to its descendant nodes. | 1100 | CONTROLLER_ASSIGNMENT |
| | Parameter node | You can use this type of node to aggregate parameters of its ancestor nodes and distribute parameters to its descendant nodes. | 1115 | PARAM_HUB |
| Control | For-each node | You can use this type of node to traverse the result set of an assignment node. | 1106 | CONTROLLER_TRAVERSE |
| | Do-while node | You can use this type of node to execute the logic of specific nodes in loops. You can also use this type of node together with an assignment node to generate the data that is passed to a descendant node of the assignment node in loops. | 1103 | CONTROLLER_CYCLE |
| | Branch node | You can use this type of node to route results based on logical conditions. You can also use this type of node together with an assignment node. | 1101 | CONTROLLER_BRANCH |
| | Merge node | You can use this type of node to merge the status of its ancestor nodes and prevent dry run of its descendant nodes. | 1102 | CONTROLLER_JOIN |
| Others | Shell node | Shell nodes support the standard Shell syntax. The interactive syntax is not supported. | 6 | SHELL2 |
| | Function Compute node | You can use this type of node to periodically schedule and process event functions and complete integration and joint scheduling with other types of nodes. | 1330 | FUNCTION_COMPUTE |
| | Data push node | You can use this type of node to obtain the data query results generated by other nodes and push the obtained data to DingTalk groups, Lark groups, WeCom groups, or Microsoft Teams. This way, members in the groups or teams can receive the latest data at the earliest opportunity. | 1332 | DATA_PUSH |
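As described above, an assignment node passes only the output of the last row of its code to descendant nodes through the outputs parameter. The following plain-Python sketch simulates that last-row rule; the function name and sample data are hypothetical and serve only to illustrate which part of a result set reaches downstream nodes.

```python
# Hypothetical simulation of the assignment node's outputs parameter:
# only the last row of the node's query result is passed downstream.
def assignment_outputs(result_rows: list) -> str:
    """Simulate passing the last row of an assignment node's result set."""
    if not result_rows:
        return ""
    # The last row is what descendant nodes receive; joining the columns
    # into one comma-separated value here is purely for illustration.
    return ",".join(result_rows[-1])

rows = [
    ["2024-01-01", "100"],
    ["2024-01-02", "250"],  # only this last row reaches descendant nodes
]
print(assignment_outputs(rows))  # 2024-01-02,250
```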