In complex data workflows, it's often necessary to pass dynamic information between nodes. While a common method is to use an intermediate table, this approach is highly inefficient for passing small amounts of data, adding unnecessary I/O and complexity. The assignment node provides a lightweight solution: it executes a short script (ODPS SQL, Python 2, or Shell) and passes its output directly as a parameter to the downstream node. This allows you to build flexible pipelines where tasks are dynamically configured based on the results from upstream tasks.
Usage notes
Edition: DataWorks Standard Edition or higher.
Permissions: The RAM user must be added to the target workspace and granted the required developer permissions.
How it works
The core function of the assignment node is parameter passing: transferring data from an assignment node to a downstream node.
The assignment node produces the data: it automatically assigns the last output or query result of its script to a system-generated node output parameter named outputs.
The downstream node consumes the data: you configure the node to read outputs by adding a node input parameter (for example, param) that references it.
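This handoff can be sketched in plain Python (a minimal illustration, not DataWorks internals; the function names and values are hypothetical):

```python
# Minimal sketch: the assignment node's last result becomes the system
# "outputs" parameter; the downstream node reads it through an input
# parameter that it names itself (here, "param").

def run_assignment_node():
    # The node's script runs; its last output becomes "outputs".
    last_result = "hello,dataworks"
    return {"outputs": last_result}  # system-generated output parameter

def run_downstream_node(input_params):
    # The downstream node only sees the value under its own parameter name.
    return "received: " + input_params["param"]

upstream = run_assignment_node()
# Wiring step: map the upstream "outputs" to the downstream "param".
downstream_inputs = {"param": upstream["outputs"]}
print(run_downstream_node(downstream_inputs))  # received: hello,dataworks
```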
Parameter format
The value and format of the outputs parameter depend on the script language used:
| Language | Value | Format |
| --- | --- | --- |
| ODPS SQL | The output of the last SELECT statement. | The result set is passed as a two-dimensional array: each row is an element, and fields within a row are separated by commas (,). |
| Python 2 | The output of the last print statement. | The output is treated as a single string, which is then split by commas (,) into a one-dimensional array. |
| Shell | The output of the last echo statement. | The output is treated as a single string, which is then split by commas (,) into a one-dimensional array. |
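The conversion described in the table can be sketched as follows (an illustration only; `parse_outputs` is a hypothetical helper, and the newline-separated row layout for SQL results is an assumption based on the examples later in this article):

```python
# Illustrative sketch of the three outputs formats (an assumption for
# demonstration; DataWorks performs this conversion internally).

def parse_outputs(raw, language):
    if language == "ODPS SQL":
        # Assumption: each line is a row; fields within a row are
        # comma-separated -> two-dimensional array.
        return [row.split(",") for row in raw.splitlines()]
    # Python 2 / Shell: one string split by commas -> one-dimensional array.
    return raw.split(",")

sql_raw = "value1,value2\nvalue3,value4"   # hypothetical query result
print(parse_outputs(sql_raw, "ODPS SQL"))  # [['value1', 'value2'], ['value3', 'value4']]
print(parse_outputs("hello,dataworks", "Shell"))  # ['hello', 'dataworks']
```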
Procedure
The result of an assignment node can be passed to any type of downstream node. The following example demonstrates this workflow using a Shell node.
Configure the assignment node.
Log on to the DataWorks console. In the left navigation pane, click .
In your workflow, create and edit an assignment node. On the node configuration page, select ODPS SQL for the language and write the code.
select * from xc_dpe_e2.xc_rpt_user_info_d where dt='20191008' limit 10;

(Optional) In the right pane, click Properties. On the Input and Output Parameters tab, the system automatically creates an output parameter named outputs for this node.
Configure the Shell node.
Create a Shell node and set it as a downstream node of the assignment node by dragging a connection from the assignment node to the Shell node in the workflow canvas.
On the configuration page for the Shell node, click Properties in the right-side pane and then select the Input and Output Parameters tab.
In the Input Parameters section, click Create.
In the dialog box that appears, select the outputs parameter from the assignment node and specify a name for the current node's input parameter, such as param. This configuration automatically creates a dependency between the downstream node and the assignment node.
After configuring the parameter, reference the passed value in the Shell node's code using the ${param} format:

echo '${param}';
echo 'First row: '${param[0]};
echo 'Second field of the first row: '${param[0][1]};

The ${...} variable access syntax, including array indexing, is not standard Shell syntax. DataWorks preprocesses the script and statically replaces these variables before execution. For example, DataWorks replaces ${param[0][1]} with the actual value retrieved from the assignment node before submitting the final script to the shell for execution.
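The replacement step can be sketched as a simple text substitution (a hypothetical model of the behavior, not DataWorks source code):

```python
import re

# Hypothetical model of DataWorks' static replacement: before the script is
# handed to the shell, every ${param...} reference is replaced with the
# literal text it resolves to.

def substitute(script, name, value):
    """Replace ${name}, ${name[i]}, and ${name[i][j]} with literal text."""
    def resolve(match):
        indices = [int(i) for i in re.findall(r"\[(\d+)\]", match.group(0))]
        result = value
        for i in indices:
            result = result[i]
        if isinstance(result, list):
            if result and isinstance(result[0], list):
                # Whole 2D array: rows joined by newlines, fields by commas.
                return "\n".join(",".join(row) for row in result)
            return ",".join(result)  # a single row
        return result
    return re.sub(r"\$\{" + re.escape(name) + r"(?:\[\d+\])*\}", resolve, script)

rows = [["value1", "value2"], ["value3", "value4"]]  # hypothetical result set
script = "echo 'Second field of the first row: '${param[0][1]};"
print(substitute(script, "param", rows))
# echo 'Second field of the first row: 'value2;
```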
Run and validate the result.
Double-click the business flow name to open it. On the workflow configuration page, click Run in the toolbar to run the workflow and validate the result. Alternatively, after you commit the node to the development environment, go to the Operation Center and use data backfill to test the execution.
Limitations
Parameters can only be passed to immediate downstream nodes.
Size limit: The outputs parameter cannot exceed 2 MB; otherwise, the assignment node fails.
Syntax limitations:
Do not include comments in the assignment node's code. Comments can disrupt output parsing and cause the node to fail or produce incorrect values.
The WITH clause is not supported in ODPS SQL mode.
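A producing script could guard against the size limit before emitting its result; the sketch below is an illustrative check (an assumption for demonstration, not a DataWorks API):

```python
# Illustrative guard: reject a result before it becomes the outputs value,
# since an outputs value over 2 MB causes the assignment node to fail.

MAX_OUTPUTS_BYTES = 2 * 1024 * 1024  # the 2 MB limit on outputs

def check_outputs_size(value):
    size = len(value.encode("utf-8"))
    if size > MAX_OUTPUTS_BYTES:
        raise ValueError(
            "outputs is %d bytes; limit is %d" % (size, MAX_OUTPUTS_BYTES)
        )
    return value

print(check_outputs_size("hello,dataworks"))  # small enough, passes through
```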
Examples by language
The outputs data format and referencing method vary by language.
Example 1: Pass an ODPS SQL query result
The result of a SQL query is passed to the Shell node as a two-dimensional array.
Assignment node
Write a SELECT query:

select * from xc_dpe_e2.xc_rpt_user_info_d where dt='20191008' limit 2;

Shell node
Add an input parameter named param that references the assignment node's outputs. Use the following script to read the data:

# Output the entire 2D array
echo "Full result set: ${param}"
# Output the first row (a 1D array)
echo "First row: ${param[0]}"
# Output the second field of the first row
echo "Second field of the first row: ${param[0][1]}"

Expected output
DataWorks parses the parameter and performs a static replacement, producing output similar to the following (the values depend on your data):
Full result set: value1,value2 value3,value4
First row: value1,value2
Second field of the first row: value2
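The same references can be checked in plain Python (hypothetical values value1..value4 standing in for real query results):

```python
# Sketch reproducing Example 1's references on the passed 2D array.

param = [["value1", "value2"], ["value3", "value4"]]  # the 2D array

print("First row: " + ",".join(param[0]))
# First row: value1,value2
print("Second field of the first row: " + param[0][1])
# Second field of the first row: value2
```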
Example 2: Pass a Python 2 output
The output of a Python 2 print statement is split by commas (,) and passed as a one-dimensional array.
Assignment node
The Python 2 code is as follows:
print "hello,dataworks";

Shell node
Add an input parameter named param that references the assignment node's outputs. Use the following script to read the data:

# Output the entire 1D array
echo "Full result set: ${param}"
# Output elements by index
echo "First element: ${param[0]}"
echo "Second element: ${param[1]}"

Expected output
DataWorks parses the parameter and performs a static replacement, producing the following output:
Full result set: "hello","dataworks"
First element: hello
Second element: dataworks
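The one-dimensional case reduces to a comma split, which can be verified in plain Python (an illustration; `outputs` and `param` here are ordinary variables, not DataWorks parameters):

```python
# Sketch of the 1D case: the upstream string is split on commas, and the
# downstream indices pick individual elements.

outputs = "hello,dataworks"   # what the Python 2 / Shell node printed
param = outputs.split(",")    # the 1D array the downstream node sees

print("First element: " + param[0])   # First element: hello
print("Second element: " + param[1])  # Second element: dataworks
```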
Example 3: Pass a Shell output
The output of a Shell echo statement is split by commas (,) and passed as a one-dimensional array.
Assignment node
The Shell code is as follows:
echo "hello,dataworks";

Shell node
Add an input parameter named param that references the assignment node's outputs. Use the following script to read the data:

# Output the entire 1D array
echo "Full result set: ${param}"
# Output elements by index
echo "First element: ${param[0]}"
echo "Second element: ${param[1]}"

Expected output
DataWorks parses the parameter and performs a static replacement, producing the following output:
Full result set: "hello","dataworks"
First element: hello
Second element: dataworks
More use cases
Use with loop nodes
When the downstream node is a for-each or do-while node, refer to Configure a for-each node and Configure a do-while node.
Built-in parameter assignment in other node types
The assignment node supports only ODPS SQL, Python 2, and Shell. However, for many other node types, you can use their built-in parameter assignment feature to achieve the same goal and simplify your workflow. These nodes include: EMR Hive, Hologres SQL, EMR Spark SQL, AnalyticDB for PostgreSQL, ClickHouse SQL, and MySQL.