DataWorks: Configure an assignment node

Last Updated: May 23, 2024

If you want a node to pass its output to a descendant node, you can configure the current node as an assignment node. Assignment nodes support the Shell, ODPS SQL, and Python languages. An assignment node can assign the output of its last statement to the outputs parameter, which is a built-in output parameter of the node. This way, descendant nodes can reference the value of the outputs parameter. This topic describes how to use an assignment node.

Precautions

  • Features

    • Assignment nodes can pass data only to their level-1 child nodes.

    • An assignment node can use the outputs parameter to pass only the output of its last statement to its descendant nodes.

    • The version of the Python language used by assignment nodes is Python 2.0.

    • You cannot add comments to the code of assignment nodes. Otherwise, the result may be incorrect.

  • Edition and the outputs parameter

    • For some types of nodes, you do not need to configure assignment nodes if you want to pass data between nodes. You can manually add the outputs parameter to Output Parameters or Input Parameters for the nodes. The outputs parameter functions the same way as an assignment node. For example, you can manually add the outputs parameter to Output Parameters or Input Parameters for the EMR Hive, EMR Spark SQL, ODPS Script, Hologres SQL, AnalyticDB for PostgreSQL, and MySQL nodes. For more information about how to add the outputs parameter, see Configure input and output parameters.

    • Only DataWorks Standard Edition or a more advanced edition supports assignment nodes and allows you to use the outputs parameter for the EMR Hive, EMR Spark SQL, ODPS Script, Hologres SQL, AnalyticDB for PostgreSQL, and MySQL nodes. For information about how to activate DataWorks, see Purchase guide.

  • If a node depends on an assignment node, run the workflow to which the node belongs after you configure both the node and the assignment node. Otherwise, the node may fail to obtain the result set of the outputs parameter from the assignment node.

    • For information about how to debug and run a node, see the Debug and run nodes section in this topic.

    • After a node references the result set of the outputs parameter from an assignment node, you can commit both nodes to Operation Center in the development environment and test whether the referenced data is correct. For more information, see the Test the result set passed from an assignment node section in this topic.

      Note

      All nodes that depend on an assignment node can obtain the result set of the outputs parameter from the assignment node. No limits are imposed on the node type. This topic uses ODPS SQL and Shell nodes as examples to describe how to obtain the result set of the outputs parameter from an assignment node.

  • You can use an extract, transform, and load (ETL) workflow template to experience the capabilities of using an assignment node to pass the output of its last statement to descendant nodes of the assignment node.

How it works

In DataWorks, input and output parameters are used to transmit parameter settings between ancestor and descendant nodes. An assignment node can assign the output of its last statement to the outputs parameter. If a node depends on the assignment node, you can configure the outputs parameter of the assignment node as an input parameter of the current node. This way, the current node can obtain the result set of the outputs parameter from the assignment node. You cannot modify the outputs parameter. The value of the outputs parameter is determined by the output of the last statement in the code.

  • For a node to obtain the result set of the outputs parameter from an assignment node, the node must be a level-1 child node of the assignment node, and the outputs parameter must be added to Input Parameters for the node. You can specify a custom name for the added input parameter, such as sql_inputs.

  • The output format of the outputs parameter varies based on the assignment language used by an assignment node. The result set of the outputs parameter or the specified data in the result set is passed to the descendant nodes of the assignment node in the ${Parameter name} format as a one-dimensional array or two-dimensional array.
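The reference syntax described above can be pictured with a small simulation. The following is a minimal sketch, not DataWorks' actual implementation: the resolve function, the PLACEHOLDER pattern, and the sample values are all hypothetical, and rendering a whole one-dimensional array as comma-separated text is an assumption based on the comma separator mentioned in this topic.

```python
import re

# Matches ${name}, ${name[0]}, ${name[0][1]}, and so on.
PLACEHOLDER = re.compile(r"\$\{(\w+)((?:\[\d+\])*)\}")


def resolve(code, inputs):
    """Replace ${name[i][j]} references in code with values from inputs (hypothetical helper)."""

    def substitute(match):
        value = inputs[match.group(1)]
        # Apply each [index] in order, so [0][1] means row 0, field 1.
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            value = value[int(index)]
        if isinstance(value, list):
            # Assumption: a flat array is rendered as comma-separated text.
            return ",".join(str(item) for item in value)
        return str(value)

    return PLACEHOLDER.sub(substitute, code)


# Hypothetical input parameters of a descendant node.
inputs = {
    "python_inputs": ["a", "b", "c"],               # one-dimensional array
    "sql_inputs": [["1", "x"], ["2", "y"]],         # two-dimensional array
}

print(resolve("echo '${python_inputs[1]}'", inputs))
print(resolve("echo '${sql_inputs[0][1]}'", inputs))
```

In this sketch, a single index selects an element of a one-dimensional array or a row of a two-dimensional array, and a second index selects a field within that row.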

Procedure of using an assignment node

  1. Configure an assignment node: Define the result set of the outputs parameter of an assignment node. In this phase, you need to select the assignment language and determine the output of the last statement in the code for the assignment node.

  2. Configure scheduling dependencies: Configure scheduling dependencies to allow a node to directly depend on the assignment node.

  3. Configure settings to allow the descendant node of the assignment node to reference the result set: Add the outputs parameter of the assignment node in the ${Parameter name} format to Input Parameters in the Parameters section of the Properties tab for the descendant node. The specified data in the result set can be passed to the descendant node as a one-dimensional array or two-dimensional array based on the assignment language used by the assignment node.

  4. Debug and run the descendant node: Run the workflow to which the descendant node belongs to check whether the reference results are as expected.

  5. Test the obtained result set: After the descendant node of the assignment node references the result set passed from the assignment node, you can commit the descendant node and the assignment node to Operation Center in the development environment and test whether the referenced data is correct.

Go to an entry point for creating an assignment node

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. Go to an entry point for creating an assignment node.

    In the Scheduled Workflow pane of the DataStudio page, find the desired workflow and create an assignment node in the workflow. Configure basic information for the node, such as the name and storage path. The following figure shows the entry points.

    In this example, three assignment nodes that use the Python, ODPS SQL, and Shell languages are created. The node names are fuzhi_python, fuzhi_sql, and fuzhi_shell.

    The output format of the outputs parameter varies based on the assignment language. For more information, see the Output format of the outputs parameter section in this topic.

    In addition, you can configure basic properties, time properties, and resource properties for all the nodes based on your business requirements. For more information, see Configure basic properties, Configure time properties, and Configure the resource property.

Output format of the outputs parameter

Assignment nodes support the Shell, ODPS SQL, and Python languages. The output format of the outputs parameter varies based on the assignment language. The result set of the outputs parameter or the specified data in the result set is passed to the descendant nodes of the assignment node in the ${Parameter name} format as a one-dimensional array or two-dimensional array.

ODPS SQL

  • Value of the outputs parameter: The output of the SELECT statement in the last row is used as the value of the outputs parameter for the assignment node. This way, the output can be referenced by other nodes.

  • Output format of the outputs parameter: The data is passed to the descendant nodes of the assignment node as a two-dimensional array.

  • Size limit on the value of the outputs parameter: The value of the outputs parameter cannot exceed 2 MB in size. If the value exceeds 2 MB in size, the assignment node fails to run.

Shell

  • Value of the outputs parameter: The output of the ECHO statement in the last row is used as the value of the outputs parameter for the assignment node. This way, the output can be referenced by other nodes.

  • Output format of the outputs parameter: The data is passed to the descendant nodes of the assignment node as a one-dimensional array whose elements are separated by commas (,).

Python

  • Value of the outputs parameter: The output of the PRINT statement in the last row is used as the value of the outputs parameter for the assignment node. This way, the output can be referenced by other nodes.

  • Output format of the outputs parameter: The data is passed to the descendant nodes of the assignment node as a one-dimensional array whose elements are separated by commas (,).
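For the Shell and Python languages, the rule above can be sketched as "take the last line of output and split it on commas". The snippet below is only an illustration under that reading. The function name last_line_to_array is hypothetical, and treating the last non-empty line as the output of the last statement is an assumption, not documented platform behavior.

```python
def last_line_to_array(stdout):
    """Split the last non-empty output line on commas (hypothetical helper)."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        return []
    # Only the final line counts; earlier output is ignored.
    return lines[-1].split(",")


# Simulated output of an assignment node whose last statement is: echo "hello,world"
outputs = last_line_to_array("preparing data\nhello,world\n")
print(outputs)
print(outputs[0])
```

A descendant node would then address outputs[0] and outputs[1] through its own input parameter name, for example ${shell_inputs[0]}.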

Example 1 of obtaining the result set passed from an assignment node

In this example, draw lines to configure the start node as the ancestor node of all assignment nodes and the down_compare node as the descendant node of all assignment nodes. This establishes the dependencies among the nodes. The down_compare node is a Shell node. It references the result set, or specified data in the result set, that is passed from the assignment nodes fuzhi_sql, fuzhi_python, and fuzhi_shell in the ${Parameter name} format as a one-dimensional array or two-dimensional array.

  • Assignment nodes (fuzhi_python, fuzhi_sql, and fuzhi_shell): contain a built-in output parameter named outputs.

  • Descendant node (down_compare): After scheduling dependencies are configured, add the outputs parameter to Input Parameters in the Parameters section of the Properties tab for the descendant node. You can specify a custom name for the input parameter.

The following sections describe the procedure details.

Configure Output Parameters for fuzhi_sql and Input Parameters for down_compare

This section describes how to configure Output Parameters for fuzhi_sql and Input Parameters for down_compare.

  1. Configure fuzhi_sql.

    1. In the desired workflow, find fuzhi_sql and double-click its name.

    2. On the configuration tab of fuzhi_sql, select ODPS SQL for Language and write value assignment code.

      Sample code:

      SELECT * FROM xc_dpe_e2.xc_rpt_user_info_d WHERE dt='20191008' LIMIT 10;

    3. In the right-side navigation pane, click Properties. Then, configure Output Parameters in the Parameters section of the Properties tab.

      fuzhi_sql assigns the output of the code to the outputs parameter.

  2. Configure down_compare.

    1. In the desired workflow, find down_compare and double-click its name.

    2. On the configuration tab of down_compare, write code.

      Sample code:

      echo '${sql_inputs}';
      echo 'Use the data in the first row in the output of fuzhi_sql as the input: '${sql_inputs[0]};
      echo 'Use the data in the second row in the output of fuzhi_sql as the input: '${sql_inputs[1]};
      echo 'Use the value of the second field in the first row in the output of fuzhi_sql as the input: '${sql_inputs[0][1]};
      echo 'Use the value of the third field in the second row in the output of fuzhi_sql as the input: '${sql_inputs[1][2]};

    3. In the right-side navigation pane, click Properties. Then, configure Input Parameters in the Parameters section of the Properties tab.

      Add the outputs parameter of fuzhi_sql to Input Parameters for down_compare and rename the parameter sql_inputs.

  3. Debug and run the descendant node.
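To make the row and field indexes in the fuzhi_sql example concrete, the following sketch builds a two-dimensional array from a query result. It is a local simulation only: SQLite stands in for MaxCompute, and the user_info table, its columns, and its rows are hypothetical and are not taken from xc_rpt_user_info_d.

```python
import sqlite3

# SQLite stands in for MaxCompute here; the table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_info (uid TEXT, name TEXT, city TEXT, dt TEXT)")
conn.executemany(
    "INSERT INTO user_info VALUES (?, ?, ?, ?)",
    [
        ("u1", "alice", "hangzhou", "20191008"),
        ("u2", "bob", "beijing", "20191008"),
        ("u3", "carol", "shanghai", "20191007"),
    ],
)

# The result of the last SELECT statement becomes a list of rows,
# and each row is a list of field values.
query = (
    "SELECT uid, name, city FROM user_info "
    "WHERE dt = '20191008' ORDER BY uid LIMIT 10"
)
sql_inputs = [list(row) for row in conn.execute(query)]
conn.close()

print(sql_inputs[0])     # first row
print(sql_inputs[0][1])  # second field of the first row
print(sql_inputs[1][2])  # third field of the second row
```

The first index selects a row and the second index selects a field in that row, which mirrors references such as ${sql_inputs[0][1]} and ${sql_inputs[1][2]} in the down_compare code.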

Configure Output Parameters for fuzhi_python and Input Parameters for down_compare

This section describes how to configure Output Parameters for fuzhi_python and Input Parameters for down_compare.

  1. Configure fuzhi_python.

    1. In the desired workflow, find fuzhi_python and double-click its name.

    2. On the configuration tab of fuzhi_python, select Python for Language and write value assignment code.

      Sample code:

      print "a,b,c";

    3. In the right-side navigation pane, click Properties. Then, configure Output Parameters in the Parameters section of the Properties tab.

      fuzhi_python assigns the output of the code to the outputs parameter. In this example, the output is a,b,c.

      The data a,b,c is assigned to the outputs parameter of fuzhi_python as a one-dimensional array.

  2. Configure down_compare.

    1. In the desired workflow, find down_compare and double-click its name.

    2. On the configuration tab of down_compare, write code.

      Sample code:

      echo 'The output of fuzhi_python: '${python_inputs};
      echo 'Use the first value in the output of fuzhi_python as the input: '${python_inputs[0]};
      echo 'Use the second value in the output of fuzhi_python as the input: '${python_inputs[1]};

    3. In the right-side navigation pane, click Properties. Then, configure Input Parameters in the Parameters section of the Properties tab.

      Add the outputs parameter of fuzhi_python to Input Parameters for down_compare and rename the parameter python_inputs.

  3. Debug and run the descendant node.

Configure Output Parameters for fuzhi_shell and Input Parameters for down_compare

This section describes how to configure Output Parameters for fuzhi_shell and Input Parameters for down_compare.

  1. Configure fuzhi_shell.

    1. In the desired workflow, find fuzhi_shell and double-click its name.

    2. On the configuration tab of fuzhi_shell, write value assignment code.

      Sample code:

      echo "hello,world";

    3. In the right-side navigation pane, click Properties. Then, configure Output Parameters in the Parameters section of the Properties tab.

      fuzhi_shell assigns the output of the code to the outputs parameter. In this example, the output is hello,world.

      The data hello,world is assigned to the outputs parameter of fuzhi_shell as a one-dimensional array.

  2. Configure down_compare.

    1. In the desired workflow, find down_compare and double-click its name.

    2. On the configuration tab of down_compare, write code.

      Sample code:

      echo 'The output of fuzhi_shell: '${shell_inputs};
      echo 'Use the first value in the output of fuzhi_shell as the input: '${shell_inputs[0]};
      echo 'Use the second value in the output of fuzhi_shell as the input: '${shell_inputs[1]};

    3. In the right-side navigation pane, click Properties. Then, configure Input Parameters in the Parameters section of the Properties tab.

      Add the outputs parameter of fuzhi_shell to Input Parameters for down_compare and rename the parameter shell_inputs.

  3. Debug and run the descendant node.

Example 2 of obtaining the result set passed by an assignment node

The following describes the value assignment cases of the outputs parameter for assignment nodes that use different languages.

Value of the outputs parameter

  • ODPS SQL: Query the fuzhi_tb table.

    • Statement: SELECT * FROM fuzhi_tb;

    • Result: (figure omitted: run result)

  • Shell: Example: echo 'Data', 'Assignment Node 2 uses the Shell language';

  • Python: Example: print "Works!, Assignment Node 3 uses the Python language";

Configuration of scheduling parameters for assignment nodes

  1. By default, the outputs parameter is added to Output Parameters in the Parameters section of the Properties tab for your assignment node.

  2. On the configuration tab of the assignment node, click the Commit icon to commit the assignment node.

  For more information about how to configure input and output parameters, see Configure input and output parameters.

Configuration of scheduling parameters for descendant nodes

  In the following example, the assignment node uses the ODPS SQL language.

  1. Configure scheduling dependencies between the assignment node and its descendant nodes. For more information about how to configure scheduling dependencies between nodes, see Configure same-cycle scheduling dependencies.

  2. In the Parameters section of the Properties tab, add the input parameter named inputs_odps_sql to Input Parameters. For more information about how to configure input and output parameters, see Configure input and output parameters.

Method for descendant nodes to obtain data

  Valid values for different types of descendant nodes:

  • ODPS SQL: select '${inputs_odps_sql[0][0]}';

  • Shell: echo '${inputs_shell[0]}';

  • PyODPS 3: print ('${inputs_python[0]}');

Returned result of descendant nodes

  • ODPS SQL: Hello

  • Shell: Data

  • Python: Works!
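The returned results Data and Works! follow from splitting each printed line on commas and taking element 0. The short sketch below walks through that reasoning with the Shell and Python examples from this section. It is a local simulation under the comma-splitting rule described in this topic, and the variable names are illustrative only. Whether DataWorks trims the whitespace after a comma is not stated in this topic, so the sketch leaves it untouched.

```python
# Text printed by the last statement of each assignment node in Example 2.
# echo 'Data', 'Assignment Node 2 uses the Shell language' prints its two
# arguments separated by a space.
shell_stdout = "Data, Assignment Node 2 uses the Shell language"
python_stdout = "Works!, Assignment Node 3 uses the Python language"

# One-dimensional arrays: elements are separated by commas.
inputs_shell = shell_stdout.split(",")
inputs_python = python_stdout.split(",")

print(inputs_shell[0])        # element referenced by ${inputs_shell[0]}
print(inputs_python[0])       # element referenced by ${inputs_python[0]}
print(repr(inputs_shell[1]))  # second element keeps its leading space in this sketch
```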

Debug and run nodes

After a node that depends on an assignment node references the result set passed from the assignment node, double-click the name of the workflow to which the node belongs to open the configuration tab of the workflow. Then, click the run icon in the top toolbar of the configuration tab to run the workflow and check whether the reference results are as expected.

Note
  • If the descendant node of an assignment node is a for-each node or a do-while node, you must go to Operation Center to run the descendant node and view the reference results.

  • For information about the best practices of using an assignment node together with a for-each node or a do-while node, see Configure a for-each node and Configure a do-while node.

Test the result set passed from an assignment node

After a node that depends on an assignment node references the result set passed from the assignment node, you can commit the current node and the assignment node to the development environment, and go to Operation Center in the development environment to backfill data to test whether the obtained result set is correct.