DataWorks provides for-each nodes. You can use a for-each node to loop through the result set passed by an assignment node. You can also customize the inner nodes of the for-each node. This topic uses an example to describe how a for-each node works and how to configure one. In this example, the for-each node loops twice over the output of an assignment node and displays the current loop count in each loop.
Prerequisites
Before you configure a for-each node, you must be familiar with the logic of a for-each node. This prevents errors during the configuration of the node. For information about the logic of a for-each node, see Logic of for-each nodes.
Procedure
In most cases, a for-each node is used with an assignment node. This section describes the procedure for using a for-each node.
Configure dependencies for a for-each node.
A for-each node must depend on an assignment node. For information about how to configure dependencies for a for-each node, see Create and configure a workflow.
Configure inputs for the for-each node.
In the Input and Output Parameters section of the Properties tab for the for-each node, add the outputs parameter, which is the built-in output parameter of the assignment node, to Input Parameters. For information about how to configure an assignment node, see Configure an assignment node.
Configure the inner nodes of the for-each node to obtain input parameters of the for-each node.
You can configure an inner workflow for the for-each node based on your business requirements, configure built-in variables for the inner nodes in the workflow to obtain the desired values for the input parameters, and then run the for-each node. For information about the built-in variables, see Built-in variables. For information about how to configure a for-each node, see Configure a for-each node.
Test the for-each node. You cannot test for-each nodes in DataStudio.
To test a for-each node, go to Operation Center, find the desired inner node, and then click the name of the node to view the details of the node. For more information, see Test the for-each node and view test results.
Note: If you want to check whether an assignment node passes its output to a for-each node in Operation Center, you can use the data backfill feature and select both the assignment node and for-each node. You cannot obtain the output of the assignment node if you run only the for-each node.
Create and configure a workflow
To create a workflow that contains an assignment node as the ancestor node and a for-each node as the descendant node, perform the following steps:
Go to the DataStudio page.
Log on to the DataWorks console. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
Create a for-each node.
In the Scheduled Workflow pane, move the pointer over the icon and choose . Alternatively, you can find the desired workflow in the Business Flow section, right-click the workflow, and then choose .
In the Create Node dialog box, configure the parameters such as Name and Path.
Click Confirm.
Create an assignment node.
Double-click the workflow to go to the configuration tab of the workflow. Click + Create Node and drag Assignment Node in the General section to the canvas on the right.
For information about assignment nodes, see Configure an assignment node.
In the Create Node dialog box, configure the Name and Path parameters. By default, the assignment node is placed in the current workflow.
Click Confirm.
Drag a directed line to configure the assignment node as the ancestor node of the for-each node.
Configure an assignment node
On the configuration tab of the created workflow, double-click the name of the assignment node that you created. The configuration tab of the assignment node appears.
Select SHELL from the Language drop-down list.
Enter the following statement in the code editor:
echo 'this is name,ok';
In the right-side navigation pane, click the Properties tab. In the Output Parameters table of the Input and Output Parameters section, view the information about the outputs parameter. The outputs parameter is the default output parameter of the assignment node.
Click the icon in the top toolbar to save the assignment node.
Click the icon in the top toolbar to commit the assignment node.
In the Submit dialog box, configure the Change description parameter. Then, determine whether to review node code after you commit the node based on your business requirements.
Important: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
You can use the code review feature to ensure the code quality of nodes and prevent node execution errors caused by invalid node code. If you enable the code review feature, the node code that is committed can be deployed only after the node code passes the code review. For more information, see Code review.
If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner of the node configuration tab to deploy the node to the production environment for running after you commit the node. For more information, see Deploy nodes.
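The echo statement configured for the assignment node above prints a single line that contains a comma. The following Bash sketch runs outside DataWorks and illustrates why that output leads to two loops. It assumes that the last line of the Shell output is split on commas to form the outputs parameter. The array name outputs is used here only for illustration and is not a DataWorks API.

```shell
#!/usr/bin/env bash
# Local illustration only; this script is not DataWorks node code.
# Assumption: the output line is split on commas to form the outputs parameter.

# Same statement as in the assignment node.
output="$(echo 'this is name,ok')"

# Split the line on commas. The array stands in for the outputs parameter.
IFS=',' read -r -a outputs <<< "$output"

# Two entries mean that a for-each node would loop twice.
echo "entries: ${#outputs[@]}"
for entry in "${outputs[@]}"; do
  echo "entry: $entry"
done
```

Under this assumption, the first entry is this is name and the second entry is ok.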
Configure a for-each node
Double-click the for-each node that you created. The configuration tab of the for-each node appears. By default, the start, Shell, and end nodes are displayed on the tab.
You can replace a Shell node with another node based on your business requirements:
If you want to use a Shell node, you can directly configure the Shell node.
If you want to use another type of node, delete the default Shell node and create a node of the required type.
In this example, a Shell node is used.
Configure the Shell node.
Double-click the Shell node. The configuration tab of the Shell node appears.
Enter the following code in the code editor:
echo ${dag.loopTimes}  # Display the current loop count.
Note: The start and end nodes of the for-each node have fixed logic and cannot be edited.
After you modify the code of the Shell node, save the modification. No message reminds you to save the modification when you commit the node. If you do not save the modification, the committed code is not updated to the latest version.
A for-each node supports the following built-in variables:
${dag.foreach.current}: the current data entry.
${dag.loopDataArray}: the input dataset.
${dag.offset}: the offset of the current loop count relative to 1. The value is 0 in the first loop.
${dag.loopTimes}: the loop count, whose value equals the value of ${dag.offset} plus 1.
For more information about variables, see Built-in variables and Examples of variable values.
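The relationship between the variables listed above can be sketched in plain Bash. The script below is a local simulation and not DataWorks node code: ordinary Bash variables stand in for the built-in variables, and the sample dataset assumes the two entries produced by the assignment node in this example. The names loop_data_array and simulate_loops are hypothetical.

```shell
#!/usr/bin/env bash
# Local simulation only. Plain Bash variables stand in for the built-in
# variables, which DataWorks itself resolves inside a for-each node.

# Stand-in for ${dag.loopDataArray}: the input dataset (assumed sample values).
loop_data_array=("this is name" "ok")

simulate_loops() {
  local offset loop_times current
  for offset in "${!loop_data_array[@]}"; do
    # Stand-in for ${dag.loopTimes}: the offset plus 1.
    loop_times=$((offset + 1))
    # Stand-in for ${dag.foreach.current}: the entry handled in this loop.
    current="${loop_data_array[$offset]}"
    echo "offset=${offset} loopTimes=${loop_times} current=${current}"
  done
}

simulate_loops
```

In this simulation, the first loop has an offset of 0 and a loop count of 1, and the second loop has an offset of 1 and a loop count of 2.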
Configure the scheduling properties of the for-each node.
On the configuration tab of the for-each node, click the Properties tab in the right-side navigation pane.
Find the loopDataArray parameter in the Input Parameters table of the Input and Output Parameters section and click Change in the Actions column. The loopDataArray parameter is the default input parameter of the for-each node.
Select the outputs parameter of the assignment node from the drop-down list in the Value Source column and click Save.
Note: After you configure the assignment node as an ancestor node of the for-each node, you must specify the input parameter for the for-each node on the Properties tab. If you do not specify the input parameter, an error occurs when you commit the for-each node.
Click the icon in the top toolbar to save the for-each node.
Click the icon in the top toolbar to commit the for-each node.
Important: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
In the Commit dialog box, select the inner nodes that you want to commit, enter your comments in the Description field, and then click Commit.
If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner of the node configuration tab to deploy the nodes after you commit them. For more information, see Deploy nodes.
Test the for-each node and view test results
On the node configuration tab, click Operation Center in the upper-right corner to go to Operation Center.
In the left-side navigation pane of the Operation Center page, choose .
On the Cycle Task page, find the for-each node and click DAG in the Actions column to open the directed acyclic graph (DAG) of the for-each node. In the DAG of the for-each node, right-click the assignment node and choose .
In the Patch Data dialog box, configure the parameters and click OK.
Refresh the Patch Data page. After the data backfill instance is run, click DAG in the Actions column of the instance.
In the DAG that appears, right-click the assignment node and select View Runtime Log to view its operational logs.
On the Patch Data page, right-click the for-each node in the DAG and select View Internal Nodes.
On the page that appears, click Loop 1 in the middle pane, right-click the Shell node in the DAG, and then select View Runtime Log.
On the page that appears, view the operational logs of the Shell node in the first loop.
Use the same method to view the operational logs of the Shell node in the second loop.