DataWorks provides for-each nodes. You can use a for-each node to loop through the result set passed by an assignment node. You can also customize the inner nodes of the for-each node. This topic describes the composition and application logic of a for-each node.
Usage notes
The following table describes the usage notes of for-each nodes.
Item | References |
Learn the use scenarios of for-each nodes. | Note: A for-each node is used only to loop through the result set passed by an assignment node. |
Learn the limits and precautions of for-each nodes, such as the upper limit for the number of loops, the method to test a for-each node, and the method to view logs of a for-each node. | Limits and Precautions |
Learn that you can configure an inner workflow for a for-each node based on your business requirements. The inner workflow must start with the start node and end with the end node. | Composition and workflow orchestration of a for-each node |
Learn that the number of loops for a for-each node is determined by the output of an assignment node. | Number of loops |
Learn that the built-in variables provided by a for-each node can be used to obtain the related values from the result set of an assignment node in each loop. | Built-in variables |
Learn samples of variable values and the number of loops for a for-each node. | Examples of variable values |
Scenario
A for-each node in DataWorks is used in loop traversal scenarios and must be used with an assignment node. An assignment node must be configured as the ancestor node of a for-each node. After the assignment node passes its output to the for-each node, the for-each node loops through the output.
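To illustrate the kind of output an upstream assignment node might pass, the following Python sketch builds a comma-separated list of dates. The function name `build_assignment_output` and the date range are hypothetical. The sketch assumes that the last printed line is the value handed to the for-each node, and it is plain Python 3 meant for local experimentation, not verified DataWorks node code.

```python
from datetime import date, timedelta


def build_assignment_output(start, days):
    """Return `days` consecutive dates starting at `start`, joined by commas."""
    return ",".join(
        (start + timedelta(days=offset)).isoformat() for offset in range(days)
    )


# Assumption: the last line printed is the value passed to the for-each node.
output = build_assignment_output(date(2021, 3, 28), 5)
print(output)
```

With these inputs, the printed line contains five comma-separated dates, so a downstream for-each node would loop five times.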
Limits
Only DataWorks Standard Edition and more advanced editions support for-each nodes. For information about the differences among DataWorks editions, see Differences among DataWorks editions.
The maximum number of loops for a for-each node is 1024. The actual number of loops for a for-each node is determined by the result set passed by an assignment node.
Parallel execution is not supported. A loop can start only if the previous loop ends.
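The two runtime limits above (at most 1024 loops, one loop at a time) can be modeled locally with a short Python sketch. The names `MAX_LOOPS` and `run_for_each` are hypothetical and only mirror the documented behavior; they are not part of any DataWorks API.

```python
MAX_LOOPS = 1024  # upper limit stated for for-each nodes


def run_for_each(items, body):
    """Run `body` once per item, strictly one loop after another."""
    if len(items) > MAX_LOOPS:
        raise ValueError(
            "for-each would need %d loops; the limit is %d" % (len(items), MAX_LOOPS)
        )
    results = []
    for item in items:
        # The next iteration starts only after this call has returned.
        results.append(body(item))
    return results


print(run_for_each(["a", "b", "c"], str.upper))
```

A plain `for` loop is used on purpose: it makes the "no parallel execution" rule explicit, because each call to `body` finishes before the next one begins.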
Precautions
Dimension | Item | Description |
Dependencies | Dependency settings | A for-each node needs to loop through the value passed by an assignment node. Therefore, an assignment node must be configured as the ancestor node of a for-each node. The for-each node must depend on the assignment node. |
Traversal support | Upper limit for the number of loops | The maximum number of loops for a for-each node is 1024. If the number of loops for a for-each node exceeds 1024, an error is reported. The actual number of loops for a for-each node is determined by the output of an assignment node. |
Traversal support | Number of loops | The number of loops for a for-each node is determined by the result set passed by an assignment node. |
Inner nodes | Workflow orchestration | |
Inner nodes | Value acquisition | The built-in variables provided by a for-each node can be used to obtain a specific value passed by the assignment node that is configured as the ancestor node of the for-each node. |
Debugging | Node debugging | |
Debugging | Log viewing | To view the operational logs of a for-each node in Operation Center, perform the following steps: find the for-each node on the Cycle Task page and open the directed acyclic graph (DAG) of the node. In the DAG, right-click the node name and select View Internal Nodes to view the operational logs of the inner nodes. |
Composition and workflow orchestration of a for-each node
A for-each node is a special type of node that contains inner nodes. When you create a for-each node, the following three inner nodes are automatically created: the start node (loop start node), the Shell node (loop task node), and the end node (loop end node). The inner nodes are organized into an inner workflow that loops through the output of an assignment node. The inner nodes work as follows:
Shell node
DataWorks automatically creates an inner Shell task node. You can delete the default Shell node and configure an inner loop task node based on your business requirements.
If you use a Shell node as an inner loop task node, double-click the Shell node and edit the node code on the node configuration tab.
If your loop task is complicated, you can create more inner nodes in the inner workflow for the for-each node to process the loop task and connect the inner nodes based on your business requirements.
Note: When you customize inner loop task nodes of a for-each node, you can delete the dependencies between the existing inner nodes of the for-each node, and configure an inner workflow for the for-each node based on your business requirements. When you configure an inner workflow for a for-each node, make sure that the inner workflow starts with the start node and ends with the end node.
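The rule that a customized inner workflow must start with the start node and end with the end node can be checked with a small graph sketch. The function `is_valid_inner_workflow`, the edge-list representation, and the node names are hypothetical illustrations, not how DataWorks validates workflows. The sketch does not detect cycles.

```python
def is_valid_inner_workflow(edges, start="start", end="end"):
    """Check that every inner node lies on a path from `start` to `end`.

    `edges` is a list of (upstream, downstream) node-name pairs.
    """
    nodes = {start, end}
    downstream, upstream = {}, {}
    for src, dst in edges:
        nodes.update((src, dst))
        downstream.setdefault(src, set()).add(dst)
        upstream.setdefault(dst, set()).add(src)

    def reachable(origin, graph):
        # Depth-first traversal collecting every node reachable from `origin`.
        seen, stack = {origin}, [origin]
        while stack:
            for nxt in graph.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # The start node must have no upstream node, the end node no downstream node.
    if upstream.get(start) or downstream.get(end):
        return False
    return reachable(start, downstream) == nodes == reachable(end, upstream)


default_flow = [("start", "shell"), ("shell", "end")]
dangling_flow = default_flow + [("orphan", "end")]
print(is_valid_inner_workflow(default_flow), is_valid_inner_workflow(dangling_flow))
```

In this model, the default start, Shell, end chain is valid, while a node that is not connected downstream of the start node makes the workflow invalid.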
Start and end nodes
The start node marks the startup of a loop, and the end node marks the end of a loop. The two nodes are not used to process a loop task.
Note: The number of loops for a for-each node is determined by the output of an assignment node rather than the end node of the for-each node. The assignment node is configured as the ancestor node of the for-each node.
Number of loops
The maximum number of loops for a for-each node is 1024. The actual number of loops is determined by the result set passed by the assignment node:
If the assignment node that you configure as the ancestor node of the for-each node uses Shell or Python, the output is a one-dimensional array whose elements are separated by commas (,). The number of loops is equal to the number of elements in the array.
For example, if the assignment node uses Shell or Python (Python 2) and its output is the one-dimensional array 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01, the for-each node loops five times.
If the assignment node uses ODPS SQL, the output is a two-dimensional array. The number of loops is equal to the number of rows in the array.
For example, if ODPS SQL is used by the assignment node, the output of the assignment node is a two-dimensional array such as:
+----------------------------------------------+
| uid | region | age_range | zodiac |
+----------------------------------------------+
| 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
| 0016359814159 | Unknown | 30 to 40 years old | Cancer |
+----------------------------------------------+
The output contains two rows, so the for-each node loops twice.
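The counting rule for both output shapes can be sketched in Python. The helper `count_loops` is hypothetical: a string stands in for the comma-separated output of a Shell or Python assignment node, and a list of rows stands in for the result set of an SQL assignment node.

```python
def count_loops(assignment_output):
    """Return how many times a for-each node would loop over the given output.

    A string models a comma-separated one-dimensional array; a list of rows
    models a two-dimensional result set.
    """
    if isinstance(assignment_output, str):
        return len(assignment_output.split(","))
    return len(assignment_output)


dates = "2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01"
rows = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]
print(count_loops(dates), count_loops(rows))
```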
Built-in variables
You can use the built-in variables provided by a for-each node to obtain the result set passed by the assignment node that is configured as the ancestor node of the for-each node. If the inner workflow for the for-each node contains an assignment node, you can obtain the output of the assignment node by using the default method. For more information about the default method, see Configure an assignment node.
You can use the built-in variables provided by a for-each node in DataWorks to obtain the number of loops that are finished and the offset between the current loop and the first loop when the for-each node is used to loop through the output of the assignment node.
Built-in variable | Description | Compare with the for loop |
${dag.loopDataArray} | Obtain the dataset of an assignment node. | Equivalent to the code result in the for loop. |
${dag.foreach.current} | Obtain the current data entry. | Sample for loop code: |
${dag.offset} | Obtain the offset between the current loop and the first loop. | |
${dag.loopTimes} | Obtain the number of loops that are finished, counting the current loop (1 in the first loop). | - |
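The comparison with a for loop can be made concrete with a Python sketch. It is only an analogy: the generator `iterate_with_builtins` is hypothetical, and the dictionary keys reuse the variable names that appear in the examples of this topic (`${dag.loopDataArray}`, `${dag.foreach.current}`, `${dag.offset}`, `${dag.loopTimes}`).

```python
def iterate_with_builtins(loop_data_array):
    """Yield, per loop, the values the four built-in variables would hold."""
    for offset, current in enumerate(loop_data_array):
        yield {
            "dag.loopDataArray": loop_data_array,  # the whole dataset
            "dag.foreach.current": current,        # the current data entry
            "dag.offset": offset,                  # 0 in the first loop
            "dag.loopTimes": offset + 1,           # 1 in the first loop
        }


dataset = "2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01".split(",")
for variables in iterate_with_builtins(dataset):
    print(
        variables["dag.loopTimes"],
        variables["dag.offset"],
        variables["dag.foreach.current"],
    )
```

In this model the dataset plays the role of the array being iterated, the current entry is the element at the loop index, and the offset is the loop index itself.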
If you are familiar with the schema of the output table, you can also use the variables in the following table to obtain the related values in each loop.
Other variable | Description |
${dag.foreach.current[n]} | If the output of the assignment node that is configured as the ancestor node of a for-each node is a two-dimensional array, the variable is used to obtain the data of Column n for the current data entry in each loop. |
${dag.loopDataArray[i][j]} | If the output of the assignment node that is configured as the ancestor node of a for-each node is a two-dimensional array, the variable is used to obtain the data of Row i and Column j in the dataset of the assignment node. |
| If the output of the assignment node that is configured as the ancestor node of a for-each node is a one-dimensional array, the variable is used to obtain the data of a specific column. |
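Indexed access into a two-dimensional result set can be modeled with nested Python lists. This is a sketch under the assumption that indexes are zero-based, which is what the SQL example in this topic suggests (`${dag.foreach.current[0]}` returns the uid column and `${dag.loopDataArray[1][0]}` returns the uid of the second row). The helper names are hypothetical.

```python
rows = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]


def current_column(current_row, n):
    """Model ${dag.foreach.current[n]}: column n of the current data entry."""
    return current_row[n]


def dataset_cell(loop_data_array, i, j):
    """Model ${dag.loopDataArray[i][j]}: row i, column j of the whole dataset."""
    return loop_data_array[i][j]


for row in rows:
    # The first value changes per loop; the second is the same in every loop.
    print(current_column(row, 0), dataset_cell(rows, 1, 0))
```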
Examples of variable values
Sample 1: A Shell node is used as an assignment node
Output of the assignment node
A Shell node is used as an assignment node, and the last output of the assignment node is 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01.
Values of the variables for a for-each node
Note: The output of the assignment node is a one-dimensional array, and the five elements in the array are separated by commas (,). Therefore, the number of loops for the for-each node is 5.
Built-in variable | Value for the first loop | Value for the second loop |
${dag.loopDataArray} | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 |
${dag.foreach.current} | 2021-03-28 | 2021-03-29 |
${dag.offset} | 0 | 1 |
${dag.loopTimes} | 1 | 2 |
${dag.foreach.current[3]} | 2021-03-30 | |
Sample 2: An ODPS SQL node is used as an assignment node
Output of the assignment node
An ODPS SQL node is used as an assignment node, and the last SELECT statement returns the following two rows of data:
+----------------------------------------------+
| uid | region | age_range | zodiac |
+----------------------------------------------+
| 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
| 0016359814159 | Unknown | 30 to 40 years old | Cancer |
+----------------------------------------------+
Values of the variables for a for-each node
Note: The output of the assignment node is a two-dimensional array that contains two rows of data. Therefore, the number of loops for the for-each node is 2.
Built-in variable | Value for the first loop | Value for the second loop |
${dag.loopDataArray} | The complete two-row result set returned by the assignment node | Same as in the first loop |
${dag.foreach.current} | 0016359810821,Hubei Province,30 to 40 years old,Cancer | 0016359814159,Unknown,30 to 40 years old,Cancer |
${dag.offset} | 0 | 1 |
${dag.loopTimes} | 1 | 2 |
${dag.foreach.current[0]} | 0016359810821 | 0016359814159 |
${dag.loopDataArray[1][0]} | 0016359814159 | 0016359814159 |