
DataWorks: Logic of for-each nodes

Last Updated: May 28, 2024

DataWorks provides for-each nodes. You can use a for-each node to loop through the result set passed by an assignment node. You can also customize the inner nodes of the for-each node. This topic describes the composition and application logic of a for-each node.

Usage notes

The following list describes the usage notes of for-each nodes and the section of this topic that covers each item.

  • Learn the use scenarios of for-each nodes. See Scenario.

    Note

    A for-each node can be used only to loop through the result set passed by an assignment node.

  • Learn the limits and precautions of for-each nodes, such as the upper limit for the number of loops, the method to test a for-each node, and the method to view logs of a for-each node. See Limits and Precautions.

  • Learn that you can configure an inner workflow for a for-each node based on your business requirements. When you configure an inner workflow for a for-each node, make sure that the inner workflow starts with the start node and ends with the end node. See Composition and workflow orchestration of a for-each node.

  • Learn that the number of loops for a for-each node is determined by the output of an assignment node. See Number of loops.

  • Learn that the built-in variables provided by a for-each node can be used to obtain the related values from the result set of an assignment node in each loop. See Built-in variables.

  • Learn samples of variable values and the number of loops for a for-each node if a Shell or ODPS SQL node is used as the assignment node. See Examples of variable values.

Scenario

A for-each node in DataWorks is used in loop traversal scenarios and must be used with an assignment node. An assignment node must be configured as the ancestor node of a for-each node. After the assignment node passes its output to the for-each node, the for-each node loops through the output.

Limits

  • Only DataWorks Standard Edition and more advanced editions support for-each nodes. For information about the differences among DataWorks editions, see Differences among DataWorks editions.

  • The maximum number of loops for a for-each node is 1024. The actual number of loops for a for-each node is determined by the result set passed by an assignment node.

  • Parallel execution is not supported. A loop can start only if the previous loop ends.
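The loop limit and the one-loop-at-a-time behavior can be pictured with a short Python sketch. This is only an illustration of the documented behavior, not DataWorks code: `run_for_each`, `MAX_LOOPS`, and the `task` callback are hypothetical names, and the exception type is an assumption, because this topic only states that an error is reported when the limit is exceeded.

```python
from typing import Callable, List, Sequence

MAX_LOOPS = 1024  # upper limit stated in this topic


def run_for_each(items: Sequence[str], task: Callable[[str], str]) -> List[str]:
    """Run a hypothetical loop task once per item, one loop at a time."""
    if len(items) > MAX_LOOPS:
        # ValueError is an assumption; the topic only says an error is reported.
        raise ValueError(f"{len(items)} loops exceed the limit of {MAX_LOOPS}")
    results = []
    for item in items:
        # The next loop starts only after task() has returned for this item.
        results.append(task(item))
    return results
```

For example, `run_for_each(["a", "b"], str.upper)` processes `a` completely before it starts on `b`, and a list of 1,025 items is rejected before any loop runs.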

Precautions

The precautions are grouped by dimension. Each item is followed by its description.

  • Dependencies

    • Dependency settings

      A for-each node loops through the value passed by an assignment node. Therefore, an assignment node must be configured as the ancestor node of the for-each node, and the for-each node must depend on the assignment node.

  • Traversal support

    • Upper limit for the number of loops

      The maximum number of loops for a for-each node is 1024. If the number of loops for a for-each node exceeds 1024, an error is reported. The actual number of loops for a for-each node is determined by the output of an assignment node.

    • Number of loops

      The number of loops for a for-each node is determined by the result set passed by an assignment node.

  • Inner nodes

    • Workflow orchestration

      • You can delete the existing dependencies between inner nodes of a for-each node and configure an inner workflow for the for-each node based on your business requirements. When you configure an inner workflow for a for-each node, make sure that the inner workflow starts with the start node and ends with the end node.

      • If the inner workflow of a for-each node uses a branch node to perform a logical judgment or to loop through the result of an assignment node, a merge node is also required.

    • Value acquisition

      The built-in variables provided by a for-each node can be used to obtain a specific value passed by the assignment node that is configured as the ancestor node of the for-each node.

  • Debugging

    • Node debugging

      • If you use a workspace in standard mode, you cannot directly run a test of a for-each node in DataStudio.

        To verify the running result of a for-each node, you must commit and deploy the for-each node to Operation Center in the development environment and run it there.

      • Run the assignment node and the for-each node together. To check in Operation Center whether an assignment node passes its output to a for-each node, use the data backfill feature to backfill data for both the assignment node and the for-each node. If you run only the for-each node, the output of the assignment node cannot be obtained.

    • Log viewing

      To view the operational logs of a for-each node in Operation Center, find the for-each node on the Cycle Task page and open the directed acyclic graph (DAG) of the node. In the DAG, right-click the node name and select View Internal Nodes to view the operational logs of the inner nodes.

Composition and workflow orchestration of a for-each node

A for-each node is a special type of node that contains inner nodes. When you create a for-each node, the following three inner nodes are automatically created: the start node (loop start node), the Shell node (loop task node), and the end node (loop end node). The inner nodes are organized into an inner workflow to loop through the output of an assignment node.

[Figure: inner nodes of a for-each node]

The preceding figure shows the following information:

  • Shell node

    DataWorks automatically creates an inner Shell task node. You can delete the default Shell node and configure an inner loop task node based on your business requirements.

    • If you use a Shell node as an inner loop task node, double-click the Shell node and edit the node code on the node configuration tab.

    • If your loop task is complicated, you can create more inner nodes in the inner workflow for the for-each node to process the loop task and connect the inner nodes based on your business requirements.

      Note

      When you customize inner loop task nodes of a for-each node, you can delete the dependencies between the existing inner nodes of the for-each node, and configure an inner workflow for the for-each node based on your business requirements. When you configure an inner workflow for a for-each node, make sure that the inner workflow starts with the start node and ends with the end node.

  • start and end nodes

    The start node marks the start of a loop, and the end node marks the end of a loop. These two nodes do not process the loop task.

    Note

    The number of loops for a for-each node is determined by the output of an assignment node rather than the end node of the for-each node. The assignment node is configured as the ancestor node of the for-each node.
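The rule that an inner workflow starts with the start node and ends with the end node can be checked mechanically. The following Python sketch assumes one reading of that rule: the start node is the only node without an upstream node, and the end node is the only node without a downstream node. The function `check_inner_workflow`, the edge dictionary, and the node names are hypothetical and are not part of DataWorks.

```python
from typing import Dict, List


def check_inner_workflow(edges: Dict[str, List[str]],
                         start: str = "start", end: str = "end") -> List[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    # Every node that appears as the target of at least one dependency.
    children = {child for targets in edges.values() for child in targets}
    nodes = set(edges) | children
    problems = []
    for required in (start, end):
        if required not in nodes:
            problems.append(f"missing {required} node")
    for node in sorted(nodes):
        if node != start and node not in children:
            problems.append(f"{node} has no upstream node")
        if node != end and not edges.get(node):
            problems.append(f"{node} has no downstream node")
    return problems
```

With the default layout, `check_inner_workflow({"start": ["shell"], "shell": ["end"]})` returns an empty list. If a second task node is added but never connected to the start node, the sketch reports it as having no upstream node.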

Number of loops

Note

The maximum number of loops for a for-each node is 1024. The actual number of loops for a for-each node is determined by the result set passed by an assignment node.

The number of loops for a for-each node is determined by the result set passed by an assignment node:

  • If the assignment node that you configure as the ancestor node of the for-each node uses Shell or Python, the number of loops for the for-each node is determined by the generated one-dimensional array. The number of loops is equal to the number of elements in the one-dimensional array. The elements are separated by commas (,).

    For example, if the assignment node uses Shell or Python (Python 2), the output of the assignment node is a one-dimensional array such as 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01. The for-each node loops through the output of the assignment node five times.

  • If the assignment node that you configure as the ancestor node of the for-each node uses ODPS SQL, the number of loops for the for-each node is determined by the generated two-dimensional array. The number of loops is equal to the number of rows in the two-dimensional array.

    For example, if ODPS SQL is used by the assignment node, the output of the assignment node is a two-dimensional array:

    +---------------+----------------+--------------------+--------+
    | uid           | region         | age_range          | zodiac |
    +---------------+----------------+--------------------+--------+
    | 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
    | 0016359814159 | Unknown        | 30 to 40 years old | Cancer |
    +---------------+----------------+--------------------+--------+

    The output indicates that the for-each node loops through the output of the assignment node twice.
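The two counting rules can be summarized in a few lines of Python. This is a minimal sketch for illustration only: `count_loops` is a hypothetical helper, and representing the one-dimensional output as a comma-separated string and the two-dimensional output as a list of rows is an assumption about how to model the data, not a description of DataWorks internals.

```python
from typing import Sequence, Union

ResultSet = Union[str, Sequence[Sequence[str]]]


def count_loops(result_set: ResultSet) -> int:
    """Return the number of loops implied by an assignment node's output."""
    if isinstance(result_set, str):
        # One-dimensional array: one loop per comma-separated element.
        return len(result_set.split(","))
    # Two-dimensional array: one loop per row.
    return len(result_set)
```

Applied to the two examples in this section, the comma-separated list of five dates gives five loops, and the two-row query result gives two loops.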

Built-in variables

You can use the built-in variables provided by a for-each node to obtain the result set passed by the assignment node that is configured as the ancestor node of the for-each node. If the inner workflow for the for-each node contains an assignment node, you can obtain the output of the assignment node by using the default method. For more information about the default method, see Configure an assignment node.

You can use the built-in variables provided by a for-each node in DataWorks to obtain the number of loops that are finished and the offset between the current loop and the first loop when the for-each node is used to loop through the output of the assignment node.

The built-in variables are described below and compared with the following sample for loop code:

for(int i=0;i<data.length;i++) {
   print(data[i]);
}

  • ${dag.loopDataArray}

    Obtains the dataset of the assignment node. Equivalent to the array data (data=[]) in the sample for loop.

  • ${dag.foreach.current}

    Obtains the current data entry. Equivalent to data[i] in the sample for loop.

  • ${dag.offset}

    Obtains the offset between the current loop and the first loop. Equivalent to i in the sample for loop.

  • ${dag.loopTimes}

    Obtains the number of loops that have run so far, including the current loop. The sample for loop has no equivalent.
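The for loop comparison can be extended into a small Python sketch that produces all four built-in variables for each loop over a one-dimensional array. The generator `loop_variables` is a hypothetical name used only for illustration. The value of `dag.loopTimes` is modeled as the offset plus 1, which matches the sample values in this topic (1 in the first loop, 2 in the second).

```python
from typing import Dict, Iterator, List, Union


def loop_variables(elements: List[str]) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield the built-in variable values for each loop, in order."""
    for offset, current in enumerate(elements):
        yield {
            "dag.loopDataArray": ",".join(elements),  # the whole dataset
            "dag.foreach.current": current,           # data[i] in the for loop
            "dag.offset": offset,                     # i in the for loop
            "dag.loopTimes": offset + 1,              # 1 in the first loop
        }
```

Calling `list(loop_variables(["a", "b"]))` gives one dictionary per loop, so the values for any loop can be inspected side by side.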

If you are familiar with the schema of the output table, you can also use the following variables to obtain the related values in each loop.

  • ${dag.foreach.current[n]}

    If the output of the assignment node that is configured as the ancestor node of a for-each node is a two-dimensional array, the variable is used to obtain the data of a specific column for the current data entry when the for-each node loops through the output of the assignment node.

  • ${dag.loopDataArray[i][j]}

    If the output of the assignment node that is configured as the ancestor node of a for-each node is a two-dimensional array, the variable is used to obtain the data of Row i and Column j in the dataset of the assignment node.

  • ${dag.foreach.current[n]}

    If the output of the assignment node that is configured as the ancestor node of a for-each node is a one-dimensional array, the variable is used to obtain the data of a specific column.

Examples of variable values

Sample 1: A Shell node is used as an assignment node

  • Output of the assignment node

    A Shell node is used as an assignment node, and the last output of the assignment node is 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01.

  • Values of the variables for a for-each node

    Note

    The output of the assignment node is a one-dimensional array, and the five elements in the array are separated by commas (,). Therefore, the number of loops for the for-each node is 5.

    • ${dag.loopDataArray}

      First loop and second loop: 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01

    • ${dag.foreach.current}

      First loop: 2021-03-28

      Second loop: 2021-03-29

    • ${dag.offset}

      First loop: 0

      Second loop: 1

    • ${dag.loopTimes}

      First loop: 1

      Second loop: 2

    • ${dag.foreach.current[3]}

      Listed value: 2021-03-30
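To see how these values might appear inside an inner Shell node, the following Python sketch fills a line of node code with the values for one loop. This topic does not describe how DataWorks replaces the variables, so plain text replacement is an assumption here, and `render_inner_code` and the sample `echo` line are hypothetical.

```python
from typing import Dict


def render_inner_code(template: str, values: Dict[str, str]) -> str:
    """Replace each ${name} placeholder with its value for the current loop."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("${" + name + "}", value)
    return rendered


# Values for the first loop of Sample 1.
first_loop = {
    "dag.foreach.current": "2021-03-28",
    "dag.offset": "0",
    "dag.loopTimes": "1",
}
template = 'echo "loop ${dag.loopTimes}: processing ${dag.foreach.current}"'
print(render_inner_code(template, first_loop))
```

Under that assumption, the same line of node code yields a different command in each loop, because only the substituted values change.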

Sample 2: An ODPS SQL node is used as an assignment node

  • Output of the assignment node

    An ODPS SQL node is used as an assignment node, and the last SELECT statement returns the following two rows of data:

    +---------------+----------------+--------------------+--------+
    | uid           | region         | age_range          | zodiac |
    +---------------+----------------+--------------------+--------+
    | 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
    | 0016359814159 | Unknown        | 30 to 40 years old | Cancer |
    +---------------+----------------+--------------------+--------+
  • Values of the variables for a for-each node

    Note

    The output of the assignment node is a two-dimensional array and two rows of data are contained in the array. Therefore, the number of loops for the for-each node is 2.

    • ${dag.loopDataArray}

      First loop and second loop:

      +---------------+----------------+--------------------+--------+
      | uid           | region         | age_range          | zodiac |
      +---------------+----------------+--------------------+--------+
      | 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
      | 0016359814159 | Unknown        | 30 to 40 years old | Cancer |
      +---------------+----------------+--------------------+--------+

    • ${dag.foreach.current}

      First loop: 0016359810821,Hubei Province,30 to 40 years old,Cancer

      Second loop: 0016359814159,Unknown,30 to 40 years old,Cancer

    • ${dag.offset}

      First loop: 0

      Second loop: 1

    • ${dag.loopTimes}

      First loop: 1

      Second loop: 2

    • ${dag.foreach.current[0]}

      First loop: 0016359810821

      Second loop: 0016359814159

    • ${dag.loopDataArray[1][0]}

      Listed value: 0016359814159
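The indexed variables in Sample 2 can be reproduced with ordinary list indexing. The Python sketch below assumes zero-based row and column indexes, which is what the sample values suggest (`[0]` returns the uid column and `[1][0]` returns the uid of the second row). The helper names are hypothetical, and joining a row with commas mirrors how the current data entry is shown in this sample.

```python
from typing import List

ROWS: List[List[str]] = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]


def foreach_current(rows: List[List[str]], offset: int) -> str:
    """Model of ${dag.foreach.current}: the row for the current loop."""
    return ",".join(rows[offset])


def foreach_current_column(rows: List[List[str]], offset: int, n: int) -> str:
    """Model of ${dag.foreach.current[n]}: column n of the current row."""
    return rows[offset][n]


def loop_data_cell(rows: List[List[str]], i: int, j: int) -> str:
    """Model of ${dag.loopDataArray[i][j]}: row i, column j of the dataset."""
    return rows[i][j]
```

Here `foreach_current_column(ROWS, 0, 0)` corresponds to the first loop and `foreach_current_column(ROWS, 1, 0)` to the second, while `loop_data_cell(ROWS, 1, 0)` does not depend on the current loop.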