DataWorks: Logic of do-while nodes

Last Updated: Nov 28, 2024

DataWorks provides do-while nodes. You can rearrange the workflow inside a do-while node, write the logic to be executed in a loop in the node, and then configure an end node to determine whether to exit from looping. You can use a do-while node alone, or use a do-while node together with an assignment node to loop through the result set passed by the assignment node. This topic describes the composition and application logic of a do-while node.

Background information

The following table describes the usage notes of do-while nodes.

| Description | References |
|---|---|
| Before you use a do-while node, you must learn the limits and precautions of do-while nodes, such as the upper limit for the number of loops, the method to test a do-while node, and the method to view logs of a do-while node. | Limits and Precautions |
| You can configure an inner workflow for a do-while node based on your business requirements. When you configure an inner workflow for a do-while node, make sure that the inner workflow starts with the start node and ends with the end node. | Composition and workflow orchestration of a do-while node |
| A do-while node provides built-in variables for you to obtain the related values in each loop. The examples for obtaining variable values are provided for reference. | Built-in variables and Examples of variable values |
| You can use an end node to determine whether to exit from looping. Sample code for an end node is provided for reference. | Sample code for the end node |
| Before you use a do-while node, you must learn the scenarios in which a do-while node is used. | Scenarios |

Limits

  • Only DataWorks Standard Edition and more advanced editions support do-while nodes. For more information, see Differences among DataWorks editions.

  • The maximum number of loops for a do-while node is 1024.

  • Parallel execution is not supported. A loop can start only after the previous loop ends.

Precautions

| Dimension | Item | Description |
|---|---|---|
| Loop support | Upper limit for the number of loops | The maximum number of loops for a do-while node is 1024. If the number of loops for a do-while node exceeds 1024, the end node returns False to exit from looping. |
| Inner nodes | Workflow orchestration | You can delete the existing dependencies between inner nodes of a do-while node and configure an inner workflow for the do-while node based on your business requirements. When you configure an inner workflow for a do-while node, make sure that the inner workflow starts with the start node and ends with the end node. |
| | | If inner nodes of a do-while node use a branch node to perform a logical judgment or loop through the result set passed by an assignment node, a merge node is required. For information about merge nodes, see Configure a merge node. |
| | | You cannot add comments when you develop code for the end node of a do-while node. |
| Inner nodes | Value acquisition | The built-in variables provided by a do-while node can be used to obtain a specific value passed by the assignment node that is configured as the ancestor node of the do-while node. |
| Debugging | Task debugging | If you use a workspace in standard mode, you cannot directly perform a test to run a do-while node in DataStudio. If you want to perform a test to verify the running result of a do-while node, you must commit and deploy the workflow that contains the do-while node to Operation Center in the development environment and run a task on the do-while node. |
| Debugging | Log viewing | To view the run logs of a do-while node in Operation Center, perform the following steps: Find the do-while node and open the directed acyclic graph (DAG) of the node. In the DAG, right-click the node name and select View Internal Nodes. |
| Dependencies | Dependency settings | You can use a do-while node alone, or use a do-while node together with an assignment node. If you want to check whether an assignment node passes its output to a do-while node in Operation Center, you can use the data backfill feature to backfill data for both the assignment node and do-while node. If you run only the do-while node, you cannot obtain the output of the assignment node. |

Composition and workflow orchestration of a do-while node

Do-while nodes are special nodes that contain inner nodes. When you create a do-while node, the following inner nodes are automatically created: the start node, the shell node (task node), and the end node. The inner nodes are organized into an inner workflow for looping.

[Figure: do-while node]

The preceding figure shows the following information:

  • start node

    The start node marks the start of a loop and is not used to process a loop task. The task nodes in a do-while node depend on the start node. Therefore, the start node cannot be deleted.

  • shell node

    When you create a do-while node, DataWorks automatically creates an inner Shell task node named shell. You can delete the default shell node and configure task nodes based on your business requirements.

    In most cases, a do-while node is used together with an assignment node, or with a branch node and a merge node. When you customize task nodes of a do-while node, you can delete the dependencies between the existing inner nodes of the do-while node, and configure an inner workflow for the do-while node based on your business requirements. When you configure an inner workflow for a do-while node, make sure that the inner workflow starts with the start node and ends with the end node.

  • end node

    • The end node determines whether to exit from looping and therefore controls the number of loops that the do-while node runs. The end node is essentially an assignment node that returns True or False. True starts the next loop, and False exits from looping.

    • You can use ODPS SQL, Shell, or Python 2 to develop code for the end node. The do-while node provides built-in variables for you to develop code for the end node. For information about the built-in variables, see Built-in variables and Examples of variable values. For information about sample code developed in different languages, see Sample code for the end node.

    • The end node must be the descendant node of the task nodes and cannot be deleted.
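
The layout rule above (the inner workflow starts with the start node and ends with the end node) can be sketched as a small local check. This is an illustrative sketch only: `check_inner_workflow` is a hypothetical helper, not a DataWorks API, and it assumes that the inner workflow is described as a list of (upstream, downstream) dependency pairs.

```python
# Hypothetical helper, not a DataWorks API: checks an inner workflow that is
# given as (upstream, downstream) dependency pairs.
def check_inner_workflow(nodes, edges, start="start", end="end"):
    """Return a list of problems; an empty list means the layout is valid."""
    has_upstream = {down for _, down in edges}
    has_downstream = {up for up, _ in edges}
    problems = []
    for node in nodes:
        # Every node except the start node needs an ancestor node.
        if node != start and node not in has_upstream:
            problems.append(node + " does not depend on any node")
        # Every node except the end node needs a descendant node.
        if node != end and node not in has_downstream:
            problems.append(node + " has no descendant node")
    if start in has_upstream:
        problems.append("the start node must not depend on another node")
    if end in has_downstream:
        problems.append("the end node must not have descendant nodes")
    return problems


# Default layout created with a do-while node: start -> shell -> end.
default_layout = check_inner_workflow(
    ["start", "shell", "end"],
    [("start", "shell"), ("shell", "end")],
)

# Broken layout: the dependency between shell and end was deleted.
broken_layout = check_inner_workflow(
    ["start", "shell", "end"],
    [("start", "shell")],
)

print(default_layout)
print(broken_layout)
```

In this sketch, a task node that is left without an ancestor or a descendant is reported, which mirrors the requirement that every task node sits between the start node and the end node.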

Built-in variables

The built-in variables that a do-while node provides are referenced in the ${dag.Variable name} format. A do-while node in DataWorks provides two built-in variables: ${dag.loopTimes} and ${dag.offset}. If you use a do-while node together with an assignment node, you can also obtain the values of the value assignment parameters by using built-in variables in the same ${dag.Variable name} format.

  • Built-in variables

    When a do-while node is used for looping, you can use its built-in variables to obtain the number of loops that are finished and the offset of the current loop.

    | Built-in variable | Description | Value |
    |---|---|---|
    | ${dag.loopTimes} | The number of loops that are finished, counting the current loop. | 1 for the first loop, 2 for the second loop, 3 for the third loop, ..., and n for the nth loop. |
    | ${dag.offset} | The offset of the current loop relative to the first loop, that is, the zero-based index of the current loop. | 0 for the first loop, 1 for the second loop, 2 for the third loop, ..., and n-1 for the nth loop. |

  • Obtain the result set passed by an assignment node

    If you use a do-while node together with an assignment node, you can obtain the values of the value assignment parameters and loop variables by using the built-in variables described in the following table.

    Note

    If a do-while node depends on an assignment node, you can add output parameters that are specified in Output Parameters for the assignment node to Input Parameters for the do-while node. Then, you can use the do-while node to obtain the result set that is passed by the assignment node or specified data in the result set. Input parameters are configured in the ${dag.Variable name} format. Replace the variable name with the name of an input parameter that is specified in Input Parameters for the do-while node. In the following built-in variables, input specifies the name of the input parameter defined in the do-while node and is used to obtain the result set that is passed by an assignment node. You must replace input with the actual name of the input parameter that you use.

    | Built-in variable | Description |
    |---|---|
    | ${dag.input} | The dataset passed by the ancestor assignment node. |
    | ${dag.input[${dag.offset}]} | The data entry obtained by the do-while node in the current loop. |
    | ${dag.input.length} | The length of the dataset obtained inside the do-while node. |
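
To make the variable formats above concrete, the following sketch simulates locally how the placeholders in a line of task code could resolve for a given loop. It is not how DataWorks implements substitution: `render` is a hypothetical function, the dataset is assumed to be a list of strings, and `input` stands for whatever input parameter name you define.

```python
def render(template, loop_times, dataset, name="input"):
    """Replace the built-in variables in template for one loop (simulation)."""
    offset = loop_times - 1  # 0 for the first loop, n-1 for the nth loop
    values = [
        # The nested form is replaced first so that its inner ${dag.offset}
        # is not substituted on its own.
        ("${dag.%s[${dag.offset}]}" % name, str(dataset[offset])),
        ("${dag.%s.length}" % name, str(len(dataset))),
        ("${dag.%s}" % name, ",".join(dataset)),
        ("${dag.loopTimes}", str(loop_times)),
        ("${dag.offset}", str(offset)),
    ]
    for placeholder, value in values:
        template = template.replace(placeholder, value)
    return template


dates = ["2021-03-28", "2021-03-29", "2021-03-30"]
line = "echo loop ${dag.loopTimes} of ${dag.input.length}: ${dag.input[${dag.offset}]}"

first = render(line, 1, dates)
second = render(line, 2, dates)
print(first)
print(second)
```

The sketch shows why ${dag.input[${dag.offset}]} yields a different entry in each loop: only the offset changes between loops, while the dataset and its length stay the same.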

Examples of variable values

The format of the result set passed by an assignment node depends on the assignment language that the assignment node uses. If the assignment node uses Shell, the result set, or the specified data in the result set, is passed to the do-while node as a one-dimensional array. If the assignment node uses ODPS SQL, the result set, or the specified data in the result set, is passed to the do-while node as a two-dimensional array. For more information, see Output format of the outputs parameter.

Example 1: A Shell node is used as an assignment node

  • Output of the assignment node

    A Shell node is used as an assignment node, and the last output of the assignment node is 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01.

  • Values of the variables for a do-while node

    | Built-in variable | Value for the first loop | Value for the second loop |
    |---|---|---|
    | ${dag.input} | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 |
    | ${dag.input[${dag.offset}]} | 2021-03-28 | 2021-03-29 |
    | ${dag.input.length} | 5 | 5 |
    | ${dag.loopTimes} | 1 | 2 |
    | ${dag.offset} | 0 | 1 |
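
The values in Example 1 can be reproduced with a short local sketch. It assumes that the comma-separated last output of the Shell assignment node is split on commas to form the one-dimensional array. The names `last_output`, `dag_input`, and `rows` are illustrative and not part of DataWorks.

```python
# Assumption: the last output line of the Shell assignment node is split on
# commas to form the one-dimensional array that the do-while node receives.
last_output = "2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01"
dag_input = last_output.split(",")

rows = []
for loop_times in (1, 2):          # first and second loop
    offset = loop_times - 1        # ${dag.offset}
    rows.append({
        "loopTimes": loop_times,
        "offset": offset,
        "current": dag_input[offset],   # ${dag.input[${dag.offset}]}
        "length": len(dag_input),       # ${dag.input.length}
    })

for row in rows:
    print(row)
```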

Example 2: An ODPS SQL node is used as an assignment node

  • Output of the assignment node

    An ODPS SQL node is used as an assignment node, and the last SELECT statement returns the following two pieces of data:

    +---------------+----------------+--------------------+--------+
    | uid           | region         | age_range          | zodiac |
    +---------------+----------------+--------------------+--------+
    | 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
    | 0016359814159 | Unknown        | 30 to 40 years old | Cancer |
    +---------------+----------------+--------------------+--------+

  • Values of the variables for a do-while node

    In both loops, ${dag.input} contains the complete result set:

    +---------------+----------------+--------------------+--------+
    | uid           | region         | age_range          | zodiac |
    +---------------+----------------+--------------------+--------+
    | 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
    | 0016359814159 | Unknown        | 30 to 40 years old | Cancer |
    +---------------+----------------+--------------------+--------+

    The other variables take the following values:

    | Built-in variable | Value for the first loop | Value for the second loop |
    |---|---|---|
    | ${dag.input[${dag.offset}]} | 0016359810821, Hubei Province, 30 to 40 years old, Cancer | 0016359814159, Unknown, 30 to 40 years old, Cancer |
    | ${dag.input.length} | 2 | 2 |
    | ${dag.input[0][1]} | Hubei Province | Hubei Province |
    | ${dag.loopTimes} | 1 | 2 |
    | ${dag.offset} | 0 | 1 |

    Note

    ${dag.input.length}: The number of rows in a two-dimensional array is the length of the dataset. The number of rows in a two-dimensional array in the output of the assignment node is 2.

    ${dag.input[0][1]}: This built-in variable specifies the value in the first row and second column of the two-dimensional array.
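
The two-dimensional case in Example 2 can be pictured as a list of rows. The sketch below models the result set as a Python list of lists purely for illustration. The variable names are hypothetical, and the actual in-memory representation used by DataWorks is not described in this topic.

```python
# The two rows returned by the last SELECT statement, modeled as a list of
# lists (rows first, then columns). This is an illustration only.
dag_input = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]

length = len(dag_input)            # ${dag.input.length}: the number of rows
first_row = dag_input[0]           # ${dag.input[${dag.offset}]} when the offset is 0
second_row = dag_input[1]          # ${dag.input[${dag.offset}]} when the offset is 1
region_of_first = dag_input[0][1]  # ${dag.input[0][1]}: first row, second column

print(length)
print(", ".join(first_row))
print(region_of_first)
```

Both indexes are zero-based, which is why [0][1] addresses the first row and the second column.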

Sample code for the end node

You can use ODPS SQL, Shell, or Python 2 to develop code for the end node. This section provides typical sample code in these languages.

ODPS SQL

SELECT CASE
         WHEN COUNT(1) > 0 AND ${dag.offset} <= 9
         THEN true
         ELSE false
       END
FROM   xc_dpe_e2.xc_rpt_user_info_d
WHERE  dt = '20200101';

In the preceding sample code, the end node returns true only if at least one row in the table matches dt='20200101' and the value of ${dag.offset} is at most 9. These two comparisons with fixed values limit the number of loops that can be run for the do-while node.

Shell

if [ ${dag.loopTimes} -lt 5 ];
then
     echo "True"
else
     echo "False"
fi

In the preceding code, the number of loops that are finished is compared with 5 to limit the number of loops that can be run for the do-while node. The ${dag.loopTimes} variable specifies the number of loops that are finished.

The value of the ${dag.loopTimes} variable is 1 for the first loop and increases by 1 with each loop, so the value is 2 for the second loop and 5 for the fifth loop. In the preceding sample code, the end node returns True after each of the first four loops. When the fifth loop is finished, the condition is no longer met, the output of the end node is False, and the do-while node exits from looping.
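
The control flow described above can be sketched as a local simulation. This is not DataWorks code: `run_do_while` is a hypothetical driver, the end node is modeled as a function that returns True or False, and the handling of the 1024-loop limit is simplified to a hard stop.

```python
MAX_LOOPS = 1024  # upper limit stated in the Limits section


def run_do_while(body, should_continue):
    """Run body at least once, then consult should_continue after every loop."""
    loop_times = 0
    while True:
        loop_times += 1
        body(loop_times, loop_times - 1)      # loopTimes and offset of this loop
        if loop_times >= MAX_LOOPS:           # simplified handling of the limit
            break
        if not should_continue(loop_times):   # the end node returned False
            break
    return loop_times


executed = []
total = run_do_while(
    body=lambda loop_times, offset: executed.append((loop_times, offset)),
    # Same condition as the Shell sample: [ ${dag.loopTimes} -lt 5 ]
    should_continue=lambda loop_times: loop_times < 5,
)

print(total)
print(executed)
```

Because the condition is checked after the task nodes run, the loop body always runs at least once, and a condition of "less than 5" results in five loops rather than four.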

Python 2

if ${dag.loopTimes} < ${dag.input.length}:
    print True
else:
    print False

The do-while node starts the next loop if the end node returns True and exits from looping if the end node returns False.

In the preceding sample code, the number of loops that are finished is compared with the number of rows in the dataset passed by the assignment node, so the do-while node runs one loop for each row and then exits. The ${dag.loopTimes} variable specifies the number of loops that are finished.
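
As a rough check of this condition, the sketch below simulates a do-while node whose task node reads the entry at the current offset and whose end node compares the loop count with the dataset length. `visit_all` is a hypothetical function for local experimentation, and the dataset is the one from Example 1.

```python
def visit_all(dataset):
    """Simulate a do-while node whose end node checks loopTimes < length."""
    visited = []
    loop_times = 0
    while True:
        loop_times += 1
        offset = loop_times - 1
        visited.append(dataset[offset])       # task node reads input[offset]
        if not loop_times < len(dataset):     # end node prints False
            break
    return visited


dates = ["2021-03-28", "2021-03-29", "2021-03-30", "2021-03-31", "2021-04-01"]
visited = visit_all(dates)
print(visited)
```

In this simulation, every entry is visited exactly once and in order. Note that the body runs before the condition is evaluated, so the sketch assumes a dataset with at least one entry.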

Scenarios

Use a do-while node together with an assignment node

The following content describes a typical scenario, precautions, and a configuration example for using a do-while node together with an assignment node.

Scenario

When a do-while node is used to run a loop task, each time a loop is run, the inner nodes of the do-while node need to obtain and use the output parameters of the ancestor node of the do-while node. In this case, you can use an assignment node, such as an assignment node named assign_node, together with the do-while node.

Precaution

  • Dependencies

    The do-while node must depend on the assignment node assign_node.

    Note

    The node that depends on the assignment node must be the do-while node rather than the shell node.

  • Input and output parameters

    • You must add the output parameters of the assignment node assign_node to Output Parameters for the node.

    • You must add the output parameters of the assignment node assign_node to Input Parameters for the shell node of the do-while node.

      Note

      The input parameters must be configured for the shell node rather than the do-while node.

Configuration example

[Figure: configuration example for the assignment node]

Use a do-while node together with a branch node and a merge node

The following table describes a typical scenario and precautions for using a do-while node together with a branch node and a merge node.

[Figure: typical application]

| Scenario | Precaution |
|---|---|
| The inner nodes of a do-while node need to perform logical judgment or result traversal. In this case, you can customize the task nodes in the do-while node and use a branch node and a merge node together with the do-while node. For example, you can use a branch node named branch_node and a merge node named merge_node. | In the inner workflow of the do-while node, the branch node branch_node and the merge node merge_node must be used together. |