DataWorks provides do-while nodes. You can rearrange the workflow inside a do-while node, write the logic to be executed in a loop in the node, and then configure an end node to determine whether to exit from looping. You can use a do-while node alone, or use a do-while node together with an assignment node to loop through the result set passed by the assignment node. This topic provides an example of how to configure a do-while node.
Prerequisites
You understand how to configure an inner workflow for a do-while node based on your business requirements. For more information, see Composition and workflow orchestration of a do-while node.
You understand that the built-in variables provided by a do-while node can be used to obtain the related values in each loop. For more information about the built-in variables, see Built-in variables.
You understand that the inner workflow in a do-while node must start with the start node and end with the end node. The start node marks the start of a loop, and the end node controls whether to exit from looping. For information about the end node, see Sample code for the end node.
You are familiar with how to test a do-while node and how to view its run logs. For more information, see Precautions.
Limits
Only DataWorks Standard Edition and more advanced editions support do-while nodes. For more information, see Differences among DataWorks editions.
The maximum number of loops for a do-while node is 1024.
Parallel execution is not supported. A loop can start only if the previous loop ends.
Create a do-while node
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Create a do-while node.
On the DataStudio page, move the pointer over the icon and choose .
Alternatively, you can find the workflow in which you want to create a do-while node, click the workflow name, right-click General, and then choose .
In the Create Node dialog box, configure the Name and Path parameters.
Click Confirm.
Example of using a do-while node
This section describes how to use a do-while node to run five loops and display the current loop number each time a loop is run.
Edit code for the shell node
By default, a do-while node consists of the start, shell, and end nodes.
The start node marks the start of a loop and is not used to process a loop task. The start node cannot be deleted.
The shell node is a sample business processing node provided by DataWorks.
The end node marks the end of a loop and determines whether to start the next loop. The end node defines the condition for exiting from looping for the do-while node. The end node cannot be deleted.
You can configure an inner workflow in a do-while node based on your business requirements. In detail, you can replace the shell node with another type of node.
Double-click the shell node. The configuration tab of the shell node appears.
Enter the following code in the code editor:
echo ${dag.loopTimes} # Display the current number of loops.
The ${dag.loopTimes} variable is a reserved variable of the system. This variable specifies the current number of loops, and the value of this variable starts from 1. All inner nodes of the do-while node can reference this variable. For more information about the built-in variables, see Built-in variables and Examples of variable values.
After you modify the code of the shell node, save the modification. No message that reminds you to save the modification will appear when you commit the node. If you do not save the modification, the code cannot be updated to the latest version in time.
Configure the end node
Define the condition for exiting from looping in the end node.
Double-click the end node. The configuration tab of the node appears.
Select Python from the Language drop-down list.
Enter the following code to define the condition for exiting from looping for the do-while node:
if ${dag.loopTimes}<5:
    print True
else:
    print False
In the code, the value of the ${dag.loopTimes} variable is compared with 5 to limit the number of loops that can be run. The value of the ${dag.loopTimes} variable is 1 for the first loop and increases by 1 each time. In this case, the value of the ${dag.loopTimes} variable is 2 for the second loop and 5 for the fifth loop. The do-while node exits from looping when the result of ${dag.loopTimes}<5 is False, which happens in the fifth loop.
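The comparison can be checked outside DataWorks with a small local simulation. This is only a sketch: the function name end_node_result and its limit parameter are hypothetical, and the loop count is passed in as an ordinary argument instead of being substituted by the system as ${dag.loopTimes}.

```python
# Local simulation only. In DataWorks, ${dag.loopTimes} is replaced by the
# system before the end node code runs; here it is a plain function argument.
def end_node_result(loop_times, limit=5):
    """Mimic the end node: True means "start another loop", False means "exit"."""
    return loop_times < limit


# Evaluate the condition for loops 1 through 5.
results = [(n, end_node_result(n)) for n in range(1, 6)]
for loop_times, keep_looping in results:
    print(loop_times, keep_looping)
```

The condition holds for loops 1 to 4 and fails in loop 5, so the node body runs five times in total.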
Commit the do-while node
Click the icon in the top toolbar to save the node.
Click the icon in the top toolbar to commit the node.
In the Submit dialog box, configure the Change description parameter. Then, determine whether to review node code after you commit the node based on your business requirements.
Important: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
You can use the code review feature to ensure the code quality of tasks and prevent task execution errors caused by invalid task code. If you enable the code review feature, the node code that is committed can be deployed only after the node code passes the code review. For more information, see Code review.
If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner of the node configuration tab to deploy a task on the node to the production environment for running after you commit the task on the node. For more information, see Deploy tasks.
Test the do-while node and view run logs of the do-while node
The procedure for committing, deploying, and running a do-while node is the same as that for committing, deploying, and running a common node. However, you cannot test a do-while node in DataStudio.
If the workspace that you use is in standard mode, you cannot directly perform a test to run a do-while node in DataStudio.
To perform a test to run the do-while node and view the result, you must commit and deploy the workflow that contains the do-while node to Operation Center and run the do-while node in Operation Center. If you use the value passed by an assignment node in the do-while node, run both the assignment node and do-while node during the test in Operation Center.
On the configuration tab of the do-while node, click Operation Center in the top toolbar to go to Operation Center.
In the left-side navigation pane of the Operation Center page, choose .
On the Cycle Task page, find the do-while node and click DAG in the Actions column to open the directed acyclic graph (DAG) of the do-while node.
In the DAG of the do-while node, right-click the assignment node and choose .
In the Backfill Data dialog box, configure the parameters and click OK.
Refresh the Patch Data page. After the data backfill instances are successfully run, click DAG in the Actions column of the data backfill instance generated for the do-while node.
View run logs of the do-while node.
Right-click the do-while node and select View Internal Nodes.
You can view run logs of a do-while node only if you view the inner nodes of the do-while node.
The inner workflow of the do-while node is divided into three parts:
The left pane of the view displays the rerun history of the do-while node. A record is generated each time a do-while node instance is run.
The middle pane of the view displays a loop record list that shows all existing loops of the do-while node and the status of each loop.
The right pane of the view displays the details about each loop. You can click a record in the loop record list to view the details of each instance in the loop.
On the inner node page, click a loop that is finished in the middle pane, right-click the desired node in the right pane, and then select View Runtime Log.
View run logs for the nth loop.
On the inner node page, click Loop 5 in the middle pane to view run logs of the shell node in the fifth loop.
The preceding example shows that a do-while node works based on the following application logic:
The system starts a loop from the start node.
Other nodes inside the do-while node run in sequence based on the dependencies configured for them.
The system executes the conditional statement defined in the code of the end node for exiting from looping.
The system records the number of loops that are run, and the next loop starts if the conditional statement returns True in the run logs of the end node.
The entire looping process ends if the conditional statement returns False in the run logs of the end node.
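The application logic above can be sketched as a plain Python loop. This is an illustrative model, not DataWorks code: run_do_while, body, and end_condition are hypothetical names, and stopping when the 1024-loop limit is reached is an assumption about how the limit is enforced.

```python
MAX_LOOPS = 1024  # maximum number of loops stated in the Limits section


def run_do_while(body, end_condition, max_loops=MAX_LOOPS):
    """Run body, then evaluate end_condition; repeat while it returns True."""
    loop_times = 0
    while True:
        loop_times += 1                    # the start node begins a new loop
        body(loop_times)                   # inner nodes run in dependency order
        if not end_condition(loop_times):  # the end node returned False
            break
        if loop_times >= max_loops:        # assumed behavior at the loop limit
            break
    return loop_times


# Reproduce the example: the body records the loop number, the end node
# continues while the loop number is less than 5.
echoed = []
total = run_do_while(echoed.append, lambda n: n < 5)
print(total, echoed)
```

With the example condition, the sketch runs five loops and records the loop numbers 1 through 5.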
Summary
Comparison between a do-while node and the while, For Each, and do-while loop statements:
A do-while node runs based on a workflow that starts a loop before evaluation. This node functions the same way as the do-while statement. A do-while node can use the built-in variable ${dag.offset} and input and output parameters to achieve the feature of the For Each statement.
A do-while node cannot achieve the feature of the while statement because a do-while node runs a loop before evaluation.
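The difference from a while statement shows up when the condition is false from the very beginning. The following sketch uses two hypothetical helper functions to contrast the two orders of evaluation. Python has no do-while statement, so the do-while form is emulated with a loop that checks the condition after the body.

```python
def while_style(condition, body):
    """Evaluate the condition first, then run the body."""
    runs = 0
    while condition(runs + 1):
        runs += 1
        body(runs)
    return runs


def do_while_style(condition, body):
    """Run the body first, then evaluate the condition (do-while node order)."""
    runs = 0
    while True:
        runs += 1
        body(runs)
        if not condition(runs):
            break
    return runs


def never(loop_times):
    """A condition that is False for every loop."""
    return False


while_runs = while_style(never, lambda n: None)
do_while_runs = do_while_style(never, lambda n: None)
print(while_runs, do_while_runs)
```

The while form never runs the body, whereas the do-while form runs it once before the condition is checked.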
Work procedure of a do-while node:
The system runs a loop from the start node and runs other nodes based on the dependencies configured for them.
After the system runs the code that is defined for the end node in a loop, one of the following situations occurs:
The next loop starts if the end node returns True.
The entire looping process ends if the end node returns False.
Input and output parameters: The inner nodes of the do-while node reference the input and output parameters configured for the do-while node in the ${dag.Parameter name} format, where Parameter name is the name of the input or output parameter.
Built-in variables: DataWorks provides the following built-in variables for the inner nodes of the do-while node:
dag.loopTimes: the number of loops that are run. The value of this variable starts from 1.
dag.offset: the offset of the current loop number relative to 1. The value of this variable starts from 0, so it is 0 in the first loop and 1 in the second loop.
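As a rough illustration of how the two built-in variables relate and how the offset can drive For Each style iteration, the following sketch walks through a small list. Everything here is a local stand-in: rows is a hypothetical result set, loop_variables is a hypothetical helper, and the relation offset = loopTimes - 1 is assumed from the stated starting values of 1 and 0.

```python
rows = ["a", "b", "c"]  # hypothetical result set, for example from an assignment node


def loop_variables(loop_times):
    """Return the assumed built-in variable values for a given loop."""
    return {"loopTimes": loop_times, "offset": loop_times - 1}


visited = []
loop_times = 1
while True:
    dag = loop_variables(loop_times)
    visited.append(rows[dag["offset"]])    # the offset indexes the current row
    if not dag["loopTimes"] < len(rows):   # end node condition: more rows left?
        break
    loop_times += 1

print(visited)
```

Each loop processes one row, and the end condition stops the looping after the last row has been handled.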