DataWorks provides do-while nodes. You can rearrange the workflow inside a do-while node, write the logic to be executed in a loop in the node, and then configure an end node to determine whether to exit from looping. You can use a do-while node alone, or use a do-while node together with an assignment node to loop through the result set passed by the assignment node. This topic describes the composition and application logic of a do-while node.
Background information
The following table describes the usage notes of do-while nodes.
Description | References |
Before you use a do-while node, you must learn the limits and precautions of do-while nodes, such as the upper limit for the number of loops, the method to test a do-while node, and the method to view logs of a do-while node. | Limits and Precautions |
You can configure an inner workflow for a do-while node based on your business requirements. When you configure an inner workflow for a do-while node, make sure that the inner workflow starts with the start node and ends with the end node. | |
A do-while node provides built-in variables for you to obtain the related values in each loop. The examples for obtaining variable values are provided for reference. | |
You can use an end node to determine whether to exit from looping. Sample code for an end node is provided for reference. | |
Before you use a do-while node, you must learn the scenarios in which a do-while node is used. |
Limits
Only DataWorks Standard Edition and more advanced editions support do-while nodes. For more information, see Differences among DataWorks editions.
The maximum number of loops for a do-while node is 1024.
Parallel execution is not supported. A loop can start only if the previous loop ends.
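The limits above can be pictured with a small local simulation. This is only a sketch in plain Python 3, not DataWorks code: `run_do_while`, `body`, and `end_node` are hypothetical names, and the way the cap is enforced here (stop once 1024 loops have run) is an assumption used to illustrate the limit.

```python
MAX_LOOPS = 1024  # upper limit quoted in the Limits list above


def run_do_while(body, end_node, max_loops=MAX_LOOPS):
    """Run `body` once, then repeat while `end_node` returns True.

    Loops run strictly one after another; a loop starts only after the
    previous one has finished. Returns the number of loops that ran.
    """
    loop_number = 0
    while True:
        loop_number += 1
        body(loop_number)
        # Stop when the end node returns False or the cap is reached.
        if not end_node(loop_number) or loop_number >= max_loops:
            return loop_number


def noop(loop_number):
    """Placeholder for the inner task nodes."""


# An end node that always asks for another loop is still cut off at the cap.
capped = run_do_while(noop, lambda loop_number: True)
print(capped)
```

Because the end condition is checked after the body, the body always runs at least once, which is the do-while behavior.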
Precautions
Dimension | Item | Description |
Loop support | Upper limit for the number of loops | The maximum number of loops for a do-while node is 1024. If the number of loops for a do-while node exceeds 1024, the end node returns False to exit from looping. |
Inner nodes | Workflow orchestration |
|
Value acquisition | The built-in variables provided by a do-while node can be used to obtain a specific value passed by the assignment node that is configured as the ancestor node of the do-while node. | |
Debugging | Task debugging | If you use a workspace in standard mode, you cannot test a do-while node directly in DataStudio. To verify the running result of a do-while node, commit and deploy the workflow that contains the do-while node to Operation Center in the development environment, and then run a task on the do-while node. |
Log viewing | To view the run logs of a do-while node in Operation Center, perform the following steps: Find the do-while node and open the directed acyclic graph (DAG) of the node. In the DAG, right-click the node name and select View Internal Nodes. | |
Dependencies | Dependency settings | You can use a do-while node alone, or use a do-while node together with an assignment node. If you want to check whether an assignment node passes its output to a do-while node in Operation Center, you can use the data backfill feature to backfill data for both the assignment node and do-while node. If you run only the do-while node, you cannot obtain the output of the assignment node. |
Composition and workflow orchestration of a do-while node
Do-while nodes are special nodes that contain inner nodes. When you create a do-while node, the following inner nodes are automatically created: the start node, the shell node (task node), and the end node. The inner nodes are organized into an inner workflow for looping.
The preceding figure shows the following information:
start node
The start node marks the start of a loop and is not used to process a loop task. The task nodes in a do-while node depend on the start node. Therefore, the start node cannot be deleted.
shell node
When you create a do-while node, DataWorks automatically creates an inner Shell task node named shell. You can delete the default shell node and configure task nodes based on your business requirements.
In most cases, a do-while node is used together with an assignment node, or with a branch node and a merge node. When you customize task nodes of a do-while node, you can delete the dependencies between the existing inner nodes of the do-while node, and configure an inner workflow for the do-while node based on your business requirements. When you configure an inner workflow for a do-while node, make sure that the inner workflow starts with the start node and ends with the end node.
end node
The end node determines whether to exit from looping. The end node can control the number of loops that can be run for the do-while node. The end node is essentially an assignment node. The end node returns True or False. The value True indicates to run the next loop, and the value False indicates to exit from looping.
You can use ODPS SQL, Shell, or Python 2 to develop code for the end node. The do-while node provides built-in variables for you to develop code for the end node. For information about the built-in variables, see Built-in variables and Examples of variable values. For information about sample code developed in different languages, see Sample code for the end node.
The end node must be the descendant node of the task nodes and cannot be deleted.
Built-in variables
In most cases, the built-in variables that are provided by a do-while node are configured in the ${dag.Variable name} format. A do-while node in DataWorks provides two built-in variables: ${dag.loopTimes} and ${dag.offset}. You can also use a do-while node together with an assignment node to obtain the values of the value assignment parameters based on built-in variables that are configured in the ${dag.Variable name} format.
Built-in variables
You can use the built-in variables provided by a do-while node in DataWorks to obtain the number of loops that are finished and the offset between the current loop and the previous loop when the do-while node is used for looping.
Built-in variable | Description | Value |
${dag.loopTimes} | The number of loops that are finished. | 1 for the first loop, 2 for the second loop, 3 for the third loop, ..., and n for the nth loop. |
${dag.offset} | The offset between the current loop and the previous loop. | 0 for the first loop, 1 for the second loop, 2 for the third loop, ..., and n-1 for the nth loop. |
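The relationship between the two built-in variables can be written down as a short sketch. It is plain Python 3 for illustration only; `builtin_values` is a hypothetical helper, not a DataWorks function.

```python
def builtin_values(n):
    """Values of the two built-in variables during the n-th loop (n starts at 1)."""
    if n < 1:
        raise ValueError("loops are numbered from 1")
    return {"dag.loopTimes": n, "dag.offset": n - 1}


for n in (1, 2, 3):
    print(n, builtin_values(n))
```

In every loop, ${dag.offset} is one less than ${dag.loopTimes}, which makes it convenient as a zero-based index into a result set.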
Obtain the result set passed by an assignment node
If you use a do-while node together with an assignment node, you can obtain the values of the value assignment parameters and loop variables by using the built-in variables described in the following table.
Note: If a do-while node depends on an assignment node, you can add output parameters that are specified in Output Parameters for the assignment node to Input Parameters for the do-while node. Then, you can use the do-while node to obtain the result set that is passed by the assignment node or specified data in the result set. Input parameters are configured in the ${dag.Variable name} format. Replace the variable name with the name of an input parameter that is specified in Input Parameters for the do-while node. In the following built-in variables, input specifies the name of the input parameter defined in the do-while node and is used to obtain the result set that is passed by an assignment node. You must replace input with the actual name of the input parameter that you use.
Built-in variable | Description |
${dag.input} | The dataset passed by the ancestor assignment node. |
${dag.input[${dag.offset}]} | The data entry obtained by the do-while node in the current loop. |
${dag.input.length} | The length of the dataset obtained inside the do-while node. |
Examples of variable values
The format of a result set passed by an assignment node varies based on the assignment language used by the assignment node. If you use a do-while node to obtain the result set that is passed by an assignment node and the assignment node uses the Shell language, the result set or specified data in the result set is passed to the do-while node as a one-dimensional array. If you use a do-while node to obtain the result set that is passed by an assignment node and the assignment node uses the ODPS SQL language, the result set or specified data in the result set is passed to the do-while node as a two-dimensional array. For more information, see Output format of the outputs parameter.
Example 1: A Shell node is used as an assignment node
Output of the assignment node
A Shell node is used as an assignment node, and the last output of the assignment node is 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01.
Values of the variables for a do-while node
Built-in variable | Value for the first loop | Value for the second loop |
${dag.input} | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 | 2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01 |
${dag.input[${dag.offset}]} | 2021-03-28 | 2021-03-29 |
${dag.input.length} | 5 | 5 |
${dag.loopTimes} | 1 | 2 |
${dag.offset} | 0 | 1 |
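The values in Example 1 can be reproduced locally. The sketch below is plain Python 3 and assumes that the comma-separated Shell output is split on commas into a one-dimensional array; `values_for_loop` is a hypothetical helper that only mimics how the variables resolve.

```python
shell_output = "2021-03-28,2021-03-29,2021-03-30,2021-03-31,2021-04-01"

# Assumption: the one-dimensional array is the output split on commas.
dag_input = shell_output.split(",")


def values_for_loop(data, loop_times):
    """Resolve the built-in variables for the given loop (1-based)."""
    offset = loop_times - 1
    return {
        "${dag.input}": ",".join(data),
        "${dag.input[${dag.offset}]}": data[offset],
        "${dag.input.length}": len(data),
        "${dag.loopTimes}": loop_times,
        "${dag.offset}": offset,
    }


for loop_times in (1, 2):
    print(values_for_loop(dag_input, loop_times))
```

Only the indexed entry, ${dag.loopTimes}, and ${dag.offset} change between loops. The dataset and its length stay the same.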
Example 2: An ODPS SQL node is used as an assignment node
Output of the assignment node
An ODPS SQL node is used as an assignment node, and the last SELECT statement returns the following two pieces of data:
+---------------+----------------+--------------------+--------+
| uid           | region         | age_range          | zodiac |
+---------------+----------------+--------------------+--------+
| 0016359810821 | Hubei Province | 30 to 40 years old | Cancer |
| 0016359814159 | Unknown        | 30 to 40 years old | Cancer |
+---------------+----------------+--------------------+--------+
Values of the variables for a do-while node
Built-in variable | Value for the first loop | Value for the second loop |
${dag.input} | The complete two-row result set returned by the last SELECT statement, as a two-dimensional array | The complete two-row result set returned by the last SELECT statement, as a two-dimensional array |
${dag.input[${dag.offset}]} | 0016359810821, Hubei Province, 30 to 40 years old, Cancer | 0016359814159, Unknown, 30 to 40 years old, Cancer |
${dag.input.length} (Note: The number of rows in a two-dimensional array is the length of the dataset. The number of rows in the two-dimensional array in the output of the assignment node is 2.) | 2 | 2 |
${dag.input[0][1]} (Note: This built-in variable specifies the value in the first row and second column of the two-dimensional array.) | Hubei Province | Hubei Province |
${dag.loopTimes} | 1 | 2 |
${dag.offset} | 0 | 1 |
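Indexing into the two-dimensional result set of Example 2 can be illustrated with a nested list. This is a plain Python 3 sketch, not DataWorks code. The `rows` list is a hand-written stand-in for the result set, and `current_row` is a hypothetical helper.

```python
# Stand-in for the two rows returned by the last SELECT statement.
rows = [
    ["0016359810821", "Hubei Province", "30 to 40 years old", "Cancer"],
    ["0016359814159", "Unknown", "30 to 40 years old", "Cancer"],
]


def current_row(data, offset):
    """Mimic ${dag.input[${dag.offset}]}: the row for the current loop."""
    return ", ".join(data[offset])


print(len(rows))             # mimics ${dag.input.length}: number of rows
print(rows[0][1])            # mimics ${dag.input[0][1]}: first row, second column
print(current_row(rows, 0))  # row used in the first loop
print(current_row(rows, 1))  # row used in the second loop
```

The first index selects a row and the second selects a column, both counted from 0.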
Sample code for the end node
You can use ODPS SQL, Shell, or Python 2 to develop code for the end node. This section provides typical sample code in these languages.
ODPS SQL
SELECT CASE
         WHEN COUNT(1) > 0 AND ${dag.offset} <= 9
         THEN true
         ELSE false
       END
FROM xc_dpe_e2.xc_rpt_user_info_d WHERE dt='20200101';
In the preceding sample code of the end node, the number of rows and the offset are compared with fixed values to limit the number of loops that can be run for the do-while node.
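To see how many loops this condition allows, the same check can be simulated locally. The sketch below is plain Python 3 and rests on two assumptions: the row count of the partition does not change while the loops run, and the end node is evaluated once at the end of each loop. `end_node` and `loops_run` are hypothetical names.

```python
def end_node(row_count, offset):
    """Mirror of the CASE expression: True means run another loop."""
    return row_count > 0 and offset <= 9


def loops_run(row_count):
    """Count the loops that run before the end node returns False."""
    offset = 0  # ${dag.offset} in the first loop
    while end_node(row_count, offset):
        offset += 1  # the next loop starts with the next offset
    return offset + 1  # the loop in which False was returned also ran


print(loops_run(3))  # partition has rows
print(loops_run(0))  # partition is empty
```

Under these assumptions, a non-empty partition lets the loop continue until the offset exceeds 9, and an empty partition stops the do-while node after the first loop.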
Shell
if [ ${dag.loopTimes} -lt 5 ];
then
    echo "True"
else
    echo "False"
fi
In the preceding code, the number of loops that are finished is compared with 5 to limit the number of loops that can be run for the do-while node. The ${dag.loopTimes} variable specifies the number of loops that are finished.
The value of the ${dag.loopTimes} variable is 1 for the first loop and increases by 1 each time. In this case, the value of the ${dag.loopTimes} variable is 2 for the second loop and 5 for the fifth loop. In the preceding sample code, when the fifth loop is finished, the output of the end node is False, and the do-while node exits from looping.
Python 2
if ${dag.loopTimes} < ${dag.input.length}:
    print True
else:
    print False
# Start the next loop if the end node returns True.
# Exit from looping if the end node returns False.
In the preceding sample code, the number of loops that are finished is compared with the number of rows in the dataset passed by the assignment node to limit the number of loops that can be run for the do-while node. The ${dag.loopTimes} variable specifies the number of loops that are finished.
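The effect of this condition is that the do-while node visits each entry of the dataset exactly once. A minimal local simulation in plain Python 3 shows this, assuming a non-empty one-dimensional dataset. `traverse` is a hypothetical function that stands in for the whole do-while node.

```python
def traverse(data):
    """Simulate a do-while node whose end node compares loopTimes with the length."""
    processed = []
    loop_times = 0
    while True:
        loop_times += 1
        offset = loop_times - 1
        # Inner task node: works on the entry ${dag.input[${dag.offset}]}.
        processed.append(data[offset])
        # End node: continue only while ${dag.loopTimes} < ${dag.input.length}.
        if not loop_times < len(data):
            break
    return processed


dates = ["2021-03-28", "2021-03-29", "2021-03-30", "2021-03-31", "2021-04-01"]
print(traverse(dates))
```

Because the body runs before the check, an empty dataset would fail on the first access in this sketch, so the non-empty assumption matters.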
Scenarios
Use a do-while node together with an assignment node
The following table describes a typical scenario and precautions for using a do-while node together with an assignment node.
Scenario | Precaution | Configuration example |
When a |
|
Use a do-while node together with a branch node and a merge node
The following table describes a typical scenario and precautions for using a do-while node together with a branch node and a merge node.
Scenario | Precaution |
The inner nodes of a do-while node need to perform logical judgment or result traversal. In this case, you can customize the task nodes in the do-while node and use a branch node and a merge node together with the do-while node. For example, you can use a branch node named | In the inner workflow of the do-while node, the branch node |