Scenario 2: Configure scheduling dependencies for a node that depends on last-cycle instances - DataWorks

A cross-cycle dependency indicates the dependency of a node on its last-cycle instance, last-cycle instances of its descendant nodes, or last-cycle instances of specified nodes. After you configure a cross-cycle dependency for a node, the node is run in the current cycle only after relevant last-cycle instances are successfully run.

DataWorks supports the following types of cross-cycle dependencies:

Dependency on the last-cycle instances of descendant nodes
- Node dependency: A node depends on the last-cycle instances of its descendant nodes. For example, Node A has three descendant nodes: Node B, Node C, and Node D. If you configure this type of dependency for Node A, Node A is run in the current cycle only after all the last-cycle instances of Node B, Node C, and Node D are successfully run.
- Business scenario: A node is run in the current cycle only after its descendant nodes successfully cleanse the data in the tables generated by the node in the last cycle. To check whether the descendant nodes have cleansed the data, you can configure data quality monitoring rules for the tables generated by the descendant nodes.
Dependency on the last-cycle instance of the current node
- Node dependency: A node depends on its last-cycle instance. The node is run in the current cycle only after the last-cycle instance of the node is successfully run.
- Business scenario: The running of a node in the current cycle depends on the business data that is generated by the last-cycle instance of the node. To check whether the data of the node has been cleansed, you can configure data quality monitoring rules for the table generated by the node.
Dependency on the last-cycle instances of specified nodes: To configure this type of dependency, you must manually enter the IDs of the nodes that you want a node to depend on. Separate the IDs with commas (,), such as 12345,23456.
- Node dependency: A node depends on the last-cycle instances of specified nodes. The node is run in the current cycle only after the last-cycle instances of the specified nodes are successfully run.
- Business scenario: In the business logic, a node depends on the business data that is generated by other nodes but is not processed by the node itself.
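
The three types above can be sketched as a small model. The following Python code is illustrative only: the names `DependOn`, `parse_node_ids`, `last_cycle_prerequisites`, and `can_run` are hypothetical and are not part of any DataWorks API. It shows which last-cycle instances a node waits for under each type, and how a comma-separated ID string such as 12345,23456 could be parsed.

```python
from enum import Enum


class DependOn(Enum):
    """Hypothetical labels for the three cross-cycle dependency types."""
    CHILD_NODES = "descendant nodes"
    CURRENT_NODE = "current node"
    CUSTOM_NODES = "specified nodes"


def parse_node_ids(raw):
    """Split a comma-separated ID string such as '12345,23456' into integers."""
    return [int(part) for part in raw.split(",") if part.strip()]


def last_cycle_prerequisites(node_id, depend_on, children=(), custom_ids=""):
    """Return the IDs of nodes whose last-cycle instances must succeed first."""
    if depend_on is DependOn.CHILD_NODES:
        return list(children)
    if depend_on is DependOn.CURRENT_NODE:
        return [node_id]
    return parse_node_ids(custom_ids)


def can_run(prerequisites, last_cycle_status):
    """A node runs in the current cycle only if every prerequisite succeeded."""
    return all(last_cycle_status.get(n) == "success" for n in prerequisites)
```

For example, if Node A (ID 1 in this sketch) depends on its descendant nodes with IDs 2, 3, and 4, `can_run` returns False as long as any of their last-cycle instances has a status other than "success".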

In Operation Center, cross-cycle dependencies appear as dotted lines, which distinguishes them from same-cycle dependencies.

Before you undeploy a node, you must delete the dependencies that are configured for the node, including the cross- and same-cycle dependencies. The following figure shows the Properties panel of a node. You must delete the cross-cycle dependencies for the node in the section marked as 1 and same-cycle dependencies in the section marked as 2.

Based on your business requirements, you can configure either a same-cycle or a cross-cycle dependency on the ancestor node for a node. If you enable the automatic parsing feature for a node, the scheduling system automatically configures a same-cycle dependency on the ancestor node for the node. If this configuration does not meet your requirements, you can delete the default same-cycle dependency and configure a cross-cycle dependency for the node. For more information, see Logic of scheduling dependencies.

The following figure shows the dependency between the nodes in a workflow.

The following figure shows how the dependency appears in Operation Center.

The following figure shows the code and configurations of the xc_create node.

In the preceding figure, the xc_create node creates the xc_1 and xc_2 tables and inserts data into the two tables. The xc_1 and xc_2 tables are the outputs of the xc_create node.

The following figure shows the code and configurations of the xc_select node.

In the preceding figure, the xc_select node queries data in the xc_1 and xc_2 tables. Based on the automatic parsing feature, the same-cycle dependency on the xc_create node is automatically configured for the xc_select node.
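
The idea behind automatic parsing can be illustrated with a rough sketch. This is not how DataWorks implements the feature. It is a simplified assumption in which table names are extracted from the SQL code with a regular expression and matched against a lookup of node outputs. The function names, the `producers` mapping, and the sample query (including its `id` column) are hypothetical.

```python
import re


def referenced_tables(sql):
    """Extract table names that follow FROM or JOIN (simplified; no subqueries)."""
    return re.findall(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", sql, flags=re.IGNORECASE)


def infer_parents(sql, producers):
    """Map each referenced table to the node that outputs it, if known."""
    parents = []
    for table in referenced_tables(sql):
        node = producers.get(table)
        if node is not None and node not in parents:
            parents.append(node)
    return parents


# Hypothetical query for the xc_select node and the outputs of xc_create.
sample_sql = "SELECT * FROM xc_1 a JOIN xc_2 b ON a.id = b.id"
sample_producers = {"xc_1": "xc_create", "xc_2": "xc_create"}
sample_parents = infer_parents(sample_sql, sample_producers)
```

In this sketch, both tables resolve to the same producer, so the xc_select node ends up with a single inferred ancestor node, xc_create.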

Dependency on the last-cycle instances of descendant nodes

Node dependency: A node depends on the last-cycle instances of its descendant nodes. For example, Node A has three descendant nodes: Node B, Node C, and Node D. If you configure this type of dependency for Node A, Node A is run in the current cycle only after all the last-cycle instances of Node B, Node C, and Node D are successfully run.

Business scenario: A node is run only after its descendant nodes have successfully cleansed the data in the tables generated by the node in the last cycle. Otherwise, the node is not run in the current cycle.

When you configure dependencies for the xc_create node, select Cross-Cycle Dependencies and set Depend On to Instances of Child Nodes.

The following figure shows how the dependency appears in Operation Center.

Dependency on the last-cycle instance of the current node

Node dependency: A node depends on its last-cycle instance. The node is run in the current cycle only after the last-cycle instance of the node is successfully run.

Business scenario: The running of a node in the current cycle depends on the business data that is generated by the last-cycle instance of the node. In this example, the node is scheduled to run by week. This way, you can conveniently view the dependencies of the node in Operation Center.

To view the dependencies of the node, go to Operation Center. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Instance. Search for the node to view its dependencies.

Note: For example, a node is scheduled to run by hour and depends on its last-cycle instance. If the instance that is generated for a specific hour is not successfully run, the system does not run the instance that is generated for the next hour.

If the first instance that is generated on a day is not run or the running fails, the instances that are generated for the rest of the day cannot be run.
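
The blocking behavior described in this note can be illustrated with a short simulation. This is a sketch under simplifying assumptions, not the actual logic of the scheduling system: `run_day`, `fails_at`, and `previous_ok` are hypothetical names, and each hourly instance is modeled as either succeeding or failing.

```python
def run_day(hours, fails_at=None, previous_ok=True):
    """Simulate hourly instances of a node that depends on its own last-cycle instance.

    hours: scheduled hours for one day, in order.
    fails_at: hypothetical hour at which the instance fails (None = no failure).
    previous_ok: whether the instance before the first hour succeeded (assumed True).
    Returns a dict that maps each hour to 'success', 'failed', or 'blocked'.
    """
    status = {}
    for hour in hours:
        if not previous_ok:
            status[hour] = "blocked"  # the last-cycle instance did not succeed
            continue
        if hour == fails_at:
            status[hour] = "failed"
            previous_ok = False
        else:
            status[hour] = "success"
    return status
```

For example, `run_day(range(24), fails_at=0)` marks hour 0 as failed and every later hour as blocked, which corresponds to the case in which the first instance of the day fails.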

Dependency on the last-cycle instances of specified nodes

Node dependency: The tables generated by the xc_information node are not used in the code of the xc_create node. However, the xc_create node depends on the output data of the xc_information node in the last cycle, as configured in the business logic. Logically, the xc_create node depends on the last-cycle instance of the xc_information node.

Business scenario: Node A depends on the business data that is generated by Node B based on the business logic. However, the business data is not referenced in the code of Node A. This indicates that Node A performs no operations on the business data.

In this example, when you configure the xc_create node, select Cross-Cycle Dependencies, set Depend On to Instances of Custom Nodes, and then enter the ID of the xc_information node, which is 1000374815.

To view the dependencies of the node, go to Operation Center. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Instance. Search for the node to view its dependencies.

Advanced configuration for dependency on the last-cycle instances

A branch node has two descendant nodes. In most cases, only one of the descendant nodes is actually run: the scheduling system generates and runs an instance for that descendant node. For the other descendant node, the scheduling system generates an instance and directly returns a successful response without running the instance, which is a dry run. The scheduling system also performs a dry run for the descendant node of this descendant node. If this does not meet your requirements, you can select Upstream node air running attribute does not conduct cross-cycle when you configure this descendant node of the branch node. In this option name, "air running" refers to a dry run.

If a descendant node of a branch node depends on its own last-cycle instance and the last-cycle instance is dry-run, the descendant node is also dry-run in the current cycle. As a result, the descendant node becomes a dry-run node permanently.

The following figure shows an example. If the scheduling system performs a dry run for the descendant node on the left, the scheduling system also performs a dry run for the descendant node of this descendant node.

Your business may require that the running of the descendant node of a branch node depend only on the result of the branch node in the current cycle, and that the descendant node not be affected by its dry run in the last cycle. To meet this requirement, perform the following steps:

1. On the editing tab of this descendant node of the branch node, click Properties in the right-side navigation pane.
2. In the Schedule section of the Properties panel, select Cross-Cycle Dependencies.
3. Click Advanced Settings.
4. In the Rely on last cycle Advanced configuration message, select Upstream node air running attribute does not conduct cross-cycle. This way, the descendant node of the branch node is not affected by its dry run in the last cycle.

Note: This advanced configuration applies only to the descendant nodes of branch nodes. For other nodes, a dry run in the last cycle does not affect the running in the current cycle.
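
The effect of this option can be sketched as follows. The code is a simplified model of the behavior described in this section, not the actual logic of the scheduling system. The function names and the `ignore_last_cycle_dry_run` flag, which stands in for the advanced option, are hypothetical.

```python
def is_dry_run(branch_selected, last_cycle_dry_run, ignore_last_cycle_dry_run):
    """Decide whether a descendant of a branch node is dry-run in the current cycle.

    branch_selected: True if the branch node chose this descendant in the current cycle.
    last_cycle_dry_run: True if the descendant's last-cycle instance was a dry run.
    ignore_last_cycle_dry_run: models the advanced option; True means the
        last-cycle dry-run state is not carried into the current cycle.
    """
    if not branch_selected:
        return True  # unselected branch: the instance succeeds without running
    if last_cycle_dry_run and not ignore_last_cycle_dry_run:
        return True  # dry-run state inherited through the dependency on the last cycle
    return False


def simulate(selections, ignore_last_cycle_dry_run):
    """Return the dry-run flag of the descendant for each cycle, in order."""
    results, last = [], False
    for selected in selections:
        last = is_dry_run(selected, last, ignore_last_cycle_dry_run)
        results.append(last)
    return results
```

In this model, if the branch node selects the descendant in cycles 1, 3, and 4 but not in cycle 2, the descendant stays dry-run from cycle 2 onward when the option is not selected. When the option is selected, the descendant is dry-run only in cycle 2.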

Typical scenarios of cross-cycle dependencies

Scenario 1
- Scenario description: Node A is scheduled to run by day. Node B is scheduled to run by hour. Node A depends on Node B. By default, Node A is run at the end of each day after Node B has been run 24 times. However, you want Node A to be run at 12:00 every day.
- Solution: When you configure Node B, select Cross-Cycle Dependencies and set Depend On to Instances of Current Node. When you configure Node A, set Run At to 12:00. Do not configure cross-cycle dependencies for Node A.
  This way, after an instance is generated and run for Node B at 12:00, the scheduling system runs Node A.
Scenario 2
- Scenario description: Node A is scheduled to run by day. Node B is scheduled to run by hour. Node A depends on the data that is generated by Node B on the previous day.
- Solution: When you configure Node A, select Cross-Cycle Dependencies, set Depend On to Instances of Custom Nodes, and then enter the ID of Node B.
Scenario 3
- Scenario description: Node A is scheduled to run by hour. Node B is scheduled to run by day. Node A depends on Node B. Node A has 24 cycles each day. After Node B is run on a day, the scheduling system starts to generate and run the 24 instances of Node A at the same time.
- Solution: When you configure Node A, select Cross-Cycle Dependencies and set Depend On to Instances of Current Node. This way, each instance of Node A waits for the previous instance to succeed, and the instances are run in sequence rather than at the same time.
Scenario 4
- Scenario description: A node depends on the data that is generated by the node in the last cycle. The time at which the data in the last cycle was generated needs to be determined.
- Solution: When you configure the node, select Cross-Cycle Dependencies and set Depend On to Instances of Current Node.
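
Scenario 1 can be illustrated with a small sketch of which hourly instances of Node B gate the daily Node A. The function `gating_instances` and its parameters are hypothetical, and the selection rule in the code is an assumption made for illustration: when the hourly node depends on its own last-cycle instance, the daily node is modeled as waiting only for the latest hourly instance that is scheduled at or before its own run time.

```python
def gating_instances(run_at_hour, upstream_self_dependent, upstream_hours=range(24)):
    """Return the hourly upstream instances that a daily node waits for.

    Assumption for illustration: if the hourly upstream node depends on its own
    last-cycle instance, the daily node waits only for the latest upstream
    instance scheduled at or before run_at_hour. Otherwise, it waits for all
    upstream instances of the day.
    """
    hours = list(upstream_hours)
    if not upstream_self_dependent:
        return hours
    due = [h for h in hours if h <= run_at_hour]
    return due[-1:]  # empty list if no upstream instance is due yet
```

Under this assumption, `gating_instances(12, True)` contains only the 12:00 instance of Node B. Because Node B depends on its own last-cycle instance, the 12:00 instance can succeed only after the earlier instances of the day have succeeded.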