DataWorks allows you to configure cross-cycle scheduling dependencies for nodes. With a cross-cycle scheduling dependency, the instance generated for a node in the current cycle depends on the instances generated for one or more specific nodes in the previous cycle, and can start to run only after those instances are successfully run. Typical use cases are the following: the instance generated for a node in the current cycle needs the data of an instance generated for another node on the previous day, or the instance generated for a node scheduled by hour or minute needs to depend on the instance generated for the same node in the previous cycle. This topic describes the types of cross-cycle scheduling dependencies and how to configure them for a node.
Precautions
When you configure cross-cycle scheduling dependencies, take note of the items described in the following table.
Item | Description | References |
---|---|---|
Display of cross-cycle scheduling dependencies | Cross-cycle scheduling dependencies are displayed as dashed lines in the directed acyclic graph (DAG) of a node. | Appendix: Use the features provided in a DAG |
Confirmation of the requirement of configuring same-cycle scheduling dependencies after cross-cycle scheduling dependencies are configured | After you configure scheduling dependencies for a node, the node can start to run only after all the ancestor nodes are successfully run. By default, the automatic parsing feature for same-cycle scheduling dependencies is enabled. If cross-cycle scheduling dependencies are configured for a node, you must check whether the node requires same-cycle scheduling dependencies. If the node does not require same-cycle scheduling dependencies, you must delete the automatically generated same-cycle scheduling dependencies to prevent the running of the node from being affected. | Delete scheduling dependencies |
Complex scenarios in which cross-cycle scheduling dependencies are required | In some complex scenarios, same-cycle scheduling dependencies may not be able to meet your business requirements. In this case, you can configure cross-cycle scheduling dependencies. For example, if a node scheduled by day depends on a node scheduled by hour, the instance generated for the node scheduled by day depends on all instances generated for the node scheduled by hour on the current day by default. You can configure the self-dependency for the node scheduled by hour. This way, the instance generated for the node scheduled by day can depend on the instance that is generated for the node scheduled by hour in a specific scheduling cycle. | Principles and samples of scheduling configurations in complex dependency scenarios |
Preview of scheduling dependencies of a node | To prevent an auto triggered node in the production environment from being delayed due to the scheduling dependencies that do not meet expectations, we recommend that you preview the scheduling dependencies of the node before you deploy the node to the production environment. This ensures that the instances generated for the auto triggered node can run as expected. | Preview scheduling dependencies of a node |
Node deployment | After you configure cross-cycle scheduling dependencies for a node, you must deploy the node and its ancestor nodes to the production environment. After the deployment is complete, you can view the cross-cycle scheduling dependencies in Operation Center in the production environment. | Deploy nodes |
Entry point for configuring cross-cycle scheduling dependencies
Types of cross-cycle scheduling dependencies
Type | Description | Scenario |
---|---|---|
Dependency on the instance generated for the current node in the previous cycle | The instance generated for a node in the current cycle can start to run only after the instance generated for the same node in the previous cycle is successfully run. | The instance generated for a node in the current cycle depends on the latest business data of the instance generated for the same node in the previous cycle. |
Dependency on the instances generated for the level-1 descendant nodes of a node in the previous cycle | The instance generated for a node in the current cycle can start to run only after the instances generated for the level-1 descendant nodes of the current node in the previous cycle are successfully run. | The instance generated for a node in the current cycle depends on whether the output table data of the current node in the previous cycle is cleansed by the instances generated for the level-1 descendant nodes of the current node in the previous cycle. |
Dependency on the instances generated for one or more specified nodes in the previous cycle | The instance generated for a node in the current cycle can start to run only after the instances generated for one or more specified nodes in the previous cycle are successfully run. | The instance generated for a node in the current cycle depends on the output table data of the instances generated for one or more other nodes in the previous cycle in the business logic but does not use the data in the code. |
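The three types in the preceding table differ only in which nodes' previous-cycle instances must succeed before the current-cycle instance can start. The following Python sketch models that rule. It is not a DataWorks API: the names `CrossCycleType`, `previous_cycle_dependencies`, and `can_start`, and the `children` mapping from a node to its level-1 descendant nodes, are hypothetical and exist only for illustration.

```python
from enum import Enum


class CrossCycleType(Enum):
    """Hypothetical labels for the three dependency types in the table."""
    SELF = "current node"
    LEVEL1_DESCENDANTS = "level-1 descendant nodes"
    SPECIFIED_NODES = "specified nodes"


def previous_cycle_dependencies(node, dep_type, children, specified=()):
    """Return the nodes whose previous-cycle instances must succeed first.

    children maps a node name to the names of its level-1 descendant nodes.
    specified is used only for the SPECIFIED_NODES type.
    """
    if dep_type is CrossCycleType.SELF:
        return [node]
    if dep_type is CrossCycleType.LEVEL1_DESCENDANTS:
        return list(children.get(node, []))
    if dep_type is CrossCycleType.SPECIFIED_NODES:
        return list(specified)
    raise ValueError(f"unknown dependency type: {dep_type!r}")


def can_start(node, dep_type, children, succeeded_prev_cycle, specified=()):
    """True if every required previous-cycle instance ran successfully."""
    required = previous_cycle_dependencies(node, dep_type, children, specified)
    return all(dep in succeeded_prev_cycle for dep in required)
```

For example, with `children = {"A": ["B", "C"]}`, node `A` configured with the level-1 descendant type cannot start in the current cycle until the previous-cycle instances of both `B` and `C` have succeeded.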
Dependency on the instance generated for the current node in the previous cycle
The instance generated for a node in the current cycle depends on the latest business data of the instance generated for the same node in the previous cycle. The following figure shows the configuration of the scheduling dependencies and the dependency relationship between instances.
- Node scheduled by hour or minute not configured with the self-dependency
If the node scheduled by hour or minute is not configured with the self-dependency, the instance generated for the node scheduled by day depends on all instances generated for the node scheduled by hour or minute on the current day. In this case, the node scheduled by day aggregates and processes all table data of all instances generated for the node scheduled by hour or minute on the current day.
- Node scheduled by hour or minute configured with the self-dependency
If the node scheduled by hour or minute is configured with the self-dependency, the instance generated for the node scheduled by day depends only on a specific instance generated for the node scheduled by hour or minute based on the principle of scheduling time proximity. That is, it depends on the instance whose scheduling time is closest to the scheduling time of the instance generated for the node scheduled by day.
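The difference between the two cases can be sketched in Python as follows. This is an illustrative model only, not DataWorks code. The function name `hourly_upstream_instances` is hypothetical, and the sketch assumes that "closest" means the smallest absolute difference between scheduling times on the same day, which may not match the exact rule that DataWorks applies.

```python
from datetime import datetime


def hourly_upstream_instances(daily_time, hourly_times, self_dependent):
    """Return the hourly scheduling times that a daily instance depends on.

    daily_time: scheduling time of the instance of the node scheduled by day.
    hourly_times: scheduling times of the instances of the hourly node.
    self_dependent: whether the hourly node has the self-dependency.
    """
    same_day = [t for t in hourly_times if t.date() == daily_time.date()]
    if not self_dependent:
        # Without the self-dependency: all instances of the current day.
        return same_day
    if not same_day:
        return []
    # With the self-dependency: only the instance closest in time
    # (assumption: smallest absolute difference; ties pick the earliest).
    return [min(same_day, key=lambda t: abs(t - daily_time))]


hourly = [datetime(2024, 1, 1, hour) for hour in range(24)]
daily = datetime(2024, 1, 1, 12, 10)

all_deps = hourly_upstream_instances(daily, hourly, self_dependent=False)
one_dep = hourly_upstream_instances(daily, hourly, self_dependent=True)
```

In this example, `all_deps` contains all 24 hourly instances of the day, whereas `one_dep` contains only the instance scheduled at 12:00.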
Dependency on the instances generated for the level-1 descendant nodes of a node in the previous cycle
If you configure this type of scheduling dependency for a node, the instance generated for the node in the current cycle can start to run only after the instances generated for the level-1 descendant nodes of the node in the previous cycle are successfully run.
Dependency on the instances generated for one or more specified nodes in the previous cycle
If you configure this type of scheduling dependency for a node, the instance generated for the node in the current cycle can start to run only after the instances generated for one or more specified nodes in the previous cycle are successfully run.
Passing of the dry-run attribute of an ancestor node
In most cases, if you want to use branch nodes, you must configure this setting.
- Entry point
You can set the Follow the upstream air running attribute parameter to No for a node. This way, all the instances generated for the node and descendant nodes of the node can normally run.
- Scenario
A node has multiple descendant nodes. When the node is run, the status of a descendant node is dry-run. If you configure the instance generated for the dry-run descendant node in the current cycle to depend on the instance generated for the dry-run descendant node in the previous cycle, the dry-run attribute of the node is passed to the descendant nodes of the dry-run node. In this case, all the instances generated for the dry-run node and the descendant nodes of the dry-run node are dry-run. If you do not want the dry-run attribute to be passed, you can set the Follow the upstream air running attribute parameter to No for the dry-run descendant node in the Dependencies section of the Properties tab.
- Example
- Assign_Node is an assignment node. Branch_Node is a branch node. Shell_Node1 and Shell_Node2 are the descendant nodes of Branch_Node. All these nodes are scheduled by day.
- Shell_Node1 is dry-run, and Shell_Node2 normally runs.
- The instance generated for Shell_Node1 in the current cycle is configured to depend on the instance generated for Shell_Node1 in the previous cycle.
- Shell_Node1 generates an auto triggered instance Shell_Node1' in the current cycle (T).
- Shell_Node1 generates an auto triggered instance Shell_Node1 in the previous cycle (T-1).
The Shell_Node1' instance depends on the Shell_Node1 instance. The dry-run attribute of the Shell_Node1 node is passed. Therefore, all the instances generated for the Shell_Node1 node and descendant nodes of the Shell_Node1 node are dry-run.
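The passing behavior in this example can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not the DataWorks scheduler. The function `resolve_dry_run`, the instance labels such as `Shell_Node1@T`, and the descendant `Downstream@T` are hypothetical. The model assumes that an instance becomes dry-run if any of its upstream instances is dry-run and the Follow the upstream air running attribute parameter is not set to No for it.

```python
def resolve_dry_run(order, upstream, initially_dry, follow_upstream):
    """Return the set of instances that end up dry-run.

    order: instance labels in topological order (upstream first).
    upstream: maps an instance to the instances it depends on,
        including cross-cycle dependencies.
    initially_dry: instances that are dry-run to begin with.
    follow_upstream: maps an instance to False if the Follow the upstream
        air running attribute parameter is set to No; defaults to True.
    """
    dry = set(initially_dry)
    for instance in order:
        if instance in dry:
            continue
        follows = follow_upstream.get(instance, True)
        if follows and any(dep in dry for dep in upstream.get(instance, [])):
            dry.add(instance)
    return dry


order = ["Shell_Node1@T-1", "Shell_Node1@T", "Downstream@T"]
upstream = {
    "Shell_Node1@T": ["Shell_Node1@T-1"],  # cross-cycle self-dependency
    "Downstream@T": ["Shell_Node1@T"],     # hypothetical descendant node
}

# Default: the dry-run attribute is passed along the dependency chain.
passed = resolve_dry_run(order, upstream, {"Shell_Node1@T-1"}, {})

# Parameter set to No for the current-cycle instance: passing stops.
blocked = resolve_dry_run(
    order, upstream, {"Shell_Node1@T-1"}, {"Shell_Node1@T": False}
)
```

In this model, `passed` contains all three instances, whereas `blocked` contains only `Shell_Node1@T-1`, so the current-cycle instance and its descendant run normally.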
Preview scheduling dependencies
After you configure scheduling dependencies for a node, you can preview the scheduling dependencies. For more information, see Subsequent steps: Check whether the scheduling dependencies meet your expectations.