DataWorks allows you to configure scheduling dependencies between tasks that are scheduled by minute, hour, day, week, month, or year. The number of scheduling cycles of a task varies based on the scheduling frequency of the task. An instance is generated for a task in each scheduling cycle. This topic describes the dependencies between auto triggered task instances that are generated for ancestor and descendant tasks with different scheduling frequencies.
Background information
In DataWorks, an auto triggered task generates instances based on its scheduling frequency and its number of scheduling cycles. For example, a task scheduled by hour generates one instance for each of its scheduling cycles every day. A task is run as an instance, so dependencies between auto triggered tasks are, in essence, dependencies between the instances that are generated for the tasks. The number of instances generated for ancestor and descendant auto triggered tasks, and the dependencies between those instances, vary based on the scheduling frequencies of the ancestor and descendant tasks.
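As a rough illustration of how scheduling cycles map to instances, the following Python sketch lists the scheduling times of a task scheduled by hour within one day. The function `hourly_schedule_times` and its parameters (start hour, end hour, and interval) are hypothetical names chosen for this example and are not part of any DataWorks API. The sketch only shows that one instance is expected for each scheduling cycle.

```python
from datetime import time


def hourly_schedule_times(start_hour: int, end_hour: int, interval_hours: int) -> list[time]:
    """Return one scheduling time per cycle for a task scheduled by hour.

    Hypothetical helper for illustration only: it assumes that the task runs
    every `interval_hours` hours from `start_hour` to `end_hour`, inclusive.
    """
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    return [time(hour=h) for h in range(start_hour, end_hour + 1, interval_hours)]


# A task that runs every 8 hours between 00:00 and 23:59 has three cycles,
# so three instances are expected for the day.
cycles = hourly_schedule_times(0, 23, 8)
print([t.strftime("%H:%M") for t in cycles])
```

In this sketch, the length of the returned list stands for the number of instances that the task generates per day.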
DataWorks supports various scheduling dependency scenarios. Depending on the scenario, you can configure same-cycle or previous-cycle scheduling dependencies between tasks. For more information, see Configure same-cycle scheduling dependencies and Configure cross-cycle scheduling dependencies.
Before you configure scheduling dependencies, you must take note of the items that are described in the following table.
No. | Description | References |
1 | DataWorks supports the following scheduling frequencies: minute, hour, day, week, month, and year. If the scheduling frequencies of ancestor and descendant tasks are different, DataWorks allows you to configure scheduling dependencies between the ancestor and descendant tasks based on the principle of scheduling time proximity. | Principle of scheduling time proximity for scheduling dependencies |
2 | After you configure scheduling dependencies between tasks in DataWorks, the dependencies between data of the tasks are established. Regardless of the scheduling time of a task, the task meets the conditions to run only after all its ancestor tasks finish running. | Impacts of dependencies between tasks on the running of the tasks |
3 | You can understand the principle of scheduling time proximity for scheduling dependencies based on sample scenarios. | |
Principle of scheduling time proximity for scheduling dependencies
In DataWorks, an instance is generated for an auto triggered task each time the task is scheduled to run, so a task typically has multiple instances. Because a descendant instance depends on an ancestor instance, the ancestor instance must be generated before the scheduling time of the descendant instance arrives.
In most cases, if you do not specify an instance on which the current instance depends, the dependencies for the current instance conform to the principle of scheduling time proximity. That is, the current instance depends on the instance whose scheduling time is the closest to but not later than that of the current instance and that is not an ancestor instance of other instances. The following table describes the dependency principles in different scenarios.
If the scheduling time of a task is earlier than that of its ancestor task, the task is not run at the scheduling time. The task can be scheduled to run only after the ancestor task finishes running.
Based on the principle of scheduling time proximity, if the ancestor task of a task does not have an instance whose scheduling time is earlier than that of the first instance that is generated for the task on the current day, the first instance of the task depends on the first instance that is generated for the ancestor task on the current day by default.
Scenario | Description | Diagram |
Dependency scenarios for tasks scheduled by hour and tasks scheduled by minute | Dependencies between tasks are relevant to the scheduling time of the instances generated for the tasks. | The following diagrams show scheduling dependencies between a task scheduled by hour and a task scheduled by minute in various scenarios. |
Dependency scenarios for tasks scheduled by hour and tasks scheduled by minute | Dependencies between tasks are irrelevant to the scheduling time of the instances generated for the tasks. In the scenario where a task scheduled by hour depends on another task scheduled by hour or a task scheduled by minute depends on another task scheduled by minute, one-to-one mappings are established between the ancestor and descendant instances if the numbers of the scheduling cycles (instances generated on the current day) for both the ancestor and descendant tasks are the same. | |
Scenario where a task scheduled by day depends on a task scheduled by hour or minute | | |
For more information about the dependencies and running situations of tasks in various dependency scenarios, see Appendix: Complex dependency scenarios.
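The principle of scheduling time proximity can be sketched in Python as follows. This is a simplified reading of the rule under stated assumptions, not the DataWorks implementation. The function `nearest_ancestor` is a hypothetical name. It selects the ancestor scheduling time that is the closest to but not later than the descendant scheduling time, and it falls back to the first ancestor instance of the day if no such time exists. The condition that the selected instance is not an ancestor instance of other instances is deliberately not modeled.

```python
from datetime import time


def nearest_ancestor(descendant_times: list[time], ancestor_times: list[time]) -> dict[time, time]:
    """Map each descendant scheduling time to one ancestor scheduling time.

    Simplified sketch (assumptions, not the DataWorks implementation):
    - choose the ancestor time closest to, but not later than, the descendant time;
    - if no such ancestor time exists, fall back to the first ancestor time of the day.
    The "not an ancestor instance of other instances" condition is not modeled.
    """
    if not ancestor_times:
        raise ValueError("the ancestor task has no instances")
    ancestors = sorted(ancestor_times)
    mapping = {}
    for d in sorted(descendant_times):
        not_later = [a for a in ancestors if a <= d]
        # The last element of the sorted candidates is the closest one.
        mapping[d] = not_later[-1] if not_later else ancestors[0]
    return mapping


# Ancestor task scheduled by day at 07:00;
# descendant task scheduled by hour at 00:00, 08:00, and 16:00.
mapping = nearest_ancestor([time(0), time(8), time(16)], [time(7)])
for descendant, ancestor in mapping.items():
    print(descendant.strftime("%H:%M"), "depends on", ancestor.strftime("%H:%M"))
```

In this example, the 00:00 instance has no ancestor instance with an earlier scheduling time, so the fallback applies and it depends on the 07:00 instance.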
Impacts of dependencies between tasks on the running of the tasks
After you configure dependencies between tasks, a descendant task cannot start to run, even when its scheduling time arrives, unless its ancestor task is in the Successful state.
For example, Task B scheduled by hour depends on Task A scheduled by day.
Task A: The scheduling time is 07:00.
Task B: The scheduling time is 00:00, 08:00, and 16:00.
If Task A does not finish running, Task B is not scheduled to run when the scheduling time 00:00 of Task B arrives. The earliest time at which Task B is actually run is 07:00.
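The effect of the dependency on the actual run time can be sketched as follows. This is a minimal Python sketch, not DataWorks code. The function `earliest_start` is a hypothetical name, the finish time of Task A (07:10) is an assumed value, and the sketch assumes that every instance of Task B on the day depends on the single instance of Task A. Resource availability and other scheduling factors are ignored.

```python
from datetime import datetime


def earliest_start(scheduled_at: datetime, ancestor_finish_times: list[datetime]) -> datetime:
    """Return the earliest time at which a descendant instance can start.

    The instance can start only when its own scheduling time has arrived
    and all of its ancestor instances have finished running.
    """
    return max([scheduled_at, *ancestor_finish_times])


day = datetime(2024, 1, 1)
# Assumption: Task A is scheduled at 07:00 and finishes at 07:10.
task_a_finished = day.replace(hour=7, minute=10)
# Task B is scheduled at 00:00, 08:00, and 16:00.
task_b_times = [day.replace(hour=h) for h in (0, 8, 16)]

actual = [earliest_start(t, [task_a_finished]) for t in task_b_times]
for scheduled, started in zip(task_b_times, actual):
    print(scheduled.strftime("%H:%M"), "->", started.strftime("%H:%M"))
```

Under these assumptions, the 00:00 instance of Task B is delayed until Task A finishes, and the 08:00 and 16:00 instances start at their own scheduling times.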
Appendix: Complex dependency scenarios
The following tables describe the dependencies and running situations of tasks with different scheduling frequencies in various dependency scenarios.
If the scheduling time of a task is earlier than that of its ancestor task, the task is not run at the scheduling time. The task can be scheduled to run only after the ancestor task finishes running.
Based on the principle of scheduling time proximity, if the ancestor task of a task does not have an instance whose scheduling time is earlier than that of the first instance that is generated for the task on the current day, the first instance of the task depends on the first instance that is generated for the ancestor task on the current day by default.
Dependencies for tasks scheduled by hour
Dependency scenario | Description | Diagram |
Scenario where a task scheduled by hour depends on another task scheduled by hour | | |
Scenario where a task scheduled by hour depends on a task scheduled by day | | |
Scenario where a task scheduled by hour depends on a task scheduled by minute | | |
Dependencies for tasks scheduled by day
Dependency scenario | Description | Diagram |
Scenario where a task scheduled by day depends on another task scheduled by day in the same scheduling cycle | | |
Scenario where a task scheduled by day depends on a task scheduled by hour of the current day | | |
Scenario where a task scheduled by day depends on a task scheduled by hour or minute on the previous day | | The following diagram provides examples of how a task scheduled by day depends on a task scheduled by hour on the previous day. |
Dependencies for tasks scheduled by minute
Dependency scenario | Description | Diagram |
Scenario where a task scheduled by minute depends on a task scheduled by hour | | |
Scenario where a task scheduled by minute depends on a task scheduled by day | | |
Dependencies on tasks scheduled by week, month, or year
If a task scheduled by day, hour, or minute depends on a task scheduled by week, month, or year, dry-run instances are generated for the task scheduled by week, month, or year for the periods that fall outside its scheduling time. The dry-run instances do not generate data, occupy resources, or block descendant tasks from running.
Sample scenario in which a task scheduled by day depends on a task that is scheduled by week and for which the self-dependency is not configured:
The task scheduled by week is scheduled to run every Monday and Friday. Dry-run instances are generated for the task every Tuesday, Wednesday, Thursday, Saturday, and Sunday. When the scheduling time of the dry-run instances arrives, the status of the instances is directly set to successful, but the code of the instances is not run. Dry-run instances do not affect the normal running of descendant instances.
An instance is generated for the task scheduled by day every day. Each of these instances depends on the instance that is generated on the same day for the task scheduled by week, including the dry-run instances. The instance of the task scheduled by day can be scheduled to run after the instance of the task scheduled by week for that day is successfully run.
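The sample scenario can be sketched in Python as follows. This is an illustrative model only. The names `RUN_WEEKDAYS` and `weekly_instance_kind` are hypothetical and do not correspond to any DataWorks API. The sketch classifies the instance that is generated for the task scheduled by week on each day of one week as a normal instance or a dry-run instance.

```python
from datetime import date, timedelta

# Weekday indexes follow date.weekday(): Monday is 0 and Sunday is 6.
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
RUN_WEEKDAYS = {0, 4}  # The task scheduled by week runs every Monday and Friday.


def weekly_instance_kind(day: date, run_weekdays: set[int] = RUN_WEEKDAYS) -> str:
    """Return "normal" on a scheduled weekday and "dry-run" on any other day."""
    return "normal" if day.weekday() in run_weekdays else "dry-run"


start = date(2024, 1, 1)  # A Monday.
week = [start + timedelta(days=i) for i in range(7)]
kinds = {DAY_NAMES[d.weekday()]: weekly_instance_kind(d) for d in week}

# The instance of the task scheduled by day depends on the instance of the
# task scheduled by week for the same day, whether normal or dry-run.
for name, kind in kinds.items():
    print(name, kind)
```

In this model, both kinds of instances end in the successful state, so the instance of the task scheduled by day is not blocked on any day of the week.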