You can configure time properties for a node to determine how the node is scheduled to run in the production environment after you commit and deploy it. In the Schedule section of the Properties tab, you can configure parameters such as Instance Generation Mode, Scheduling Cycle, Scheduled time, Rerun, and Timeout definition. This topic describes how to configure time properties for a node.
Configuration guide
To configure time properties for a node, you need to perform the following operations: Go to the DataStudio page, double-click the name of a node in the Business Flow section in the Scheduled Workflow pane to go to the configuration tab of the node, and then click Properties in the right-side navigation pane of the configuration tab. In the Schedule section of the Properties tab, configure time properties for the node.
This topic describes how to configure the scheduling properties of a single node on the Properties tab. To modify the scheduling properties of multiple nodes at the same time, such as the scheduling time or the resource groups for scheduling, you can use the batch operation feature.
The following table describes the parameters that are related to the time properties of a node.
Parameter | Description |
Instance Generation Mode | The mode in which instances generated for the node take effect in the production environment. |
Recurrence | The mode in which the node is run in the production environment. |
Scheduling Calendar | The scheduling dates and the scheduling type of the node. |
Scheduling Cycle | The scheduling frequency of the node. This parameter determines the number of instances generated for the node and the time at which the instances are run in the production environment. |
Timeout definition | The timeout period for the node. If the period of time for which the node is run exceeds the specified timeout period, the node fails. |
Rerun | Specifies whether to rerun the node and the conditions in which the node can be rerun. When you configure this parameter, make sure that the data idempotence of the node is not affected. |
Validity period | The period of time during which the node is automatically scheduled to run. The node is not automatically run in the period of time that falls out of the specified time range. |
Usage notes
The time properties of a node define only the time when you want to schedule the node. Whether the node is run and the actual time when the node is run are determined by multiple factors. The factors include but are not limited to the following items:
Control of the scheduling switch
The node can be automatically scheduled based on its scheduling properties only if Periodic scheduling is turned on for the workspace to which the node belongs. You can turn on the switch on the Scheduling Settings tab of the Settings page in DataStudio for a workspace. For more information, see Configure scheduling settings.
Impacts of scheduling dependencies of the node on the node execution time
The scheduling time that you specify for a node takes effect only on the node. The actual time at which the node is run is related to the scheduling time of the ancestor nodes of the node. The node can start to run only if the scheduling time of the ancestor nodes arrives and the ancestor nodes are successfully run. The same logic applies to a node whose scheduling time is earlier than the scheduling time of the ancestor nodes of the node. For more information, see Impacts of dependencies between nodes on the running of the nodes.
Impacts of resource groups required to run the node on the node execution time
The running of the node is determined by not only the scheduling time of the ancestor nodes of the node and whether the ancestor nodes are successfully run, but also the resource groups that are required to run the node. Whether the resources that are required to run the node are sufficient at the scheduling time of the node also affects the running of the node. For more information, see Node execution mechanisms.
Impacts of environments
Only nodes that are deployed to the production environment can be automatically scheduled to run. If you want nodes to be periodically scheduled, you must deploy the nodes to the production environment. Nodes in the development environment cannot be automatically scheduled.
Ways in which the node is run
In DataWorks, an auto triggered node generates instances based on the scheduling frequency and the number of scheduling cycles of the node. For example, the number of instances generated for a node scheduled by hour every day is the same as the number of scheduling cycles of the node every day. The node is run as an instance.
Scheduling time
During peak hours such as early mornings, all nodes (including the root node of the current workspace) for which the scheduling time is set to 00:00 will be scheduled to run within the time range from 00:00 to 00:05.
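The dependency rule in the usage notes above can be illustrated with a minimal Python sketch. It is not DataWorks code: the function name earliest_start and its parameters are hypothetical, and it deliberately ignores resource group availability and the workspace scheduling switch, which also affect the actual run time.

```python
from datetime import datetime
from typing import Iterable, Optional


def earliest_start(
    scheduled_time: datetime,
    ancestor_finish_times: Iterable[Optional[datetime]],
) -> Optional[datetime]:
    """Return the earliest moment a node could start, or None if it is blocked.

    ancestor_finish_times holds the time at which each ancestor instance
    finished successfully. None means that the ancestor has not succeeded.
    (Hypothetical model; resource availability is not considered.)
    """
    finishes = list(ancestor_finish_times)
    if any(finish is None for finish in finishes):
        # At least one ancestor has not run successfully, so the node waits.
        return None
    # The node needs its own scheduling time to arrive AND all ancestors to be done.
    return max([scheduled_time, *finishes])
```

For example, under this model a node scheduled at 00:00 whose only ancestor finishes at 02:15 cannot start before 02:15, even though its own scheduling time is earlier.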
Modes in which instances take effect
After you commit and deploy an auto triggered node to the scheduling system in the production environment, instances that can be automatically scheduled are generated for the node based on the value of the Instance Generation Mode parameter. Regardless of the value of the Instance Generation Mode parameter, you can view the latest scheduling dependencies of the node on the Auto Triggered Tasks page in Operation Center. The value of the Instance Generation Mode parameter can be Next Day or Immediately After Deployment. The time when the generated instances take effect and the time when the scheduling dependencies of the node are updated are determined by the value of the Instance Generation Mode parameter. The following table describes the valid values of the Instance Generation Mode parameter.
Regardless of the value of the Instance Generation Mode parameter, the changes that are made during the period of 23:30 to 24:00 will take effect on the third day after the node is deployed to the production environment. We recommend that you do not make task changes during this period.
Value of the Instance Generation Mode parameter | Description |
Next Day | Instances generated for a node are automatically scheduled on the next day after you deploy the node to the production environment. You can view the status of the instances on the Auto Triggered Instances page in Operation Center. If you want to run the node on the day when you deploy the node, you can use the data backfill feature for the node. If you select the previous day as the data timestamp when you configure settings related to data backfill for the node, data backfill instances generated for the node are run in the same manner as the instances that are scheduled to run on the current day. |
Immediately After Deployment | Instances generated for a node are automatically scheduled on the day when you deploy the node to the production environment. You can view the status of the instances on the Auto Triggered Instances page in Operation Center. Note Instances generated for a node can be normally run only if the scheduling time of the node is later than the time when the node is deployed. If you set the scheduling time to a point in time in the past, a dry run is performed on the instances and no data is generated. To ensure that the instances are normally run, you must make sure that the scheduling time of the node that generates the instances is at least 10 minutes later than the time when you deploy the node. |
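As a rough illustration of the Immediately After Deployment rules, the following Python sketch classifies a same-day instance by comparing the deployment time with the scheduling time. The function names and the "not_guaranteed" label are hypothetical. Treating a scheduling time equal to the deployment time as a dry run is also an assumption, because the description above only covers times in the past.

```python
from datetime import datetime, time, timedelta

# The description recommends a margin of at least 10 minutes.
SAFETY_MARGIN = timedelta(minutes=10)


def same_day_outcome(deployed_at: datetime, scheduled_at: datetime) -> str:
    """Classify a same-day instance under Immediately After Deployment."""
    if scheduled_at <= deployed_at:
        # The scheduling time has already passed: dry run, no data generated.
        return "dry_run"
    if scheduled_at - deployed_at >= SAFETY_MARGIN:
        return "normal"
    # Later than the deployment, but inside the 10-minute margin.
    return "not_guaranteed"


def in_risky_window(deployed_at: datetime) -> bool:
    """Return True if the change is made between 23:30 and 24:00."""
    return deployed_at.time() >= time(23, 30)
```

A deployment at 09:00 with a scheduling time of 09:10 is classified as "normal", whereas a scheduling time of 08:00 on the same day is classified as "dry_run".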
Scheduling types
The following table describes the valid values of the Recurrence parameter.
Value | Description | Scenario |
Normal | If you set the Recurrence parameter to Normal, the node is run and generates data based on the settings of the scheduling cycle and scheduling time. After the node is run as expected, the descendant nodes of the node are also triggered and run. By default, the Recurrence parameter is set to Normal. | You want a node and the instances that are generated for this node to be run as expected. |
Skip Execution | If you set the Recurrence parameter to Skip Execution, the node is scheduled based on the settings of the scheduling cycle and scheduling time. However, the status of the node becomes frozen and the node generates no data. When the node is scheduled, the system directly returns a failure response and the descendant nodes cannot be run. Note In Operation Center, an icon is displayed next to the name of a frozen node. | You want to freeze a node and the instances generated for the node. In this case, the current node and its descendant nodes cannot be run. If you do not need to run a workflow within a specified period of time, you can freeze the root node of the workflow in that period of time based on your business requirements. You can also unfreeze the root node to resume the workflow based on your business requirements. For information about how to unfreeze a node, see Node freezing and unfreezing. |
Dry Run | If you set the Recurrence parameter to Dry Run, the node is scheduled based on the settings of the scheduling cycle and scheduling time. However, a dry run is performed on the node and the node generates no data. When the node is scheduled, the scheduling system directly returns a success response, and the running duration is 0 seconds. | You want to suspend a node in a period of time and want the descendant nodes of the node to be run as expected. |
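The three Recurrence values differ in whether data is generated, which response the scheduling system returns, and whether descendant nodes can run. The following Python sketch records those differences as a lookup table. The type and field names are hypothetical, and the Normal entry assumes that the code of the node itself runs successfully.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Recurrence(Enum):
    NORMAL = "Normal"
    SKIP_EXECUTION = "Skip Execution"
    DRY_RUN = "Dry Run"


class Outcome(NamedTuple):
    generates_data: bool
    reported_status: str       # response returned by the scheduling system
    descendants_can_run: bool


# Summary of the table above (hypothetical structure, not a DataWorks API).
OUTCOMES: Dict[Recurrence, Outcome] = {
    # Assumes that the node code itself succeeds.
    Recurrence.NORMAL: Outcome(True, "success", True),
    # Frozen node: a failure response is returned and descendants are blocked.
    Recurrence.SKIP_EXECUTION: Outcome(False, "failure", False),
    # Dry run: a success response is returned and descendants are not blocked.
    Recurrence.DRY_RUN: Outcome(False, "success", True),
}
```

The lookup makes the practical difference explicit: both Skip Execution and Dry Run produce no data, but only Skip Execution blocks the descendant nodes.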
Scheduling calendar
The scheduling calendar feature is used to define the scheduling dates and the scheduling type of a node. Valid values for the Scheduling Calendar parameter in the Schedule section of the Properties tab in DataStudio:
Default Calendar: the calendar that is provided by DataWorks and is suitable for common scenarios.
Customize Calendar: the calendar that is configured by users and is suitable for industries and scenarios that require flexible scheduling dates, such as the financial industry. To configure a scheduling calendar for a node, you need to specify the items such as the workspaces to which the scheduling calendar can be applied, the validity period of the scheduling calendar, and the scheduling type of the node on a specific date. For more information, see Configure a scheduling calendar.
You can schedule the node at the specified point in time based on the selected scheduling calendar and other scheduling settings such as the scheduling type and scheduling frequency.
Scheduling frequency
The scheduling frequency of a node determines the number of cycles that the node is automatically run in the scheduling scenario. A scheduling frequency is used to define the interval at which the code logic of a node is actually executed in the scheduling system in the production environment. The node generates instances based on the scheduling frequency and the number of scheduling cycles of the node. For example, the number of instances generated for a node scheduled by hour every day is the same as the number of scheduling cycles of the node every day. The node is run as an instance.
The scheduling frequency of a node is unrelated to the scheduling frequencies of the ancestor nodes of the node.
The interval at which the node is scheduled is related to the scheduling frequency of the node and is unrelated to the scheduling frequencies of the ancestor nodes of the node.
DataWorks allows you to configure scheduling dependencies between nodes whose scheduling frequencies are different.
In DataWorks, an auto triggered node generates instances based on the scheduling frequency and the number of scheduling cycles of the node. For example, the number of instances generated for a node scheduled by hour every day is the same as the number of scheduling cycles of the node every day. The node is run as an instance. In essence, dependencies between auto triggered nodes are dependencies between instances that are generated for the nodes. The number of instances generated for ancestor and descendant auto triggered nodes and dependencies between the instances vary based on the scheduling frequencies of the ancestor and descendant nodes. For information about scheduling dependencies between nodes whose scheduling frequencies are different, see Principles and samples of scheduling configurations in complex dependency scenarios.
Dry-run instances are generated for a node on the days when the node is not scheduled to run.
For a node that is not scheduled to run every day, such as a node scheduled by week or month, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results when the scheduling time of the node arrives on these days. This way, if a node scheduled by day depends on the node scheduled by week or month, the node scheduled by day can be run as expected. In this case, a dry run is performed on the node scheduled by week or month, but the node scheduled by day is run as scheduled.
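The behavior described above for a node scheduled by week can be sketched in Python as follows. The function weekly_instances and its parameters are hypothetical. The sketch only shows that one instance exists for every calendar day and that the instances on non-scheduled days are dry-run instances.

```python
from datetime import date, timedelta
from typing import List, Set, Tuple


def weekly_instances(
    start: date, days: int, run_weekdays: Set[int]
) -> List[Tuple[date, str]]:
    """Return one (day, kind) pair per calendar day.

    run_weekdays uses the numbering of date.weekday(): Monday is 0, Sunday is 6.
    """
    plan = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Days outside the configured weekdays still get an instance,
        # but it is a dry-run instance that returns success without data.
        kind = "normal" if day.weekday() in run_weekdays else "dry_run"
        plan.append((day, kind))
    return plan
```

For a node that runs only on Fridays, weekly_instances(date(2024, 1, 1), 7, {4}) yields seven instances, of which only the one on January 5, 2024 is a normal instance. A node scheduled by day that depends on this node therefore finds a successful ancestor instance on every day.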
Node execution time
You can specify only the time when you want to schedule a node. The actual time when the node is run is affected by multiple factors, such as the scheduling time of the ancestor nodes of the node, the resources required to run the node, and the conditions for running the node. For more information, see Node execution conditions.
The following table describes the valid values of the Scheduling Cycle parameter.
Value of the Scheduling Cycle parameter | Description | Sample configuration in a typical scenario |
Minute | If a node is scheduled by minute, the node is automatically run once at each specified interval of minutes. | The node is run once every 30 minutes. |
Hour | If a node is scheduled by hour, the node is automatically run once at each specified interval of hours. | The node is run once every hour. |
Day | If a node is scheduled by day, the node is automatically run at a specified point in time every day. If you create an auto triggered node that is scheduled by day, the node is scheduled to run at 00:00 every day by default. You can change the scheduling time of the node based on your business requirements. | The node is run at 00:00 every day. |
Week | If a node is scheduled by week, the node is automatically run at a specified point in time on specific days every week. Important For a node that is not scheduled every day, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results but do not generate data. | The node is run at 12:00 every Friday. |
Month | If a node is scheduled by month, the node is automatically run at a specified point in time on specific days every month. Important For a node that is not scheduled every day, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results but do not generate data. | The node is run at a specified point in time on the last day of every month. |
Year | If a node is scheduled by year, the node is automatically run at a specified point in time on specific days every year. Important For a node that is not scheduled every day, DataWorks generates dry-run instances for the node on the days when it is not scheduled to run. The dry-run instances return success results but do not generate data. | The node is run at a specified point in time on the last day of the first month in every quarter. |
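Because the number of instances generated per day equals the number of scheduling cycles per day, the instance count for a node scheduled by minute or hour can be estimated with a short Python sketch. The function daily_run_times and its start, end, and interval_minutes parameters are hypothetical and do not correspond to documented console options. The daily time window is an assumption made for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def daily_run_times(start: str, end: str, interval_minutes: int) -> List[str]:
    """List the HH:MM scheduling times of one day for a minute or hour schedule."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    current = datetime.strptime(start, "%H:%M")
    last = datetime.strptime(end, "%H:%M")
    times = []
    # Each scheduling cycle inside the window produces one instance.
    while current <= last:
        times.append(current.strftime("%H:%M"))
        current += timedelta(minutes=interval_minutes)
    return times
```

Under these assumptions, a node that runs every 30 minutes between 00:00 and 23:59 has 48 scheduling cycles and therefore 48 instances per day, and a node that runs every hour in the same window has 24.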
Timeout period
You can use the Timeout definition parameter to specify a timeout period for a node. If the period of time for which the node is run exceeds the specified timeout period, the node fails. Take note of the following items when you use this parameter:
The timeout period applies to auto triggered node instances, data backfill instances, and test instances.
The default timeout period ranges from 72 hours to 168 hours. The system automatically adjusts the default timeout period for a node based on system loads.
You can customize a timeout period, but it cannot exceed 168 hours.
Rerun properties
In the Schedule section, you can configure the conditions, interval, and number of times for rerunning a node.
When you configure rerun properties for a node, make sure that the data idempotence of the node is not affected based on your business requirements. This helps prevent data quality issues after a failed node is rerun. For example, when you create and develop an ODPS SQL node, you can replace the INSERT INTO statement with the INSERT OVERWRITE statement.
You can go to the Scheduling Settings tab of the Settings page in DataStudio to configure default scheduling settings for nodes to be created. For more information, see Configure scheduling settings.
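The reason why INSERT OVERWRITE protects idempotence on reruns can be illustrated with an in-memory Python analogy. This is not MaxCompute code: the helper functions and the partition name ds=20240101 are hypothetical, and a table is modeled as a dictionary from partition name to rows.

```python
from typing import Dict, List

Table = Dict[str, List[str]]  # partition name -> rows


def insert_into(table: Table, partition: str, rows: List[str]) -> None:
    """Append rows, like INSERT INTO: a rerun adds the same rows again."""
    table.setdefault(partition, []).extend(rows)


def insert_overwrite(table: Table, partition: str, rows: List[str]) -> None:
    """Replace the rows of the partition, like INSERT OVERWRITE."""
    table[partition] = list(rows)


appended: Table = {}
overwritten: Table = {}
for _attempt in range(2):  # the first run plus one rerun
    insert_into(appended, "ds=20240101", ["a", "b"])
    insert_overwrite(overwritten, "ds=20240101", ["a", "b"])
```

After the rerun, the appended table holds the rows twice, whereas the overwritten table holds them once. A node whose output behaves like the second table can be rerun without affecting data quality.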
Rerun
The following table describes the valid values of the Rerun parameter.
Note You can click Modify Default Settings next to the Rerun parameter to go to the Scheduling Settings tab.
Value | Scenario |
Allow Regardless of Running Status | If the data idempotence of a node is not affected after the node is rerun multiple times, you can set the Rerun parameter to this value. |
Allow upon Failure Only | If the rerun of a failed node does not affect the data idempotence but the rerun of a successful node does, you can set the Rerun parameter to this value. |
Disallow Regardless of Running Status | If the data idempotence of a node cannot be ensured after the node is rerun, you can set the Rerun parameter to this value. |
Note If you set the Rerun parameter to Disallow Regardless of Running Status, the system does not automatically rerun the node after the system recovers from an exception.
The Auto Rerun upon Failure parameter is not displayed if you set the Rerun parameter to Disallow Regardless of Running Status.
Auto Rerun upon Failure
The following table describes the parameters you must configure if you allow automatic reruns after an error occurs.
Parameter | Description |
Number of re-runs | The default number of times that an auto triggered node is rerun after it fails to run as scheduled. Valid values: 1 to 10. The value 1 indicates that the node is rerun once after it fails to run as expected. The value 10 indicates that the node is rerun ten times after it fails to run as expected. You can configure this parameter based on your business requirements. |
Rerun interval | The interval at which a node is rerun after it fails to run as scheduled. You can configure this parameter based on your requirements. Valid values: 1 to 30. Default value: 30. Unit: minutes. |
Note The Auto Rerun upon Failure parameter is not displayed if you set the Rerun parameter to Disallow Regardless of Running Status. In this case, the node is not allowed to rerun after it fails to run as scheduled.
You can specify the default number of reruns and default rerun interval for the nodes in a workspace on the Scheduling Settings tab. For more information, see Configure scheduling settings.
The automatic rerun feature does not take effect if a node fails because the timeout period is exceeded.
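The rerun rules in this section can be combined into one decision, as in the following Python sketch. The class RerunSettings and the function should_auto_rerun are hypothetical names used only for illustration. The value ranges and the timeout exception come from the descriptions above.

```python
from dataclasses import dataclass

RERUN_MODES = {
    "Allow Regardless of Running Status",
    "Allow upon Failure Only",
    "Disallow Regardless of Running Status",
}


@dataclass
class RerunSettings:
    rerun_mode: str
    max_reruns: int              # Number of re-runs: 1 to 10
    interval_minutes: int = 30   # Rerun interval: 1 to 30, default 30

    def __post_init__(self) -> None:
        if self.rerun_mode not in RERUN_MODES:
            raise ValueError(f"unknown rerun mode: {self.rerun_mode}")
        if not 1 <= self.max_reruns <= 10:
            raise ValueError("max_reruns must be between 1 and 10")
        if not 1 <= self.interval_minutes <= 30:
            raise ValueError("interval_minutes must be between 1 and 30")


def should_auto_rerun(
    settings: RerunSettings, failed_by_timeout: bool, reruns_done: int
) -> bool:
    """Decide whether a failed instance is rerun automatically."""
    if settings.rerun_mode == "Disallow Regardless of Running Status":
        return False
    if failed_by_timeout:
        # Automatic reruns do not take effect after a timeout failure.
        return False
    return reruns_done < settings.max_reruns
```

With RerunSettings("Allow upon Failure Only", max_reruns=2), a node that fails for a reason other than a timeout is rerun up to two times, with 30 minutes between attempts by default.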
Validity period
You can specify a validity period during which a node is automatically run as scheduled. The node is not automatically run in the period of time that falls out of the specified time range. Nodes whose validity period expires are expired nodes. You can view the number of expired nodes on the O&M Dashboard page of Operation Center and undeploy the nodes based on your requirements.
Appendix: Description of the dry-run property
For a node that is scheduled by week, month, or year, the scheduling system runs the node at the scheduling time every day. On the days that are not specified to run the node, a dry run is performed on the node and the node generates no data. The following descriptions provide the effects of a dry run:
The scheduling system directly returns a success response, and the running duration is 0 seconds.
No run logs are generated.
The running of descendant nodes is not affected.
No resources are occupied.