This topic describes how to run a node, perform O&M operations on the node, and troubleshoot errors that occur on the node based on logs.
Background information
For example, when you configure the scheduling cycle and scheduling dependencies for a batch synchronization node, you set the node to run at 02:00 every Tuesday. After you commit the node, you need to wait until the next day to view the automatic execution result of the node. DataWorks allows you to run the node in the following modes: test run, data backfilling, and periodic run. This way, you can check whether the scheduling time of each instance that is generated for the node, the scheduling dependencies of the node, and the generated data meet your business requirements.
Test run: Nodes are manually triggered. We recommend that you use this mode if you only need to check the scheduling time and the running of a single node. For more information, see Test an auto triggered node and view test instances generated for the node.
Data backfilling: Nodes are manually triggered. We recommend that you use this mode if you need to check the scheduling time of multiple nodes and the scheduling dependencies among the nodes, or if you need to rerun data analysis and computing from a specific root node. For more information, see Backfill data for an auto triggered node and view data backfill instances generated for the node.
Periodic run: Nodes are automatically triggered. After you commit a node, the scheduling system automatically runs instances for the node from 00:00 the next day. When the scheduling time of each instance arrives, the scheduling system checks whether the ancestor instances of the instance have been successfully run. If all the ancestor instances have been successfully run, the scheduling system runs the instance without manual intervention. For more information, see View auto triggered node instances.
The scheduling system generates instances for manually triggered nodes and auto triggered nodes based on the same rules.
The scheduling system generates instances for each node every day regardless of whether the node is scheduled to run by minute, hour, day, week, or month.
The scheduling system runs the instances that are generated for a node only on the days when the scheduling time of the instances arrives. In this case, run logs are generated for the instances.
The scheduling system does not run the instances that are generated for a node on other dates. Instead, the scheduling system changes the status of the instances to successful when the conditions for running the instances are met. No run logs are generated for the instances.
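The instance generation rules above can be illustrated with a minimal Python sketch. It is not a DataWorks API: the function name `plan_week` and the labels `"run"` and `"mark successful"` are hypothetical, and the sketch assumes a node that is scheduled weekly on Tuesday. It shows that one instance exists for every day, but only the instance on the scheduled day is actually run.

```python
from datetime import date, timedelta

# Hypothetical illustration only; none of these names come from DataWorks.
TUESDAY = 1  # date.weekday() numbers Monday as 0, so Tuesday is 1


def plan_week(start: date, scheduled_weekday: int = TUESDAY) -> list[tuple[date, str]]:
    """Return one (day, behavior) pair for each of the seven days from start."""
    plan = []
    for offset in range(7):
        day = start + timedelta(days=offset)
        if day.weekday() == scheduled_weekday:
            # The scheduling time arrives on this day: the instance is run
            # and run logs are generated.
            behavior = "run"
        else:
            # An instance still exists for this day, but it is only set to
            # successful and no run logs are generated.
            behavior = "mark successful"
        plan.append((day, behavior))
    return plan


# September 11, 2017 is a Monday, so the week contains exactly one Tuesday.
week = plan_week(date(2017, 9, 11))
for day, behavior in week:
    print(day.isoformat(), behavior)
```

In this sketch, only the entry for September 12, 2017 has the `"run"` behavior. The other six instances are generated but not executed.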
For information about node O&M, see Operation Center overview.
Perform a test run
In the upper-left corner of the current page, click the icon and go to the Operation Center page.
In the left-side navigation pane of the Operation Center page, open the page that lists auto triggered nodes.
On the page that appears, find the desired node and click Test in the Actions column.
In the Test dialog box, configure the Test Name and Data Timestamp parameters and click OK.
On the Test Instance page, click the name of the generated instance. The directed acyclic graph (DAG) of the instance appears on the right side.
In the DAG, you can view the scheduling dependencies and details of the instance, and perform operations on the instance. For example, you can right-click the instance and select Stop or Rerun to stop or rerun the instance.
Note: In test run mode, a node is manually triggered. When the scheduling time arrives, the scheduling system immediately runs the instance that is generated for the node, regardless of whether the ancestor instances of the instance have been successfully run.
For example, a synchronization node is configured to run at 02:00 every Tuesday. The data timestamp of an instance is one day earlier than the date on which the instance is scheduled to run. Based on the instance generation rules described earlier in this topic, if you set the data timestamp to a Monday for a test run, the scheduling system runs the instance that is generated for the synchronization node at 02:00 on Tuesday. If you set the data timestamp to a day other than Monday, the scheduling system changes the status of the instance to successful at 02:00 on Tuesday with no run logs generated.
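The relationship between the data timestamp and the scheduled run date in the example above can be sketched as follows. This is a hedged illustration, not DataWorks code: the function names `scheduled_date` and `outcome_for_data_timestamp` and the returned strings are assumptions made for this example, and the node is assumed to run weekly on Tuesday.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday() numbers Monday as 0, so Tuesday is 1


def scheduled_date(data_timestamp: date) -> date:
    """The data timestamp is one day earlier than the scheduled run date."""
    return data_timestamp + timedelta(days=1)


def outcome_for_data_timestamp(data_timestamp: date,
                               scheduled_weekday: int = TUESDAY) -> str:
    """Describe what happens to the instance for a given data timestamp."""
    run_day = scheduled_date(data_timestamp)
    if run_day.weekday() == scheduled_weekday:
        return "run, logs generated"
    return "set to successful, no logs"


# September 11, 2017 is a Monday; September 12, 2017 is a Tuesday.
monday_result = outcome_for_data_timestamp(date(2017, 9, 11))
tuesday_result = outcome_for_data_timestamp(date(2017, 9, 12))
print(monday_result)
print(tuesday_result)
```

A data timestamp that falls on a Monday maps to a Tuesday run date, so the instance is actually run. Any other data timestamp maps to a day on which the weekly node is not scheduled.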
Data backfilling
We recommend that you backfill data for a node if you need to check the scheduling time of multiple nodes and the scheduling dependencies among the nodes, or if you need to rerun data analysis and computing from a specific root node. To backfill data for a node, perform the following steps:
In the left-side navigation pane of the Operation Center page, open the page that lists auto triggered nodes.
On the page that appears, find the desired node and select the data backfill option in the Actions column.
In the Backfill Data dialog box, configure the parameters and click OK. The following table describes the parameters.
Parameter: Description
Data Backfill Instance Name: The name of the data backfill instance that is generated for the node.
Data Timestamp: The data timestamp of the data backfill instance. The value of the data timestamp is one day earlier than the time at which the instance is scheduled to run.
Node: The node for which you want to backfill data. By default, the current node is specified and cannot be changed.
Concurrency: Specifies whether to run multiple data backfill instances in parallel. If you set this parameter to Yes, you also need to specify the number of data backfill instances that can be run in parallel.
On the Patch Data page, click the name of the generated data backfill instance to view the DAG of the instance.
In the DAG, you can view the scheduling dependencies and details of the instance, and perform operations on the instance. For example, you can right-click the instance and select Stop or Rerun to stop or rerun the instance.
Note: In data backfilling mode, the running of a data backfill instance on a day depends on the instance that is scheduled to run on the previous day. For example, if you backfill data for a node for the period from September 15, 2017 to September 18, 2017 and the instance on September 15, 2017 fails to run, the instance on September 16, 2017 cannot be run.
For example, a synchronization node is configured to run at 02:00 every Tuesday. The data timestamp of an instance is one day earlier than the date on which the instance is scheduled to run. Based on the instance generation rules described earlier in this topic, if you set the data timestamp to a Monday for a data backfill operation, the scheduling system runs the instance that is generated for the synchronization node at 02:00 on Tuesday. If you set the data timestamp to a day other than Monday, the scheduling system changes the status of the instance to successful at 02:00 on Tuesday with no run logs generated.
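The day-to-day dependency described in the note can be sketched as a simple sequential loop. This is only an illustration under stated assumptions: `backfill` and `run_instance` are hypothetical names, the status strings are invented for the example, and the sketch processes one instance per day in order without modeling the Concurrency parameter.

```python
from datetime import date, timedelta
from typing import Callable


def backfill(start: date, end: date,
             run_instance: Callable[[date], bool]) -> dict[date, str]:
    """Run one backfill instance per day; stop running after the first failure."""
    statuses: dict[date, str] = {}
    blocked = False
    day = start
    while day <= end:
        if blocked:
            # The instance for the previous day did not succeed,
            # so this instance cannot be run.
            statuses[day] = "not run"
        elif run_instance(day):
            statuses[day] = "successful"
        else:
            statuses[day] = "failed"
            blocked = True
        day += timedelta(days=1)
    return statuses


# Simulate the scenario from the note: the instance on September 15, 2017 fails.
result = backfill(
    date(2017, 9, 15),
    date(2017, 9, 18),
    run_instance=lambda d: d != date(2017, 9, 15),
)
for day, status in result.items():
    print(day.isoformat(), status)
```

In this simulation, the instance for September 15, 2017 fails and the instances for September 16 to September 18 are left in the not run state.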
Periodic run
In periodic run mode, the scheduling system automatically runs instances for all nodes based on the scheduling settings of the nodes. No menu item is provided for you to control the periodic run of nodes on the Operation Center page. You can view the details and run logs of an instance that is generated for a node by using one of the following methods:
In the left-side navigation pane of the Operation Center page, open the Cycle Instance page. On the Instance Perspective tab, configure parameters such as Data Timestamp or Schedule to search for a specific instance of the node. Then, choose More > View Node Details in the Actions column to view details of the instance, or choose More > View Runtime Log in the Actions column to view run logs of the instance.
On the Instance Perspective tab of the Cycle Instance page, find the desired instance of the node and click the name of the instance in the General column to open the DAG of the instance.
In the DAG, you can view the scheduling dependencies and details of the instance, and perform operations on the instance. For example, you can right-click the instance and select Stop or Rerun to stop or rerun the instance.
Note: If an ancestor node has not been run, its descendant nodes are not run either.
If the initial state of an instance is pending, the scheduling system checks whether all its ancestor instances have been successfully run when the scheduling time of the instance arrives.
An instance can be run only after all its ancestor instances have been successfully run and its scheduling time has arrived.
If an instance remains pending, check whether all its ancestor instances have been successfully run and whether the scheduling time of the instance has arrived.
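The two conditions for running an instance in periodic run mode can be expressed as a small check. The following Python sketch is illustrative only: `can_run`, `pending_reasons`, and the state string `"successful"` are assumptions made for this example and do not correspond to a DataWorks API.

```python
from datetime import datetime


def can_run(scheduled_time: datetime, now: datetime,
            ancestor_states: list[str]) -> bool:
    """An instance runs only if its time has arrived and all ancestors succeeded."""
    time_arrived = now >= scheduled_time
    ancestors_ok = all(state == "successful" for state in ancestor_states)
    return time_arrived and ancestors_ok


def pending_reasons(scheduled_time: datetime, now: datetime,
                    ancestor_states: list[str]) -> list[str]:
    """List the conditions that still keep an instance pending."""
    reasons = []
    if now < scheduled_time:
        reasons.append("scheduling time has not arrived")
    if any(state != "successful" for state in ancestor_states):
        reasons.append("not all ancestor instances have been successfully run")
    return reasons


scheduled = datetime(2017, 9, 12, 2, 0)    # 02:00 on a Tuesday
checked_at = datetime(2017, 9, 12, 2, 5)   # five minutes after the scheduled time

ready = can_run(scheduled, checked_at, ["successful", "successful"])
blocked = pending_reasons(scheduled, checked_at, ["successful", "running"])
print(ready)
print(blocked)
```

The `pending_reasons` helper mirrors the troubleshooting advice above: when an instance stays pending, it reports which of the two conditions is not yet met.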