If you want the system to periodically schedule a node, you must define scheduling properties such as the scheduling cycle, dependencies, and scheduling parameters for the node. This topic provides an overview of the configuration of scheduling properties.
Prerequisites
A node is created. Data development in DataWorks is based on nodes. Tasks of different types of compute engines are encapsulated into different types of nodes in DataWorks. You can select a specific type of node for data development based on your business requirements. For more information, see General development process.
The Periodic scheduling switch is turned on. A node can be automatically scheduled based on its scheduling properties only if Periodic scheduling is turned on for the workspace to which the node belongs on the Scheduling Settings tab of the Settings page in DataStudio. For information about how to turn on the switch on the Settings page in DataStudio, see Configure scheduling settings.
Precautions
Scheduling configurations defined for a node are the scheduling properties used to run the node. The node can be scheduled based on the scheduling properties only after the node is deployed to the production environment.
The scheduling time specified for a node in DataStudio is the expected running time of each instance that is generated for the node. The actual running time of an instance can be later than this, because the instance starts to run only after its ancestor instances have run successfully. For information about the conditions that must be met before a node starts to run, see Use the Intelligent Diagnosis feature.
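The relationship between the scheduled time and the actual running time can be illustrated with a minimal sketch. It assumes a simplified model in which an instance may start only when its scheduled time has arrived and every ancestor instance has succeeded. The Instance class, the can_start function, the node names, and the "SUCCESS"/"RUNNING" state strings are hypothetical and are not part of any DataWorks API. DataWorks checks additional conditions, such as available resources, that are not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical, simplified model of an instance. Not a DataWorks API.
@dataclass
class Instance:
    name: str
    scheduled_time: datetime          # expected running time configured in DataStudio
    ancestor_states: dict[str, str]   # ancestor instance name -> state (hypothetical strings)

def can_start(instance: Instance, now: datetime) -> bool:
    """Return True only if the scheduled time has arrived and all ancestor instances succeeded.

    Simplification: real scheduling also depends on conditions such as resource availability.
    """
    time_reached = now >= instance.scheduled_time
    ancestors_succeeded = all(state == "SUCCESS" for state in instance.ancestor_states.values())
    return time_reached and ancestors_succeeded

inst = Instance("dwd_orders", datetime(2024, 5, 1, 2, 0), {"ods_orders": "RUNNING"})
print(can_start(inst, datetime(2024, 5, 1, 2, 30)))  # False: the ancestor is still running
inst.ancestor_states["ods_orders"] = "SUCCESS"
print(can_start(inst, datetime(2024, 5, 1, 2, 30)))  # True: runs at 02:30, later than its 02:00 scheduled time
```

In this model, the scheduled time acts only as a lower bound: an instance whose ancestors finish late starts late.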
DataWorks allows you to configure scheduling dependencies between nodes that have different scheduling frequencies. Before you configure such dependencies, we recommend that you read the Principles and samples of scheduling configurations in complex dependency scenarios topic.
In DataWorks, an auto triggered node generates instances based on its scheduling frequency and the number of its scheduling cycles. For example, a node that is scheduled by hour generates one instance for each hourly scheduling cycle in a day. Each run of the node is an instance.
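The one-instance-per-cycle rule for hourly scheduling can be sketched as follows. The hourly_instance_times helper is hypothetical and only illustrates the arithmetic. It assumes that the start and end times of the scheduling window fall on full hours and that both ends are inclusive.

```python
from datetime import datetime, timedelta

def hourly_instance_times(day: datetime, start_hour: int, end_hour: int, interval_hours: int) -> list[datetime]:
    """Return the expected running time of each instance generated for `day`.

    Hypothetical helper: assumes the window starts and ends on full hours (inclusive).
    """
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    # One instance per scheduling cycle: start_hour, start_hour + interval, ... up to end_hour.
    return [midnight + timedelta(hours=h) for h in range(start_hour, end_hour + 1, interval_hours)]

day = datetime(2024, 5, 1)
print(len(hourly_instance_times(day, 0, 23, 1)))  # 24: every hour from 00:00 to 23:59
print(len(hourly_instance_times(day, 8, 18, 1)))  # 11: every hour from 08:00 to 18:00
print(len(hourly_instance_times(day, 0, 23, 6)))  # 4: 00:00, 06:00, 12:00, and 18:00
```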
If you configure scheduling parameters, the values that replace the variables in the code of an auto triggered node in each scheduling cycle are determined by the scheduling time of that cycle and the expressions of the scheduling parameters. For information about how variables in node code are replaced based on the configurations of scheduling parameters, see Supported formats of scheduling parameters.
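The per-cycle replacement can be illustrated with a minimal sketch. It assumes that the node code references a variable as ${bizdate}, that the scheduling parameter is assigned as bizdate=$bizdate, and that the built-in variable $bizdate resolves to the day before the scheduling date in yyyymmdd format. The function names and the table name are hypothetical. Python's string.Template stands in for the DataWorks replacement logic and does not reproduce it.

```python
from datetime import datetime, timedelta
from string import Template

def resolve_bizdate(scheduled_time: datetime) -> str:
    # Assumption: $bizdate = the day before the scheduling date, formatted as yyyymmdd.
    return (scheduled_time - timedelta(days=1)).strftime("%Y%m%d")

def render_code(code: str, scheduled_time: datetime) -> str:
    # Hypothetical parameter assignment equivalent to bizdate=$bizdate.
    values = {"bizdate": resolve_bizdate(scheduled_time)}
    # substitute() replaces ${bizdate}. It raises KeyError if a referenced variable has no value.
    return Template(code).substitute(values)

sql = "SELECT * FROM ods_orders WHERE ds = '${bizdate}';"
print(render_code(sql, datetime(2024, 5, 1, 2, 0)))
# SELECT * FROM ods_orders WHERE ds = '20240430';
```

Because substitute() fails when a variable has no assigned value, the sketch also reflects the rule that every variable defined in node code must be assigned a value.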
Go to the Properties tab
Go to the DataStudio page.
Log on to the DataWorks console. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
Go to the Properties tab.
On the DataStudio page, find the desired node and go to the configuration tab of the node.
On the configuration tab of the node, click Properties in the right-side navigation pane. The Properties tab appears.
Configure scheduling properties
On the Properties tab, you can configure scheduling properties for a node in different sections. The following table describes the scheduling properties.
Section | Description
Basic properties | In this section, you can view or configure basic information about the node, such as the node name, node ID, node type, and owner.
Scheduling parameters | In this section, you can configure the scheduling parameters that the node uses when it is scheduled. Based on how their values are assigned, the scheduling parameters provided by DataWorks are classified into custom parameters and built-in variables. Scheduling parameters allow the values used by the node to change dynamically in each scheduling cycle. Note: If you define a variable when you edit node code, you must assign a value to the variable.
Time properties | In DataWorks, each run of a node is an instance. In this section, you can configure time properties that determine how the node is scheduled to run in the production environment after you commit and deploy the node.
Resource properties | In this section, you can select the resource group for scheduling that is used to run the node after the node is deployed to the production environment.
Scheduling dependencies | In this section, you can configure scheduling dependencies for the node. Nodes are scheduled to run in sequence based on scheduling dependencies: a descendant node starts to run only after its ancestor nodes have run successfully. This ensures that valid business data is generated at the earliest opportunity. You can use the automatic parsing feature to parse node dependencies from code, or you can manually configure scheduling dependencies.
Input and output parameters | In this section, you can define input and output parameters to pass data between ancestor and descendant nodes. After you define an output parameter for a node and specify its value, you can define an input parameter for a descendant node that references the value of the output parameter.
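How scheduling dependencies order nodes and how an input parameter reads an ancestor's output parameter can be sketched together as a tiny workflow runner. This is a minimal illustration, not how DataWorks executes nodes. The node names, the DEPENDENCIES and INPUT_BINDINGS tables, and run_node are hypothetical. The sketch assumes that an input parameter may only reference an output parameter of a direct ancestor, and it uses Python's graphlib.TopologicalSorter (Python 3.9 or later) to run every node after its ancestors.

```python
from graphlib import TopologicalSorter

# Hypothetical scheduling dependencies: node -> set of its ancestor nodes.
DEPENDENCIES: dict[str, set[str]] = {
    "ods_orders": set(),
    "dwd_orders": {"ods_orders"},
    "dws_daily_sales": {"dwd_orders"},
}

# Hypothetical input parameters: node -> {input name: (ancestor node, output parameter name)}.
INPUT_BINDINGS: dict[str, dict[str, tuple[str, str]]] = {
    "dwd_orders": {"load_date": ("ods_orders", "max_order_date")},
}

def run_node(name: str, inputs: dict[str, str]) -> dict[str, str]:
    """Stand-in for real node code. Returns the node's output parameters."""
    if name == "ods_orders":
        return {"max_order_date": "20240430"}
    if name == "dwd_orders":
        return {"processed_date": inputs["load_date"]}
    return {}

def run_workflow() -> tuple[list[str], dict[str, dict[str, str]]]:
    order: list[str] = []
    outputs: dict[str, dict[str, str]] = {}
    # static_order() yields each node only after all of its ancestors.
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        inputs: dict[str, str] = {}
        for input_name, (ancestor, output_name) in INPUT_BINDINGS.get(name, {}).items():
            # Assumption: only output parameters of direct ancestors can be referenced.
            if ancestor not in DEPENDENCIES[name]:
                raise ValueError(f"{name} does not depend on {ancestor}")
            inputs[input_name] = outputs[ancestor][output_name]
        outputs[name] = run_node(name, inputs)
        order.append(name)
    return order, outputs

order, outputs = run_workflow()
print(order)                  # ['ods_orders', 'dwd_orders', 'dws_daily_sales']
print(outputs["dwd_orders"])  # {'processed_date': '20240430'}
```

Keeping dependencies and parameter bindings in separate tables mirrors the separate Scheduling dependencies and Input and output parameters sections, while the ancestor check ties them together.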
What to do next: Commit and debug a node
After you configure scheduling properties for a node, you can commit and debug the node to check whether the scheduling configurations for the node meet your business requirements. For more information, see Debugging procedure. After the node is debugged, you can deploy the node to the production environment for periodic scheduling. You can perform O&M operations on the node in the production environment. For more information, see Perform basic O&M operations on auto triggered nodes.