If you use an external scheduling system and want to trigger DataWorks nodes after nodes in that system finish running, you can use a DataWorks HTTP Trigger node. This topic describes how to use an HTTP Trigger node when an external scheduling system is used to trigger DataWorks nodes, and the precautions for using such a node.
Prerequisites
DataWorks Enterprise Edition or a more advanced edition is activated.
A workflow is created. The compute nodes that need to be triggered by an HTTP Trigger node are created. In this topic, ODPS SQL nodes are used as the compute nodes. For more information about how to create an ODPS SQL node, see Develop a MaxCompute SQL task.
Background information
An external scheduling system is used to trigger nodes in the following typical scenarios:
An HTTP Trigger node has no ancestor nodes other than the root node of the workflow.
In this scenario, you must configure a trigger in the external scheduling system after you create the HTTP Trigger node. Then, you must configure scheduling properties for each node in DataWorks. For more information, see Create an HTTP Trigger node and Configure triggers in an external scheduling system.
An HTTP Trigger node has an ancestor node.
In this scenario, take note of the following items:
You must configure a trigger in the external scheduling system after you create the HTTP Trigger node. Then, you must configure the scheduling properties for each node in DataWorks. For more information, see Create an HTTP Trigger node and Configure triggers in an external scheduling system.
By default, the HTTP Trigger node uses the root node of the workflow as its ancestor node. You must manually change the ancestor node of the HTTP Trigger node to the required node.
The HTTP Trigger node can trigger its descendant nodes only after the ancestor node of the HTTP Trigger node is run as expected and the HTTP Trigger node receives a scheduling instruction from the external scheduling system.
If the HTTP Trigger node receives a scheduling instruction from the external scheduling system before the ancestor node of the HTTP Trigger node finishes running, the HTTP Trigger node does not trigger its descendant nodes. The DataWorks scheduling system retains the scheduling instruction from the external scheduling system and schedules the HTTP Trigger node to trigger the descendant nodes after execution of the ancestor node is complete.
Important: The scheduling instruction from the external scheduling system is retained for only 24 hours. If the ancestor node does not finish running within 24 hours, the scheduling instruction becomes invalid and is discarded.
Limits
Only DataWorks Enterprise Edition and more advanced editions support HTTP Trigger nodes. For information about DataWorks editions, see Differences among DataWorks editions.
The Instance Generation Mode parameter of an HTTP Trigger node can be set only to Next Day, and data backfill instances generated for the node cannot be triggered. Therefore, an HTTP Trigger node can be triggered by the external scheduling system only from the day after the node is deployed to the production environment.
HTTP Trigger nodes serve only as nodes that trigger other compute nodes. HTTP Trigger nodes cannot be used as compute nodes. You must configure the nodes that need to be triggered as the descendant nodes of an HTTP Trigger node.
If you want to rerun an HTTP Trigger node after a workflow is created and run, you must enable the external scheduling system to resend a scheduling instruction. Rerunning an HTTP Trigger node does not trigger the running of its descendant nodes that are in the Succeeded state.
If you want to obtain the execution results of the descendant nodes of an HTTP Trigger node within a historical period of time after a workflow is created and run, you must backfill data for the descendant nodes. For more information, see Backfill data for an auto triggered node and view data backfill instances generated for the node. The external scheduling system does not need to send a scheduling instruction to backfill data. Instead, the HTTP Trigger node directly triggers the data backfill operation for its descendant nodes.
Remarks
The HTTP Trigger node can be run only if the following requirements are met:
Auto triggered node instances are generated for the HTTP Trigger node. You can find the instances on the Cycle Instance page in Operation Center. The instances are in the waiting state before the RunTriggerNode operation is successfully called to run the instances. The descendant nodes of the HTTP Trigger node are blocked until the RunTriggerNode operation is successfully called to run the instances generated for the HTTP Trigger node.
All ancestor nodes on which the HTTP Trigger node depends are run as expected. The status of the ancestor nodes is Succeeded.
The scheduling time of the auto triggered node instances generated for the HTTP Trigger node arrives.
Sufficient scheduling resources are available for use when the HTTP Trigger node is run.
The status of the HTTP Trigger node is not Freeze.
An HTTP Trigger node can be triggered only when it is in the Pending state. After the node is triggered, it cannot be triggered again.
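Because a RunTriggerNode call can fail transiently (for example, because of a network error or temporary service throttling), an external scheduling system typically wraps the call in a retry loop. The following is a minimal, stdlib-only sketch of such a loop; `call_run_trigger_node` is a hypothetical callable that you would implement with the DataWorks SDK and that returns True when the trigger call succeeds.

```python
import time

def trigger_http_node(call_run_trigger_node, max_wait_s=3600, interval_s=60):
    """Repeatedly attempt to trigger an HTTP Trigger node instance.

    call_run_trigger_node: hypothetical callable supplied by the caller
        that invokes the RunTriggerNode operation and returns True on
        success, or False if the call should be retried.
    Returns True if a call succeeded before the deadline, else False.
    """
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        if call_run_trigger_node():
            return True
        time.sleep(interval_s)
    return False

# Example with a stub that succeeds on the third attempt:
attempts = {"n": 0}
def stub():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(trigger_http_node(stub, max_wait_s=10, interval_s=0))  # True
```

Note that DataWorks itself retains a successfully delivered instruction for up to 24 hours while waiting for ancestor nodes, so the retry loop only needs to cover delivery failures, not ancestor-node delays.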
Create an HTTP Trigger node
Log on to the DataWorks console. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
On the DataStudio page, move the pointer over the icon and choose .
Alternatively, find the workflow in which you want to create an HTTP Trigger node, click the workflow name, right-click General, and then choose .
- In the Create Node dialog box, configure the Name and Path parameters. Note: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Click Confirm.
On the configuration tab of the node, click Properties in the right-side navigation pane. On the Properties tab, configure scheduling properties for the node. For more information, see Configure basic properties.
Note: By default, the HTTP Trigger node uses the root node of the workflow as its ancestor node. You must manually change the ancestor node of the HTTP Trigger node to the required node.
- Save and commit the node. Important: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
- Click the icon in the top toolbar to save the node.
- Click the icon in the toolbar.
- In the Commit Node dialog box, configure the Change description parameter.
- Click OK.
If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Deploy nodes.
- Perform O&M operations on the node. For more information, see Perform basic O&M operations on auto triggered nodes.
Configure triggers in an external scheduling system
You can use Alibaba Cloud SDK for Java or Python to configure a trigger in an external scheduling system or call an API operation to run an HTTP Trigger node.
Alibaba Cloud SDK for Java
Install Alibaba Cloud SDK for Java. For more information, see Quick start.
Specify the following Project Object Model (POM) configurations to use DataWorks SDK for Java:
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-dataworks-public</artifactId>
    <version>3.4.2</version>
</dependency>
Use the sample code and configure the parameters in the code.
You can go to the debugging page of the RunTriggerNode operation and view complete sample code on the SDK Sample Code tab.
Alibaba Cloud SDK for Python
Install Alibaba Cloud SDK for Python. For more information, see Quick start.
Run the following command to install DataWorks SDK for Python:
pip install aliyun-python-sdk-dataworks-public==2.1.2
Use the sample code and configure the parameters in the code.
You can go to the debugging page of the RunTriggerNode operation and view complete sample code on the SDK Sample Code tab.
API operation
For more information about the API operation, see RunTriggerNode.