Before you configure scheduling dependencies for a node, you must confirm the lineage of the table generated by the node. For example, you must confirm the data lineage of the table and the partition data in the table. You must also configure scheduling dependencies for a node based on the lineage of the table generated by the node. This topic describes how to confirm the lineage of a table. This topic also describes the impacts when you do not configure scheduling dependencies for a node based on the lineage of the table generated by the node.
Background information
Item | Description |
---|---|
Confirm the lineage of a table | If the table generated by Node A in a workspace depends on another table generated by Node B in the same workspace, you can confirm the partition data in the table generated by Node B every day based on the scheduling parameter configurations of Nodes A and B and the replacement results of the scheduling parameters of Nodes A and B. |
If the table generated by Node A in a workspace depends on another table generated by Node B in another workspace, you can confirm the partition data in the table generated by Node B every day based on the output information in DataMap. | |
Impacts of not configuring scheduling dependencies for a node based on the lineage of the table generated by the node | The lineage of the table generated by a node exist but you configure scheduling dependencies for the node not based on the lineage of the table. As a result, an error occurs when descendant nodes obtain data from the node. |
The lineage of the table generated by a node exist and you configure scheduling dependencies for the node based on the lineage of the table. However, a descendant instance depends on an unexpected ancestor instance. As a result, an error occurs when the node that generates the descendant instance obtains data from the ancestor instance generated for the node. |
Usage notes
In DataWorks, the partitions in a table from which a node reads data or the partitions in a table that stores data generated by a node are determined by scheduling parameters configured for the node. If the partitions in the table generated by an ancestor node do not match the partitions in the table on which a descendant node depends, you can determine whether to modify the configurations of the scheduling parameters of the descendant node based on your business requirements
Confirm the lineage of a table
Confirm the lineage of the table generated by a node on which the current node depends (both the nodes are in the same workspace)
In most cases, a node periodically writes data to a specific partition in a specific table based on the scheduling parameters you configure for the node. For information about dynamic replacement of scheduling parameters, see Scheduling parameters. If Node A in a workspace depends on Node B in the same workspace, you can check the configurations of the scheduling parameters of Node A.- Confirm dependencies between ancestor and descendant table data in the development environment
Go to the configuration tab of the ancestor node, and view the configurations of the scheduling parameters of the node and the code details of the node.
- Confirm the ancestor and descendant table data output in the production environment
Confirm the lineage of the table generated by a node on which the current node depends (the nodes are in different workspaces)
If Node A in a workspace depends on Node B in another workspace, you can confirm the partition data in the table generated by Node B every day based on the output information in DataMap. For example, you can confirm whether the data timestamp for the partition data in the table generated by Node B every day is the previous day or the current day.Impacts of not configuring scheduling dependencies for a node based on the lineage of the table generated by the node
Scenario 1: A strong lineage of the table generated by a node exists but you do not configure scheduling dependencies for the node based on the lineage of the table. As a result, an error occurs when descendant nodes obtain data from the node.
If Table A is specified in the SELECT statement in the code of theJob_B
node but the Job_A
node that generates data of Table A is not configured as an ancestor node for the Job_B node, the Job_B node may start to run before the data of Table A is generated. In this case, the Job_B
node cannot be run or generate data. The scheduling time of the Job_A
node is earlier than that of the Job_B
node. However, if the Job_A node cannot generate data before 02:00, an error occurs when the Job_B
node obtains data from the Job_A
node. The Job_A
node may fail to generate data at 01:00 as expected due to the following reasons:- An error occurs when an ancestor node of the Job_A node is run or an ancestor node of the Job_A node runs at a slow speed.
- The Job_A node or an ancestor node of the Job_A node is waiting for resources.
- An ancestor node of the Job_A node is frozen on a specific day.