Confirm the lineage of a table

Before you configure scheduling dependencies for a node, you must confirm the lineage of the table generated by the node. For example, you must confirm the data lineage of the table and the partition data in the table. You must also configure scheduling dependencies for a node based on the lineage of the table generated by the node. This topic describes how to confirm the lineage of a table. This topic also describes the impacts when you do not configure scheduling dependencies for a node based on the lineage of the table generated by the node.

Background information

The following table describes how to confirm the lineage of a table in different scenarios and the impacts when you do not configure scheduling dependencies for a node based on the lineage of the table generated by the node.

Item	Description
Confirm the lineage of a table	If the table generated by Node A in a workspace depends on another table generated by Node B in the same workspace, you can confirm the partition data in the table generated by Node B every day based on the scheduling parameter configurations of Nodes A and B and the replacement results of the scheduling parameters of Nodes A and B.
Confirm the lineage of a table	If the table generated by Node A in a workspace depends on another table generated by Node B in another workspace, you can confirm the partition data in the table generated by Node B every day based on the output information in DataMap.
Impacts of not configuring scheduling dependencies for a node based on the lineage of the table generated by the node	The lineage of the table generated by a node exist but you configure scheduling dependencies for the node not based on the lineage of the table. As a result, an error occurs when descendant nodes obtain data from the node.
	The lineage of the table generated by a node exist and you configure scheduling dependencies for the node based on the lineage of the table. However, a descendant instance depends on an unexpected ancestor instance. As a result, an error occurs when the node that generates the descendant instance obtains data from the ancestor instance generated for the node.

Usage notes

In DataWorks, the partitions in a table from which a node reads data or the partitions in a table that stores data generated by a node are determined by scheduling parameters configured for the node. If the partitions in the table generated by an ancestor node do not match the partitions in the table on which a descendant node depends, you can determine whether to modify the configurations of the scheduling parameters of the descendant node based on your business requirements.

If you want the instance generated for Node A in the current cycle to depend on the partition data in the table generated by the instance that is generated for Node B in the previous cycle, you can configure cross-cycle scheduling dependencies for Node A. This way, the instance generated for Node A in the current cycle depends on the instance generated for Node B in the previous cycle.

Note

In partitioned table scenarios, make sure that the partitions in the table generated by a node are the partitions in the table on which the current node depends.

Confirm the lineage of the table generated by a node on which the current node depends (both the nodes are in the same workspace)

In most cases, a node periodically writes data to a specific partition in a specific table based on the scheduling parameters you configure for the node. For information about dynamic replacement of scheduling parameters, see Scheduling parameters. If Node A in a workspace depends on Node B in the same workspace, you can check the configurations of the scheduling parameters of Node A.

Confirm dependencies between ancestor and descendant table data in the development environment.
Go to the configuration tab of the ancestor node, and view the configurations of the scheduling parameters of the node and the code details of the node.
Confirm the ancestor and descendant table data output in the production environment.

Confirm the lineage of the table generated by a node on which the current node depends (the nodes are in different workspaces)

If Node A in a workspace depends on Node B in another workspace, you can confirm the partition data in the table generated by Node B every day based on the output information in DataMap. For example, you can confirm whether the data timestamp for the partition data in the table generated by Node B every day is the previous day or the current day. Confirm the lineage of the table generated by a node on which the current node depends (the nodes are in different workspaces)

Impacts of not configuring scheduling dependencies for a node based on the lineage of the table generated by the node

Scenario 1: A strong lineage of the table generated by a node exists but you do not configure scheduling dependencies for the node based on the lineage of the table. As a result, an error occurs when descendant nodes obtain data from the node.

If Table A is specified in the SELECT statement in the code of the Job_B node but the Job_A node that generates data of Table A is not configured as an ancestor node for the Job_B node, the Job_B node may start to run before the data of Table A is generated. In this case, the Job_B node cannot be run or generate data. The scheduling time of the Job_A node is earlier than that of the Job_B node. However, if the Job_A node cannot generate data before 02:00, an error occurs when the Job_B node obtains data from the Job_A node. The Job_A node may fail to generate data at 01:00 as expected due to the following reasons:

An error occurs when an ancestor node of the Job_A node is run or an ancestor node of the Job_A node runs at a slow speed.
The Job_A node or an ancestor node of the Job_A node is waiting for resources.
An ancestor node of the Job_A node is frozen on a specific day.

Scenario 2: You configure scheduling dependencies for a node based on the lineage of the table generated by the node, but the time at which a descendant node obtains data from the table generated by an ancestor node is earlier than the generation time of the table.

If you configure same-cycle scheduling dependencies for a node, and the partitions in the table generated by another node do not match the partitions in the table on which the current node depends, data quality issues may occur when the current node obtains data from its ancestor node or an error occurs for the current node.

Note

When a MaxCompute node uses the max_pt function, make sure that the partition data in the table generated by an ancestor node of the MaxCompute node every day is valid.