Problem description
When I commit a node, the system reports an error that the input and output of the node are not consistent with the data lineage in the code developed for the node.
Possible causes
The table specified in the SELECT statement is inconsistent with the table added to Parent Nodes for the node, or the table specified in the INSERT or CREATE statement is inconsistent with the table added to Outputs for the node.
For example, in the preceding figure:
- The SELECT statement in the code of the node that you committed specifies table2, but table2 is not added to Parent Nodes for the committed node.
- doc_test is added to Outputs for the committed node, but the INSERT or CREATE statement in the code of the node does not specify a table named doc_test.
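The two mismatches above can be reproduced with a minimal Python sketch. It is only an illustration of the comparison, not how DataWorks parses lineage: the regular expressions cover just simple FROM, JOIN, INSERT, and CREATE TABLE clauses, the function name lineage_mismatches is hypothetical, and table1 and result_table are made-up names (table2 and doc_test come from the example above).

```python
import re

# Simplified patterns for illustration only; they ignore subqueries,
# CTEs, comments, and quoted identifiers.
_INPUT_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
_OUTPUT_RE = re.compile(
    r"\b(?:INSERT\s+(?:INTO|OVERWRITE)\s+(?:TABLE\s+)?"
    r"|CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?)([A-Za-z_][\w.]*)",
    re.IGNORECASE,
)


def lineage_mismatches(sql, parent_tables, output_tables):
    """Compare tables referenced in the code with the configured lists.

    parent_tables and output_tables stand in for the tables added to
    Parent Nodes and Outputs (hypothetical, simplified representation).
    """
    read = {t.lower() for t in _INPUT_RE.findall(sql)}
    written = {t.lower() for t in _OUTPUT_RE.findall(sql)}
    parents = {t.lower() for t in parent_tables}
    outputs = {t.lower() for t in output_tables}
    return {
        "inputs_missing_from_parent_nodes": sorted(read - parents),
        "parent_nodes_not_in_code": sorted(parents - read),
        "outputs_missing_from_config": sorted(written - outputs),
        "configured_outputs_not_in_code": sorted(outputs - written),
    }


# Hypothetical node code: reads table1 and table2, writes result_table.
sql = """
INSERT OVERWRITE TABLE result_table
SELECT a.id, b.name
FROM table1 a
JOIN table2 b ON a.id = b.id;
"""

report = lineage_mismatches(
    sql,
    parent_tables=["table1"],                    # table2 was not added
    output_tables=["result_table", "doc_test"],  # doc_test is not in the code
)
for kind, tables in report.items():
    print(kind, tables)
```

In this sketch, table2 is reported as an input that is missing from Parent Nodes, and doc_test is reported as a configured output that the code never writes. These are the same two inconsistencies described in the example.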
Solution
- If a table that is not generated by an auto triggered node is configured as the input or output of a node, you can ignore the error and commit the node.
Node dependencies ensure that a node can obtain the table data generated by an ancestor node that is scheduled to run. If the ancestor node is not scheduled to run, the system cannot detect whether it has generated the latest table data. Therefore, if the SELECT statement in the code of a node specifies a table that is not generated by an auto triggered node, you can remove that table from Parent Nodes for the committed node. Tables that are not generated by auto triggered nodes include the following types:
- Tables uploaded from on-premises machines to DataWorks
- Dimension tables
- Tables that are not generated by nodes scheduled by DataWorks
- Tables generated by manually triggered nodes
- If a table is generated by an auto triggered node, you must check whether the data lineage and scheduling dependencies are correctly configured.
If you forcibly commit a node without performing this check, the following errors may occur:
- Descendant nodes cannot obtain data from the ancestor nodes on which they depend. For example, the SELECT statement in the code of a node specifies Table A, and Table A is generated by a node that is scheduled to run every day. If Table A is not added to Parent Nodes for the node and the scheduled node that generates Table A fails one day, the node still runs because no dependency holds it back, and it may fail to obtain the data from the latest run of the scheduled node.
- The output name of an ancestor node does not exist. For example, the CREATE or INSERT statement in the code of Node A specifies Table B, but Table B is not configured as the output of Node A. If the SELECT statement in the code of Node B specifies Table B, the system automatically configures Table B as the input of Node B to establish a dependency between Node A and Node B. However, because Table B is not configured as an output of Node A, the system cannot find Node A based on this dependency. Therefore, when you commit Node B, the system reports an error that the output name of the ancestor node of Node B does not exist. For more information, see "When I commit Node A, the system reports an error that the output name of the dependent ancestor node of Node A does not exist. What do I do?"
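
The second error can be illustrated with a small Python sketch of output-name matching. It assumes a simplified model in which each node has a list of configured inputs and outputs, and an input is resolved to an ancestor only if some node lists the same name as an output. The function resolve_ancestors and the dictionary layout are hypothetical and do not reflect the actual DataWorks implementation.

```python
def resolve_ancestors(nodes):
    """Map each node's configured inputs to the nodes that declare them as outputs.

    nodes: dict of node name -> {"inputs": [...], "outputs": [...]}
    Returns (resolved, missing), where missing lists the inputs that no
    node declares as an output.
    """
    # Index every configured output name by the node that declares it.
    producer = {}
    for name, cfg in nodes.items():
        for out in cfg["outputs"]:
            producer[out] = name

    resolved, missing = {}, {}
    for name, cfg in nodes.items():
        for inp in cfg["inputs"]:
            if inp in producer:
                resolved.setdefault(name, []).append(producer[inp])
            else:
                missing.setdefault(name, []).append(inp)
    return resolved, missing


nodes = {
    # The code of node_a writes table_b, but table_b was never configured
    # as an output of node_a.
    "node_a": {"inputs": [], "outputs": []},
    # The code of node_b selects from table_b, so table_b is configured
    # as its input.
    "node_b": {"inputs": ["table_b"], "outputs": []},
}

resolved, missing = resolve_ancestors(nodes)
print("resolved:", resolved)
print("missing:", missing)

# After table_b is configured as an output of node_a, the lookup succeeds.
nodes["node_a"]["outputs"].append("table_b")
resolved_fixed, missing_fixed = resolve_ancestors(nodes)
print("resolved after fix:", resolved_fixed)
```

In the first call, table_b appears under missing for node_b, which corresponds to the "output name of the ancestor node does not exist" error. After table_b is added to the outputs of node_a, node_b resolves to node_a as its ancestor.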