The Data Map service of DataWorks allows you to view the lineage details of a table or a DataService Studio API on the details page of the table or DataService Studio API. You can use lineage details to trace and manage data. This topic describes how to view lineages in Data Map.
Table lineages
Entry point for viewing lineages
Go to the details page of the table whose lineage details you want to view and click the Lineage tab to view table-level and field-level lineage details. You can also perform the following operations on the tab: analyze impacts, obtain a list of descendant tables of the current table and download the list to your on-premises machine, and configure parameters to allow the system to send change notifications by email.
Data Map allows you to view lineages between tables and between fields. The lineages are obtained by parsing the data of scheduling jobs and the data flow information. Lineages that are generated by manual operations, such as ad hoc queries, are not included. Lineage details are collected in offline mode and updated with a delay of one day.
Limits on different types of data sources
E-MapReduce
If you want to manage metadata for a DataLake or custom cluster in DataWorks, you must configure EMR-HOOK in the cluster first. If you do not configure EMR-HOOK in the cluster, data lineages cannot be displayed in DataWorks. For more information about how to configure EMR-HOOK, see Use the Hive extension feature to record data lineage and historical access information.
You cannot view data lineages of a Spark cluster that is created on the EMR on ACK page or data lineages of an EMR Serverless Spark cluster.
You cannot view data lineages of a task that is developed by using an EMR Presto node.
AnalyticDB for MySQL
For AnalyticDB for MySQL tables, specific SQL statements do not support the generation of lineages in Data Map.
SQL statements that do not support the display of lineages in Data Map
Unsupported SQL statement
Example
SQL statements that contain the
join
andunion
keywords or an asterisk (*
)For example, Data Map cannot display lineages for the following SQL statement because the SQL statement contains an asterisk (
*
):insert into test select * from test1, test2 where test1.id = test2.id
SQL statements that contain subqueries
For example, Data Map cannot display lineages for the following SQL statement because the SQL statement contains a subquery:
SELECT column1, column2 FROM table1 WHERE column3 IN (SELECT column4 FROM table2 WHERE column5 = 'value')
Sample SQL statements that support the display of lineages
Example 1: Create Table A that does not contain column information and select specific columns from Table B as the content of Table A. The SQL statement does not contain an asterisk (*). Sample statement:
create table test as select id,name from test1;
Example 2: Insert specific columns that meet the column1= value1 condition from Table A into Table B that does not contain column information. The SQL statement does not contain an asterisk (*). Sample statement:
insert into test select id,name from test1 where name='test';
Example 3: Insert specific columns from Table A into Table B in a database and overwrite data in Table B. The SQL statement does not contain an asterisk (*). Sample statement:
INSERT OVERWRITE INTO db_name.test SELECT id,name from test1;
CDH
If you want Data Map to display the lineages of tables whose data is processed by CDH Spark SQL nodes and CDH Spark nodes in a specific data processing module, you can configure the parameters on the Spark-related Parameter tab of the related CDH cluster on the Cluster Management page in SettingCenter.
Go to the Management Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.
In the left-side navigation pane, click Cluster Management. On the page that appears, find the desired CDH cluster.
Click the Spark-related Parameter tab.
Add Spark-related parameters based on the data processing module.
For example, if you want Data Map to display the lineages of tables whose data is processed by CDH Spark SQL nodes and CDH Spark nodes on the Auto Triggered Instances page in Operation Center, add the following parameter to the corresponding module:
Spark Property Name: Set the parameter to
spark.sql.queryExecutionListeners
.Spark Property Value: Set the parameter to
com.aliyun.dataworks.meta.lineage.LineageListener
.
Click Complete.
Explanation of lineage display for various data sources
Data source | Data integration | Data development | ||
Table-level lineage | Field-level lineage | Table-level lineage | Field-level lineage | |
MaxCompute Details page Details page - Lineage | Batch synchronization Real-time synchronization (Synchronize data from MySQL, Kafka, PolarDB for MySQL, or LogHub to MaxCompute) | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
Hologres Details page Details page - Lineage | Batch synchronization Real-time synchronization (Synchronize data from MySQL, Kafka, or LogHub to Hologres) | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
E-MapReduce Details page Details page - Lineage | Batch synchronization (OSS and Hive) Real-time synchronization | Batch synchronization (OSS and Hive) Real-time synchronization | Hive, Spark (spark-submit), Spark SQL (Hudi format supported), and Shell (Hive SQL submitted by using the Beeline client) insert into /insert overwrite table create as select from table create external table | Hive, Spark (spark-submit), Spark SQL (Hudi format supported), and Shell (Hive SQL submitted by using the Beeline client) insert into /insert overwrite table create as select from table create external table |
AnalyticDB MySQL Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
AnalyticDB PostgreSQL Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
StarRocks Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
ClickHouse Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
CDH/CDP Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | Hive, Impala, Spark, and Spark SQL insert into /insert overwrite table create as select from table create external table | Hive, Impala, Spark, and Spark SQL insert into /insert overwrite table create as select from table create external table |
MySQL Details page Details page - Lineage | Batch synchronization Real-time synchronization (Synchronize data from MySQL to MaxCompute or Hologres) | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
PostgreSQL Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
SQL Server Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
Tablestore Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
Oracle Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
OceanBase Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
OSS Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
PolarDB MySQL Details page Details page - Lineage | Batch synchronization Real-time synchronization (Synchronize data from PolarDB for MySQL to MaxCompute) | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
PolarDB PostgreSQL Details page Details page - Lineage | Batch synchronization Real-time synchronization | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
Kafka Details page Details page - Lineage | Batch synchronization Real-time synchronization (Synchronize data from Kafka to MaxCompute or Hologres) | Batch synchronization Real-time synchronization | insert into /insert overwrite table create as select from table create external table | insert into /insert overwrite table create as select from table create external table |
DataService Studio API lineages
Go to the details page of the DataService Studio API whose lineage details you want to view and click the Lineage tab to view the lineage details of the API.