All Products
Search
Document Center

DataWorks:View lineages

Last Updated:Aug 20, 2024

The Data Map service of DataWorks allows you to view the lineage details of a table or a DataService Studio API on the details page of the table or DataService Studio API. You can use lineage details to trace and manage data. This topic describes how to view lineages in Data Map.

Table lineages

Entry point for viewing lineages

Go to the details page of the table whose lineage details you want to view and click the Lineage tab to view table-level and field-level lineage details. You can also perform the following operations on the tab: analyze impacts, obtain a list of descendant tables of the current table and download the list to your on-premises machine, and configure parameters to allow the system to send change notifications by email.

Note

Data Map allows you to view lineages between tables and between fields. The lineages are obtained by parsing the data of scheduling jobs and the data flow information. Lineages that are generated by manual operations, such as ad hoc queries, are not included. Lineage details are collected in offline mode and updated with a delay of one day.

image.png

Limits on different types of data sources

E-MapReduce

  • If you want to manage metadata for a DataLake or custom cluster in DataWorks, you must configure EMR-HOOK in the cluster first. If you do not configure EMR-HOOK in the cluster, data lineages cannot be displayed in DataWorks. For more information about how to configure EMR-HOOK, see Use the Hive extension feature to record data lineage and historical access information.

  • You cannot view data lineages of a Spark cluster that is created on the EMR on ACK page or data lineages of an EMR Serverless Spark cluster.

  • You cannot view data lineages of a task that is developed by using an EMR Presto node.

AnalyticDB for MySQL

For AnalyticDB for MySQL tables, specific SQL statements do not support the generation of lineages in Data Map.

  • SQL statements that do not support the display of lineages in Data Map

    Unsupported SQL statement

    Example

    SQL statements that contain the join and union keywords or an asterisk (*)

    For example, Data Map cannot display lineages for the following SQL statement because the SQL statement contains an asterisk (*):

    insert into test select * from test1, test2 where test1.id = test2.id

    SQL statements that contain subqueries

    For example, Data Map cannot display lineages for the following SQL statement because the SQL statement contains a subquery:

    SELECT column1, column2 FROM table1 WHERE column3 IN (SELECT column4 FROM table2 WHERE column5 = 'value')
  • Sample SQL statements that support the display of lineages

    • Example 1: Create Table A that does not contain column information and select specific columns from Table B as the content of Table A. The SQL statement does not contain an asterisk (*). Sample statement:

      create table test as select id,name from test1;
    • Example 2: Insert specific columns that meet the column1= value1 condition from Table A into Table B that does not contain column information. The SQL statement does not contain an asterisk (*). Sample statement:

      insert into test select id,name from test1 where name='test';
    • Example 3: Insert specific columns from Table A into Table B in a database and overwrite data in Table B. The SQL statement does not contain an asterisk (*). Sample statement:

      INSERT OVERWRITE INTO db_name.test SELECT id,name from test1;

CDH

If you want Data Map to display the lineages of tables whose data is processed by CDH Spark SQL nodes and CDH Spark nodes in a specific data processing module, you can configure the parameters on the Spark-related Parameter tab of the related CDH cluster on the Cluster Management page in SettingCenter.

  1. Go to the Management Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, click Management Center in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Management Center.

  2. In the left-side navigation pane, click Cluster Management. On the page that appears, find the desired CDH cluster.

  3. Click the Spark-related Parameter tab.

    image

  4. Add Spark-related parameters based on the data processing module.

    For example, if you want Data Map to display the lineages of tables whose data is processed by CDH Spark SQL nodes and CDH Spark nodes on the Auto Triggered Instances page in Operation Center, add the following parameter to the corresponding module:

    • Spark Property Name: Set the parameter to spark.sql.queryExecutionListeners.

    • Spark Property Value: Set the parameter to com.aliyun.dataworks.meta.lineage.LineageListener.

  5. Click Complete.

Explanation of lineage display for various data sources

Data source

Data integration

Data development

Table-level lineage

Field-level lineage

Table-level lineage

Field-level lineage

MaxCompute

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization (Synchronize data from MySQL, Kafka, PolarDB for MySQL, or LogHub to MaxCompute)

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

Hologres

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization (Synchronize data from MySQL, Kafka, or LogHub to Hologres)

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

E-MapReduce

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

(OSS and Hive)

imageReal-time synchronization

imageBatch synchronization

(OSS and Hive)

imageReal-time synchronization

Hive, Spark (spark-submit), Spark SQL (Hudi format supported), and Shell (Hive SQL submitted by using the Beeline client)

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

Hive, Spark (spark-submit), Spark SQL (Hudi format supported), and Shell (Hive SQL submitted by using the Beeline client)

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

AnalyticDB MySQL

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

AnalyticDB PostgreSQL

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

StarRocks

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

ClickHouse

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

CDH/CDP

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

Hive, Impala, Spark, and Spark SQL

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

Hive, Impala, Spark, and Spark SQL

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

MySQL

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

(Synchronize data from MySQL to MaxCompute or Hologres)

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

PostgreSQL

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

SQL Server

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

Tablestore

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

Oracle

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

OceanBase

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

OSS

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

PolarDB MySQL

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization (Synchronize data from PolarDB for MySQL to MaxCompute)

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

PolarDB PostgreSQL

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

Kafka

imageDetails page

imageDetails page - Lineage

imageBatch synchronization

imageReal-time synchronization

(Synchronize data from Kafka to MaxCompute or Hologres)

imageBatch synchronization

imageReal-time synchronization

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

imageinsert into /insert overwrite table

imagecreate as select from table

imagecreate external table

DataService Studio API lineages

Go to the details page of the DataService Studio API whose lineage details you want to view and click the Lineage tab to view the lineage details of the API.

image.png