Data Map is a DataWorks module that you can use to manage the metadata and data assets of your business. You can use Data Map to view the details of metadata, view data lineage, and manage data categories. Data Map can help you search for, understand, and use Hologres data. This topic describes how to configure a crawler in Data Map to collect Hologres metadata and how to perform related operations.
Limits
Only Hologres V1.1 and later support Data Map. If the version of your instance is earlier than V1.1, upgrade the instance before you use Data Map.
Data lineage information can be viewed 1 hour after the crawler is configured in Data Map.
The data lineage feature is available only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Hong Kong), and Singapore.
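The version limit above can be checked with a small script. The following is a minimal sketch, not an official tool: it assumes that you already have the instance version as a string such as V1.1 or 1.3.20, and the function names are hypothetical. Versions are compared as numeric (major, minor) pairs so that, for example, 1.10 is correctly treated as later than 1.9.

```python
import re
from typing import Tuple

# Minimum Hologres version that supports Data Map, per the limits above.
MIN_VERSION = (1, 1)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as 'V1.1' or '1.3.20'.

    The accepted string formats are an assumption for illustration.
    """
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_data_map(version: str) -> bool:
    """Return True if the version is V1.1 or later."""
    # Tuples compare element by element, so (1, 10) > (1, 9).
    return parse_major_minor(version) >= MIN_VERSION


if __name__ == "__main__":
    for v in ("V0.10.42", "V1.0.9", "V1.1", "1.10.3"):
        print(v, supports_data_map(v))
```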
Collect metadata
You can use the metadata collection feature to collect metadata from a Hologres data source to Data Map for centralized management. After the metadata is collected, you can search for and view the metadata in Data Map.
Log on to the DataWorks console and go to the DataMap page. For more information, see Go to the DataMap page.
In the top navigation bar, click Data Discovery.
In the left-side navigation pane, go to the Hologres metadata crawler page.
On the Hologres Metadata Crawler page, click Create Crawler.
In the Create Crawler dialog box, set the parameters in each step.
Configure the basic information.
In the Basic Information step, set the parameters as required.
Parameter: Description
Crawler Name: Required. The name of the crawler. You must set a unique name.
Crawler Description: The description of the crawler.
Workspace: The workspace of the data source from which you want to collect metadata.
Data Source Type: The type of the data source from which you want to collect metadata. The default value is Hologres and cannot be changed.
Click Next.
Select a Hologres data source.
In the Select Collection Object step, select a data source from the Data Source drop-down list.
You can collect metadata only from the Hologres instances that are associated with your workspace as compute engine instances. If no data source is available, click Create to create one. For more information, see Add a Hologres data source.
Click Start Testing next to Test Crawler Connectivity. If the message The connectivity test is successful appears, the DataWorks metadata service can connect to the Hologres data source.
Note: If the message The connectivity test failed appears, you must find the cause of the connection error and troubleshoot the issue.
Click Next.
Configure an execution plan.
In the Configure Execution Plan step, configure an execution plan.
Valid values of the Execution Plan parameter are On-demand Execution, Monthly, Weekly, Daily, and Hourly. The execution plan that is generated varies based on the execution cycle. The system collects metadata from the Hologres data source based on the execution cycle that you specify. The following descriptions provide the details:
On-demand Execution: The system collects metadata from the Hologres data source only when you manually run the crawler.
Monthly: The system automatically collects metadata from the Hologres data source once at a specific time on several specific days of each month.
Important: Some months do not have a 29th, 30th, or 31st day. In those months, the system does not collect metadata from the Hologres data source on the missing dates. We recommend that you do not select the last few days of a month.
The following figure shows that the system automatically collects metadata from the Hologres data source once at 09:00 on the 1st, 11th, and 21st day of each month. An expression is automatically generated for the Cron Expression parameter based on the values of the Date and Time parameters.
Weekly: The system automatically collects metadata from the Hologres data source once at a specific time on several specific days of each week.
The following figure shows that the system automatically collects metadata from the Hologres data source once at 03:00 on Sunday and Monday of each week. An expression is automatically generated for the Cron Expression parameter based on the values of the Week and Time parameters. If the Time parameter is not set, the system automatically collects metadata from the Hologres data source once at 00:00:00 on the specific days of each week.
Daily: The system automatically collects metadata from the Hologres data source once at a specific time of each day.
The following figure shows that the system automatically collects metadata from the Hologres data source once at 01:00 each day. An expression is automatically generated for the Cron Expression parameter based on the value of the Time parameter.
Hourly: The system automatically collects metadata from the Hologres data source once at the N × 5th minute of each hour.
Note: For a Hologres metadata collection task that is run each hour, you can set the Minute value only to a multiple of 5.
The following figure shows that the system automatically collects metadata from the Hologres data source at the 5th and 10th minutes of each hour. An expression is automatically generated for the Cron Expression parameter based on the values of the Minutes parameter.
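To make the relationship between the plan parameters and the generated expression concrete, the following sketch builds a cron expression for each scheduled plan type. It is an illustration under stated assumptions, not the logic that Data Map uses: the six-field Quartz-style layout (second, minute, hour, day of month, month, day of week) is assumed, the function names are hypothetical, and the expression that Data Map actually generates may be formatted differently. The examples reuse the values described for the figures above.

```python
from typing import Iterable

# Weekday names in Quartz order (assumed layout, not confirmed by Data Map).
WEEKDAYS = ("SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT")


def _check_time(hour: int, minute: int) -> None:
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute out of range: {minute}")


def _join_sorted(values: Iterable[int]) -> str:
    return ",".join(str(v) for v in sorted(set(values)))


def monthly_cron(days: Iterable[int], hour: int = 0, minute: int = 0) -> str:
    """Run once at hour:minute on the given days of each month."""
    days = list(days)
    _check_time(hour, minute)
    if not days or any(not 1 <= d <= 31 for d in days):
        raise ValueError("days must be a non-empty set of values in 1..31")
    return f"0 {minute} {hour} {_join_sorted(days)} * ?"


def weekly_cron(weekdays: Iterable[str], hour: int = 0, minute: int = 0) -> str:
    """Run once at hour:minute on the given weekdays; defaults to 00:00."""
    names = {w.upper() for w in weekdays}
    _check_time(hour, minute)
    if not names or not names.issubset(WEEKDAYS):
        raise ValueError(f"weekdays must be drawn from {WEEKDAYS}")
    ordered = sorted(names, key=WEEKDAYS.index)
    return f"0 {minute} {hour} ? * {','.join(ordered)}"


def daily_cron(hour: int, minute: int = 0) -> str:
    """Run once at hour:minute each day."""
    _check_time(hour, minute)
    return f"0 {minute} {hour} * * ?"


def hourly_cron(minutes: Iterable[int]) -> str:
    """Run at the given minutes of each hour; each must be a multiple of 5."""
    minutes = list(minutes)
    if not minutes or any(not 0 <= m <= 59 or m % 5 != 0 for m in minutes):
        raise ValueError("minutes must be multiples of 5 in 0..55")
    return f"0 {_join_sorted(minutes)} * * * ?"


if __name__ == "__main__":
    print(monthly_cron([1, 11, 21], hour=9))   # 0 0 9 1,11,21 * ?
    print(weekly_cron(["SUN", "MON"], hour=3)) # 0 0 3 ? * SUN,MON
    print(daily_cron(1))                       # 0 0 1 * * ?
    print(hourly_cron([5, 10]))                # 0 5,10 * * * ?
```

The hourly builder rejects values such as 7 because only multiples of 5 are accepted for the Minute value. The weekly builder defaults to 00:00 to mirror the behavior described when the Time parameter is not set.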
Click Next.
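Before you settle on the dates for a Monthly plan, it can help to see which selected dates do not exist in some months, because collection does not run on those dates. The following sketch lists them for a given year by using the Python standard library. The function name is hypothetical and the script is independent of Data Map.

```python
import calendar
from typing import Iterable, List, Tuple


def skipped_dates(year: int, days: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (month, day) pairs in `year` for which the selected day does not exist."""
    selected = sorted(set(days))
    skipped = []
    for month in range(1, 13):
        # monthrange returns (weekday of the first day, number of days in the month).
        _, days_in_month = calendar.monthrange(year, month)
        for day in selected:
            if day > days_in_month:
                skipped.append((month, day))
    return skipped


if __name__ == "__main__":
    # Days 1, 11, and 21 exist in every month, so nothing is skipped.
    print(skipped_dates(2023, [1, 11, 21]))  # []
    # Day 31 is missing in 30-day months, and days 30 and 31 are missing in February.
    print(skipped_dates(2023, [1, 30, 31]))
```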
Confirm the settings of the crawler.
In the Confirm Information step, check the information that you specified.
Click Confirm.
On the Hologres Metadata Crawler page, you can view the information about your crawler and manage your crawler.
The following descriptions show the information that you can view and the operations that you can perform:
You can view the status and execution plan of the crawler. You can also view the time when the last execution started, the amount of time consumed for the last execution, the average amount of time consumed, the number of updated tables in the last execution, and the number of created tables in the last execution.
You can click Details, Edit, Delete, Run, or Stop in the Actions column to perform the desired operation.
Details: View the crawler name and the data source and execution plan configured for the crawler.
Edit: Modify the configurations of the crawler.
Delete: Delete the crawler.
Run: Run a task to collect metadata from the Hologres data source. The Run button is available only if the Execution Plan parameter is set to On-demand Execution.
Stop: Stop the crawler. The Stop button is displayed only if a crawler is in the Pending state.
Other supported operations
View overall data
On the Overview tab of the DataMap page, you can view the statistical and table information of all Hologres databases that have the data crawler configured in a region. For more information, see View resource information.
Search for tables
Data Map allows you to search for tables by table name, table description, field name, and field description. You can also filter tables by category, project, and database. For more information, see Search for tables.
View the details of a table
You can go to the details page of a table and view the details of the table, such as the basic information, output information, and lineage information of the table. For more information, see View the details of a table.
You can view the metadata of a table.
You can view table lineage and field lineage information.