A data catalog is the top-level metadata entity in Data Lake Formation (DLF) and can contain multiple databases. This topic describes the basic operations on data catalogs.
Scenarios
Data catalogs are used to isolate metadata. For example, if each of several E-MapReduce (EMR) clusters is bound to a different catalog, the metadata of one cluster is invisible to the other clusters.
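The isolation model can be sketched as a small in-memory model. This is an illustration only, assuming one catalog per cluster; `Catalog`, `Cluster`, `list_databases`, and the catalog, database, and table names are hypothetical and are not DLF or EMR APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for a DLF data catalog: database name -> table names."""
    catalog_id: str
    databases: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Cluster:
    """Hypothetical EMR cluster that is bound to exactly one catalog."""
    name: str
    catalog: Catalog

    def list_databases(self) -> list[str]:
        # A cluster resolves metadata only through its own catalog,
        # so databases in other catalogs never appear here.
        return sorted(self.catalog.databases)


sales = Catalog("sales_catalog", {"orders_db": {"orders"}})
risk = Catalog("risk_catalog", {"fraud_db": {"alerts"}})
cluster_a = Cluster("emr-a", sales)
cluster_b = Cluster("emr-b", risk)

print(cluster_a.list_databases())  # ['orders_db']
print(cluster_b.list_databases())  # ['fraud_db']
```

Because each cluster holds a reference to a single catalog, isolation follows from the binding itself rather than from per-table access checks in this model.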
Basic operations
Create a data catalog
Log on to the Data Lake Formation console.
In the left-side navigation pane, choose.
Click the Catalog List tab, and then click New Catalog.
Enter the following information, and then click OK.
Catalog ID: Required. A unique identifier for the catalog. It must differ from the IDs of all existing catalogs.
Description: Optional. A description of the catalog.
Location: Optional. The default storage path of the catalog. Only Object Storage Service (OSS) paths are supported.
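The field rules above can be expressed as a client-side pre-check before you submit the dialog. This is a minimal sketch, assuming that OSS paths use the `oss://` scheme; `CatalogInput`, `validate_new_catalog`, and the sample IDs and bucket name are hypothetical, and DLF may enforce further rules (such as allowed characters or length) that this sketch does not cover.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogInput:
    """Hypothetical container for the fields of the New Catalog dialog."""
    catalog_id: str
    description: Optional[str] = None
    location: Optional[str] = None


def validate_new_catalog(req: CatalogInput, existing_ids: set[str]) -> list[str]:
    """Return the problems found; an empty list means these checks pass."""
    errors: list[str] = []
    if not req.catalog_id.strip():
        errors.append("Catalog ID is required.")
    elif req.catalog_id in existing_ids:
        errors.append(f"Catalog ID '{req.catalog_id}' already exists.")
    # Location is optional, but if given it must be an OSS path.
    if req.location and not req.location.startswith("oss://"):
        errors.append("Location must be an OSS path (oss://...).")
    return errors


existing = {"catalog_a"}
print(validate_new_catalog(CatalogInput("analytics", location="oss://my-bucket/warehouse/"), existing))  # []
errors = validate_new_catalog(CatalogInput("catalog_a", location="hdfs:///warehouse"), existing)
print(len(errors))  # 2
```

Collecting all problems in a list, instead of stopping at the first one, lets a caller report every invalid field at once.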
View a data catalog
In the left-side navigation pane, choose.
Click the Catalog List tab to view the list of catalogs.
Modify a data catalog
In the left-side navigation pane, choose.
Click the Catalog List tab.
On the data catalog list page, find the catalog and click Edit in the Actions column.
Modify the following information, and then click OK.
Description: Optional. A description of the catalog.
Location: Optional. The default storage path of the catalog. Only OSS paths are supported.
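The Edit dialog exposes only Description and Location, which suggests that the Catalog ID is fixed once the catalog is created. A minimal sketch of that rule, assuming catalog records are plain dictionaries; `EDITABLE_FIELDS`, `apply_catalog_edit`, and the sample record are hypothetical.

```python
# Fields exposed by the Edit dialog described above; catalog_id is not among them.
EDITABLE_FIELDS = {"description", "location"}


def apply_catalog_edit(current: dict, changes: dict) -> dict:
    """Return an updated copy of a catalog record, rejecting non-editable fields."""
    not_editable = set(changes) - EDITABLE_FIELDS
    if not_editable:
        raise ValueError(f"Fields not editable: {sorted(not_editable)}")
    # Copy instead of mutating, so the caller's record is left unchanged.
    return {**current, **changes}


catalog = {"catalog_id": "analytics", "description": "old", "location": None}
updated = apply_catalog_edit(catalog, {"description": "Sales analytics"})
print(updated["catalog_id"], updated["description"])  # analytics Sales analytics
```

To use a different catalog ID, the model above implies creating a new catalog rather than editing the existing one.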
Delete a data catalog
After a catalog is deleted, its data cannot be recovered.
In the left-side navigation pane, choose.
Click the Catalog List tab.
On the data catalog list page, find the catalog and click Delete in the Actions column.
In the confirmation dialog box that appears, click Delete.
Compute engine adaptation
Change the data catalog of an E-MapReduce cluster
After you change the DLF catalog ID bound to an EMR cluster, the cluster points to the new catalog. As a result, operations on the databases and tables of the original catalog, including running jobs, become invalid. Evaluate the impact carefully before you switch catalogs.
Hive engine adaptation
In the core-site.xml file of the Hive service, add the following configuration item. For more information, see Manage configuration items.
Key               Value
dlf.catalog.id    The ID of the DLF catalog.
Make the configuration take effect.
Click Save, and then click Deploy Client Configuration.
In the dialog box that appears, enter Execution Reason and click OK.
Restart the Hive service.
On the Hive service configuration page, click.
In the dialog box that appears, enter Execution Reason and click OK.
After the restart succeeds, the status of the Hive service changes to Healthy, and the catalog ID change is complete.
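The configuration item above corresponds to a Hadoop-style `<property>` entry with a `<name>` and a `<value>`. The following sketch shows the equivalent edit on such a document using Python's standard `xml.etree.ElementTree`, replacing the value if the property exists and appending it otherwise. It illustrates the file format only; `set_hadoop_property` and the catalog IDs are hypothetical, and the documented path remains the console steps above, which also deploy the configuration and restart Hive.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(config_xml: str, name: str, value: str) -> str:
    """Set name=value in a Hadoop-style <configuration> document.

    Replaces the value if the property already exists, otherwise appends it.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


before = "<configuration><property><name>dlf.catalog.id</name><value>catalog_a</value></property></configuration>"
print(set_hadoop_property(before, "dlf.catalog.id", "catalog_b"))
# <configuration><property><name>dlf.catalog.id</name><value>catalog_b</value></property></configuration>
```

Replacing in place rather than always appending avoids leaving two conflicting `dlf.catalog.id` entries in the same file.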
Spark engine adaptation
Modify the hive-site.xml file of the Spark service in the same way as described in Hive engine adaptation.
For EMR 5.6.0, 3.40.0, and earlier versions, you do not need to modify the Spark configuration separately. Spark reuses the Hive configuration, so you only need to modify the Hive configuration.
Presto engine adaptation
Modify the hive.properties file of the Presto service in the same way as described in Hive engine adaptation.
This feature is supported only in EMR 5.8.0, 3.42.0, and later versions.
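Unlike the XML files used by Hive and Spark, hive.properties uses a `key=value` line format. A minimal sketch of setting the catalog ID in such text, assuming the same `dlf.catalog.id` key as in Hive engine adaptation; `set_properties_entry` and the sample keys are hypothetical, and the sketch deliberately ignores other Java properties syntax such as `key: value` separators and line continuations.

```python
def set_properties_entry(text: str, key: str, value: str) -> str:
    """Set key=value in properties-style text, replacing an existing entry if present.

    Simplification: only `key=value` lines are recognized; `key: value`
    separators and line continuations are not handled.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines without an '=' separator.
        if stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


before = "# existing settings\nsome.key=some_value\n"
print(set_properties_entry(before, "dlf.catalog.id", "catalog_b"), end="")
```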
Impala engine adaptation
You do not need to modify the Impala configuration separately. Impala reuses the Hive configuration, so you only need to modify the Hive configuration.
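The version rules in this section determine which engines need their own catalog ID entry. The sketch below encodes them as a lookup, assuming only the EMR 3.x and 5.x release lines and reading "5.6.0, 3.40.0, and earlier" and "5.8.0, 3.42.0, and later" per release line. The file names come from this topic; `config_files_to_edit` and the threshold tables are hypothetical helpers.

```python
from __future__ import annotations

# Version thresholds as stated in this topic, keyed by EMR major version.
# Assumption: only the 3.x and 5.x release lines are covered here.
SPARK_REUSES_HIVE_UP_TO = {3: (3, 40, 0), 5: (5, 6, 0)}
PRESTO_SUPPORTED_FROM = {3: (3, 42, 0), 5: (5, 8, 0)}


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn '5.8.0' into (5, 8, 0) so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def config_files_to_edit(emr_version: str) -> dict[str, str]:
    """Map each engine that needs its own dlf.catalog.id entry to the file to edit."""
    v = parse_version(emr_version)
    if v[0] not in SPARK_REUSES_HIVE_UP_TO:
        raise ValueError(f"No thresholds known for EMR {emr_version} in this sketch")
    plan = {"Hive": "core-site.xml"}
    # Above these versions, Spark no longer inherits the Hive setting (inferred from the text).
    if v > SPARK_REUSES_HIVE_UP_TO[v[0]]:
        plan["Spark"] = "hive-site.xml"
    # Presto catalog switching is supported only from these versions.
    if v >= PRESTO_SUPPORTED_FROM[v[0]]:
        plan["Presto"] = "hive.properties"
    # Impala never needs a separate entry; it reuses the Hive configuration.
    return plan


for version in ("3.40.0", "3.41.0", "5.8.0"):
    print(version, config_files_to_edit(version))
```

Comparing versions as integer tuples avoids the string-ordering pitfall where "3.9.0" would sort after "3.40.0".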