Data Lake Formation (DLF) provides a visual metadata migration feature, which allows you to migrate metadata from a Hive metastore to a data lake.
Prerequisites
The Hive version is v2.3.x, which is the only Hive version that DLF supports.
The metadatabase is a MySQL database, which is the only metadatabase type that DLF supports.
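You can verify both prerequisites directly against the metadatabase before you create a task. The following is a minimal sketch, assuming the pymysql package and placeholder connection values; it reads the schema version that the Hive metastore records in its VERSION table.

```python
import pymysql

# Placeholder connection values; replace them with your metadatabase settings.
conn = pymysql.connect(
    host="rm-xxxxxxxx.mysql.rds.aliyuncs.com",
    user="hive",
    password="your-password",
    database="hivemeta",
)
try:
    with conn.cursor() as cur:
        # The Hive metastore records its schema version in the VERSION table.
        cur.execute("SELECT SCHEMA_VERSION FROM VERSION")
        (schema_version,) = cur.fetchone()
        print("Hive metastore schema version:", schema_version)
        if not schema_version.startswith("2.3"):
            print("Warning: DLF supports only Hive v2.3.x.")
finally:
    conn.close()
```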
Create a metadata migration task
Create a migration task
Log on to the DLF console.
In the left-side navigation pane, choose Metadata > Migrate Metadata.
Click Create Migration Task and then configure the metadata migration task.
Source database settings
Database Type: Currently, only MySQL is supported.
MySQL Type:
Alibaba Cloud RDS: ApsaraDB RDS for MySQL provided by Alibaba Cloud. For more information, see ApsaraDB RDS for MySQL.
Other MySQL Databases: built-in MySQL databases in E-MapReduce (EMR), self-managed MySQL databases, or other MySQL databases.
If you select Alibaba Cloud RDS, you must specify the following information about the RDS instance:
The ApsaraDB RDS instance.
The metadatabase name.
The username.
The password.
If you select Other MySQL Databases, you must enter the following information about the MySQL database:
The Java Database Connectivity (JDBC) URL, in the form jdbc:mysql://<host>:<port>/<metadatabase> (see the sketch after this list).
The username.
The password.
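Before you enter the JDBC URL, you can confirm that its host and port are reachable from your network. The following is a minimal sketch with a placeholder URL; it tests only TCP connectivity, not the credentials.

```python
import socket
from urllib.parse import urlparse

jdbc_url = "jdbc:mysql://192.168.0.10:3306/hivemeta"  # placeholder JDBC URL

# Drop the "jdbc:" prefix so the remainder parses as an ordinary URL.
parsed = urlparse(jdbc_url[len("jdbc:"):])

# Open a TCP connection to confirm that the host and port are reachable.
with socket.create_connection((parsed.hostname, parsed.port or 3306), timeout=5):
    print("MySQL is reachable at %s:%s" % (parsed.hostname, parsed.port or 3306))
```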
Network connection settings
If you set MySQL Type to Alibaba Cloud RDS, you can set Network Type only to Alibaba Cloud VPC. In this case, we recommend that you select the VPC, vSwitch, and security group that match your ApsaraDB RDS instance or MySQL database to avoid network connection failures.
If you set MySQL Type to Other MySQL Databases, you can set Network Type to Alibaba Cloud VPC or Internet.
If you set Network Type to Internet, make sure that your MySQL database supports remote access and that the MySQL port is open to the elastic IP address (EIP) 121.41.166.235. DLF uses this IP address to access your MySQL database during metadata migration.
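One way to open this access is to create a dedicated MySQL account that can connect only from the DLF EIP. The following is a minimal sketch, assuming pymysql, placeholder administrator credentials, a hypothetical account name dlf_migrate, a metadatabase named hivemeta, and that read-only access is sufficient for migration. Also make sure that any firewall or security group in front of the database allows the EIP on the MySQL port.

```python
import pymysql

# Placeholder administrator login for the self-managed MySQL database.
conn = pymysql.connect(host="your-mysql-host", user="root",
                       password="admin-password")
with conn.cursor() as cur:
    # A dedicated account that may connect only from the DLF EIP.
    cur.execute(
        "CREATE USER 'dlf_migrate'@'121.41.166.235' "
        "IDENTIFIED BY 'strong-password'"
    )
    # Read access to the metadatabase (named hivemeta here) is assumed to be
    # sufficient for migration; grant more privileges only if required.
    cur.execute("GRANT SELECT ON hivemeta.* TO 'dlf_migrate'@'121.41.166.235'")
conn.commit()
conn.close()
```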
Migration task settings
Task Name: Enter a name for the metadata migration task.
Task Description (optional): Enter remarks for the task.
Conflict Resolution Strategy:
Update Original Metadata: The original metadata is not deleted; it is updated in place based on the migrated metadata.
Delete Original Metadata and Create Metadata: All of the original metadata is deleted, and the metadata is synchronized again from the source.
Log Storage Path: The Object Storage Service (OSS) path in which all task logs are stored.
Object to Synchronize: The objects to be synchronized, which include databases, functions, tables, and partitions. In most cases, you can select all of them.
Location Replacement: You can set this parameter if you need to replace or modify the location of a table or database during migration. For example, if you migrate data from a traditional Hadoop Distributed File System (HDFS) architecture to an OSS architecture that separates storage and computing, you must replace the hdfs:// path with an oss:// path.
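The following is a minimal sketch of the kind of prefix rewrite that Location Replacement performs; the function name and both paths are placeholders for illustration only.

```python
def replace_location(location: str,
                     old_prefix: str = "hdfs://emr-header-1:9000/user/hive/warehouse",
                     new_prefix: str = "oss://my-bucket/user/hive/warehouse") -> str:
    """Swap old_prefix for new_prefix; leave other locations untouched."""
    if location.startswith(old_prefix):
        return new_prefix + location[len(old_prefix):]
    return location

print(replace_location(
    "hdfs://emr-header-1:9000/user/hive/warehouse/sales.db/orders"))
# oss://my-bucket/user/hive/warehouse/sales.db/orders
```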
Save the migration task
Confirm that the task configurations are correct and click OK. The task is created.
Run a metadata migration task
Click Run in the Actions column of a metadata migration task to run the task.
When the task is running, you can click Stop in the Actions column to stop the task.
Click Runtime Record in the Actions column to view the execution details of the task.
Click View Logs to view the log details.
After the metadata is migrated, you can confirm in the logs that the task succeeded.
Verify the metadata synchronization result
On the Database tab of the Metadata page, you can query the information about the database that you want to synchronize.
On the Table tab of the Metadata page, you can query the information about the table that you want to synchronize.