Data Lake Formation: Migrate metadata

Last Updated: Aug 29, 2024

Data Lake Formation (DLF) provides a visual metadata migration feature, which allows you to migrate metadata from a Hive metastore to a data lake.

Prerequisites

  • The Hive version is v2.3.x. DLF supports only Hive v2.3.x.

  • Your metadatabase is a MySQL metadatabase. DLF supports only MySQL metadatabases.
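
The version prerequisite can be checked before you create a task. The following is a minimal sketch, assuming the version string is read from the metastore database (a standard Hive metastore schema keeps it in the SCHEMA_VERSION column of the VERSION table). The function name is hypothetical and is not part of DLF.

```python
import re

# Accept "2.3.0", "2.3.9", or "v2.3.9"; reject every other release line.
_SUPPORTED_HIVE = re.compile(r"^v?2\.3\.\d+$")


def is_supported_hive_version(schema_version: str) -> bool:
    """Return True if the version string belongs to the Hive 2.3.x line."""
    return bool(_SUPPORTED_HIVE.match(schema_version.strip()))


# Assumption: the value below was obtained from the metastore database with
#   SELECT SCHEMA_VERSION FROM VERSION;
print(is_supported_hive_version("2.3.0"))  # True
print(is_supported_hive_version("3.1.2"))  # False
```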

Create a metadata migration task

Create a migration task

  1. Log on to the DLF console.

  2. In the left-side navigation pane, choose Metadata > Migrate Metadata.

  3. Click Create Migration Task and then configure the metadata migration task.

[Figure: Create a migration task]

Source database settings

  • Database Type: Currently, only MySQL is supported.

  • MySQL Type:

    • Alibaba Cloud RDS: ApsaraDB RDS for MySQL provided by Alibaba Cloud. For more information, see ApsaraDB RDS for MySQL.

    • Other MySQL Databases: built-in MySQL databases in E-MapReduce (EMR), self-managed MySQL databases, or other MySQL databases.

  • If you select Alibaba Cloud RDS, you must specify the following information about the RDS instance:

    • The ApsaraDB RDS instance.

    • The metadatabase name.

    • The username.

    • The password.

    [Figure: Source database settings]

  • If you select Other MySQL Databases, you must enter the following information about the MySQL database:

    • Java Database Connectivity (JDBC) URL.

    • The username.

    • The password.

[Figure: Source database settings for Other MySQL Databases]

  • Network connection settings

    • If you set MySQL Type to Alibaba Cloud RDS, Network Type can be set only to Alibaba Cloud VPC. In this case, we recommend that you select a VPC, vSwitch, and security group that match your ApsaraDB RDS instance or MySQL database to avoid network connection failures.

    [Figure: Network connection settings for Alibaba Cloud RDS with Alibaba Cloud VPC]

    • If you set MySQL Type to Other MySQL Databases, you can set Network Type to Alibaba Cloud VPC or Internet.

    [Figure: Network settings for Other MySQL Databases]

Note

If you set Network Type to Internet, make sure that your MySQL database supports remote access and that the database port is open to the elastic IP address (EIP) 121.41.166.235. DLF metadata migration uses this IP address to access your MySQL database.
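
For the Other MySQL Databases case, the JDBC URL and the Internet access requirement can be sanity-checked locally before you fill in the form. The following is a minimal sketch, not a DLF API. The function names and the sample host are hypothetical, the default port 3306 is the standard MySQL port, and the allowlist is assumed to be a list of CIDR blocks copied from your own firewall or security group rules.

```python
import ipaddress
from typing import List, Tuple
from urllib.parse import urlsplit

# EIP that DLF metadata migration uses, as stated in the note above.
DLF_MIGRATION_EIP = "121.41.166.235"


def parse_mysql_jdbc_url(jdbc_url: str) -> Tuple[str, int, str]:
    """Split jdbc:mysql://host:port/database into (host, port, database)."""
    if not jdbc_url.startswith("jdbc:mysql://"):
        raise ValueError("expected jdbc:mysql://host:port/database")
    # Drop the "jdbc:" prefix so the rest parses as an ordinary URL.
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    if not parts.hostname:
        raise ValueError("JDBC URL has no host")
    port = parts.port or 3306  # fall back to the standard MySQL port
    return parts.hostname, port, parts.path.lstrip("/")


def allows_dlf_access(allowed_cidrs: List[str]) -> bool:
    """Return True if any allowlisted CIDR block contains the DLF EIP."""
    eip = ipaddress.ip_address(DLF_MIGRATION_EIP)
    return any(
        eip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs
    )


# Hypothetical values for illustration only.
print(parse_mysql_jdbc_url("jdbc:mysql://10.0.0.5:3306/hivemeta?useSSL=false"))
print(allows_dlf_access(["10.0.0.0/8", "121.41.166.235/32"]))
```

This only validates local configuration. It does not prove that the database is reachable from DLF, which still depends on the MySQL account's remote access grants.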

Migration task settings

  • Task Name: Enter a name for the metadata migration task.

  • Task Description (optional): Enter a description of the task.

  • Conflict Resolution Strategy:

    • Update Original Metadata: The original metadata is not deleted. Metadata is updated on top of the original metadata.

    • Delete Original Metadata and Create Metadata: All the original metadata is deleted, and then the metadata is synchronized again.

  • Log Storage Path: The Object Storage Service (OSS) path in which all task logs are stored.

  • Object to Synchronize: The objects that can be synchronized are databases, functions, tables, and partitions. In most cases, you can select all the objects.

  • Location Replacement: You can set this parameter if you need to replace or modify the location of a table or database during migration. For example, if you migrate data from a traditional Hadoop Distributed File System (HDFS) architecture to an OSS architecture that separates storage and computing, you must replace the hdfs:// path with an oss:// path.

[Figure: Migration task settings]
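
The effect of Location Replacement can be pictured as a prefix rewrite on each database or table location. The following is a minimal sketch under that assumption. The source does not describe the exact matching rules that DLF applies, and the function, cluster, and bucket names are hypothetical.

```python
from typing import Dict


def replace_location(location: str, prefix_map: Dict[str, str]) -> str:
    """Rewrite the longest matching source prefix; leave other locations unchanged."""
    # Try longer prefixes first so that a more specific rule wins.
    for old_prefix in sorted(prefix_map, key=len, reverse=True):
        if location.startswith(old_prefix):
            return prefix_map[old_prefix] + location[len(old_prefix):]
    return location


# Hypothetical HDFS warehouse root mapped to a hypothetical OSS bucket.
prefix_map = {
    "hdfs://emr-cluster/user/hive/warehouse": "oss://my-bucket/warehouse",
}
print(replace_location(
    "hdfs://emr-cluster/user/hive/warehouse/sales.db/orders", prefix_map
))
# oss://my-bucket/warehouse/sales.db/orders
```

Running a mapping like this over a sample of source locations before the migration is one way to confirm that every hdfs:// path is covered by a replacement rule.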

Save the migration task

Confirm that the task configurations are correct and click OK. The task is created.

[Figure: Confirm the metadata migration task information]

Run a metadata migration task

  • Click Run in the Actions column of a metadata migration task to run the task.

[Figure: Metadata migration task list]

  • When the task is running, you can click Stop in the Actions column to stop the task.

[Figure: Metadata migration task in the running state]

  • Click Runtime Record in the Actions column to view the execution details of the task.

[Figure: Metadata migration runtime records]

  • Click View Logs to view the log details.

[Figure: Metadata migration runtime records, log view]

  • After the metadata is migrated, you can view the success information in the logs.

[Figure: Metadata migration runtime records, log view showing success]

Verify the metadata synchronization result

  • On the Database tab of the Metadata page, you can query the information about the databases that were synchronized.

[Figure: Metadata management, database verification]

  • On the Table tab of the Metadata page, you can query the information about the tables that were synchronized.

[Figure: Metadata management, table verification]
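
Beyond spot checks in the console, the two name lists can be compared in bulk. The following is a minimal sketch, assuming you have exported database or table names from both sides yourself. The function name and sample names are hypothetical. On the source side, a standard Hive metastore schema exposes the names through the DBS and TBLS tables.

```python
from typing import Dict, Iterable, List


def diff_names(source: Iterable[str], target: Iterable[str]) -> Dict[str, List[str]]:
    """Compare source (Hive metastore) names with target (DLF) names."""
    # Hive object names are case-insensitive, so compare in lowercase.
    src = {name.lower() for name in source}
    tgt = {name.lower() for name in target}
    return {
        "missing_in_dlf": sorted(src - tgt),
        "only_in_dlf": sorted(tgt - src),
    }


# Assumption: source names come from the metastore database, for example
#   SELECT NAME FROM DBS;
#   SELECT d.NAME, t.TBL_NAME FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID;
# and target names are the ones listed on the Database or Table tab.
hive_databases = ["default", "sales", "marketing"]
dlf_databases = ["default", "sales"]
print(diff_names(hive_databases, dlf_databases))
# {'missing_in_dlf': ['marketing'], 'only_in_dlf': []}
```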