ApsaraDB for SelectDB: Migrate data from a Hive cluster

Last Updated: Sep 11, 2024

You can migrate data from a Hive cluster to an ApsaraDB for SelectDB instance by using a catalog, X2Doris, DataWorks, or Object Storage Service (OSS). Choose a solution based on the volume of data to be migrated and your business scenario. This topic describes how to migrate offline data from a Hive cluster to a SelectDB instance and how to choose a migration solution.

Solutions

The following table describes the migration solutions and the scenario that each solution suits.

| Solution | Scenario | Benefits | References |
| --- | --- | --- | --- |
| Catalog | Data to be migrated is stored on Alibaba Cloud platforms. (Note: This solution is also applicable to scenarios in which data is stored in Alibaba Cloud E-MapReduce (EMR) clusters.) | You can migrate data without generating data transfer costs. (Note: The Hive cluster and SelectDB instance reside in the same virtual private cloud (VPC). Data is migrated over the internal network.) You can migrate data without the need to use external components. | Hive data source |
| OSS | Data to be migrated is not stored on Alibaba Cloud platforms. | You can migrate data without generating data transfer costs. (Note: If you migrate data from an OSS bucket to a SelectDB instance, data is migrated over the internal network without generating data transfer costs.) | Import data by using OSS |
| DataWorks | Data to be migrated is hosted by DataWorks or you use DataWorks as your data development platform. | You can migrate data by using a visualization platform, which simplifies operations. | Import data by using DataWorks |
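
For the catalog solution in the table above, the following Python sketch builds the two SQL statements that such a migration typically involves: one that registers the Hive Metastore as an external catalog and one that copies a table into SelectDB. It assumes that SelectDB accepts the Apache Doris-style `CREATE CATALOG` statement with `'type' = 'hms'` and exposes its own tables under the `internal` catalog. The catalog name `hive_catalog`, the metastore address, and the `sales.orders` table are hypothetical placeholders, and the target table is assumed to exist already with a compatible schema.

```python
import re

# Plain SQL identifiers only; everything else is rejected before it reaches a statement.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise ValueError."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def create_hive_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Build a CREATE CATALOG statement for a Hive Metastore-backed catalog.

    Assumption: the Doris-style 'hms' catalog type and the
    'hive.metastore.uris' property are available on the instance.
    """
    if "'" in metastore_uri:
        raise ValueError("metastore URI must not contain quotes")
    return (
        f"CREATE CATALOG {_ident(catalog)} PROPERTIES (\n"
        "    'type' = 'hms',\n"
        f"    'hive.metastore.uris' = '{metastore_uri}'\n"
        ");"
    )


def full_copy_sql(catalog: str, hive_db: str, hive_table: str,
                  target_db: str, target_table: str) -> str:
    """Build an INSERT ... SELECT that copies one Hive table into SelectDB."""
    source = ".".join(_ident(p) for p in (catalog, hive_db, hive_table))
    target = ".".join(_ident(p) for p in ("internal", target_db, target_table))
    return f"INSERT INTO {target}\nSELECT * FROM {source};"


if __name__ == "__main__":
    # Hypothetical values; replace them with those of your own environment.
    print(create_hive_catalog_sql("hive_catalog", "thrift://192.0.2.10:9083"))
    print(full_copy_sql("hive_catalog", "sales", "orders", "sales", "orders"))
```

The generated statements can be run from any client that connects to the SelectDB instance. The sketch only builds strings, so it can be reviewed or logged before anything is executed.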

Migrate incremental data

In production environments, Hive data typically consists of offline data and incremental data. Data is usually migrated from Hive to SelectDB so that a copy is available in a data warehouse for faster queries. You can therefore migrate incremental data by using one of the following methods:

  • Replicate data to SelectDB as soon as it is generated in Hive.

  • Read data from Hive partitions by using scheduled jobs and write the data to SelectDB.
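
The scheduled-job method can be sketched as follows. The example assumes a Hive table partitioned by day on a string column named `dt` in `YYYY-MM-DD` format, which is a hypothetical layout, and reuses the hypothetical `hive_catalog` and `sales.orders` names. It works out which closed daily partitions have not been copied yet and builds one `INSERT ... SELECT` per partition. How the last synchronized day is stored and how the job is triggered are left to the scheduler that you use.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical source and target; adjust to your own catalog, database, and table.
SOURCE = "hive_catalog.sales.orders"
TARGET = "internal.sales.orders"


def pending_days(last_synced: date, today: date) -> List[date]:
    """Return the days after last_synced up to, but not including, today.

    Today's partition is skipped because Hive may still be writing to it.
    """
    days = []
    current = last_synced + timedelta(days=1)
    while current < today:
        days.append(current)
        current += timedelta(days=1)
    return days


def partition_sync_sql(day: date) -> str:
    """Build the statement that copies one daily partition (column dt, assumed)."""
    return (
        f"INSERT INTO {TARGET}\n"
        f"SELECT * FROM {SOURCE}\n"
        f"WHERE dt = '{day.isoformat()}';"
    )


if __name__ == "__main__":
    # Example run: the last copied partition is 2024-09-08 and the job runs on 2024-09-11.
    for day in pending_days(date(2024, 9, 8), date(2024, 9, 11)):
        print(partition_sync_sql(day))
```

Whether running the same statement twice for one day duplicates rows depends on the data model of the target table, so a job of this kind should record each partition as synchronized only after its statement succeeds.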

References

For more information about Hive, see Hive data source.