You can migrate data from a Hive cluster to an ApsaraDB for SelectDB instance by using a catalog, X2Doris, DataWorks, or Object Storage Service (OSS). You can select an appropriate solution to migrate data based on the amount of data to be migrated and your business scenario. This topic describes how to migrate offline data from a Hive cluster to an ApsaraDB for SelectDB instance and how to select a migration solution.
Solutions
You can select an appropriate migration solution based on your business scenario. The following table describes the migration solutions.
Solution | Scenario | Benefits | References |
Catalog | Data to be migrated is stored on Alibaba Cloud platforms. Note: This solution also applies when the data is stored in Alibaba Cloud E-MapReduce (EMR) clusters. |
| |
OSS | Data to be migrated is not stored on Alibaba Cloud platforms. | You can migrate data without generating data transfer costs. Note: Data migrated from an OSS bucket to a SelectDB instance travels over the internal network, so no data transfer costs are generated. | |
DataWorks | Data to be migrated is hosted by DataWorks or you use DataWorks as your data development platform. | You can migrate data by using a visualization platform, which simplifies operations. |
Migrate incremental data
In production environments, Hive data typically comprises offline data and incremental data. Hive data is usually migrated to SelectDB so that it is replicated to a data warehouse that delivers faster query performance. You can therefore migrate incremental data by using one of the following methods:
Replicate Hive data to SelectDB when the Hive data is generated.
Read data from Hive partitions by using scheduled jobs and write the data to SelectDB.
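The scheduled-job method can be sketched as follows. This is a minimal illustration, not an official procedure. It assumes that a Hive catalog named hive_catalog already exists in the SelectDB instance, that the Hive table is partitioned by a string column named dt in YYYY-MM-DD format, and that the job connects to SelectDB through a Python DB-API cursor (for example, over the MySQL protocol). The catalog, database, table, and column names are hypothetical placeholders.

```python
from datetime import date, timedelta


def build_partition_insert(target_table: str, source_table: str,
                           partition_column: str, partition_day: date) -> str:
    """Build an INSERT ... SELECT statement that copies one Hive partition."""
    day = partition_day.isoformat()  # for example, '2024-01-01'
    return (
        f"INSERT INTO {target_table} "
        f"SELECT * FROM {source_table} "
        f"WHERE {partition_column} = '{day}'"
    )


def sync_previous_day(cursor, today: date) -> str:
    """Copy yesterday's partition; intended to be run once a day by a scheduler."""
    sql = build_partition_insert(
        target_table="internal.demo_db.orders",       # hypothetical SelectDB table
        source_table="hive_catalog.demo_db.orders",   # hypothetical Hive table via catalog
        partition_column="dt",                        # assumed partition column
        partition_day=today - timedelta(days=1),
    )
    cursor.execute(sql)  # any DB-API cursor connected to the SelectDB instance
    return sql
```

A scheduler such as cron or a DataWorks periodic task could call sync_previous_day once a day after the Hive partition for the previous day is complete. If a job may be rerun, consider how duplicate rows are handled in the target table, for example by using a table model that deduplicates on a key.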
References
For more information about Hive, see Hive data source.