Migrate and synchronize data from HBase clusters to Lindorm

Last Updated: Jul 15, 2024

You can use Lindorm Tunnel Service (LTS) to migrate existing data and synchronize real-time data from a self-managed HBase cluster or ApsaraDB for HBase cluster to LindormTable. This topic describes how to migrate and synchronize data from HBase clusters.

Scenarios

  • Data migration from self-managed HBase clusters to Lindorm.

  • Data migration across regions. For example, you can migrate data from a data center in the China (Qingdao) region to a data center in the China (Beijing) region.

  • Workload distribution. For example, you can migrate some of your business to a new cluster.

Features and benefits

Supported features

  • Data can be migrated from HBase V1.x and V2.x clusters to Lindorm without business interruption.

  • Table schema migration, real-time data synchronization, and full data migration are supported.

  • Migration based on databases, namespaces, and tables is supported.

  • Table renaming during migration is supported.

  • Time ranges, row key ranges, and columns can be specified during migration.

  • You can call an API operation to create migration tasks.

Benefits

  • Your business is not interrupted during migration. Both historical data and real-time incremental data can be migrated in a single migration task.

  • When data is being migrated, the destination cluster does not interact with the source HBase cluster at the API layer. Data is read only from the HDFS of the source cluster. This minimizes the impact on the online business that runs on the source cluster.

  • In most cases, data replication at the file layer reduces data traffic by more than 50% compared with data migration at the API layer, and improves efficiency.

  • Each node can migrate data at a rate of up to 150 MB/s while meeting stability requirements for data migration. At this rate, a single node moves roughly 0.5 TB per hour, and you can add nodes for horizontal scaling to migrate terabytes or petabytes of data.

  • LTS implements a robust retry mechanism to respond to errors. LTS monitors the task speed and the task progress in real time, and generates alerts when tasks fail.

  • LTS automatically synchronizes table schemas to keep schemas and partitions consistent between the source and destination clusters.

Limits

  • Data cannot be migrated to self-managed HBase clusters.

  • HBase clusters for which Kerberos is enabled are not supported.

  • Single-node ApsaraDB for HBase instances are not supported.

  • ApsaraDB for HBase instances that are deployed in the classic network are not supported due to network compatibility issues.

  • Data cannot be migrated or synchronized to a Lindorm standalone instance.

  • LTS uses an asynchronous mode to synchronize incremental data based on write-ahead logging (WAL). Data that is imported by bulk loading and data that is written with WAL disabled are not synchronized. The example after this list illustrates this limit.

  • Search indexes cannot be migrated.
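
The effect of this WAL-based limit can be illustrated with the HBase Java client. The following sketch is for illustration only: the table name, column family, and row keys are hypothetical. The first write goes through the WAL and can be picked up by WAL-based synchronization; the second write skips the WAL and is therefore invisible to it.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurabilityDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo_table"))) {

            // Written through the WAL: WAL-based incremental synchronization
            // can replicate this row.
            Put replicated = new Put(Bytes.toBytes("row-1"));
            replicated.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v1"));
            table.put(replicated);

            // SKIP_WAL bypasses the write-ahead log: this row never appears
            // in the WAL, so it is not synchronized.
            Put notReplicated = new Put(Bytes.toBytes("row-2"));
            notReplicated.setDurability(Durability.SKIP_WAL);
            notReplicated.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v2"));
            table.put(notReplicated);
        }
    }
}
```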

Usage notes

  • Before you migrate data, make sure that the HDFS capacity of the destination cluster is sufficient. This prevents the migrated data from exhausting the capacity of the destination cluster during migration.

  • Before you submit an incremental synchronization task, we recommend that you set the log retention period of the source cluster to more than 12 hours. This reserves time for LTS to recover from incremental synchronization errors. To set the retention period, modify the hbase.master.logcleaner.ttl parameter in the hbase-site.xml configuration file, and then restart the source cluster for the change to take effect. The value of this parameter is in milliseconds. For example, a value of 43200000 specifies a retention period of 12 hours. For a configuration example, see the excerpt after this list.

  • You do not need to manually create tables in the destination cluster. LTS automatically creates tables and regions that match those in the source cluster. A manually created table may use a partitioning scheme that differs from that of the source table. As a result, the manually created table may be frequently split or compacted after the migration is complete. If the table stores a large amount of data, this process can take a long time.

  • If the source table has a coprocessor, make sure that the destination cluster contains the corresponding JAR file of the coprocessor before the destination table is created. The sketch after this list shows how to list the coprocessors of a source table.

  • If log data is not consumed after you enable the incremental synchronization feature, the log data is retained for 48 hours by default. After the period expires, the subscription is automatically canceled and the retained data is automatically deleted.
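
For reference, a minimal hbase-site.xml excerpt that sets the log retention period described above to 12 hours might look as follows. Restart the source cluster after you modify the file.

```xml
<!-- hbase-site.xml on the source cluster -->
<property>
  <name>hbase.master.logcleaner.ttl</name>
  <!-- WAL retention period in milliseconds: 43200000 ms = 12 hours -->
  <value>43200000</value>
</property>
```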

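To determine which coprocessor JAR files the destination cluster needs, you can list the coprocessors that are declared on a source table. The following sketch uses the HBase 2.x Java client; the table name is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.CoprocessorDescriptor;

public class ListCoprocessors {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Print each coprocessor class and, if declared, the JAR file it is loaded from.
            for (CoprocessorDescriptor cp : admin
                    .getDescriptor(TableName.valueOf("source_table"))
                    .getCoprocessorDescriptors()) {
                System.out.println(cp.getClassName() + " -> " + cp.getJarPath().orElse("(on classpath)"));
            }
        }
    }
}
```
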
Prerequisites

  • You can log on to the web UI of LTS. For more information, see Log on to the web UI of LTS.

  • The source HBase cluster and the destination Lindorm instance are connected to LTS.

Create a task

  1. Log on to LTS. For more information, see Log on to the web UI of LTS.

  2. In the left-side navigation pane, choose Migration > Quick Migration.

  3. Click create. Then, configure the parameters that are described in the following steps.

  4. In the job name (optional) field, enter a name for the task. The task name can contain only letters and digits. If you leave this field empty, the task ID is used as the task name.

  5. Configure the Source Cluster and Target Cluster parameters as prompted.

  6. Select the operations that you want to perform.

    • migration table schema: creates tables in the destination cluster. These tables have the same schemas and partition information as the source tables. If a table already exists in the destination cluster, its schema is not migrated.

    • real time data replication: synchronizes incremental data from the source cluster in real time.

    • history data migration: migrates all historical data by physically copying files at the file level.

  7. Enter the required information in the table mapping field. The advance configuration field is optional and can be left empty.

  8. Click create.

View the details of a task

  1. In the left-side navigation pane, choose Migration > Quick Migration to view the task that you created.

  2. Click the name of the task that you want to view. On the details page, view the execution status of the task.

Perform a switchover

  1. Wait until the full migration task is complete and the latency of incremental synchronization drops to a few seconds or even hundreds of milliseconds.

  2. Enable LTS data sampling and verification. When you sample and verify large tables, make sure that the sampling ratio is appropriate to prevent the online business from being affected. A manual spot-check sketch is shown after this list.

  3. Verify your business.

  4. Perform a switchover on your business.
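
In addition to the built-in sampling and verification of LTS, you can manually spot-check individual rows. The following sketch is illustrative only: the ZooKeeper quorums, table name, and row key are placeholders, and the check compares only the most recent cell of the first column of each result.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SpotCheck {
    // Fetch one row from the cluster behind the given ZooKeeper quorum.
    static Result fetch(String quorum, String table, String rowKey) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", quorum);
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table t = conn.getTable(TableName.valueOf(table))) {
            return t.get(new Get(Bytes.toBytes(rowKey)));
        }
    }

    public static void main(String[] args) throws Exception {
        Result src = fetch("source-zk:2181", "demo_table", "row-1");
        Result dst = fetch("dest-zk:2181", "demo_table", "row-1");
        // Compare the most recent cell value of the first column in each result.
        System.out.println("match: " + Arrays.equals(src.value(), dst.value()));
    }
}
```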

FAQ

Q: Why is the data in a task not consumed?

A: Possible causes include the following: the LTS cluster is released while the task is still running, the synchronization task is suspended, or the task is abnormally blocked.