All Products
Search
Document Center

ApsaraDB for ClickHouse:Migrate data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster

Last Updated:May 14, 2024

This topic describes how to migrate data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster and migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters. The remote function can be used in SELECT and INSERT queries to allow access to self-managed ClickHouse servers. Therefore, you can migrate tables by writing an INSERT INTO query with an embedded SELECT query.

The following figures show how to migrate data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster and migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters.

image

Migrate data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster

image.png

Note

For ApsaraDB for ClickHouse Enterprise Edition, you need to only create a destination table regardless of whether your source table is sharded or replicated. You can leave out the Engine parameter for the destination table because the system will automatically use the SharedMergeTree table engine. ApsaraDB for ClickHouse Enterprise Edition performs automatic vertical and horizontal scaling. You do not need to worry about how to replicate and shard the table.

In this example, a self-managed ClickHouse cluster is referred to as the source cluster, and an ApsaraDB for ClickHouse Enterprise Edition cluster is referred to as the destination cluster.

Overview

The following section describes the procedure for migrating data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster:

  1. Add a read-only user to the source cluster.

  2. Copy the source table schema to the destination cluster.

  3. Pull the data from the source cluster to the destination cluster if the source cluster can be accessed over external networks. Push the data from the source cluster to the destination cluster if the source cluster cannot be accessed over external networks.

  4. (Optional) Remove the IP address of the source cluster from the IP address whitelist of the destination cluster.

  5. Remove the read-only user from the source cluster.

Procedure

  1. Perform the following operations on the source cluster. Make sure that the source table contains data.

    1. Add a read-only user that can read the source table. In this example, db.table is used.

      CREATE USER exporter
      IDENTIFIED WITH SHA256_PASSWORD BY 'password-here'
      SETTINGS readonly = 1;
      GRANT SELECT ON db.table TO exporter;
    2. Copy the source table schema.

      SELECT create_table_query
      FROM system.tables
      WHERE database = 'db' and table = 'table'
  2. Perform the following operations on the destination cluster:

    1. Create a database.

      CREATE DATABASE db
    2. Create a destination data table by using the CREATE TABLE statement that is used to create the source data table.

      Note

      Change the value of the ENGINE parameter to SharedMergeTree when you execute the CREATE TABLE statement. However, you cannot specify other parameters for the engine because ApsaraDB for ClickHouse Enterprise Edition clusters always replicate tables and provide the correct parameters. The ORDER BY, PRIMARY KEY, PARTITION BY, SAMPLE BY, TTL, and SETTINGS clauses define the schema and metadata of a table. Keep these clauses to ensure that the destination table is created as expected in the destination ApsaraDB for ClickHouse Enterprise Edition cluster.

      CREATE TABLE db.table ...
    3. Use the remote function to pull data or push data.

      Note

      If the source ClickHouse server is not available from external networks, you can push the data instead of pulling the data because the remote function works for select and insert operations.

      • Use the remote function in the destination cluster to pull data from the source table in the source cluster.

      image.png

      INSERT INTO db.table SELECT * FROM
      remote('source-hostname:9000', db, table, 'exporter', 'password-here')
      • Use the remote function in the source cluster to push data to the destination cluster.image.png

        Note

        To enable the remote function to connect to your ApsaraDB for ClickHouse Enterprise Edition cluster, you must add the IP address of the source cluster to the IP address whitelist of the destination cluster. For more information, see Configure a whitelist.

        INSERT INTO FUNCTION
        remote('target-hostname:9000', 'db.table',
        'default', 'PASS') SELECT * FROM db.table

Migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters

image.png

You can migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters in the following scenarios:

  • Migrate data from a restored backup.

  • Copy data from a development service to a staging service or from a staging service to a production service.

In this example, two clusters of ApsaraDB for ClickHouse Enterprise Edition are used. One is referred to as the source cluster and the other is referred to as the destination cluster.image.png

Overview

The following section describes the procedure for migrating data between ApsaraDB for ClickHouse Enterprise Edition clusters:

  1. Identify one ApsaraDB for ClickHouse Enterprise Edition cluster as the source cluster and the other as the destination cluster.

  2. Add a read-only user to the source cluster.

  3. Copy the source table schema to the destination cluster.

  4. Temporarily allow access to the destination cluster from the IP address of the source cluster.

  5. Copy the data from the source cluster to the destination cluster.

  6. Re-establish the IP access list on the destination cluster.

  7. Remove the read-only user from the source cluster.

Procedure

  1. Perform the following operations on the source cluster:

    1. Add a read-only user that can read the source table. In this example, db.table is used.

      CREATE USER exporter
      IDENTIFIED WITH SHA256_PASSWORD BY 'password-here'
      SETTINGS readonly = 1;
      GRANT SELECT ON db.table TO exporter;
    2. Copy the source table schema.

      SELECT create_table_query
      FROM system.tables
      WHERE database = 'db' and table = 'table'
  2. Copy the source table schema to the destination cluster.

    1. Create a database.

      CREATE DATABASE db
    2. Create a table in the destination database by using the CREATE TABLE statement that is used to create the source table.

      CREATE TABLE db.table ...

    Use the output of SELECT CREATE_TABLE_QUERY... from the source cluster to create the destination table.

  3. Allow remote operations on the source cluster.

    To pull data from the source cluster to the destination cluster, the source cluster and the destination cluster must allow connections from each other. Perform the following operations:

    • Add the IP address of the source cluster to the IP address whitelist of the destination cluster.

    • Add the IP address of the destination cluster to the IP address whitelist of the source cluster.

    Note

    You can view the IP addresses of ApsaraDB for ClickHouse Enterprise Edition clusters by executing the SELECT * FROM system.clusters; statement. For more information about how to modify a whitelist, see Configure a whitelist.

  4. Copy the data from the source cluster to the destination cluster.

    Use the remote function to pull data from the source cluster. Connect to the destination cluster and run the following command on the destination cluster to verify the data:

    INSERT INTO db.table SELECT * FROM
    remote('source-hostname:9000', db, table, 'exporter', 'password-here')
  5. Restore the IP access list on the source cluster.

    If you exported the access list earlier, you can import the access list again by using Share. Otherwise, add your IP addresses to the access list again.

  6. Remove the read-only exporter user.

    DROP USER exporter
  7. Remove the IP addresses used to access the source and destination clusters. For more information, see Configure a whitelist.