This topic describes how to migrate data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster and migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters. The remote
function can be used in SELECT
and INSERT
queries to allow access to self-managed ClickHouse servers. Therefore, you can migrate tables by writing an INSERT INTO
query with an embedded SELECT
query.
The following figures show how to migrate data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster and migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters.
Migrate data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster
For ApsaraDB for ClickHouse Enterprise Edition, you need to only create a destination table regardless of whether your source table is sharded or replicated. You can leave out the Engine parameter for the destination table because the system will automatically use the SharedMergeTree table engine. ApsaraDB for ClickHouse Enterprise Edition performs automatic vertical and horizontal scaling. You do not need to worry about how to replicate and shard the table.
In this example, a self-managed ClickHouse cluster is referred to as the source cluster, and an ApsaraDB for ClickHouse Enterprise Edition cluster is referred to as the destination cluster.
Overview
The following section describes the procedure for migrating data from a self-managed ClickHouse cluster to an ApsaraDB for ClickHouse Enterprise Edition cluster:
Add a read-only user to the source cluster.
Copy the source table schema to the destination cluster.
Pull the data from the source cluster to the destination cluster if the source cluster can be accessed over external networks. Push the data from the source cluster to the destination cluster if the source cluster cannot be accessed over external networks.
(Optional) Remove the IP address of the source cluster from the IP address whitelist of the destination cluster.
Remove the read-only user from the source cluster.
Procedure
Perform the following operations on the source cluster. Make sure that the source table contains data.
Add a read-only user that can read the source table. In this example, db.table is used.
CREATE USER exporter IDENTIFIED WITH SHA256_PASSWORD BY 'password-here' SETTINGS readonly = 1;
GRANT SELECT ON db.table TO exporter;
Copy the source table schema.
SELECT create_table_query FROM system.tables WHERE database = 'db' and table = 'table'
Perform the following operations on the destination cluster:
Create a database.
CREATE DATABASE db
Create a destination data table by using the
CREATE TABLE
statement that is used to create the source data table.NoteChange the value of the ENGINE parameter to SharedMergeTree when you execute the
CREATE TABLE
statement. However, you cannot specify other parameters for the engine because ApsaraDB for ClickHouse Enterprise Edition clusters always replicate tables and provide the correct parameters. TheORDER BY
,PRIMARY KEY
,PARTITION BY
,SAMPLE BY
,TTL
, andSETTINGS
clauses define the schema and metadata of a table. Keep these clauses to ensure that the destination table is created as expected in the destination ApsaraDB for ClickHouse Enterprise Edition cluster.CREATE TABLE db.table ...
Use the
remote
function to pull data or push data.NoteIf the source ClickHouse server is not available from external networks, you can push the data instead of pulling the data because the
remote
function works for select and insert operations.Use the
remote
function in the destination cluster to pull data from the source table in the source cluster.
INSERT INTO db.table SELECT * FROM remote('source-hostname:9000', db, table, 'exporter', 'password-here')
Use the
remote
function in the source cluster to push data to the destination cluster.NoteTo enable the
remote
function to connect to your ApsaraDB for ClickHouse Enterprise Edition cluster, you must add the IP address of the source cluster to the IP address whitelist of the destination cluster. For more information, see Configure a whitelist.INSERT INTO FUNCTION remote('target-hostname:9000', 'db.table', 'default', 'PASS') SELECT * FROM db.table
Migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters
You can migrate data between ApsaraDB for ClickHouse Enterprise Edition clusters in the following scenarios:
Migrate data from a restored backup.
Copy data from a development service to a staging service or from a staging service to a production service.
In this example, two clusters of ApsaraDB for ClickHouse Enterprise Edition are used. One is referred to as the source cluster and the other is referred to as the destination cluster.
Overview
The following section describes the procedure for migrating data between ApsaraDB for ClickHouse Enterprise Edition clusters:
Identify one ApsaraDB for ClickHouse Enterprise Edition cluster as the source cluster and the other as the destination cluster.
Add a read-only user to the source cluster.
Copy the source table schema to the destination cluster.
Temporarily allow access to the destination cluster from the IP address of the source cluster.
Copy the data from the source cluster to the destination cluster.
Re-establish the IP access list on the destination cluster.
Remove the read-only user from the source cluster.
Procedure
Perform the following operations on the source cluster:
Add a read-only user that can read the source table. In this example, db.table is used.
CREATE USER exporter IDENTIFIED WITH SHA256_PASSWORD BY 'password-here' SETTINGS readonly = 1;
GRANT SELECT ON db.table TO exporter;
Copy the source table schema.
SELECT create_table_query FROM system.tables WHERE database = 'db' and table = 'table'
Copy the source table schema to the destination cluster.
Create a database.
CREATE DATABASE db
Create a table in the destination database by using the
CREATE TABLE
statement that is used to create the source table.CREATE TABLE db.table ...
Use the output of
SELECT CREATE_TABLE_QUERY...
from the source cluster to create the destination table.Allow remote operations on the source cluster.
To pull data from the source cluster to the destination cluster, the source cluster and the destination cluster must allow connections from each other. Perform the following operations:
Add the IP address of the source cluster to the IP address whitelist of the destination cluster.
Add the IP address of the destination cluster to the IP address whitelist of the source cluster.
NoteYou can view the IP addresses of ApsaraDB for ClickHouse Enterprise Edition clusters by executing the
SELECT * FROM system.clusters;
statement. For more information about how to modify a whitelist, see Configure a whitelist.Copy the data from the source cluster to the destination cluster.
Use the
remote
function to pull data from the source cluster. Connect to the destination cluster and run the following command on the destination cluster to verify the data:INSERT INTO db.table SELECT * FROM remote('source-hostname:9000', db, table, 'exporter', 'password-here')
Restore the IP access list on the source cluster.
If you exported the access list earlier, you can import the access list again by using Share. Otherwise, add your IP addresses to the access list again.
Remove the read-only exporter user.
DROP USER exporter
Remove the IP addresses used to access the source and destination clusters. For more information, see Configure a whitelist.