This topic describes how to migrate tables and data from a self-managed ClickHouse cluster to Hologres for data development and analytics.
Prerequisites
A Hologres instance is purchased. For more information, see Purchase a Hologres instance.
A ClickHouse cluster is available, and the ClickHouse client is installed. To install the client, click ClickHouse-Client. For more information about how to install and use the client, see Getting Started.
The PostgreSQL client is connected to the Hologres instance. For more information, see Use the PostgreSQL client to connect to Hologres.
Background information
ClickHouse is a column-oriented database management system that is used for online analytical processing (OLAP). Hologres is an interactive analytics service developed by Alibaba Cloud. Hologres responds to queries in sub-second time and sustains a large number of queries per second (QPS). You can migrate tables and data from a ClickHouse cluster to Hologres to enhance data development and analytics.
The following table compares the features of Hologres and ClickHouse.
Category | Item | ClickHouse | Hologres |
Service | Positioning | Traffic analysis. | Universal real-time data warehouse for data analytics and online data processing. |
Data write | Storage mode | Column-oriented storage. | Column-oriented storage and row-oriented storage. |
Visibility | Data can be viewed within seconds after it is written. The ClickHouse client can accumulate data for batch processing. Data is written to a distributed table based on data replication in shards. | Data can be viewed within milliseconds after it is written. Data is written in a way that adapts to batch processing. | |
Performance | Good. | Excellent. | |
Detailed records | Supported. | Supported. | |
Primary key | ClickHouse does not require a unique primary key. Primary keys cannot be used as unique constraints and are used only for indexing and merge operations. | Hologres requires standard database primary keys. A primary key can be used as a unique constraint. | |
Update performance | Limited. ClickHouse does not support frequent updates based on primary keys. | Complete. Hologres supports frequent updates based on primary keys. | |
Real-time data writes | Append. | |
Indexes | Note: Indexes such as minmax, Bloom filter, and N-gram are automatically created and transparent to users. | |
Query | Optimizer | Rule-based optimizer (RBO). | Cost-based optimizer (CBO). |
Federated query | Supported. ClickHouse allows you to perform federated queries based on the Hadoop Distributed File System (HDFS) and Kafka engines. | Supported. Hologres allows you to use a foreign data wrapper (FDW) to directly read MaxCompute and Hive data. | |
Pre-aggregation | Supported. Pre-aggregation is performed based on MergeTree. | Supported. Pre-aggregation is performed based on stored procedures and periodic scheduling. | |
Frequent point query | Not supported. | Supported. More than 10 million point queries can be performed per second. | |
Performance of single-table complex query | Good. | Good. | |
Performance of joining multiple tables | Poor. | Good. | |
SQL syntax | Custom syntax. | Hologres is compatible with the standard PostgreSQL syntax and provides more features. | |
Window functions | Not supported. | Supported. | |
Transaction | Atomicity, consistency, isolation, durability (ACID) of transactions | ClickHouse does not ensure the ACID properties of transactions. For example, ClickHouse does not ensure that data can be viewed instantly after it is written, and it does not ensure data consistency. | Hologres ensures the ACID properties of transactions to a certain extent. For example, Hologres ensures the visibility of data for data definition language (DDL) transactions, single-row transactions, and snapshot-based transactions. |
Replication | Disaster recovery and backup | ClickHouse uses data replicas for disaster recovery and backup on remote ZooKeeper and ClickHouse servers. | Hologres uses the binary logging feature to logically replicate data and uses the underlying mechanism to physically replicate data. |
Advanced features | Binary logging | Not supported. | Supported. |
Vector search | Supported by ClickHouse 22.8 or later. | Supported. | |
Spatial data | Not supported. | Supported. | |
Security management | ClickHouse allows you to customize permissions for security management. | Hologres is compatible with the Postgres permission model and allows you to perform various operations to manage permissions, configure IP address whitelists, and mask sensitive data for security management. | |
Separation of storage and computing | Storage and computing are not separated. The storage capacity on a single server is limited. | Storage and computing are separated. The storage capacity on a single server is almost unlimited. | |
Availability | Failovers must be manually handled. | Failovers can be automatically handled. | |
O&M | O&M is complex, and the shard distribution must be manually maintained. | No O&M operations are required. | |
Ecosystem | Data connection | Various ecosystem partners such as Apache Kafka, Apache Flink, and Apache Spark are supported. | Various ecosystem partners such as Apache Flink, Apache Spark, Java Database Connectivity (JDBC), and DataX are supported. |
Business intelligence (BI) tool | A small number of BI tools such as Tableau and Superset are supported. | Hologres is compatible with the PostgreSQL ecosystem and supports more than 100 mainstream BI tools. |
Data type mappings
The following table describes the mappings between data types in ClickHouse and Hologres.
Category | ClickHouse | Hologres |
Date | Date | Date |
DateTime | TIMESTAMPTZ | |
DateTime(timezone) | TIMESTAMPTZ | |
DateTime64 | TIMESTAMPTZ | |
Numeric value | Int8 | Hologres does not support the single-byte INT data type, but uses the SMALLINT data type instead. |
Int16 | SMALLINT | |
Int32 | INT | |
Int64 | BIGINT | |
UInt8 | INT | |
UInt16 | INT | |
UInt32 | BIGINT | |
UInt64 | BIGINT | |
Float32 | FLOAT | |
Float64 | DOUBLE PRECISION | |
Decimal(P, S) | DECIMAL | |
Decimal32(S) | DECIMAL | |
Decimal64(S) | DECIMAL | |
Decimal128(S) | DECIMAL | |
Boolean value | ClickHouse does not support the BOOLEAN data type, but uses the UInt8 data type instead. | BOOLEAN |
String | String | TEXT |
FixedString(N) | Hologres does not support the FixedString(N) data type, but uses the TEXT data type instead. | |
LowCardinality | Hologres does not support the LowCardinality data type, but uses the TEXT data type instead. Hologres can intelligently enable dictionary encoding, and you can also call the set_table_property function to configure the dictionary_encoding_columns property. | |
Binary value | ClickHouse does not support the BINARY data type, but uses the String or FixedString(N) data type instead. | Hologres supports data types such as BIT(n), VARBIT(n), BYTEA, and CHAR(n). |
Others | UUID | UUID |
Enum | Hologres does not support the Enum data type, but uses the TEXT data type instead. | |
Nested, Tuple, and Array | ARRAY |
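When you script the DDL conversion, the mappings above can be captured in a small lookup table. The following Python sketch is illustrative only and distills the table above; the handling of parameterized types (keeping the precision of Decimal(P, S), unwrapping LowCardinality) is our assumption, not an official rule:

```python
import re

# Mapping of common ClickHouse column types to Hologres types,
# distilled from the table above (illustrative, not exhaustive).
CLICKHOUSE_TO_HOLOGRES = {
    "Date": "DATE",
    "DateTime": "TIMESTAMPTZ",
    "DateTime64": "TIMESTAMPTZ",
    "Int8": "SMALLINT",   # Hologres has no single-byte integer type
    "Int16": "SMALLINT",
    "Int32": "INT",
    "Int64": "BIGINT",
    "UInt8": "INT",       # unsigned types are widened so the range still fits
    "UInt16": "INT",
    "UInt32": "BIGINT",
    "UInt64": "BIGINT",
    "Float32": "FLOAT",
    "Float64": "DOUBLE PRECISION",
    "String": "TEXT",
    "UUID": "UUID",
    "Enum": "TEXT",
}

def map_type(ch_type: str) -> str:
    """Return the Hologres type for a ClickHouse column type."""
    # Unwrap LowCardinality(...): Hologres uses TEXT plus optional
    # dictionary encoding instead of a dedicated type.
    m = re.fullmatch(r"LowCardinality\((.+)\)", ch_type)
    if m:
        ch_type = m.group(1)
    # Decimal(P, S) keeps its precision and scale; Decimal32/64/128(S)
    # fall back to plain DECIMAL, as in the mapping table.
    if re.fullmatch(r"Decimal\(\s*\d+\s*,\s*\d+\s*\)", ch_type):
        return "DECIMAL" + ch_type[len("Decimal"):]
    if re.fullmatch(r"Decimal(32|64|128)\(\s*\d+\s*\)", ch_type):
        return "DECIMAL"
    return CLICKHOUSE_TO_HOLOGRES.get(ch_type, "TEXT")
```

For example, map_type("LowCardinality(String)") yields TEXT, which matches how the lineitem example later in this topic converts such columns.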
Migrate metadata
The migration of metadata involves the migration of DDL statements that are used to create tables. To migrate metadata, perform the following steps:
Execute the following statement on the ClickHouse client to query the databases in your ClickHouse cluster.
Note: The database named system in the query result is the system database. This database does not need to be migrated and can be skipped.
clickhouse-client --host="<host>" --port="<port>" --user="<username>" --password="<password>" --query="SHOW databases" > database.list;
The following table describes the parameters.
Parameter
Description
host
The endpoint of the ClickHouse cluster.
port
The port number of the ClickHouse cluster.
username
The username of the account that you use to log on to the ClickHouse cluster. The account must have the permissions to execute DML statements, including the read, write, and set permissions, and the permissions to execute DDL statements.
password
The password of the account that you use to log on to the ClickHouse cluster.
Execute the following statement on the ClickHouse client to query the tables in your ClickHouse cluster.
Note: You can skip the returned tables whose names start with .inner., because these tables are internal representations of materialized views.
clickhouse-client --host="<host>" --port="<port>" --user="<username>" --password="<password>" --query="SHOW tables from <database_name>" > table.list;
The following table describes the parameters.
Parameter
Description
host
The endpoint of the ClickHouse cluster.
port
The port number of the ClickHouse cluster.
username
The username of the account that you use to log on to the ClickHouse cluster. The account must have the permissions to execute DML statements, including the read, write, and set permissions, and the permissions to execute DDL statements.
password
The password of the account that you use to log on to the ClickHouse cluster.
database_name
The name of the database in which the tables to be migrated reside.
You can also execute the following statement to query the names of all databases and tables in the ClickHouse cluster:
select distinct database, name from system.tables where database != 'system';
Execute the following statement on the ClickHouse client to export the DDL statements that are used to create tables in the ClickHouse cluster:
clickhouse-client --host="<host>" --port="<port>" --user="<username>" --password="<password>" --query="SHOW CREATE TABLE <database_name>.<table_name>" > table.sql;
You can also execute the following statement to query system tables:
SELECT * FROM system.tables where database = '<database_name>' and engine != 'Distributed';
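If many tables are involved, the per-table SHOW CREATE TABLE exports can be scripted. The following Python sketch only builds the command lines from a table list and leaves execution to the caller; the host, port, and credential values are placeholders:

```python
def build_ddl_export_commands(database: str, tables: list,
                              host: str, port: int,
                              user: str, password: str) -> list:
    """Build one clickhouse-client command per table that dumps its
    CREATE TABLE statement into <table>.sql. Commands are returned as
    strings; running them (for example, via subprocess) is up to the caller."""
    template = (
        'clickhouse-client --host="{host}" --port="{port}" '
        '--user="{user}" --password="{password}" '
        '--query="SHOW CREATE TABLE {db}.{tbl}" > {tbl}.sql'
    )
    return [
        template.format(host=host, port=port, user=user,
                        password=password, db=database, tbl=t)
        for t in tables
        # Skip internal representations of materialized views, as noted above.
        if not t.startswith(".inner.")
    ]
```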
The following table describes the conversion of the system table fields in ClickHouse when metadata is migrated from ClickHouse to Hologres.
Parameter
Description
database
The database in ClickHouse, which corresponds to a schema in Hologres in the PostgreSQL syntax. The create database "<database_name>"; statement in ClickHouse corresponds to the create schema "<schema_name>"; statement in Hologres.
name
The name of the ClickHouse table. No conversion is required.
engine
Hologres does not distinguish between distributed tables and local tables; each table in Hologres is a single table with distributed storage and query. When you query system tables in the ClickHouse database, skip the tables that meet the engine = 'Distributed' condition.
is_temporary
Specifies whether the table is a temporary table in ClickHouse. Temporary tables do not need to be migrated. In addition, Hologres does not support temporary tables.
data_paths
metadata_path
metadata_modification_time
These fields can be ignored.
dependencies_database
dependencies_table
These fields are required when views and materialized views are used. In Hologres, a base table must be created before the views that depend on it. Hologres does not support materialized views.
create_table_query
The DDL statement that is used to create the source ClickHouse table. To create the destination Hologres table, the statement must be converted into the DDL statement that uses the PostgreSQL syntax.
engine_full
The details of the engine. This field can be ignored.
partition_key
The partition key column of the ClickHouse table, which corresponds to the partition key column of the Hologres table. For example, if the partition_key parameter of the ClickHouse table is set to col1, you must append the partition by list (col1) clause to the statement that is used to create the Hologres table.
sorting_key
The sort key of the ClickHouse table. The sort key corresponds to a segment key or a clustering key of the Hologres table.
primary_key
The primary key of the ClickHouse table. The primary key corresponds to the primary key in the DDL statements that are used to create the Hologres table.
sampling_key
The key used for sampling. Hologres DDL statements do not support sampling.
storage_policy
The storage policy of the ClickHouse database. This field can be ignored.
Convert the DDL statements that are used to create the source ClickHouse table into the statements that are used to create the destination Hologres table. Hologres is compatible with the PostgreSQL syntax.
You must convert DDL statements based on the field conversion description in the preceding table and the Data type mappings section in this topic. Examples:
Convert the DDL statements that are used to create the lineitem table in the ClickHouse cluster into the DDL statements that are used to create a destination Hologres table.
The following sample code shows the DDL statements that are used to create the lineitem table in the ClickHouse cluster:
-- lineitem on ClickHouse
CREATE TABLE lineitem_local ON CLUSTER default
(
    l_orderkey      UInt64,
    l_partkey       UInt32,
    l_suppkey       UInt32,
    l_linenumber    UInt32,
    l_quantity      decimal(15,2),
    l_extendedprice decimal(15,2),
    l_discount      decimal(15,2),
    l_tax           decimal(15,2),
    l_returnflag    LowCardinality(String),
    l_linestatus    LowCardinality(String),
    l_shipdate      Date,
    l_commitdate    Date,
    l_receiptdate   Date,
    l_shipinstruct  LowCardinality(String),
    l_shipmode      LowCardinality(String),
    l_comment       LowCardinality(String)
)
ENGINE = MergeTree
PARTITION BY toYear(l_shipdate)
ORDER BY (l_orderkey, l_linenumber);

CREATE TABLE lineitem ON CLUSTER default AS lineitem_local
ENGINE = Distributed(default, default, lineitem_local, l_orderkey);
The following sample code shows the converted DDL statements that are used to create the destination Hologres table:
-- lineitem on Hologres
-- Create a table group with 32 shards.
CALL hg_create_table_group ('lineitem_tg', 32);

BEGIN;
CREATE TABLE LINEITEM
(
    L_ORDERKEY      BIGINT         NOT NULL,
    L_PARTKEY       INT            NOT NULL,
    L_SUPPKEY       INT            NOT NULL,
    L_LINENUMBER    INT            NOT NULL,
    L_QUANTITY      DECIMAL(15,2)  NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2)  NOT NULL,
    L_DISCOUNT      DECIMAL(15,2)  NOT NULL,
    L_TAX           DECIMAL(15,2)  NOT NULL,
    L_RETURNFLAG    TEXT           NOT NULL,
    L_LINESTATUS    TEXT           NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ    NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ    NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ    NOT NULL,
    L_SHIPINSTRUCT  TEXT           NOT NULL,
    L_SHIPMODE      TEXT           NOT NULL,
    L_COMMENT       TEXT           NOT NULL,
    PRIMARY KEY (L_ORDERKEY, L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'table_group', 'lineitem_tg');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
-- Columns that use LowCardinality in ClickHouse.
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
-- Columns that use LowCardinality in ClickHouse.
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;
Convert the DDL statements that are used to create the customer table in the ClickHouse cluster into the DDL statements that are used to create a destination Hologres table.
The following sample code shows the DDL statements that are used to create the customer table in the ClickHouse cluster:
-- customer on ClickHouse
CREATE TABLE customer_local ON CLUSTER default
(
    c_custkey    UInt32,
    c_name       String,
    c_address    String,
    c_nationkey  UInt32,
    c_phone      LowCardinality(String),
    c_acctbal    decimal(15,2),
    c_mktsegment LowCardinality(String),
    c_comment    LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (c_custkey);

CREATE TABLE customer ON CLUSTER default AS customer_local
ENGINE = Distributed(default, default, customer_local, c_custkey);
The following sample code shows the converted DDL statements that are used to create the destination Hologres table:
-- customer on Hologres
BEGIN;
CREATE TABLE CUSTOMER
(
    C_CUSTKEY    INT            NOT NULL PRIMARY KEY,
    C_NAME       TEXT           NOT NULL,
    C_ADDRESS    TEXT           NOT NULL,
    C_NATIONKEY  INT            NOT NULL,
    C_PHONE      TEXT           NOT NULL,
    C_ACCTBAL    DECIMAL(15,2)  NOT NULL,
    C_MKTSEGMENT TEXT           NOT NULL,
    C_COMMENT    TEXT           NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'table_group', 'lineitem_tg');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;
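In both DDL examples, every column that uses LowCardinality in ClickHouse receives the bitmap and dictionary-encoding properties in Hologres. That pattern can be generated mechanically; the helper below is our own sketch, while the property names come from the examples above:

```python
def low_cardinality_properties(table: str, low_cardinality_cols: list) -> list:
    """Generate the set_table_property calls that the examples above apply
    to columns that use LowCardinality in ClickHouse."""
    cols = ",".join(low_cardinality_cols)
    return [
        f"CALL set_table_property('{table}', 'bitmap_columns', '{cols}');",
        f"CALL set_table_property('{table}', 'dictionary_encoding_columns', '{cols}');",
    ]
```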
Execute the following statement on the PostgreSQL client to import the converted DDL statements to the destination Hologres instance:
PGUSER="<username>" PGPASSWORD="<password>" psql -h "<host>" -p "<port>" -d "<database_name>" -f table.sql;
Migrate data
You can use one of the following methods to migrate data from the ClickHouse cluster to the Hologres instance:
(Recommended) Export the data that you want to migrate from the ClickHouse cluster as a file. Then, execute the COPY statement over JDBC or the PostgreSQL client to import the file to the Hologres instance.
Use Realtime Compute for Apache Flink deployments or Spark jobs to read data from the ClickHouse cluster and write the data to the Hologres instance. For more information, see Use Spark to import data.
Use the Data Integration service of DataWorks to read data from the ClickHouse cluster and write the data to the Hologres instance. For more information, see Overview.
The following example shows how to export the data that you want to migrate from the ClickHouse cluster as a file and import the file to the Hologres instance. To use this method, perform the following steps:
Execute the following statement on the ClickHouse client to export the data from the ClickHouse cluster to a local CSV file:
clickhouse-client --host="<host>" --port="<port>" --user="<username>" --password="<password>" --query="select * from <database_name>.<table_name> FORMAT CSV" > table.csv;
The following table describes the parameters.
Parameter
Description
host
The endpoint of the ClickHouse cluster.
port
The port number of the ClickHouse cluster.
username
The username of the account that you use to log on to the ClickHouse cluster. The account must have the permissions to execute DML statements, including the read, write, and set permissions, and the permissions to execute DDL statements.
password
The password of the account that you use to log on to the ClickHouse cluster.
database_name
The name of the database in which the table to be migrated resides.
table_name
The name of the table that you want to migrate from the ClickHouse cluster.
Execute the following statement on the PostgreSQL client to import the local CSV file to the Hologres instance:
PGUSER="<username>" PGPASSWORD="<password>" psql -h "<host>" -p "<port>" -d "<database_name>" -c "COPY <schema_name>.<table_name> FROM STDIN (FORMAT 'csv')" < table.csv;
The following table describes the parameters.
Parameter
Description
username
The username of the account that you use to log on to the Hologres instance. The account must have the permissions to execute DML statements, including the read, write, and set permissions, and the permissions to execute DDL statements. The username is usually the AccessKey ID of your Alibaba Cloud account. You can move the pointer over the profile picture in the upper-right corner of the Hologres console and select AccessKey Management to obtain the AccessKey ID.
password
The password of the account that you use to log on to the Hologres instance. The password is usually the AccessKey secret of your Alibaba Cloud account. You can move the pointer over the profile picture in the upper-right corner of the Hologres console and select AccessKey Management to obtain the AccessKey secret.
host
The endpoint of the Hologres instance.
You can obtain this information from the Network Information section of the instance details page in the Hologres console.
port
The port number of the Hologres instance.
You can obtain this information from the Network Information section of the instance details page in the Hologres console.
database_name
The name of the Hologres database to which the local CSV file is to be imported.
schema_name
The name of the Hologres schema to which the local CSV file is to be imported. Default value: public.
table_name
The name of the Hologres table to which the local CSV file is to be imported.
Query the imported data on the Hologres instance to verify whether the import is successful.
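A quick sanity check is to compare row counts between the source and the destination. The helper below only generates the two count queries; running them against each system is left to you, and the database, schema, and table names used in the test are placeholders:

```python
def count_queries(ch_database: str, holo_schema: str, table: str) -> tuple:
    """Return a (ClickHouse query, Hologres query) pair that counts the rows
    of the source and destination tables; equal results suggest a complete
    import."""
    ch_query = f"SELECT count(*) FROM {ch_database}.{table};"
    # Quote the identifier on the Hologres side, which follows PostgreSQL rules.
    holo_query = f'SELECT count(*) FROM {holo_schema}."{table}";'
    return ch_query, holo_query
```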
Synchronize all data from a ClickHouse database to Hologres in offline mode
You can use the synchronization task provided by DataWorks Data Integration to synchronize all data from a ClickHouse database to Hologres in offline mode. For more information, see Synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres in offline mode.
Convert data query statements
The data query statements in Hologres use the PostgreSQL syntax, whereas ClickHouse uses a custom syntax that is only partially compatible with the American National Standards Institute (ANSI) SQL syntax. The SQL syntax of Hologres and that of ClickHouse are similar on the whole but differ in details. Therefore, you must convert data query statements, especially the scalar functions and window functions that they contain.
The SQL syntax of ClickHouse and that of Hologres have the following differences:
The names of columns in the SQL syntax of ClickHouse are enclosed in single quotation marks ('), whereas those in the SQL syntax of Hologres are enclosed in double quotation marks (").
ClickHouse uses the SELECT X FROM <database_name>.<table_name> statement to query data, whereas Hologres uses the SELECT X FROM <schema_name>.<table_name> statement to query data.
The expressions in the data query statements in ClickHouse and Hologres are different. The differences mainly lie in functions. The following table describes the mappings between specific functions in ClickHouse and Hologres. Functions shared by ClickHouse and Hologres are not described.
ClickHouse | Hologres |
toYear(expr) | to_char(expr, 'YYYY') |
toInt32(expr) | CAST(expr as INTEGER) |
uniq(), uniqCombined(), uniqCombined64(), uniqHLL12() | approx_count_distinct() |
uniqExact() | count(distinct x) |
quantile(level) (expr) | approx_percentile(level) WITHIN GROUP(ORDER BY expr) |
quantileExact(level) (expr) | percentile_cont(level) WITHIN GROUP(ORDER BY expr) |
You can use one of the following methods to convert data query statements:
Regular expressions: You can use regular expressions to convert specific ClickHouse syntax with fixed patterns, such as function names, punctuation marks, and expressions, into the Hologres SQL syntax. For example, you can convert single quotation marks (') in the ClickHouse syntax into double quotation marks (") in the Hologres SQL syntax.
ClickHouse extension: Hologres is embedded with the ClickHouse extension and is compatible with specific ClickHouse functions, such as the toUInt32() function. These functions do not need to be converted before they are used in Hologres.
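The regular-expression approach can be sketched as a small rewrite pass. The rules below are illustrative only: they cover a few of the function mappings listed above plus the database-to-schema prefix, and they assume simple, unnested arguments. A real migration needs a larger, carefully tested rule set:

```python
import re

# Ordered (pattern, replacement) rules distilled from the function mapping
# table above. The uniqExact rule is listed first so the generic uniq rule
# cannot shadow it.
RULES = [
    (re.compile(r"\btoYear\(([^)]*)\)"), r"to_char(\1, 'YYYY')"),
    (re.compile(r"\btoInt32\(([^)]*)\)"), r"CAST(\1 AS INTEGER)"),
    (re.compile(r"\buniqExact\("), "count(distinct "),
    (re.compile(r"\buniq(?:Combined(?:64)?|HLL12)?\("), "approx_count_distinct("),
]

def convert_query(sql: str, database: str, schema: str) -> str:
    """Rewrite ClickHouse function calls and replace the <database>. prefix
    with the Hologres <schema>. prefix."""
    for pattern, repl in RULES:
        sql = pattern.sub(repl, sql)
    return sql.replace(f"{database}.", f"{schema}.")
```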
The following examples show how to convert specific TPC-H query statements that use the ClickHouse syntax to those that use the Hologres SQL syntax:
Example 1
The following sample code shows a TPC-H query statement that is used to query data in a ClickHouse cluster:
-- Q1 on ClickHouse
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '90' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
The following sample code shows the converted TPC-H query statement that is used to query data in a Hologres instance:
-- Q1 on Hologres
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '90' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
Example 2
The following sample code shows a TPC-H query statement that is used to query data in a ClickHouse cluster:
-- Q4 on ClickHouse
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and o_orderdate in (
        select
            o_orderdate
        from
            lineitem, orders
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;
The following sample code shows the converted TPC-H query statement that is used to query data in a Hologres instance:
-- Q4 on Hologres
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;
Example 3
The following sample code shows a TPC-H query statement that is used to query data in a ClickHouse cluster:
-- Q11 on ClickHouse
select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp, supplier, nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'GERMANY'
group by
    ps_partkey
having
    sum(ps_supplycost * ps_availqty) > (
        select
            sum(ps_supplycost * ps_availqty) * toDecimal32(0.0000010000, 9)
        from
            partsupp, supplier, nation
        where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'GERMANY'
    )
order by
    value desc
limit 100;
The following sample code shows the converted TPC-H query statement that is used to query data in a Hologres instance:
-- Q11 on Hologres
select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp, supplier, nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'GERMANY'
group by
    ps_partkey
having
    sum(ps_supplycost * ps_availqty) > (
        select
            sum(ps_supplycost * ps_availqty) * 0.0000010000
        from
            partsupp, supplier, nation
        where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'GERMANY'
    )
order by
    value desc
limit 100;
Function compatibility
The syntax of many basic functions in Hologres is the same as that in ClickHouse. In addition, Hologres supports some ClickHouse functions directly and provides functions with the same semantics for others. For more information about the function compatibility between Hologres and ClickHouse, see Supported ClickHouse functions.