Synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres in offline mode

This topic describes how to create and configure a batch synchronization task to synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres.

Limits

Batch synchronization from ApsaraDB for ClickHouse supports only ApsaraDB for ClickHouse data sources of V20.8 or V21.8.
Batch synchronization tasks that use ApsaraDB for ClickHouse as the source can run only on exclusive resource groups for Data Integration or new-version resource groups.
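To confirm that a source instance is on a supported version, you can run `SELECT version()` (a built-in ClickHouse function) against it and check the result. The following Python sketch performs that check on the returned string. `is_supported_clickhouse_version` is a hypothetical helper, and it assumes the version string starts with `major.minor`.

```python
import re

# Release lines listed as supported in the Limits section (V20.8 and V21.8).
SUPPORTED_RELEASES = {(20, 8), (21, 8)}


def is_supported_clickhouse_version(version: str) -> bool:
    """Return True if a version string such as '21.8.15.7' belongs to a supported release line."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        # Not a recognizable major.minor version string.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) in SUPPORTED_RELEASES
```

For example, a result of `21.8.15.7` passes the check, while `22.3.2.1` or `21.12.1` does not.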

Prerequisites

An exclusive resource group for Data Integration or a new-version resource group is purchased. For more information, see Create and use an exclusive resource group for Data Integration or Create and use a serverless resource group.
A ClickHouse data source is added. For more information, see Add a ClickHouse data source.
A Hologres data source is added. For more information, see Add a Hologres data source.
Network connections are established between your resource group and the data sources. For more information, see Network connectivity solutions.

Procedure

Step 1: Select a synchronization type

Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane of the Data Integration page, click Synchronization Task. In the Create Synchronization Task section, select ClickHouse from the Source drop-down list and Hologres from the Destination drop-down list, and then click Create. On the Create Data Synchronization Solution page that appears, configure the following parameters in the Basic Settings section:
- New Node Name: Specify a name for the synchronization task based on your business requirements.
- Synchronization Method: Select Batch migration of entire database.

Step 2: Establish network connections

In the Network and Resource Configuration section, select the ClickHouse data source as the source, the Hologres data source as the destination, and the resource group that you want to use to run the task. Then, click Test Connectivity for All Resource Groups and Data Sources to test the connectivity between the resource group and the data sources.
Click Next.
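Test Connectivity for All Resource Groups and Data Sources is the supported check. If it fails, a plain TCP probe from a machine in the network bound to the resource group (for example, an ECS instance in the same VPC; this is an assumption about your network layout) can show whether an endpoint is reachable at the network level at all. The following Python sketch uses only the standard library; `can_reach` is a hypothetical helper.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_reach("clickhouse.example.internal", 8123)` probes a placeholder ClickHouse endpoint. Take the real hosts and ports from the ApsaraDB for ClickHouse and Hologres consoles. A successful probe only proves network reachability, not that credentials or whitelists are correct.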

Step 3: Select the tables from which you want to synchronize data

Select the tables from which you want to synchronize data in the Source Table list and click the icon to move them to the Selected Tables list.

Step 4: Configure settings related to destination tables

In the Mapping Rules for Destination Tables section, select all rows that are displayed and click Batch Refresh Mapping Results.

Note

After you select the source tables, they are automatically displayed in the Mapping Rules for Destination Tables section, and their destination table properties are not yet mapped. You must define mappings between the source tables and destination tables to determine where data is read from and written to. To do so, click Refresh in the Actions column. You can refresh the mappings directly, or configure settings related to the destination tables first and then refresh the mappings.

You can also select specific items and click Batch Modify to modify the items based on your business requirements. The following table describes the options that you can select after you click Batch Modify.

Option	Description
Value assignment	You can add constants and variables to destination tables.
Customize Mapping Rules for Destination Schema Names	You can concatenate built-in variables and specified strings into a final destination schema name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables.
Customize Mapping Rules for Destination Table Names	You can concatenate built-in variables and specified strings into a destination table name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables.
Destination Table Schema - Batch Modify and Add Field	You can modify schemas, add fields, and specify primary keys for multiple destination tables at a time.
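The name rules in the table above work by concatenating variables with fixed strings. The following Python sketch illustrates the idea with `string.Template`. The placeholders `${source_db}` and `${source_table}` are hypothetical stand-ins, not the actual names of the DataWorks built-in variables, and `assign_constant` only models the effect of Value assignment on a single row.

```python
from string import Template


def destination_table_name(rule: str, source_db: str, source_table: str) -> str:
    """Build a destination table name from a rule that mixes variables and fixed strings.

    ${source_db} and ${source_table} are hypothetical variable names used for illustration.
    """
    return Template(rule).substitute(source_db=source_db, source_table=source_table)


def assign_constant(row: dict, column: str, value: object) -> dict:
    """Return a copy of `row` with an extra column holding a constant value."""
    return {**row, column: value}
```

With the rule `ods_${source_db}_${source_table}`, the source table `orders` in the database `sales` maps to `ods_sales_orders`. The same approach applies to schema name rules.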

Step 5: Modify data type mappings

If the destination Hologres tables do not exist yet and are to be created by the task, the system applies default mappings between ClickHouse field data types and Hologres field data types, as listed in the following table. To customize the mappings, click Edit Mapping of Field Data Types in the upper-right corner of the Mapping Rules for Destination Tables section. After you finish the configuration, click Apply and Refresh Mapping.

Category	ClickHouse data type	Hologres data type
Date	Date	DATE
	DateTime	TIMESTAMPTZ
	DateTime(timezone)	TIMESTAMPTZ
	DateTime64	TIMESTAMPTZ
Numeric	Int8	SMALLINT
	Int16	SMALLINT
	Int32	INTEGER
	Int64	BIGINT
	UInt8	INTEGER
	UInt16	INTEGER
	UInt32	BIGINT
	UInt64	BIGINT
	Float32	FLOAT
	Float64	DOUBLE PRECISION
	Decimal(P, S)	DECIMAL
	Decimal32(S)	DECIMAL
	Decimal64(S)	DECIMAL
	Decimal128(S)	DECIMAL
Boolean	None (UInt8 is used instead.)	BOOLEAN
String	String	TEXT
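To preview what the default mappings produce for a given source table, the table above can be encoded as a lookup keyed by base type name, so that parameterized types such as `DateTime64(3)` or `Decimal(18, 2)` resolve through their base name. This is a minimal Python sketch, not the DataWorks implementation. `to_hologres_type` and `create_table_ddl` are hypothetical helpers, and types that the table does not list (for example, `Array(Int32)` or `Nullable(Int32)`) raise `KeyError` instead of being guessed.

```python
import re

# Default ClickHouse-to-Hologres mappings from the table above, keyed by base type name.
DEFAULT_TYPE_MAPPING = {
    "Date": "DATE",
    "DateTime": "TIMESTAMPTZ",
    "DateTime64": "TIMESTAMPTZ",
    "Int8": "SMALLINT",
    "Int16": "SMALLINT",
    "Int32": "INTEGER",
    "Int64": "BIGINT",
    "UInt8": "INTEGER",
    "UInt16": "INTEGER",
    "UInt32": "BIGINT",
    "UInt64": "BIGINT",
    "Float32": "FLOAT",
    "Float64": "DOUBLE PRECISION",
    "Decimal": "DECIMAL",
    "Decimal32": "DECIMAL",
    "Decimal64": "DECIMAL",
    "Decimal128": "DECIMAL",
    "String": "TEXT",
}


def to_hologres_type(clickhouse_type: str) -> str:
    """Map a ClickHouse type such as 'DateTime64(3)' to its default Hologres type."""
    # Take the leading identifier, dropping parameters like (3) or ('Asia/Shanghai').
    match = re.match(r"\s*(\w+)", clickhouse_type)
    if match is None:
        raise KeyError(clickhouse_type)
    return DEFAULT_TYPE_MAPPING[match.group(1)]


def create_table_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Render a simplified CREATE TABLE statement from (column, ClickHouse type) pairs."""
    body = ",\n".join(f"    {name} {to_hologres_type(ch_type)}" for name, ch_type in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"
```

The generated statement omits primary keys and table properties. Also note that BIGINT is a signed 64-bit type with a maximum of 9223372036854775807, whereas UInt64 can hold values up to 18446744073709551615, so UInt64 values above the BIGINT maximum do not fit under the default mapping.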

Step 6: Configure advanced parameters

You can click Configure Advanced Parameters in the upper-right corner of the configuration page to perform finer-grained configurations for the source and destination for data synchronization. For example, you can configure the maximum number of connections and the parameters related to throttling.
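When you choose values for the maximum number of connections and the throttling limit, a rough estimate of the resulting transfer time can help. The following Python sketch assumes that throughput scales linearly with the number of connections up to the throttling cap. `per_connection_rps` is a figure you would measure yourself, and neither function corresponds to a DataWorks parameter name.

```python
import math
from typing import Optional


def effective_rate(connections: int, per_connection_rps: float,
                   throttle_rps: Optional[float] = None) -> float:
    """Records per second under the linear-scaling assumption, capped by the throttle if set."""
    parallel = connections * per_connection_rps
    return parallel if throttle_rps is None else min(parallel, throttle_rps)


def min_sync_seconds(total_rows: int, records_per_second: float) -> int:
    """Lower bound on transfer time, in whole seconds, at a given record rate."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return math.ceil(total_rows / records_per_second)
```

For example, 4 connections at 5,000 records per second each would reach 20,000 records per second, but a throttle of 10,000 records per second caps the rate, so 36,000,000 rows take at least 3,600 seconds. Raising the connection count beyond the point where the throttle binds does not shorten the estimate.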

Step 7: Configure a resource group

You can click Configure Resource Group in the upper-right corner of the page to change the resource group (an exclusive resource group for Data Integration or a new-version resource group) that runs the synchronization task.

Step 8: Run the synchronization task

After the configuration of the synchronization task is complete, click Complete in the lower part of the page.
In the Nodes section of the Data Integration page, find the created synchronization task and click Start in the Actions column.
Click the name or ID of the synchronization task in the Nodes section and view the detailed running process of the synchronization task.
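The console is the documented way to follow a running task. If you script the follow-up instead, the usual pattern is to poll until the task reaches a terminal state. In this Python sketch, `fetch_status` is a hypothetical callable that you would connect to your own status source, and the status names in `TERMINAL_STATES` are assumptions, not the values that DataWorks reports.

```python
import time
from typing import Callable

# Hypothetical terminal status names; replace them with the values your status source returns.
TERMINAL_STATES = {"SUCCESS", "FAILED", "STOPPED"}


def wait_for_task(fetch_status: Callable[[], str], interval_seconds: float = 30.0,
                  timeout_seconds: float = 3600.0) -> str:
    """Poll fetch_status() until it returns a terminal state, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout_seconds} s")
        time.sleep(interval_seconds)
```

A fixed polling interval keeps the sketch simple. For long-running tasks, a longer interval reduces the number of status requests.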

Perform O&M operations on the synchronization task

View the status of the synchronization task

After the synchronization task is created, you can go to the Synchronization Task page to view all synchronization tasks that are created in the workspace and the basic information of each synchronization task.

You can click Start or Stop in the Actions column to start or stop a synchronization task. You can also edit a synchronization task or view details of a synchronization task.
You can click Running Details in the Actions column to view the running details of a synchronization task. You can also click different sections on the running details page of the synchronization task to view the related information.

Rerun the synchronization task

If you add tables to or remove tables from the synchronization task, or modify the schema or name of a destination table, you can click Rerun in the Actions column of the synchronization task. The rerun synchronizes data only from the newly added tables and from the tables whose destination schema or name was modified.
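The rerun scope described above amounts to a set comparison between the previous and current task configurations. The following Python sketch models it, assuming that each configuration maps a source table name to a hypothetical signature made of the destination table name and its column definitions. It illustrates the documented behavior and is not how DataWorks computes the scope.

```python
from typing import Dict, Set, Tuple

# Hypothetical signature of a destination table: (table name, column definitions).
Signature = Tuple[str, Tuple[str, ...]]


def tables_to_rerun(previous: Dict[str, Signature],
                    current: Dict[str, Signature]) -> Set[str]:
    """Return the source tables whose data a rerun would synchronize."""
    # Tables added to the task since the previous run.
    added = set(current) - set(previous)
    # Tables still in the task whose destination name or schema changed.
    modified = {t for t in set(current) & set(previous) if current[t] != previous[t]}
    # Removed tables are not returned: nothing is synchronized for them.
    return added | modified
```

For example, if `users` is renamed at the destination, `items` is added, and `logs` is removed, only `users` and `items` are in the rerun scope.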