This topic describes how to create and configure a batch synchronization task to synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres.
Limits
Batch synchronization from ApsaraDB for ClickHouse supports only ApsaraDB for ClickHouse data sources of V20.8 or V21.8.
Batch synchronization tasks for ApsaraDB for ClickHouse can run only on exclusive resource groups for Data Integration and new-version resource groups.
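Before you add the data source, it can help to check the version limit above. The following is a minimal sketch, not part of DataWorks: it assumes you have already obtained the version string from the source database (for example, from the ClickHouse `SELECT version()` query) and that it has the usual dotted form such as `21.8.10.19`. The function name `is_supported_clickhouse_version` is hypothetical.

```python
# Releases that this topic lists as supported for batch synchronization.
SUPPORTED_RELEASES = {(20, 8), (21, 8)}


def is_supported_clickhouse_version(version: str) -> bool:
    """Return True if the major.minor part of `version` is 20.8 or 21.8.

    Assumption: `version` is a dotted string such as "21.8.10.19".
    Anything that cannot be parsed is treated as unsupported.
    """
    parts = version.strip().split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False
    return (major, minor) in SUPPORTED_RELEASES


if __name__ == "__main__":
    for candidate in ("20.8.7.15", "21.8.10.19", "22.3.2.1"):
        print(candidate, is_supported_clickhouse_version(candidate))
```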
Prerequisites
An exclusive resource group for Data Integration or a new-version resource group is purchased. For more information, see Create and use an exclusive resource group for Data Integration or Create and use a serverless resource group.
A ClickHouse data source is added. For more information, see Add a ClickHouse data source.
A Hologres data source is added. For more information, see Add a Hologres data source.
Network connections are established between your resource group and the data sources. For more information, see Network connectivity solutions.
Procedure
Step 1: Select a synchronization type
Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane of the Data Integration page, click Synchronization Task. In the Create Synchronization Task section, select ClickHouse from the Source drop-down list and Hologres from the Destination drop-down list, and click Create. The Create Data Synchronization Solution page appears. In the Basic Settings section of the page, configure the following parameters:
New Node Name: Specify a name for the synchronization task based on your business requirements.
Synchronization Method: Select Batch migration of entire database.
Step 2: Establish network connections
In the Network and Resource Configuration section, select the ClickHouse data source as the source, the Hologres data source as the destination, and the resource group. Then, click Test Connectivity for All Resource Groups and Data Sources to test the connectivity between the resource group and the data sources.
Click Next.
Step 3: Select the tables from which you want to synchronize data
In the Source Table list, select the tables from which you want to synchronize data, and then click the icon to move the selected tables to the Selected Tables list.
Step 4: Configure settings related to destination tables
In the Mapping Rules for Destination Tables section, select all rows that are displayed and click Batch Refresh Mapping Results.
After you select the source tables, they are automatically displayed in the Mapping Rules for Destination Tables section, and the properties of the destination tables are waiting to be mapped. You must manually define the mappings between the source tables and destination tables to determine how data is read and written. To refresh the mapping for a single table, click Refresh in the Actions column. You can refresh the mappings directly, or refresh them after you configure the settings related to the destination tables.
You can also select specific items and click Batch Modify to modify the items based on your business requirements. The following table describes the options that you can select after you click Batch Modify.
Option | Description |
Value assignment | You can add constants and variables to destination tables. |
Customize Mapping Rules for Destination Schema Names | You can concatenate built-in variables and specified strings into a final destination schema name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables. |
Customize Mapping Rules for Destination Table Names | You can concatenate built-in variables and specified strings into a destination table name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables. |
Destination Table Schema - Batch Modify and Add Field | You can perform one of the following operations on multiple destination tables at a time: modify schemas, add fields, and specify primary keys. |
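The custom mapping rules for destination schema names and table names concatenate built-in variables with fixed strings. The sketch below only illustrates that idea in Python and is not how DataWorks evaluates the rules. The variable name `source_table`, the `${...}` placeholder syntax, and the function `build_destination_name` are all hypothetical; use the built-in variables that the console actually offers.

```python
from string import Template
from typing import Mapping


def build_destination_name(rule: str, variables: Mapping[str, str]) -> str:
    """Concatenate variables and literal strings into one destination name.

    `rule` uses ${name} placeholders (hypothetical syntax for this sketch).
    A KeyError is raised if the rule references an unknown variable.
    """
    return Template(rule).substitute(variables)


if __name__ == "__main__":
    # Prefix and suffix are literal strings; source_table is a variable.
    print(build_destination_name("ods_${source_table}_bak", {"source_table": "orders"}))
```

Raising an error for an unknown variable, rather than silently leaving the placeholder in place, makes a mistyped rule visible before any table is created.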
Step 5: Modify data type mappings
If the destination Hologres tables are in the to-be-created state, the system provides default mappings between data types of fields in ApsaraDB for ClickHouse tables and data types of fields in Hologres tables. The following table lists the default mappings. You can also click Edit Mapping of Field Data Types in the upper-right corner of the Mapping Rules for Destination Tables section to configure data type mappings based on your business requirements. After the configuration is complete, click Apply and Refresh Mapping.
Category | Data type of fields in ClickHouse data source | Data type of fields in Hologres data source |
Date | Date | Date |
Date | DateTime | TIMESTAMPTZ |
Date | DateTime(timezone) | TIMESTAMPTZ |
Date | DateTime64 | TIMESTAMPTZ |
Numeric | Int8 | SMALLINT |
Numeric | Int16 | SMALLINT |
Numeric | Int32 | INTEGER |
Numeric | Int64 | BIGINT |
Numeric | UInt8 | INTEGER |
Numeric | UInt16 | INTEGER |
Numeric | UInt32 | BIGINT |
Numeric | UInt64 | BIGINT |
Numeric | Float32 | FLOAT |
Numeric | Float64 | DOUBLE PRECISION |
Numeric | Decimal(P, S) | DECIMAL |
Numeric | Decimal32(S) | DECIMAL |
Numeric | Decimal64(S) | DECIMAL |
Numeric | Decimal128(S) | DECIMAL |
Boolean | None (UInt8 is used instead.) | BOOLEAN |
String | String | TEXT |
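The default mappings in the preceding table can be written down as a lookup, which is convenient when you want to review the destination schema before the tables are created. This is a minimal sketch under stated assumptions, not the logic that DataWorks runs: the names `DEFAULT_TYPE_MAPPING` and `to_hologres_type` are hypothetical, type parameters such as `(P, S)` or a time zone are simply stripped, and wrappers such as `Nullable(...)` are not handled. The Boolean row is not encoded, because a Boolean value stored as UInt8 cannot be told apart from an ordinary UInt8 column by its type name alone.

```python
import re

# Default ClickHouse -> Hologres mappings, copied from the table above.
DEFAULT_TYPE_MAPPING = {
    "Date": "DATE",
    "DateTime": "TIMESTAMPTZ",
    "DateTime64": "TIMESTAMPTZ",
    "Int8": "SMALLINT",
    "Int16": "SMALLINT",
    "Int32": "INTEGER",
    "Int64": "BIGINT",
    "UInt8": "INTEGER",
    "UInt16": "INTEGER",
    "UInt32": "BIGINT",
    "UInt64": "BIGINT",
    "Float32": "FLOAT",
    "Float64": "DOUBLE PRECISION",
    "Decimal": "DECIMAL",
    "Decimal32": "DECIMAL",
    "Decimal64": "DECIMAL",
    "Decimal128": "DECIMAL",
    "String": "TEXT",
}


def to_hologres_type(clickhouse_type: str) -> str:
    """Look up the default Hologres type for a ClickHouse column type.

    Trailing parameters, for example "(18, 4)" or "('Asia/Shanghai')",
    are removed before the lookup. Unknown types raise KeyError.
    """
    base = re.sub(r"\(.*\)$", "", clickhouse_type.strip())
    return DEFAULT_TYPE_MAPPING[base]


if __name__ == "__main__":
    for column_type in ("DateTime('Asia/Shanghai')", "Decimal(18, 4)", "UInt32"):
        print(column_type, "->", to_hologres_type(column_type))
```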
Step 6: Configure advanced parameters
You can click Configure Advanced Parameters in the upper-right corner of the configuration page to fine-tune how data is read from the source and written to the destination. For example, you can configure the maximum number of connections and the parameters related to throttling.
Step 7: Configure a resource group
You can click Configure Resource Group in the upper-right corner of the page and change the exclusive resource group for Data Integration that you want to use to run the data synchronization task.
Step 8: Run the synchronization task
After the configuration of the synchronization task is complete, click Complete in the lower part of the page.
In the Nodes section of the Data Integration page, find the created synchronization task and click Start in the Actions column.
Click the name or ID of the synchronization task in the Nodes section and view the detailed running process of the synchronization task.
Perform O&M operations on the synchronization task
View the status of the synchronization task
After the synchronization task is created, you can go to the Synchronization Task page to view all synchronization tasks that are created in the workspace and the basic information of each synchronization task.
You can click Start or Stop in the Actions column to start or stop a synchronization task. You can also edit a synchronization task or view details of a synchronization task.
You can click Running Details in the Actions column to view the running details of a synchronization task. You can also click different sections on the running details page of the synchronization task to view the related information.
Rerun the synchronization task
If you add tables to or remove tables from the synchronization task, or modify the schema or name of a destination table, you can click Rerun in the Actions column of the synchronization task. The system reruns the synchronization task and synchronizes data only from the newly added tables or the tables whose destination schema or name is modified.
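The rerun behavior described above amounts to comparing the task configuration before and after the change. The following sketch illustrates that comparison only. It is not the DataWorks implementation, and the function `tables_to_resync` and the configuration layout (source table name mapped to a tuple of destination table name and column list) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: source table -> (destination table name, destination columns).
TaskConfig = Dict[str, Tuple[str, Tuple[str, ...]]]


def tables_to_resync(previous: TaskConfig, current: TaskConfig) -> List[str]:
    """Return the source tables that a rerun would synchronize again.

    A table qualifies if it is new in `current`, or if its destination
    name or schema differs from `previous`. Removed tables are ignored.
    """
    return sorted(
        table
        for table, destination in current.items()
        if previous.get(table) != destination
    )


if __name__ == "__main__":
    before = {
        "orders": ("ods_orders", ("id", "amount")),
        "users": ("ods_users", ("id", "name")),
        "logs": ("ods_logs", ("id",)),
    }
    after = {
        "orders": ("ods_orders", ("id", "amount")),        # unchanged
        "users": ("ods_users", ("id", "name", "email")),   # schema modified
        "payments": ("ods_payments", ("id", "amount")),    # newly added
    }
    print(tables_to_resync(before, after))
```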