
DataWorks: Synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres in offline mode

Last Updated: Nov 15, 2024

This topic describes how to create and configure a batch synchronization task to synchronize data from multiple tables in an ApsaraDB for ClickHouse database to Hologres.

Limits

  • Batch synchronization from ApsaraDB for ClickHouse supports only ApsaraDB for ClickHouse data sources of V20.8 or V21.8.

  • Batch synchronization to ApsaraDB for ClickHouse supports only exclusive resource groups for Data Integration and new-version resource groups.

Prerequisites

Procedure

Step 1: Select a synchronization type

  1. Go to the Data Integration page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane of the Data Integration page, click Synchronization Task. In the Create Synchronization Task section, select ClickHouse from the Source drop-down list and Hologres from the Destination drop-down list, and click Create. The Create Data Synchronization Solution page appears. In the Basic Settings section of the page, configure the following parameters:

    • New Node Name: Specify a name for the synchronization task based on your business requirements.

    • Synchronization Method: Select Batch migration of entire database.

Step 2: Establish network connections

  1. In the Network and Resource Configuration section, select the ClickHouse data source as the source and the Hologres data source as the destination, and select the resource group that you want to use. Then, click Test Connectivity for All Resource Groups and Data Sources to test the connectivity between the resource group and the data sources.


  2. Click Next.
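Test Connectivity for All Resource Groups and Data Sources runs its check from the resource group. If a test fails, a rough manual analog is to check, from a machine with the same network access as the resource group, whether the data source endpoints accept TCP connections at all. The following Python sketch does only that: it is not the DataWorks check, and it does not verify credentials or the database protocol. The host names are hypothetical placeholders, and the ports (8123 for the ClickHouse HTTP interface, 80 for a Hologres endpoint) are assumptions to confirm against your own instance details.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Only network reachability is checked; no ClickHouse or Hologres
    authentication or protocol handshake is performed.
    """
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures (socket.gaierror), refused connections, and timeouts
        # are all subclasses of OSError.
        return False


# Hypothetical endpoints: replace with the values shown for your own instances.
ENDPOINTS = {
    "clickhouse": ("cc-example.clickhouse.example.com", 8123),
    "hologres": ("hgpost-example.hologres.example.com", 80),
}


def report(endpoints=ENDPOINTS) -> dict:
    """Map each endpoint name to True (reachable) or False (unreachable)."""
    return {name: can_reach(host, port) for name, (host, port) in endpoints.items()}
```

A False result means that the name could not be resolved, the port refused the connection, or nothing answered within the timeout. In every case, review the network configuration (for example, VPC and whitelist settings) before you rerun the console test.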

Step 3: Select the tables from which you want to synchronize data

In this step, select the tables from which you want to synchronize data in the Source Table list, and click the icon that moves them to the Selected Tables list.


Step 4: Configure settings related to destination tables

In the Mapping Rules for Destination Tables section, select all rows that are displayed and click Batch Refresh Mapping Results.

Note

After you select the source tables, they are automatically displayed in the Mapping Rules for Destination Tables section, and the properties of their destination tables are pending mapping. You must define the mappings between the source tables and the destination tables, which determine where data is read from and written to, and then click Refresh in the Actions column. You can refresh the mappings directly, or configure settings related to destination tables first and then refresh the mappings.

You can also select specific items and click Batch Modify to modify them based on your business requirements. The following options are available after you click Batch Modify:

  • Value assignment: Add constants and variables to destination tables.

  • Customize Mapping Rules for Destination Schema Names: Concatenate built-in variables and specified strings into the final destination schema name. You can edit built-in variables, for example, by specifying a string as the value of a built-in variable.

  • Customize Mapping Rules for Destination Table Names: Concatenate built-in variables and specified strings into the destination table name. You can edit built-in variables, for example, by specifying a string as the value of a built-in variable.

  • Destination Table Schema - Batch Modify and Add Field: Perform one of the following operations on multiple destination tables at a time: modify schemas, add fields, or specify primary keys.
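The schema and table name rules can be pictured as string templates in which built-in variables are substituted and literal strings are kept. The following Python sketch assumes hypothetical built-in variables named `source_db` and `source_table`, written as `${...}` placeholders; the actual variable names and syntax in the DataWorks console may differ. The `overrides` argument mirrors the option to set a built-in variable to a fixed string.

```python
from string import Template


def render_name(rule: str, variables: dict) -> str:
    """Substitute ${name} placeholders in `rule` with values from `variables`.

    Template.substitute raises KeyError for an unknown placeholder, so a typo
    in a rule fails loudly instead of producing an unexpected name.
    """
    return Template(rule).substitute(variables)


def map_tables(source_tables, schema_rule: str, table_rule: str, overrides=None) -> dict:
    """Return {"db.table": (destination_schema, destination_table)}.

    `source_db` and `source_table` are hypothetical built-in variables;
    `overrides` replaces a built-in variable's value with a fixed string.
    """
    overrides = overrides or {}
    mappings = {}
    for qualified_name in source_tables:
        db, table = qualified_name.split(".", 1)
        # Fixed strings in `overrides` take precedence over derived values.
        variables = {"source_db": db, "source_table": table, **overrides}
        mappings[qualified_name] = (
            render_name(schema_rule, variables),
            render_name(table_rule, variables),
        )
    return mappings
```

For example, `map_tables(["sales.orders"], "ods", "ods_${source_db}_${source_table}")` returns `{"sales.orders": ("ods", "ods_sales_orders")}`: every table lands in the `ods` schema, and each table name is prefixed with its source database name.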

Step 5: Modify data type mappings

If the destination Hologres tables are in the to-be-created state, the system provides default mappings between data types of fields in ApsaraDB for ClickHouse tables and data types of fields in Hologres tables. The following table lists the default mappings. You can also click Edit Mapping of Field Data Types in the upper-right corner of the Mapping Rules for Destination Tables section to configure data type mappings based on your business requirements. After the configuration is complete, click Apply and Refresh Mapping.

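| Category | ClickHouse data type | Hologres data type |
|---|---|---|
| Date | Date | DATE |
| Date | DateTime | TIMESTAMPTZ |
| Date | DateTime(timezone) | TIMESTAMPTZ |
| Date | DateTime64 | TIMESTAMPTZ |
| Numeric | Int8 | SMALLINT |
| Numeric | Int16 | SMALLINT |
| Numeric | Int32 | INTEGER |
| Numeric | Int64 | BIGINT |
| Numeric | UInt8 | INTEGER |
| Numeric | UInt16 | INTEGER |
| Numeric | UInt32 | BIGINT |
| Numeric | UInt64 | BIGINT |
| Numeric | Float32 | FLOAT |
| Numeric | Float64 | DOUBLE PRECISION |
| Numeric | Decimal(P, S) | DECIMAL |
| Numeric | Decimal32(S) | DECIMAL |
| Numeric | Decimal64(S) | DECIMAL |
| Numeric | Decimal128(S) | DECIMAL |
| Boolean | None (UInt8 is used instead) | BOOLEAN |
| String | String | TEXT |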
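To review destination column types before you run the task, the default mappings can be expressed as a lookup table. The following Python sketch transcribes the table above and keys on the base type name, so parameterized types such as `DateTime('Asia/Shanghai')`, `DateTime64(3)`, and `Decimal(18, 4)` resolve to their row. It does not carry DECIMAL precision and scale over, and it rejects wrapper types such as `Nullable(...)` because the table does not cover them. The `overrides` argument stands in for Edit Mapping of Field Data Types, for example mapping UInt8 to BOOLEAN when UInt8 columns hold boolean values. The helper `uint64_fits_bigint` reflects that Hologres BIGINT is a signed 64-bit type, so UInt64 values above 2^63 - 1 would not fit in the default BIGINT mapping.

```python
import re

# Default ClickHouse -> Hologres type mappings, transcribed from the table above.
DEFAULT_MAPPING = {
    "Date": "DATE",
    "DateTime": "TIMESTAMPTZ",
    "DateTime64": "TIMESTAMPTZ",
    "Int8": "SMALLINT",
    "Int16": "SMALLINT",
    "Int32": "INTEGER",
    "Int64": "BIGINT",
    "UInt8": "INTEGER",
    "UInt16": "INTEGER",
    "UInt32": "BIGINT",
    "UInt64": "BIGINT",
    "Float32": "FLOAT",
    "Float64": "DOUBLE PRECISION",
    "Decimal": "DECIMAL",
    "Decimal32": "DECIMAL",
    "Decimal64": "DECIMAL",
    "Decimal128": "DECIMAL",
    "String": "TEXT",
}

# Base type name, optionally followed by parameters, e.g. "Decimal(18, 4)".
TYPE_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)\s*(?:\(.*\))?\s*$")

BIGINT_MAX = 2**63 - 1  # Hologres BIGINT is a signed 64-bit integer.


def to_hologres(clickhouse_type: str, overrides=None) -> str:
    """Return the Hologres type for a ClickHouse column type.

    Parameters such as precision, scale, or time zone are ignored; only the
    base type name is looked up. `overrides` takes precedence over the defaults.
    """
    match = TYPE_PATTERN.match(clickhouse_type)
    if match is None:
        raise ValueError(f"cannot parse ClickHouse type {clickhouse_type!r}")
    base = match.group(1)
    mapping = {**DEFAULT_MAPPING, **(overrides or {})}
    if base not in mapping:
        # For example Nullable(...), Array(...), or LowCardinality(...).
        raise ValueError(f"no default mapping for {clickhouse_type!r}")
    return mapping[base]


def uint64_fits_bigint(value: int) -> bool:
    """Check whether a UInt64 value fits in a signed 64-bit BIGINT column."""
    return 0 <= value <= BIGINT_MAX
```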

Step 6: Configure advanced parameters

You can click Configure Advanced Parameters in the upper-right corner of the configuration page to perform finer-grained configurations for the source and destination for data synchronization. For example, you can configure the maximum number of connections and the parameters related to throttling.

Step 7: Configure a resource group

You can click Configure Resource Group in the upper-right corner of the page to change the exclusive resource group for Data Integration that is used to run the data synchronization task.

Step 8: Run the synchronization task

  1. After the configuration of the synchronization task is complete, click Complete in the lower part of the page.

  2. In the Nodes section of the Data Integration page, find the created synchronization task and click Start in the Actions column.

  3. Click the name or ID of the synchronization task in the Nodes section to view the detailed running process of the task.

Perform O&M operations on the synchronization task

View the status of the synchronization task

After the synchronization task is created, you can go to the Synchronization Task page to view all synchronization tasks that are created in the workspace and the basic information of each task.

  • You can click Start or Stop in the Actions column to start or stop a synchronization task. You can also edit a synchronization task or view details of a synchronization task.

  • You can click Running Details in the Actions column to view the running details of a synchronization task. On the running details page, you can click different sections to view the related information.

Rerun the synchronization task

If you add tables to or remove tables from the synchronization task, or modify the schema or name of a destination table, you can click Rerun in the Actions column of the synchronization task. The rerun synchronizes data only from the newly added tables and from the tables whose destination schema or name was modified.
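The rerun scope can be modeled as a comparison between the previous and the current task configuration. In this Python sketch, each source table maps to a hypothetical `(destination_table_name, schema_fingerprint)` tuple. A table is rerun if it is new or if either value changed, and removed tables are simply not synchronized again. This illustrates the rule described above; it is not how DataWorks tracks changes internally.

```python
def tables_to_rerun(previous: dict, current: dict) -> list:
    """Return the source tables that a rerun would synchronize again, sorted.

    Both arguments map a source table name to a hypothetical
    (destination_table_name, schema_fingerprint) tuple.
    """
    rerun = []
    for source_table, destination in current.items():
        # previous.get returns None for a newly added table, which never equals
        # a tuple, so new tables and changed tables are both selected here.
        if previous.get(source_table) != destination:
            rerun.append(source_table)
    # Tables present only in `previous` were removed from the task and are
    # not synchronized again.
    return sorted(rerun)
```

For example, if `sales.items` is added and the destination schema of `sales.users` changes while `sales.orders` stays the same, the function returns `["sales.items", "sales.users"]`.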