Synchronize all data in a Hologres database to another Hologres database in offline mode - DataWorks

This topic describes how to use DataWorks Data Integration to synchronize all data in a Hologres database to another Hologres database with a single offline synchronization task.

Prerequisites

An exclusive resource group for Data Integration or a general-purpose resource group is purchased. For more information, see Create and use an exclusive resource group for Data Integration and Create and use a general-purpose resource group.
Two Hologres data sources are added to DataWorks. One is used as the source, and the other is used as the destination. For more information, see Add a Hologres data source.
Network connections are established between the resource group and the data sources. For more information, see Network connectivity solutions.

Procedure

Step 1: Select a synchronization type

Go to the Data Integration page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane of the Data Integration page, click Synchronization Task. In the upper part of the page that appears, select Hologres from the Source drop-down list and Hologres from the Destination drop-down list, and click Create. In the Basic Settings section of the Create Data Synchronization Solution page, configure the following parameters:
- New Node Name: Specify a name for the synchronization task based on your business requirements.
- Synchronization Method: Select Batch migration of entire database.

Step 2: Establish network connections

In the Network and Resource Configuration section, select the source Hologres data source, the destination Hologres data source, and the resource group. Then, click Test Connectivity for All Resource Groups and Data Sources to test the connectivity between the resource group and the data sources.
Click Next.
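The connectivity test in the console is the authoritative check. As a rough supplementary check, the following minimal sketch tests whether a TCP connection can be opened to an endpoint. It assumes that you run it from a machine in the same network as the resource group. The host names and port in the commented usage are hypothetical placeholders, and the check covers only network reachability, not authentication or permissions.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(endpoints: dict) -> dict:
    """Map each role (for example "source") to its reachability result."""
    return {role: can_reach(host, port) for role, (host, port) in endpoints.items()}


# Hypothetical usage; replace the placeholders with your own endpoints:
# check_endpoints({
#     "source": ("source-holo.example.internal", 80),
#     "destination": ("dest-holo.example.internal", 80),
# })
```

A False result suggests that a network path, whitelist, or security group setting should be reviewed before you rerun the connectivity test in the console.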

Step 3: Select the tables from which you want to synchronize data

In the Source Table list, select the tables whose data you want to synchronize, and click the icon to move the selected tables to the Selected Tables list.

Step 4: Configure settings related to destination tables

In the Mapping Rules for Destination Tables section, select all rows that are displayed and click Batch Refresh Mapping Results.

Note

After you select the source tables, they are automatically displayed in the Mapping Rules for Destination Tables section, and the properties of the destination tables are in the to-be-mapped state. You must define the mappings between the source tables and destination tables to determine where data is read from and written to, and then click Refresh in the Actions column. You can refresh the mappings directly, or refresh them after you configure the settings related to destination tables.

You can also select specific items and click Batch Modify to modify the items based on your business requirements. The following table describes the options that you can select after you click Batch Modify.

Option	Description
Value assignment	You can add constants and variables to destination tables.
Customize Mapping Rules for Destination Schema Names	You can concatenate built-in variables and specified strings into a final destination schema name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables.
Customize Mapping Rules for Destination Table Names	You can concatenate built-in variables and specified strings into a destination table name. You can edit built-in variables. For example, you can specify strings as the value of built-in variables.
Destination Table Schema - Batch Modify and Add Field	You can perform one of the following operations on multiple destination tables at a time: modify schemas, add fields, and specify primary keys.
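The custom naming rules described in the table can be thought of as simple string concatenation. The following minimal sketch illustrates the idea only; it is not the console's implementation. The variable names source_schema and source_table are hypothetical stand-ins for the built-in variables that the console provides.

```python
def build_destination_name(rule, variables):
    """Concatenate literal strings and variable values into one name.

    rule: list of ("literal", text) or ("variable", name) tuples, in order.
    variables: dict that maps variable names to their (possibly edited) values.
    """
    pieces = []
    for kind, value in rule:
        if kind == "literal":
            pieces.append(value)
        elif kind == "variable":
            pieces.append(variables[value])
        else:
            raise ValueError(f"unknown part kind: {kind!r}")
    return "".join(pieces)


# Hypothetical variable names and values, for illustration only.
variables = {"source_schema": "public", "source_table": "orders"}
rule = [("literal", "ods_"), ("variable", "source_table"), ("literal", "_bak")]

print(build_destination_name(rule, variables))  # ods_orders_bak

# Editing a built-in variable corresponds to overriding its value with a string.
edited = {**variables, "source_table": "sales"}
print(build_destination_name(rule, edited))  # ods_sales_bak
```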

Step 5: Modify data type mappings

If the destination Hologres tables are in the to-be-created state, the system provides default mappings between data types of fields in the source Hologres tables and data types of fields in the destination Hologres tables. You can also click Edit Mapping of Field Data Types in the upper-right corner of the Mapping Rules for Destination Tables section to configure data type mappings based on your business requirements. After the configuration is complete, click Apply and Refresh Mapping.
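Conceptually, editing the data type mappings means layering your own overrides on top of the defaults. The following minimal sketch shows that idea under stated assumptions: the default table is hypothetical and assumes that each listed Hologres type maps to itself, which may differ from the defaults that the console actually displays.

```python
# Hypothetical defaults: source and destination are both Hologres,
# so this sketch assumes that each type maps to itself.
DEFAULT_TYPE_MAPPING = {
    "int4": "int4",
    "int8": "int8",
    "text": "text",
    "numeric": "numeric",
    "timestamptz": "timestamptz",
}


def resolve_types(source_columns, overrides=None):
    """Return {column: destination_type}, applying overrides on top of the defaults."""
    mapping = {**DEFAULT_TYPE_MAPPING, **(overrides or {})}
    resolved = {}
    for column, source_type in source_columns.items():
        key = source_type.lower()
        if key not in mapping:
            raise KeyError(f"no mapping defined for source type {source_type!r}")
        resolved[column] = mapping[key]
    return resolved


columns = {"id": "INT8", "amount": "NUMERIC", "note": "TEXT"}
print(resolve_types(columns))
print(resolve_types(columns, overrides={"numeric": "text"}))
```

In the second call, only the amount column changes its destination type, because the override applies per data type rather than per table.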

Step 6: Configure advanced parameters

To fine-tune how the synchronization task reads from the source and writes to the destination, click Configure Advanced Parameters in the upper-right corner of the configuration page. For example, you can configure the Maximum number of connections at the source of offline tasks, Maximum number of offline task write connections, and Clear table before synchronization parameters, as well as the parameters on the Runtime Config tab.
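To see why an option such as Clear table before synchronization matters for repeated batch runs, consider the following minimal sketch. It uses an in-memory SQLite table as a stand-in for a destination table, and DELETE as a stand-in for whatever clearing mechanism the service applies. The table name, columns, and helper functions are hypothetical.

```python
import sqlite3


def load_batch(conn, rows, clear_first):
    """Write one batch into the stand-in destination table and return its row count."""
    if clear_first:
        # Stand-in for clearing the destination table before the write.
        conn.execute("DELETE FROM orders")
    conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def run_twice(clear_first):
    """Load the same batch two times and return the final row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    rows = [(1, 9.5), (2, 20.0)]
    load_batch(conn, rows, clear_first)
    count = load_batch(conn, rows, clear_first)
    conn.close()
    return count


print(run_twice(clear_first=True))   # 2: the second run replaces the first
print(run_twice(clear_first=False))  # 4: the second run appends duplicates
```

The stand-in table has no primary key, so the duplicates are accepted silently. With a primary key on the destination table, the second load would conflict instead of appending.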

Step 7: Configure a resource group

To change the resource group that runs the synchronization task, click Configure Resource Group in the upper-right corner of the page and select the exclusive resource group for Data Integration that you want to use.

Step 8: Run the synchronization task

After the configuration of the synchronization task is complete, click Complete in the lower part of the page.
In the Nodes section of the Data Integration page, find the created synchronization task and click Start in the Actions column.
Click the name or ID of the synchronization task in the Nodes section to view the running details of the task.

Perform O&M operations on the synchronization task

View the status of the synchronization task

After the synchronization task is created, you can go to the Synchronization Task page to view all synchronization tasks that are created in the workspace and the basic information of each synchronization task.

You can click Start or Stop in the Actions column of the synchronization task to start or stop the task. You can also click More in the Actions column of the synchronization task and select the related item to perform operations such as editing and viewing details of the task.
You can click Running Details in the Actions column to view the running details of a synchronization task. You can also click different sections on the running details page of the synchronization task to view the related information.

Rerun the synchronization task

If you add tables to or remove tables from the source, or change the schema or name of a destination table, click More in the Actions column of the synchronization task and then click Rerun to rerun the task after the change. During the rerun, the synchronization task synchronizes data only from the newly added tables, or only from the source tables that are mapped to the destination tables whose schema or name is changed.
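The rerun behavior can be pictured as a difference between the previous and current task configurations. The following minimal sketch illustrates that selection logic only; it is not the service's implementation. The function name, the mapping dictionaries, and the schema_changed argument are hypothetical.

```python
def plan_rerun(previous, current, schema_changed=frozenset()):
    """Return the source tables that a rerun would synchronize.

    previous, current: {source_table: destination_table} before and after the change.
    schema_changed: source tables whose destination table schema was modified.
    """
    # Tables that were added to the task since the previous run.
    added = {src for src in current if src not in previous}
    # Tables that are still in the task but now map to a renamed destination table.
    renamed = {src for src in current if src in previous and current[src] != previous[src]}
    # Tables whose destination schema was modified.
    altered = {src for src in current if src in schema_changed}
    return sorted(added | renamed | altered)


previous = {"orders": "ods_orders", "users": "ods_users", "logs": "ods_logs"}
current = {"orders": "ods_orders", "users": "ods_users_v2", "items": "ods_items"}

print(plan_rerun(previous, current))                             # ['items', 'users']
print(plan_rerun(previous, current, schema_changed={"orders"}))  # ['items', 'orders', 'users']
```

In this example, logs was removed from the task and is therefore not synchronized, and orders is skipped unless its destination schema changed.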