Synchronize data from DataHub to Hologres - Hologres - Alibaba Cloud Documentation Center

You can synchronize data from DataHub to Hologres in scenarios that require real-time data analytics, real-time monitoring, real-time reporting, or complex data type processing. The combined capabilities of DataHub and Hologres help improve the efficiency and accuracy of data processing and analytics. This topic describes how to create a DataConnector in DataHub and provides solutions to common errors.

Background information

DataHub supports data synchronization as a data source or a data sink. You can synchronize data from a DataHub topic to other Alibaba Cloud services in real time or quasi real time. This facilitates data flows among Alibaba Cloud services. For more information about the data synchronization feature of DataHub, see Overview.

The following table describes the mappings between concepts in DataHub and Hologres.

DataHub	Hologres
Project	Database
Topic	Table

Synchronization modes and policies

Synchronization modes

Mode	Description
One-by-one insert	Inserts DataHub data into Hologres one by one. This mode is applicable if you want to synchronize all data in DataHub to Hologres.
Playback	Performs change operations that are performed on the source DataHub project on the destination Hologres database to ensure data consistency. You can obtain the data change operations by analyzing binary logs. This mode is applicable if you want to synchronize data from a source to DataHub by using Data Transmission Service (DTS) and then synchronize the data to Hologres. In this case, DataHub is equivalent to binary logs. Note When DTS is used to synchronize data to DataHub, additional columns are added to the synchronized data columns to describe the data operations. The names of additional columns may conflict with the names of existing columns in the destination DataHub topic. For more information about how to prevent the conflicts, see Modify the naming rules for additional columns.

Synchronization policies

Policy	Description
Replace	If a primary key conflict occurs when you write data, the original data is overwritten by the new data. This ensures data consistency between the source and destination databases.
Ignore	If a primary key conflict occurs when you write data, new data is ignored. The data is not updated. This prevents repeated data from being written and ensures data integrity of the destination database.

Precautions

Only data in topics of the TUPLE type in DataHub can be synchronized to Hologres.
Before you write data to a partitioned table in the Hologres instance, you must create a child table. For more information, see CREATE PARTITION TABLE.
Each DataConnector occupies connections. The number of connections occupied by each DataConnector is the number of shards in the DataHub topic.

Preparations

Activate DataHub and prepare a data source in DataHub. For more information, see Get started with DataHub.
Purchase a Hologres instance and create a table. In this example, the table lineitem is used. For more information, see Purchase a Hologres instance and Connect to Hologres and query data.
The following table describes the mappings between data types in DataHub and Hologres.
DataHub
Hologres
TINYINT
SMALLINT
SMALLINT
SMALLINT
INTEGER
INTEGER
BIGINT
BIGINT
FLOAT
REAL
DOUBLE
DOUBLE PRECISION
DECIMAL
DECIMAL
STRING
TEXT
BOOLEAN
BOOLEAN
TIMESTAMP
TIMESTAMPTZ
For example, you can execute the following statements to create a table:
```
BEGIN;
CREATE TABLE lineitem ( 
L_ORDERKEY BIGINT NOT NULL,
L_PARTKEY BIGINT NOT NULL,
L_SUPPKEY BIGINT NOT NULL,
L_LINENUMBER BIGINT NOT NULL,
L_QUANTITY DECIMAL(20,10),
L_EXTENDEDPRICE DECIMAL(20,10),
L_DISCOUNT DECIMAL(20,10),
L_TAX DECIMAL(20,10),
L_RETURNFLAG TEXT,
L_LINESTATUS TEXT,
L_SHIPDATE TIMESTAMPTZ,
L_COMMITDATE TIMESTAMPTZ,
L_RECEIPTDATE TIMESTAMPTZ,
L_SHIPINSTRUCT TEXT,
L_SHIPMODE TEXT,
L_COMMENT TEXT
);
caLL set_table_property('lineitem', 'orientation', 'column');
COMMIT;
```

Create a DataConnector

Log on to the DataHub console. Click the created topic to go to the topic details page.
In the upper-right corner, click + Connector.

In the panel that appears, click Hologres. In the Create Connector panel, configure the parameters.

Parameter	Description
Instance	The ID of the Hologres instance. You can log on to the Hologres console and copy the ID of the required Hologres instance.
Database	The name of the Hologres database to which you want to synchronize DataHub data.
Table	The name of the Hologres table to which you want to synchronize DataHub data. In this example, the table `lineitem` is used.
Primary Key Conflict Policy	The data update policy if a primary key conflict occurs. Valid values: replace: If a primary key conflict occurs when you write data, the original data is overwritten by the new data. This is the default value. ignore: If a primary key conflict occurs when you write data, new data is ignored. The data is not updated, and original data is still used. For more information about data synchronization policies, see Synchronization policies in this topic.
Synchronization Scenarios	The mode that is used for data synchronization. Valid values: default: Data is inserted one by one. This is the default value. dts: Use this mode when you use DTS to synchronize data to DataHub and enable new naming rules for additional columns. dts_old: Use this mode when you use DTS to synchronize data to DataHub and do not enable new naming rules for additional columns. For more information about data synchronization modes, see Synchronization modes in this topic.
Import Fields	The fields that you want to synchronize to Hologres. You can synchronize all or part of the fields of the DataHub topic based on your business requirements.
Authentication Mode	The mode in which access to the Hologres instance is authenticated. Default value: AccessKey.
AccessKey ID	The AccessKey ID of the Alibaba Cloud account used to access the Hologres instance. You can obtain the AccessKey ID from the AccessKey Pair page.
AccessKey Secret	The AccessKey secret of the Alibaba Cloud account used to access the Hologres instance. You can obtain the AccessKey secret on the AccessKey Pair page.
Timestamp Unit	The timestamp unit used by the DataConnector for data synchronization. Valid values: MICROSECOND: This is the default value. MILLISECOND. SECOND.

Click Create to synchronize data from DataHub to Hologres.
After you create the DataConnector, you can view the status of the DataConnector on the Synchronization Task tab of the topic details page.
Query the synchronized data in Hologres.
Connect the Hologres instance to a development tool and use the tool to check whether the data is synchronized to the Hologres instance in real time. For more information about development tools, see Overview. For example, you can execute the following statement to query the synchronized data:
```
SELECT COUNT(*) FROM lineitem;
```

Common errors and troubleshooting

This section describes common errors that are reported when you use Hologres and provides the solutions to these errors.

Error 1
- Problem description
```
ErrorMessage: Import field not found in dest schema.
```
- Causes
  - The destination Hologres table does not contain the fields that are specified as import fields in the DataConnector.
  - The Synchronization Scenarios parameter is set to default, but the additional columns that are generated when data is synchronized to DataHub by using DTS are configured as import fields in the DataConnector.
- Solutions
  - Create a Hologres table and add the missing fields. You can also modify the Import Fields parameter of the DataConnector to remove the additional fields.
  - Create a DataConnector and set the Synchronization Scenarios parameter to dts or dts_old.

Error 2
- Problem description
```
ErrorMessage: Column type not match with Holo column.
```
- Cause
  The data types of fields in the DataHub topic do not match the data types of fields in the Hologres table.
- Solution
  Create a Hologres table and specify valid data types for the table based on the data type mappings.
Error 3
- Problem description
```
ErrorMessage: Not import column xxx not allow null and no default value.
```
- Cause
  The destination Hologres table contains some fields that are not specified as import fields in the DataConnector. The fields are configured with the not null property, and no default values are configured for the fields.
- Solution
  Create a Hologres table. For fields that are not specified as import fields in the DataConnector, configure default values for the fields or do not configure the not null property.

DataHub	Hologres
TINYINT	SMALLINT
SMALLINT	SMALLINT
INTEGER	INTEGER
BIGINT	BIGINT
FLOAT	REAL
DOUBLE	DOUBLE PRECISION
DECIMAL	DECIMAL
STRING	TEXT
BOOLEAN	BOOLEAN
TIMESTAMP	TIMESTAMPTZ