All Products
Search
Document Center

DataWorks:Hologres data source

Last Updated:Nov 07, 2024

DataWorks provides Hologres Reader and Hologres Writer for you to read data from and write data to Hologres data sources. This topic describes the capabilities of synchronizing data from or to Hologres data sources.

Supported Hologres versions

Hologres V0.7, V0.8, V0.9, V0.10, V1.1, V1.2, and V1.3

Limits

Batch data read and write

  • Hologres data sources support only exclusive resource groups for Data Integration.

  • Hologres Writer cannot write data to Hologres foreign tables.

  • Logic based on which the endpoint of a Hologres data source is obtained during the establishment of a network connection between the Hologres data source and a resource group:

    • If the Hologres data source resides in the same region as the resource group, the endpoint of the Hologres data source is obtained in the following order: classic network endpoint > VPC endpoint > public endpoint.

    • If the Hologres data source resides in a different region from the resource group, the endpoint of the Hologres data source is obtained in the following order: public endpoint > VPC endpoint.

Real-time data write in a database

Supported data types

Data type

Hologres Reader for batch data read

Hologres Writer for batch data write

Hologres Writer for real-time data write

UUID

Not supported

Not supported

Not supported

CHAR

Supported

Supported

Supported

NCHAR

Supported

Supported

Supported

VARCHAR

Supported

Supported

Supported

LONGVARCHAR

Supported

Supported

Supported

NVARCHAR

Supported

Supported

Supported

LONGNVARCHAR

Supported

Supported

Supported

CLOB

Supported

Supported

Supported

NCLOB

Supported

Supported

Supported

SMALLINT

Supported

Supported

Supported

TINYINT

Supported

Supported

Supported

INTEGER

Supported

Supported

Supported

BIGINT

Supported

Supported

Supported

NUMERIC

Supported

Supported

Supported

DECIMAL

Supported

Supported

Supported

FLOAT

Supported

Supported

Supported

REAL

Supported

Supported

Supported

DOUBLE

Supported

Supported

Supported

TIME

Supported

Supported

Supported

DATE

Supported

Supported

Supported

TIMESTAMP

Supported

Supported

Supported

BINARY

Supported

Supported

Supported

VARBINARY

Supported

Supported

Supported

BLOB

Supported

Supported

Supported

LONGVARBINARY

Supported

Supported

Supported

BOOLEAN

Supported

Supported

Supported

BIT

Supported

Supported

Supported

JSON

Supported

Supported

Supported

JSONB

Supported

Supported

Supported

How Hologres Reader and Hologres Writer work

Batch data read and write

Hologres Reader reads data from Hologres tables by using PostgreSQL statements. The number of parallel threads that are used to read data is based on the number of shards in the Hologres table from which you want to read data. One SELECT statement is executed for each shard.

  • When you execute the CREATE TABLE statement to create a table in Hologres, you can use the CALL set_table_property('table_name', 'shard_count', 'xx') command to configure the number of shards for the table.

    By default, the shard_count field is set to the default number of table shards for your Hologres database. The configurations of your Hologres instance determine the default number of table shards for your Hologres database.

  • A SELECT statement uses the shard that is specified by the built-in field hg_shard_id of the source Hologres table to query data.

Batch data write

Hologres Writer obtains data from a reader and writes data to Hologres based on the conflict processing mode that is configured for the related batch synchronization task.

You can configure a conflict processing mode for the batch synchronization task to control how to process conflicting data in a primary key conflict.

Important

You can configure a conflict processing mode only for a table that has a primary key. For information about the principle based on which data is written to Hologres when a primary key conflict occurs, and data write performance, see How it works.

  • If you configure the replacement mode for the batch synchronization task, new data overwrites the existing data, and all fields in a row are overwritten. NULL is written to the fields that have no mapped source fields as the value.

  • If you configure the update mode for the batch synchronization task, new data overwrites the existing data, and only fields that have mapped fields are overwritten.

  • If you configure the ignoring mode for the batch synchronization task, existing data is retained, and new data is ignored.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a data synchronization task, see the following sections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

Configure a batch synchronization task to synchronize data of a single table

Configure a real-time synchronization task to synchronize data of a single table or synchronize all data in a database

For more information about the configuration procedure, see Configure a real-time synchronization task in DataStudio.

Configure synchronization settings to implement real-time synchronization of full data or incremental data in a single table or a database

For more information about the configuration procedure, see Configure a synchronization task in Data Integration.

Appendix: Code and parameters

Appendix: Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.

Code for Hologres Reader

  • Read data from a non-partitioned Hologres table

    • In the following code, a synchronization task is configured to read data from a non-partitioned Hologres table:

      {
        "type":"job",
        "version":"2.0",// The version number. 
        "steps":[
          {
            "stepType":"holo",// The plug-in name. 
            "parameter":{
              "endpoint": "instance-id-region-endpoint.hologres.aliyuncs.com:port",
              "accessId": "***************", // The AccessKey ID of the account that you use to connect to Hologres. 
              "accessKey": "*******************", // The AccessKey secret of the account that you use to connect to Hologres. 
              "database": "postgres",
              "table": "holo_reader_****",
              "column" : [ // The names of the columns. 
                "tag",
                "id",
                "title"
              ]
            },
            "name":"Reader",
            "category":"reader"
          },
          {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
          }
        ],
        "setting":{
          "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
          },
          "speed":{
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1,// The maximum number of parallel threads.  
                        "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
          }
        },
        "order":{
          "hops":[
            {
              "from":"Reader",
              "to":"Writer"
            }
          ]
        }
      }
    • The following sample DDL statements are used to create a non-partitioned Hologres table:

      begin;
      drop table if exists holo_reader_basic_src;
      create table holo_reader_basic_src(
        tag text not null, 
        id int not null, 
        title text not null, 
        body text, 
        primary key (tag, id));
        call set_table_property('holo_reader_basic_src', 'orientation', 'column');
        call set_table_property('holo_reader_basic_src', 'shard_count', '3');
      commit;
  • Read data from a partition in a partitioned Hologres table

    • In the following code, a synchronization task is configured to read data from a child partitioned table of a partitioned Hologres table.

      Note

      Exercise caution when you configure the partition parameter.

      {
        "type":"job",
        "version":"2.0",// The version number. 
        "steps":[
          {
            "stepType":"holo",// The plug-in name. 
            "parameter":{
              "endpoint": "instance-id-region-endpoint.hologres.aliyuncs.com:port",
              "accessId": "***************", // The AccessKey ID of the account that you use to connect to Hologres. 
              "accessKey": "*******************", // The AccessKey secret of the account that you use to connect to Hologres. 
              "database": "postgres",
              "table": "holo_reader_basic_****",
              "partition": "tag=foo",
              "column" : [
                "*"
              ],
              "fetchSize": "100"
            },
            "name":"Reader",
            "category":"reader"
          },
          {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
          }
        ],
        "setting":{
          "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
          },
          "speed":{
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1,// The maximum number of parallel threads. 
                        "mbps":"12"// The maximum transmission rate. Unit: MB/s.
          }
        },
        "order":{
          "hops":[
            {
              "from":"Reader",
              "to":"Writer"
            }
          ]
        }
      }
    • The following sample DDL statements are used to create a partitioned Hologres table:

      begin;
      drop table if exists holo_reader_basic_part_src;
      create table holo_reader_basic_part_src(
        tag text not null, 
        id int not null, 
        title text not null, 
        body text, 
        primary key (tag, id))
        partition by list( tag );
        call set_table_property('holo_reader_basic_part_src', 'orientation', 'column');
        call set_table_property('holo_reader_basic_part_src', 'shard_count', '3');
      commit;
      
      create table holo_reader_basic_part_src_1583161774228 partition of holo_reader_basic_part_src for values in ('foo');
      
      # Make sure that the child partitioned table from which you want to read data is created and data is inserted into the child partitioned table. 
      postgres=# \d+ holo_reader_basic_part_src
                               Table "public.holo_reader_basic_part_src"
       Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
      --------+---------+-----------+----------+---------+----------+--------------+-------------
       tag    | text    |           | not null |         | extended |              | 
       id     | integer |           | not null |         | plain    |              | 
       title  | text    |           | not null |         | extended |              | 
       body   | text    |           |          |         | extended |              | 
      Partition key: LIST (tag)
      Indexes:
          "holo_reader_basic_part_src_pkey" PRIMARY KEY, btree (tag, id)
      Partitions: holo_reader_basic_part_src_1583161774228 FOR VALUES IN ('foo')

Parameters in code for Hologres Reader

Parameter

Description

Required

Default value

endpoint

The endpoint of the source Hologres instance. Specify the value in the format of instance-id-region-endpoint.hologres.aliyuncs.com:port. You can obtain the endpoint of a Hologres instance on the details page of the instance in the Hologres console.

The endpoint of a Hologres instance varies based on the network type, including the classic network, Internet, and VPC. Configure this parameter based on the type of the network where your exclusive resource group for Data Integration and the Hologres instance reside. If an invalid endpoint is specified, the connection between the exclusive resource group for Data Integration and the Hologres instance may fail, or data synchronization performance may be poor. The endpoints for the three network types are in the following formats:

  • Classic network endpoint: instance-id-region-endpoint-internal.hologres.aliyuncs.com:port

  • Public endpoint: instance-id-region-endpoint.hologres.aliyuncs.com:port

  • VPC endpoint: instance-id-region-endpoint-vpc.hologres.aliyuncs.com:port

We recommend that you deploy the exclusive resource group for Data Integration and the Hologres instance in the same zone of the same region. This helps ensure a successful network connection and the optimal data synchronization performance.

Yes

No default value

accessId

The AccessKey ID of the account that you use to connect to Hologres.

Yes

No default value

accessKey

The AccessKey secret of the account that you use to connect to Hologres. Make sure that the account is authorized to read data from the source table.

Yes

No default value

database

The name of the source database in the Hologres instance.

Yes

No default value

table

The name of the table from which you want to read data. If the table is a partitioned table, specify the name of the table instead of the name of the partition from which you want to read data.

Yes

No default value

column

The names of the columns from which you want to read data. The names of the primary key columns in the source table must be included. If you want to read data from all columns in the source table, set this parameter to ["*"].

Yes

No default value

partition

The partition key column and the related value of the source table, in the format of column=value. This parameter is valid only for partitioned tables.

Important
  • Hologres supports only list partitioning. You can specify only one column as the partition key column. The data type of the partition key column must be INT4 or TEXT.

  • The value of this parameter must match the partition filter expression in the DDL statements that are used to create the destination Hologres table.

  • You must specify a child partitioned table that exists and contains data.

No

Left empty, which indicates that the source table is a non-partitioned table

fetchSize

The maximum number of data records to read from the source table at a time by using the SELECT statement.

No

1,000

Code for Hologres Writer

  • Write data to a non-partitioned table

    • In the following code, a synchronization task is configured to write the data obtained from the memory to a non-partitioned Hologres table in JDBC mode:

      {
          "type": "job",
          "version": "2.0",
          "steps": [
              {
                  "stepType": "mysql",
                  "parameter": {
                      "envType": 0,
                          "datasource": "<mysql_source_name>",
                      "column": [
                          "<column1>",
                          "<column2>",
                          ......,
                          "<columnN>"
                      ],
                      "connection": [
                          {
                              "datasource": "<mysql_source_name>",// The name of the MySQL data source from which you want to read data.
                              "table": [
                                  "<mysql_table_name>"
                              ]
                          }
                      ],
                      "where": "",
                      "splitPk": "",
                      "encoding": "UTF-8"
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "holo",
                  "parameter": {
                      "maxConnectionCount": 9,
                      "datasource": "<holo_sink_name>",// The name of the Hologres data source to which you want to write data.
                      "truncate":true,// Specifies whether to delete the existing data in the destination Hologres table before Hologres Writer writes data to the table. 
                      "conflictMode": "ignore",
                      "envType": 0,
                      "column": [
                          "<column1>",
                          "<column2>",
                          ......,
                          "<columnN>"
                      ],
                      "table": "<holo_table_name>"
                  },
                  "name": "Writer",
                  "category": "writer"
              }
          ],
          "setting": {
              "executeMode": null,
              "errorLimit": {
                  "record": ""
              },
              "speed": {
                  "concurrent": 2,// The maximum number of parallel threads.
                  "throttle": false// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled.
              }
          },
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          }
      }
    • The following sample DDL statements are used to create a non-partitioned Hologres table:

      begin;
      drop table if exists mysql_to_holo_test;
      create table mysql_to_holo_test(
        tag text not null,
        id int not null,
        body text not null,
        brrth date,
        primary key (tag, id));
        call set_table_property('mysql_to_holo_test', 'orientation', 'column');
        call set_table_property('mysql_to_holo_test', 'distribution_key', 'id');
        call set_table_property('mysql_to_holo_test', 'clustering_key', 'birth');
      commit;
  • Write data to a partitioned table

    Note
    • Hologres supports only list partitioning. You can specify only one column as the partition key column. The data type of the partition key column must be INT4 or TEXT.

    • The value of the partition parameter must match the partition filter expression in the DDL statements that are used to create the destination Hologres table.

    • In the following code, a synchronization task is configured to write the data obtained from the memory to a child partitioned table of a partitioned Hologres table:

      {
        "type": "job",
        "version": "2.0",
        "steps": [
          {
            "stepType": "mysql",
            "parameter": {
              "envType": 0,
              "datasource": "<mysql_source_name>",
              "column": [
                "<column1>",
                "<column2>",
                  ......,
                "<columnN>"
              ],
              "connection": [
                {
                  "datasource": "<mysql_source_name>",
                  "table": [
                    "<mysql_table_name>"
                  ]
                }
              ],
              "where": "",
              "splitPk": "<mysql_pk>",// The field that is used for data sharding when MySQL Reader reads data.
              "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader"
          },
          {
            "stepType": "holo",
            "parameter": {
              "maxConnectionCount": 9,
              "partition": "<partition_key>",// The partition key of the destination Hologres table.
              "datasource": "<holo_sink_name>",// The name of the Hologres data source to which you want to write data.
              "conflictMode": "ignore",
              "envType": 0,
              "column": [
                "<column1>",
                "<column2>",
                  ......,
                "<columnN>"
              ],
              "table": "<holo_table_name>"
            },
            "name": "Writer",
            "category": "writer"
          }
        ],
        "setting": {
          "executeMode": null,
          "errorLimit": {
            "record": ""
          },
          "speed": {
            "concurrent": 2,// The maximum number of parallel threads.
            "throttle": false// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled.
          }
        },
        "order": {
          "hops": [
            {
              "from": "Reader",
              "to": "Writer"
            }
          ]
        }
      }
    • The following sample DDL statements are used to create a partitioned Hologres table:

      BEGIN;
      CREATE TABLE public.hologres_parent_table(
        a text ,
        b int,
        c timestamp,
        d text,
        ds text,
        primary key(ds,b)
        )
        PARTITION BY LIST(ds);
      CALL set_table_property('public.hologres_parent_table', 'orientation', 'column');
      CREATE TABLE public.holo_child_1 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201215');
      CREATE TABLE public.holo_child_2 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201216');
      CREATE TABLE public.holo_child_3 PARTITION OF public.hologres_parent_table FOR VALUES IN('20201217');
      COMMIT;

Parameters in code for Hologres Writer

Parameter

Description

Required

Default value

endpoint

The endpoint of the destination Hologres instance. Specify the value in the format of instance-id-region-endpoint.hologres.aliyuncs.com:port. You can obtain the endpoint of a Hologres instance on the details page of the instance in the Hologres console.

The endpoint of a Hologres instance varies based on the network type. Network types include the Internet, classic network, and VPC. Configure this parameter based on the type of the network in which your exclusive resource group for Data Integration and the Hologres instance reside. If an invalid endpoint is specified, the network connection between the exclusive resource group for Data Integration and the Hologres instance may fail, or data synchronization performance may be degraded. The endpoints for the three network types are in the following formats:

  • Public endpoint: instance-id-region-endpoint.hologres.aliyuncs.com:port

  • Classic network endpoint: instance-id-region-endpoint-internal.hologres.aliyuncs.com:port

  • VPC endpoint: instance-id-region-endpoint-vpc.hologres.aliyuncs.com:port

We recommend that you deploy the exclusive resource group for Data Integration and the Hologres instance in the same zone of the same region. This helps ensure a successful network connection and the optimal data synchronization performance.

Yes

No default value

accessId

The AccessKey ID of the account that you use to connect to Hologres.

Yes

No default value

accessKey

The AccessKey secret of the account that you use to connect to Hologres. Make sure that the account is authorized to write data to the destination Hologres table.

Yes

No default value

database

The name of the destination database in the Hologres instance.

Yes

No default value

table

The name of the Hologres table to which you want to write data. You can specify the table name in the format of Schema_name.Table_name.

Yes

No default value

conflictMode

The conflict processing mode. Valid values: Replace, Update, and Ignore. For more information, see How Hologres Reader and Hologres Writer work.

Yes

No default value

column

The names of the columns to which you want to write data. The names of the primary key columns in the destination Hologres table must be included. If you want to write data to all the columns in the destination Hologres table, set this parameter to ["*"].

Yes

No default value

partition

The partition key column and the related value in the destination Hologres table. Configure this parameter in the format of column=value. This parameter is valid only for partitioned tables.

Note
  • Hologres supports only list partitioning. You can specify only one column as the partition key column. The data type of the partition key column must be INT4 or TEXT.

  • The value of this parameter must match the partition filter expression in the DDL statements that are used to create the destination Hologres table.

No

Left empty, which indicates that the destination Hologres table is a non-partitioned table

truncate

Specifies whether to delete the existing data in the destination Hologres table before Hologres Writer writes data to the table.

  • true: deletes the existing data in the destination Hologres table.

    Note
    • The existing data can be deleted only if the destination Hologres table is a non-partitioned table or a table that contains static partitions. The existing data in the table that contains dynamic partitions cannot be deleted. If your destination Hologres table is a table that contains dynamic partitions and you set the truncate parameter to true, the data synchronization task exits due to exceptions.

    • If your destination Hologres table is a table that contains static partitions and you set the truncate parameter to true, only data in the child partitioned table of the destination Hologres table is deleted.

  • false: retains the existing data in the destination Hologres table.

No

false