
DataWorks:ClickHouse data source

Last Updated: Jan 09, 2026

ClickHouse data sources provide bidirectional channels to read from and write to ClickHouse. This topic describes the data synchronization capabilities that DataWorks supports for ClickHouse.

Supported versions

The following table describes the compatibility between Alibaba Cloud ClickHouse versions and JDBC driver versions.

| JDBC driver version | Alibaba Cloud ClickHouse version |
| --- | --- |
| 0.2.4 | 20.8, 21.8 |
| 0.4.0, 0.4.2 | 22.8, 23.8 |

Limits

ClickHouse data sources support only offline (batch) read and write operations; real-time synchronization is not supported.

Supported data types

Common Alibaba Cloud ClickHouse data types are supported; for the complete list, see Data types. Data types that appear only in the open source ClickHouse documentation are not supported; for the full list of open source data types, see ClickHouse Doc.

| Data type | ClickHouse Reader | ClickHouse Writer |
| --- | --- | --- |
| Int8 | Supported | Supported |
| Int16 | Supported | Supported |
| Int32 | Supported | Supported |
| Int64 | Supported | Supported |
| UInt8 | Supported | Supported |
| UInt16 | Supported | Supported |
| UInt32 | Supported | Supported |
| UInt64 | Supported | Supported |
| Float32 | Supported | Supported |
| Float64 | Supported | Supported |
| Decimal | Supported | Supported |
| String | Supported | Supported |
| FixedString | Supported | Supported |
| Date | Supported | Supported |
| DateTime | Supported | Supported |
| DateTime64 | Supported | Supported |
| Boolean | Supported. Note: ClickHouse does not have a dedicated Boolean type; use UInt8 or Int8 instead. | Supported |
| Array | Partially supported. Supported only when array elements are integers, floating-point numbers, strings, or DateTime64 values with millisecond precision. | Supported |
| Tuple | Supported | Supported |
| Domain (IPv4, IPv6) | Supported | Supported |
| Enum8 | Supported | Supported |
| Enum16 | Supported | Supported |
| Nullable | Supported | Supported |
| Nested | Partially supported. Supported only when nested fields are integers, floating-point numbers, strings, or DateTime64 values with millisecond precision. | Supported |

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. When you add the data source, you can view the parameter descriptions in the DataWorks console to understand what each parameter means.

Data synchronization task development

For the entry point and procedure for configuring a synchronization task, see the following configuration guides:

  • Configure a single-table offline sync task

  • Configure full-database offline read sync. For more information about the procedure, see Configure a real-time full database synchronization task.

Appendix: Script demo and parameter descriptions

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the parameters in the script according to the unified script format requirements. For more information, see Configure a task in the code editor. The following sections describe the data source parameters that you must configure when you use the code editor.

Reader script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "clickhouse", // Plug-in name.
            "parameter": {
                "fetchSize":1024,// Number of records fetched per batch from the database server.
                "datasource": "example",
                "column": [   // Column names.
                    "id",
                    "name"
                ],
                "where": "",    // Filter condition.
                "splitPk": "",  // Shard key.
                "table": ""    // Table name.
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "clickhouse",
            "parameter": {
                "postSql": [
                    "update @table set db_modify_time = now() where db_id = 1"
                ],
                "datasource": "example",    // Data source.
                "batchByteSize": "67108864",
                "column": [
                    "id",
                    "name"
                ],
                "writeMode": "insert",
                "encoding": "UTF-8",
                "batchSize": 1024,
                "table": "ClickHouse_table",
                "preSql": [
                    "delete from @table where db_id = -1"
                ]
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "executeMode": null,
        "errorLimit": {
            "record": "0"  // Maximum number of error records allowed during sync.
        },
        "speed": {
            "throttle": true, // When throttle is false, mbps is ignored (no throttling). When throttle is true, throttling is enabled.
            "concurrent": 1,  // Job concurrency.
            "mbps": "12"      // Throttling rate. 1 mbps = 1 MB/s.
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Reader script parameters

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the data source. In script mode, you can add a data source. The value must exactly match the name of the added data source. | Yes | None |
| table | The table to synchronize, described in JSON format. Note: the table parameter must be included in the connection configuration block. | Yes | None |
| fetchSize | The number of records fetched per batch from the database server. This value determines how often the sync system communicates with the server and affects extraction performance. Note: an excessively large fetchSize may cause out-of-memory (OOM) errors. Increase this value gradually based on the ClickHouse load. | No | 1,024 |
| column | The columns to read from ClickHouse, separated by commas. Example: "column": ["id", "name", "age"]. Note: the column parameter must be specified and cannot be empty. | Yes | None |
| jdbcUrl | The JDBC connection URL of the source database. The jdbcUrl must be included in the connection configuration block. Only one value can be configured per database. The format must follow the ClickHouse official specifications and can include additional parameters. Example: jdbc:clickhouse://localhost:3306/test?user=root&password=&useUnicode=true&characterEncoding=gbk&autoReconnect=true&failOverReadOnly=false. | Yes | None |
| username | The username for the data source. | Yes | None |
| password | The password for the specified username. | Yes | None |
| splitPk | The shard key. If you specify splitPk, data extracted from ClickHouse is sharded on that field, so that sync tasks can run concurrently and efficiency improves. Note: when splitPk is configured, fetchSize is required. | No | None |
| where | The filter condition. In practice, you often sync only the current day's data by setting where to gmt_create > $bizdate. The where clause enables effective incremental synchronization. If you do not specify a where clause (including leaving its key or value empty), all data is synchronized. | No | None |
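As a sketch of how these reader parameters combine, the following fragment configures an incremental, sharded read. The data source name example follows the demo above; the table orders and the columns id, name, and gmt_create are hypothetical, and the values are illustrative rather than recommendations:

```json
{
    "stepType": "clickhouse",
    "parameter": {
        "datasource": "example",
        "table": "orders",                  // Hypothetical source table.
        "column": ["id", "name", "gmt_create"],
        "where": "gmt_create > $bizdate",   // Incremental filter: read only the current business date.
        "splitPk": "id",                    // Shard on this field so reads can run concurrently.
        "fetchSize": 1024                   // Required when splitPk is configured.
    },
    "name": "Reader",
    "category": "reader"
}
```

Omitting both where and splitPk reduces this to a single-threaded full read of the table.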

Writer script demo

{
    "type":"job",
    "version":"2.0",// Version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"clickhouse",// Plug-in name.
            "parameter":{
                "username": "",
                "password": "",
                "column": [// Fields.
                    "id",
                    "name"
                ],
                "connection": [
                    {
                        "table": [// Table name.
                            "ClickHouse_table"
                        ],
                        "jdbcUrl": "jdbc:clickhouse://ip:port/database"
                    }
                ],
                "preSql": [ // SQL statement executed before the sync task runs.
                    "TRUNCATE TABLE IF EXISTS tablename"
                ],
                "postSql": [ // SQL statement executed after the sync task runs.
                    "ALTER TABLE tablename UPDATE col1 = 1 WHERE col2 = 2"
                ],
                "batchSize": "1024",
                "batchByteSize": "67108864",
                "writeMode": "insert"
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// Number of error records.
        },
        "speed":{
            "throttle":true,// When throttle is false, mbps is ignored (no throttling). When throttle is true, throttling is enabled.
            "concurrent":1, // Job concurrency.
            "mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Writer script parameters

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| jdbcUrl | The JDBC connection URL of the destination database. The jdbcUrl must be included in the connection configuration block. Only one value can be configured per database. The format must follow the ClickHouse official specifications and can include additional parameters. Example: jdbc:clickhouse://127.0.0.1:3306/database. | Yes | None |
| username | The username for the data source. | Yes | None |
| password | The password for the specified username. | Yes | None |
| table | The name of the destination table, described as a JSON array. Note: the table parameter must be included in the connection configuration block. | Yes | None |
| column | The fields of the destination table to write data to, separated by commas. Example: "column": ["id", "name", "age"]. Note: the column parameter must be specified and cannot be empty. | Yes | None |
| preSql | The SQL statement to execute before data is written to the destination table. | No | None |
| postSql | The SQL statement to execute after data is written to the destination table. | No | None |
| batchSize | The number of records submitted per batch. A proper value greatly reduces network interactions between the sync system and ClickHouse and improves overall throughput. An excessively large value may cause out-of-memory (OOM) errors. | No | 1,024 |
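As a sketch of a writer step that uses these parameters together, the following fragment clears a destination table before loading and sets the batch limits explicitly. The names follow the demos above, and the batch values are illustrative, not recommendations:

```json
{
    "stepType": "clickhouse",
    "parameter": {
        "datasource": "example",
        "table": "ClickHouse_table",
        "column": ["id", "name"],
        "preSql": [
            "TRUNCATE TABLE IF EXISTS ClickHouse_table" // Clear the destination before loading.
        ],
        "writeMode": "insert",
        "batchSize": 1024,              // Records per commit; larger batches mean fewer round trips.
        "batchByteSize": "67108864"     // Cap each batch at 64 MB to limit memory usage.
    },
    "name": "Writer",
    "category": "writer"
}
```

Raise batchSize gradually while watching memory on both the sync workers and ClickHouse, since oversized batches can trigger the OOM errors noted above.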