
DataWorks:Supported data sources and synchronization solutions

Last Updated: Mar 10, 2026

DataWorks Data Integration synchronizes data among various data sources, including MySQL, MaxCompute, Hologres, and Kafka. It provides batch synchronization for T+1 Extract, Transform, Load (ETL) jobs, real-time synchronization for data replication with second-level latency, and whole-database migration.

Synchronization solutions

| Type | Source granularity | Target granularity | Timeliness | Synchronization scenario |
| --- | --- | --- | --- | --- |
| Single-table batch | Single table | Single table/partition | T+1 or periodic | Periodic full or incremental synchronization |
| Sharding batch | Multiple tables with identical schema | Single table/partition | T+1 or periodic | Periodic full or incremental synchronization |
| Single-table real-time | Single table | Single table/partition | Second-to-minute latency | Change Data Capture (CDC) |
| Whole-database batch | Whole database or multiple tables | Matching tables and partitions | One-time or periodic | One-time or periodic full/incremental synchronization; supports an initial full synchronization followed by periodic incremental updates |
| Whole-database real-time | Whole database or multiple tables | Matching tables and partitions | Second-to-minute latency | Full synchronization + Change Data Capture (CDC) |
| Whole-database full and incremental | Whole database or multiple tables | Matching tables and partitions | Initial full synchronization: batch; subsequent incremental synchronization: T+1 | Full synchronization + Change Data Capture (CDC) |

Synchronization approaches

When choosing a data synchronization approach, consider these two core questions:

  1. Latency requirements: How often do you need to synchronize data? Is a daily update (batch) sufficient, or do you need real-time updates in seconds or minutes (real-time)?

  2. Scale and complexity: How many tables do you need to synchronize? Is the processing logic for these tables consistent (single table vs. whole database)?

Based on these factors, we recommend two main categories of synchronization approaches: batch synchronization and real-time synchronization.
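As a rough sketch, the two questions above can be expressed as a small decision function. The function name and boolean parameters are illustrative, not part of any DataWorks API:

```python
def recommend_approach(needs_realtime: bool, many_tables: bool) -> str:
    """Map the two core questions to a synchronization approach.

    needs_realtime: True if second-to-minute latency is required,
                    False if a daily (T+1) or periodic batch update is enough.
    many_tables:    True for a whole database / many homogeneous tables,
                    False for a single table with custom processing logic.
    """
    if needs_realtime:
        return "Whole-database real-time" if many_tables else "Single-table real-time"
    return "Whole-database batch" if many_tables else "Single-table batch"

print(recommend_approach(needs_realtime=False, many_tables=True))
# Whole-database batch
```

Real projects often mix approaches, for example a whole-database task for the bulk of the tables plus single-table tasks for the few tables that need custom transformation logic.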

1. Batch synchronization (T+1/periodic)

A batch approach is for scenarios that are not time-sensitive (for example, T+1) and require periodic batch processing.

Important

Prerequisite: To implement incremental batch synchronization, the source table must contain a field to track incremental changes, such as a timestamp like gmt_modified or an auto-incrementing ID. If no such field exists, you must perform periodic full synchronization.
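To illustrate the prerequisite above: an incremental batch extraction is essentially a query filtered on the tracking field between two watermark values. The sketch below builds such a query; the table name and helper are hypothetical, with gmt_modified as the watermark column:

```python
def incremental_query(table: str, watermark_col: str,
                      last_run: str, this_run: str) -> str:
    """Build a SQL statement that selects only rows changed since the
    previous batch run. last_run/this_run are the previous and current
    watermark boundaries (e.g. day boundaries for a T+1 job)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} >= '{last_run}' "
        f"AND {watermark_col} < '{this_run}'"
    )

# A daily (T+1) run picks up exactly one day's changes:
print(incremental_query("orders", "gmt_modified",
                        "2026-03-09 00:00:00", "2026-03-10 00:00:00"))
```

Without such a column there is no reliable way to tell which rows changed, which is why the fallback is a periodic full synchronization.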

1.1. Single-table batch

Use this approach when you need to perform fine-grained processing on a small number of core, heterogeneous data tables.

  • Key advantage: Flexible processing logic.

    • Fine-grained transformation: Supports complex field mapping, data filtering, constant value assignment, function-based transformations, and even AI-assisted processing.

    • Heterogeneous source integration: The optimal choice for handling non-standard data sources such as APIs and log files.

  • Key limitation: High cost at scale.

    • High configuration overhead: Synchronizing a large number of tables requires significant effort to configure and maintain each task individually.

    • High resource consumption: Each task is scheduled independently. The resource consumption of 100 single-table tasks greatly exceeds that of one whole-database task.

Recommended approach for Single-table batch: Batch synchronization task for a single table.

1.2. Whole-database batch

Use this approach when you need to efficiently "move" a large number of homogeneous data tables from one location to another.

  • Key advantage: High O&M efficiency and low cost.

    • High efficiency: Configure hundreds of tables in a single operation with automatic object matching, dramatically improving development efficiency.

    • Cost-effectiveness: Resources are scheduled and optimized holistically, resulting in very low costs. For example, the resource consumption of one whole-database task might be 2 CUs, compared to 100 CUs for 100 single-table tasks.

    • Typical use cases: Building an Operational Data Store (ODS) layer for a data warehouse, creating periodic database backups, and migrating data to the cloud.

  • Key limitation: Limited processing logic.

    • This approach is primarily for data replication and does not support complex transformation logic for individual tables.

Recommended approach for Whole-database batch: Batch whole-database synchronization task.

2. Real-time synchronization (second- to minute-level)

A real-time approach is for scenarios that require capturing real-time data changes (inserts, updates, and deletes) from the source to support real-time analytics and business responses.

Important

Prerequisite: The source must support Change Data Capture (CDC) or be a message queue itself. For example, MySQL must have binary logging enabled, or the source could be a Kafka instance.
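For a MySQL source, the binlog prerequisite can be verified with `SHOW VARIABLES LIKE 'log_bin'`. The sketch below models that check; the helper function and the plain dict standing in for a driver's query result are illustrative, and the ROW-format requirement reflects what row-level CDC typically needs:

```python
def binlog_enabled(variables: dict) -> bool:
    """Check MySQL server variables (as returned by SHOW VARIABLES)
    for CDC readiness: binary logging on, and row-based format,
    which row-level CDC typically requires."""
    return (variables.get("log_bin", "OFF").upper() == "ON"
            and variables.get("binlog_format", "").upper() == "ROW")

# Values as they would come back from a MySQL instance:
print(binlog_enabled({"log_bin": "ON", "binlog_format": "ROW"}))  # True
print(binlog_enabled({"log_bin": "OFF"}))                         # False
```

Message-queue sources such as Kafka skip this check entirely, because the queue itself already delivers an ordered change stream.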

Single-table or whole-database real-time

The decision logic is similar to the batch approach:

  • Single-table real-time: Suitable for use cases that involve complex processing of real-time change streams from a single, core table.

  • Whole-database real-time: The mainstream choice for building a real-time data warehouse, implementing real-time database disaster recovery, and integrating with real-time data lakes. It also offers significant advantages in efficiency and cost-effectiveness.

Recommended real-time approaches: Real-time synchronization task for a single table, Real-time whole-database synchronization task

3. Real-time CDC to append-only targets

Important

Context: Real-time synchronization captures CDC data, which includes Insert, Update, and Delete operations. For Append-Only storage systems like non-Delta tables in MaxCompute that do not natively support physical Update or Delete operations, writing the CDC stream directly can lead to inconsistent data states. For example, the target table will not reflect a delete operation.

  • DataWorks solution: Base + Log pattern

    • This solution is implemented as a whole-database full and incremental task. It works by creating a Base table (for full snapshots) and a Log table (for incremental logs) at the target.

    • How it works: The CDC data stream is written to the Log table in real time. On a T+1 schedule, the system then automatically runs a task that merges the changes from the Log table into the Base table, generating an updated full snapshot. With this approach, data arrives in the Log table within minutes, while the final, merged state becomes visible only after the T+1 task completes. This balances real-time data capture with the eventual consistency required by a batch data warehouse.
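The T+1 merge step described above can be sketched as a last-write-wins replay of the Log table over the Base snapshot. This is an illustrative in-memory model (the row shapes and operation names are hypothetical), not DataWorks' actual merge implementation:

```python
def merge_base_with_log(base: dict, log: list) -> dict:
    """Apply an ordered CDC log to a full snapshot keyed by primary key.

    base: {pk: row} full snapshot from the Base table.
    log:  [(op, pk, row)] incremental changes from the Log table, in
          commit order; op is "insert", "update", or "delete".
    Returns the next full snapshot (the updated Base table).
    """
    snapshot = dict(base)
    for op, pk, row in log:
        if op == "delete":
            snapshot.pop(pk, None)   # delete is applied physically at merge time
        else:
            snapshot[pk] = row       # insert or update: last write wins
    return snapshot

base = {1: {"name": "a"}, 2: {"name": "b"}}
log = [("update", 1, {"name": "a2"}), ("delete", 2, None),
       ("insert", 3, {"name": "c"})]
print(merge_base_with_log(base, log))
# {1: {'name': 'a2'}, 3: {'name': 'c'}}
```

Note how the delete of key 2 only disappears from the snapshot after the merge runs, which is exactly the "visible after T+1" behavior described above.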

Recommended approach: Whole-database full and incremental (near real-time) task.

Data source read/write capabilities

| Data source | Single-table batch | Single-table real-time | Whole-database batch | Whole-database real-time | Whole-database full and incremental |
| --- | --- | --- | --- | --- | --- |
| Public dataset | Read | - | - | - | - |
| Amazon S3 | Read/Write | - | - | - | - |
| Amazon Redshift | Read/Write | - | - | - | - |
| AnalyticDB for MySQL 2.0 | Read/Write | - | - | - | - |
| AnalyticDB for MySQL 3.0 | Read/Write | Write | Read | Write | - |
| AnalyticDB for PostgreSQL | Read/Write | - | Read | - | - |
| ApsaraDB for OceanBase | Read/Write | Write | - | Read/Write | - |
| Azure Blob Storage | Read | - | - | - | - |
| BigQuery | Read | - | - | - | - |
| ClickHouse | Read/Write | - | Read | - | - |
| COS | Read | - | - | - | - |
| Databricks | Read | - | - | - | - |
| DataHub | Read/Write | Read/Write | - | Write | - |
| Data Lake Formation | Read/Write | Write | Write | Write | - |
| DB2 | Read/Write | - | Read | - | - |
| Doris | Read/Write | Write | Read | - | - |
| DM (Dameng) | Read/Write | - | Read | - | - |
| DRDS (PolarDB-X 1.0) | Read/Write | - | Read | - | - |
| Elasticsearch | Read/Write | Write | Write | Write | - |
| FTP | Read/Write | - | - | - | - |
| GBase8a | Read/Write | - | - | - | - |
| HBase | HBase: Read/Write; HBase 20xsql: Read; HBase 11xsql: Write | - | - | - | - |
| HDFS | Read/Write | - | - | - | - |
| Hive | Read/Write | - | Read/Write | - | - |
| Hologres | Read/Write | Read/Write | Read/Write | Write | - |
| HttpFile | Read | - | - | - | - |
| Kafka | Read/Write | Read/Write | - | Write | - |
| KingbaseES (Renda Jingcang) | Read/Write | - | - | - | - |
| Lindorm | Read/Write | Write | - | Write | - |
| Simple Log Service (SLS) | Read/Write | Read | - | - | - |
| MaxCompute | Read/Write | Write | Write | Write | Write |
| MariaDB | Read/Write | - | - | - | - |
| MaxGraph | Write | - | - | - | - |
| ApsaraDB for Memcache | Write | - | - | - | - |
| MetaQ | Read | - | - | - | - |
| Milvus | Read/Write | - | - | - | - |
| MongoDB | Read/Write | - | - | Read | - |
| MySQL | Read/Write | Read | Read | Read | Read |
| OpenSearch | Write | - | - | - | - |
| Oracle | Read/Write | Read | Read | Read | Read |
| Object Storage Service (OSS) | Read/Write | - | Write | Write | - |
| OSS-HDFS | Read/Write | - | Write | Write | - |
| PolarDB | Read/Write | Read | Read | Read | Read |
| PolarDB-X 2.0 | Read/Write | - | Read | Read | - |
| PostgreSQL | Read/Write | - | Read | Read | - |
| Redis | Write | - | - | - | - |
| RestAPI | Read/Write | - | - | - | - |
| Salesforce | Read/Write | - | - | - | - |
| SAP HANA | Read/Write | - | - | - | - |
| Sensors Data (Shen Ce) | Write | - | - | - | - |
| Snowflake | Read/Write | - | - | - | - |
| StarRocks | Read/Write | Write | Write | Write | - |
| SQL Server | Read/Write | - | Read | - | - |
| Tablestore | Read/Write | Write | - | - | - |
| TiDB | Read/Write | - | - | - | - |
| TSDB | Write | - | - | - | - |
| Vertica | Read/Write | - | - | - | - |
| TOS | Read | - | - | - | - |

Related documents

This guide lists essential Data Integration documents to help you get started.