
DataWorks:Change data source from MySQL to PolarDB via API

Last Updated: Mar 02, 2026

Use the DataWorks SDK for Java to programmatically change the reader source of an offline data synchronization task from MySQL to PolarDB. The task destination remains MaxCompute.

When migrating databases from MySQL to PolarDB, existing data synchronization tasks in DataWorks still point to the original MySQL data source. Instead of manually reconfiguring each task in the console, use the DataWorks API to update task configurations.

Limitations

This method only supports changing the source from a MySQL data source to a PolarDB data source. Other data source type conversions are not supported.

Warning

Modifying the JSON configuration file through the API carries risks. Do not reuse the modified configuration file for other business purposes, and do not use this approach to modify other configuration files. Incorrect modifications may cause data synchronization task failures and data quality issues.

Prerequisites

Before you begin, make sure that you have:

  • Java Development Kit (JDK) installed

  • A Maven-based Java project

  • An Alibaba Cloud account with an AccessKey ID and AccessKey secret

  • A DataWorks workspace with the target data synchronization task already created

  • The PolarDB data source already registered in your DataWorks workspace

  • Sufficient permissions to call DataWorks API operations (ListFiles, GetFile, UpdateDISyncTask, SubmitFile, DeployFile, ListNodes, RunCycleDagNodes)

API call sequence

The following diagram shows the API call sequence:

ListFiles          Find the sync task file by path and name
    |
GetFile            Retrieve the current JSON configuration
    |
modifyContent      Replace MySQL reader config with PolarDB (local logic)
    |
UpdateDISyncTask   Save the updated configuration to DataWorks
    |
SubmitFile         Submit the file for deployment review
    |
GetDeployment      Check submission status
    |
DeployFile         Deploy the submitted file to production
    |
GetDeployment      Confirm deployment status
    |
ListNodes          Look up the node ID by node name
    |
RunCycleDagNodes   Trigger a retroactive run of the updated task

Each step is required because DataWorks separates the development lifecycle into distinct phases: edit, submit, deploy, and run. Skipping any step causes the change to remain in draft state or not take effect in production.

Add Maven dependencies

Add the following dependencies to your pom.xml file:

<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-core</artifactId>
    <version>4.5.20</version>
</dependency>
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-dataworks-public</artifactId>
    <version>3.4.4</version>
</dependency>

Configuration changes

The modifyContent method finds the reader step (where category is reader and stepType is mysql) and updates three fields:

Field        Location                                    Before (MySQL)       After (PolarDB)
stepType     steps[].stepType                            mysql                polardb
datasource   steps[].parameter.datasource                mysql_from_polardb   Name of your PolarDB data source
datasource   steps[].parameter.connection[].datasource   mysql_from_polardb   Name of your PolarDB data source
All other fields remain unchanged, including columns, table names, writer settings, and speed settings.
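Because the original modifyContent implementation is not shown here, the sketch below illustrates the same transformation using only standard-library string operations. It is not the sample's actual code: a production version would typically parse the JSON with a proper JSON library, and the hard-coded mysql_from_polardb pattern matches only this document's example data source name.

```java
public class ModifyContentSketch {
    // Illustrative stand-in for modifyContent: rewrites the reader step's
    // stepType and datasource from the sample's MySQL values to PolarDB.
    // newDatasource is the PolarDB data source name registered in DataWorks.
    static String modifyContent(String json, String newStepType, String newDatasource) {
        return json
                .replaceAll("\"stepType\"\\s*:\\s*\"mysql\"",
                            "\"stepType\": \"" + newStepType + "\"")
                .replaceAll("\"datasource\"\\s*:\\s*\"mysql_from_polardb\"",
                            "\"datasource\": \"" + newDatasource + "\"");
    }

    public static void main(String[] args) {
        String reader = "{\"stepType\": \"mysql\", \"parameter\": "
                + "{\"datasource\": \"mysql_from_polardb\", \"connection\": "
                + "[{\"datasource\": \"mysql_from_polardb\"}]}}";
        // Writer fields (stepType odps, datasource odps_source) are untouched
        // because the patterns match only the MySQL reader values.
        System.out.println(modifyContent(reader, "polardb", "polardb"));
    }
}
```

Note that this string-level approach touches every occurrence of the matched values; it is safe here only because the writer step uses different stepType and datasource values.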

Before: MySQL reader configuration

{
    "stepType": "mysql",
    "parameter": {
        "envType": 0,
        "datasource": "mysql_from_polardb",
        "column": [
            "id",
            "name",
            "create_time",
            "create_user"
        ],
        "tableComment": "Test",
        "connection": [
            {
                "selectedDatabase": "polardb_db1",
                "datasource": "mysql_from_polardb",
                "table": [
                    "lcl_test_demo"
                ]
            }
        ],
        "where": "",
        "splitPk": "id",
        "encoding": "UTF-8"
    },
    "name": "Reader",
    "category": "reader"
}

After: PolarDB reader configuration

{
    "stepType": "polardb",
    "parameter": {
        "envType": 0,
        "datasource": "polardb",
        "column": [
            "id",
            "name",
            "create_time",
            "create_user"
        ],
        "tableComment": "Test",
        "connection": [
            {
                "selectedDatabase": "polardb_db1",
                "datasource": "polardb",
                "table": [
                    "lcl_test_demo"
                ]
            }
        ],
        "where": "",
        "splitPk": "id",
        "encoding": "UTF-8"
    },
    "name": "Reader",
    "category": "reader"
}

The writer step (stepType: odps, datasource: odps_source) remains unchanged. Only the reader step is modified.

Complete Java sample code

The following code demonstrates the full workflow: locate the task, retrieve and modify its configuration, deploy the changes, and trigger a run.
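The exact listing depends on your workspace, so the following is a condensed sketch of the first half of the workflow rather than the complete sample. It assumes the request classes from aliyun-java-sdk-dataworks-public (API version 2020-05-18); all IDs, paths, and names are placeholders, the modifyContent helper is the string-based stand-in rather than the sample's actual implementation, and response getter shapes should be checked against the SDK reference for your version.

```java
import com.aliyuncs.DefaultAcsClient;
import com.aliyuncs.IAcsClient;
import com.aliyuncs.profile.DefaultProfile;
import com.aliyuncs.dataworks_public.model.v20200518.GetFileRequest;
import com.aliyuncs.dataworks_public.model.v20200518.GetFileResponse;
import com.aliyuncs.dataworks_public.model.v20200518.ListFilesRequest;
import com.aliyuncs.dataworks_public.model.v20200518.ListFilesResponse;
import com.aliyuncs.dataworks_public.model.v20200518.SubmitFileRequest;
import com.aliyuncs.dataworks_public.model.v20200518.UpdateDISyncTaskRequest;

public class ChangeSourceSketch {
    public static void main(String[] args) throws Exception {
        // Placeholders: region, credentials (from the environment), workspace ID.
        String regionId = "cn-chengdu";
        Long projectId = 1911L;
        String akId = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String akSecret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        IAcsClient client = new DefaultAcsClient(
                DefaultProfile.getProfile(regionId, akId, akSecret));

        // 1. ListFiles: find the sync task file by folder path and keyword.
        ListFilesRequest listReq = new ListFilesRequest();
        listReq.setProjectId(projectId);
        listReq.setFileFolderPath("Business Flow/Test Workflow/Data Integration/");
        listReq.setKeyword("your_task_name");
        ListFilesResponse listResp = client.getAcsResponse(listReq);
        Long fileId = listResp.getData().getFiles().get(0).getFileId();

        // 2. GetFile: retrieve the current JSON configuration.
        GetFileRequest getReq = new GetFileRequest();
        getReq.setProjectId(projectId);
        getReq.setFileId(fileId);
        GetFileResponse getResp = client.getAcsResponse(getReq);
        String content = getResp.getData().getFile().getContent();

        // 3. Local step: swap the MySQL reader config for PolarDB.
        String updated = modifyContent(content, "polardb", "your_polardb_datasource");

        // 4. UpdateDISyncTask: save the updated offline sync configuration.
        UpdateDISyncTaskRequest updReq = new UpdateDISyncTaskRequest();
        updReq.setProjectId(projectId);
        updReq.setFileId(fileId);
        updReq.setTaskType("DI_OFFLINE");
        updReq.setTaskContent(updated);
        client.getAcsResponse(updReq);

        // 5. SubmitFile: submit the change for deployment review.
        SubmitFileRequest subReq = new SubmitFileRequest();
        subReq.setProjectId(projectId);
        subReq.setFileId(fileId);
        Long deploymentId = client.getAcsResponse(subReq).getData();
        System.out.println("Submitted, deploymentId=" + deploymentId);

        // Remaining steps (not shown): poll GetDeployment until the submit
        // succeeds, call DeployFile, poll GetDeployment again, then use
        // ListNodes and RunCycleDagNodes to trigger a retroactive run.
    }

    // String-based stand-in for the modifyContent method described in the
    // Configuration changes section; a real implementation would parse the JSON.
    static String modifyContent(String json, String stepType, String datasource) {
        return json
                .replaceAll("\"stepType\"\\s*:\\s*\"mysql\"",
                            "\"stepType\": \"" + stepType + "\"")
                .replaceAll("\"datasource\"\\s*:\\s*\"mysql_from_polardb\"",
                            "\"datasource\": \"" + datasource + "\"");
    }
}
```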

Replace the following placeholder values before running:

  • akId, akSecret: Your AccessKey ID and AccessKey secret. Store credentials securely and avoid hardcoding them in production code.

  • regionId: The region of your DataWorks workspace (for example, cn-chengdu).

  • setProjectId: The ID of your DataWorks workspace. The sample uses 1911L as an example.

  • folderPath: The folder path of your data synchronization task in DataWorks.

  • filename: The name of the data synchronization task file.

  • Second and third arguments of modifyContent: The new step type (polardb) and the name of your PolarDB data source as registered in DataWorks.

API operations reference

API operation      Purpose                                              Key parameters
ListFiles          Find the sync task file by folder path and keyword   projectId, fileFolderPath, keyword
GetFile            Retrieve the JSON configuration of a task file       projectId, fileId
UpdateDISyncTask   Save the updated task configuration                  projectId, fileId, taskContent, taskType (DI_OFFLINE)
SubmitFile         Submit the file for deployment review                projectId, fileId
GetDeployment      Check submission or deployment status                projectId, deploymentId
DeployFile         Deploy the submitted file to production              projectId, fileId
ListNodes          Find the production node ID by name                  projectId, nodeName, projectEnv (PROD)
RunCycleDagNodes   Trigger a retroactive run                            includeNodeIds, name, parallelism, projectEnv, rootNodeId, startBizDate, endBizDate

Verify the result

After running the code:

  1. Check deployment status. The GetDeployment output should show a success status. If the status indicates failure, check whether the PolarDB data source is correctly registered in your DataWorks workspace.

  2. Verify in the DataWorks console. Open the data synchronization task in Data Development and confirm that the reader source now shows PolarDB instead of MySQL.

  3. Check the retroactive run. In Operation Center, find the task instance triggered by RunCycleDagNodes and verify that it completed successfully with data written to the MaxCompute destination table.

Troubleshooting

ListFiles returns empty results

The folder path or filename does not match any file in the specified workspace. Double-check the folderPath value (for example, Business Flow/Test Workflow/Data Integration/) and the filename.

UpdateDISyncTask fails

Possible causes:

  • The taskContent JSON is malformed. Validate the JSON before submitting.

  • The PolarDB data source name specified in modifyContent does not match a registered data source in the workspace.

  • The taskType is not set to DI_OFFLINE.
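For the first cause, full validation requires a JSON parser, but a quick structural check (balanced braces and brackets outside string literals) can catch gross corruption before calling UpdateDISyncTask. The following is an illustrative stdlib-only sketch, not a substitute for real JSON validation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class JsonSanityCheck {
    // Rough pre-flight check: braces and brackets balanced outside string
    // literals. Not a full JSON validator; use a real parser for that.
    static boolean looksBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        boolean inString = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString) {
                if (c == '\\') i++;               // skip the escaped character
                else if (c == '"') inString = false;
            } else if (c == '"') inString = true;
            else if (c == '{' || c == '[') stack.push(c);
            else if (c == '}') { if (stack.isEmpty() || stack.pop() != '{') return false; }
            else if (c == ']') { if (stack.isEmpty() || stack.pop() != '[') return false; }
        }
        return !inString && stack.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(looksBalanced("{\"steps\": [{\"stepType\": \"polardb\"}]}")); // true
        System.out.println(looksBalanced("{\"steps\": [}"));                             // false
    }
}
```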

Deployment status shows failure

  • The submitted file may have validation errors. Open the task in the DataWorks console and check for error messages.

  • Make sure the PolarDB data source connectivity is working before deploying.

RunCycleDagNodes does not run

  • The startBizDate and endBizDate values must use the format yyyy-MM-dd HH:mm:ss.

  • Verify that the node exists in the PROD environment by checking the ListNodes response.
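The required bizdate format can be produced with java.time so the pattern never drifts from what RunCycleDagNodes expects; a minimal sketch, with placeholder dates:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class BizDateDemo {
    // RunCycleDagNodes expects startBizDate/endBizDate as yyyy-MM-dd HH:mm:ss.
    static final DateTimeFormatter BIZ_FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String bizDate(LocalDateTime t) {
        return t.format(BIZ_FMT);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2024, 1, 1, 0, 0, 0); // placeholder date
        System.out.println(bizDate(start));             // 2024-01-01 00:00:00
        System.out.println(bizDate(start.plusDays(1))); // 2024-01-02 00:00:00
    }
}
```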

Usage notes

  • The projectId (1911L in the sample) must match your DataWorks workspace ID. Find this value in the DataWorks console on the Workspace Details page.

  • The Thread.sleep(10000) calls (10 seconds) are simple wait intervals. In production, poll GetDeployment in a loop until the status indicates completion or failure, rather than relying on fixed wait times.

  • Store your AccessKey ID and AccessKey secret securely. Do not hardcode credentials in source files. Use environment variables or a credentials provider instead.

  • The endpoint follows the pattern dataworks.{regionId}.aliyuncs.com. Replace {regionId} with the region of your workspace.
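The endpoint pattern and the credential guidance above can be combined into a small helper. This is a minimal sketch; the environment variable names below are a common Alibaba Cloud convention and are an assumption, not something this document prescribes:

```java
public class EndpointDemo {
    // Regional endpoint pattern: dataworks.{regionId}.aliyuncs.com
    static String endpoint(String regionId) {
        return "dataworks." + regionId + ".aliyuncs.com";
    }

    public static void main(String[] args) {
        System.out.println(endpoint("cn-chengdu")); // dataworks.cn-chengdu.aliyuncs.com

        // Read credentials from the environment instead of hardcoding them.
        String akId = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String akSecret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        System.out.println(akId == null || akSecret == null
                ? "credentials not set" : "credentials loaded");
    }
}
```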