DataWorks: Configure DataHub Writer

Last Updated: Dec 09, 2024

DataHub is a platform designed to process streaming data. You can publish and subscribe to streaming data in DataHub and distribute the data to other platforms. DataHub allows you to analyze streaming data and build applications based on the streaming data.

Prerequisites

A reader or conversion node is configured. For more information, see Data source types that support real-time synchronization.

Background information

DataHub Writer writes data to DataHub by using the DataHub SDK for Java. The following Maven dependency shows the SDK version that DataHub Writer uses:

<dependency>
    <groupId>com.aliyun.datahub</groupId>
    <artifactId>aliyun-sdk-datahub</artifactId>
    <version>2.5.1</version>
</dependency>

Procedure

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. In the Scheduled Workflow pane of the DataStudio page, move the pointer over the Create icon and choose Create Node > Data Integration > Real-time Synchronization.

    Alternatively, find the desired workflow in the Scheduled Workflow pane, right-click the workflow name, and then choose Create Node > Data Integration > Real-time Synchronization.

  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.

    Important

    The node name cannot exceed 128 characters in length and can contain only letters, digits, underscores (_), and periods (.).

  4. Click Confirm.

  5. On the configuration tab of the real-time synchronization node, drag DataHub in the Output section to the canvas on the right and connect the DataHub node to the reader or conversion node.

  6. Click the DataHub node. In the configuration panel that appears, configure the parameters.

    Data source: The name of the DataHub data source that you added to DataWorks. Only DataHub data sources can be selected. If no data source is available, click New data source on the right to go to the Data Sources page in Management Center and add a DataHub data source. For more information, see Add a DataHub data source.

    Topic: The name of the DataHub topic to which you want to write data. You can click Data preview on the right to preview the selected topic.

    Write Mode: The mode in which data is written to the DataHub topic. Valid values:

      Tuple: Data is written to the topic as structured data. This mode requires that the DataHub topic be created with a schema.

      Blob: Data is written to the topic as unstructured data. This mode requires a DataHub topic of the BLOB type. Each record is stored in the topic as a chunk of binary data.

    Number of batches: The number of data records to write in each batch.

    Mappings: The mappings between source fields and destination fields. DataWorks synchronizes data based on these field mappings.

  7. In the top toolbar of the configuration tab of the real-time synchronization node, click the Save icon to save the node.
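The naming rule in step 3 (at most 128 characters; only letters, digits, underscores, and periods) can be checked before you create the node, for example in a script that generates node names. The following Java class is a hypothetical client-side helper, not part of DataWorks or the DataHub SDK. It assumes that "letters" means ASCII letters and that an empty name is invalid; the rule above states neither explicitly.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that checks a proposed node name against the documented naming rule. */
final class NodeNameCheck {
    // ASCII letters, digits, underscores (_), and periods (.), 1 to 128 characters.
    // The ASCII-only letters and the minimum length of 1 are assumptions.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9_.]{1,128}");

    private NodeNameCheck() {}

    /** Returns true if the whole name matches the allowed characters and length. */
    static boolean isValid(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("datahub_sync.v1"));  // true
        System.out.println(isValid("datahub-sync"));     // false: hyphens are not allowed
        System.out.println(isValid("a".repeat(129)));    // false: longer than 128 characters
    }
}
```

`String.repeat` requires Java 11 or later.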
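To illustrate the difference between the Tuple and Blob write modes and the role of Mappings in step 6, the following Java sketch converts one source record in two ways. It is a hypothetical model, not the DataHub SDK API and not how DataHub Writer is implemented. The class and method names (`WriteModeSketch`, `toTuple`, `toBlob`) are illustrative. Two further assumptions: a destination field without a mapping is written as null, and Blob mode serializes the record as comma-separated `key=value` text in UTF-8 (the actual Blob payload format is not specified here).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the Tuple and Blob write modes. */
final class WriteModeSketch {
    private WriteModeSketch() {}

    /**
     * Tuple mode: produces one value per destination schema field, in schema order,
     * by looking up the source field that each destination field is mapped to.
     */
    static List<Object> toTuple(Map<String, Object> source,
                                List<String> destinationSchema,
                                Map<String, String> destinationToSource) {
        List<Object> values = new ArrayList<>(destinationSchema.size());
        for (String destinationField : destinationSchema) {
            String sourceField = destinationToSource.get(destinationField);
            // Assumption: an unmapped destination field is written as null.
            values.add(sourceField == null ? null : source.get(sourceField));
        }
        return values;
    }

    /** Blob mode: serializes the whole record into a single byte array (format assumed). */
    static byte[] toBlob(Map<String, Object> source) {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, Object> field : source.entrySet()) {
            if (text.length() > 0) {
                text.append(',');
            }
            text.append(field.getKey()).append('=').append(field.getValue());
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the Blob output is deterministic.
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", 42L);
        source.put("name", "alice");

        Map<String, String> mappings = Map.of("user_id", "id", "user_name", "name");
        List<String> schema = List.of("user_id", "user_name");

        System.out.println(toTuple(source, schema, mappings));                   // [42, alice]
        System.out.println(new String(toBlob(source), StandardCharsets.UTF_8)); // id=42,name=alice
    }
}
```

In Tuple mode the topic schema decides the order and types of the written fields, which is why the mode requires a topic created with a schema. In Blob mode the topic only sees opaque bytes.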
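The Number of batches parameter in step 6 sets how many records are written at a time. The following Java sketch shows that grouping on a finite list of records. The class and method names are hypothetical, and the sketch does not model how the real-time writer decides when to send a partially filled batch, which the parameter description does not specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: groups records into batches of at most batchSize records. */
final class BatchSketch {
    private BatchSketch() {}

    static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            // Copy the view so each batch stays valid if the source list changes later.
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = List.of(1, 2, 3, 4, 5, 6, 7);
        // With a batch size of 3, seven records need three write calls.
        System.out.println(toBatches(records, 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

A larger value means fewer write calls, each carrying more records; the last batch holds whatever remains.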