DataWorks: Configure MaxCompute Writer

Last Updated: Nov 20, 2024

MaxCompute (formerly known as ODPS) provides a comprehensive data import solution that supports the fast computing of large amounts of data.

Prerequisites

A reader or conversion node is configured. For more information, see Overview of the real-time synchronization feature.

Background information

Deduplication is not supported for the data that you want to write to MaxCompute. If you reset the offset for your synchronization node or the synchronization node is restarted after a failover, duplicate data may be written to MaxCompute.
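Because the writer does not deduplicate, a downstream job may need to drop replayed records itself. The following Python sketch shows one possible approach, not a DataWorks feature. It assumes that every record carries a primary key and a source position that together identify one change event; the field names `id` and `offset` are hypothetical and should be replaced with whatever your source exposes.

```python
from typing import Any, Dict, Iterable, List, Sequence

Record = Dict[str, Any]


def drop_replayed(
    records: Iterable[Record],
    identity: Sequence[str] = ("id", "offset"),  # hypothetical field names
) -> List[Record]:
    """Return the records in their original order, keeping only the first
    copy of each record that shares the same identity fields."""
    seen = set()
    unique: List[Record] = []
    for record in records:
        marker = tuple(record[field] for field in identity)
        if marker in seen:
            # Same key and same source position: a replayed duplicate.
            continue
        seen.add(marker)
        unique.append(record)
    return unique
```

A record that was written twice after an offset reset has the same `id` and `offset` both times, so only its first copy is kept. A later change to the same `id` has a different `offset` and is preserved.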

Procedure

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. In the Scheduled Workflow pane of the DataStudio page, move the pointer over the Create icon and choose Create Node > Data Integration > Real-time Synchronization.

    Alternatively, find the desired workflow in the Scheduled Workflow pane, right-click the workflow name, and then choose Create Node > Data Integration > Real-time Synchronization.

  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL, enter a name in the Name field, and configure the Path parameter.

    Important

    The node name cannot exceed 128 characters in length and can contain only letters, digits, underscores (_), and periods (.).

  4. Click Confirm.

  5. On the configuration tab of the real-time synchronization node, drag MaxCompute in the Output section to the canvas on the right, and connect the MaxCompute node to the configured reader or conversion node.

  6. Click the MaxCompute node. In the panel that appears, configure the parameters.

    MaxCompute

    Parameter: Description

    Data source: The name of the MaxCompute data source that you added to DataWorks. You can select only a MaxCompute data source.

      If no data source is available, click New data source on the right to go to the Data Sources page in Management Center to add a MaxCompute data source. For more information, see Add a MaxCompute data source.

    Table: The name of the MaxCompute table to which you want to write data.

      You can click Create Table to create a table, or click Data preview to preview the selected table.

      Important

      Before you create a table, connect the MaxCompute node to a reader node and make sure that output fields are specified for the reader node.

    Partitioning Mode: The mode in which data is written to the destination partitions of the MaxCompute table. Valid values: Automatic Partitioning by Time and Dynamic Partitioning by Field Value.

      • Automatic Partitioning by Time: Data is written to the destination partitions of the MaxCompute table based on the value of the _execute_time_ field. For more information, see Fields used for real-time synchronization.

      • Dynamic Partitioning by Field Value: Data is dynamically written to the destination partitions of the MaxCompute table based on the mappings between fields in the source table and fields in the partitions of the destination MaxCompute table.

    Partition Information: The information about the partitioned MaxCompute table.

    Mappings: The field mappings between the source and destination. Click Mappings to configure field mappings. The real-time synchronization node synchronizes data based on the field mappings.

    If you want to create a table, click Create Table next to Table. In the Create Table dialog box, configure the parameters.

    Parameter or section: Description

    Table Name: The name of the MaxCompute table to which you want to write data in real time.

    Lifecycle: The lifecycle of the MaxCompute table. For more information, see Lifecycle.

    Data Field Structure: In this section, you can configure the fields of the MaxCompute table. You can click Add field to add a field.

    Configure Partition Settings: In this section, you can configure the partition information of the MaxCompute table. Valid values of the Partitioning Mode parameter: Automatic Partitioning by Time and Dynamic Partitioning by Field Value.

      • Automatic Partitioning by Time: Data is written to the destination partitions of the MaxCompute table based on the value of the _execute_time_ field. For more information, see Fields used for real-time synchronization.

        Important
        • You must configure at least two levels of partitions, which are yearly and monthly partitions. You can configure a maximum of five levels of partitions, which are yearly, monthly, daily, hourly, and minutely partitions.

        • For more information about MaxCompute tables, see Partition.

      • Dynamic Partitioning by Field Value: Data is dynamically written to the destination partitions of the MaxCompute table based on the mappings between fields in the source table and fields in the partitions of the destination MaxCompute table.

        For example, the values of Field A in the source table are defined as the values of the partition field in the MaxCompute table. If the value of Field A is aa, the data is written to the aa partition of the MaxCompute table. If the value of Field A is bb, the data is written to the bb partition of the MaxCompute table.

  7. In the top toolbar of the configuration tab of the real-time synchronization node, click the Save icon to save the node.
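The naming rule in step 3 (at most 128 characters; only letters, digits, underscores, and periods) can be checked before a node is created, for example in a script that generates node names. The following Python sketch is a local pre-check only, not a DataWorks API. It assumes that "letters" means ASCII letters, which the source does not state, and the function name `is_valid_node_name` is hypothetical.

```python
import re

# Assumption: "letters" is read as ASCII letters. Widen the character class
# if your workspace accepts names in other alphabets.
_NODE_NAME = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the name is 1 to 128 characters long and contains
    only letters, digits, underscores, and periods."""
    return _NODE_NAME.fullmatch(name) is not None
```

For example, `is_valid_node_name("rt_sync.mc_writer")` passes the check, while a name that contains a space or is longer than 128 characters does not.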
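The two partitioning modes described in step 6 can be illustrated with a small Python sketch. It is a conceptual model of how a record ends up in a partition, not the writer's actual implementation. The partition column names (`year`, `month`, `day`, `hour`, `minute`), the zero-padded value format, and the function names are assumptions made for illustration. The sketch also assumes that the `_execute_time_` value has already been parsed into a `datetime`.

```python
from datetime import datetime
from typing import Any, Dict, Mapping

# Hypothetical partition columns and value formats, ordered from the
# coarsest level (yearly) to the finest level (minutely).
TIME_LEVELS = (
    ("year", "%Y"),
    ("month", "%m"),
    ("day", "%d"),
    ("hour", "%H"),
    ("minute", "%M"),
)


def time_partition(execute_time: datetime, levels: int = 3) -> Dict[str, str]:
    """Automatic Partitioning by Time: derive the partition values from the
    execution time, using between two and five levels."""
    if not 2 <= levels <= len(TIME_LEVELS):
        raise ValueError("configure between two and five partition levels")
    return {name: execute_time.strftime(fmt) for name, fmt in TIME_LEVELS[:levels]}


def dynamic_partition(
    record: Mapping[str, Any],
    mappings: Mapping[str, str],  # partition column -> source field
) -> Dict[str, str]:
    """Dynamic Partitioning by Field Value: copy the mapped source field
    values into the partition columns."""
    return {column: str(record[field]) for column, field in mappings.items()}
```

With a mapping such as `{"pt": "field_a"}`, a record whose `field_a` is `aa` resolves to the `aa` partition and a record whose `field_a` is `bb` resolves to the `bb` partition, which mirrors the Field A example in the parameter description.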