Configure MaxCompute Writer - DataWorks - Alibaba Cloud Documentation Center

MaxCompute (formerly known as ODPS) provides a comprehensive data import solution that supports the fast computing of large amounts of data.

Prerequisites

A reader or conversion node is configured. For more information, see Overview of the real-time synchronization feature.

Background information

Deduplication is not supported for the data that you want to write to MaxCompute. If you reset the offset for your synchronization node or the synchronization node is restarted after a failover, duplicate data may be written to MaxCompute.
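
Because the writer does not deduplicate, a downstream consumer may need to drop replayed records itself. The following is a minimal sketch of one way to do that, not a DataWorks feature. The function name, the id and value fields, and the choice of (id, _execute_time_) as the identifying key are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Sequence


def drop_replayed_records(records: Iterable[dict], key_fields: Sequence[str]) -> Iterator[dict]:
    """Yield each record once, skipping later records whose key was already seen."""
    seen = set()
    for record in records:
        # Build a hashable key from the fields assumed to identify a record.
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            continue
        seen.add(key)
        yield record


# Hypothetical rows: the third row is a replay of the first, as might happen
# after an offset reset or a restart following a failover.
rows = [
    {"id": 1, "_execute_time_": 1700000000000, "value": "a"},
    {"id": 2, "_execute_time_": 1700000001000, "value": "b"},
    {"id": 1, "_execute_time_": 1700000000000, "value": "a"},
]
unique_rows = list(drop_replayed_records(rows, ("id", "_execute_time_")))
```

The sketch keeps every key in memory, so it suits bounded batches only. For large tables, the same idea is usually expressed as a query over the destination table instead.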

Procedure

Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the Scheduled Workflow pane of the DataStudio page, move the pointer over the icon and choose Create Node > Data Integration > Real-time Synchronization.
Alternatively, find the desired workflow in the Scheduled Workflow pane, right-click the workflow name, and then choose Create Node > Data Integration > Real-time Synchronization.
In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and configure the Name and Path parameters.
Important
The node name cannot exceed 128 characters in length and can contain only letters, digits, underscores (_), and periods (.).
Click Confirm.
On the configuration tab of the real-time synchronization node, drag MaxCompute in the Output section to the canvas on the right, and connect the MaxCompute node to the configured reader or conversion node.
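
The node naming rule in the steps above can be checked before you create the node. The following is a minimal sketch that assumes "letters" means ASCII letters; the helper name is hypothetical and is not part of any DataWorks API.

```python
import re

# Assumption: only ASCII letters count as "letters". The rule allows letters,
# digits, underscores, and periods, with a maximum length of 128 characters.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Return True if the name satisfies the documented length and character rule."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None


ok = is_valid_node_name("sync_orders.v1")
has_space = is_valid_node_name("sync orders")
too_long = is_valid_node_name("a" * 129)
```

Empty names are rejected here as well, which is an extra assumption beyond the documented rule.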

Click the MaxCompute node. In the panel that appears, configure the parameters.

MaxCompute

Parameter	Description
Data source	The name of the MaxCompute data source that you added to DataWorks. You can select only a MaxCompute data source. If no data source is available, click New data source on the right to go to the Data Sources page in Management Center to add a MaxCompute data source. For more information, see Add a MaxCompute data source.
Tunnel Resource Group	The name of the Tunnel quota group. By default, Common transmission resources is selected, which is a quota that MaxCompute provides free of charge. For more information about data transmission resources of MaxCompute, see Purchase and use exclusive resource groups for data transmission service. Note: If the exclusive Tunnel quota is unavailable due to overdue payments or expiration, the running task automatically switches from the exclusive Tunnel quota to the free Tunnel quota.
schema	Select the name of the schema that is created in MaxCompute.
Table	The name of the MaxCompute table to which you want to write data. You can click Create Table to create a table, or click Data preview to preview the selected table. Note: Before you create a table, connect the MaxCompute node to a reader node and make sure that output fields are specified for the reader node.
Partition Information	The information about the partitioned MaxCompute table.
Partitioning Mode	The mode in which data is written to the destination partitions of the MaxCompute table. Valid values: Automatic Partitioning by Time and Dynamic Partitioning by Field Value. If you select Automatic Partitioning by Time, data is written to the destination partitions of the MaxCompute table based on the value of the _execute_time_ field. For more information, see Fields used for real-time synchronization. If you select Dynamic Partitioning by Field Value, data is dynamically written to the destination partitions of the MaxCompute table based on the mappings between fields in the source table and fields in the partitions of the destination MaxCompute table.
Mappings	The field mappings between the source and destination. Click Mappings to configure field mappings. The real-time synchronization node synchronizes data based on the field mappings.
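
Conceptually, a field mapping renames each mapped source field to its destination field. The following is a minimal sketch of that idea, not the way DataWorks implements it. The function name and the field names are hypothetical, and dropping unmapped source fields is an assumption.

```python
def apply_mappings(source_row: dict, mappings: dict) -> dict:
    """Build a destination row from a source row.

    mappings maps each source field name to a destination field name.
    Source fields that have no mapping are dropped (an assumption).
    """
    return {dest: source_row[src] for src, dest in mappings.items()}


# Hypothetical source record and mapping.
source_row = {"order_id": 42, "amount": 9.5, "internal_flag": True}
mappings = {"order_id": "id", "amount": "total"}
dest_row = apply_mappings(source_row, mappings)
```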

If you want to create a table, click Create Table next to Table. In the Create Table dialog box, configure the parameters.

Parameter or section	Description
Table Name	The name of the MaxCompute table to which you want to write data in real time.
Lifecycle	The lifecycle of the MaxCompute table. For more information, see Lifecycle.
Data Field Structure	In this section, you can configure the fields of the MaxCompute table. You can click New field to add a field.
Configure Partition Settings	In this section, you can configure the partition information of the MaxCompute table. Valid values of the Partitioning Mode parameter: Automatic Partitioning by Time and Dynamic Partitioning by Field Value. Automatic Partitioning by Time: Data is written to the destination partitions of the MaxCompute table based on the value of the _execute_time_ field. For more information, see Fields used for real-time synchronization. Important: You must configure at least two levels of partitions, which are yearly and monthly partitions. You can configure a maximum of five levels of partitions, which are yearly, monthly, daily, hourly, and minutely partitions. For more information about MaxCompute tables, see Partition. Dynamic Partitioning by Field Value: Data is dynamically written to the destination partitions of the MaxCompute table based on the mappings between fields in the source table and fields in the partitions of the destination MaxCompute table. For example, the values of Field A in the source table are defined as the values of the partition field in the MaxCompute table. If the value of Field A is aa, the data is written to the aa partition of the MaxCompute table. If the value of Field A is bb, the data is written to the bb partition of the MaxCompute table.
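
The two partitioning modes can be illustrated with a minimal sketch. It assumes that _execute_time_ is already available as a datetime value and that partition values are zero-padded strings. The partition column names (year, month, and so on), the value format, and both function names are hypothetical, so the partitions that DataWorks actually creates may look different.

```python
from datetime import datetime

# Partition levels in order, from yearly to minutely, with assumed formats.
LEVELS = [("year", "%Y"), ("month", "%m"), ("day", "%d"), ("hour", "%H"), ("minute", "%M")]


def time_partitions(execute_time: datetime, depth: int) -> dict:
    """Derive partition values from the execute time.

    depth is the number of partition levels: at least 2 (yearly and monthly)
    and at most 5 (down to minutely), as the table above requires.
    """
    if not 2 <= depth <= len(LEVELS):
        raise ValueError("depth must be between 2 and 5")
    return {name: execute_time.strftime(fmt) for name, fmt in LEVELS[:depth]}


def field_value_partition(row: dict, source_field: str) -> str:
    """Dynamic partitioning: the partition value is the value of the mapped source field."""
    return str(row[source_field])


by_time = time_partitions(datetime(2024, 3, 5, 14, 7), depth=3)
by_field = field_value_partition({"A": "aa", "amount": 1}, "A")
```

With these inputs, by_time holds the yearly, monthly, and daily values for 5 March 2024, and by_field is aa, which matches the Field A example in the table.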

In the top toolbar of the configuration tab of the real-time synchronization node, click the icon to save the node.