Create a node to synchronize MaxCompute data with a few clicks - DataWorks

DataWorks allows you to create a node on the DataStudio page that synchronizes MaxCompute data to Hologres with a few clicks. This way, you can run accelerated queries on the data of MaxCompute tables. This topic describes how to create such a node.

Background information

Before you synchronize MaxCompute data to Hologres with a few clicks, you must create an external table in Hologres. The Hologres external table is used to synchronize data from a source MaxCompute table to a Hologres internal table. The schema of the Hologres external table is the same as that of the source MaxCompute table. You can also use SQL statements to import data from MaxCompute to Hologres. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.

The performance of importing data from MaxCompute to Hologres based on SQL statements is higher than the performance of synchronizing data based on external tables. For more information about how to create an external table to synchronize MaxCompute data, see Create a node to synchronize schemas of MaxCompute tables with a few clicks.

Note

The operations described in this topic are performed in the China (Shanghai) region. You can perform operations in other regions based on the instructions displayed in the DataWorks console.

Create a node to synchronize MaxCompute data to Hologres

Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Create a workflow.
If you have an existing workflow, skip this step.
1. Move the pointer over the icon and select Create Workflow.
2. In the Create Workflow dialog box, configure the Workflow Name parameter.
3. Click Create.
Create a node to synchronize MaxCompute data to Hologres.
1. Move the pointer over the icon and choose Create Node > Hologres > Data Synchronization from MaxCompute.
  You can also find the desired workflow, right-click the workflow name, and then choose Create Node > Hologres > Data Synchronization from MaxCompute.
2. In the Create Node dialog box, configure the Name, Engine Instance, Node Type, and Path parameters.
3. Click Confirm. The configuration tab of the node appears.

Configure node information.

On the configuration tab of the node, configure the information about the source MaxCompute table from which you want to synchronize data, the information about the destination table where you want to store the synchronized data, the data synchronization policy, and the SQL statement.

Configure the parameters in the Settings for Source Table (MaxCompute) section.

The parameters that you configure in this section determine the source MaxCompute table from which you want to synchronize data. In this section, you must configure the information about the Hologres external table that maps to the source MaxCompute table. The following table describes the parameters.

Parameter	Description
Source of External Table	The source of the Hologres external table. The Hologres external table maps to the source MaxCompute table and is used to synchronize the data of the source MaxCompute table to a Hologres internal table. Valid values: Existing External Table: You can select this option if the external table that you want to use already exists. If you select this option, you must specify the schema and name of the external table. Create External Table: You must select this option if no Hologres external table that maps to the source MaxCompute table exists. If you select this option, you must specify the server that is used by the external table, the name of the MaxCompute project to which the source MaxCompute table belongs, and the name of the source MaxCompute table. Note You can use the odps_server server that is created in the underlying layer of Hologres. For more information, see postgres_fdw.
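
The Create External Table option can be pictured as producing a foreign-table import statement from the three values that you specify (server, MaxCompute project, and source table). The following Python snippet is a minimal sketch of that idea, not the statement that DataWorks actually issues. The function name `build_import_foreign_schema`, the default `public` schema, and the sample project and table names are hypothetical. The `if_table_exist` option and its values are assumed from the Hologres `IMPORT FOREIGN SCHEMA` syntax and should be checked against the Hologres documentation.

```python
import re

# Plain identifiers only, so that the sketch never builds SQL from unsafe input.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_import_foreign_schema(project, table, holo_schema="public",
                                server="odps_server", if_table_exist="error"):
    """Return an IMPORT FOREIGN SCHEMA statement for one MaxCompute table.

    All names are illustrative; `if_table_exist` is an assumed Hologres option.
    """
    for name in (project, table, holo_schema, server):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if if_table_exist not in {"error", "ignore", "update"}:
        raise ValueError(f"unsupported if_table_exist: {if_table_exist!r}")
    return (
        f"IMPORT FOREIGN SCHEMA {project} LIMIT TO ({table}) "
        f"FROM SERVER {server} INTO {holo_schema} "
        f"OPTIONS (if_table_exist '{if_table_exist}');"
    )


# Hypothetical project and table names.
print(build_import_foreign_schema("my_project", "orders"))
```

The resulting external table has the same name and schema as the source MaxCompute table, which matches the behavior described in the Background information section.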

Configure the parameters in the Settings for Destination Table (Hologres) section.

The parameters that you configure in this section are used to create a Hologres internal table where you want to store the synchronized data.

Parameter	Description
Schema	The `schema` to which the Hologres internal table belongs.
Table Name	The name of the Hologres internal table. If an internal table with the specified name already exists, Hologres handles the existing internal table based on the following policies: Non-partitioned table: Hologres deletes the existing internal table and creates another internal table with the same name. Partitioned table: Hologres does not delete the existing internal table. Hologres creates partitions in the table based on partition values and synchronizes data to the new partitions. Note An error is reported if the schema of the internal table to be created is different from the schema of the existing internal table with the same name.
Table Description	The description of the Hologres internal table.
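
The handling of an existing internal table described for the Table Name parameter can be summarized as a small decision function. The following Python snippet is a sketch for illustration only. The function name, its parameters, and the action labels are hypothetical, and the sketch assumes that the schema check in the note applies to the case in which the existing table is retained.

```python
def plan_destination_table(exists, partitioned, schema_matches=True):
    """Return the ordered actions for the destination internal table.

    exists         -- an internal table with the specified name already exists
    partitioned    -- the destination table is a partitioned table
    schema_matches -- the new schema equals the existing table's schema
    """
    if not exists:
        actions = ["create table"]
        if partitioned:
            actions.append("create partition")
        return actions
    if not partitioned:
        # Non-partitioned table: the existing table is deleted and recreated.
        return ["drop existing table", "create table"]
    if not schema_matches:
        # Assumption: the schema mismatch error is raised when the table is kept.
        raise ValueError("schema differs from the existing internal table")
    # Partitioned table: the existing table is kept and a partition is added.
    return ["keep existing table", "create partition"]


print(plan_destination_table(exists=True, partitioned=False))
print(plan_destination_table(exists=True, partitioned=True))
```

Note that in this reading, rerunning the node against a non-partitioned table discards the data that the table previously held, whereas a partitioned table only gains new partitions.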

Configure the parameters in the Synchronization Settings section.

The parameters that you configure in this section determine the policy that is used to synchronize MaxCompute data to Hologres.

Tab	Description
Synchronization Field	Select the fields in the source MaxCompute table from which you want to synchronize data.
Partition Configurations	Select the partitions in the source MaxCompute table from which you want to synchronize data. Note Hologres allows you to synchronize data only from level-1 partitions in the source MaxCompute table. If the source MaxCompute table contains multiple levels of partitions, you must specify the level-1 partition field of the source MaxCompute table for the destination table. Other partition fields in the source MaxCompute table are mapped to common fields in the destination table.
Index Configuration	Configure indexes for the Hologres internal table that stores the synchronized MaxCompute data. You can query data based on the indexes. For more information about how to create an index, see CREATE TABLE.
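
The partition rule in the Partition Configurations row can be illustrated with a short Python sketch. It assumes that the partition fields of the source MaxCompute table are given in level order (level-1 first). The function name and the sample field names `ds` and `hh` are hypothetical.

```python
def map_partition_fields(source_partition_fields):
    """Split source partition fields into the destination partition key
    (the level-1 field) and the fields that become common fields."""
    if not source_partition_fields:
        # The source table is not partitioned.
        return {"partition_key": None, "common_fields": []}
    first, *rest = source_partition_fields
    return {"partition_key": first, "common_fields": rest}


# A source table partitioned by day (level 1) and hour (level 2).
print(map_partition_fields(["ds", "hh"]))
```

In this example, only `ds` partitions the destination table, and `hh` is stored as a common field.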

Generate an SQL script.
Based on the synchronization configurations, DataWorks generates the SQL statement that is used to run the current data synchronization node. You can go to the code editor of Hologres and run the data synchronization node in SQL mode.
Note
- You cannot edit the generated SQL script. If the synchronization configurations of the data synchronization node change, click Refresh to generate a new SQL statement.
- For more information about how to run a data synchronization node in SQL mode, see Import data from MaxCompute to Hologres by executing SQL statements.
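
To give a rough idea of what a script of this kind contains, the following Python sketch assembles the statements for one synchronization run from a set of configurations. It is not the script that DataWorks generates; the authoritative script is the one displayed on the configuration tab. The function name, the table names, and the `<table>_<partition value>` naming of the child table are hypothetical, and the `PARTITION OF ... FOR VALUES IN` clause is assumed from the Hologres partitioned table syntax.

```python
import re

_SAFE = re.compile(r"^[A-Za-z0-9_]+$")


def build_sync_statements(src, dst, fields,
                          partition_field=None, partition_value=None):
    """Return the SQL statements for one run.

    src and dst are external and internal table names; fields are the
    selected synchronization fields (include the partition field if any).
    """
    cols = ", ".join(fields)
    if partition_field is None:
        # Non-partitioned destination: copy the selected fields as-is.
        return [f"INSERT INTO {dst} ({cols}) SELECT {cols} FROM {src};"]
    if partition_value is None or not _SAFE.match(partition_value):
        raise ValueError("a simple partition value is required")
    child = f"{dst}_{partition_value}"  # hypothetical child table name
    return [
        f"CREATE TABLE IF NOT EXISTS {child} PARTITION OF {dst} "
        f"FOR VALUES IN ('{partition_value}');",
        f"INSERT INTO {child} ({cols}) SELECT {cols} FROM {src} "
        f"WHERE {partition_field} = '{partition_value}';",
    ]


for stmt in build_sync_statements("public.ext_orders", "public.orders",
                                  ["id", "amount", "ds"], "ds", "20240101"):
    print(stmt)
```

Because the statements depend only on the configurations, changing a configuration requires regenerating them, which corresponds to the Refresh operation described in the note above.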

Configure task scheduling properties.
If you want the system to periodically run a task on the node, you can click Properties in the right-side navigation pane on the configuration tab of the node to configure task scheduling properties based on your business requirements.
- Configure basic properties for the task. For more information, see Configure basic properties.
- Configure the scheduling cycle, rerun properties, and scheduling dependencies. For more information, see Configure time properties and Configure same-cycle scheduling dependencies.
  Note
  You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the task.
- Configure resource properties. For more information, see Configure the resource property. If you want to access the data source over the Internet or a VPC, you must use an exclusive resource group for scheduling that is connected to the data source to run the task on the node. For more information, see Network connectivity solutions.
Save and run the node.
1. In the top navigation bar of the configuration tab of the node, click the icon to save the node.
2. In the top navigation bar of the configuration tab of the node, click the icon to run the node.
If the data synchronization node is created in a workspace in standard mode, you must click Deploy in the top navigation bar to deploy the node to the production environment after you commit the node. For more information, see Deploy nodes.
View the task.
1. Click Operation Center in the upper-right corner of the configuration tab of the corresponding node to go to Operation Center in the production environment.
2. View the scheduled task. For more information, see View and manage auto triggered tasks.
To view more information about the task, click Operation Center in the top navigation bar of the DataStudio page. For more information, see Overview.

What to do next

After the data of the source MaxCompute table is synchronized, you can go to the table management page to view the data details. For more information, see Manage tables. You can also log on to the Hologres console and query MaxCompute data by using HoloWeb. For more information, see HoloWeb.