On the DataStudio page of DataWorks, you can create a node that synchronizes MaxCompute data to Hologres with a few clicks. After the data is synchronized, you can run accelerated queries on the data of MaxCompute tables. This topic describes how to create such a node.
Background information
Before you synchronize MaxCompute data to Hologres with a few clicks, you must create an external table in Hologres. The Hologres external table is used to synchronize data from a source MaxCompute table to a Hologres internal table. The schema of the Hologres external table is the same as that of the source MaxCompute table. You can also use SQL statements to import data from MaxCompute to Hologres. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.
The performance of importing data from MaxCompute to Hologres based on SQL statements is higher than the performance of synchronizing data based on external tables. For more information about how to create a Hologres external table, see Create a node to synchronize schemas of MaxCompute tables with a few clicks.
Create a node to synchronize MaxCompute data to Hologres
Go to the DataStudio page.
Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
Create a workflow.
If you have an existing workflow, skip this step.
Move the pointer over the icon and select Create Workflow.
In the Create Workflow dialog box, configure the Workflow Name parameter.
Click Create.
Create a One-click MaxCompute data synchronization node.
Move the pointer over the icon and choose .
You can also find the desired workflow, right-click the workflow, and then choose .
In the Create Node dialog box, configure the Name, Engine Instance, Node Type, and Path parameters.
Click Confirm. The configuration tab of the node appears.
Configure the node information.
On the configuration tab of the node, configure the information about the source MaxCompute table from which you want to synchronize data, the information about the destination table where you want to store the synchronized data, the data synchronization policy, and the SQL statement.
Configure the parameters in the MaxCompute Source table selection section.
The parameters that you configure in this section determine the source MaxCompute table from which you want to synchronize data. In this section, you must configure the information about the Hologres external table that maps to the source MaxCompute table. The following table describes the parameters.
Parameter | Description
Target connection | The name of the Hologres compute engine instance where the Hologres external table resides.
Target Library | The name of the database where the Hologres external table resides in the Hologres compute engine instance.
External table source | The source of the Hologres external table. The Hologres external table maps to the source MaxCompute table and is used to synchronize the data of the source MaxCompute table to a Hologres internal table. Valid values: (1) External table already exists: You can select this option if the external table that you want to use already exists. If you select this option, you must specify the schema and name of the external table. (2) New external table: You must select this option if no Hologres external table that maps to the source MaxCompute table exists. If you select this option, you must specify the server that is used by the external table, the name of the MaxCompute project to which the source MaxCompute table belongs, and the name of the source MaxCompute table. Note: You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.
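The three inputs of the New external table option (server, MaxCompute project, and source table) line up with the arguments of the Hologres IMPORT FOREIGN SCHEMA statement. The following Python sketch builds such a statement as a string. The helper names, the sample project and table names, the `public` target schema, and the `if_table_exist 'update'` option are assumptions chosen for illustration. The statement that DataWorks actually issues may differ.

```python
import re

# Only simple unquoted identifiers are accepted, so input cannot alter the SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_ident(name: str) -> str:
    """Return the name unchanged if it is a simple identifier, else raise."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def build_import_foreign_schema(project: str, table: str,
                                target_schema: str = "public",
                                server: str = "odps_server") -> str:
    """Build SQL that maps one MaxCompute table to a Hologres external table.

    Hypothetical helper: it mirrors the fields of the New external table
    option and is not part of DataWorks or Hologres.
    """
    parts = [
        f"IMPORT FOREIGN SCHEMA {_check_ident(project)}",
        f"LIMIT TO ({_check_ident(table)})",
        f"FROM SERVER {_check_ident(server)}",
        f"INTO {_check_ident(target_schema)}",
        "OPTIONS (if_table_exist 'update');",
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Sample names are placeholders, not values from this topic.
    print(build_import_foreign_schema("my_mc_project", "orders"))
```

The identifier check is deliberately strict. If your project or table names need quoting, adapt the check instead of removing it.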
Configure the parameters in the Target table settings section.
The parameters that you configure in this section are used to create a Hologres internal table where you want to store the synchronized data.
Parameter | Description
Target schema | The schema to which the Hologres internal table belongs.
Destination Table Name | The name of the Hologres internal table. If the name of the internal table you specify already exists, Hologres processes the existing internal table based on the following policies: (1) Non-partitioned table: Hologres deletes the existing internal table and creates a new internal table with the same name. (2) Partitioned table: Hologres does not delete the existing internal table. Hologres creates partitions in the table based on partition values and synchronizes data to the new partitions. Note: An error is reported if the schema of the created internal table is different from the schema of the existing internal table with the same name.
Target table description | The description of the Hologres internal table.
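The handling of an existing destination table can be summarized as a small decision function. The following Python sketch is illustrative only: the function name and the action labels are hypothetical. It also assumes that the schema mismatch error applies to the partitioned case, where the existing table is kept. The source does not state the scope of that error explicitly.

```python
from typing import List


def plan_destination_table(table_exists: bool, is_partitioned: bool,
                           schema_matches: bool = True) -> List[str]:
    """Return the ordered actions for the destination Hologres internal table.

    Hypothetical model of the documented policy; the action labels are
    placeholders, not Hologres commands.
    """
    if not table_exists:
        # Nothing to reconcile: the internal table is simply created.
        return ["create_table"]
    if is_partitioned:
        # Assumption: the mismatch error is raised here, because the
        # existing partitioned table is kept rather than replaced.
        if not schema_matches:
            raise ValueError("schema differs from the existing table with the same name")
        return ["keep_table", "create_partitions", "sync_into_new_partitions"]
    # Non-partitioned: the existing table is deleted and re-created.
    return ["drop_table", "create_table"]


if __name__ == "__main__":
    print(plan_destination_table(table_exists=True, is_partitioned=False))
    print(plan_destination_table(table_exists=True, is_partitioned=True))
```

Because a non-partitioned table with the same name is deleted, check that no other job depends on its current contents before you reuse the name.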
Configure the parameters in the Synchronization settings section.
The parameters that you configure in this section determine the policy that is used to synchronize MaxCompute data to Hologres.
Parameter | Description
Synchronization field | The fields in the source MaxCompute table from which you want to synchronize data.
Partition configuration | The partitions in the source MaxCompute table from which you want to synchronize data. Note: Hologres allows you to synchronize data only from level-1 partitions in the source MaxCompute table. If the source MaxCompute table contains multiple levels of partitions, you must specify the level-1 partition field of the source MaxCompute table for the destination table. Other partition fields in the source MaxCompute table are mapped to common fields in the destination table.
Index configuration | The index for the Hologres internal table that is used to store the synchronized MaxCompute data. You can query data based on the index. For more information about how to create an index, see CREATE TABLE.
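The partition rule can be illustrated with a short Python sketch: only the level-1 partition field of the source table remains a partition field in the destination table, and the remaining partition fields become common fields. The function name and the sample field names `ds`, `hh`, and `region` are hypothetical.

```python
from typing import Dict, List


def map_partition_fields(partition_fields: List[str]) -> Dict[str, List[str]]:
    """Split MaxCompute partition fields, ordered from level 1 downward.

    Returns the field that stays a partition field in the Hologres
    destination table and the fields that are mapped to common fields.
    """
    return {
        "partition_key": partition_fields[:1],   # level-1 field only
        "common_fields": partition_fields[1:],   # level 2 and deeper
    }


if __name__ == "__main__":
    # A source table partitioned by ds, then hh, then region.
    print(map_partition_fields(["ds", "hh", "region"]))
```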
Generate an SQL script.
Based on the synchronization configurations, DataWorks generates the SQL statement that the current data synchronization node runs. You can also go to the code editor of Hologres and run the data synchronization in SQL mode.
Note: You cannot edit the generated SQL script. If the synchronization configurations of the data synchronization node change, click Refresh to generate a new SQL statement.
For more information about how to run the data synchronization node in SQL mode, see Import data from MaxCompute to Hologres by executing SQL statements.
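To give a sense of how the configured fields and partition could turn into a statement, the following Python sketch assembles an INSERT ... SELECT that copies the selected fields from the external table to the internal table. It is not the script that DataWorks generates. The function name, table names, and field names are hypothetical, and the identifiers are assumed to come from trusted configuration.

```python
from typing import List, Optional


def build_sync_sql(dest_table: str, external_table: str, fields: List[str],
                   partition_field: Optional[str] = None,
                   partition_value: Optional[str] = None) -> str:
    """Build an INSERT ... SELECT from the external table into the internal table.

    Identifiers are assumed to be trusted; only the partition value is
    escaped. For a partitioned destination, pass the table that should
    receive the rows of that partition as dest_table.
    """
    if not fields:
        raise ValueError("at least one synchronization field is required")
    cols = ", ".join(fields)
    sql = f"INSERT INTO {dest_table} ({cols}) SELECT {cols} FROM {external_table}"
    if partition_field is not None:
        if partition_value is None:
            raise ValueError("partition_value is required with partition_field")
        # Double single quotes so the value stays a valid string literal.
        escaped = partition_value.replace("'", "''")
        sql += f" WHERE {partition_field} = '{escaped}'"
    return sql + ";"


if __name__ == "__main__":
    print(build_sync_sql("public.orders_20240101", "public.orders_ext",
                         ["order_id", "amount", "ds"], "ds", "20240101"))
```

Restricting the SELECT to one partition value keeps each run limited to the partition that you configured instead of rescanning the whole source table.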
Configure task scheduling properties.
If you want the system to periodically run a task on the node, you can click Properties in the right-side navigation pane on the configuration tab of the node to configure task scheduling properties based on your business requirements.
Configure basic properties for the task. For more information, see Configure basic properties.
Configure the scheduling cycle, rerun properties, and scheduling dependencies. For more information, see Configure time properties and Configure same-cycle scheduling dependencies.
Note: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the task.
Configure resource properties. For more information, see Configure the resource property. If the node must access the data source over the Internet or a VPC, you must run the task on an exclusive resource group for scheduling that is connected to the data source. For more information, see Network connectivity solutions.
Save the node configurations and run the node.
In the top navigation bar of the configuration tab of the node, click the icon to save the node configurations.
In the top navigation bar of the configuration tab of the node, click the icon to synchronize the data of the source MaxCompute table.
If the data synchronization node is created in a workspace in standard mode, you must click Deploy in the top navigation bar to deploy the node to the production environment after you commit the node. For more information, see Deploy nodes.
View the task.
Click Operation Center in the upper-right corner of the configuration tab of the corresponding node to go to Operation Center in the production environment.
View the scheduled task. For more information, see View and manage auto triggered tasks.
To view more information about the task, click Operation Center in the top navigation bar of the DataStudio page. For more information, see Overview.
What to do next
After the data of the source MaxCompute table is synchronized, you can go to the Workspace Tables page in DataStudio to view the data details. For more information, see Manage tables. You can also log on to the Hologres console and query MaxCompute data by using HoloWeb. For more information, see HoloWeb.