DataWorks: Create a node to synchronize MaxCompute data with a few clicks

Last Updated: Nov 14, 2024

On the DataStudio page of DataWorks, you can create a node that synchronizes MaxCompute data to Hologres with a few clicks. After the data is synchronized, you can run accelerated queries on the data of MaxCompute tables. This topic describes how to create such a node.

Background information

Before you synchronize MaxCompute data to Hologres with a few clicks, you must create an external table in Hologres. The Hologres external table is used to synchronize data from a source MaxCompute table to a Hologres internal table. The schema of the Hologres external table is the same as that of the source MaxCompute table. You can also use SQL statements to import data from MaxCompute to Hologres. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.

The performance of importing data from MaxCompute to Hologres based on SQL statements is higher than the performance of synchronizing data based on external tables. For more information about how to create a Hologres external table, see Create a node to synchronize schemas of MaxCompute tables with a few clicks.
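The data flow described above (source MaxCompute table, Hologres external table, Hologres internal table) can be made concrete with a small sketch. The Python below only renders SQL text and does not connect to any service. It assumes a hypothetical external table `public.orders_ext` that maps to the MaxCompute table and a hypothetical internal table `public.orders` that already exists with the same columns. The `INSERT INTO ... SELECT` form is ordinary PostgreSQL-style SQL; the statements that the node actually generates may differ.

```python
def render_import_sql(external_table, internal_table, columns):
    """Render an INSERT ... SELECT that copies rows from the external
    table into the internal table. All names are illustrative."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    return (
        f"INSERT INTO {internal_table} ({column_list})\n"
        f"SELECT {column_list}\n"
        f"FROM {external_table};"
    )


# Hypothetical table and column names, used only for illustration.
sql = render_import_sql("public.orders_ext", "public.orders", ["order_id", "amount"])
print(sql)
```

Listing the columns explicitly, rather than using `SELECT *`, keeps the statement valid if the external table later gains columns that the internal table does not have.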

Create a node to synchronize MaxCompute data to Hologres

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. Create a workflow.

    If you have an existing workflow, skip this step.

    1. Move the pointer over the Create icon and select Create Workflow.

    2. In the Create Workflow dialog box, configure the Workflow Name parameter.

    3. Click Create.

  3. Create a One-click MaxCompute data synchronization node.

    1. Move the pointer over the Create icon and choose Create Node > Hologres > One-click MaxCompute data synchronization.

      You can also find the desired workflow, right-click the workflow, and then choose Create Node > Hologres > One-click MaxCompute data synchronization.

    2. In the Create Node dialog box, configure the Name, Engine Instance, Node Type, and Path parameters.

    3. Click Confirm. The configuration tab of the node appears.

  4. Configure the node information.

    On the configuration tab of the node, configure the source MaxCompute table from which you want to synchronize data, the destination table that stores the synchronized data, the data synchronization policy, and the SQL statement.

    1. Configure the parameters in the MaxCompute Source table selection section.

      The parameters that you configure in this section determine the source MaxCompute table from which you want to synchronize data. In this section, you must configure the information about the Hologres external table that maps to the source MaxCompute table. The parameters are described below.

      Target connection: The name of the Hologres compute engine instance where the Hologres external table resides.

      Target Library: The name of the database where the Hologres external table resides in the Hologres compute engine instance.

      External table source: The source of the Hologres external table. The Hologres external table maps to the source MaxCompute table and is used to synchronize the data of the source MaxCompute table to a Hologres internal table. Valid values:

      • External table already exists: You can select this option if the external table that you want to use already exists. If you select this option, you must specify the schema and name of the external table.

      • New external table: You must select this option if no Hologres external table that maps to the source MaxCompute table exists. If you select this option, you must specify the server that is used by the external table, the name of the MaxCompute project to which the source MaxCompute table belongs, and the name of the source MaxCompute table.

        Note: You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.
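The three inputs of the New external table option (server, MaxCompute project, and source table) line up with the arguments of the PostgreSQL-style `IMPORT FOREIGN SCHEMA` statement. The sketch below renders such a statement in Python. The project name `my_mc_project`, the table name `orders`, and the target schema `public` are placeholders, and it is an assumption that the node issues a statement of this shape.

```python
import re

# Accept only plain identifiers so that rendered SQL cannot be altered by input.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_import_foreign_schema(server, project, table, target_schema="public"):
    """Render a statement that creates an external table mapping to
    project.table through the given server. Names are illustrative."""
    for name in (server, project, table, target_schema):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return (
        f"IMPORT FOREIGN SCHEMA {project} LIMIT TO ({table})\n"
        f"FROM SERVER {server} INTO {target_schema};"
    )


# odps_server comes from the note above; the other names are hypothetical.
statement = render_import_foreign_schema("odps_server", "my_mc_project", "orders")
print(statement)
```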

    2. Configure the parameters in the Target table settings section.

      The parameters that you configure in this section are used to create a Hologres internal table where you want to store the synchronized data.

      Target schema: The schema to which the Hologres internal table belongs.

      Destination Table Name: The name of the Hologres internal table. If an internal table with the name that you specify already exists, Hologres processes the existing internal table based on the following policies:

      • Non-partitioned table: Hologres deletes the existing internal table and creates a new internal table with the same name.

      • Partitioned table: Hologres does not delete the existing internal table. Hologres creates partitions in the table based on partition values and synchronizes data to the new partitions.

        Note: An error is reported if the schema of the internal table to be created is different from the schema of the existing internal table with the same name.

      Target table description: The description of the Hologres internal table.
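The policy for an existing destination table can be read as a small decision procedure. The Python sketch below models it under stated assumptions: it only returns SQL text, the child-table naming pattern `<table>_<value>` is hypothetical, and the schema check is a simplified stand-in for the error described in the note. It is not the logic that DataWorks itself runs.

```python
def plan_target_table(table, columns, partition_column=None,
                      partition_value=None, existing_columns=None):
    """Return the SQL statements needed to prepare the destination table.

    columns and existing_columns are lists of (name, type) tuples.
    existing_columns is None when the table does not exist yet.
    """
    column_sql = ", ".join(f"{name} {type_}" for name, type_ in columns)
    exists = existing_columns is not None

    if partition_column is None:
        # Non-partitioned: an existing table is deleted and recreated.
        statements = [f"DROP TABLE {table};"] if exists else []
        statements.append(f"CREATE TABLE {table} ({column_sql});")
        return statements

    # Partitioned: the existing table is kept, so its schema must match.
    if exists and list(existing_columns) != list(columns):
        raise ValueError(f"schema of existing table {table} differs")
    statements = []
    if not exists:
        statements.append(
            f"CREATE TABLE {table} ({column_sql}) "
            f"PARTITION BY LIST ({partition_column});"
        )
    child = f"{table}_{partition_value}"  # hypothetical naming pattern
    statements.append(
        f"CREATE TABLE {child} PARTITION OF {table} "
        f"FOR VALUES IN ('{partition_value}');"
    )
    return statements


cols = [("order_id", "bigint"), ("ds", "text")]
for line in plan_target_table("public.orders", cols, "ds", "20241114", cols):
    print(line)
```

With an existing partitioned table, the plan contains only the statement that adds the new partition, which mirrors the behavior described for partitioned tables.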

    3. Configure the parameters in the Synchronization settings section.

      The parameters that you configure in this section determine the policy that is used to synchronize MaxCompute data to Hologres.

      Synchronization field: The fields in the source MaxCompute table from which you want to synchronize data.

      Partition configuration: The partitions in the source MaxCompute table from which you want to synchronize data.

        Note: Hologres allows you to synchronize data only from level-1 partitions in the source MaxCompute table. If the source MaxCompute table contains multiple levels of partitions, you must specify the level-1 partition field of the source MaxCompute table for the destination table. Other partition fields in the source MaxCompute table are mapped to common fields in the destination table.

      Index configuration: The index for the Hologres internal table that is used to store the synchronized MaxCompute data. You can query data based on the index. For more information about how to create an index, see CREATE TABLE.
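The level-1 partition rule can be illustrated with a short sketch. The Python below splits the source partition fields into the destination partition field and the remaining common fields, then renders a filtered `INSERT INTO ... SELECT`. The table names, the field names `ds` and `region`, and the partition value are hypothetical, and the destination table to write to is left to the caller. The statement shape is an assumption, not the SQL that the node is confirmed to generate.

```python
def split_partition_fields(source_partition_fields):
    """Return (destination partition field, common fields).

    Only the level-1 partition field stays a partition field; deeper
    levels become common fields in the destination table.
    """
    if not source_partition_fields:
        return None, []
    return source_partition_fields[0], list(source_partition_fields[1:])


def render_partition_sync(external_table, internal_table, sync_fields,
                          source_partition_fields, level1_value):
    """Render an INSERT ... SELECT limited to one level-1 partition."""
    partition_field, common_fields = split_partition_fields(source_partition_fields)
    select_fields = list(sync_fields) + common_fields
    if partition_field is not None:
        select_fields.append(partition_field)
    column_list = ", ".join(select_fields)
    sql = (
        f"INSERT INTO {internal_table} ({column_list})\n"
        f"SELECT {column_list}\n"
        f"FROM {external_table}"
    )
    if partition_field is not None:
        sql += f"\nWHERE {partition_field} = '{level1_value}'"
    return sql + ";"


# Hypothetical source table partitioned by ds (level 1) and region (level 2).
print(render_partition_sync(
    "public.orders_ext", "public.orders",
    ["order_id", "amount"], ["ds", "region"], "20241114",
))
```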

    4. Generate an SQL script.

      DataWorks generates the SQL statement that the data synchronization node runs, based on the synchronization settings that you configured. You can also go to the code editor of Hologres and run the data synchronization node in SQL mode.

  5. Configure task scheduling properties.

    If you want the system to periodically run a task on the node, you can click Properties in the right-side navigation pane on the configuration tab of the node to configure task scheduling properties based on your business requirements.

  6. Save the node configurations and run the node.

    1. In the top navigation bar of the configuration tab of the node, click the Save icon to save the node configurations.

    2. In the top navigation bar of the configuration tab of the node, click the Run icon to synchronize the data of the source MaxCompute table.

    If the data synchronization node is created in a workspace in standard mode, you must click Deploy in the top navigation bar to deploy the node to the production environment after you commit the node. For more information, see Deploy nodes.

  7. View the task.

    1. Click Operation Center in the upper-right corner of the configuration tab of the corresponding node to go to Operation Center in the production environment.

    2. View the scheduled task. For more information, see View and manage auto triggered tasks.

    To view more information about the task, click Operation Center in the top navigation bar of the DataStudio page. For more information, see Overview.

What to do next

After the data of the source MaxCompute table is synchronized, you can go to the Workspace Tables page in DataStudio to view the data details. For more information, see Manage tables. You can also log on to the Hologres console and query MaxCompute data by using HoloWeb. For more information, see HoloWeb.