Create a node to synchronize MaxCompute data with a few clicks

Last Updated: Dec 16, 2024

On the DataStudio page, DataWorks allows you to create a node that synchronizes MaxCompute data to Hologres with a few clicks, so that you can run accelerated queries on the data of MaxCompute tables. This topic describes how to create such a node.

Background information

Before you synchronize MaxCompute data to Hologres with a few clicks, you must create an external table in Hologres. The Hologres external table is used to synchronize data from a source MaxCompute table to a Hologres internal table. The schema of the Hologres external table is the same as that of the source MaxCompute table. You can also use SQL statements to import data from MaxCompute to Hologres. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.

The performance of importing data from MaxCompute to Hologres based on SQL statements is higher than the performance of synchronizing data based on external tables. For more information about how to create an external table to synchronize MaxCompute data, see Create a node to synchronize schemas of MaxCompute tables with a few clicks.
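The flow described above (an external table that mirrors the source MaxCompute table, followed by a copy into an internal table) can be sketched as follows. This is a minimal illustration, not the script that DataWorks generates: the helper only assembles SQL text and does not connect to Hologres. The function name, the project and table names, and the `if_table_exist 'update'` option are assumptions made for the example, and the internal table is assumed to exist already.

```python
def build_sync_sql(mc_project, mc_table, holo_schema, holo_table, columns,
                   server="odps_server"):
    """Return SQL text for a MaxCompute-to-Hologres copy (illustrative only)."""
    foreign_table = f"{holo_schema}.{mc_table}"
    column_list = ", ".join(columns)
    # Step 1: create an external (foreign) table that mirrors the source table.
    import_stmt = (
        f"IMPORT FOREIGN SCHEMA {mc_project} LIMIT TO ({mc_table}) "
        f"FROM SERVER {server} INTO {holo_schema} "
        f"OPTIONS (if_table_exist 'update');"
    )
    # Step 2: copy rows from the external table into the internal table,
    # which is assumed to exist under a different name.
    insert_stmt = (
        f"INSERT INTO {holo_schema}.{holo_table} ({column_list}) "
        f"SELECT {column_list} FROM {foreign_table};"
    )
    return [import_stmt, insert_stmt]


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the statements.
    for stmt in build_sync_sql("my_project", "orders", "public",
                               "orders_holo", ["id", "amount"]):
        print(stmt)
```

Compare the output with the SQL that your node actually generates before you reuse any of it.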

Note

The operations described in this topic are performed in the China (Shanghai) region. You can perform operations in other regions based on the instructions displayed in the DataWorks console.

Create a node to synchronize MaxCompute data to Hologres

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Create a workflow.

    If you have an existing workflow, skip this step.

    1. Move the pointer over the Create icon and select Create Workflow.

    2. In the Create Workflow dialog box, configure the Workflow Name parameter.

    3. Click Create.

  3. Create a node to synchronize MaxCompute data to Hologres.

    1. Move the pointer over the Create icon and choose Create Node > Hologres > Data Synchronization from MaxCompute.

      You can also find the desired workflow, right-click the workflow name, and then choose Create Node > Hologres > Data Synchronization from MaxCompute.

    2. In the Create Node dialog box, configure the Name, Engine Instance, Node Type, and Path parameters.

    3. Click Confirm. The configuration tab of the node appears.

  4. Configure node information.

    On the configuration tab of the node, configure the source MaxCompute table from which you want to synchronize data, the destination table in which you want to store the synchronized data, the data synchronization policy, and the SQL statement.

    1. Configure the parameters in the Settings for Source Table (MaxCompute) section.

      The parameters in this section identify the source MaxCompute table from which you want to synchronize data. You must specify the Hologres external table that maps to the source MaxCompute table. This section contains the following parameter.

      Source of External Table: The source of the Hologres external table. The external table maps to the source MaxCompute table and is used to synchronize the data of the source MaxCompute table to a Hologres internal table. Valid values:

      • Existing External Table: You can select this option if the external table that you want to use already exists. If you select this option, you must specify the schema and name of the external table.

      • Create External Table: You must select this option if no Hologres external table that maps to the source MaxCompute table exists.

        If you select this option, you must specify the server that is used by the external table, the name of the MaxCompute project to which the source MaxCompute table belongs, and the name of the source MaxCompute table.

        Note

        You can use the built-in odps_server server, which Hologres creates at the underlying layer. For more information, see postgres_fdw.
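The two options above require different inputs, which the following sketch checks. The class and its field names are hypothetical stand-ins for the fields in the console, not part of any DataWorks or Hologres API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceTableSettings:
    """Hypothetical model of the Settings for Source Table (MaxCompute) section."""
    source: str                              # "existing" or "create"
    external_schema: Optional[str] = None    # for an existing external table
    external_table: Optional[str] = None     # for an existing external table
    server: Optional[str] = None             # for a new external table
    mc_project: Optional[str] = None         # for a new external table
    mc_table: Optional[str] = None           # for a new external table

    def missing_fields(self):
        """Return the names of required fields that are still empty."""
        required = {
            "existing": ["external_schema", "external_table"],
            "create": ["server", "mc_project", "mc_table"],
        }
        if self.source not in required:
            raise ValueError(f"unknown source option: {self.source!r}")
        return [name for name in required[self.source]
                if not getattr(self, name)]


if __name__ == "__main__":
    settings = SourceTableSettings(source="create", server="odps_server",
                                   mc_project="my_project")
    print(settings.missing_fields())
```

In this example, the Create External Table option is selected but the source table name has not been filled in, so the check reports `mc_table` as missing.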

    2. Configure the parameters in the Settings for Destination Table (Hologres) section.

      The parameters that you configure in this section are used to create a Hologres internal table where you want to store the synchronized data.

      Schema: The schema to which the Hologres internal table belongs.

      Table Name: The name of the Hologres internal table. If an internal table with the specified name already exists, Hologres handles the existing table based on its type:

      • Non-partitioned table: Hologres deletes the existing internal table and creates another internal table with the same name.

      • Partitioned table: Hologres keeps the existing internal table, creates partitions in the table based on partition values, and synchronizes data to the new partitions.

        Note

        An error is reported if the schema of the created internal table is different from the schema of the existing internal table with the same name.

      Table Description: The description of the Hologres internal table.
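The name-conflict policy for the destination table can be summarized as a small decision function. This is a sketch of the behavior described above, with hypothetical action names. It does not model the schema-mismatch error and is not how Hologres implements the policy internally.

```python
def plan_destination(table_exists, partitioned, partition_value=None):
    """Return the ordered actions for the destination internal table."""
    if not partitioned:
        # Non-partitioned: an existing table is dropped, then recreated.
        actions = ["drop_table"] if table_exists else []
        return actions + ["create_table", "load_data"]
    if partition_value is None:
        raise ValueError("a partition value is required for a partitioned table")
    # Partitioned: an existing table is kept; only the partition is added.
    actions = [] if table_exists else ["create_table"]
    return actions + [f"create_partition:{partition_value}", "load_data"]


if __name__ == "__main__":
    print(plan_destination(table_exists=True, partitioned=False))
    print(plan_destination(table_exists=True, partitioned=True,
                           partition_value="20240101"))
```

The practical consequence is that rerunning the node against a non-partitioned destination replaces the whole table, whereas a partitioned destination keeps the data that is already in its other partitions.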

    3. Configure the parameters in the Synchronization Settings section.

      The parameters that you configure in this section determine the policy that is used to synchronize MaxCompute data to Hologres.

      Synchronization Field tab: Select the fields of the source MaxCompute table that you want to synchronize.

      Partition Configurations tab: Select the partitions of the source MaxCompute table that you want to synchronize.

        Note

        Hologres allows you to synchronize data only from level-1 partitions in the source MaxCompute table. If the source MaxCompute table contains multiple levels of partitions, you must specify the level-1 partition field of the source MaxCompute table for the destination table. Other partition fields in the source MaxCompute table are mapped to common fields in the destination table.

      Index Configuration tab: Configure indexes for the Hologres internal table that stores the synchronized MaxCompute data. You can query data based on the indexes. For more information about how to create an index, see CREATE TABLE.
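The partition rule in the note above (only the level-1 partition field stays a partition field, and lower-level partition fields become common fields) can be illustrated with a short sketch. The function and the field names are hypothetical.

```python
def map_partition_fields(columns, partition_fields):
    """Split source fields into the destination partition key and common fields.

    partition_fields is ordered from level 1 downward, as in the source table.
    """
    if not partition_fields:
        return {"partition_key": None, "common_fields": list(columns)}
    level1, *lower_levels = partition_fields
    # Lower-level partition fields become ordinary columns in the destination.
    return {
        "partition_key": level1,
        "common_fields": list(columns) + lower_levels,
    }


if __name__ == "__main__":
    # A source table partitioned by ds (level 1) and region (level 2).
    print(map_partition_fields(["id", "amount"], ["ds", "region"]))
```

For a source table partitioned by `ds` and then `region`, the destination table is partitioned by `ds` only, and `region` is stored as a regular column next to `id` and `amount`.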

    4. Generate an SQL script.

      Based on the synchronization configurations, DataWorks generates the SQL statement that is used to run the current data synchronization node. You can go to the code editor of Hologres and run the data synchronization node in SQL mode.

  5. Configure task scheduling properties.

    If you want the system to periodically run a task on the node, you can click Properties in the right-side navigation pane on the configuration tab of the node to configure task scheduling properties based on your business requirements.

  6. Save and run the node.

    1. In the top navigation bar of the configuration tab of the node, click the Save icon to save the node.

    2. In the top navigation bar of the configuration tab of the node, click the Run icon to run the node.

    If the data synchronization node is created in a workspace in standard mode, you must click Deploy in the top navigation bar to deploy the node to the production environment after you commit the node. For more information, see Deploy nodes.

  7. View the task.

    1. Click Operation Center in the upper-right corner of the configuration tab of the corresponding node to go to Operation Center in the production environment.

    2. View the scheduled task. For more information, see View and manage auto triggered tasks.

    To view more information about the task, click Operation Center in the top navigation bar of the DataStudio page. For more information, see Overview.

What to do next

After the data of the source MaxCompute table is synchronized, you can go to the table management page to view the data details. For more information, see Manage tables. You can also log on to the Hologres console and query MaxCompute data by using HoloWeb. For more information, see HoloWeb.