
Hologres:Use DataWorks to periodically import MaxCompute data

Last Updated:Apr 30, 2024

When you create a DataWorks scheduling task, you can specify the time and frequency of data transmission based on your business requirements. This helps ensure data integrity and accuracy during transmission and import. This topic describes how to use DataWorks to periodically import data from a MaxCompute partitioned table to a Hologres partitioned table, which combines the benefits of the two platforms and improves the efficiency and reliability of data processing.

Prerequisites

Usage notes

You must make sure that the schema feature is not enabled at the tenant level or project level in MaxCompute. For more information about schemas, see Schema-related operations.

Procedure

  1. Prepare MaxCompute data.

    1. Log on to the MaxCompute console.

    2. In the left-side navigation pane, click Data Analysis.

    3. On the Data Analysis page, click SQL Query.

    4. On the SQL Query page, enter the following SQL statement to create a partitioned table, and click Run.

      In this example, a MaxCompute partitioned table named dwd_product_movie_basic_info is created in the MaxCompute public dataset named public_data.

      -- The DDL statement used to create a MaxCompute partitioned table.
      CREATE TABLE IF NOT EXISTS public_data.dwd_product_movie_basic_info(
        movie_name STRING COMMENT 'Movie name',
        director STRING COMMENT 'Director',
        scriptwriter STRING COMMENT 'Scriptwriter',
        area STRING COMMENT 'Country/Region',
        actors STRING COMMENT 'Actors',
        `type` STRING COMMENT 'Type',
        movie_length STRING COMMENT 'Movie length',
        movie_date STRING COMMENT 'Release date',
        movie_language STRING COMMENT 'Movie language',
        imdb_url STRING COMMENT 'IMDb URL'
      ) 
      PARTITIONED BY (ds STRING) STORED AS ALIORC;
    5. On the SQL Query page, enter an SQL statement and click Run to query data in the partitioned table.

      For example, you can execute the following statement to query the data in the 20170112 partition:

      SELECT * FROM public_data.dwd_product_movie_basic_info WHERE ds = '20170112';
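      The public_data dataset used in this example already contains data for this table, so you do not need to load data yourself. If you instead create the table in your own MaxCompute project, you can load a few sample rows for testing, as in the following sketch. The project name your_project and all values are placeholders for illustration only.

      -- Hypothetical sample row for testing in your own project. Not required for the
      -- public_data dataset, which already contains data.
      INSERT INTO your_project.dwd_product_movie_basic_info PARTITION (ds = '20170112')
      VALUES ('Example Movie', 'Example Director', 'Example Scriptwriter', 'US',
              'Actor A,Actor B', 'Drama', '120', '2017-01-12', 'English',
              'https://www.imdb.com/title/tt0000000/');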
  2. Create a foreign table in Hologres.

    Create a foreign table in Hologres that maps the MaxCompute table. The fields in the foreign table must map one-to-one to the fields in the MaxCompute table, and their order and data types must be the same.

    1. Log on to the Hologres console, go to HoloWeb, and then create an SQL query task. For more information, see Connect to HoloWeb.

    2. On the Ad-hoc Query tab, select the instance to which the database belongs from the Instance drop-down list and select the database from the Database drop-down list. Enter the following statements in the SQL editor and click Run.

      In this example, the IMPORT FOREIGN SCHEMA statement is used to create a foreign table named dwd_product_movie_basic_info in Hologres.

      import foreign schema public_data limit to (dwd_product_movie_basic_info) from server odps_server into public options(if_table_exist 'update');
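      After the foreign table is created, you can optionally query it to confirm that the mapping works. The following statement is a minimal check and assumes that the odps_server foreign server is configured as shown above.

      -- Optional check: query the foreign table to confirm that the MaxCompute data is visible.
      SELECT * FROM public.dwd_product_movie_basic_info WHERE ds = '20170112' LIMIT 10;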
  3. Create an internal table in Hologres to store data.

    Create an internal table that is used to receive and store the imported data.

    1. On the SQL Editor page, click the New SQL Query icon in the left-side navigation pane.

    2. On the Ad-hoc Query tab, select the instance to which the database belongs from the Instance drop-down list and select the database from the Database drop-down list. Enter the following statements in the SQL editor and click Run.

      In this example, data is imported from a MaxCompute partitioned table to Hologres. Therefore, the internal table to be created must be a partitioned table.

      Note

      The following statements are used only in this example. You can edit the statements based on your business requirements and create indexes for the internal table to improve query efficiency.

      BEGIN;
      CREATE TABLE "public"."holo_dwd_product_movie_basic_info" (
       "movie_name" text,
       "director" text,
       "scriptwriter" text,
       "area" text,
       "actors" text,
       "type" text,
       "movie_length" text,
       "movie_date" text,
       "movie_language" text,
       "imdb_url" text,
       "ds" text
      )
      PARTITION BY LIST (ds);
      CALL SET_TABLE_PROPERTY('"public"."holo_dwd_product_movie_basic_info"', 'orientation', 'column');
      comment on column "public"."holo_dwd_product_movie_basic_info"."movie_name" is 'Movie name';
      comment on column "public"."holo_dwd_product_movie_basic_info"."director" is 'Director';
      comment on column "public"."holo_dwd_product_movie_basic_info"."scriptwriter" is 'Scriptwriter';
      comment on column "public"."holo_dwd_product_movie_basic_info"."area" is 'Country/Region';
      comment on column "public"."holo_dwd_product_movie_basic_info"."actors" is 'Actor';
      comment on column "public"."holo_dwd_product_movie_basic_info"."type" is 'Type';
      comment on column "public"."holo_dwd_product_movie_basic_info"."movie_length" is 'Movie length';
      comment on column "public"."holo_dwd_product_movie_basic_info"."movie_date" is 'Release date';
      comment on column "public"."holo_dwd_product_movie_basic_info"."movie_language" is 'Movie language';
      comment on column "public"."holo_dwd_product_movie_basic_info"."imdb_url" is 'IMDb URL';
      COMMIT;
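      The preceding note mentions that you can create indexes for the internal table to improve query efficiency. The following sketch shows optional properties that could be added before the COMMIT statement in the transaction above. The column choices, such as movie_name as the distribution key, are assumptions made for illustration and should be adapted to your own query patterns.

      -- Optional tuning sketch: add these calls to the CREATE TABLE transaction above,
      -- before COMMIT, because some of these properties can be set only when the table is created.
      -- The column choices are illustrative assumptions only.
      CALL SET_TABLE_PROPERTY('"public"."holo_dwd_product_movie_basic_info"', 'distribution_key', 'movie_name');
      CALL SET_TABLE_PROPERTY('"public"."holo_dwd_product_movie_basic_info"', 'clustering_key', 'movie_date');
      CALL SET_TABLE_PROPERTY('"public"."holo_dwd_product_movie_basic_info"', 'bitmap_columns', 'area,movie_language');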
  4. Create a child partitioned table to perform data development.

    In this step, a Hologres SQL node is created to schedule data import.

    1. Log on to the DataWorks console, go to the DataStudio page, and then create a Hologres SQL node. For more information, see Create a Hologres SQL node.

    2. Enter one of the following sets of sample statements in the SQL editor:

      Hologres does not allow you to directly import data from a MaxCompute partitioned table into a parent partitioned table in Hologres. Therefore, you must create a child partitioned table in Hologres that has the same partition key value as the corresponding partition in the MaxCompute table, and then import data from that partition into the child table. The partition key value is specified by the ${bizdate} parameter, whose value is automatically assigned by the DataWorks scheduling system during periodic scheduling. For more information about scheduling parameters, see Supported formats of scheduling parameters.

      Note

      The partition key value of the MaxCompute partition from which data is imported must be the same as that of the Hologres child partitioned table. Otherwise, an error is returned. In the following examples, ds is used as the partition key.

      Different scenarios require different import methods. Select one of the following scenarios based on your business requirements. A sketch that shows how to manually test the statements with a fixed partition value is provided after the scenarios.

      • Scenario 1: Import incremental data from a partition to a Hologres temporary table

        -- Create a temporary child partitioned table.
        BEGIN;
        CREATE TABLE IF NOT EXISTS "public".tmp_holo_dwd_product_movie_basic_info_${bizdate}  (
         "movie_name" text,
         "director" text,
         "scriptwriter" text,
         "area" text,
         "actors" text,
         "type" text,
         "movie_length" text,
         "movie_date" text,
         "movie_language" text,
         "imdb_url" text,
         "ds" text
        );
        COMMIT;
        
        -- Update the data in the foreign table that maps the MaxCompute table.
        import foreign schema public_data limit to (dwd_product_movie_basic_info) from server odps_server into public options(if_table_exist 'update');
        
        -- Wait 30 seconds before you import data to Hologres. Otherwise, the import operation may fail because of the latency caused by the update of Hologres metadata.
        select pg_sleep(30); 
        
        -- Import data from the partition to the temporary child partitioned table.
        INSERT INTO "public".tmp_holo_dwd_product_movie_basic_info_${bizdate} 
        SELECT 
            "movie_name",
            "director",
            "scriptwriter",
            "area",
            "actors",
            "type",
            "movie_length",
            "movie_date",
            "movie_language",
            "imdb_url",
            "ds"
        FROM "public".dwd_product_movie_basic_info
        WHERE ds='${bizdate}';
        
        -- Import incremental data from the partition.
        BEGIN;
        
        ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};
        
        -- Bind the temporary child partitioned table to the parent partitioned table.
        ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES in ('${bizdate}');
        
        COMMIT;
                                            
      • Scenario 2: Update the existing data imported from a partition

        -- Create a temporary child partitioned table.
        BEGIN;
        CREATE TABLE IF NOT EXISTS "public".tmp_holo_dwd_product_movie_basic_info_${bizdate}  (
         "movie_name" text,
         "director" text,
         "scriptwriter" text,
         "area" text,
         "actors" text,
         "type" text,
         "movie_length" text,
         "movie_date" text,
         "movie_language" text,
         "imdb_url" text,
         "ds" text
        );
        COMMIT;
        
        -- Update the data in the foreign table that maps the MaxCompute table.
        import foreign schema public_data limit to (dwd_product_movie_basic_info) from server odps_server into public options(if_table_exist 'update');
        
        -- Wait 30 seconds before you import data to Hologres. Otherwise, the import operation may fail because of the latency caused by the update of Hologres metadata.
        select pg_sleep(30); 
        
        -- Import data from the partition to the temporary child partitioned table.
        INSERT INTO "public".tmp_holo_dwd_product_movie_basic_info_${bizdate} 
        SELECT 
            "movie_name",
            "director",
            "scriptwriter",
            "area",
            "actors",
            "type",
            "movie_length",
            "movie_date",
            "movie_language",
            "imdb_url",
            "ds"
        FROM "public".dwd_product_movie_basic_info
        WHERE ds='${bizdate}';
        
        -- Update the existing data imported from the partition.
        BEGIN;
        
        ALTER TABLE IF EXISTS holo_dwd_product_movie_basic_info DETACH PARTITION holo_dwd_product_movie_basic_info_${bizdate};
        
        DROP TABLE IF EXISTS holo_dwd_product_movie_basic_info_${bizdate};
        
        ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};
        
        -- Bind the temporary child partitioned table to the parent partitioned table.
        ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES in ('${bizdate}');
        
        COMMIT;
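      Before you configure scheduling, you can manually verify the logic of either scenario by replacing ${bizdate} with a literal value, such as 20170112, and running the full script in HoloWeb. The following check is a minimal sketch of how to confirm the result of such a test run.

      -- Manual test sketch: after running a scenario script with ${bizdate} replaced by 20170112,
      -- confirm that the child partitioned table exists and contains data.
      SELECT count(*) FROM holo_dwd_product_movie_basic_info_20170112;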
  5. Configure the node properties.

    On the configuration tab of the Hologres SQL node, click Properties to configure the properties of the node.

    Note

    In this example, set the parameters listed below to the suggested values and retain the default values of the other parameters.

    • General

      Parameter: bizdate=${yyyymmdd}

    • Schedule

      Instance Generation Mode: Immediately After Deployment

      Rerun: Allow upon Failure Only

      Scheduling time to run the instance: 00:05

    • Dependencies

      Specify that the Hologres SQL node depends on the root node, or select an existing parent node based on your business requirements. Set Automatic Parsing From Code Before Committing to Yes and click Parse Input and Output from Code to automatically parse the root node. Then, set Automatic Parsing From Code Before Committing back to No.

  6. Save and commit the node for scheduling.

    1. On the configuration tab of the Hologres SQL node, click the Save icon in the toolbar.

    2. Click the Submit icon in the toolbar to commit the node.

    3. In the Submit dialog box, enter your comments in the Change description field.

    4. Click OK.

  7. Deploy the node in Operation Center.

    1. On the configuration tab of the Hologres SQL node, click Operation Center in the toolbar.

    2. On the Operation Center page, choose Cycle Task Maintenance > Cycle Task in the left-side navigation pane.

    3. On the Cycle Task page, right-click the node and choose Run > Current Node Retroactively.

    4. In the left-side navigation pane, choose Cycle Task Maintenance > Patch data to view the instance that is running and its status.

  8. Query data.

    After the instance is run, DataWorks automatically creates a child partitioned table in Hologres that maps the partition in the MaxCompute table.

    1. Go to the DataStudio page and create a Hologres SQL node.

    2. On the node configuration tab, enter the following statements in sequence in the SQL editor to query data.

      • Query data in the child partitioned table

        select * from holo_dwd_product_movie_basic_info_20170112;
      • Query data in the parent partitioned table

        select count (*) from holo_dwd_product_movie_basic_info;
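      • List the child partitioned tables attached to the parent table

        The following statement is a sketch that assumes the PostgreSQL pg_inherits system catalog is accessible in your Hologres instance.

        -- Sketch: list the child partitioned tables that are attached to the parent table.
        SELECT inhrelid::regclass AS child_table
        FROM pg_inherits
        WHERE inhparent = 'public.holo_dwd_product_movie_basic_info'::regclass;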