Use DataWorks to import MaxCompute data to Hologres - Hologres

When you create a DataWorks scheduling task, you can specify the time and frequency for data transmission based on your business requirements. This helps ensure the data integrity and accuracy during data transmission and import. This topic describes how to use DataWorks to import data from a MaxCompute partitioned table to a Hologres partitioned table. This helps combine the benefits of the two platforms and improve the efficiency and reliability of data processing.

Prerequisites

A Hologres instance is purchased. For more information, see Purchase a Hologres instance.
MaxCompute is activated. A MaxCompute project is created. For more information about how to activate MaxCompute, see Activate MaxCompute and DataWorks.
DataWorks is activated. A DataWorks workspace is created. For more information about how to create a DataWorks workspace, see Create a workspace.

Precautions

You must make sure that the schema feature is not enabled at the tenant level or project level in MaxCompute. For more information about schemas, see Schema-related operations.

Procedure

Prepare MaxCompute data.

Log on to the MaxCompute console.
In the left-side navigation pane, choose DataWorks > Data Analytics.
In the left-side navigation pane of the SQL Query page, click SQL Query.

On the SQL Query page, enter the following SQL statement to create a partitioned table and click Run.

In this example, a MaxCompute partitioned table named dwd_product_movie_basic_info is created in the MaxCompute public dataset named public_data.

-- The DDL statement used to create a MaxCompute partitioned table.
CREATE TABLE IF NOT EXISTS public_data.dwd_product_movie_basic_info(
  movie_name STRING COMMENT 'Movie name',
  director STRING COMMENT 'Director',
  scriptwriter STRING COMMENT 'Scriptwriter',
  area STRING COMMENT 'Country/Region',
  actors STRING COMMENT 'Actors',
  `type` STRING COMMENT 'Type',
  movie_length STRING COMMENT 'Movie length',
  movie_date STRING COMMENT 'Release date',
  movie_language STRING COMMENT 'Movie language',
  imdb_url STRING COMMENT 'IMDb URL'
) 
PARTITIONED BY (ds STRING) STORED AS ALIORC;

On the SQL Query page, enter the required SQL statement and click the Run icon to query the imported data.
For example, you can execute the following statement to query the data imported to the 20170112 partition:
```
SELECT * FROM public_data.dwd_product_movie_basic_info WHERE ds = '20170112';
```

Create a foreign table in Hologres.
Create a foreign table in Hologres to map the MaxCompute table. The sequence and data types of fields in the foreign table must be the same as those in the MaxCompute table. Fields in the foreign table must have a one-to-one mapping with fields in the MaxCompute table.
1. Log on to the Hologres console, go to HoloWeb, and then create an SQL query task. For more information, see Connect to HoloWeb.
2. On the Ad-hoc Query tab, select the instance to which the database belongs from the Instance drop-down list and select the database from the Database drop-down list. Enter the following statements in the SQL editor and click Run.
  In this example, the IMPORT FOREIGN SCHEMA statement is used to create a foreign table named dwd_product_movie_basic_info in Hologres.
```
import foreign schema public_data limit to (dwd_product_movie_basic_info) from server odps_server into public options(if_table_exist 'update');
```

Create an internal table in Hologres to store data.

Create an internal table that is used to receive and store the imported data.

On the SQL Editor page, click the New SQL Query icon in the left-side navigation pane.

On the Ad-hoc Query tab, select the instance to which the database belongs from the Instance drop-down list and select the database from the Database drop-down list. Enter the following statements in the SQL editor and click Run.

In this example, data is imported from a MaxCompute partitioned table to Hologres. Therefore, the internal table to be created must be a partitioned table.

Note

The following statements are used only in this example. You can edit the statements based on your business requirements and create indexes for the internal table to improve query efficiency.

BEGIN;
CREATE TABLE "public"."holo_dwd_product_movie_basic_info" (
 "movie_name" text,
 "director" text,
 "scriptwriter" text,
 "area" text,
 "actors" text,
 "type" text,
 "movie_length" text,
 "movie_date" text,
 "movie_language" text,
 "imdb_url" text,
 "ds" text
)
PARTITION BY LIST (ds);
CALL SET_TABLE_PROPERTY('"public"."holo_dwd_product_movie_basic_info"', 'orientation', 'column');
comment on column "public"."holo_dwd_product_movie_basic_info"."movie_name" is 'Movie name';
comment on column "public"."holo_dwd_product_movie_basic_info"."director" is 'Director';
comment on column "public"."holo_dwd_product_movie_basic_info"."scriptwriter" is 'Scriptwriter';
comment on column "public"."holo_dwd_product_movie_basic_info"."area" is 'Country/Region';
comment on column "public"."holo_dwd_product_movie_basic_info"."actors" is 'Actor';
comment on column "public"."holo_dwd_product_movie_basic_info"."type" is 'Type';
comment on column "public"."holo_dwd_product_movie_basic_info"."movie_length" is 'Movie length';
comment on column "public"."holo_dwd_product_movie_basic_info"."movie_date" is 'Release date';
comment on column "public"."holo_dwd_product_movie_basic_info"."movie_language" is 'Movie language';
comment on column "public"."holo_dwd_product_movie_basic_info"."imdb_url" is 'IMDb URL';
COMMIT;

Create a child partitioned table to perform data development.

In this step, a Hologres SQL node is created to schedule data import.

Log on to the DataWorks console, go to the DataStudio page, and then create a Hologres SQL node. For more information, see Create a Hologres SQL node.

Enter one of the following sets of sample statements in the SQL editor.

Hologres does not allow you to directly import data from a MaxCompute partitioned table to a parent partitioned table in Hologres. Therefore, you must create a child partitioned table in Hologres that has the same partition key value with a specific partition in the MaxCompute table. This way, you can import data from the partition to the child partitioned table. The partition key value is specified by the ${bizdate} parameter. The value of this parameter is automatically assigned by the scheduling system of DataWorks in periodic scheduling. For more information about scheduling parameters, see Supported formats of scheduling parameters.

Note

The key value of the partition from which data is imported must be the same as the partition key value of the child partitioned table. Otherwise, an error message appears. In the following examples, the ds partition key is used.

You can use different methods to import data in different scenarios. The following scenarios are for your reference. You can select one based on your business requirements.

Scenario 1: Import incremental data from a partition to a Hologres temporary table

-- Create a temporary child partitioned table.
BEGIN;
CREATE TABLE IF NOT EXISTS "public".tmp_holo_dwd_product_movie_basic_info_${bizdate}  (
 "movie_name" text,
 "director" text,
 "scriptwriter" text,
 "area" text,
 "actors" text,
 "type" text,
 "movie_length" text,
 "movie_date" text,
 "movie_language" text,
 "imdb_url" text,
 "ds" text
);
COMMIT;

-- Update the data in the foreign table that maps the MaxCompute table.
import foreign schema public_data limit to (dwd_product_movie_basic_info) from server odps_server into public options(if_table_exist 'update');

-- Wait 30 seconds before you import data to Hologres. Otherwise, the import operation may fail because of the latency caused by the update of Hologres metadata.
select pg_sleep(30); 

-- Import data from the partition to the temporary child partitioned table.
INSERT INTO "public".tmp_holo_dwd_product_movie_basic_info_${bizdate} 
SELECT 
    "movie_name",
    "director",
    "scriptwriter",
    "area",
    "actors",
    "type",
    "movie_length",
    "movie_date",
    "movie_language",
    "imdb_url",
    "ds"
FROM "public".dwd_product_movie_basic_info
WHERE ds='${bizdate}';

-- Import incremental data from the partition.
BEGIN;

ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};

-- Bind the temporary child partitioned table to the parent partitioned table.
ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES in ('${bizdate}');

COMMIT;

Scenario 2: Update the existing data imported from a partition

-- Create a temporary child partitioned table.
BEGIN;
CREATE TABLE IF NOT EXISTS "public".tmp_holo_dwd_product_movie_basic_info_${bizdate}  (
 "movie_name" text,
 "director" text,
 "scriptwriter" text,
 "area" text,
 "actors" text,
 "type" text,
 "movie_length" text,
 "movie_date" text,
 "movie_language" text,
 "imdb_url" text,
 "ds" text
);
COMMIT;

-- Update the data in the foreign table that maps the MaxCompute table.
import foreign schema public_data limit to (dwd_product_movie_basic_info) from server odps_server into public options(if_table_exist 'update');

-- Wait 30 seconds before you import data to Hologres. Otherwise, the import operation may fail because of the latency caused by the update of Hologres metadata.
select pg_sleep(30); 

-- Import data from the partition to the temporary child partitioned table.
INSERT INTO "public".tmp_holo_dwd_product_movie_basic_info_${bizdate} 
SELECT 
    "movie_name",
    "director",
    "scriptwriter",
    "area",
    "actors",
    "type",
    "movie_length",
    "movie_date",
    "movie_language",
    "imdb_url",
    "ds"
FROM "public".dwd_product_movie_basic_info
WHERE ds='${bizdate}';

Update the existing data imported from the partition
BEGIN;

ALTER TABLE IF EXISTS holo_dwd_product_movie_basic_info DETACH PARTITION holo_dwd_product_movie_basic_info_${bizdate};

DROP TABLE IF EXISTS holo_dwd_product_movie_basic_info_${bizdate};

ALTER TABLE tmp_holo_dwd_product_movie_basic_info_${bizdate} RENAME TO holo_dwd_product_movie_basic_info_${bizdate};

-- Bind the temporary child partitioned table to the parent partitioned table.
ALTER TABLE holo_dwd_product_movie_basic_info ATTACH PARTITION holo_dwd_product_movie_basic_info_${bizdate} FOR VALUES in ('${bizdate}');

COMMIT;

Configure the node properties.
On the configuration tab of the Hologres SQL node, click Properties in the right-side navigation pane to configure the properties of the node.
Note
In this example, configure the parameters in the following tables based on the suggested values and retain the default values of other parameters.
- General
  Parameter
  Value
  Parameter
  bizdate=${yyyymmdd}
- Schedule
  Parameter
  Value
  Instance Generation Mode
  Immediately After Deployment
  Rerun
  Allow upon Failure Only
  Scheduled time
  00:05
- Dependencies
  Specify that the Hologres SQL node depends on the root node. You can also select an existing parent node based on your business requirements. Select Yes for Automatic Parsing From Code Before Committing and click Parse Input and Output from Code. The root node is automatically parsed. Then, select No for Automatic Parsing From Code Before Committing.
Save and commit the node for scheduling.
1. On the configuration tab of the Hologres SQL node, click the icon in the top toolbar to save the node.
2. Click the icon in the top toolbar to commit the node.
3. In the Submit dialog box, configure the Change description parameter.
4. Click Confirm.
Deploy the node in Operation Center.
1. On the configuration tab of the Hologres SQL node, click Operation Center in the upper-right corner.
2. On the Operation Center page, choose Auto Triggered Node O&M > Auto Triggered Nodes in the left-side navigation pane.
3. On the page that appears, open the DAG of the node, right-click the node name, and then choose Run > Backfill Data for Current Node.
4. In the left-side navigation pane, choose O&M Assistant > Data Backfill to view the instance that is running and its status.
Query data.
After the instance is run, DataWorks automatically creates a child partitioned table in Hologres that maps the partition in the MaxCompute table.
1. Go to the DataStudio page and create a Hologres SQL node.
2. On the node configuration tab, enter the following statements in sequence in the SQL editor to query data.
  - Query data in the child partitioned table
```
select * from holo_dwd_product_movie_basic_info_20170112;
```
  - Query data in the parent partitioned table
```
select count (*) from holo_dwd_product_movie_basic_info;
```

Parameter	Value
Instance Generation Mode	Immediately After Deployment
Rerun	Allow upon Failure Only
Scheduled time	00:05

Parameter	Value
Parameter	bizdate=${yyyymmdd}