
AnalyticDB:Migrate data from Doris to AnalyticDB for PostgreSQL

Last Updated:Sep 29, 2024

This topic describes how to migrate data from Doris to AnalyticDB for PostgreSQL.

Prerequisites

  • An AnalyticDB for PostgreSQL instance is created.
  • An Object Storage Service (OSS) bucket is created. To use an internal endpoint, the bucket must reside in the same region as the AnalyticDB for PostgreSQL instance.

Procedure

Step 1: Create a destination table to store data

Create a destination table in the AnalyticDB for PostgreSQL instance to store Doris data. The destination table must have the same schema as the source table. For information about the syntax, see the "Table creation statements" section of this topic.

Step 2: Import data from Doris to OSS

Doris can export data to object storage services over the S3 protocol, and OSS is compatible with the S3 protocol, so you can export the required data directly to OSS. The export capabilities of Doris vary by version.

  • In Doris 2.0 and later, you can execute the EXPORT TABLE statement to export data in the CSV, TEXT, ORC, or Parquet format. The Parquet export format of Doris is incompatible with the mainstream Parquet format. Data that contains special characters such as line feeds cannot be exported in the CSV or TEXT format. We recommend that you export data in the ORC format because the ORC format offers a faster export speed.

  • In Doris 1.2, you can execute the EXPORT TABLE statement to export data only in the CSV format. We recommend that you execute the SELECT INTO OUTFILE statement to export data in the ORC format. This export method offers a slightly slower export speed than the EXPORT TABLE statement but allows you to use the WHERE clause to filter data.

Example

Execute the following statements to export data from Doris 2.0 and Doris 1.2 to OSS:

-- Doris 2.0
EXPORT TABLE s3_test TO "s3://bucket/dir/"
PROPERTIES (
    "format" = "orc"
)
WITH s3 (
    "AWS_ENDPOINT" = "oss-cn-shanghai-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "LTA****",
    "AWS_SECRET_KEY" = "lQEI1TSJIY0******",
    "AWS_REGION" = "shanghai"
);
-- Doris 1.2 (the c column is of the DATETIME type)
SELECT a, b, CAST(c AS string) AS c FROM s3_test INTO OUTFILE "s3://bucket/dir/"
FORMAT AS orc
PROPERTIES
(
    "AWS_ENDPOINT" = "oss-cn-shanghai-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "LTA****",
    "AWS_SECRET_KEY" = "lQEI1TSJIY0******",
    "AWS_REGION" = "shanghai"
);
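
Because SELECT INTO OUTFILE accepts a WHERE clause, you can also export only a subset of rows. The following is a minimal sketch based on the preceding Doris 1.2 example; the filter condition is illustrative:

```sql
-- Doris 1.2: export only the rows whose c value is on or after a given date.
SELECT a, b, CAST(c AS string) AS c FROM s3_test
WHERE c >= '2024-01-01 00:00:00'
INTO OUTFILE "s3://bucket/dir/"
FORMAT AS orc
PROPERTIES
(
    "AWS_ENDPOINT" = "oss-cn-shanghai-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "LTA****",
    "AWS_SECRET_KEY" = "lQEI1TSJIY0******",
    "AWS_REGION" = "shanghai"
);
```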
Note

If the source table contains columns of the DATETIME type, you can use only the SELECT INTO OUTFILE statement. This is because DATETIME data exported from Doris is incompatible with mainstream products and must first be cast to the STRING type.

Step 3: Import data from OSS to AnalyticDB for PostgreSQL

You can use the COPY statement or an OSS foreign table to import data from OSS to AnalyticDB for PostgreSQL.

Example

Execute the following COPY statement to import OSS data:

COPY test1 FROM 'oss://bucket/dir/'
ACCESS_KEY_ID 'LTAI5t********'
SECRET_ACCESS_KEY 'lQEI1T*******'
FORMAT AS orc
ENDPOINT 'oss-*****-internal.aliyuncs.com'
FDW 'oss_fdw';
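
Alternatively, you can load the data through an OSS foreign table. The following is a minimal sketch, assuming that the oss_fdw extension is enabled on the instance; the server, user mapping, and foreign table option names shown here are illustrative and may differ across instance versions, so check the oss_fdw documentation of your instance for the exact options:

```sql
-- Sketch: expose the exported ORC files in OSS as a foreign table,
-- then load them into the destination table with INSERT ... SELECT.
CREATE SERVER oss_server FOREIGN DATA WRAPPER oss_fdw
OPTIONS (endpoint 'oss-cn-shanghai-internal.aliyuncs.com', bucket 'bucket');

CREATE USER MAPPING FOR CURRENT_USER SERVER oss_server
OPTIONS (id 'LTAI5t********', key 'lQEI1T*******');

-- The column list must match the destination table test1;
-- the columns below are placeholders.
CREATE FOREIGN TABLE test1_oss (
    a INT,
    b INT,
    c TEXT
)
SERVER oss_server
OPTIONS (dir 'dir/', format 'orc');

INSERT INTO test1 SELECT * FROM test1_oss;
```

After the import, comparing the results of SELECT COUNT(*) on the source table in Doris and on the destination table is a quick sanity check.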

Syntax conversion

Data types

| Doris | AnalyticDB for PostgreSQL | Description |
| ----- | ------------------------- | ----------- |
| BOOLEAN | BOOLEAN | None. |
| TINYINT | SMALLINT | The TINYINT type is not supported in AnalyticDB for PostgreSQL. |
| SMALLINT | SMALLINT | None. |
| INT | INT | None. |
| BIGINT | BIGINT | None. |
| LARGEINT | DECIMAL | None. |
| FLOAT | FLOAT | None. |
| DOUBLE | DOUBLE | None. |
| DECIMAL | DECIMAL | None. |
| DATE | DATE | None. |
| DATETIME | TIMESTAMP or TIMESTAMPTZ | None. |
| CHAR | CHAR | None. |
| VARCHAR | VARCHAR | None. |
| STRING | TEXT | None. |
| HyperLogLog (HLL) | / | HLL provides an approximate count of distinct elements. For large amounts of data, HLL offers higher performance than COUNT(DISTINCT), with an error rate of approximately 1% and occasionally up to 2%. An HLL column cannot be used as a key column. The HLL_UNION aggregation type is used when a Doris table of the aggregate key model is created. You do not need to specify a length or default value for HLL columns; the length is determined by the system based on the degree of data aggregation. HLL columns can be queried or used only by the hll_union_agg(), hll_raw_agg(), hll_cardinality(), and hll_hash() functions. |
| BITMAP | / | BITMAP columns can be used in Doris tables of the aggregate or unique key model. |
| QUANTILE_STATE | / | None. |
| ARRAY | [ ] | None. |
| MAP | A custom composite type | None. |
| STRUCT | A custom composite type | None. |
| JSON | JSON | None. |
| AGG_STATE | / | None. |
| VARIANT | A custom composite type | None. |

Table creation statements

You can create tables of the following models:

Model 1: Detail model

In the detail model, no primary key or aggregate key is specified. The DUPLICATE KEY clause in the table creation statement specifies the columns by which the underlying data is sorted. A table of the detail model corresponds to an append-optimized column-oriented (AOCO) or Beam table in AnalyticDB for PostgreSQL. You can use the ORDER BY clause to specify the sort key and enable the auto-merge feature to sort data on a regular basis.

Example
-- Doris
CREATE TABLE IF NOT EXISTS example_tbl_by_default
(
    `timestamp` DATETIME NOT NULL COMMENT "The time when the log was generated",
    `type` INT NOT NULL COMMENT "The type of the log",
    `error_code` INT COMMENT "The error code",
    `error_msg` VARCHAR(1024) COMMENT "The error message",
    `op_id` BIGINT COMMENT "The owner ID",
    `op_time` DATETIME COMMENT "The time when the error was resolved"
)
DUPLICATE KEY (`timestamp`,`type`,`error_code`)
DISTRIBUTED BY HASH(`type`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

-- AnalyticDB for PostgreSQL
CREATE TABLE IF NOT EXISTS example_tbl_by_default
(
    "timestamp" TIMESTAMP NOT NULL ,
    "type" INT NOT NULL ,
    error_code INT ,
    error_msg VARCHAR(1024),
    op_id BIGINT,
    op_time TIMESTAMP
)
WITH(appendonly = true, orientation = column)
DISTRIBUTED BY("type")
ORDER BY("timestamp","type",error_code);

COMMENT ON COLUMN example_tbl_by_default.timestamp IS 'The time when the log was generated';

Model 2: Unique key model

In the unique key model, the UNIQUE KEY clause specifies a unique constraint that ensures the uniqueness of the key columns. A table of the unique key model corresponds to a heap table in AnalyticDB for PostgreSQL. You can use the PRIMARY KEY clause to specify the unique key.

Example
-- Doris
CREATE TABLE IF NOT EXISTS example_tbl_unique
(
    `user_id` LARGEINT NOT NULL COMMENT "The user ID",
    `username` VARCHAR(50) NOT NULL COMMENT "The nickname of the user",
    `city` VARCHAR(20) COMMENT "The city in which the user resides",
    `age` SMALLINT COMMENT "The age of the user",
    `sex` TINYINT COMMENT "The gender of the user",
    `phone` LARGEINT COMMENT "The phone number of the user",
    `address` VARCHAR(500) COMMENT "The address of the user",
    `register_time` DATETIME COMMENT "The time when the user registered"
)
UNIQUE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

-- AnalyticDB for PostgreSQL
CREATE TABLE IF NOT EXISTS example_tbl_unique
(
    user_id BIGINT NOT NULL,
    username VARCHAR(50) NOT NULL,
    city VARCHAR(20),
    age SMALLINT,
    sex SMALLINT,
    phone BIGINT,
    address VARCHAR(500),
    register_time TIMESTAMP,
    PRIMARY KEY (user_id, username)
)
DISTRIBUTED BY (user_id);

COMMENT ON COLUMN example_tbl_unique.user_id IS 'The user ID';

Model 3: Aggregate key model

When you import data into a table of the aggregate key model, rows that have the same values in the aggregate key columns are merged into one row based on the specified aggregation types. A table of the aggregate key model corresponds to a heap table in AnalyticDB for PostgreSQL. You can create a unique constraint on the aggregate key columns and use the INSERT ON CONFLICT (UPSERT) statement to aggregate data at write time. For more information, see Use INSERT ON CONFLICT to overwrite data.

Example
-- Doris
CREATE TABLE IF NOT EXISTS example_tbl_agg1
(
    `user_id` LARGEINT NOT NULL COMMENT "The user ID",
    `date` DATE NOT NULL COMMENT "The date on which data is imported to the table",
    `city` VARCHAR(20) COMMENT "The city in which the user resides",
    `age` SMALLINT COMMENT "The age of the user",
    `sex` TINYINT COMMENT "The gender of the user",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "The most recent time when the user visited",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "The amount of money that the user spends",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "The maximum dwell time of the user",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "The minimum dwell time of the user"
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

-- AnalyticDB for PostgreSQL does not support automatic pre-aggregation.
CREATE TABLE IF NOT EXISTS example_tbl_agg1
(
    user_id BIGINT NOT NULL,
    "date" DATE NOT NULL,
    city VARCHAR(20),
    age SMALLINT,
    sex SMALLINT,
    last_visit_date TIMESTAMP DEFAULT '1970-01-01 00:00:00',
    cost BIGINT DEFAULT 0,
    max_dwell_time INT DEFAULT 0,
    min_dwell_time INT DEFAULT 99999,
    UNIQUE (user_id, "date", city, age, sex)
)
DISTRIBUTED BY(user_id);

INSERT INTO example_tbl_agg1
VALUES (10000, '2024-08-22', 'beijing', 18, 0, '2024-08-22 12:00:00', 20, 1000, 1000)
ON CONFLICT (user_id, "date", city, age, sex)
DO UPDATE SET
    last_visit_date = excluded.last_visit_date,
    cost = example_tbl_agg1.cost + excluded.cost,
    max_dwell_time = GREATEST(example_tbl_agg1.max_dwell_time, excluded.max_dwell_time),
    min_dwell_time = LEAST(example_tbl_agg1.min_dwell_time, excluded.min_dwell_time);

Partitioning and bucketing

In Doris, you can use the PARTITION BY clause to partition data and the DISTRIBUTED BY clause to further divide data into buckets. You can use the BUCKETS parameter to specify the number of buckets. In AnalyticDB for PostgreSQL, you can use the PARTITION BY clause to specify the partition key and the DISTRIBUTED BY clause to specify the distribution key.

Example
-- Doris
CREATE TABLE IF NOT EXISTS example_range_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "The user ID",
    `date` DATE NOT NULL COMMENT "The date on which data is imported to the table",
    `timestamp` DATETIME NOT NULL COMMENT "The time when data is imported to the table",
    `city` VARCHAR(20) COMMENT "The city in which the user resides",
    `age` SMALLINT COMMENT "The age of the user",
    `sex` TINYINT COMMENT "The gender of the user",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "The most recent time when the user visited",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "The amount of money that the user spends",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "The maximum dwell time of the user",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "The minimum dwell time of the user"
)
ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
    PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
    PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
    PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
    PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
    "replication_num" = "1"
);

-- AnalyticDB for PostgreSQL
CREATE TABLE IF NOT EXISTS example_range_tbl
(
    user_id BIGINT NOT NULL,
    "date" DATE NOT NULL,
    city VARCHAR(20),
    age SMALLINT,
    sex SMALLINT,
    visit_date TIMESTAMP DEFAULT '1970-01-01 00:00:00',
    a_cost BIGINT DEFAULT 0,
    dwell_time INT DEFAULT 0
)
PARTITION BY RANGE("date")
(
    PARTITION p201701 START ('2017-01-01') INCLUSIVE END ('2017-02-01') EXCLUSIVE,
    PARTITION p201702 START ('2017-02-01') INCLUSIVE END ('2017-03-01') EXCLUSIVE,
    PARTITION p201703 START ('2017-03-01') INCLUSIVE END ('2017-04-01') EXCLUSIVE,
    PARTITION p2018 START ('2018-01-01') INCLUSIVE END ('2019-01-01') EXCLUSIVE
)
DISTRIBUTED BY (user_id);