MaxCompute foreign data wrapper (FDW) of AnalyticDB for PostgreSQL is developed based on the PostgreSQL FDW framework to access foreign data that is stored in MaxCompute. MaxCompute FDW allows you to create foreign tables in AnalyticDB for PostgreSQL and use these tables to import data from MaxCompute to AnalyticDB for PostgreSQL for query and analysis.
The MaxCompute FDW module allows you to synchronize data between AnalyticDB for PostgreSQL and MaxCompute. You can create the following types of MaxCompute foreign tables by using MaxCompute FDW:
Non-partitioned foreign tables, which are mapped to non-partitioned MaxCompute tables.
Lowest-level partitioned foreign tables, which are mapped to the lowest-level partitions of partitioned MaxCompute tables.
Partitioned foreign tables, which are mapped to partitioned MaxCompute tables.
Use MaxCompute FDW
Create the MaxCompute FDW extension in an AnalyticDB for PostgreSQL database.
CREATE EXTENSION odps_fdw ;
Grant permissions on MaxCompute FDW to all database accounts.
GRANT USAGE ON FOREIGN DATA WRAPPER odps_fdw TO PUBLIC;
For a new AnalyticDB for PostgreSQL instance, the MaxCompute FDW extension is automatically created. In this case, you can skip the preceding steps.
For an existing AnalyticDB for PostgreSQL instance, you can use the privileged database account to connect to a specific database, run the preceding commands to create the MaxCompute FDW extension, and then grant permissions on the extension to all database accounts.
Use a MaxCompute foreign table
Before you use a MaxCompute foreign table, you must perform the following operations:
Create a MaxCompute server to specify the endpoint for accessing MaxCompute.
Create a MaxCompute user mapping to specify the account that can access the created MaxCompute server.
Create a MaxCompute foreign table to specify the MaxCompute table that you want to access.
1. Create a MaxCompute server
1.1 Sample code
CREATE SERVER odps_serv -- The name of the MaxCompute server.
FOREIGN DATA WRAPPER odps_fdw
OPTIONS (
tunnel_endpoint '<odps tunnel endpoint>' -- ODPS Tunnel Endpoint
);
1.2 Options
When you create a MaxCompute server in AnalyticDB for PostgreSQL, you need to specify only one of the tunnel_endpoint and odps_endpoint options. The following table describes the options.
Option | Required | Description |
tunnel_endpoint | No. We recommend that you specify this option. | The endpoint of the MaxCompute Tunnel service. |
odps_endpoint | No | The endpoint of the MaxCompute service. |
When you create a MaxCompute server, you can specify one or both of the options. If you specify both options, the specified MaxCompute Tunnel endpoint takes precedence. If you do not specify the MaxCompute Tunnel endpoint, the MaxCompute endpoint is used to route access requests to the corresponding MaxCompute Tunnel endpoint.
We recommend that you use the MaxCompute Tunnel endpoint in the Alibaba Cloud classic network or a virtual private cloud (VPC). If you use a VPC endpoint, make sure that the AnalyticDB for PostgreSQL instance and the MaxCompute project reside in the same zone.
When you use the Tunnel endpoint to access MaxCompute data over the Internet, you are charged USD 0.1166 for each GB of data.
For more information about MaxCompute endpoints, see Endpoints.
2. Create a user mapping to a MaxCompute server
2.1 Sample code
CREATE USER MAPPING FOR { username | USER | CURRENT_USER | PUBLIC }
SERVER odps_serv -- The name of the MaxCompute server.
OPTIONS (
id '<odps access id>', -- ODPS Account ID
key '<odps access key>' -- ODPS Account Key
);
username: the name of the existing account that is mapped to the MaxCompute server.
CURRENT_USER or USER: the name of the current user.
PUBLIC: all database accounts.
2.2 Options
When you create an AnalyticDB for PostgreSQL database account that is used to access the MaxCompute server, specify the type, AccessKey ID, and AccessKey secret of the account.
Option | Required | Description |
| Yes | The AccessKey ID of the account. |
| Yes | The AccessKey secret of the account. |
3. Create a MaxCompute foreign table
3.1 Sample code
CREATE FOREIGN TABLE IF NOT EXISTS table_name ( -- The name of the MaxCompute foreign table.
column_name data_type [, ... ]
)
SERVER odps_serv -- The name of the MaxCompute server.
OPTIONS (
project '<odps project>', -- The name of the MaxCompute project.
table '<odps table>' -- The name of the MaxCompute table.
);
3.2 Options
After you create a MaxCompute server and a user mapping to the MaxCompute server, you can create a MaxCompute foreign table. The following table describes the options.
Option | Required | Description |
| Yes | The name of the MaxCompute project. A project is a basic unit of MaxCompute. Similar to a database or schema in a traditional database system, a project is used to isolate users and implement access control. For more information, see Project. |
| Yes | The name of the MaxCompute table. MaxCompute stores data in tables. For more information, see Table. |
| No | The lowest-level partition in the partitioned MaxCompute table. Partitioning refers to dividing data in a table into independent parts based on partition keys. A partition key can be a single column or a combination of multiple columns. If a table is not partitioned, data is stored in the directory that stores the table. If a table is partitioned, each partition corresponds to a subdirectory in the directory that stores the table. In this case, data is stored in separate subdirectories. For more information about partitions, see Partition. |
3.3 Types of foreign tables
MaxCompute FDW allows you to create the following types of MaxCompute foreign tables based on the types of MaxCompute tables:
Non-partitioned foreign tables
A non-partitioned foreign table is mapped to a non-partitioned MaxCompute table. When you create a non-partitioned foreign table, you need to specify valid values only for the project and table options. You do not need to specify the partition option, or you can leave the partition option empty. Example:
CREATE FOREIGN TABLE odps_lineitem ( -- The name of the MaxCompute foreign table. l_orderkey bigint, l_partkey bigint, l_suppkey bigint, l_linenumber bigint, l_quantity double precision, l_extendedprice double precision, l_discount double precision, l_tax double precision, l_returnflag char(1), l_linestatus char(1), l_shipdate date, l_commitdate date, l_receiptdate date, l_shipinstruct char(25), l_shipmode char(10), l_comment varchar(44) ) SERVER odps_serv -- The name of the MaxCompute server. OPTIONS ( project 'odps_fdw', -- The name of the MaxCompute project. table 'lineitem_big' -- The name of the MaxCompute table. );
Lowest-level partitioned foreign tables
A lowest-level partitioned foreign table is mapped to the lowest-level partition of a MaxCompute table. When you create a lowest-level partitioned foreign table, you must specify valid values for the partition option. If a MaxCompute table is partitioned at multiple levels, you can map the lowest-level partitioned foreign table only to the lowest-level partition of the MaxCompute table. You must set the partition option to the full path of the lowest-level partition.
In the following example, a partitioned table that contains two levels of partitions in MaxCompute is created:
-- Create a partitioned table that contains two levels of partitions. In the partitioned table, pt is used as a level-1 partition key and region is used as a level-2 partition key. CREATE TABLE src (key string, value bigint) PARTITIONED BY (pt string,region string);
For example, a MaxCompute table is partitioned at two levels and contains the 20170601 partition and the hangzhou subpartition. To create a lowest-level partitioned foreign table that is mapped to the hangzhou subpartition in an AnalyticDB for PostgreSQL database, the partition option must be set to
pt=20170601,region=hangzhou
.CREATE FOREIGN TABLE odps_src_20170601_hangzhou ( -- The name of the MaxCompute foreign table. key string, value bigint ) SERVER odps_serv -- The name of the MaxCompute server. OPTIONS ( project 'odps_fdw', -- The name of the MaxCompute project. table 'src', -- The name of the MaxCompute table. partition 'pt=20170601,region=hangzhou' -- The full path of the lowest-level partition. );
NoteSet the partition option in the key=value format. To specify the lowest-level partition in a multi-level partitioned table, separate the key-value pairs with commas (,). Do not include spaces in the partition option.
You cannot map a lowest-level partitioned foreign table to a non-lowest-level partition of a MaxCompute table. In this example, you cannot set the partition option to
pt=20170601
, which specifies the path of the partition.Make sure that you set the partition option to the full path of a lowest-level partition. In this example, you cannot set the
partition
option toregion=hangzhou
, which specifies the relative path of the lowest-level partition.
Partitioned foreign tables
A partitioned foreign table is mapped to a partitioned MaxCompute table. The following code shows how to use the preceding MaxCompute table named src that contains two levels of partitions to create a partitioned foreign table. For more information, see Define table partitioning.
CREATE FOREIGN TABLE odps_src( -- The name of the MaxCompute foreign table. key text, value bigint, pt text, -- The partition key of the MaxCompute foreign table. region text -- The subpartition key of the MaxCompute foreign table. ) SERVER odps_serv OPTIONS ( project 'odps_fdw', -- The name of the MaxCompute project. table 'src' -- The name of the partitioned MaxCompute table. ) PARTITION BY LIST (pt) -- Use the pt field to partition the table. SUBPARTITION BY LIST (region) -- Use the region field to subpartition the table. SUBPARTITION TEMPLATE ( -- Specify a subpartition template. SUBPARTITION hangzhou VALUES ('hangzhou'), SUBPARTITION shanghai VALUES ('shanghai') ) ( PARTITION "20170601" VALUES ('20170601'), PARTITION "20170602" VALUES ('20170602'));
NoteWhen you create a partitioned foreign table in an AnalyticDB for PostgreSQL database, make sure that the following requirements are met. The requirements differ from the requirements for creating partitioned tables in MaxCompute.
Append the partition keys as fields to the end of other fields. If you create a multi-level partitioned foreign table, make sure that the order and levels of partition keys and the partition levels of the MaxCompute table match one another.
When you create a partitioned table, you must specify the partition key values. To partition a table, use the LIST method.
When you create a partitioned foreign table, you do not need to specify the partition option. If you specify the partition option, the foreign table is created based on the specified lowest-level partition of the specified MaxCompute table instead of the whole MaxCompute table.
If a partitioned foreign table contains partitions or subpartitions that have no matches in the specified MaxCompute table, an alert is triggered when you query the foreign table. In this case, you can delete the corresponding partition or subpartition from the foreign table. For more information, see the "Delete partitions or subpartitions from a partitioned foreign table" section of this topic.
3.4 Add partitions or subpartitions to a partitioned foreign table
In this example, the preceding partitioned foreign table named odps_src is used.
Add a partition, as shown in the following figure.
-- Add a partition. The subpartitions are automatically created.
alter table odps_src add partition "20170603" values(20170603);
Add a subpartition, as shown in the following figure.
-- Add a subpartition.
alter table odps_src alter partition "20170603" add partition "nanjing" values('nanjing');
3.5 Delete partitions or subpartitions from a partitioned foreign table
In this example, the preceding partitioned foreign table named odps_src is used.
Delete a partition, as shown in the following figure.
-- Delete a partition. The cascaded subpartitions are also deleted.
alter table odps_src drop partition "20170602";
Delete a subpartition, as shown in the following figure.
-- Delete a subpartition.
alter table odps_src alter partition "20170601" drop partition "hangzhou";
Data types supported by MaxCompute foreign tables
The following table describes the data type mappings between MaxCompute and AnalyticDB for PostgreSQL. We recommend that you specify the data types of columns in a foreign table in an AnalyticDB for PostgreSQL database based on the listed mappings.
AnalyticDB for PostgreSQL does not support data types that correspond to the STRUCT, MAP, and ARRAY data types that are supported by MaxCompute.
MaxCompute data type | AnalyticDB for PostgreSQL data type |
BOOLEAN | bool |
TINYINT | int2 |
SMALLINT | int2 |
INTEGER | int4 |
BIGINT | int8 |
FLOAT | float4 |
DOUBLE | float8 |
DECIMAL | numeric |
BINARY | bytea |
VARCHAR(n) | varchar(n) |
CHAR(n) | char(n) |
STRING | text |
DATE | date |
DATETIME | timestamp |
TIMESTAMP | timestamp |
ARRAY<SMALLINT/INT/BIGINT/BOOLEAN/FLOAT/DOUBLE/TEXT/VARCHAR/TIMESTAMP> (Not supported for the feature of writing data to MaxCompute foreign tables) | int2, int4 , int8, boolean, float4, float8, text, varchar, timestamp |
Scenarios
You can execute foreign scans on AnalyticDB for PostgreSQL databases to scan MaxCompute foreign tables. You can execute the same query statements to query data in foreign tables and data in other tables in AnalyticDB for PostgreSQL. In this example, TPC Benchmark™H (TPC-H) queries are used to illustrate common scenarios in which MaxCompute foreign tables are used.
Query a MaxCompute foreign table
TPC-H Q1 queries are used to aggregate and filter data in a single table. In this example, a Q1 query is performed on the MaxCompute foreign table named odps_lineitem.
-- Create the MaxCompute foreign table named odps_lineitem.
CREATE FOREIGN TABLE odps_lineitem (
l_orderkey bigint,
l_partkey bigint,
l_suppkey bigint,
l_linenumber bigint,
l_quantity double precision,
l_extendedprice double precision,
l_discount double precision,
l_tax double precision,
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44)
) server odps_serv
options (
project 'odps_fdw', table 'lineitem'
);
-- TPC-H Q1
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
odps_lineitem
where
l_shipdate <= date '1998-12-01' - interval '88' day --(3)
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Import MaxCompute data to an on-premises table
To import data, perform the following steps:
Create a MaxCompute foreign table in an AnalyticDB for PostgreSQL database.
To import data to the new table, execute one of the following statements:
-- INSERT statement
INSERT INTO <Destination on-premises table> SELECT * FROM <MaxCompute foreign table>;
-- CREATE TABLE AS statement
CREATE TABLE <Destination on-premises table> AS SELECT * FROM <MaxCompute foreign table>;
Example 1: Execute the INSERT statement to import the data in odps_lineitem to an on-premises append-optimized column-oriented storage (AOCS) table.
-- Create an on-premises AOCS table.
CREATE TABLE aocs_lineitem (
l_orderkey bigint,
l_partkey bigint,
l_suppkey bigint,
l_linenumber bigint,
l_quantity double precision,
l_extendedprice double precision,
l_discount double precision,
l_tax double precision,
l_returnflag CHAR(1),
l_linestatus CHAR(1),
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct CHAR(25),
l_shipmode CHAR(10),
l_comment VARCHAR(44)
) WITH (APPENDONLY=TRUE, ORIENTATION=COLUMN, COMPRESSTYPE=ZSTD, COMPRESSLEVEL=5)
DISTRIBUTED BY (l_orderkey);
-- Import the data in odps_lineitem to the on-premises AOCS table.
INSERT INTO aocs_lineitem SELECT * FROM odps_lineitem;
Example 2: Execute the CREATE TABLE AS statement to import the data in odps_lineitem to an on-premises heap table.
create table heap_lineitem as select * from odps_lineitem distributed by (l_orderkey);
Associate a MaxCompute foreign table with an on-premises table
In this example, a TPC-H Q19 query is performed on the MaxCompute foreign table named odps_part that is associated with the on-premises AOCS table named aocs_lineitem.
-- TPC-H Q19
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
aocs_lineitem, -- The name of the on-premises AOCS table.
odps_part -- The name of the MaxCompute foreign table.
where
(
p_partkey = l_partkey
and p_brand = 'Brand#32'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 8 and l_quantity <= 8 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#41'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 15 and l_quantity <= 15 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#44'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 22 and l_quantity <= 22 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);
Write data from an AnalyticDB for PostgreSQL table to MaxCompute
Example: Write data from the heap_lineitem table that is created in Example 2 to the odps_lineitem table.
INSERT INTO odps_lineitem SELECT * FROM heap_lineitem;
Common errors in using MaxCompute foreign tables
Notes
You can synchronize data from MaxCompute to MaxCompute foreign tables by using MaxCompute Tunnel. The synchronization performance is subject to the server resources and the outbound network bandwidth of MaxCompute Tunnel. Therefore, we recommend that you take note of the following items:
If you use only foreign tables, an AnalyticDB for PostgreSQL database can contain up to five foreign tables.
If you associate multiple MaxCompute foreign tables, import the data in large MaxCompute foreign tables to on-premises tables and associate the on-premises tables with small foreign tables to improve the synchronization performance.