All Products
Search
Document Center

AnalyticDB:Use MaxCompute foreign tables to access MaxCompute data

Last Updated:Oct 29, 2024

MaxCompute foreign data wrapper (FDW) of AnalyticDB for PostgreSQL is developed based on the PostgreSQL FDW framework to access foreign data that is stored in MaxCompute. MaxCompute FDW allows you to create foreign tables in AnalyticDB for PostgreSQL and use these tables to import data from MaxCompute to AnalyticDB for PostgreSQL for query and analysis.

The MaxCompute FDW module allows you to synchronize data between AnalyticDB for PostgreSQL and MaxCompute. You can create the following types of MaxCompute foreign tables by using MaxCompute FDW:

  • Non-partitioned foreign tables, which are mapped to non-partitioned MaxCompute tables.

  • Lowest-level partitioned foreign tables, which are mapped to the lowest-level partitions of partitioned MaxCompute tables.

  • Partitioned foreign tables, which are mapped to partitioned MaxCompute tables.

Use MaxCompute FDW

  1. Create the MaxCompute FDW extension in an AnalyticDB for PostgreSQL database.

    CREATE EXTENSION odps_fdw ;
  2. Grant permissions on MaxCompute FDW to all database accounts.

    GRANT USAGE ON FOREIGN DATA WRAPPER odps_fdw TO PUBLIC;
  • For a new AnalyticDB for PostgreSQL instance, the MaxCompute FDW extension is automatically created. In this case, you can skip the preceding steps.

  • For an existing AnalyticDB for PostgreSQL instance, you can use the privileged database account to connect to a specific database, run the preceding commands to create the MaxCompute FDW extension, and then grant permissions on the extension to all database accounts.

Use a MaxCompute foreign table

Before you use a MaxCompute foreign table, you must perform the following operations:

  • Create a MaxCompute server to specify the endpoint for accessing MaxCompute.

  • Create a MaxCompute user mapping to specify the account that can access the created MaxCompute server.

  • Create a MaxCompute foreign table to specify the MaxCompute table that you want to access.

1. Create a MaxCompute server

1.1 Sample code

CREATE SERVER odps_serv                        -- The name of the MaxCompute server.
  FOREIGN DATA WRAPPER odps_fdw
  OPTIONS (
    tunnel_endpoint '<odps tunnel endpoint>'   -- ODPS Tunnel Endpoint
  );

1.2 Options

When you create a MaxCompute server in AnalyticDB for PostgreSQL, you need to specify only one of the tunnel_endpoint and odps_endpoint options. The following table describes the options.

Option

Required

Description

tunnel_endpoint

No. We recommend that you specify this option.

The endpoint of the MaxCompute Tunnel service.

odps_endpoint

No

The endpoint of the MaxCompute service.

Note
  • When you create a MaxCompute server, you can specify one or both of the options. If you specify both options, the specified MaxCompute Tunnel endpoint takes precedence. If you do not specify the MaxCompute Tunnel endpoint, the MaxCompute endpoint is used to route access requests to the corresponding MaxCompute Tunnel endpoint.

  • We recommend that you use the MaxCompute Tunnel endpoint in the Alibaba Cloud classic network or a virtual private cloud (VPC). If you use a VPC endpoint, make sure that the AnalyticDB for PostgreSQL instance and the MaxCompute project reside in the same zone.

  • When you use the Tunnel endpoint to access MaxCompute data over the Internet, you are charged USD 0.1166 for each GB of data.

For more information about MaxCompute endpoints, see Endpoints.

2. Create a user mapping to a MaxCompute server

2.1 Sample code

CREATE USER MAPPING FOR { username | USER | CURRENT_USER | PUBLIC }
  SERVER odps_serv                                  -- The name of the MaxCompute server.
  OPTIONS (
    id '<odps access id>',                                -- ODPS Account ID
    key '<odps access key>'                               -- ODPS Account Key
  );
Note
  • username: the name of the existing account that is mapped to the MaxCompute server.

  • CURRENT_USER or USER: the name of the current user.

  • PUBLIC: all database accounts.

2.2 Options

When you create an AnalyticDB for PostgreSQL database account that is used to access the MaxCompute server, specify the type, AccessKey ID, and AccessKey secret of the account.

Option

Required

Description

id

Yes

The AccessKey ID of the account.

key

Yes

The AccessKey secret of the account.

3. Create a MaxCompute foreign table

3.1 Sample code

CREATE FOREIGN TABLE IF NOT EXISTS table_name ( -- The name of the MaxCompute foreign table.
    column_name data_type [, ... ]
)
  SERVER odps_serv                              -- The name of the MaxCompute server.
  OPTIONS (
    project '<odps project>',                   -- The name of the MaxCompute project.
    table '<odps table>'                        -- The name of the MaxCompute table.
);

3.2 Options

After you create a MaxCompute server and a user mapping to the MaxCompute server, you can create a MaxCompute foreign table. The following table describes the options.

Option

Required

Description

project

Yes

The name of the MaxCompute project. A project is a basic unit of MaxCompute. Similar to a database or schema in a traditional database system, a project is used to isolate users and implement access control. For more information, see Project.

table

Yes

The name of the MaxCompute table. MaxCompute stores data in tables. For more information, see Table.

partition

No

The lowest-level partition in the partitioned MaxCompute table. Partitioning refers to dividing data in a table into independent parts based on partition keys. A partition key can be a single column or a combination of multiple columns. If a table is not partitioned, data is stored in the directory that stores the table. If a table is partitioned, each partition corresponds to a subdirectory in the directory that stores the table. In this case, data is stored in separate subdirectories. For more information about partitions, see Partition.

3.3 Types of foreign tables

MaxCompute FDW allows you to create the following types of MaxCompute foreign tables based on the types of MaxCompute tables:

  • Non-partitioned foreign tables

    A non-partitioned foreign table is mapped to a non-partitioned MaxCompute table. When you create a non-partitioned foreign table, you need to specify valid values only for the project and table options. You do not need to specify the partition option, or you can leave the partition option empty. Example:

    CREATE FOREIGN TABLE odps_lineitem (              -- The name of the MaxCompute foreign table.
        l_orderkey      bigint,
        l_partkey       bigint,
        l_suppkey       bigint,
        l_linenumber    bigint,
        l_quantity      double precision,
        l_extendedprice double precision,
        l_discount      double precision,
        l_tax           double precision,
        l_returnflag    char(1),
        l_linestatus    char(1),
        l_shipdate      date,
        l_commitdate    date,
        l_receiptdate   date,
        l_shipinstruct  char(25),
        l_shipmode      char(10),
        l_comment       varchar(44)
    ) SERVER odps_serv                              -- The name of the MaxCompute server.
    OPTIONS (
      project 'odps_fdw',                           -- The name of the MaxCompute project.
      table 'lineitem_big'                          -- The name of the MaxCompute table.
    );
  • Lowest-level partitioned foreign tables

    A lowest-level partitioned foreign table is mapped to the lowest-level partition of a MaxCompute table. When you create a lowest-level partitioned foreign table, you must specify valid values for the partition option. If a MaxCompute table is partitioned at multiple levels, you can map the lowest-level partitioned foreign table only to the lowest-level partition of the MaxCompute table. You must set the partition option to the full path of the lowest-level partition.

    In the following example, a partitioned table that contains two levels of partitions in MaxCompute is created:

    -- Create a partitioned table that contains two levels of partitions. In the partitioned table, pt is used as a level-1 partition key and region is used as a level-2 partition key.
    CREATE TABLE src (key string, value bigint) PARTITIONED BY (pt string,region string);

    ODPS分区

    For example, a MaxCompute table is partitioned at two levels and contains the 20170601 partition and the hangzhou subpartition. To create a lowest-level partitioned foreign table that is mapped to the hangzhou subpartition in an AnalyticDB for PostgreSQL database, the partition option must be set to pt=20170601,region=hangzhou.

    CREATE FOREIGN TABLE odps_src_20170601_hangzhou (   -- The name of the MaxCompute foreign table.
      key string,
      value bigint
    ) SERVER odps_serv                                  -- The name of the MaxCompute server.
    OPTIONS (
      project 'odps_fdw',                               -- The name of the MaxCompute project. 
      table 'src',                                      -- The name of the MaxCompute table.
      partition 'pt=20170601,region=hangzhou'           -- The full path of the lowest-level partition.
    );
    Note
    • Set the partition option in the key=value format. To specify the lowest-level partition in a multi-level partitioned table, separate the key-value pairs with commas (,). Do not include spaces in the partition option.

    • You cannot map a lowest-level partitioned foreign table to a non-lowest-level partition of a MaxCompute table. In this example, you cannot set the partition option to pt=20170601, which specifies the path of the partition.

    • Make sure that you set the partition option to the full path of a lowest-level partition. In this example, you cannot set the partition option to

      region=hangzhou, which specifies the relative path of the lowest-level partition.

  • Partitioned foreign tables

    A partitioned foreign table is mapped to a partitioned MaxCompute table. The following code shows how to use the preceding MaxCompute table named src that contains two levels of partitions to create a partitioned foreign table. For more information, see Define table partitioning.

    CREATE FOREIGN TABLE odps_src(                    -- The name of the MaxCompute foreign table.
      key text,
      value bigint,
      pt text,                                        -- The partition key of the MaxCompute foreign table.
      region text                                     -- The subpartition key of the MaxCompute foreign table.
    ) SERVER odps_serv
    OPTIONS (
      project 'odps_fdw',                             -- The name of the MaxCompute project.
      table 'src'                                     -- The name of the partitioned MaxCompute table.
    )
    PARTITION BY LIST (pt)                            -- Use the pt field to partition the table.
    SUBPARTITION BY LIST (region)                     -- Use the region field to subpartition the table.
        SUBPARTITION TEMPLATE (                       -- Specify a subpartition template.
           SUBPARTITION hangzhou VALUES ('hangzhou'),
           SUBPARTITION shanghai VALUES ('shanghai')
        )
    ( PARTITION "20170601" VALUES ('20170601'), 
      PARTITION "20170602" VALUES ('20170602'));
    Note

    When you create a partitioned foreign table in an AnalyticDB for PostgreSQL database, make sure that the following requirements are met. The requirements differ from the requirements for creating partitioned tables in MaxCompute.

    • Append the partition keys as fields to the end of other fields. If you create a multi-level partitioned foreign table, make sure that the order and levels of partition keys and the partition levels of the MaxCompute table match one another.

    • When you create a partitioned table, you must specify the partition key values. To partition a table, use the LIST method.

    • When you create a partitioned foreign table, you do not need to specify the partition option. If you specify the partition option, the foreign table is created based on the specified lowest-level partition of the specified MaxCompute table instead of the whole MaxCompute table.

    • If a partitioned foreign table contains partitions or subpartitions that have no matches in the specified MaxCompute table, an alert is triggered when you query the foreign table. In this case, you can delete the corresponding partition or subpartition from the foreign table. For more information, see the "Delete partitions or subpartitions from a partitioned foreign table" section of this topic.

      不存在的分析外表

3.4 Add partitions or subpartitions to a partitioned foreign table

In this example, the preceding partitioned foreign table named odps_src is used.

  • Add a partition, as shown in the following figure.

-- Add a partition. The subpartitions are automatically created.
alter table odps_src add partition "20170603" values(20170603);

添加一级子分区

  • Add a subpartition, as shown in the following figure.

-- Add a subpartition.
alter table odps_src alter partition "20170603" add partition "nanjing" values('nanjing');

image

3.5 Delete partitions or subpartitions from a partitioned foreign table

In this example, the preceding partitioned foreign table named odps_src is used.

  • Delete a partition, as shown in the following figure.

-- Delete a partition. The cascaded subpartitions are also deleted.
alter table odps_src drop partition "20170602";

image

  • Delete a subpartition, as shown in the following figure.

-- Delete a subpartition.
alter table odps_src alter partition "20170601" drop partition "hangzhou";

删除二级子分区

Data types supported by MaxCompute foreign tables

The following table describes the data type mappings between MaxCompute and AnalyticDB for PostgreSQL. We recommend that you specify the data types of columns in a foreign table in an AnalyticDB for PostgreSQL database based on the listed mappings.

Note

AnalyticDB for PostgreSQL does not support data types that correspond to the STRUCT, MAP, and ARRAY data types that are supported by MaxCompute.

MaxCompute data type

AnalyticDB for PostgreSQL data type

BOOLEAN

bool

TINYINT

int2

SMALLINT

int2

INTEGER

int4

BIGINT

int8

FLOAT

float4

DOUBLE

float8

DECIMAL

numeric

BINARY

bytea

VARCHAR(n)

varchar(n)

CHAR(n)

char(n)

STRING

text

DATE

date

DATETIME

timestamp

TIMESTAMP

timestamp

ARRAY<SMALLINT/INT/BIGINT/BOOLEAN/FLOAT/DOUBLE/TEXT/VARCHAR/TIMESTAMP>

(Not supported for the feature of writing data to MaxCompute foreign tables)

int2, int4 , int8, boolean, float4, float8, text, varchar, timestamp

Scenarios

You can execute foreign scans on AnalyticDB for PostgreSQL databases to scan MaxCompute foreign tables. You can execute the same query statements to query data in foreign tables and data in other tables in AnalyticDB for PostgreSQL. In this example, TPC Benchmark™H (TPC-H) queries are used to illustrate common scenarios in which MaxCompute foreign tables are used.

Query a MaxCompute foreign table

TPC-H Q1 queries are used to aggregate and filter data in a single table. In this example, a Q1 query is performed on the MaxCompute foreign table named odps_lineitem.

-- Create the MaxCompute foreign table named odps_lineitem.
CREATE FOREIGN TABLE odps_lineitem (
    l_orderkey bigint,
    l_partkey bigint,
    l_suppkey bigint,
    l_linenumber bigint,
    l_quantity double precision,
    l_extendedprice double precision,
    l_discount double precision,
    l_tax double precision,
    l_returnflag CHAR(1),
    l_linestatus CHAR(1),
    l_shipdate DATE,
    l_commitdate DATE,
    l_receiptdate DATE,
    l_shipinstruct CHAR(25),
    l_shipmode CHAR(10),
    l_comment VARCHAR(44)
) server odps_serv
    options (
        project 'odps_fdw', table 'lineitem'
    );

-- TPC-H Q1
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    odps_lineitem
where
    l_shipdate <= date '1998-12-01' - interval '88' day --(3)
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

Import MaxCompute data to an on-premises table

To import data, perform the following steps:

  1. Create a MaxCompute foreign table in an AnalyticDB for PostgreSQL database.

  2. To import data to the new table, execute one of the following statements:

-- INSERT statement
INSERT INTO <Destination on-premises table> SELECT * FROM <MaxCompute foreign table>;

-- CREATE TABLE AS statement
CREATE TABLE <Destination on-premises table> AS SELECT * FROM <MaxCompute foreign table>;
  • Example 1: Execute the INSERT statement to import the data in odps_lineitem to an on-premises append-optimized column-oriented storage (AOCS) table.

-- Create an on-premises AOCS table.
CREATE TABLE aocs_lineitem (
    l_orderkey bigint,
    l_partkey bigint,
    l_suppkey bigint,
    l_linenumber bigint,
    l_quantity double precision,
    l_extendedprice double precision,
    l_discount double precision,
    l_tax double precision,
    l_returnflag CHAR(1),
    l_linestatus CHAR(1),
    l_shipdate DATE,
    l_commitdate DATE,
    l_receiptdate DATE,
    l_shipinstruct CHAR(25),
    l_shipmode CHAR(10),
    l_comment VARCHAR(44)
) WITH (APPENDONLY=TRUE, ORIENTATION=COLUMN, COMPRESSTYPE=ZSTD, COMPRESSLEVEL=5)
DISTRIBUTED BY (l_orderkey);

-- Import the data in odps_lineitem to the on-premises AOCS table.
INSERT INTO aocs_lineitem SELECT * FROM odps_lineitem;
  • Example 2: Execute the CREATE TABLE AS statement to import the data in odps_lineitem to an on-premises heap table.

create table heap_lineitem as select * from odps_lineitem distributed by (l_orderkey);

Associate a MaxCompute foreign table with an on-premises table

In this example, a TPC-H Q19 query is performed on the MaxCompute foreign table named odps_part that is associated with the on-premises AOCS table named aocs_lineitem.

-- TPC-H Q19
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    aocs_lineitem,          -- The name of the on-premises AOCS table.
    odps_part               -- The name of the MaxCompute foreign table.
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#32'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 8 and l_quantity <= 8 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#41'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 15 and l_quantity <= 15 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#44'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 22 and l_quantity <= 22 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

Write data from an AnalyticDB for PostgreSQL table to MaxCompute

Example: Write data from the heap_lineitem table that is created in Example 2 to the odps_lineitem table.

INSERT INTO odps_lineitem SELECT * FROM heap_lineitem;

Common errors in using MaxCompute foreign tables

Tunnel Common Errors

Notes

You can synchronize data from MaxCompute to MaxCompute foreign tables by using MaxCompute Tunnel. The synchronization performance is subject to the server resources and the outbound network bandwidth of MaxCompute Tunnel. Therefore, we recommend that you take note of the following items:

  • If you use only foreign tables, an AnalyticDB for PostgreSQL database can contain up to five foreign tables.

  • If you associate multiple MaxCompute foreign tables, import the data in large MaxCompute foreign tables to on-premises tables and associate the on-premises tables with small foreign tables to improve the synchronization performance.