Use partitioned OSS foreign tables - AnalyticDB - Alibaba Cloud Documentation Center

The table partitioning feature is supported for Object Storage Service (OSS) foreign tables. If the WHERE clause of a query statement contains a partition column, you can improve query efficiency by scanning less data from OSS.

Precautions

To use the partitioning feature of OSS foreign tables, make sure that the object directory is in the following format: oss://bucket/partcol1=partval1/partcol2=partval2/. In this directory, partcol1 and partcol2 indicate the partition columns, and partval1 and partval2 indicate the values in the preceding partition columns.

For example, assume that a table is partitioned based on the year column and then subpartitioned based on the month column. If the value in the year column is 2022 and the value in the month column is 07, the associated partition is stored in the oss://bucket/year=2022/month=07 directory.

Create an OSS server and a user mapping to the OSS server

Before you use OSS foreign tables, you must create an OSS server and a user mapping to the OSS server.

For more information about how to create an OSS server, see the "Create an OSS server" section of the Use OSS foreign tables for data lake analysis topic.
For more information about how to create a user mapping to the OSS server, see the " Create a user mapping to the OSS server" section of the Use OSS foreign tables for data lake analysis topic.

Create a partitioned table

You can execute a CREATE FOREIGN TABLE statement to create a partitioned OSS foreign table. The syntax used to create a partitioned OSS foreign table is the same as that used to create a standard partitioned table. For more information, see Define table partitioning.

For more information about CREATE FOREIGN TABLE, see Create an OSS foreign table.

Note

OSS foreign tables support only list partitions.

Create a partitioned table named ossfdw_parttable and specify a partition template. Example:

CREATE FOREIGN TABLE ossfdw_parttable(            
  key text,
  value bigint,
  pt text,                                        -- Define a partition key.
  region text                                     -- Define a subpartition key.
) 
SERVER oss_serv
OPTIONS (dir 'PartationDataDirInOss/', format 'jsonline')
PARTITION BY LIST (pt)                            -- Use the pt field to partition the table.
SUBPARTITION BY LIST (region)                     -- Use the region field to subpartition the table.
    SUBPARTITION TEMPLATE (                       -- Specify a subpartition template.
       SUBPARTITION hangzhou VALUES ('hangzhou'),
       SUBPARTITION shanghai VALUES ('shanghai')
    )
( PARTITION "20170601" VALUES ('20170601'), 
  PARTITION "20170602" VALUES ('20170602'));

Create a partitioned table named ossfdw_parttable1 without specifying a partition template. Example:

CREATE FOREIGN TABLE ossfdw_parttable1(            
  key text,
  value bigint,
  pt text,                                        -- Define a partition key.
  region text                                     -- Define a subpartition key.
) 
SERVER oss_serv
OPTIONS (dir 'PartationDataDirInOss/', format 'jsonline')
PARTITION BY LIST (pt)                            -- Use the pt field to partition the table.
SUBPARTITION BY LIST (region)                     -- Use the region field to subpartition the table.
(
    -- The following partitions contain different subpartitions: 
    VALUES('20181218')
    (
        VALUES('hangzhou'),
        VALUES('shanghai') 
    ),
    VALUES('20181219')
    (
        VALUES('nantong'),
        VALUES('anhui') 
    )    
);

Adjust the schema of a partitioned OSS foreign table

You can execute an ALTER TABLE statement to adjust the schema of a partitioned OSS foreign table. AnalyticDB for PostgreSQL allows you to create or delete partitions and subpartitions in OSS foreign tables.

Create partitions and subpartitions

Create a partition
- Create a partition in the ossfdw_parttable table. The system automatically generates corresponding subpartitions based on the partition template specified when you create the table. Example:
```
ALTER TABLE ossfdw_parttable ADD PARTITION VALUES ('20170603');
```
  The following figure shows the change to the table schema.
- Create a partition in the ossfdw_parttable1 table. You must specify subpartitions in the partition because no partition template is specified when you create the table. Example:
```
ALTER TABLE ossfdw_parttable1 ADD PARTITION VALUES ('20181220')
(
    VALUES('hefei'),
    VALUES('guangzhou') 
);
```
Create a subpartition
Create a subpartition in the 20170603 partition of the ossfdw_parttable table. Example:
```
ALTER TABLE ossfdw_parttable ALTER PARTITION FOR ('20170603') ADD PARTITION VALUES('nanjing');
```
The following figure shows the change to the table schema.

Delete partitions and subpartitions

Delete a partition. Example:

ALTER TABLE ossfdw_parttable DROP PARTITION FOR ('20170601');

Delete a subpartition. Example:

ALTER TABLE ossfdw_parttable ALTER PARTITION FOR ('20170602') DROP PARTITION FOR ('hangzhou');

Delete a partitioned table

You can execute a DROP FOREIGN TABLE statement to delete a partitioned table.

Delete a partitioned table. Example:

DROP FOREIGN TABLE ossfdw_parttable;

Use OSS foreign tables to access log data shipped by Log Service

Partitioned OSS foreign tables can be used when you access log data shipped by Log Service. If you use an appropriate object path when you write data from Log Service to OSS, you can create a partitioned OSS foreign table.

For more information about Log Service, see What is Simple Log Service?

Ship log data from Simple Log Service to OSS.

In the OSS LogShipper panel, we recommend that you set Shard Format to date=%Y%m/userlogin. An example of the generated OSS object path is in the following format:

oss://testBucketName/adbpgossfdw
├── date=202002
│   ├── userlogin_158561762910654****_647504382.csv
│   └── userlogin_158561784923220****_647507440.csv
└── date=202003
    └── userlogin_158561794424704****_647508762.csv

Create a partitioned OSS foreign table based on the key fields of log data shipped by Log Service. Example:

CREATE FOREIGN TABLE userlogin (
        uid integer,
        name character varying,
        source integer,
        logindate timestamp without time zone,
        "date" int
) SERVER oss_serv OPTIONS (
    dir 'adbpgossfdw/',
    format 'text'
)
PARTITION BY LIST ("date")
(
        VALUES ('202002'), 
        VALUES ('202003')       
)

Query and analyze the execution plan of a SELECT statement on the log data shipped by Log Service.

For example, assume that you want to query the total number of user logons in February 2022.

EXPLAIN SELECT uid, count(uid) FROM userlogin WHERE "date" = 202002 GROUP BY uid;

The following information is returned:

                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=5135.10..5145.10 rows=1000 width=12)
   ->  HashAggregate  (cost=5135.10..5145.10 rows=334 width=12)
         Group Key: userlogin_1_prt_1.uid
         ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=5100.10..5120.10 rows=334 width=12)
               Hash Key: userlogin_1_prt_1.uid
               -> HashAggregate  (cost=5100.10..5100.10 rows=334 width=12)
                     Group Key: userlogin_1_prt_1.uid
                     ->t;  Append  (cost=0.00..100.10 rows=333334 width=4)
                           ->  Foreign Scan on userlogin_1_prt_1  (cost=0.00..100.10 rows=333334 width=4)
                                 Filter: (date = 202002)
                                 Oss Url: endpoint=oss-cn-hangzhou-zmf-internal.aliyuncs.com bucket=adbpg-regress dir=adbpgossfdw/date=202002/ filetype=plain|text
                                 Oss Parallel (Max 4) Get: total 0 file(s) with 0 bytes byte(s).
 Optimizer: Postgres query optimizer
(13 rows)

The returned information shows that only data in the date=202002 directory needs to be scanned. The less data scanned, the higher the query efficiency.