This topic describes how to import data from an Object Storage Service (OSS) bucket into an ApsaraDB RDS for PostgreSQL instance by using the oss_fdw extension. This topic also describes how to export data from an ApsaraDB RDS for PostgreSQL instance to an OSS bucket by using the oss_fdw extension.
Prerequisites
Your RDS instance runs PostgreSQL 10 or later.
If your RDS instance runs PostgreSQL 14, the minor engine version of your RDS instance must be 20220830 or later. For more information about how to update the minor engine version, see Update the minor engine version.
Use the oss_fdw extension
Create the oss_fdw extension. For more information, see Parameters supported by the oss_fdw extension.
CREATE EXTENSION oss_fdw;
Create a server. For more information, see Parameters in the CREATE SERVER statement.
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS (host 'oss-cn-hangzhou-internal.aliyuncs.com' , id 'access_id', key 'secrect_key',bucket 'mybucket');
Create an OSS foreign table named ossexample. For more information, see Parameters in the CREATE FOREIGN TABLE statement.
CREATE FOREIGN TABLE ossexample (date text, time text, open float, high float, low float, volume int) SERVER ossserver OPTIONS ( dir 'osstest/', delimiter ',' , format 'csv', encoding 'utf8');
ImportantThe table schema in the table creation statement is used for reference only. Make sure that the schema of the foreign table is the same as the schema of the table that corresponds to the OSS object.
Read data from the OSS foreign table and write the data to your RDS instance.
Read data from the OSS foreign table.
SELECT * FROM ossexample;
Write the data in the OSS bucket to your RDS instance.
Create a table named example that has the same schema as the ossexample table.
CREATE TABLE example (date text, time text, open float, high float, low float, volume int);
Write the data in the OSS bucket to the example table.
INSERT INTO example SELECT * FROM ossexample;
You can execute the
EXPLAIN
statement to estimate the size of OSS objects and develop a query plan.EXPLAIN INSERT INTO example SELECT * FROM ossexample; QUERY PLAN ---------------------------------------------------------------------- Insert on example (cost=0.00..1.10 rows=0 width=0) -> Foreign Scan on ossexample (cost=0.00..1.10 rows=1 width=998) Foreign OssDir: osstest/ Number Of Ossfile: 2
You can also write the data in the example table to an OSS bucket.
INSERT INTO ossexample SELECT * FROM example;
For more information about oss_fdw parameters, see the following section.
Parameters supported by the oss_fdw extension
Similar to other foreign data wrappers (FDW), the oss_fdw extension encapsulates foreign data that is stored in OSS buckets. You can use the oss_fdw extension to read data from OSS buckets. This process is similar to the process of reading data tables. The oss_fdw extension supports parameters to connect to OSS buckets and analyze data in the OSS buckets.
The oss_fdw extension can read data from and write data to OSS objects in the CSV format. The OSS objects include the CSV objects that are compressed by using gzip.
Parameters in the CREATE SERVER statement
Parameter | Description |
host | The internal endpoint of OSS for the required region. |
id | The AccessKey ID of your Alibaba Cloud account. For more information, see Obtain an AccessKey pair. |
key | The AccessKey secret of your Alibaba Cloud account. For more information, see Obtain an AccessKey pair. |
bucket | The name of the OSS bucket. You must create an OSS account before you configure this parameter. |
The following table describes the fault tolerance parameters that are provided by OSS. If network connectivity is poor, you can adjust the values of these parameters to ensure successful import and export.
Parameter | Description |
oss_connect_timeout | The timeout period of a connection. Unit: seconds. Default value: 10. |
oss_dns_cache_timeout | The timeout period of a cached Domain Name System (DNS) record. Unit: seconds. Default value: 60. |
oss_speed_limit | The minimum transmission rate that can be tolerated. Unit: bit/s. Default value: 1024. The default value is equal to 1 Kbit/s. |
oss_speed_time | The maximum period of time for which the minimum transmission rate can be tolerated. Unit: seconds. Default value: 15. |
You can retain the default values of the oss_speed_limit and oss_speed_time parameters. In this case, if the transmission rate remains less than 1 Kbit/s for 15 consecutive seconds, a timeout error occurs.
Parameters in the CREATE FOREIGN TABLE statement
Parameter | Description |
filepath | The object name that is used to match objects stored in the OSS bucket. The object name must contain an OSS path. You can configure one of the filepath and dir parameters. If you configure the filepath parameter, you can only import data from an OSS bucket into your RDS instance.
|
dir | The folder that is used to match objects stored in the OSS bucket. You can configure one of the filepath and dir parameters. If you configure the dir parameter, you can import and export data between an OSS bucket and your RDS instance.
|
prefix | The prefix of the OSS path to the object. This parameter does not support regular expressions. You can configure only one of the prefix, filepath, and dir parameters. |
format | The format that is supported for the objects stored in the OSS bucket. Only the CSV format is supported. |
encoding | The format that is used to encode data in the objects stored in the OSS bucket. Common encoding formats in PostgreSQL are supported. These supported formats include UTF-8. |
parse_errors | Parses the fault tolerance mode when you read data from the OSS bucket. Unit: data rows. Errors that occur during file analysis are ignored. Important This parameter is not supported when you write data from your RDS instance to the OSS bucket. If you want to write data from your RDS instance to the OSS bucket, do not configure this parameter. |
delimiter | The delimiter that is used to separate columns in the objects stored in the OSS bucket. |
quote | The quote character that is supported for the objects stored in the OSS bucket. |
escape | The escape character that is supported for the objects stored in the OSS bucket. |
null | Populates an empty column by using null values. For example, you specify the null 'test' setting. In this case, if the test column is empty, it is populated by using null values. |
force_not_null | Populates an empty column by using empty strings rather than null values. For example, you specify the |
compressiontype | The compression format that is used to read data from and write data to the objects stored in the OSS bucket.
|
compressionlevel | The compression level that is used to write data to the objects stored in the OSS bucket. Valid values: 1 to 9. Default value: 6. |
The filepath and dir parameters are specified in the OPTIONS parameter.
You must specify the filepath parameter or the dir parameter. Do not specify both parameters.
If you export data from your RDS instance to the OSS bucket, you can specify only the dir parameter. You cannot specify the filepath parameter.
Parameters in the CREATE FOREIGN TABLE statement
oss_flush_block_size: the buffer size of the data that can be written to the OSS bucket at a time. Valid values: 1 to 128. Unit: MB. Default value: 32.
oss_file_max_size: the maximum amount of data that can be written to an object in the OSS bucket. If the amount of data that needs to be written reaches the maximum value, the data that remains is written to a new object. Valid values: 8 to 4000. Unit: MB. Default value: 1024.
num_parallel_worker: the maximum number of threads that can run in parallel to compress the data written to the OSS bucket. Valid values: 1 to 8. Default value: 3.
Auxiliary functions
FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')
This function is used to obtain the name and size of the OSS object that the specified foreign table matches.
The size is measured in bytes.
SELECT * FROM oss_fdw_list_file('ossexample');
name | size
--------------------------------+-----------
osstest/test.gz.1 | 739698350
osstest/test.gz.2 | 739413041
osstest/test.gz.3 | 739562048
(3 rows)
Auxiliary parameters
oss_fdw.rds_read_one_file: specifies the OSS object that a foreign table matches. This parameter is supported only when you import data from the OSS bucket into your RDS instance. If you specify this parameter, only the OSS object that the specified foreign table matches is imported.
SET oss_fdw.rds_read_one_file = 'osstest/test.gz.2';
SELECT * FROM oss_fdw_list_file('ossexample');
name | size
--------------------------------+-----------
oss_test/test.gz.2 | 739413041
(1 rows)
Usage notes
The oss_fdw extension is developed based on the PostgreSQL FOREIGN TABLE framework to manage foreign tables.
Data import performance varies based on the available PostgreSQL and OSS resources. The PostgreSQL resources are CPU, I/O, and memory.
To import data at high performance, make sure that your RDS instance and the OSS bucket reside in the same region. For more information, see OSS domain names.
If the error "
ERROR: oss endpoint userendpoint not in aliyun white list
" is reported when SQL statements are read from the foreign table, we recommend that you use the public OSS endpoint that is provided for the specified region. For more information, see Regions and endpoints.
Troubleshooting
If an import or export error occurs, the following error information is logged:
code: the HTTP status code of the request that has failed.
error_code: the error code that is returned by OSS.
error_msg: the error message that is returned by OSS.
req_id: the UUID of the request that has failed. If you require assistance in solving a problem, you can submit a ticket that contains the req_id value of the failed request.
For more information about various errors, see the following documentation. You can handle time-out errors by reconfiguring the parameters related to the oss_ext extension.
Encryption of AccessKey ID and AccessKey secret
If you do not encrypt the values of the id and key parameters in the CREATE SERVER statement, other users can obtain your AccessKey pair in plaintext by executing the select * from pg_foreign_server
statement. You can use symmetric encryption to encrypt the values of the id and key parameters. Use different keys for different RDS instances. This further protects your AccessKey pair. However, you cannot add data types as you can in Greenplum. This prevents incompatibility with earlier versions.
The following snippet provides the encrypted values of the id and key parameters:
postgres=# SELECT * FROM pg_foreign_server ;
srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
ossserver | 10 | 16390 | | | | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5****,key=MD5****,bucket=067862}
Each encrypted value starts with an MD5 string. The total length divided by 8 is 3. After these encrypted values are exported, they will not be encrypted again. Take note that you cannot create an AccessKey ID or AccessKey secret that starts with an MD5 string.