All Products
Search
Document Center

ApsaraDB RDS:Use the oss_fdw extension to read and write foreign data text files

Last Updated:Jul 25, 2024

This topic describes how to import data from an Object Storage Service (OSS) bucket into an ApsaraDB RDS for PostgreSQL instance by using the oss_fdw extension. This topic also describes how to export data from an ApsaraDB RDS for PostgreSQL instance to an OSS bucket by using the oss_fdw extension.

Prerequisites

Your RDS instance runs PostgreSQL 10 or later.

Note

If your RDS instance runs PostgreSQL 14, the minor engine version of your RDS instance must be 20220830 or later. For more information about how to update the minor engine version, see Update the minor engine version.

Use the oss_fdw extension

  1. Create the oss_fdw extension. For more information, see Parameters supported by the oss_fdw extension.

    CREATE EXTENSION oss_fdw; 
  2. Create a server. For more information, see Parameters in the CREATE SERVER statement.

    CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS 
         (host 'oss-cn-hangzhou-internal.aliyuncs.com' , id 'access_id', key 'secrect_key',bucket 'mybucket');
  3. Create an OSS foreign table named ossexample. For more information, see Parameters in the CREATE FOREIGN TABLE statement.

    CREATE FOREIGN TABLE ossexample 
        (date text, time text, open float,
         high float, low float, volume int) 
         SERVER ossserver 
         OPTIONS ( dir 'osstest/', delimiter ',' ,
             format 'csv', encoding 'utf8');
    Important

    The table schema in the table creation statement is used for reference only. Make sure that the schema of the foreign table is the same as the schema of the table that corresponds to the OSS object.

  4. Read data from the OSS foreign table and write the data to your RDS instance.

    • Read data from the OSS foreign table.

      SELECT * FROM ossexample;
    • Write the data in the OSS bucket to your RDS instance.

      1. Create a table named example that has the same schema as the ossexample table.

        CREATE TABLE example
                (date text, time text, open float,
                 high float, low float, volume int);
      2. Write the data in the OSS bucket to the example table.

        INSERT INTO example SELECT * FROM ossexample;
        • You can execute the EXPLAIN statement to estimate the size of OSS objects and develop a query plan.

          EXPLAIN INSERT INTO example SELECT * FROM ossexample;
                                        QUERY PLAN
          ----------------------------------------------------------------------
           Insert on example  (cost=0.00..1.10 rows=0 width=0)
             ->  Foreign Scan on ossexample  (cost=0.00..1.10 rows=1 width=998)
                   Foreign OssDir: osstest/
                   Number Of Ossfile: 2
        • You can also write the data in the example table to an OSS bucket.

          INSERT INTO ossexample SELECT * FROM example;

For more information about oss_fdw parameters, see the following section.

Parameters supported by the oss_fdw extension

Similar to other foreign data wrappers (FDW), the oss_fdw extension encapsulates foreign data that is stored in OSS buckets. You can use the oss_fdw extension to read data from OSS buckets. This process is similar to the process of reading data tables. The oss_fdw extension supports parameters to connect to OSS buckets and analyze data in the OSS buckets.

Note

The oss_fdw extension can read data from and write data to OSS objects in the CSV format. The OSS objects include the CSV objects that are compressed by using gzip.

Parameters in the CREATE SERVER statement

Parameter

Description

host

The internal endpoint of OSS for the required region.

id

The AccessKey ID of your Alibaba Cloud account. For more information, see Obtain an AccessKey pair.

key

The AccessKey secret of your Alibaba Cloud account. For more information, see Obtain an AccessKey pair.

bucket

The name of the OSS bucket. You must create an OSS account before you configure this parameter.

The following table describes the fault tolerance parameters that are provided by OSS. If network connectivity is poor, you can adjust the values of these parameters to ensure successful import and export.

Parameter

Description

oss_connect_timeout

The timeout period of a connection. Unit: seconds. Default value: 10.

oss_dns_cache_timeout

The timeout period of a cached Domain Name System (DNS) record. Unit: seconds. Default value: 60.

oss_speed_limit

The minimum transmission rate that can be tolerated. Unit: bit/s. Default value: 1024. The default value is equal to 1 Kbit/s.

oss_speed_time

The maximum period of time for which the minimum transmission rate can be tolerated. Unit: seconds. Default value: 15.

Note

You can retain the default values of the oss_speed_limit and oss_speed_time parameters. In this case, if the transmission rate remains less than 1 Kbit/s for 15 consecutive seconds, a timeout error occurs.

Parameters in the CREATE FOREIGN TABLE statement

Parameter

Description

filepath

The object name that is used to match objects stored in the OSS bucket. The object name must contain an OSS path. You can configure one of the filepath and dir parameters. If you configure the filepath parameter, you can only import data from an OSS bucket into your RDS instance.

  • The object name does not contain the name ofthe OSS bucket.

  • The object name matches multiple objects that are stored in the OSS path. This allows you to import data from multiple objects into your RDS instance.

  • Only the data from the objects that are named in the following formats can be imported into your RDS instance: filepath and filepath.x. The values of the x variable must be consecutive integers that start from 1.

    For example, five objects are stored in the OSS path: filepath, filepath.1, filepath.2, filepath.3, and filepath.5. In this case, filepath, filepath.1, filepath.2, and filepath.3 are matched and imported, but filepath.5 cannot be matched or imported.

dir

The folder that is used to match objects stored in the OSS bucket. You can configure one of the filepath and dir parameters. If you configure the dir parameter, you can import and export data between an OSS bucket and your RDS instance.

  • The folder name must end with a forward slash (/).

  • The data of all the objects in the folder are matched and imported into your RDS instance. However, these objects do not include the subfolders and the objects stored in the subfolders.

prefix

The prefix of the OSS path to the object. This parameter does not support regular expressions. You can configure only one of the prefix, filepath, and dir parameters.

format

The format that is supported for the objects stored in the OSS bucket. Only the CSV format is supported.

encoding

The format that is used to encode data in the objects stored in the OSS bucket. Common encoding formats in PostgreSQL are supported. These supported formats include UTF-8.

parse_errors

Parses the fault tolerance mode when you read data from the OSS bucket. Unit: data rows. Errors that occur during file analysis are ignored.

Important

This parameter is not supported when you write data from your RDS instance to the OSS bucket. If you want to write data from your RDS instance to the OSS bucket, do not configure this parameter.

delimiter

The delimiter that is used to separate columns in the objects stored in the OSS bucket.

quote

The quote character that is supported for the objects stored in the OSS bucket.

escape

The escape character that is supported for the objects stored in the OSS bucket.

null

Populates an empty column by using null values. For example, you specify the null 'test' setting. In this case, if the test column is empty, it is populated by using null values.

force_not_null

Populates an empty column by using empty strings rather than null values. For example, you specify the force_not_null 'id' setting. In this case, if the ID column is empty, it is populated by using empty strings rather than null values.

compressiontype

The compression format that is used to read data from and write data to the objects stored in the OSS bucket.

  • none: The data is not compressed. This is the default value.

  • gzip: The data is compressed in the GZIP format.

compressionlevel

The compression level that is used to write data to the objects stored in the OSS bucket. Valid values: 1 to 9. Default value: 6.

Note
  • The filepath and dir parameters are specified in the OPTIONS parameter.

  • You must specify the filepath parameter or the dir parameter. Do not specify both parameters.

  • If you export data from your RDS instance to the OSS bucket, you can specify only the dir parameter. You cannot specify the filepath parameter.

Parameters in the CREATE FOREIGN TABLE statement

  • oss_flush_block_size: the buffer size of the data that can be written to the OSS bucket at a time. Valid values: 1 to 128. Unit: MB. Default value: 32.

  • oss_file_max_size: the maximum amount of data that can be written to an object in the OSS bucket. If the amount of data that needs to be written reaches the maximum value, the data that remains is written to a new object. Valid values: 8 to 4000. Unit: MB. Default value: 1024.

  • num_parallel_worker: the maximum number of threads that can run in parallel to compress the data written to the OSS bucket. Valid values: 1 to 8. Default value: 3.

Auxiliary functions

FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')

  • This function is used to obtain the name and size of the OSS object that the specified foreign table matches.

  • The size is measured in bytes.

SELECT * FROM oss_fdw_list_file('ossexample');
              name              |   size    
--------------------------------+-----------
 osstest/test.gz.1  | 739698350
 osstest/test.gz.2  | 739413041
 osstest/test.gz.3  | 739562048
(3 rows)

Auxiliary parameters

oss_fdw.rds_read_one_file: specifies the OSS object that a foreign table matches. This parameter is supported only when you import data from the OSS bucket into your RDS instance. If you specify this parameter, only the OSS object that the specified foreign table matches is imported.

SET oss_fdw.rds_read_one_file = 'osstest/test.gz.2';
SELECT * FROM oss_fdw_list_file('ossexample');
              name              |   size    
--------------------------------+-----------
  oss_test/test.gz.2  | 739413041
(1 rows)

Usage notes

  • The oss_fdw extension is developed based on the PostgreSQL FOREIGN TABLE framework to manage foreign tables.

  • Data import performance varies based on the available PostgreSQL and OSS resources. The PostgreSQL resources are CPU, I/O, and memory.

  • To import data at high performance, make sure that your RDS instance and the OSS bucket reside in the same region. For more information, see OSS domain names.

  • If the error "ERROR: oss endpoint userendpoint not in aliyun white list" is reported when SQL statements are read from the foreign table, we recommend that you use the public OSS endpoint that is provided for the specified region. For more information, see Regions and endpoints.

Troubleshooting

If an import or export error occurs, the following error information is logged:

  • code: the HTTP status code of the request that has failed.

  • error_code: the error code that is returned by OSS.

  • error_msg: the error message that is returned by OSS.

  • req_id: the UUID of the request that has failed. If you require assistance in solving a problem, you can submit a ticket that contains the req_id value of the failed request.

For more information about various errors, see the following documentation. You can handle time-out errors by reconfiguring the parameters related to the oss_ext extension.

Encryption of AccessKey ID and AccessKey secret

If you do not encrypt the values of the id and key parameters in the CREATE SERVER statement, other users can obtain your AccessKey pair in plaintext by executing the select * from pg_foreign_server statement. You can use symmetric encryption to encrypt the values of the id and key parameters. Use different keys for different RDS instances. This further protects your AccessKey pair. However, you cannot add data types as you can in Greenplum. This prevents incompatibility with earlier versions.

The following snippet provides the encrypted values of the id and key parameters:

postgres=# SELECT * FROM pg_foreign_server ;
  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                                              srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
 ossserver |       10 |  16390 |         |            |        | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5****,key=MD5****,bucket=067862}

Each encrypted value starts with an MD5 string. The total length divided by 8 is 3. After these encrypted values are exported, they will not be encrypted again. Take note that you cannot create an AccessKey ID or AccessKey secret that starts with an MD5 string.