This topic describes how to use the COPY statement to import data from Object Storage Service (OSS) foreign tables to AnalyticDB for PostgreSQL tables, and use the UNLOAD statement to export data from AnalyticDB for PostgreSQL tables to OSS foreign tables.
The COPY or UNLOAD statement is used to import or export data based on OSS foreign tables. For more information, see Use OSS foreign tables to access OSS data.
COPY
Syntax
COPY <table_name>
[ <column_list> ]
FROM <data_source>
ACCESS_KEY_ID '<access_key_id>'
SECRET_ACCESS_KEY '<secret_access_key>'
[ [ FORMAT ] [ AS ] <data_format> ]
[ MANIFEST ]
[ option '<value>' [ ... ] ]
Parameters
Parameter | Required | Description |
table_name | Yes | The name of the AnalyticDB for PostgreSQL table in which the imported data is stored. The AnalyticDB for PostgreSQL table must exist in the AnalyticDB for PostgreSQL instance. |
column_list | No | The list of columns to which you want to write data. If you do not specify this parameter, data is written to all columns. |
data_source | Yes | The URL of the OSS bucket from which data is obtained. Example: oss://<bucket_name>/path_prefix. |
access_key_id | Yes | The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user that has permissions to access OSS. For information about how to obtain your AccessKey ID, see Obtain an AccessKey pair. |
secret_access_key | Yes | The AccessKey secret of an Alibaba Cloud account or a RAM user that has permissions to access OSS. For information about how to obtain your AccessKey secret, see Obtain an AccessKey pair. |
[ FORMAT ] [ AS ] <data_format> | No | The file format in which the imported data is stored. If you do not specify this parameter, FORMAT AS CSV is used. You can set |
MANIFEST | No | The data source is a manifest file. The manifest file must be in the JSON format and consist of the following elements:
|
[ option '<value>' [ ... ] ] | No | A list of options. Specify each option in the |
Options
Option | Type | Required | Description |
ENDPOINT | STRING | Yes | The OSS endpoint. For information about how to obtain the OSS endpoint, see Regions and endpoints. |
FDW | STRING | Yes | The name of the oss_fdw extension. The oss_fdw extension is required when you create a temporary OSS server for the COPY statement. |
Other options that are used to create an OSS foreign table, such as FORMAT, FILETYPE, DELIMITER, and ESCAPE | N/A | N/A | The options that are used to create a temporary OSS foreign table. For more information, see Overview of OSS foreign tables. |
Examples
Example 1
Create an AnalyticDB for PostgreSQL table.
CREATE TABLE local_t2 (a int, b float8, c text);
Use the COPY statement to import data to columns a and c. Column b is assigned NULL.
COPY local_t2 (a, c) FROM 'oss://adbpg-regress/local_t/' ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****' SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****' FORMAT AS CSV ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';
Query data of the table.
SELECT * FROM local_t2 LIMIT 10;
Sample result:
a | b | c ----+---+---------------------------------- 12 | | a24cba6ebdc5e0c485cd88ef60b72fea 15 | | c4d3028f5205fab98e5f43c7945db4ba 20 | | 769884311db01f400e21a903a3f1cb50 26 | | 7d12c981d262e0067ea1a04368f32f2a 30 | | 4e64bda52d54d263d16f42771b1d0225 35 | | b70c976d4c04568bd497b42a7d2e451d 40 | | d07ce2948b8618b47c351b6e222182f6 46 | | c2234393f878f5557776b7e778299564 47 | | cde904b2331fa274cd8d9266aa858342 50 | | 1235b900fb644bb36440a274314e4b6b (10 rows)
Check whether the data in columns a and c of the local_t2 table is the same as that of the local_t table.
SELECT sum(hashtext(t.a::text)) AS col_a_hash, sum(hashtext(t.c::text)) AS col_c_hash FROM local_t2 t;
Sample result:
col_a_hash | col_c_hash -------------+------------- 23725368368 | 13447976580 (1 row)
SELECT sum(hashtext(t.a::text)) AS col_a_hash, sum(hashtext(t.c::text)) AS col_c_hash FROM local_t t;
Sample result:
col_a_hash | col_c_hash -------------+------------- 23725368368 | 13447976580 (1 row)
Save the data in a format other than CSV.
Save the data in the ORC format.
COPY tt FROM 'oss://adbpg-regress/q_oss_orc_list/' ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****' SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****' FORMAT AS ORC ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';
Save the data in the PARQUET format.
COPY tp FROM 'oss://adbpg-regress/test_parquet/' ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****' SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****' FORMAT AS PARQUET ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';
Example 2
Create an AnalyticDB for PostgreSQL table.
CREATE TABLE local_manifest (a int, c text);
Create a manifest file in which the OSS objects can be in different buckets.
{ "entries": [ {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1450404435_seg2_0.csv", "mandatory":true}, {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1856683967_seg1_0.csv", "mandatory":true}, {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1880804901_seg0_0.csv", "mandatory":true}, {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1008521914_seg1_0.csv", "mandatory":true}, {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1234881553_seg2_0.csv", "mandatory":true}, {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1711667760_seg0_0.csv", "mandatory":true} ] }
Use the COPY statement to import the data from the manifest file to the AnalyticDB for PostgreSQL table.
COPY local_manifest FROM 'oss://adbpg-regress-2/unload_manifest/t_manifest' ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****' SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****' FORMAT AS CSV MANIFEST -- The data source is a manifest file. ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';
Example 3
When you use the COPY statement to import data from OSS, error lines may be returned. In this case, you can configure the following options to implement fault tolerance:
log_errors: specifies whether to record information of the error lines in log files.
segment_reject_limit:
segment_reject_limit '10'
specifies if the number of error lines is greater than or equal to 10, the system returns an error and exits.segment_reject_limit '10%'
specifies if the proportion of error lines is greater than or equal to 10%, the system returns an error and exits.
Create an AnalyticDB for PostgreSQL table.
CREATE TABLE sales(id integer, value float8, x text) DISTRIBUTED BY (id);
Use the COPY statement to import data from an OSS object that has three error lines.
COPY sales FROM 'oss://adbpg-const/error_sales/' ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****' SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****' FORMAT AS csv log_errors 'true' segment_reject_limit '10' endpoint 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw'; NOTICE: found 3 data formatting errors (3 or more input rows), rejected related input data COPY FOREIGN TABLE
Execute the following statement to query the error line details:
SELECT * FROM gp_read_error_log('<Name of the destination table of the COPY statement>');
In the following example, the error line details of the sales table are queried:
SELECT * FROM gp_read_error_log('sales');
Sample result:
cmdtime | relname | filename | linenum | bytenum | errmsg | rawdata | rawbytes -------------------------------+------------------------------------------------+-------------------------+---------+---------+-----------------------------------------------------------+---------+---------- 2021-02-08 14:24:04.225238+08 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.2.csv | 2 | | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad | | \x 2021-02-08 14:24:04.225238+08 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.2.csv | 3 | | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad | | \x 2021-02-08 14:24:04.225269+08 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.3.csv | 2 | | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad | | \x (3 rows)
NoteThe additionally saved error line logs occupy storage space. You can use the following syntax to delete the error line logs:
SELECT gp_truncate_error_log('<table_name>')
.
UNLOAD
Usage notes
When you export data to a CSV file, you must enclose options in quotation marks (") and write the options in lowercase letters. If you do not follow this requirement, some options may be processed as keywords. This may result in syntax errors. You must specify the following options in a particular way: delimiter
, quote
, null
, header
, escape
, and encoding
. Example:
UNLOAD ('SELECT * FROM test')
TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
FORMAT csv
"delimiter" '|'
"quote" '"'
"null" ''
"header" 'true'
"escape" 'E'
"encoding" 'utf-8'
FDW 'oss_fdw'
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com';
Syntax
UNLOAD ('<select_statement>')
TO <destination_url>
ACCESS_KEY_ID '<access_key_id>'
SECRET_ACCESS_KEY '<secret_access_key>'
[ [ FORMAT ] [ AS ] <data_format> ]
[ MANIFEST [ '<manifest_url>' ] ]
[ PARALLEL [ { ON | TRUE } | { OFF | FALSE } ] ]
[ option '<value>' [ ... ] ]
Parameters
Parameter | Required | Description |
select_statement | Yes | The SELECT statement. The query result data is written to OSS. |
destination_url | Yes | The URL of the OSS bucket from which data is obtained. Example: oss://<bucket_name>/path_prefix. |
access_key_id | Yes | The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user that has permissions to access OSS. For information about how to obtain your AccessKey ID, see Obtain an AccessKey pair. |
secret_access_key | Yes | The AccessKey secret of an Alibaba Cloud account or a RAM user that has permissions to access OSS. For information about how to obtain your AccessKey secret, see Obtain an AccessKey pair. |
[ FORMAT ] [ AS ] <data_format> | No | The file format in which the exported data is stored. If you do not specify this parameter, FORMAT AS CSV is used. You can set |
MANIFEST | No | Generate a manifest file when data is exported. If you specify Note If the file that is specified by |
PARALLEL | No | Specifies whether to export data from multiple compute nodes in parallel. By default, data is exported from multiple compute nodes in parallel. A separate export file is generated for each compute node. If you set this parameter to OFF or FALSE, the data is not exported in parallel. If the size of the exported data does not exceed 8 GB, the exported data is saved to a single file. |
[ option '<value>' [ ... ] ] | No | A list of options. Specify each option in the |
Options
Option | Type | Required | Description |
ENDPOINT | STRING | Yes | The OSS endpoint. For information about how to obtain the OSS endpoint, see Regions and endpoints. |
FDW | STRING | Yes | The name of the oss_fdw extension. The oss_fdw extension is required when you create a temporary OSS server for the COPY statement. |
Other options that are used to create an OSS foreign table, such as FORMAT, FILETYPE, DELIMITER, and ESCAPE | N/A | N/A | The options that are used to create a temporary OSS foreign table. For more information, see Overview of OSS foreign tables. |
Examples
Example 1
Create an AnalyticDB for PostgreSQL table and insert test data into the table.
Query data of the AnalyticDB for PostgreSQL table.
SELECT * FROM local_t LIMIT 5;
Sample result:
Use the UNLOAD statement to export the data from the specified columns of the AnalyticDB for PostgreSQL table to OSS and save the data in the CSV format.
Check whether the CSV file is written to the specified path.
$ ossutil --config hangzhou-zmf.config ls oss://adbpg-regress/local_t/
Sample result:
Check whether the CSV file contains only the data in columns a and c of the local_t table.
$ head -n 10 adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg2_0.csv
Sample result:
CREATE TABLE local_t (a int, b float8, c text);
INSERT INTO local_t SELECT r, random() * 1000, md5(random()::text) FROM generate_series(1,1000)r;
a | b | c
----+------------------+----------------------------------
5 | 550.81393988803 | 8009fa725372e996786849213a695ce0
6 | 95.8335199393332 | ce7952c6728cdffdee06cc5b502d6457
9 | 421.379795763642 | d3260ccbf6b9c03f3658d96bb7678b4d
10 | 362.347379792482 | 2bbbf89d23a2f83b089b589f55b5c4fc
11 | 800.203878898174 | a52994c5573e6b36d8a1c357bf800ce5
(5 rows)
UNLOAD ('select a, c from local_t') TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
FORMAT AS CSV
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
FDW 'oss_fdw';
NOTICE: OSS output prefix: "local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618".
UNLOAD
LastModifiedTime Size(B) StorageClass ETAG ObjectName
2020-09-07 16:48:01 +0800 CST 12023 Standard 9F38B5407142C044C1F3555F00000000 oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg0_0.csv
2020-09-07 16:48:01 +0800 CST 12469 Standard 807BA680A0DED49BC1F3555F00000000 oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg1_0.csv
2020-09-07 16:48:01 +0800 CST 12401 Standard 3524F68F628CEB64C1F3555F00000000 oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg2_0.csv
Object Number is: 3
0.153414(s) elapsed
7,1225341d0d367a69b1b345536b21ef73
19,424a7a5c36066842f4de8c8a8341fc89
27,c214432e9928e4a6f7bef7bd815424c0
29,ade5d636e2b5d2a606a02e79255da4bd
37,85660e60ede47b68493f6295620db568
77,e1be448ba2b08f0a2ca05b7ed812abfd
80,5e85d597a3b0f2f9736a728724a0f9e0
92,dc23f76f0b1446504b8f1c2274521d2f
94,50304822488d55a500e3a71bcf40890f
97,e970fde8cd0df9c6b610925a488f6042
Example 2
Use the UNLOAD statement to export data and generate a manifest file. The path prefix of the manifest file is the same as that of the data files.
View the list of exported files.
ossutil ls -s oss://adbpg-regress/local_t/
The list includes several data files and a manifest file. Sample result:
View the content of the manifest file.
ossutil cat oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_manifest
Sample result:
Use the UNLOAD statement to export data and generate a specified manifest file. The path of the manifest file may be different from that of the data files.
View the list of exported files.
ossutil ls -s oss://adbpg-regress/local_t/
Only data files are included in the list. Sample result:
View the content of the manifest file that is stored in a different bucket from the data files.
ossutil cat oss://adbpg-regress-2/unload_manifest/t_manifest
Sample result:
UNLOAD ('select * from local_t') TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
FORMAT AS CSV
MANIFEST -- Generate a manifest file when the UNLOAD statement is used to export data. The path prefix of the manifest file is the same as that of the data files.
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
FDW 'oss_fdw';
NOTICE: OSS output prefix: "local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c".
UNLOAD
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_162488956_seg1_0.csv
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_163756258_seg0_0.csv
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_1741120517_seg2_0.csv
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_manifest
Object Number is: 4
0.136180(s) elapsed
{
"entries": [
{"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_162488956_seg1_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_163756258_seg0_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_1741120517_seg2_0.csv"}
]
}
If you set the ALLOWOVERWRITE option to true, the existing manifest file is overwritten. However, the data files are not overwritten. The data files can be manually deleted.
UNLOAD ('select * from local_t') TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
FORMAT AS CSV
MANIFEST 'oss://adbpg-regress-2/unload_manifest/t_manifest' -- Generate a specified manifest file when the UNLOAD statement is used to export data.
ALLOWOVERWRITE 'true' -- Overwrite the existing manifest file.
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
FDW 'oss_fdw';
NOTICE: OSS output prefix: "local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c".
UNLOAD
oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1736161168_seg0_0.csv
oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1925769064_seg2_0.csv
oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_644328153_seg1_0.csv
Object Number is: 3
0.118540(s) elapsed
{
"entries": [
{"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1736161168_seg0_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1925769064_seg2_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_644328153_seg1_0.csv"}
]
}
FAQ
Q: Multiple CSV files were generated when I exported data. Why?
A: When you use the UNLOAD statement to export data from AnalyticDB for PostgreSQL to OSS, one CSV file is generated for each compute node. For example, if your instance has four compute nodes, four CSV files are generated.