MaxCompute allows you to export data from a MaxCompute project to an external data store such as Object Storage Service (OSS) or Hologres. This way, other compute engines can use the data that is exported from MaxCompute to the external data store. This topic describes how to use UNLOAD statements and provides related examples.
You can execute the statements on the following platforms:
Overview
MaxCompute allows you to execute UNLOAD
statements to export data from MaxCompute to the external data store OSS or Hologres. OSS can store data in the CSV format or another open source format. Take note of the following points:
MaxCompute data must be exported to OSS before the data can be exported to other compute engines, such as Amazon Redshift and BigQuery.
If you repeatedly export data, the previously exported file is not overwritten. Instead, a new file is generated.
Limits
The use of UNLOAD statements has the following limits:
MaxCompute automatically splits the file that is exported to OSS into multiple parts and generates a name for the file. You cannot specify a custom name or file name extension for the exported file.
When you export data from MaxCompute to Hologres, dual-signature authentication is not supported.
You cannot export data from MaxCompute to a partitioned Hologres table.
File name extensions cannot be added to the exported files in an open source format.
Precautions
You are not charged for UNLOAD statements. The subquery clauses in the UNLOAD statements need to scan data and use computing resources to calculate the results. Therefore, the subquery clauses are charged as common SQL jobs.
In some scenarios, you can store structured data in OSS to reduce storage costs. However, you must estimate the costs in advance.
The MaxCompute storage fee is USD 0.018 per GB-month. For more information about storage fees, see Storage pricing (pay-as-you-go). The data compression ratio for data that is imported into MaxCompute is about 5:1. You are charged based on the size of data after compression.
If you use the Standard storage class of OSS to store your data, the unit price is USD 0.018 per GB-month. For more information about the fees for the Infrequent Access (IA), Archive, and Cold Archive storage classes, see Storage fees.
If you want to export data to reduce storage costs, we recommend that you estimate the data compression ratio based on a data feature test, estimate the costs of using UNLOAD statements based on the query statement used when you export data, and evaluate the method for accessing the exported data. This helps you prevent additional costs caused by unnecessary data migration.
Prerequisites
The operation account is granted the related permissions on MaxCompute.
Before you execute the
UNLOAD
statements by using the account, make sure that the account has the SELECT permission to read data from the table that you want to export in the MaxCompute project. For more information about how to grant permissions, see MaxCompute permissions.The operation account is granted the required permissions on the related external data store.
Before you export data from MaxCompute to the external data store OSS or Hologres by using the account, you must grant the account the permissions to access the external data store from MaxCompute. The authorization mode of the
UNLOAD
statements is the same as the authorization mode of MaxCompute external tables. The following sections describe how to authorize MaxCompute to access OSS and Hologres.Authorize MaxCompute to access OSS
You can perform one-click authorization to achieve higher security. For more information, see STS authorization.
NoteIn the following examples, one-click authorization is used and the role name is AliyunODPSDefaultRole.
Authorize MaxCompute to access Hologres
You can create a RAM role, grant the RAM role the permissions to access MaxCompute, and then add the RAM role to a Hologres instance to complete the authorization. For more information, see Create a Hologres foreign table in STS mode.
After you complete the preceding authorization, you must select an export method based on the format of the data that you want to export.
Use a built-in extractor to export data
Syntax
unload from {<select_statement>|<table_name> [partition (<pt_spec>)]} into location <external_location> stored by <StorageHandler> [with serdeproperties ('<property_name>'='<property_value>',...)];
Parameters
Export data from MaxCompute to OSS
select_statement: the
SELECT
clause that is used to query the data that you want to insert into the destination OSS directory from the source table. The source table can be a partitioned table or a non-partitioned table. For more information about theSELECT
clause, see SELECT syntax.table_name and pt_spec: You can use the table name or the combination of the table and partition names to specify the data that you want to export. This export method does not automatically generate query statements. Therefore, no fees are incurred. The value of pt_spec is in the
(partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)
format.external_location: required. The OSS directory to which you want to export data. The value of this parameter is in the
'oss://<oss_endpoint>/<object>'
format. For more information about OSS directories, see OSS domain names.StorageHandler: required. The name of the built-in storage handler that is used. Valid values:
com.aliyun.odps.CsvStorageHandler
andcom.aliyun.odps.TsvStorageHandler
. This parameter specifies a built-in storage handler that processes CSV and TSV files and defines how to read or write CSV and TSV files. You need to specify only this parameter based on your business requirements. The related logic is implemented by the system. If you use a built-in storage handler to export data, the file name extension.csv
or.tsv
is automatically added to the name of the exported file. The configuration of this parameter is the same as that for MaxCompute external tables. For more information, see Create an OSS external table.<property_name>'='<property_value>': optional. property_name specifies the name of a property and property_value specifies the value of a property. You can use this clause in the same way as you use it for MaxCompute external tables. For more information about properties, see Create an OSS external table.
Export data from MaxCompute to Hologres
select_statement: the
SELECT
clause that is used to query the data that you want to insert into the destination Hologres directory from the source table. The source table can only be a non-partitioned table. For more information about theSELECT
clause, see SELECT syntax.table_name: You can use the table name to specify the data that you want to export. This export method does not automatically generate query statements. Therefore, no fees are incurred.
external_location: required. The Hologres directory to which you want to export data. The value of this parameter is in the
'jdbc:postgresql://<endpoint>:<port>/<database>?ApplicationName=MaxCompute&[currentSchema=<schema>&][useSSL={true|false}&]table=<holo_table_name>/'
format. For more information about Hologres directories, see Syntax.StorageHandler: required. The name of the built-in storage handler that is used. Set the value to
com.aliyun.odps.jdbc.JdbcStorageHandler
. The value indicates that the Hologres external table is accessed by using JdbcStorageHandler.<property_name>'='<property_value>': required. property_name specifies the name of a property. property_value specifies the value of a property. When you export data to Hologres, you must configure the following parameters:
'odps.properties.rolearn'='<ram_arn>': specifies the Alibaba Cloud Resource Name (ARN) of the specified RAM role. The ARN is used for Security Token Service (STS) authentication. To obtain the ARN of the specified RAM role, you can perform the following steps: Log on to the Resource Access Management (RAM) console. On the Roles page, find the RAM role whose ARN you want to query and click the RAM role name in the Role Name column. On the page that appears, view the ARN in the Basic Information section.
'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver': specifies the driver that is used to connect to the Hologres database. Set the value to
org.postgresql.Driver
.'odps.federation.jdbc.target.db.type'='holo': specifies the type of the database to which the driver is connected. Set the value to
holo
.
The properties that are supported by Hologres are the same as the properties of MaxCompute external tables. For more information about properties, see Syntax.
Examples
Export data from MaxCompute to OSS
This example shows how to export data of the sale_detail table from a MaxCompute project to OSS. The sale_detail table contains the following data:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | null | c5 | NULL | 2014 | shanghai | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +------------+-------------+-------------+------------+------------+
Log on to the OSS console. Then, create the directory
mc-unload/data_location/
in the OSS bucket in theoss-cn-hangzhou
region and organize the OSS directory. For more information about how to create an OSS bucket, see Create buckets.The following OSS directory is organized based on the bucket, region, and endpoint:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
Log on to the MaxCompute client and execute the UNLOAD statement to export data from the sale_detail table to OSS. The following examples are provided:
Example 1: Export the data of the sale_detail table as a CSV file and package the file into a GZIP file. Sample statements:
-- Control the number of exported files: Set the size of data in the MaxCompute internal table read by a single worker. Unit: MB. The MaxCompute table is compressed before you export the table. The size of the exported data is about four times the data size before the export. set odps.stage.mapper.split.size=256; -- Export data. unload from (select * from sale_detail) into location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location' stored by 'com.aliyun.odps.CsvStorageHandler' with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/AliyunODPSDefaultRole', 'odps.text.option.gzip.output.enabled'='true'); -- The preceding statements are equivalent to the following statements: set odps.stage.mapper.split.size=256; unload from sale_detail into location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location' stored by 'com.aliyun.odps.CsvStorageHandler' with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/AliyunODPSDefaultRole', 'odps.text.option.gzip.output.enabled'='true');
Example 2: Export data from the partition (sale_date='2013', region='china') in the sale_detail table as a TSV file to OSS and package the file into a GZIP file.
-- Control the number of exported files: Set the size of data in the MaxCompute internal table read by a single worker. Unit: MB. The MaxCompute table is compressed before you export the table. The size of the exported data is about four times the data size before the export. set odps.stage.mapper.split.size=256; -- Export data. unload from sale_detail partition (sale_date='2013',region='china') into location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location' stored by 'com.aliyun.odps.TsvStorageHandler' with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/AliyunODPSDefaultRole', 'odps.text.option.gzip.output.enabled'='true');
'odps.text.option.gzip.output.enabled'='true'
specifies that the exported file is compressed in a GZIP file. Only the GZIP format is supported.Log on to the OSS console to view the import result in the destination OSS directory.
Import result of Example 1
Import result of Example 2
Export data from MaxCompute to Hologres
This example shows how to export data from the
data_test
table of a MaxCompute project to Hologres. Thedata_test
table contains the following data:+------------+------+ | id | name | +------------+------+ | 3 | rgege | | 4 | Gegegegr | +------------+------+
Create a Hologres table named
mc_2_holo
to which you want to write data. The database to which the table belongs is namedtest
. You can execute the following table creation statement in the SQL editor of the HoloWeb console. For more information, see Connect to HoloWeb. Sample statement:NoteThe data types of the fields in the table to which you want to write data must be the same as the data types of the fields in the MaxCompute table. For more information, see Data type mappings between MaxCompute and Hologres.
create table mc_2_holo (id int, name text);
Log on to the MaxCompute client and execute the UNLOAD statement to export data from the
data_test
table to Hologres. Sample statements:unload from (select * from data_test) into location 'jdbc:postgresql://hgprecn-cn-5y**********-cn-hangzhou-internal.hologres.aliyuncs.com:80/test?ApplicationName=MaxCompute¤tSchema=public&useSSL=false&table=mc_2_holo/' stored by 'com.aliyun.odps.jdbc.JdbcStorageHandler' with serdeproperties ( 'odps.properties.rolearn'='acs:ram::13**************:role/aliyunodpsholorole', 'mcfed.mapreduce.jdbc.driver.class'='org.postgresql.Driver', 'odps.federation.jdbc.target.db.type'='holo' );
Query the exported data in Hologres.
SELECT * FROM mc_2_holo;
The following result is returned:
id name 4 Gegegegr 3 rgege
Export data in another open source format
Syntax
unload from {<select_statement>|<table_name> [partition (<pt_spec>)]} into location <external_location> [row format serde '<serde_class>' [with serdeproperties ('<property_name>'='<property_value>',...)] ] storeds as <file_format> [properties('<tbproperty_name>'='<tbproperty_value>')];
Parameters
select_statement: the
SELECT
clause that is used to query the data that you want to insert into the destination OSS directory from the source table. The source table can be a partitioned table or a non-partitioned table. For more information about theSELECT
clause, see SELECT syntax.table_name and pt_spec: You can use the table name or the combination of the table and partition names to specify the data that you want to export. This export method does not automatically generate query statements. Therefore, no fees are incurred. The value of pt_spec is in the
(partition_col1 = partition_col_value1, partition_col2 = partition_col_value2, ...)
format.external_location: required. The OSS directory to which you want to export data. The value of this parameter is in the
'oss://<oss_endpoint>/<object>'
format. For more information about OSS directories, see OSS domain names.serde_class: optional. The configuration of this parameter is the same as that for MaxCompute external tables. For more information, see Create an OSS external table.
'<property_name>'='<property_value>': optional. property_name specifies the name of a property and property_value specifies the value of a property. You can use this clause in the same way as you use it for MaxCompute external tables. For more information about properties, see Create an OSS external table.
file_format: required. The format of the exported file, such as ORC, PARQUET, RCFILE, SEQUENCEFILE, or TEXTFILE. The configuration of this parameter is the same as that for MaxCompute external tables. For more information, see Create an OSS external table.
'<tbproperty_name>'='<tbproperty_value>': optional. tbproperty_name specifies the name of a property in the extended information of the external table. tbproperty_value specifies the value of a property in the extended information of the external table. For example, you can export open source data as a file in the SNAPPY or LZO compression format. You can configure
'mcfed.parquet.compression'='SNAPPY'
or'mcfed.parquet.compression'='LZO'
.
Examples
This example shows how to export data of the sale_detail table from a MaxCompute project to OSS. The sale_detail table contains the following data:
+------------+-------------+-------------+------------+------------+ | shop_name | customer_id | total_price | sale_date | region | +------------+-------------+-------------+------------+------------+ | s1 | c1 | 100.1 | 2013 | china | | s2 | c2 | 100.2 | 2013 | china | | s3 | c3 | 100.3 | 2013 | china | | null | c5 | NULL | 2014 | shanghai | | s6 | c6 | 100.4 | 2014 | shanghai | | s7 | c7 | 100.5 | 2014 | shanghai | +------------+-------------+-------------+------------+------------+
Log on to the OSS console. Then, create the directory
mc-unload/data_location/
in the OSS bucket in theoss-cn-hangzhou
region and organize the OSS directory. For more information about how to create an OSS bucket, see Create buckets.The following OSS directory is organized based on the bucket, region, and endpoint:
oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location
Log on to the MaxCompute client and execute the UNLOAD statement to export data from the sale_detail table to OSS. The following examples are provided:
Example 1: Export data of the sale_detail table as a file in the PARQUET format and compress the file by using Snappy. Sample statements:
-- Control the number of exported files: Set the size of data in the MaxCompute internal table read by a single worker. Unit: MB. The MaxCompute table is compressed before you export the table. The size of the exported data is about four times the data size before the export. set odps.stage.mapper.split.size=256; -- Export data. unload from (select * from sale_detail) into location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location' row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/AliyunODPSDefaultRole') stored as parquet properties('mcfed.parquet.compression'='SNAPPY');
Example 2: Export data from the partition (sale_date='2013', region='china') in the sale_detail table as a PARQUET file to OSS and compress the file by using Snappy. Sample statements:
-- Control the number of exported files: Set the size of data in the MaxCompute internal table read by a single worker. Unit: MB. The MaxCompute table is compressed before you export the table. The size of the exported data is about four times the data size before the export. set odps.stage.mapper.split.size=256; -- Export data. unload from sale_detail partition (sale_date='2013',region='china') into location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-unload/data_location' row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' with serdeproperties ('odps.properties.rolearn'='acs:ram::139699392458****:role/AliyunODPSDefaultRole') stored as parquet properties('mcfed.parquet.compression'='SNAPPY');
Log on to the OSS console to view the import result in the destination OSS directory.
Import result of Example 1
Import result of Example 2
NoteIf the exported data is compressed by using Snappy or LZO, the file name extension .snappy or .lzo of the exported file cannot be displayed.
Specify a prefix for the exported file and display the file name extension
When you execute the UNLOAD
statement to export data from a MaxCompute table as a file, you may need to specify a prefix for the file and display the file name extension in specific business scenarios. You can perform the following operations to specify the prefix and display the file name extension.
Syntax
Use a built-in extractor to export a CSV or TSV file.
-- Use a built-in extractor to export a CSV or TSV file. unload from {<select_statement>|<table_name> [partition (<pt_spec>)]} into location <external_location> [stored by <StorageHandler>] [with serdeproperties ('<property_name>'='<property_value>',...)];
You can configure the odps.external.data.prefix property to specify a custom prefix. The value can be up to 10 characters in length.
You can set the odps.external.data.enable.extension property to
true
to display the file name extension.For more information about other parameters, see Use a built-in extractor to export data.
Export a file in an open source format, such as ORC or Parquet.
unload from {<select_statement>|<table_name> [partition (<pt_spec>)]} into location <external_location> [row format serde '<serde_class>' [with serdeproperties ('<property_name>'='<property_value>',...)] ] storeds as <file_format> [properties('<tbproperty_name>'='<tbproperty_value>')];
You can configure the odps.external.data.prefix property to specify a custom prefix. The value can be up to 10 characters in length.
You can set the odps.external.data.enable.extension property to
true
to display the file name extension.For more information about other parameters, see Export data in another open source format.
File name extension description
File Format
SerDe
Extension
SEQUENCEFILE
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
.sequencefile
TEXTFILE
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
.txt
RCFILE
org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
.rcfile
ORC
org.apache.hadoop.hive.ql.io.orc.OrcSerde
.orc
PARQUET
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
.parquet
AVRO
org.apache.hadoop.hive.serde2.avro.AvroSerDe
.avro
JSON
org.apache.hive.hcatalog.data.JsonSerDe
.json
CSV
org.apache.hadoop.hive.serde2.OpenCSVSerde
.csv
Examples
Export a TXT file. Add the
mf_
prefix and display the .txt extension.set odps.sql.hive.compatible=true; set odps.sql.split.hive.bridge=true; unload from (select col_tinyint,col_binary from mf_fun_datatype limit 1) into location 'oss://oss-cn-beijing-internal.aliyuncs.com/mfosscostfee/demo6/' stored as textfile properties ('odps.external.data.prefix'='mf_', 'odps.external.data.enable.extension'='true');
View the export result in the destination OSS directory of the specified data store.
Export a CSV file. Add the
mf_
prefix and display the .csv extension.set odps.sql.hive.compatible=true; set odps.sql.split.hive.bridge=true; unload from (select col_tinyint,col_binary from mf_fun_datatype limit 2) into location 'oss://oss-cn-beijing-internal.aliyuncs.com/mfosscostfee/demo6/' stored by 'com.aliyun.odps.CsvStorageHandler' properties ('odps.external.data.prefix'='mf_', 'odps.external.data.enable.extension'='true');
View the export result in the destination OSS directory of the specified data store.
References
If you want to import data in the CVS format or another open source format from an external data store into MaxCompute, see LOAD.