AnalyticDB for MySQL allows you to use external tables to import and export data. This topic describes how to use external tables to export data from AnalyticDB for MySQL Data Warehouse Edition (V3.0) to Apsara File Storage for HDFS.
Prerequisites
- The AnalyticDB for MySQL cluster runs V3.1.4.4 or later.
  Note For information about how to query the minor version of a cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.
- An Apsara File Storage for HDFS cluster is created, and a folder is created in the cluster to store the data exported from AnalyticDB for MySQL. In this example, the folder is named hdfs_output_test_csv_data.
  Note When you use the INSERT OVERWRITE statement to write data to the external table, the existing files in the destination folder are overwritten. To prevent existing files from being overwritten, we recommend that you create a separate destination folder for each export.
- The following service access ports for the AnalyticDB for MySQL cluster are configured in the Apsara File Storage for HDFS cluster (see the configuration sketch after this list):
  - namenode: used to read and write the metadata of the file system. You can configure the port number by using the fs.defaultFS parameter. The default port number is 8020. For detailed configurations, see core-default.xml.
  - datanode: used to read and write data. You can configure the port number by using the dfs.datanode.address parameter. The default port number is 50010. For detailed configurations, see hdfs-default.xml.
- If the AnalyticDB for MySQL cluster is in elastic mode, you must turn on ENI in the Network Information section of the Cluster Information page.
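The following snippet is a minimal sketch of where the two port settings above are typically defined on the Hadoop side, assuming a standard configuration layout in which fs.defaultFS is set in core-site.xml and dfs.datanode.address is set in hdfs-site.xml. The host name is a placeholder, and your values may differ.

<!-- core-site.xml: namenode address that clients use to read and write file system metadata -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
</property>

<!-- hdfs-site.xml: datanode address that clients use to read and write data -->
<property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
</property>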
Usage notes
- Only CSV and Parquet files can be exported from an AnalyticDB for MySQL cluster to an Apsara File Storage for HDFS cluster. ORC files cannot be exported.
- AnalyticDB for MySQL does not support the INSERT syntax that inserts data into external tables by row, such as INSERT INTO VALUES and REPLACE INTO VALUES.
- You cannot export a single file from AnalyticDB for MySQL to Apsara File Storage for HDFS by using a partitioned external table.
- When you export data by using a partitioned external table, the data files do not contain the values of the partition columns. The partition column values are represented as Apsara File Storage for HDFS directories (see the sketch after this list).
  For example, assume that a partitioned external table defines three ordinary columns and two partition columns. The first-level partition column is named p1 and its value is 1, and the second-level partition column is named p2 and its value is a. You use this partitioned external table to export data to the adb_data/ directory of an Apsara File Storage for HDFS cluster. Because p1 is set to 1 and p2 is set to a, the data is exported to the relative path adb_data/p1=1/p2=a/. The exported CSV or Parquet files contain only the values of the three ordinary columns and do not contain the values of the p1 and p2 columns.
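Under these assumptions, the resulting layout on Apsara File Storage for HDFS looks similar to the following sketch. The file name is a placeholder; actual file names are generated during the export.

adb_data/
    p1=1/
        p2=a/
            <exported_file>.csv    (contains only the values of the three ordinary columns)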
Procedure
- Connect to the AnalyticDB for MySQL cluster. For more information, see Connect to an AnalyticDB for MySQL cluster.
- Create a source database. For more information, see Create a database.
  In this example, a database named adb_demo is used as the source database in the AnalyticDB for MySQL cluster.
- Create a source table and insert data into the source table.
Execute the following statement to create a source table named adb_hdfs_import_source in the adb_demo database:

CREATE TABLE IF NOT EXISTS adb_hdfs_import_source
(
    uid string,
    other string
)
DISTRIBUTED BY HASH(uid);

Execute the following statement to insert test data into the adb_hdfs_import_source table:

INSERT INTO adb_hdfs_import_source VALUES ("1", "a"), ("2", "b"), ("3", "c");
- Create an external table.
You can use the following syntax to create an external table in the adb_demo database. The external table is used to export AnalyticDB for MySQL data to Apsara File Storage for HDFS.
Execute the following statement to create a standard external table. In this example, the destination table is named hdfs_import_external:

CREATE TABLE IF NOT EXISTS hdfs_import_external
(
    uid string,
    other string
)
ENGINE='HDFS'
TABLE_PROPERTIES='{
    "format":"csv",
    "delimiter":",",
    "hdfs_url":"hdfs://172.17.***.***:9000/adb/hdfs_output_test_csv_data"
}';
Execute the following statement to create a partitioned external table. You must define ordinary columns (such as uid and other) and partition columns (such as p1, p2, and p3) in the statement. In this example, the destination table is named hdfs_import_external_par:

CREATE TABLE IF NOT EXISTS hdfs_import_external_par
(
    uid string,
    other string,
    p1 date,
    p2 int,
    p3 varchar
)
ENGINE='HDFS'
TABLE_PROPERTIES='{
    "format":"csv",
    "delimiter":",",
    "hdfs_url":"hdfs://172.17.***.***:9000/adb/hdfs_output_test_csv_data",
    "partition_column":"p1, p2, p3"
}';
Note Only CSV and Parquet files can be exported from an AnalyticDB for MySQL cluster to an Apsara File Storage for HDFS cluster. ORC files cannot be exported.
For the detailed syntax to create an external table, see the "Create an Apsara File Storage for HDFS external table" and "Create a partitioned Apsara File Storage for HDFS external table" sections of the Use external tables to import data to Data Warehouse Edition topic.
- Export the source data from the AnalyticDB for MySQL cluster to Apsara File Storage for HDFS.
For the detailed syntax to export data by using a standard external table, see the "Appendix 1: Syntax to export data by using a standard external table" section of this topic.
For the detailed syntax to export data by using a partitioned external table, see the "Appendix 2: Syntax to export data by using a partitioned external table" section of this topic.
What to do next
After the data is exported, you can view the exported data in the hdfs_output_test_csv_data destination folder by using the Hadoop client. You can also connect to the AnalyticDB for MySQL cluster and execute the following statement to query the data exported by using an external table. The query syntax is the same for partitioned external tables and standard external tables. In this example, the hdfs_import_external standard external table is used.
SELECT * FROM hdfs_import_external LIMIT 100;
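To check the exported files on the Apsara File Storage for HDFS side with the Hadoop client, you can list the destination folder and inspect a file. The following commands are a minimal sketch that reuses the example path from this topic; the namenode address, port, and generated file name are placeholders and will differ in your environment.

hadoop fs -ls hdfs://172.17.***.***:9000/adb/hdfs_output_test_csv_data/
hadoop fs -cat hdfs://172.17.***.***:9000/adb/hdfs_output_test_csv_data/<exported_file>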
Appendix 1: Syntax to export data by using a standard external table
If partition columns are not specified when you create an external table, you can use one of the following methods to export data:
Method 1: Use the INSERT INTO statement to import data into the external table. If the destination folder already contains data, the existing files are not modified. Each time you use this statement to write data from the source table to the Apsara File Storage for HDFS folder, a new Apsara File Storage for HDFS file is generated.
Note All columns that you want to export must be written to the external table. INSERT INTO generates new files without overwriting existing files.
Syntax:
INSERT INTO <target_table> SELECT <col_name> FROM <source_table>;
Example:
INSERT INTO hdfs_import_external SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Note col1, col2, and col3 indicate all columns of the external table.
Method 2: Use the REPLACE INTO statement to import data into the external table. You cannot define primary keys in external tables for Apsara File Storage for HDFS, so REPLACE INTO and INSERT INTO are completely equivalent. If the destination table already contains data, the existing data remains unchanged and the new data is appended when you execute the REPLACE INTO statement.
Note All columns that you want to export must be written to the external table. You cannot specify specific columns to be written. REPLACE INTO generates new files without overwriting existing files.
Syntax:
REPLACE INTO <target_table> SELECT <col_name> FROM <source_table>;
Example:
REPLACE INTO hdfs_import_external SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Method 3: Use the INSERT OVERWRITE statement to batch insert data into the external table. If the destination external table already contains data, all files in the external table path are deleted before new Apsara File Storage for HDFS files are generated.
Important All columns that you want to export must be written to the external table. You cannot specify specific columns to be written. INSERT OVERWRITE overwrites the existing data in the destination directory. Proceed with caution.
Syntax:
INSERT OVERWRITE <target_table> SELECT <col_name> FROM <source_table>;
Example:
INSERT OVERWRITE hdfs_import_external SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Method 4: Export data by asynchronously executing the INSERT OVERWRITE statement.
Syntax:
SUBMIT JOB INSERT OVERWRITE <target_table> SELECT <col_name> FROM <source_table>;
Example:
SUBMIT JOB INSERT OVERWRITE hdfs_import_external SELECT col1, col2, col3 FROM adb_hdfs_import_source;
The following result is returned:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2020112122202917203100908203303****** |
+---------------------------------------+
You can also check the state of the asynchronous task based on the job_id value. For more information, see Asynchronously submit an import task.
Appendix 2: Syntax to export data by using a partitioned external table
When you export data by using a partitioned external table, you can add the PARTITION field to the statement. You can specify partition columns and partition values in the PARTITION field to determine whether static or dynamic partitions are used.
Method 1: Use the INSERT INTO PARTITION statement to batch insert data into the partitioned external table.
Note When you write data, the data is appended to the corresponding partition. Each time you write data, a new Apsara File Storage for HDFS file is generated, and existing data is not overwritten. All columns that you want to export must be written to the external table. You cannot specify specific columns to be written.
Fully static partition
Syntax:
INSERT INTO <target_table> PARTITION(par1=val1,par2=val2,...) SELECT <col_name> FROM <source_table>;
Example:
INSERT INTO hdfs_import_external_par PARTITION(p1='2021-05-06',p2=1,p3='test') SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Static-dynamic partition
Note Static partitions must be placed before dynamic partitions. You cannot change the sequence.
Syntax:
INSERT INTO <target_table> PARTITION(par1=val1,par2,...) SELECT <col_name> FROM <source_table>;
Example:
INSERT INTO hdfs_import_external_par PARTITION(p1='2021-05-27',p2,p3) SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Fully dynamic partition (the PARTITION field is not required)
Syntax:
INSERT INTO <target_table> SELECT <col_name> FROM <source_table>;
Example:
INSERT INTO hdfs_import_external_par SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Method 2: Use the REPLACE INTO PARTITION statement to batch insert data into the partitioned external table. You cannot define primary keys in external tables for Apsara File Storage for HDFS, so REPLACE INTO PARTITION and INSERT INTO PARTITION are completely equivalent.
Note All columns that you want to export must be written to the external table. You cannot specify specific columns to be written. REPLACE INTO PARTITION generates new files without overwriting existing files.
Fully static partition
Syntax:
REPLACE INTO <target_table> PARTITION(par1=val1,par2=val2,...) SELECT <col_name> FROM <source_table>;
Example:
REPLACE INTO hdfs_import_external_par PARTITION(p1='2021-05-06',p2=1,p3='test') SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Static-dynamic partition
Note Static partitions must be placed before dynamic partitions. You cannot change the sequence.
Syntax:
REPLACE INTO <target_table> PARTITION(par1=val1,par2,...) SELECT <col_name> FROM <source_table>;
Example:
REPLACE INTO hdfs_import_external_par PARTITION(p1='2021-05-06',p2,p3) SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Fully dynamic partition (the PARTITION field is not required)
Syntax:
REPLACE INTO <target_table> SELECT <col_name> FROM <source_table>;
Example:
REPLACE INTO hdfs_import_external_par SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Method 3: Use the INSERT OVERWRITE PARTITION statement in the same manner as the INSERT INTO PARTITION statement. However, when you execute the INSERT OVERWRITE PARTITION statement, the existing data in the partitions to which new data is written is overwritten. Partitions to which no new data is written are not cleared.
Syntax:
INSERT OVERWRITE <target_table> PARTITION(par1=val1,par2=val2,...) [IF NOT EXISTS] SELECT <col_name> FROM <source_table>;
Important All columns that you want to export must be written to the external table. You cannot specify specific columns to be written. INSERT OVERWRITE PARTITION overwrites the existing data in the destination partitions. Proceed with caution.
IF NOT EXISTS: indicates that if a partition already exists in the external table, data is not imported into that partition.
Example:
INSERT OVERWRITE hdfs_import_external_par PARTITION(p1='2021-05-06',p2=1,p3='test') IF NOT EXISTS SELECT col1, col2, col3 FROM adb_hdfs_import_source;
Method 4: Export data by asynchronously executing the INSERT OVERWRITE statement.
Syntax:
SUBMIT JOB INSERT OVERWRITE <target_table> SELECT <col_name> FROM <source_table>;
Example:
SUBMIT JOB INSERT OVERWRITE hdfs_import_external_par PARTITION(p1='2021-05-06',p2=1,p3='test') IF NOT EXISTS SELECT col1, col2, col3 FROM adb_hdfs_import_source;
The following result is returned:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2020112122202917203100908203303****** |
+---------------------------------------+
You can also check the state of the asynchronous task based on the job_id value. For more information, see Asynchronously submit an import task.