AnalyticDB for MySQL allows you to access and import MaxCompute data by using external tables. This maximizes the utilization of cluster resources and improves data import performance. This topic describes how to use external tables to import data from MaxCompute to AnalyticDB for MySQL Data Warehouse Edition (V3.0).
Prerequisites
A MaxCompute project and an AnalyticDB for MySQL cluster are created in the same region. For more information, see Create a cluster.
The CIDR blocks of the virtual private cloud (VPC) in which the AnalyticDB for MySQL cluster is located are added to a whitelist of the MaxCompute project.
To find the VPC ID, log on to the AnalyticDB for MySQL console and check the Cluster Information page. Then, log on to the VPC console, locate the VPC by its ID on the VPC page, and view its CIDR blocks. For information about how to configure a MaxCompute whitelist, see Manage IP address whitelists.
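If you administer the MaxCompute project from the MaxCompute client (odpscmd), the IP address whitelist can typically be set with the setproject command. The property name and CIDR value below are illustrative assumptions, not copied from this topic; confirm the exact syntax in the Manage IP address whitelists topic before running it:

```sql
-- Illustrative sketch: run on the MaxCompute client (odpscmd) as the project owner.
-- Replace 192.168.0.0/16 with the actual CIDR blocks of your cluster's VPC.
setproject odps.security.ip.whitelist=192.168.0.0/16;
```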
Sample data
In this example, a MaxCompute project named odps_project1 and a MaxCompute table named odps_nopart_import_test are used. Execute the following statement to create the odps_nopart_import_test table:
CREATE TABLE IF NOT EXISTS odps_nopart_import_test (
    id int,
    name string,
    age int
) PARTITIONED BY (dt string);
Execute the following statement to create a partition in the odps_nopart_import_test table:
ALTER TABLE odps_nopart_import_test
ADD PARTITION (dt='202207');
Execute the following statement to insert data into the partition:
INSERT INTO odps_project1.odps_nopart_import_test
PARTITION (dt='202207')
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);
Access and import MaxCompute data in tunnel mode
Connect to the AnalyticDB for MySQL cluster. For more information, see Connect to an AnalyticDB for MySQL cluster.
Create a database.
CREATE DATABASE test_adb;
Create a MaxCompute external table. In this example, an external table named odps_nopart_import_test_external_table is used.

CREATE TABLE IF NOT EXISTS odps_nopart_import_test_external_table (
    id int,
    name string,
    age int,
    dt string
) ENGINE='ODPS'
TABLE_PROPERTIES='{
    "endpoint":"http://service.cn.maxcompute.aliyun-inc.com/api",
    "accessid":"L*******FsE",
    "accesskey":"CcwF********iWjv",
    "partition_column":"dt",
    "project_name":"odps_project1",
    "table_name":"odps_nopart_import_test"
}';
Parameter | Description
ENGINE='ODPS' | The storage engine of the external table. To read and write MaxCompute data, set the storage engine to ODPS.
endpoint | The endpoint of the MaxCompute project. Note: AnalyticDB for MySQL can access MaxCompute only by using VPC endpoints. For information about the VPC endpoint of each region, see the "Endpoints in different regions (VPC)" section of the Endpoints topic.
accessid | The AccessKey ID of the Alibaba Cloud account or Resource Access Management (RAM) user that is used to access MaxCompute. For information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.
accesskey | The AccessKey secret of the Alibaba Cloud account or RAM user that is used to access MaxCompute. For information about how to obtain an AccessKey ID and an AccessKey secret, see Accounts and permissions.
partition_column | The partition column. This parameter is required when you create a partitioned external table. If the MaxCompute table is not partitioned, do not configure this parameter, and create a non-partitioned table in AnalyticDB for MySQL.
project_name | The name of the MaxCompute project.
table_name | The name of the MaxCompute table.
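After the external table is created, you can query it directly to verify that the mapping and network connectivity work before importing anything; in tunnel mode, such a query reads the data from MaxCompute on the fly. A minimal check using the sample table from this topic:

```sql
-- Read a few rows through the external table to confirm the
-- endpoint, credentials, and column mapping are correct.
SELECT * FROM odps_nopart_import_test_external_table LIMIT 10;
```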
Create a table named adb_nopart_import_test in the test_adb database to store the data imported from MaxCompute.

CREATE TABLE IF NOT EXISTS adb_nopart_import_test (
    id int,
    name string,
    age int,
    dt string,
    PRIMARY KEY(id, dt)
) DISTRIBUTED BY HASH(id)
PARTITION BY VALUE('dt') LIFECYCLE 30;
Import data.
Method 1: Execute the INSERT INTO statement to import data. If a primary key value already exists, the row is not inserted again. In this case, INSERT INTO is equivalent to INSERT IGNORE INTO. For more information, see INSERT INTO.

INSERT INTO adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;
Execute the SELECT statement to query the data written to the table. Sample statement:
SELECT * FROM adb_nopart_import_test;
Sample result:
+------+-------+------+--------+
| id   | name  | age  | dt     |
+------+-------+------+--------+
|    1 | james |   10 | 202207 |
|    2 | bond  |   20 | 202207 |
|    3 | jack  |   30 | 202207 |
|    4 | lucy  |   40 | 202207 |
+------+-------+------+--------+
If you want to import data from a specific partition into the adb_nopart_import_test table, execute the following statement:

INSERT INTO adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table
WHERE dt = '202207';
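Because of the INSERT IGNORE semantics described for Method 1, re-running the import is idempotent with respect to the primary key. A minimal sketch based on the four sample rows from this topic:

```sql
-- Run the import a second time: rows whose primary key (id, dt)
-- already exists in adb_nopart_import_test are skipped.
INSERT INTO adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;

-- The table still contains only the four original rows.
SELECT COUNT(*) FROM adb_nopart_import_test;
```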
Method 2: Execute the INSERT OVERWRITE statement to import data. If a primary key value already exists, the existing row is overwritten by the new row. Sample statement:

INSERT OVERWRITE adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;
Method 3: Execute the INSERT OVERWRITE statement to import data asynchronously. In most cases, the SUBMIT JOB statement is used to submit an asynchronous job. You can add a hint before the data import statement to accelerate the job. For more information, see the "Asynchronous writing" section of the INSERT OVERWRITE SELECT topic. Sample statement:

SUBMIT JOB INSERT OVERWRITE adb_nopart_import_test
SELECT * FROM odps_nopart_import_test_external_table;
Sample result:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2020112122202917203100908203303****** |
+---------------------------------------+
For information about how to asynchronously submit a job, see Asynchronously submit an import job.
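To check the progress of an asynchronous job, you can query the job metadata by the job ID that SUBMIT JOB returned. The view name below is an assumption based on AnalyticDB for MySQL Data Warehouse Edition (V3.0) conventions; verify it against the Asynchronously submit an import job topic before relying on it:

```sql
-- Replace the job ID with the value returned by your SUBMIT JOB statement.
-- The result includes the job status (for example, RUNNING or FINISH).
SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
WHERE job_id = '2020112122202917203100908203303******';
```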