By default, AnalyticDB for MySQL allows you to access and import MaxCompute data by using external tables in tunnel record API mode. You can also use the tunnel arrow API mode. Compared with the tunnel record API mode, the tunnel arrow API mode reads MaxCompute data in a column-oriented manner, which improves the efficiency of data access and processing.
Prerequisites
An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.
A MaxCompute project is created in the same region as the AnalyticDB for MySQL cluster.
Elastic Network Interface (ENI) is enabled for the AnalyticDB for MySQL cluster.
Note: You can log on to the AnalyticDB for MySQL console, choose in the left-side navigation pane of the cluster details page, and then turn on ENI in the Network Information section.
The CIDR blocks of the virtual private cloud (VPC) in which the AnalyticDB for MySQL cluster resides are added to an IP address whitelist of the MaxCompute project.
Note: You can log on to the AnalyticDB for MySQL console to view the VPC ID on the Cluster Information page. Then, you can log on to the VPC console and find the VPC ID on the VPC page to view the CIDR blocks. For information about how to configure a MaxCompute whitelist, see Manage IP address whitelists.
The minor version of the AnalyticDB for MySQL cluster is 3.2.2.1 or later if you use the tunnel arrow API mode to access and import MaxCompute data.
Note: To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the following statement:
SELECT adb_version();
To update the minor version of a cluster, contact technical support.
Sample Data
In this example, a MaxCompute project named test_adb and a MaxCompute table named person are used. Execute the following statement to create the person table:
CREATE TABLE IF NOT EXISTS person (
id int,
name varchar(1023),
age int)
partitioned by (dt string);
Execute the following statement to create a partition in the person table:
ALTER TABLE person
ADD
PARTITION (dt='202207');
Execute the following statement to insert data into the partition:
INSERT INTO test_adb.person
partition (dt='202207')
VALUES (1,'james',10),(2,'bond',20),(3,'jack',30),(4,'lucy',40);
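To confirm that the sample rows are written to the partition, you can optionally run a query such as the following against the MaxCompute project:
-- Verify the sample data in the dt='202207' partition.
SELECT * FROM test_adb.person WHERE dt = '202207';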
(Optional) Enable the arrow API feature
By default, AnalyticDB for MySQL allows you to access and import MaxCompute data in tunnel record API mode. If you want to use the tunnel arrow API mode to access and import MaxCompute data, you must first enable the arrow API feature. After you enable the arrow API feature, AnalyticDB for MySQL allows you to access and import MaxCompute data in tunnel arrow API mode.
Enable the arrow API feature
You can use one of the following methods to enable the arrow API feature:
Execute the following SET statement to enable the arrow API feature at the cluster level:
SET ADB_CONFIG <config_name>=<value>;
Use the following hint to enable the arrow API feature at the query level:
/*<config_name>=<value>*/ SELECT * FROM table;
Parameters
config_name | Description
ODPS_TUNNEL_ARROW_ENABLED | Specifies whether to enable the arrow API feature. Valid values: true and false.
ODPS_TUNNEL_SPLIT_BY_SIZE_ENABLED | Specifies whether to enable the dynamic splitting feature. Valid values: true and false.
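For example, based on the syntax above and the ODPS_TUNNEL_ARROW_ENABLED parameter, you can enable the arrow API feature as follows. This is only a sketch; the query-level hint assumes the adb_external_db.test_adb external table that is created later in this topic:
-- Enable the arrow API feature at the cluster level.
SET ADB_CONFIG ODPS_TUNNEL_ARROW_ENABLED=true;
-- Enable the arrow API feature only for a single query by using a hint.
/*ODPS_TUNNEL_ARROW_ENABLED=true*/ SELECT * FROM adb_external_db.test_adb;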
Procedure
You can import data by using the regular import or elastic import method. By default, the regular import method is used. If you use the regular import method, data is read from compute nodes and indexes are created on storage nodes. The regular import method consumes computing and storage resources. If you use the elastic import method, data is read and indexes are created by Serverless Spark jobs. The elastic import method consumes resources of job resource groups. Only AnalyticDB for MySQL clusters of V3.1.10.0 or later that have job resource groups support the elastic import method. Compared with the regular import method, the elastic import method consumes fewer resources. This reduces the impact on real-time data reads and writes, and improves resource isolation and data import efficiency. For more information, see Data import methods.
Regular import
Go to the SQL editor.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose .
Create an external database.
CREATE EXTERNAL DATABASE adb_external_db;
Create an external table. In this example, an external table named test_adb is created.
CREATE EXTERNAL TABLE IF NOT EXISTS adb_external_db.test_adb (
    id int,
    name varchar(1023),
    age int,
    dt string
) ENGINE='ODPS'
TABLE_PROPERTIES='{
    "accessid":"LTAILd4****",
    "endpoint":"http://service.cn-hangzhou.maxcompute.aliyun.com/api",
    "accesskey":"4A5Q7ZVzcYnWMQPysX****",
    "partition_column":"dt",
    "project_name":"test_adb",
    "table_name":"person"
}';
Note: The AnalyticDB for MySQL external table must use the same names, number, and order of fields as the MaxCompute table. The data types of the fields must be compatible between the two tables.
For information about the parameters that are used to create an external table, see CREATE EXTERNAL TABLE.
Query data.
SELECT * FROM adb_external_db.test_adb;
Sample result:
+------+-------+------+--------+
| id   | name  | age  | dt     |
+------+-------+------+--------+
|    1 | james |   10 | 202207 |
|    2 | bond  |   20 | 202207 |
|    3 | jack  |   30 | 202207 |
|    4 | lucy  |   40 | 202207 |
+------+-------+------+--------+
4 rows in set (0.35 sec)
Perform the following steps to import data from MaxCompute to AnalyticDB for MySQL:
Create a database in the AnalyticDB for MySQL cluster.
CREATE DATABASE adb_demo;
Create a table in the AnalyticDB for MySQL database to store data that is imported from MaxCompute.
Note: The created table must use the same number and order of fields as the external table that is created in Step 3. The data types of the fields must be compatible between the two tables.
CREATE TABLE IF NOT EXISTS adb_demo.adb_import_test (
    id int,
    name string,
    age int,
    dt string,
    PRIMARY KEY(id,dt)
) DISTRIBUTED BY HASH(id) PARTITION BY VALUE('dt');
Write data to the table.
Method 1: Execute the INSERT INTO statement to import data. If the primary key has duplicate values, data is not repeatedly inserted and the INSERT INTO statement is equivalent to the INSERT IGNORE INTO statement. For more information, see INSERT INTO.
INSERT INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.test_adb;
If you want to import data from a specific partition to the adb_demo.adb_import_test table, you can execute the following statement:
INSERT INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.test_adb
WHERE dt = '202207';
Method 2: Execute the INSERT OVERWRITE INTO statement to import data. If the primary key has duplicate values, the original value is overwritten by the new value.
INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_external_db.test_adb;
Method 3: Execute the INSERT OVERWRITE INTO statement to asynchronously import data. In most cases, the SUBMIT JOB statement is used to submit an asynchronous job. You can add a hint (/*+ direct_batch_load=true*/) before the data import statement to accelerate the job. For more information, see the "Asynchronous writing" section of the INSERT OVERWRITE SELECT topic.
SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.test_adb;
Sample result:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2020112122202917203100908203303****** |
+---------------------------------------+
For information about how to asynchronously submit a job, see Asynchronously submit an import job.
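As a quick check, you can also look up the returned job ID in the INFORMATION_SCHEMA.kepler_meta_async_jobs view that is used later in this topic. This is only a sketch; replace the job ID with the value returned by your SUBMIT JOB statement, and see Asynchronously submit an import job for the documented way to query or terminate the job:
-- Look up the asynchronous import job by its job ID.
SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs WHERE job_name = '2020112122202917203100908203303******';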
Elastic import
Go to the SQL editor.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose .
Create a database. If you have already created a database, skip this step.
CREATE DATABASE adb_demo;
Create an external table.
Note: The name of the AnalyticDB for MySQL external table must be the same as that of the MaxCompute project. Otherwise, the external table fails to be created.
The AnalyticDB for MySQL external table must use the same names, number, and order of fields as the MaxCompute table. The data types of the fields must be compatible between the two tables.
The elastic import method allows you to create external tables only by using the CREATE TABLE statement.
CREATE TABLE IF NOT EXISTS test_adb (
    id int,
    name string,
    age int,
    dt string
) ENGINE='ODPS'
TABLE_PROPERTIES='{
    "endpoint":"http://service.cn-hangzhou.maxcompute.aliyun-inc.com/api",
    "accessid":"LTAILd4****",
    "accesskey":"4A5Q7ZVzcYnWMQPysX****",
    "partition_column":"dt",
    "project_name":"test_adb",
    "table_name":"person"
}';
For information about the external table parameters, see the parameter table in the Use external tables to import data to Data Warehouse Edition topic.
Query data.
SELECT * FROM adb_demo.test_adb;
Sample result:
+------+-------+------+--------+
| id   | name  | age  | dt     |
+------+-------+------+--------+
|    1 | james |   10 | 202207 |
|    2 | bond  |   20 | 202207 |
|    3 | jack  |   30 | 202207 |
|    4 | lucy  |   40 | 202207 |
+------+-------+------+--------+
4 rows in set (0.35 sec)
Create a table in the AnalyticDB for MySQL database to store data that is imported from MaxCompute.
Note: The created internal table must use the same names, number, order, and data types of fields as the external table that is created in Step 3.
CREATE TABLE IF NOT EXISTS adb_import_test (
    id int,
    name string,
    age int,
    dt string,
    PRIMARY KEY(id,dt)
) DISTRIBUTED BY HASH(id) PARTITION BY VALUE('dt') LIFECYCLE 30;
Import data.
Important: The elastic import method allows you to import data only by using the INSERT OVERWRITE INTO statement.
Method 1: Execute the INSERT OVERWRITE INTO statement to elastically import data. If the primary key has duplicate values, the original value is overwritten by the new value.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group|spark.adb.eni.vswitchId=vsw-bp12ldm83z4zu9k4d****]*/ INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;
Method 2: Asynchronously execute the INSERT OVERWRITE INTO statement to elastically import data. In most cases, the SUBMIT JOB statement is used to submit an asynchronous job.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group|spark.adb.eni.vswitchId=vsw-bp12ldm83z4zu9k4d****]*/ SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;
Important: When you asynchronously submit an elastic import job, you cannot configure priority queues.
Sample result:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2023081517192220291720310090151****** |
+---------------------------------------+
After you use the SUBMIT JOB statement to submit an asynchronous job, only a job ID is returned, which indicates that the asynchronous job is successfully submitted. You can use the returned job ID to terminate the asynchronous job or query the status of the asynchronous job. For more information, see Asynchronously submit an import job.
Hint parameters:
elastic_load: specifies whether to use elastic import. Valid values: true and false. Default value: false.
elastic_load_configs: the configuration parameters of the elastic import feature. You must enclose the parameters within brackets ([ ]) and separate multiple parameters with vertical bars (|). The following table describes the parameters, and a combined example is provided after the table.
Parameter | Required | Description
adb.load.resource.group.name | Yes | The name of the job resource group that runs the elastic import job.
adb.load.job.max.acu | No | The maximum amount of resources for an elastic import job. Unit: AnalyticDB compute units (ACUs). Minimum value: 5 ACUs. Default value: the number of shards plus 1. To query the number of shards in the cluster, execute the SELECT count(1) FROM information_schema.kepler_meta_shards; statement.
spark.driver.resourceSpec | No | The resource type of the Spark driver. Default value: small. For information about the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.
spark.executor.resourceSpec | No | The resource type of the Spark executor. Default value: large. For information about the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.
spark.adb.executorDiskSize | No | The disk capacity of the Spark executor. Valid values: (0,100]. Unit: GiB. Default value: 10 GiB. For more information, see the "Specify driver and executor resources" section of the Conf configuration parameters topic.
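For example, the following statement combines several of these parameters in a single hint. This is only a sketch: the resource group name, vSwitch ID, and the adb.load.job.max.acu and spark.executor.resourceSpec values are placeholders that you must replace with values that match your cluster.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group|spark.adb.eni.vswitchId=vsw-bp12ldm83z4zu9k4d****|adb.load.job.max.acu=16|spark.executor.resourceSpec=large]*/ SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test SELECT * FROM adb_demo.test_adb;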
(Optional) Check whether the submitted job is an elastic import job.
SELECT job_name, (job_type = 3) AS is_elastic_load FROM INFORMATION_SCHEMA.kepler_meta_async_jobs WHERE job_name = "2023081818010602101701907303151******";
Sample result:
+---------------------------------------+-----------------+
| job_name                              | is_elastic_load |
+---------------------------------------+-----------------+
| 2023081517195203101701907203151****** |               1 |
+---------------------------------------+-----------------+
If an elastic import job is submitted, 1 is returned for the is_elastic_load parameter. If a regular import job is submitted, 0 is returned.