AnalyticDB for MySQL Data Lakehouse Edition allows you to access and import data by using external tables. You can import data by using the regular import or elastic import method. Compared with the regular import method, the elastic import method consumes fewer resources and reduces the impact on real-time data reads and writes. This topic describes how to use external tables to query Object Storage Service (OSS) data and import data from OSS to AnalyticDB for MySQL.
Prerequisites
An AnalyticDB for MySQL Data Lakehouse Edition cluster is created.
An OSS bucket is created in the same region as the AnalyticDB for MySQL cluster.
Data is uploaded to an OSS directory. For more information, see Upload objects.
Sample data
In this example, the person object is uploaded to the testBucketName/adb/dt=2023-06-15 directory in OSS. Line feeds are used as row delimiters, and commas (,) are used as column delimiters. Sample data in the person object:
1,james,10,2023-06-15
2,bond,20,2023-06-15
3,jack,30,2023-06-15
4,lucy,40,2023-06-15
Procedure
Go to the SQL editor.
Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Job Development > SQL Development.
Import data.
You can import data by using the regular import or elastic import method. By default, the regular import method is used. The regular import method reads data on compute nodes and creates indexes on storage nodes, which consumes computing and storage resources. The elastic import method reads data and creates indexes by using serverless Spark jobs, which consumes the resources of job resource groups. Only AnalyticDB for MySQL Data Lakehouse Edition clusters of V3.1.10.0 or later that have job resource groups support the elastic import method. For more information, see Data import methods.
Regular import
Create an external database.
CREATE EXTERNAL DATABASE adb_external_db;
Use the CREATE EXTERNAL TABLE statement to create an OSS external table in the adb_external_db database. In this example, the external table is named adb_external_db.person.
Note: The OSS external table of AnalyticDB for MySQL must use the same names, number, order, and data types of fields as the OSS object.
You can create a non-partitioned OSS external table or a partitioned OSS external table; sample statements for both are shown below.
For information about the syntax, see CREATE EXTERNAL TABLE.
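The following statements are minimal sketches that match the sample person object. The Hive-compatible clauses (ROW FORMAT, STORED AS, and LOCATION) and the column types, which mirror the internal table used later in this topic, are assumptions for illustration; for the exact syntax and options, see the CREATE EXTERNAL TABLE topic referenced above.
-- Non-partitioned external table: dt is stored as a regular column,
-- and the table points directly at the dt=2023-06-15 directory.
CREATE EXTERNAL TABLE adb_external_db.person (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://testBucketName/adb/dt=2023-06-15/';
If the OSS directory is organized into key=value subdirectories such as dt=2023-06-15, you can instead declare dt as a partition column and point the table at the parent directory:
-- Partitioned external table: dt is read from the directory name
-- instead of from the object data.
CREATE EXTERNAL TABLE adb_external_db.person (
    id INT,
    name VARCHAR(1023),
    age INT
)
PARTITIONED BY (dt VARCHAR(1023))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://testBucketName/adb/';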
Query data.
After you create an external table, you can execute the SELECT statement in AnalyticDB for MySQL to query data of the external table.
SELECT * FROM adb_external_db.person;
Sample result:
+------+-------+------+------------+
| id   | name  | age  | dt         |
+------+-------+------+------------+
|    1 | james |   10 | 2023-06-15 |
|    2 | bond  |   20 | 2023-06-15 |
|    3 | jack  |   30 | 2023-06-15 |
|    4 | lucy  |   40 | 2023-06-15 |
+------+-------+------+------------+
4 rows in set (0.35 sec)
Create a database in the AnalyticDB for MySQL cluster. For more information, see CREATE DATABASE. If you have already created a database, skip this step.
CREATE DATABASE adb_demo;
Create a table in the AnalyticDB for MySQL cluster to store data that is imported from OSS.
Note: The created internal table must use the same names, number, order, and data types of fields as the external table that is created in Step b.
CREATE TABLE IF NOT EXISTS adb_demo.adb_import_test (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
) DISTRIBUTED BY HASH(id);
Import data to the table.
Method 1: Execute the INSERT INTO statement to import data. If the primary key has duplicate values, data is not repeatedly inserted, and the INSERT INTO statement is equivalent to INSERT IGNORE INTO. For more information, see INSERT INTO.
INSERT INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.person;
Method 2: Execute the INSERT OVERWRITE INTO statement to synchronously import data. If the primary key has duplicate values, the original value is overwritten by the new value.
INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_external_db.person;
Method 3: Execute the INSERT OVERWRITE INTO statement to asynchronously import data. For more information, see the "Asynchronous writing" section of the INSERT OVERWRITE SELECT topic.
SUBMIT JOB INSERT OVERWRITE adb_demo.adb_import_test
SELECT * FROM adb_external_db.person;
Elastic import
Create a database. If you have created a database, skip this step.
CREATE DATABASE adb_demo;
Create an OSS external table.
Note: The OSS external table of AnalyticDB for MySQL must use the same names, number, order, and data types of fields as the OSS object.
The elastic import method allows you to create external tables only by using the CREATE TABLE statement.
CREATE TABLE oss_import_test_external_table (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
) ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://<bucket-name>/adb/oss_import_test_data.csv",
    "accessid":"LTAI5t8sqJn5GhpBVtN8****",
    "accesskey":"HlClegbiV5mJjBYBJHEZQOnRF7****",
    "delimiter":","
}';
Important: When you create an external table, you can configure the following TABLE_PROPERTIES parameters for the CSV, Parquet, or ORC external table:
CSV: endpoint, url, accessid, accesskey, format, delimiter, null_value, and partition_column.
Parquet: endpoint, url, accessid, accesskey, format, and partition_column.
ORC: endpoint, url, accessid, accesskey, format, and partition_column.
For more information about external table parameters, see the "Non-partitioned OSS external tables" and "Partitioned OSS external tables" sections of the Use external tables to import data to Data Warehouse Edition topic.
Query data.
After you create an external table, you can execute the SELECT statement in AnalyticDB for MySQL to query data of the external table.
SELECT * FROM oss_import_test_external_table;
Sample result:
+------+-------+------+------------+
| id   | name  | age  | dt         |
+------+-------+------+------------+
|    1 | james |   10 | 2023-06-15 |
|    2 | bond  |   20 | 2023-06-15 |
|    3 | jack  |   30 | 2023-06-15 |
|    4 | lucy  |   40 | 2023-06-15 |
+------+-------+------+------------+
4 rows in set (0.35 sec)
Create a table in the AnalyticDB for MySQL cluster to store data that is imported from OSS.
Note: The created internal table must use the same names, number, order, and data types of fields as the external table that is created in Step b.
CREATE TABLE adb_import_test (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023),
    PRIMARY KEY (id)
) DISTRIBUTED BY HASH(id);
Import data.
Important: The elastic import method allows you to import data only by using the INSERT OVERWRITE INTO statement.
Method 1: Execute the INSERT OVERWRITE INTO statement to elastically import data. If the primary key has duplicate values, the original value is overwritten by the new value.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_demo.oss_import_test_external_table;
Method 2: Asynchronously execute the INSERT OVERWRITE INTO statement to elastically import data. In most cases, the SUBMIT JOB statement is used to submit an asynchronous job.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_demo.oss_import_test_external_table;
Important: When you asynchronously submit an elastic import job, you cannot configure priority queues.
Sample result:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2023081517195102101701907203151****** |
+---------------------------------------+
After you use the SUBMIT JOB statement to submit an asynchronous job, only a job ID is returned, which indicates that the asynchronous job is successfully submitted. You can use the returned job ID to terminate the asynchronous job or query the status of the asynchronous job. For more information, see Asynchronously submit an import job.
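For example, the following sketch terminates the asynchronous job by its ID. The masked job ID is the sample value returned above, and the CANCEL JOB statement is the one described in the Asynchronously submit an import job topic; replace the ID with the value returned for your own job.
-- Terminate the asynchronous import job by using the returned job ID.
CANCEL JOB "2023081517195102101701907203151******";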
Hint parameters:
elastic_load: specifies whether to use elastic import. Valid values: true and false. Default value: false.
elastic_load_configs: the configuration parameters of the elastic import feature. You must enclose the parameters within brackets ([ ]) and separate multiple parameters with vertical bars (|). The following list describes the parameters, and an example hint is shown after the list.
adb.load.resource.group.name (required): The name of the job resource group that runs the elastic import job.
adb.load.job.max.acu (optional): The maximum amount of resources for an elastic import job. Unit: AnalyticDB compute units (ACUs). Minimum value: 5 ACUs. Default value: the number of shards plus 1. Execute the following statement to query the number of shards in the cluster:
SELECT count(1) FROM information_schema.kepler_meta_shards;
spark.driver.resourceSpec (optional): The resource type of the Spark driver. Default value: small. For information about the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.
spark.executor.resourceSpec (optional): The resource type of the Spark executor. Default value: large. For information about the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.
spark.adb.executorDiskSize (optional): The disk capacity of the Spark executor. Valid values: (0,100]. Unit: GiB. Default value: 10 GiB. For more information, see the "Specify driver and executor resources" section of the Conf configuration parameters topic.
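For example, the following hint is a sketch that combines several of the preceding parameters, separated with vertical bars (|). The resource group name test_group and the medium and large resource specifications are assumed values for illustration; replace them with values that are valid for your cluster.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=test_group|spark.driver.resourceSpec=medium|spark.executor.resourceSpec=large]*/
SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_demo.oss_import_test_external_table;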
(Optional) Check whether the submitted job is an elastic import job.
SELECT job_name, (job_type = 3) AS is_elastic_load
FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
WHERE job_name = "2023081818010602101701907303151******";
Sample result:
+---------------------------------------+-----------------+
| job_name                              | is_elastic_load |
+---------------------------------------+-----------------+
| 2023081517195102101701907203151****** |               1 |
+---------------------------------------+-----------------+
If an elastic import job is submitted, 1 is returned for the is_elastic_load parameter. If a regular import job is submitted, 0 is returned.