AnalyticDB for MySQL provides a variety of solutions for you to synchronize data in different scenarios. This topic describes how to import objects from Object Storage Service (OSS) to the adb_demo database of an AnalyticDB for MySQL cluster and query the imported data in AnalyticDB for MySQL.
Prerequisites
- A directory is created in an OSS bucket to store the data that you want to import into the AnalyticDB for MySQL cluster. To create the directory, perform the following operations:
- Activate OSS. For more information, see Activate OSS.
- Create an OSS bucket. For more information, see Create buckets. Important: Make sure that the OSS bucket resides in the same region as the AnalyticDB for MySQL cluster.
- Create a directory. For more information, see Create a directory.
- Upload an object. For more information, see Upload objects. In this example, the oss_import_test_data.txt object is uploaded to the <bucket-name>.oss-cn-hangzhou.aliyuncs.com/adb/ directory in OSS. The row delimiter is a line feed, and the column delimiter is a semicolon (;). The following code shows some of the data contained in this object:
  uid;other
  12;hello_world_1
  27;hello_world_2
  28;hello_world_3
  33;hello_world_4
  37;hello_world_5
  40;hello_world_6
  ...
- An AnalyticDB for MySQL cluster is created. A whitelist is configured for the cluster. An account and a database are created for the cluster. For more information, see Use AnalyticDB for MySQL Data Warehouse Edition (V3.0).
Procedure
- Execute a CREATE TABLE statement to create an external table in the adb_demo database. For more information about how to create an OSS external table for an object in the CSV, Parquet, or TEXT format, see the "Syntax for creating an OSS external table" section of this topic.
- Query OSS data.
You can query external table data in the same manner as you query AnalyticDB for MySQL table data. Example:
SELECT uid, other FROM oss_import_test_external_table WHERE uid < 100 LIMIT 10;
- If an object is in the CSV or TEXT format and contains a large amount of data, we recommend that you import the object to AnalyticDB for MySQL before you query the data. Otherwise, the query performance may be compromised.
- If an object is in the Parquet format, you can determine whether to directly query the data or import the object to AnalyticDB for MySQL before you query the data.
- Execute a CREATE TABLE statement to create a destination table named adb_oss_import_test in the adb_demo database to store the data imported from OSS.
  CREATE TABLE IF NOT EXISTS adb_oss_import_test
  (
      uid string,
      other string
  )
  DISTRIBUTED BY HASH(uid);
- Execute an INSERT statement to import data from OSS to AnalyticDB for MySQL. Important: By default, INSERT INTO and INSERT OVERWRITE INTO SELECT statements import data synchronously. If hundreds of gigabytes of data are imported, the import may fail because the client is disconnected from the AnalyticDB for MySQL server. Therefore, if you want to import a large amount of data, we recommend that you execute the SUBMIT JOB INSERT OVERWRITE INTO SELECT statement to import data asynchronously.
- Method 1: Execute the INSERT INTO statement to import data. If the primary key has duplicate values, data is not repeatedly inserted, and the INSERT INTO statement is equivalent to the INSERT IGNORE INTO statement. For more information, see INSERT INTO. Sample statement:
  INSERT INTO adb_oss_import_test
  SELECT * FROM oss_import_test_external_table;
- Method 2: Execute the INSERT OVERWRITE INTO statement to import data. If the primary key has duplicate values, the original value is overwritten by the new value. Sample statement:
INSERT OVERWRITE INTO adb_oss_import_test SELECT * FROM oss_import_test_external_table;
- Method 3: Asynchronously execute the INSERT OVERWRITE INTO statement to import data. Typically, SUBMIT JOB is used to submit an asynchronous task. You can add a hint (/* direct_batch_load=true */) before the data import statement to accelerate the task. For more information, see Asynchronous writing. Sample statement:
  SUBMIT JOB INSERT OVERWRITE INTO adb_oss_import_test
  SELECT * FROM oss_import_test_external_table;
  The following information is returned:
  +---------------------------------------+
  | job_id                                |
  +---------------------------------------+
  | 2020112122202917203100908203303****** |
  +---------------------------------------+
For more information about how to submit asynchronous tasks, see Asynchronously submit an import task.
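After SUBMIT JOB returns a job_id, the task runs in the background and its state can be polled. As a sketch, recent AnalyticDB for MySQL versions document an INFORMATION_SCHEMA view for this purpose; the view name below is taken from that interface and may differ by cluster version, so treat it as an assumption and check the referenced topic for your version:

```sql
-- Assumption: the kepler_meta_async_jobs view is available in your
-- cluster version. Replace the job_id with the value returned by SUBMIT JOB.
SELECT *
FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
WHERE job_id = '2020112122202917203100908203303******';
```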
- Execute the following statement to query the data of the adb_oss_import_test table:
  SELECT * FROM adb_oss_import_test;
Syntax for creating an OSS external table
- Create an OSS external table for an object that is in the CSV format
The oss_import_test_data.txt object is in the CSV format. The following statement shows how to create an OSS external table for an object in the CSV format:
CREATE TABLE IF NOT EXISTS oss_import_test_external_table
(
    uid string,
    other string
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://<bucket-name>/adb/oss_import_test_data.txt",
    "accessid":"LTAIF****5FsE",
    "accesskey":"Ccw****iWjv",
    "delimiter":";",
    "skip_header_line_count":1
}';
The following list describes the parameters:
- ENGINE='OSS': required. The storage engine that is used for the external table, which is OSS.
- TABLE_PROPERTIES: required. The connection information that is used by AnalyticDB for MySQL to access OSS.
- endpoint: required. The endpoint of the OSS bucket. Note: AnalyticDB for MySQL can access OSS only from Elastic Compute Service (ECS) instances over VPCs. You can log on to the OSS console, find the bucket, and then view the endpoint on the Overview page.
- url: required. The absolute path of the source object or directory in OSS. We recommend that you add a forward slash (/) at the end of the absolute path of a directory. Examples:
  - Object: oss://<bucket-name>/adb/oss_import_test_data.csv
  - Directory: oss://<bucket-name>/adb_data/
- accessid: required. The AccessKey ID that is used to access the object or directory in OSS. For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain an AccessKey pair.
- accesskey: required. The AccessKey secret that is used to access the object or directory in OSS.
- delimiter: required. The column delimiter of the CSV object. For example, you can set the column delimiter to a comma (,).
- null_value: optional. The NULL value definition of the CSV object. By default, an empty string is defined as NULL, which is "null_value": "". Note: This parameter is supported only for AnalyticDB for MySQL clusters of V3.1.4.2 or later. For more information, see Release notes.
- ossnull: optional. The rule for defining the NULL value of the CSV object. Default value: 1. Valid values:
  - 1: EMPTY_SEPARATORS. Only empty strings are defined as NULL. For example, a,"",,c is expressed as "a","",NULL,"c" based on this rule.
  - 2: EMPTY_QUOTES. Only quotation marks ("") are defined as NULL. For example, a,"",,c is expressed as "a",NULL,"","c" based on this rule.
  - 3: BOTH. Both empty strings and quotation marks ("") are defined as NULL. For example, a,"",,c is expressed as "a",NULL,NULL,"c" based on this rule.
  - 4: NEITHER. Neither empty strings nor quotation marks ("") are defined as NULL. For example, a,"",,c is expressed as "a","","","c" based on this rule.
  Note: The preceding examples are provided on the premise of "null_value": "".
- skip_header_line_count: optional. The number of header rows to skip when you import data. The first row of a CSV object is the table header. If you set this parameter to 1, the first row of the object is skipped when you import data. Default value: 0, which indicates that no rows are skipped.
- oss_ignore_quote_and_escape: optional. Specifies whether to ignore quotation marks (") and escape characters. Default value: false, which indicates that quotation marks (") and escape characters are not ignored. Note: This parameter is supported only for AnalyticDB for MySQL clusters of V3.1.4.2 or later. For more information, see Release notes.
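The null_value and ossnull parameters can be combined in TABLE_PROPERTIES. The following sketch is illustrative: the table name, the comma delimiter, and the object path are assumptions, and null_value requires a cluster of V3.1.4.2 or later. It defines both empty strings and empty quotation marks as NULL:

```sql
-- Sketch: CSV external table that maps both "" and empty fields to NULL
-- ("ossnull":3, i.e. BOTH). Table name and path are placeholders.
CREATE TABLE IF NOT EXISTS oss_csv_null_external_table
(
    uid string,
    other string
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://<bucket-name>/adb/oss_import_test_data.csv",
    "accessid":"LTAIF****5FsE",
    "accesskey":"Ccw****iWjv",
    "delimiter":",",
    "null_value":"",
    "ossnull":3
}';
```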
- Create an OSS external table for an object that is in the Parquet or ORC format
- The following statement shows how to create an external table for an object in the Parquet format:
  CREATE TABLE IF NOT EXISTS oss_import_test_external_table
  (
      uid string,
      other string
  )
  ENGINE='OSS'
  TABLE_PROPERTIES='{
      "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
      "url":"oss://<bucket-name>/adb_data/",
      "accessid":"LTAIF****5FsE",
      "accesskey":"Ccw****iWjv",
      "format":"parquet"
  }';
The following list describes the parameters:
- ENGINE='OSS': The storage engine that is used for the external table, which is OSS.
- TABLE_PROPERTIES: The connection information that is used by AnalyticDB for MySQL to access OSS.
- endpoint: The endpoint of the OSS bucket. Note: AnalyticDB for MySQL can access OSS only from ECS instances over VPCs. You can log on to the OSS console, find the bucket, and then view the endpoint on the Overview page.
- url: The absolute path of the source object or directory in OSS. We recommend that you add a forward slash (/) at the end of the absolute path of a directory. Examples:
  - Object: oss://<bucket-name>/adb/oss_import_test_data.parquet
  - Directory: oss://<bucket-name>/adb_data/
  Note: When you create an external table, specify an actual absolute path as url.
- accessid: The AccessKey ID that is used to access the object or directory in OSS. For more information about how to obtain an AccessKey ID and an AccessKey secret, see Obtain an AccessKey pair.
- accesskey: The AccessKey secret that is used to access the object or directory in OSS.
- format: The format of the object.
  - When you create an external table in the Parquet format, you must set this parameter to parquet.
  - When you create an external table in the ORC format, you must set this parameter to orc.
  Note: If the format parameter is not specified, the CSV format is used.
Note:
- The column names used in the statement to create an external table must be the same as those in the Parquet or ORC file. Column names are case-insensitive. The sequence of the columns in the statement must be the same as that in the Parquet or ORC file.
- When you create an external table, you can choose only specific columns in a Parquet or ORC file as the columns of the external table. Columns that are not selected in the Parquet or ORC file are not imported.
- If the statement used to create an external table contains a column that is not in the Parquet or ORC file, NULL is returned for this column.
When you create an external table for an object in the Parquet format, take note of the following mappings between data types in Parquet and AnalyticDB for MySQL V3.0:

| Basic type in Parquet | Logical type in Parquet | Data type in AnalyticDB for MySQL V3.0 |
| --- | --- | --- |
| BOOLEAN | None | BOOLEAN |
| INT32 | INT_8 | TINYINT |
| INT32 | INT_16 | SMALLINT |
| INT32 | None | INT or INTEGER |
| INT64 | None | BIGINT |
| FLOAT | None | FLOAT |
| DOUBLE | None | DOUBLE |
| FIXED_LEN_BYTE_ARRAY, BINARY, INT64, or INT32 | DECIMAL | DECIMAL |
| BINARY | UTF-8 | VARCHAR, STRING, or JSON (JSON is available if the Parquet object contains a column in the JSON format) |
| INT32 | DATE | DATE |
| INT64 | TIMESTAMP_MILLIS | TIMESTAMP or DATETIME |
| INT96 | None | TIMESTAMP or DATETIME |

Important: External tables in the Parquet format do not support the STRUCT data type. Tables cannot be created for this data type.
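Per the format parameter described above, an ORC external table differs from the Parquet example only in the format value. The following is a sketch: the table name and directory path are placeholders, and the endpoint and AccessKey pair reuse the masked values from the Parquet example:

```sql
-- Sketch: ORC external table. Only "format":"orc" differs from the
-- Parquet example; table name and directory are illustrative placeholders.
CREATE TABLE IF NOT EXISTS oss_orc_test_external_table
(
    uid string,
    other string
)
ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://<bucket-name>/adb_orc_data/",
    "accessid":"LTAIF****5FsE",
    "accesskey":"Ccw****iWjv",
    "format":"orc"
}';
```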
Create an OSS external table for a partitioned object that is in the Parquet or CSV format
Partitioned data is stored in OSS as a directory hierarchy in which each level encodes one partition column and its value. For example, the following directory structure stores data that is partitioned by the p1, p2, and p3 columns:
parquet_partition_classic/
├── p1=2020-01-01
│ ├── p2=4
│ │ ├── p3=SHANGHAI
│ │ │ ├── 000000_0
│ │ │ └── 000000_1
│ │ └── p3=SHENZHEN
│ │ └── 000000_0
│ └── p2=6
│ └── p3=SHENZHEN
│ └── 000000_0
├── p1=2020-01-02
│ └── p2=8
│ ├── p3=SHANGHAI
│ │ └── 000000_0
│ └── p3=SHENZHEN
│ └── 000000_0
└── p1=2020-01-03
└── p2=6
        ├── p3=HANGZHOU
└── p3=SHENZHEN
└── 000000_0
CREATE TABLE IF NOT EXISTS oss_parquet_partition_table
(
uid varchar,
other varchar,
p1 date,
p2 int,
p3 varchar
)
ENGINE='OSS'
TABLE_PROPERTIES='{
"endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
"url":"oss://<bucket-name>/adb/oss_parquet_data_dir/",
"accessid":"LTAIF****5FsE",
"accesskey":"Ccw****iWjv",
"format":"parquet",
"partition_column":"p1, p2, p3"
}';
- The partition_column property of TABLE_PROPERTIES must declare the partition columns (such as p1, p2, and p3 in the example).
- When you define partition columns in an external table, you must specify the columns that contain partitions (such as p1, p2, and p3 in the example) and their data types. You must specify the partition columns at the end of the CREATE TABLE statement.
- The sequence of partition columns defined in the statement used to create an external table must be the same as the sequence of partition columns specified by the partition_column property.
- Partition columns support the following data types: BOOLEAN, TINYINT, SMALLINT, INT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, STRING, DATE, and TIMESTAMP.
- The syntax used to query partition columns and the way in which the query results are displayed are the same as those for other columns.
- If the format parameter is not specified, the CSV format is used.
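Because partition columns are queried like any other columns, a query can filter on them directly. The following sketch uses the example table and the partition values from the directory structure above; the LIMIT clause is illustrative:

```sql
-- Read rows from a single partition of the example table;
-- p1 and p3 values match the sample directory layout above.
SELECT uid, other, p1, p2, p3
FROM oss_parquet_partition_table
WHERE p1 = '2020-01-01' AND p3 = 'SHANGHAI'
LIMIT 10;
```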
References
For more information about how to import data to AnalyticDB for MySQL, see Supported data sources.