ApsaraDB for SelectDB works with Object Storage Service (OSS) to provide the high-speed import feature over an internal network. You can use this feature to import hundreds of GB of data at a time. This topic describes how to use OSS to import data to an ApsaraDB for SelectDB instance.
Prerequisites
An AccessKey pair is obtained. For more information, see Create an AccessKey pair.
OSS is activated, and an OSS bucket is created in the same region as the ApsaraDB for SelectDB instance to which you want to import data. For more information, see Get started by using the OSS console.
OSS can be accessed over a virtual private cloud (VPC).
Syntax for importing data
LOAD LABEL load_label
(
data_desc1[, data_desc2, ...]
)
WITH S3
(
"AWS_PROVIDER" = "OSS",
"AWS_REGION" = "REGION",
"AWS_ENDPOINT" = "ENDPOINT",
"AWS_ACCESS_KEY" = "AccessKey ID",
"AWS_SECRET_KEY" = "AccessKey Secret"
)
PROPERTIES
(
"key1"="value1", ...
);
ApsaraDB for SelectDB allows you to import data from all object storage systems that are compatible with the Amazon Simple Storage Service (Amazon S3) protocol. Therefore, the syntax for importing data uses keywords such as AWS and S3.
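Because the import statement combines several clauses, the following Python sketch shows how the pieces fit together by assembling a minimal LOAD statement as a string. The helper name and all placeholder values are hypothetical; the code only performs string formatting and does not submit a job to the instance.

```python
# Hypothetical helper that assembles the LOAD statement described above.
# It performs only string formatting; it does not connect to SelectDB.

def build_oss_load_sql(label, file_path, table, region, endpoint,
                       access_key, secret_key, separator=",",
                       timeout_s=3600):
    """Return a LOAD LABEL statement that imports one CSV file from OSS."""
    return (
        f'LOAD LABEL {label}\n'
        f'(\n'
        f'    DATA INFILE("{file_path}")\n'
        f'    INTO TABLE `{table}`\n'
        f'    COLUMNS TERMINATED BY "{separator}"\n'
        f')\n'
        f'WITH S3\n'
        f'(\n'
        f'    "AWS_PROVIDER" = "OSS",\n'
        f'    "AWS_REGION" = "{region}",\n'
        f'    "AWS_ENDPOINT" = "{endpoint}",\n'
        f'    "AWS_ACCESS_KEY" = "{access_key}",\n'
        f'    "AWS_SECRET_KEY" = "{secret_key}"\n'
        f')\n'
        f'PROPERTIES\n'
        f'(\n'
        f'    "timeout" = "{timeout_s}"\n'
        f');'
    )

sql = build_oss_load_sql(
    label="test_db.test_label_1",
    file_path="s3://your_bucket_name/test_file.txt",
    table="test_table",
    region="oss-cn-beijing",
    endpoint="oss-cn-beijing-internal.aliyuncs.com",
    access_key="<your_access_key>",
    secret_key="<your_secret_key>",
)
print(sql)
```

The generated statement matches the syntax shown above; you can submit it from any MySQL-compatible client that is connected to the instance.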
Parameters
The `data_desc1` parameter describes a group of files to be imported. The following sample code provides the complete definition of the parameter.
[MERGE|APPEND|DELETE]
DATA INFILE
(
"file_path1"[, file_path2, ...]
)
[NEGATIVE]
INTO TABLE `table_name`
[PARTITION (p1, p2, ...)]
[COLUMNS TERMINATED BY "column_separator"]
[FORMAT AS "file_type"]
[(column_list)]
[COLUMNS FROM PATH AS (c1, c2, ...)]
[PRECEDING FILTER predicate]
[SET (column_mapping)]
[WHERE predicate]
[DELETE ON expr]
[ORDER BY source_sequence]
[PROPERTIES ("key1"="value1", ...)]
| Parameter | Description |
| --- | --- |
| `MERGE` \| `APPEND` \| `DELETE` | The type of data merging. Default value: `APPEND`, which indicates that the import is a regular append operation. The `MERGE` and `DELETE` types are applicable only to tables that use the Unique key model. |
| `file_path` | The path of the file to be imported. You can specify multiple values for this parameter. Wildcards are supported. The path that you specify must point to a file. If the path points to a directory, the import fails. |
| `NEGATIVE` | Imports the data of this batch in negative mode. This mode is applicable only to aggregate tables whose integer value columns are aggregated by using the SUM method. In this mode, the values of the SUM-aggregated columns in the imported data are negated to offset erroneous data that was previously imported. |
| `PARTITION` | The partitions of the table into which data is imported. Data that does not belong to the specified partitions is ignored. |
| `COLUMNS TERMINATED BY` | The column delimiter. The column delimiter is valid only for files in the CSV format. |
| `FORMAT AS` | The format of the file to be imported. Valid values: `CSV`, `PARQUET`, and `ORC`. Default value: `CSV`. |
| `column_list` | The order in which the columns are sorted in the source file. For more information, see Converting Source Data. |
| `COLUMNS FROM PATH AS` | The columns that are extracted from the path of the file to be imported. |
| `PRECEDING FILTER` | The conditions for pre-filtering data. Data is first concatenated into source data rows in sequence based on the `column_list` and `COLUMNS FROM PATH AS` parameters, and is then filtered based on the pre-filter conditions. |
| `SET` | The function that is used to convert the specified column. |
| `WHERE` | The conditions based on which the imported data is filtered. For more information, see Converting Source Data. |
| `DELETE ON` | This parameter must be used together with the `MERGE` import mode and is applicable only to tables that use the Unique key model. This parameter specifies the conditions that mark the rows to be deleted in the imported data. |
| `ORDER BY` | The `sequence_col` column in the data to be imported. This parameter is applicable only to tables that use the Unique key model and ensures the data sequence when data is imported. |
| `PROPERTIES` | The parameters that specify the format of the file to be imported. For example, if the file to be imported is in the JSON format, you can specify parameters such as `json_root`, `jsonpaths`, and `fuzzy_parse`. |
The following table describes the parameters that are used to configure the cloud storage service.
| Parameter | Description |
| --- | --- |
| `AWS_PROVIDER` | The provider of the object storage service. Set the value to `OSS`. |
| `AWS_REGION` | The ID of the region in which the OSS bucket to be accessed resides. Example: `oss-cn-beijing`. |
| `AWS_ENDPOINT` | The endpoint of OSS. Use an internal endpoint to access OSS over a VPC. Example: `oss-cn-beijing-internal.aliyuncs.com`. **Important:** Make sure that the OSS bucket and your ApsaraDB for SelectDB instance reside in the same region. |
| `AWS_ACCESS_KEY` | The AccessKey ID of your Alibaba Cloud account. |
| `AWS_SECRET_KEY` | The AccessKey secret of your Alibaba Cloud account. |
The following table describes the parameters of `PROPERTIES`.
| Parameter | Description |
| --- | --- |
| `timeout` | The timeout period for the import job. Unit: seconds. Default value: `14400`. |
| `max_filter_ratio` | The maximum ratio of data that can be filtered out, for example, because the data does not conform to standards. Default value: `0`, which indicates that no error rows are tolerated. |
| `exec_mem_limit` | The maximum size of memory that is available for the import job. Unit: bytes. Default value: `2147483648` (2 GB). |
| `strict_mode` | Specifies whether to enable the strict mode for the import job. Default value: `false`. |
| `timezone` | The time zone used by time zone-dependent functions, such as `strftime`, `alignment_timestamp`, and `from_unixtime`. Default value: `Asia/Shanghai`. |
| `load_parallelism` | The maximum concurrency of the import job. Default value: `1`. |
| `send_batch_parallelism` | The maximum parallelism for sending data in batches for processing. If the value of this parameter is greater than the value of the `max_send_batch_parallelism_per_job` parameter in the backend (BE) configuration, the value of `max_send_batch_parallelism_per_job` is used. |
| `load_to_single_tablet` | Specifies whether to import data to only one tablet of the corresponding partition. Default value: `false`. |
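To make the filter-ratio behavior concrete, the following Python sketch models how a job's success depends on the maximum allowed filter ratio. This is a conceptual illustration only, not the engine's actual code, and the row counts are made up.

```python
# Conceptual model: an import job fails once the fraction of filtered-out
# (non-conforming) rows exceeds the configured maximum filter ratio.

def job_succeeds(total_rows, filtered_rows, max_filter_ratio=0.0):
    """Return True if the job stays within the allowed filter ratio."""
    if total_rows == 0:
        return True
    return filtered_rows / total_rows <= max_filter_ratio

# With the default ratio of 0, a single bad row fails the whole job.
print(job_succeeds(1000, 1))        # prints False
print(job_succeeds(1000, 1, 0.01))  # prints True (0.1% <= 1% tolerance)
```

Raising the ratio trades completeness for robustness: the job tolerates some dirty rows instead of failing outright.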
Examples
Create an ApsaraDB for SelectDB table to which you want to import data. Sample code:
CREATE TABLE test_table
(
    id int,
    name varchar(50),
    age int,
    address varchar(50),
    url varchar(500)
)
DISTRIBUTED BY HASH(id) BUCKETS 4
PROPERTIES("replication_num" = "1");
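The DISTRIBUTED BY HASH(id) BUCKETS 4 clause spreads rows across four tablets by hashing the `id` column. The following Python sketch illustrates the idea only: SelectDB's actual hash function differs, and the modulo here is a simple stand-in.

```python
# Conceptual sketch of hash bucketing: each row is assigned to one of
# BUCKETS tablets based on a hash of its distribution column. The modulo
# below stands in for the engine's real hash function.
from collections import Counter

BUCKETS = 4

def bucket_for(row_id):
    return row_id % BUCKETS  # stand-in for the real hash

counts = Counter(bucket_for(i) for i in range(1, 11))  # ids 1..10
print(dict(counts))
```

With a reasonably uniform hash, rows spread evenly across the buckets, which keeps the tablets balanced for parallel scans and imports.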
Create a file to be imported, which is named test_file.txt. Sample code:
1,yang,32,shanghai,http://example.com
2,wang,22,beijing,http://example.com
3,xiao,23,shenzhen,http://example.com
4,jess,45,hangzhou,http://example.com
5,jack,14,shanghai,http://example.com
6,tomy,25,hangzhou,http://example.com
7,lucy,45,shanghai,http://example.com
8,tengyin,26,shanghai,http://example.com
9,wangli,27,shenzhen,http://example.com
10,xiaohua,37,shanghai,http://example.com
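If you prefer to create this file with a script, the following Python sketch writes the same rows to test_file.txt. The file then still needs to be uploaded to your OSS bucket, for example through the OSS console.

```python
# Write the sample rows to test_file.txt in CSV format with "," as the
# column delimiter, matching the COLUMNS TERMINATED BY clause below.
rows = [
    (1, "yang", 32, "shanghai", "http://example.com"),
    (2, "wang", 22, "beijing", "http://example.com"),
    (3, "xiao", 23, "shenzhen", "http://example.com"),
    (4, "jess", 45, "hangzhou", "http://example.com"),
    (5, "jack", 14, "shanghai", "http://example.com"),
    (6, "tomy", 25, "hangzhou", "http://example.com"),
    (7, "lucy", 45, "shanghai", "http://example.com"),
    (8, "tengyin", 26, "shanghai", "http://example.com"),
    (9, "wangli", 27, "shenzhen", "http://example.com"),
    (10, "xiaohua", 37, "shanghai", "http://example.com"),
]
with open("test_file.txt", "w") as f:
    for row in rows:
        f.write(",".join(map(str, row)) + "\n")
```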
Import data by using OSS. Sample code:
LOAD LABEL test_db.test_label_1
(
    DATA INFILE("s3://your_bucket_name/test_file.txt")
    INTO TABLE test_table
    COLUMNS TERMINATED BY ","
)
WITH S3
(
    "AWS_PROVIDER" = "OSS",
    "AWS_REGION" = "oss-cn-beijing",
    "AWS_ENDPOINT" = "oss-cn-beijing-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "<your_access_key>",
    "AWS_SECRET_KEY" = "<your_secret_key>"
)
PROPERTIES
(
    "timeout" = "3600"
);
Note: The path of the OSS bucket must start with s3://.
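Two details in this example are easy to get wrong: the DATA INFILE path must use the s3:// scheme even though the file resides in OSS, and the endpoint should be an internal one so that traffic stays inside the VPC. The following defensive sketch checks both; the helper is hypothetical and not part of any SDK.

```python
# Hypothetical pre-flight check for the two pitfalls noted above:
# the s3:// path scheme and the internal (VPC) OSS endpoint.

def check_load_inputs(file_path, endpoint):
    """Return a list of problems found in the import inputs."""
    errors = []
    if not file_path.startswith("s3://"):
        errors.append("DATA INFILE path must start with s3://")
    if "-internal." not in endpoint:
        errors.append("use the internal OSS endpoint for VPC access")
    return errors

print(check_load_inputs("oss://your_bucket_name/test_file.txt",
                        "oss-cn-beijing.aliyuncs.com"))      # two problems
print(check_load_inputs("s3://your_bucket_name/test_file.txt",
                        "oss-cn-beijing-internal.aliyuncs.com"))  # prints []
```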