Data Integration is a tool that allows you to import external data into ApsaraDB for SelectDB instances and databases through a visual interface. This topic describes how to use the Data Integration tool of ApsaraDB for SelectDB.
Prerequisites
An ApsaraDB for SelectDB instance is created. For more information, see Create an instance.
The version of the instance is 3.0.7 or later.
Procedure
Log on to the ApsaraDB for SelectDB console.
In the top navigation bar, select the region in which the instance that you want to manage resides.
In the left-side navigation pane, click Instances. On the Instances page, find the instance and click its ID to go to the Instance Details page.
Click Data Development and Management in the upper-right corner.
If you use the tools of Data Development and Management for the first time, a message prompts you to add the public IP address of your machine to the IP address whitelist named webui_whitelist of the instance. Read the message and click OK.
Select Data Integration from the drop-down list.
If you use Data Integration for the first time and have not logged on to the WebUI system, the WebUI logon page appears.
You can use the admin account to log on to the WebUI system.
If you do not know or forget the password of the admin account, you can reset the password. For more information, see Reset the password of an account.
On the Integration page, perform the following operations based on your business requirements:
If you have not created a data integration task, the Stage page appears after you perform the preceding steps. On the Stage page, you can only create data integration tasks.
Create a data integration task.
Sample data
Sample data is used to perform benchmark tests on the performance of analytical databases. Perform the following steps to import sample data:
Select a sample data type.
Click Create in the upper-left corner. On the New Integration page, select a sample data type in the Sample Data section.
ClickBench: The ClickBench datasets.
TPC-H: The TPC-H datasets.
Github Demo: The GitHub events datasets.
SSB-FLAT: The SSB-FLAT datasets.
On the New Integration page, configure the parameters that are described in the following table and click Create and Load.
Integration Name: The name of the data integration task. Example: test.
Comment: The description of the data integration task. Example: test comment.
Cluster: The cluster in which you want to run the data integration task. Example: new_cluster.
Sample Data Size: The size of the sample data. Example: 1 GB.
OSS
Select an integration type.
Click Create in the upper-left corner. On the New Integration page, click Object Storage in the Stage section.
Configure the parameters.
On the New Integration - Object Storage OSS page, configure the parameters that are described in the following table and click Confirm.
Integration Name: The name of the data integration task. Example: test.
Comment: The description of the data integration task. Example: test comment.
Bucket: The name of the Object Storage Service (OSS) bucket. Example: test_bucket_name.
Default Data File Path: The default path of the files in the OSS bucket.
Authentication: The authorization method that is used to access OSS. Example: Access Key.
Access Key: The AccessKey ID of your Alibaba Cloud account. Example: akdemo.
Secret Key: The AccessKey secret of your Alibaba Cloud account. Example: skdemo.
Advanced Settings: The default properties to be used when you integrate and import objects.
File Configuration: The properties of the objects to be integrated.
File Type: The file type of the OSS objects. Valid values: JSON, ORC, CSV, Parquet, and Automatic Recognition. Example: JSON.
Compression Method: The compression method of the OSS objects. Example: gz.
Column Separator: The column delimiter of data in the OSS objects. Example: \t.
Line Delimiter: The row delimiter of data in the OSS objects. Example: \n.
File Size: The limit on the size of the OSS objects. Example: Unlimited.
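The File Configuration settings above must match the files that you upload to OSS. As a sketch of how to produce such a file with the Python standard library, the following hypothetical helper writes a gzip-compressed CSV file that uses a tab (\t) column separator and a newline (\n) row delimiter, matching the example values; the helper name and file path are illustrative:

```python
import csv
import gzip

def write_oss_ready_csv(rows, path):
    # Write rows as a gzip-compressed CSV that matches the example
    # File Configuration: tab column separator, newline row delimiter,
    # gz compression. Upload the resulting file to the OSS bucket
    # that the integration task reads from.
    with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)

write_oss_ready_csv([("1", "alice"), ("2", "bob")], "sample.csv.gz")

# Read the file back to verify the delimiters and compression.
with gzip.open("sample.csv.gz", "rt", encoding="utf-8") as f:
    content = f.read()
print(content)  # "1\talice\n2\tbob\n"
```

If you choose a different Column Separator or Line Delimiter in the console, adjust the delimiter and lineterminator arguments accordingly.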
Loading Configuration: The default operations to be performed when objects are imported.
on Error: The policy to apply if an error occurs during the import. Example: Abort. Valid values:
Continue: continues to import objects if an error occurs.
Abort: stops importing objects if an error occurs.
Customized: uses a custom policy to import objects if an error occurs.
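The difference between the Continue and Abort policies can be sketched as follows. This is an illustrative Python model, not SelectDB's implementation; int() stands in for the real per-row import step:

```python
def import_rows(rows, on_error="Abort"):
    # Illustrative sketch of the on Error policies: Abort stops on the
    # first failing row, Continue records the failure and moves on.
    imported, errors = [], []
    for row in rows:
        try:
            imported.append(int(row))  # stand-in for the real import
        except ValueError:
            if on_error == "Abort":
                raise                  # stop the whole import
            errors.append(row)         # Continue: skip and go on
    return imported, errors

imported, errors = import_rows(["1", "x", "2"], on_error="Continue")
print(imported, errors)  # [1, 2] ['x']
```

With on_error="Abort", the same input raises on the bad row "x" and no further rows are imported.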
Strict Mode: Specifies whether to filter out error data after column type conversion. Example: Open. Valid values:
Open: filters out error data after column type conversion. The following rules apply:
Error data refers to source values that are not NULL but become NULL in the destination column after column type conversion. Strict mode does not apply to destination columns whose NULL values are generated by functions.
If a destination column restricts values to a specific range and a source value can be converted but the converted value falls outside that range, strict mode does not apply. For example, if the source value is 10 and the destination column is of the DECIMAL(1,0) type, the value can be converted but is out of range. In this case, strict mode does not filter out the value.
Close: does not filter out error data after column type conversion.
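The strict mode rule above can be sketched with a small Python model. This is only an illustration of the filtering rule, not SelectDB's implementation: a non-NULL source value that becomes NULL after type conversion is filtered out, while a value that converts successfully is kept even if it would later fail a range check:

```python
from decimal import Decimal, InvalidOperation

def strict_mode_filter(raw_values):
    # Illustrative sketch of the Open rule: a NOT NULL source value
    # whose type conversion fails (i.e., would become NULL) is
    # filtered out; a value that converts is kept, even if it is out
    # of range for the destination column.
    kept, filtered = [], []
    for raw in raw_values:
        try:
            kept.append(Decimal(raw))  # conversion succeeded
        except InvalidOperation:
            filtered.append(raw)       # NOT NULL -> NULL: filtered
    return kept, filtered

kept, filtered = strict_mode_filter(["10", "abc", "3"])
# "abc" fails conversion and is filtered; "10" converts and is kept,
# even though it would be out of range for a DECIMAL(1,0) column.
```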
Query a data integration task: In the upper-right corner of the Integration page, click the Search icon and enter the name of a data integration task in the search box.
Delete a data integration task: In the task list on the Integration page, find the task that you want to delete and click the Delete icon in the Actions column.
Deleting a data integration task does not affect data that has already been imported, but it may affect data that is still being imported.
After a data integration task is deleted, it cannot be recovered.