ApsaraDB for SelectDB allows you to use the Data Integration feature of DataWorks to import data from a table by using SelectDB Writer. This topic describes how to use DataWorks to synchronize data from a MySQL data source to an ApsaraDB for SelectDB instance.
Usage notes
If you use the Data Integration feature of DataWorks to import data to an ApsaraDB for SelectDB instance, the data must be written offline.
You cannot write fields of the BITMAP, HyperLogLog (HLL), or QUANTILE_STATE data type.
Develop a data synchronization task
Add data sources
Before you configure a data synchronization task, you must add a MySQL data source and an ApsaraDB for SelectDB data source to DataWorks.
For more information about how to add a MySQL data source, see MySQL data source.
For more information about how to add an ApsaraDB for SelectDB data source, see Add and manage data sources. The following table describes the parameters that are used to add an ApsaraDB for SelectDB data source.
Parameter
Description
Data Source Name
The name of the data source.
JDBC URL
The Java Database Connectivity (JDBC) URL in the
jdbc:mysql://<ip>:<port>/<dbname>
format.To obtain the virtual private cloud (VPC) endpoint or public endpoint and MySQL port of an ApsaraDB for SelectDB instance, perform the following operations: Log on to the ApsaraDB for SelectDB console and go to the Instance Details page of the instance whose information you want to view. In the Network Information section of the Basic Information tab, view the values of the VPC Endpoint or Public Endpoint parameter and the MySQL Port parameter.
Example:
jdbc:mysql://selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:9030/test_db
NoteIf the MySQL instance and the ApsaraDB for SelectDB instance reside in the same VPC, use the VPC endpoint for this parameter. If the MySQL instance and the ApsaraDB for SelectDB instance reside in different VPCs, use the public endpoint for this parameter.
HTTP address
The IP address and port number that are used to access the ApsaraDB for SelectDB instance over HTTP. The value is in the
<ip>:<port>
format.To obtain the VPC endpoint or public endpoint and HTTP port of an ApsaraDB for SelectDB instance, perform the following operations: Log on to the ApsaraDB for SelectDB console and go to the Instance Details page of the instance whose information you want to view. In the Network Information section of the Basic Information tab, view the values of the VPC Endpoint or Public Endpoint parameter and the HTTP Port parameter.
Example:
selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080
NoteIf the MySQL instance and the ApsaraDB for SelectDB instance reside in the same VPC, use the VPC endpoint for this parameter. If the MySQL instance and the ApsaraDB for SelectDB instance reside in different VPCs, use the public endpoint for this parameter.
Username
The username of the owner account of the ApsaraDB for SelectDB instance.
Password
The password of the owner account of the ApsaraDB for SelectDB instance.
ImportantTo ensure that the data source can be added, you must add the IP address of the resource group in DataWorks to the IP address whitelist of the data source such as ApsaraDB for SelectDB. For more information, see Configure an IP address whitelist. When you configure IP address whitelists, you must add the IP addresses of different types of resource groups to different IP address whitelists. For example, the IP addresses of an exclusive resource group for Data Integration and a shared resource group for Data Integration must be added to different IP address whitelists.
Configure a batch synchronization task to synchronize data from a single table
You can configure a batch synchronization task by using the codeless UI or the code editor. For more information, see the following topics:
Configure a batch synchronization task by using the codeless UI
Configure a batch synchronization task by using the code editor
For more information about the example on how to configure a batch synchronization task by using the code editor and the related parameters, see the Sample code and parameters section of this topic.
Sample code and parameters
Configure a batch synchronization task by using the code editor
If you configure a batch synchronization task by using the code editor, you must write the script based on the format requirements in the code editor. For more information, see Configure a batch synchronization task by using the code editor.
Code for MySQL Reader and SelectDB Writer
{
"type":"job",
"version":"2.0",
"steps":[
{
"stepType":"mysql",
"parameter":{
"column":[
"<id>",
"<table_id>",
"<table_no>",
"<table_name>",
"<table_status>"
],
"connection":[
{
"datasource":"<mysql_datasource>",
"table":[
"<mysql_table_name>"
]
}
],
"where":"",
"splitPk":"",
"encoding":"UTF-8"
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"selectdb",
"parameter":{
"postSql":[
],
"preSql":[
],
"username": "<selectdb_username>",
"password": "<selectdb_password>",
"loadUrl":[
"<ip:port>"
],
"column":[
"<id>",
"<table_id>",
"<table_no>",
"<table_name>",
"<table_status>"
],
"connection":[
{
"datasource":"<selectdb_datasource>",
"table":[
"<selectdb_table_name>"
]
}
],
"maxBatchRows":1000000,
"loadProps":{
"format":"csv",
"column_separator": "\\x01",
"line_delimiter": "\\x02"
}
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"
},
"speed":{
"throttle":false,
"concurrent":1
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
Parameters
Parameter | Description |
datasource | The name of the data source. This parameter is required. You can add a data source when you configure a batch synchronization task by using the code editor. The value of this parameter must be the same as the name of the data source to be added. |
table | The name of the table from which data is to be synchronized. This parameter is required. |
column | The names of the columns to which you want to write data in the destination table. Separate the column names with commas (,). This parameter is required. Example: |
loadUrl | The URL that is used to access the ApsaraDB for SelectDB instance. This parameter is required. Format: |
username | The username that is used to access the ApsaraDB for SelectDB database. This parameter is required. |
password | The password that is used to access the ApsaraDB for SelectDB database. This parameter is required. |
preSql | The SQL statement that you want to execute before the batch synchronization task is run. For example, you can set this parameter to the SQL statement that is used to delete outdated data. You can execute only one SQL statement in the codeless UI and multiple SQL statements in the code editor. |
postSql | The SQL statement that you want to execute after the synchronization task is run. For example, you can set this parameter to the SQL statement that is used to add a timestamp. You can execute only one SQL statement on the codeless UI and multiple SQL statements in the code editor. |
maxBatchRows | The maximum number of rows that you can write to the destination table at a time. Default value: 500000. |
loadProps | The parameters of the COPY INTO request, which are used to specify the format of the data to be imported. Default value: json. If the loadProps parameter is left empty or is set to
If you want to import data in the CSV format, you can configure the following settings to specify the CSV format and configure the row and column delimiters. If you do not specify the row and column delimiters, the imported data is converted into strings by default.
|
Supported data types
The following table describes the data types that are supported by ApsaraDB for SelectDB.
Data type | Offline data writing of SelectDB Writer |
INT | Supported |
BIGINT | Supported |
LARGEINT | Supported |
SMALLINT | Supported |
TINYINT | Supported |
BOOLEAN | Supported |
DECIMAL | Supported |
DOUBLE | Supported |
FLOAT | Supported |
CHAR | Supported |
VARCHAR | Supported |
STRING | Supported |
DATE | Supported |
DATEV2 | Supported |
DATETIME | Supported |
DATETIMEV2 | Supported |
ARRAY | Supported |
JSONB | Supported |
BITMAP | Not supported |
HLL | Not supported |
QUANTILE_STATE | Not supported |