You can migrate data from a MySQL data source to an ApsaraDB for SelectDB instance by using Data Transmission Service (DTS), Flink change data capture (CDC), a catalog, or DataWorks. The MySQL data source can be a self-managed MySQL database, an ApsaraDB RDS for MySQL instance, or a PolarDB for MySQL cluster. You can select an appropriate solution to migrate data based on the amount of data to be migrated and your business scenario. This topic describes and compares the solutions used to migrate data from a MySQL data source to an ApsaraDB for SelectDB instance.
Comparison of migration solutions
You can migrate data from a MySQL data source to an ApsaraDB for SelectDB instance by using DTS, Flink CDC, a catalog, or DataWorks. The data that can be migrated varies based on the migration solution. You can select an appropriate solution based on your business requirements.
Solution | Historical data migration | Incremental data synchronization | Schema migration | Database migration | Incremental synchronization of DDL operations | Data verification |
DTS | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Flink CDC | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
Catalog | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
DataWorks | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
The following sections describe how to migrate data by using these solutions. For more information, see the relevant references.
Prerequisites
An ApsaraDB for SelectDB instance and a MySQL database are created.
The ApsaraDB for SelectDB instance and the MySQL database are connected over a network. Alternatively, the ApsaraDB for SelectDB instance and MySQL database are connected to the same migration tool.
Migrate data by using DTS
DTS allows you to migrate historical data and synchronize incremental data from a MySQL data source to an ApsaraDB for SelectDB instance. DTS also supports schema migration, DDL operation synchronization, and data verification. In this example, an ApsaraDB RDS for MySQL instance is used to describe how to migrate data from a MySQL data source to an ApsaraDB for SelectDB instance. For more information, see Use DTS to import data.
Procedure
Log on to the ApsaraDB for SelectDB console.
In the top navigation bar, select the region in which the instance that you want to manage resides.
In the left-side navigation pane, click Instances. On the Instances page, find the instance and click its ID to go to the Instance Details page.
In the left-side navigation pane of the Instance Details page, click Data Pipeline. On the page that appears, click the Data Synchronization tab.
NoteIn most cases, data synchronization in DTS involves the migration of existing data and the real-time synchronization of incremental data. Data migration in DTS is generally used to migrate existing historical data.
Click Create Data Synchronization Task. On the Create Data Synchronization Task page, configure the information about the source and destination databases.
In the lower part of the page, click Test Connectivity and Proceed.
Configure the objects to be migrated and advanced settings.
If you select databases as the objects to be synchronized, DTS does not synchronize new tables. If you need to create tables in the source database during data synchronization, select tables as the objects to be synchronized. To modify the objects to be synchronized in a data synchronization task, find the data synchronization task on the Data Synchronization tab and choose More > Reselect Objects in the Actions column.
If you select tables as the objects to be synchronized and you need to edit the tables, such as renaming tables or columns, you can synchronize up to 1,000 tables in a single data synchronization task. If you run a task to synchronize more than 1,000 tables, a request error occurs. In this case, we recommend that you configure multiple tasks to synchronize the tables or configure a task to synchronize the entire database.
To rename an object that you want to synchronize to the destination instance, right-click the object in the Selected Objects section. For more information, see the Map the name of a single object section of the Map object names topic.
To rename multiple objects at a time, click Batch Edit in the upper-right corner of the Selected Objects section. For more information, see the Map multiple object names at a time section of the Map object names topic.
To select the SQL operations performed on a specific database or table, right-click an object in the Selected Objects section. In the dialog box that appears, select the SQL operations that you want to synchronize. For more information, see the "SQL operations that support incremental synchronization" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic.
To specify WHERE conditions to filter data, right-click a table in the Selected Objects section. In the dialog box that appears, specify the conditions. For more information, see Specify filter conditions.
If you use the object name mapping feature to rename an object, other objects that depend on the object may fail to be synchronized.
Optional. In the lower part of the page, click Next: Configure Database and Table Fields. In the dialog box that appears, configure the Primary Key Column, Distribution Key, and Engine parameters for the tables to be synchronized to the ApsaraDB for SelectDB instance.
You can perform this step only if you select Schema Synchronization for the Synchronization Types parameter. To modify the parameters, set the Definition Status parameter to All.
You can select multiple columns for the Primary Key Column parameter from the drop-down list. One or more columns specified for the Primary Key Column parameter can be selected for the Distribution Key parameter. You can select only Unique for the Engine parameter.
Save the task settings and run a precheck. Wait until the success rate becomes 100%. Then, click Next: Purchase Instance.
On the Purchase Instance page, configure the Billing Method and Instance Class parameters for the data synchronization instance, read and agree to the Data Transmission Service (Pay-as-you-go) Service Terms by selecting the check box, and then click Buy and Start to start the data synchronization task. You can view the progress of the task in the task list.
Section | Parameter | Description |
Task information | Task Name | The name of the DTS task. DTS automatically generates a task name. We recommend that you specify a descriptive name that makes it easy to identify the task. You do not need to specify a unique task name. |
Source Database | Database Type | The type of the source database. Select MySQL. |
Access Method | The access method of the source database. Select Alibaba Cloud Instance. | |
Instance Region | The region in which the source ApsaraDB RDS for MySQL instance resides. | |
RDS Instance ID | The ID of the source ApsaraDB RDS for MySQL instance. Example: rm-2z3m****. | |
Database Account | The database account of the source ApsaraDB RDS for MySQL instance. For more information about the permissions that are required for the account, see the "Permissions required for database accounts" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic. | |
Database Password | The password that is used to access the database instance. | |
Encryption | Specifies whether to encrypt the connection to the database. You can select Non-encrypted or SSL-encrypted based on your business requirements. If you want to set this parameter to SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS for MySQL instance before you configure the DTS task. For more information, see Use a cloud certificate to enable SSL encryption. | |
Destination Database | Database Type | The type of the destination database. Select SelectDB. |
Access Method | The access method of the destination database. Select Alibaba Cloud Instance. | |
Instance Region | The region in which the destination ApsaraDB for SelectDB instance resides. | |
Instance ID | The ID of the destination ApsaraDB for SelectDB instance. | |
Database Account | The database account of the destination ApsaraDB for SelectDB instance. For more information about the permissions that are required for the account, see the "Permissions required for database accounts" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic. | |
Database Password | The password that is used to access the database instance. |
Parameter | Description |
Synchronization Types | The synchronization types. By default, Incremental Data Synchronization is selected. You must also select Schema Synchronization and Full Data Synchronization. After the precheck is complete, DTS synchronizes the historical data of the selected objects from the source database to the destination cluster. The historical data is the basis for subsequent incremental synchronization. Important When you synchronize data from a MySQL database to an ApsaraDB for SelectDB instance, data types are converted. If you do not select Schema Synchronization, you must create tables that use the Unique key model with corresponding schemas in the destination ApsaraDB for SelectDB instance in advance. For more information, see the "Data type mappings" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic and Data models. |
Source Objects | Select one or more objects from the Source Objects section and click the icon to add the objects to the Selected Objects section. You can select columns, tables, or databases as the objects to be synchronized. Important |
Selected Objects | Note |
Use Flink CDC to migrate data
Flink provides Flink SQL, Flink CDC, and DataStream that you can use to migrate data from a MySQL database to an ApsaraDB for SelectDB instance. You can use Flink CDC to migrate historical data, incremental data, and schemas, and synchronize DDL operations. In this example, Flink CDC is used to describe how to synchronize data from an upstream MySQL database to an ApsaraDB for SelectDB instance. For more information, see Import data by using Flink.
Procedure
Prepare the environment
Build a Flink environment. In this example, a Flink standalone cluster of version 1.16 is deployed.
Download the flink-1.16.3-bin-scala_2.12.tgz package and decompress the package. Sample code:
wget https://dlcdn.apache.org/flink/flink-1.16.3/flink-1.16.3-bin-scala_2.12.tgz tar -zxvf flink-1.16.3-bin-scala_2.12.tgz
Download the flink-sql-connector-mysql-cdc-2.4.2 and flink-doris-connector-1.16-1.5.2 packages to the FLINK_HOME/lib directory. Sample code:
cd flink-1.16.3 cd lib/ wget https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.4.2/flink-sql-connector-mysql-cdc-2.4.2.jar wget https://repo.maven.apache.org/maven2/org/apache/doris/flink-doris-connector-1.16/1.5.2/flink-doris-connector-1.16-1.5.2.jar
Start a Flink standalone cluster. Sample code:
bin/start-cluster.sh
Create an ApsaraDB for SelectDB instance. For more information, see Create an instance.
Connect to the ApsaraDB for SelectDB instance over the MySQL protocol. For more information, see Connect to an instance.
Create a test database and a test table.
Execute the following statement to create a test database:
CREATE DATABASE test_db;
Execute the following statements to create a test table:
USE test_db; CREATE TABLE employees ( emp_no int NOT NULL, birth_date date, first_name varchar(20), last_name varchar(20), gender char(2), hire_date date ) UNIQUE KEY(`emp_no`) DISTRIBUTED BY HASH(`emp_no`) BUCKETS 1;
Synchronize data by using Flink CDC
The following sample code shows the syntax of Flink CDC:
<FLINK_HOME>/bin/flink run \
-Dexecution.checkpointing.interval=10s \
-Dparallelism.default=1 \
-c org.apache.doris.flink.tools.cdc.CdcTools \
lib/flink-doris-connector-1.16-1.5.2.jar \
mysql-sync-database \
--database test_db \
--including-tables "tbl1|test.*" \
--mysql-conf hostname=127.0.0.1 \
--mysql-conf username=root \
--mysql-conf password=123456 \
--mysql-conf database-name=mysql_db \
--sink-conf fenodes=selectdb-cn-****.selectdbfe.rds.aliyuncs.com:8080 \
--sink-conf username=admin \
--sink-conf password=****
Parameter | Description |
execution.checkpointing.interval | The checkpoint interval, which affects the frequency of data synchronization. We recommend that you set this parameter to 10s. |
parallelism.default | The parallelism of the Flink job. You can appropriately increase the parallelism to accelerate data synchronization. |
database | The name of the database to which data is synchronized in the ApsaraDB for SelectDB instance. |
including-tables | The MySQL tables from which data is to be synchronized. You can use vertical bars (|) to separate multiple table names. Regular expressions are also supported. Example: |
excluding-tables | The tables to be excluded. You can specify this parameter in the same way as you specify the including-tables parameter. |
mysql-conf | The configuration items of the MySQL CDC source. For more information about the configuration items, see MySQL CDC Connector. The |
sink-conf | The configuration items of Doris Sink. For more information, see the "Configuration items of Doris Sink" section of the Import data by using Flink topic. |
table-conf | The configuration items of the ApsaraDB for SelectDB table. The configuration items are contained in the properties when the ApsaraDB for SelectDB table is created. |
To synchronize data, you must install the dependencies of Flink CDC, such as flink-sql-connector-mysql-cdc-${version}.jar and flink-sql-connector-oracle-cdc-${version}.jar, in the $FLINK_HOME/lib directory.
If you want to synchronize full data from a database, the Flink version must be 1.15 or later. For more information about how to download Flink Doris Connector of different versions, visit org/apache/doris.
Migrate data by using a catalog
ApsaraDB for SelectDB provides the catalog feature to access MySQL data sources by performing federated queries. This allows you to easily and quickly migrate historical data from a MySQL data source to an ApsaraDB for SelectDB instance. The following example shows how to synchronize data from an upstream MySQL database to an ApsaraDB for SelectDB instance by using a catalog. For more information, see JDBC data source.
Procedure
Connect to the SelectDB instance. For more information, see Connect to an instance.
NoteIf you log on to the ApsaraDB for SelectDB instance by using Data Management (DMS), the
SWITCH
command becomes invalid. We recommend that you connect to the ApsaraDB for SelectDB instance by using a MySQL client.Create a Java Database Connectivity (JDBC) catalog for the MySQL data source.
CREATE CATALOG jdbc_mysql PROPERTIES (
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
"driver_url" = "mysql-connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver",
"checksum" = "fdf55dcef04b09f2eaf42b75e61ccc9a"
)
Parameters
Parameter | Required | Default value | Description |
user | Yes | No default value | The account that is used to connect to the MySQL database. |
password | Yes | No default value | The password of the account that is used to connect to the MySQL database. |
jdbc_url | Yes | No default value | The JDBC URL that is used to connect to the MySQL database. |
driver_url | Yes | No default value | The name of the JAR package of the JDBC driver. |
driver_class | Yes | No default value | The name of the JDBC driver class. |
lower_case_table_names | No | "false" | Specifies whether to synchronize the names of the databases and tables of the external JDBC data source in lowercase. |
only_specified_database | No | "false" | Specifies whether to migrate data only from specific databases. |
include_database_list | No | "" | The names of the databases from which you want to migrate data. This parameter takes effect only if you set the |
exclude_database_list | No | "" | The names of the databases from which you do not want to migrate data. This parameter takes effect only if you set the |
Create a table in the ApsaraDB for SelectDB instance and then execute the
INSERT INTO SELECT
statement of the extract, transform, and load (ETL) syntax to synchronize data. For more information, see Import data by using INSERT INTO statements.
# Create a table.
CREATE TABLE selectdb_table ...
# Synchronize data.
INSERT INTO selectdb_table SELECT * FROM mysql_catalog.mysql_database.mysql_table;
Migrate data by using DataWorks
ApsaraDB for SelectDB allows you to use the Data Integration feature of DataWorks to import historical data. This section describes how to use DataWorks to synchronize data from a MySQL database to an ApsaraDB for SelectDB instance. For more information, see Import data by using DataWorks.
You cannot write fields of the BITMAP, HyperLogLog (HLL), or QUANTILE_STATE data type.
Develop a data synchronization task
Add data sources
Before you configure a data synchronization task, you must add a MySQL data source and an ApsaraDB for SelectDB data source to DataWorks.
For more information about how to add a MySQL data source, see MySQL data source.
For more information about how to add an ApsaraDB for SelectDB data source, see Add and manage data sources. The following table describes the parameters that are used to add an ApsaraDB for SelectDB data source.
Parameter
Description
Data Source Name
The name of the data source.
Host Address/IP Address
The JDBC URL in the
jdbc:mysql://<ip>:<port>/<dbname>
format.To obtain the virtual private cloud (VPC) endpoint or public endpoint and MySQL port of an ApsaraDB for SelectDB instance, perform the following operations: Log on to the ApsaraDB for SelectDB console and go to the Instance Details page of the instance whose information you want to view. In the Network Information section of the Basic Information page, view the values of the VPC Endpoint or Public Endpoint parameter and the MySQL Port parameter.
Example:
jdbc:mysql://selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:9030/test_db
HTTP Connection Address
The IP address and port number that are used to access the ApsaraDB for SelectDB instance over HTTP. The value is in the
<ip>:<port>
format.To obtain the VPC endpoint or public endpoint and HTTP port of an ApsaraDB for SelectDB instance, perform the following operations: Log on to the ApsaraDB for SelectDB console and go to the Instance Details page of the instance whose information you want to view. In the Network Information section of the Basic Information page, view the values of the VPC Endpoint or Public Endpoint parameter and the HTTP Port parameter.
Example:
selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080
.Username
The username of the owner account of the ApsaraDB for SelectDB instance.
Password
The password of the owner account of the ApsaraDB for SelectDB instance.
Configure a single-table batch synchronization task
You can configure a batch synchronization task by using the codeless user interface (UI) or the code editor. For more information, see the following topics: