All Products
Search
Document Center

ApsaraDB for SelectDB:Migrate data from a MySQL data source

Last Updated:Sep 20, 2024

This topic describes how to migrate data from a MySQL data source to an ApsaraDB for SelectDB instance.

Overview

You can migrate historical data and incremental data from a MySQL data source to an ApsaraDB for SelectDB instance. The MySQL data source can be a self-managed MySQL database, an ApsaraDB RDS for MySQL instance, or a PolarDB for MySQL cluster. Your requirements, such as schema migration, DDL operation synchronization, and data verification, can also be met during data migration. This topic describes how to migrate data from a MySQL data source to an ApsaraDB for SelectDB instance.

Prerequisites

  • An ApsaraDB for SelectDB instance and a MySQL database are created.

  • The ApsaraDB for SelectDB instance and the MySQL database are connected over a network. Alternatively, the ApsaraDB for SelectDB instance and MySQL database are connected to the same migration tool.

Migrate data by using DTS

Data Transmission Service (DTS) allows you to migrate historical data and synchronize incremental data from a MySQL data source to an ApsaraDB for SelectDB instance. DTS also provides the schema migration, DDL operation synchronization, and data verification features. In this example, an ApsaraDB RDS for MySQL instance is used to describe how to migrate data from a MySQL data source to an ApsaraDB for SelectDB instance. For more information, see Use DTS to import data.

Procedure

  1. Log on to the ApsaraDB for SelectDB console.

  2. In the top navigation bar, select the region in which the instance that you want to manage resides.

  3. In the left-side navigation pane, click Instances. On the Instances page, find the instance and click its ID to go to the Instance Details page.

  4. In the left-side navigation pane of the Instance Details page, click Data Pipeline. On the page that appears, click the Data Synchronization tab.

    Note

    In most cases, data synchronization in DTS involves the migration of existing data and the real-time synchronization of incremental data. Data migration in DTS is generally used to migrate existing historical data.

  5. Click Create Data Synchronization Task. On the Create Data Synchronization Task page, configure the information about the source and destination databases.

  6. Section

    Parameter

    Description

    Task information

    Task Name

    The name of the DTS task. DTS automatically generates a task name. We recommend that you specify a descriptive name that makes it easy to identify the task. You do not need to specify a unique task name.

    Source Database

    Database Type

    The type of the source database. Select MySQL.

    Access Method

    The access method of the source database. Select Alibaba Cloud Instance.

    Instance Region

    The region in which the source ApsaraDB RDS for MySQL instance resides.

    RDS Instance ID

    The ID of the source ApsaraDB RDS for MySQL instance. Example: rm-2z3m****.

    Database Account

    The database account of the source ApsaraDB RDS for MySQL instance. For more information about the permissions that are required for the account, see the "Permissions required for database accounts" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic.

    Database Password

    The password that is used to access the database instance.

    Encryption

    Specifies whether to encrypt the connection to the database. You can select Non-encrypted or SSL-encrypted based on your business requirements. If you want to set this parameter to SSL-encrypted, you must enable SSL encryption for the ApsaraDB RDS for MySQL instance before you configure the DTS task. For more information, see Use a cloud certificate to enable SSL encryption.

    Destination Database

    Database Type

    The type of the destination database. Select SelectDB.

    Access Method

    The access method of the destination database. Select Alibaba Cloud Instance.

    Instance Region

    The region in which the destination ApsaraDB for SelectDB instance resides.

    Instance ID

    The ID of the destination ApsaraDB for SelectDB instance.

    Database Account

    The database account of the destination ApsaraDB for SelectDB instance. For more information about the permissions that are required for the account, see the "Permissions required for database accounts" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic.

    Database Password

    The password that is used to access the database instance.

  7. In the lower part of the page, click Test Connectivity and Proceed.

  8. Configure the objects to be migrated and advanced settings.

  9. Parameter

    Description

    Synchronization Types

    The synchronization types. By default, Incremental Data Synchronization is selected. You must also select Schema Synchronization and Full Data Synchronization. After the precheck is complete, DTS synchronizes the historical data of the selected objects from the source database to the destination cluster. The historical data is the basis for subsequent incremental synchronization.

    Important

    When you synchronize data from a MySQL database to an ApsaraDB for SelectDB instance, data types are converted. If you do not select Schema Synchronization, you must create tables that use the Unique key model with corresponding schemas in the destination ApsaraDB for SelectDB instance in advance. For more information, see the "Data type mappings" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic and Data models.

    Source Objects

    Select one or more objects from the Source Objects section and click the 向右 icon to add the objects to the Selected Objects section. You can select columns, tables, or databases as the objects to be synchronized.

    Important
    • If you select databases as the objects to be synchronized, DTS does not synchronize new tables. If you need to create tables in the source database during data synchronization, select tables as the objects to be synchronized. To modify the objects to be synchronized in a data synchronization task, find the data synchronization task on the Data Synchronization tab and choose More > Reselect Objects in the Actions column.

    • If you select tables as the objects to be synchronized and you need to edit the tables, such as renaming tables or columns, you can synchronize up to 1,000 tables in a single data synchronization task. If you run a task to synchronize more than 1,000 tables, a request error occurs. In this case, we recommend that you configure multiple tasks to synchronize the tables or configure a task to synchronize the entire database.

    Selected Objects

    • To rename an object that you want to synchronize to the destination instance, right-click the object in the Selected Objects section. For more information, see the Map the name of a single object section of the Map object names topic.

    • To rename multiple objects at a time, click Batch Edit in the upper-right corner of the Selected Objects section. For more information, see the Map multiple object names at a time section of the Map object names topic.

    Note
    • To select the SQL operations performed on a specific database or table, right-click an object in the Selected Objects section. In the dialog box that appears, select the SQL operations that you want to synchronize. For more information, see the "SQL operations that support incremental synchronization" section of the Synchronize data from an ApsaraDB RDS for MySQL instance to an ApsaraDB for SelectDB instance topic.

    • To specify WHERE conditions to filter data, right-click the object in the Selected Objects section. In the dialog box that appears, specify the conditions. For more information, see Specify filter conditions.

    • If you use the object name mapping feature to rename an object, other objects that depend on the object may fail to be synchronized.

  10. Optional. In the lower part of the page, click Next: Configure Database and Table Fields. In the dialog box that appears, configure the Primary Key Column, Distribution Key, and Engine parameters for the tables to be synchronized to the ApsaraDB for SelectDB instance.

  11. Note
    • You can perform this step only if you select Schema Synchronization for the Synchronization Types parameter. To modify the parameters, set the Definition Status parameter to All.

    • You can select multiple columns for the Primary Key Column parameter from the drop-down list. One or more columns specified for the Primary Key Column parameter can be selected for the Distribution Key parameter. You can select only Unique for the Engine parameter.

  12. Save the task settings and run a precheck. Wait until the success rate becomes 100%. Then, click Next: Purchase Instance.

  13. On the Purchase Instance page, configure the Billing Method and Instance Class parameters for the data synchronization instance, read and agree to the Data Transmission Service (Pay-as-you-go) Service Terms by selecting the check box, and then click Buy and Start to start the data synchronization task. You can view the progress of the task in the task list.

Use Flink CDC to migrate data

Flink provides Flink SQL, Flink change data capture (CDC), and DataStream that you can use to migrate data from a MySQL database to an ApsaraDB for SelectDB instance. You can use Flink CDC to migrate historical data, incremental data, and schemas, and synchronize DDL operations. In this example, Flink CDC is used to describe how to synchronize data from an upstream MySQL database to an ApsaraDB for SelectDB instance. For more information, see Import data by using Flink.

Procedure

Prepare the environment

Build a Flink environment. In this example, a Flink standalone cluster of version 1.16 is deployed.

  1. Download the flink-1.16.3-bin-scala_2.12.tgz package and decompress the package. Sample code:

    wget https://dlcdn.apache.org/flink/flink-1.16.3/flink-1.16.3-bin-scala_2.12.tgz
    tar -zxvf flink-1.16.3-bin-scala_2.12.tgz
  2. Download the flink-sql-connector-mysql-cdc-2.4.2 and flink-doris-connector-1.16-1.5.2 packages to the FLINK_HOME/lib directory. Sample code:

    cd flink-1.16.3
    cd lib/
    wget https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.4.2/flink-sql-connector-mysql-cdc-2.4.2.jar
    wget https://repo.maven.apache.org/maven2/org/apache/doris/flink-doris-connector-1.16/1.5.2/flink-doris-connector-1.16-1.5.2.jar
  3. Start a Flink standalone cluster. Sample code:

    bin/start-cluster.sh
  4. Create an ApsaraDB for SelectDB instance. For more information, see Create an instance.

  5. Connect to the ApsaraDB for SelectDB instance over the MySQL protocol. For more information, see Connect to an instance.

  6. Create a test database and a test table.

    1. Execute the following statement to create a test database:

      CREATE DATABASE test_db;
    2. Execute the following statements to create a test table:

      USE test_db;
      CREATE TABLE employees (
          emp_no       int NOT NULL,
          birth_date   date,
          first_name   varchar(20),
          last_name    varchar(20),
          gender       char(2),
          hire_date    date
      )
      UNIQUE KEY(`emp_no`)
      DISTRIBUTED BY HASH(`emp_no`) BUCKETS 1;

Synchronize data by using Flink CDC

The following sample code shows the syntax of Flink CDC:

<FLINK_HOME>/bin/flink run \
    -Dexecution.checkpointing.interval=10s \
    -Dparallelism.default=1 \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    lib/flink-doris-connector-1.16-1.5.2.jar \
    mysql-sync-database \
    --database test_db \
    --including-tables "tbl1|test.*" \
    --mysql-conf hostname=127.0.0.1 \
    --mysql-conf username=root \
    --mysql-conf password=123456 \
    --mysql-conf database-name=mysql_db \
    --sink-conf fenodes=selectdb-cn-****.selectdbfe.rds.aliyuncs.com:8080 \
    --sink-conf username=admin \
    --sink-conf password=****

Parameter

Description

execution.checkpointing.interval

The checkpoint interval, which affects the frequency of data synchronization. We recommend that you set this parameter to 10s.

parallelism.default

The parallelism of the Flink job. You can appropriately increase the parallelism to accelerate data synchronization.

database

The name of the database to which data is synchronized in the ApsaraDB for SelectDB instance.

including-tables

The MySQL tables from which data is to be synchronized. You can use vertical bars (|) to separate multiple table names. Regular expressions are also supported. Example: --including-tables table1|tbl.*, which specifies that data is synchronized from table1 and all tables whose names start with tbl.

excluding-tables

The tables to be excluded. You can specify this parameter in the same way as you specify the including-tables parameter.

mysql-conf

The configuration items of the MySQL CDC source. For more information about the configuration items, see MySQL CDC Connector. The hostname, username, password, and database-name parameters are required.

sink-conf

The configuration items of Doris Sink. For more information, see the "Configuration items of Doris Sink" section of the Import data by using Flink topic.

table-conf

The configuration items of the ApsaraDB for SelectDB table. The configuration items are contained in the properties when the ApsaraDB for SelectDB table is created.

Note
  1. To synchronize data, you must install the dependencies of Flink CDC, such as flink-sql-connector-mysql-cdc-${version}.jar and flink-sql-connector-oracle-cdc-${version}.jar, in the $FLINK_HOME/lib directory.

  2. If you want to synchronize full data from a database, the Flink version must be 1.15 or later. For more information about how to download Flink Doris Connector of different versions, visit org/apache/doris.

Migrate data by using a catalog

ApsaraDB for SelectDB provides the catalog feature to access MySQL data sources by performing federated queries. This allows you to easily and quickly migrate historical data from a MySQL data source to an ApsaraDB for SelectDB instance. The following example shows how to synchronize data from an upstream MySQL database to an ApsaraDB for SelectDB instance by using a catalog. For more information, see JDBC data source.

Procedure

  1. Connect to the ApsaraDB for SelectDB instance and create a Java Database Connectivity (JDBC) catalog for the MySQL data source.

CREATE CATALOG jdbc_mysql PROPERTIES (
    "type"="jdbc",
    "user"="root",
    "password"="123456",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "checksum" = "fdf55dcef04b09f2eaf42b75e61ccc9a"
)

Parameters

Parameter

Required

Default value

Description

user

Yes

No default value

The account that is used to connect to the MySQL database.

password

Yes

No default value

The password of the account that is used to connect to the MySQL database.

jdbc_url

Yes

No default value

The JDBC URL that is used to connect to the MySQL database.

driver_url

Yes

No default value

The name of the JAR package of the JDBC driver.

driver_class

Yes

No default value

The name of the JDBC driver class.

lower_case_table_names

No

"false"

Specifies whether to synchronize the names of the databases and tables of the external JDBC data source in lowercase.

only_specified_database

No

"false"

Specifies whether to migrate data only from specific databases.

include_database_list

No

""

The names of the databases from which you want to migrate data. This parameter takes effect only if you set the only_specified_database parameter to true. Separate multiple database names with commas (,). Database names are case-sensitive.

exclude_database_list

No

""

The names of the databases from which you do not want to migrate data. This parameter takes effect only if you set the only_specified_database parameter to true. Separate multiple database names with commas (,). Database names are case-sensitive.

  1. Create a table in the ApsaraDB for SelectDB instance and then execute the INSERT INTO SELECT statement of the extract, transform, and load (ETL) syntax to synchronize data.

# Create a table.
CREATE TABLE selectdb_table ...

# Synchronize data.
INSERT INTO selectdb_table SELECT * FROM mysql_catalog.mysql_database.mysql_table;

Migrate data by using DataWorks

ApsaraDB for SelectDB allows you to use the Data Integration feature of DataWorks to import historical data. This section describes how to use DataWorks to synchronize data from a MySQL database to an ApsaraDB for SelectDB instance. For more information, see Import data by using DataWorks.

Important

You cannot write fields of the BITMAP, HyperLogLog (HLL), or QUANTILE_STATE data type.

Develop a data synchronization task

Add data sources

Before you configure a data synchronization task, you must add a MySQL data source and an ApsaraDB for SelectDB data source to DataWorks.

  1. For more information about how to add a MySQL data source, see MySQL data source.

  2. For more information about how to add an ApsaraDB for SelectDB data source, see Add and manage data sources. The following table describes the parameters that are used to add an ApsaraDB for SelectDB data source.

    Parameter

    Description

    Data Source Name

    The name of the data source.

    Host Address/IP Address

    The Java Database Connectivity (JDBC) URL in the jdbc:mysql://<ip>:<port>/<dbname> format.

    To obtain the virtual private cloud (VPC) endpoint or public endpoint and MySQL port of an ApsaraDB for SelectDB instance, perform the following operations: Log on to the ApsaraDB for SelectDB console and go to the Instance Details page of the instance whose information you want to view. In the Network Information section of the Basic Information page, view the values of the VPC Endpoint or Public Endpoint parameter and the MySQL Port parameter.

    Example: jdbc:mysql://selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:9030/test_db

    HTTP Connection Address

    The IP address and port number that are used to access the ApsaraDB for SelectDB instance over HTTP. The value is in the <ip>:<port> format.

    To obtain the VPC endpoint or public endpoint and HTTP port of an ApsaraDB for SelectDB instance, perform the following operations: Log on to the ApsaraDB for SelectDB console and go to the Instance Details page of the instance whose information you want to view. In the Network Information section of the Basic Information page, view the values of the VPC Endpoint or Public Endpoint parameter and the HTTP Port parameter.

    Example: selectdb-cn-4xl3jv1****.selectdbfe.rds.aliyuncs.com:8080.

    Username

    The username of the owner account of the ApsaraDB for SelectDB instance.

    Password

    The password of the owner account of the ApsaraDB for SelectDB instance.

Configure a single-table batch synchronization task

You can configure a batch synchronization task by using the codeless UI or the code editor. For more information, see the following topics:

Compare solutions

The preceding sections describe how to migrate data from a MySQL data source by using DTS, Flink CDC, DataWorks, and a catalog. You can also use a solution, such as mysqldump, to migrate data. You can select a solution based on your technology stack. The following table compares the features that are supported by the common solutions for MySQL data migration.

Historical data migration

Incremental data synchronization

Schema migration

Database migration

Incremental synchronization of DDL operations

Data verification

DTS

✔️

✔️

✔️

✔️

✔️

✔️

Flink CDC

✔️

✔️

✔️

✔️

✔️

Catalog

✔️

DataWorks

✔️