All Products
Search
Document Center

ApsaraDB for SelectDB:Migrate data from an Apache Doris database

Last Updated:Nov 05, 2024

This topic describes how to use a catalog to migrate offline data from an Apache Doris database to an ApsaraDB for SelectDB instance.

Prerequisites

  • The IP address of the Doris data source is added to the IP address whitelist of the ApsaraDB for SelectDB instance. For more information, see Configure an IP address whitelist.

  • The Apache Doris version based on which SelectDB Core of ApsaraDB for SelectDB is developed is later than or equal to the Apache Doris version of the Doris data source.

    Note

    An ApsaraDB for SelectDB instance is a cloud-native real-time data warehouse that is built based on Apache Doris. For more information about the version mapping between ApsaraDB for SelectDB and Apache Doris, see Release notes for SelectDB Core.

  • You understand what a catalog is and the operations that you can perform on catalogs. For more information, see Data lakehouse.

Sample environment

In this example, data is migrated from the doris_t table in the Doris database named doris_db to the test_doris2SelectDB table in the ApsaraDB for SelectDB database named test_db. You can modify the corresponding parameters based on the actual cases when you migrate data. Sample environment:

  • Destination database: test_db

  • Destination table: test_doris2SelectDB

  • Source database: doris_db

  • Source table: doris_t

Prepare the sample source data

Log on to your Doris data source and perform the following steps:

  1. Create a database.

    CREATE DATABASE doris_db;
  2. Create a table.

    CREATE TABLE doris_t
    (
        id int,
        name string,
        age int
    )
    DISTRIBUTED BY HASH(id) BUCKETS 4
    PROPERTIES("replication_num" = "1");
  3. Insert data into the table.

    INSERT INTO doris_t VALUES
    (1, 'Alice', 25),
    (2, 'Bob', 30),
    (3, 'Charlie', 35),
    (4, 'David', 40),
    (5, 'Eve', 45);

Procedure

  1. Connect to the ApsaraDB for SelectDB instance. For more information, see Connect to an instance.

    Note

    If you log on to the ApsaraDB for SelectDB instance by using Data Management (DMS), the SWITCH command becomes invalid. We recommend that you connect to the ApsaraDB for SelectDB instance by using a MySQL client.

  2. Create a Java Database Connectivity (JDBC) catalog for the Doris database.

    CREATE CATALOG doris_catalog PROPERTIES (
     "type"="jdbc",
     "user"="root",
     "password"="123456",
     "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
     "driver_url" = "mysql-connector-java-8.0.25.jar",
     "driver_class" = "com.mysql.cj.jdbc.Driver",
     "checksum" = "fdf55dcef04b09f2eaf42b75e61ccc9a"
    )

    Parameter

    Required

    Default value

    Description

    user

    Yes

    No default value

    The account that is used to log on to the Doris database.

    password

    Yes

    No default value

    The password that is used to log on to the Doris database.

    jdbc_url

    Yes

    No default value

    The JDBC URL that is used to connect to the Doris database. The JDBC URL must include the endpoint of the Doris database.

    driver_url

    Yes

    No default value

    The name of the JAR package of the JDBC driver.

    Note
    • We recommend that you use the mysql-connector-java-8.0.25.jar package.

    • If you want to use another JAR package, submit a ticket for consultation.

    driver_class

    Yes

    No default value

    The class name of the JDBC driver.

    We recommend that you set this parameter to com.mysql.cj.jdbc.Driver.

    lower_case_table_names

    Note

    This parameter is renamed lower_case_meta_names for ApsaraDB for SelectDB instances that run SelectDB Core V4.0.

    No

    "false"

    Specifies whether to migrate the names of the databases and tables of the external JDBC data source in lowercase. Valid values:

    true: maintains the mapping of lowercase names to actual names in the remote system and allows you to query the databases and tables whose names are not in lowercase. In this case, the names of databases, tables, and columns are converted to lowercase names.

    false: does not allow you to query the databases and tables whose names are not in lowercase.

    Important
    • For ApsaraDB for SelectDB instances that run SelectDB Core V3.0, take note of the following items:

      • If the lower_case_table_names parameter of the frontend (FE) is set to 1 or 2, you must set the lower_case_table_names parameter of the catalog to true.

      • If the lower_case_table_names parameter of the FE is set to 0, you can set the lower_case_table_names parameter of the catalog to true or false.

    • For ApsaraDB for SelectDB instances that run SelectDB Core V4.0, take note of the following items:

      • If the lower_case_table_names parameter of the FE is set to 0 or 2, the names of databases, tables, and columns are not converted to lowercase names.

      • If the lower_case_table_names parameter of the FE is set to 1, the names of tables are converted to lowercase names, and the names of databases and columns remain unchanged.

    only_specified_database

    No

    "false"

    Specifies whether to migrate data only from specific databases.

    true: migrates data from specific databases of the data source that is specified in the JDBC URL.

    false: migrates data from all the databases of the data source that is specified in the JDBC URL.

    include_database_list

    No

    ""

    The names of the databases from which you want to migrate data. If you set the only_specified_database parameter to true and want to migrate data from multiple databases, separate multiple database names with commas (,). Database names are case-sensitive.

    exclude_database_list

    No

    ""

    The names of the databases from which you do not want to migrate data. If you set the only_specified_database parameter to true and want to exclude multiple databases, separate multiple database names with commas (,). Database names are case-sensitive.

    meta_names_mapping

    No

    ""

    If the external data source has databases whose names differ only in case, such as DORIS and doris, an error is reported due to ambiguity when ApsaraDB for SelectDB queries the catalog. To resolve the name conflict, you must specify the meta_names_mapping parameter.

    For more information, see Lowercase name synchronization in Apache Doris documentation.

    Important

    This parameter is applicable only to ApsaraDB for SelectDB instances that run SelectDB Core V4.0.

  3. Query the catalog.

    SHOW CATALOGS; -- Check whether the catalog is created.

    The following output is returned:

    +--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
    | CatalogId    | CatalogName  | Type     | IsCurrent | CreateTime              | LastUpdateTime      | Comment                |
    +--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
    | 436009309195 | doris_catalog | jdbc      |           | 2024-08-06 17:09:08.058 | 2024-07-19 18:04:37 |                        |
    |            0 | internal     | internal | yes       | UNRECORDED              | NULL                | Doris internal catalog |
    +--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
  4. Optional. Switch to the doris_catalog directory of the external catalog.

    You can view and access data in the doris_catalog directory of the external catalog in the same way as you view and access data in an internal catalog.

    Note

    ApsaraDB for SelectDB allows you only to read data in an external catalog.

    SWITCH doris_catalog;
  5. Optional. Switch to the internal directory of the internal catalog.

    If you do not perform Step 4, skip this step.

    SWITCH internal;
  6. Optional. Create data.

    If you have created the destination database, skip this step.

    CREATE database test_db;
  7. Switch to the destination database.

    USE test_db;
  8. Create a table.

    If you have created the destination table, check whether the data types of the columns in the destination table are mapped to those in the source table of the Doris database.

    If you have not created the destination table, make sure that the data types of the columns in the destination table are mapped to those in the source table of the Doris database when you create the table.

    For more information about mappings between the data types of columns, see the "Data type mappings" section of the JDBC data source topic.

    CREATE TABLE test_doris2SelectDB
    (
        id int,
        name string,
        age int
    )
    DISTRIBUTED BY HASH(id) BUCKETS 4
    PROPERTIES("replication_num" = "1");
  9. Migrate data.

    INSERT INTO test_doris2SelectDB SELECT *  FROM doris_catalog.doris_db.doris_t;
  10. Query the imported data in the destination table.

    SELECT *  FROM test_doris2SelectDB;

Migrate incremental data

In actual production environments, Doris data is mainly divided into offline data and incremental data. In most cases, you migrate data from Doris to ApsaraDB for SelectDB because you need to replicate data to a data warehouse to accelerate queries. To migrate incremental data, you can use one of the following methods:

  • Replicate and write Doris data to ApsaraDB for SelectDB when the Doris data is generated.

  • Read data from Doris partitions by using periodic jobs and write the data to ApsaraDB for SelectDB.