To meet a variety of data import requirements, AnalyticDB for MySQL allows you to import data by using external tables, DataWorks, Data Transmission Service (DTS), a Java Database Connectivity (JDBC)-based program, or the data synchronization features. This topic describes the features and use scenarios of the preceding import methods to help you select a suitable data import method.
Use external tables to import data
AnalyticDB for MySQL allows you to access different data sources and create external tables to map external data sources. This way, you can concurrently read external data and import the data to AnalyticDB for MySQL. This data import method maximizes the use of cluster resources to improve import performance.
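As a minimal sketch of this workflow, the following Java program maps a CSV file in OSS to an external table and then runs an INSERT OVERWRITE INTO ... SELECT statement to import it. The cluster endpoint, database, credentials, OSS path, and table names are placeholder assumptions, and the TABLE_PROPERTIES fields differ for other data sources such as HDFS or MaxCompute.
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ExternalTableImport {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint, database, and credentials.
        String url = "jdbc:mysql://<cluster-endpoint>:3306/demo_db?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "<user>", "<password>");
             Statement stmt = conn.createStatement()) {
            // Map a CSV file in OSS to an external table. The TABLE_PROPERTIES
            // fields shown here apply to a CSV source and use placeholder values.
            stmt.execute("CREATE TABLE IF NOT EXISTS oss_orders_ext (" +
                    "order_id BIGINT, amount DECIMAL(10,2), created DATETIME) " +
                    "ENGINE='OSS' TABLE_PROPERTIES='{" +
                    "\"endpoint\":\"oss-cn-hangzhou-internal.aliyuncs.com\"," +
                    "\"url\":\"oss://my-bucket/orders/orders.csv\"," +
                    "\"accessid\":\"<access-key-id>\"," +
                    "\"accesskey\":\"<access-key-secret>\"," +
                    "\"delimiter\":\",\"}'");
            // Read the external data in parallel and overwrite the internal table.
            stmt.execute("INSERT OVERWRITE INTO orders SELECT * FROM oss_orders_ext");
        }
    }
}
```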
Basic characteristics
This method is suitable for importing a large amount of data in a single job.
A data import job consumes large amounts of cluster resources to ensure high performance. We recommend that you use this method during off-peak hours.
The imported data is invisible when the data import job is in progress, and becomes visible only after the job is complete.
If you use this method to import a partition that has the same name as an existing partition, the data of the existing partition is overwritten.
When data is imported by using an external table, table indexes are created as part of the import job to improve query performance.
Common scenarios
Data initialization in data warehouses
If you want to initialize and import terabytes of data to AnalyticDB for MySQL for analysis, we recommend that you store the data in OSS or HDFS and then use external tables to import the data.
Accelerated analysis of batch data
Batch data is stored in batch data warehouses such as MaxCompute. Each day, gigabytes or terabytes of data are imported from batch data warehouses to AnalyticDB for MySQL for accelerated analysis.
Methods
You can import data by using the regular import or elastic import method. By default, the regular import method is used. In regular import, data is read by compute nodes and indexes are created on storage nodes, which consumes computing and storage resources. In elastic import, data is read and indexes are created by Serverless Spark jobs, which consumes the resources of job resource groups and generates fees.
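Long-running imports are usually submitted asynchronously. The following sketch reuses the placeholder tables from the previous example and submits the import with SUBMIT JOB, which returns a job ID instead of blocking the connection. To run the statement as an elastic import, add the import hint described in the elastic import documentation; the exact hint and configuration keys depend on your cluster version.
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class AsyncImportJob {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials.
        String url = "jdbc:mysql://<cluster-endpoint>:3306/demo_db?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "<user>", "<password>");
             Statement stmt = conn.createStatement()) {
            // SUBMIT JOB runs the INSERT asynchronously and returns a job ID
            // that you can use to track the progress of the import.
            boolean hasResult = stmt.execute(
                    "SUBMIT JOB INSERT OVERWRITE INTO orders SELECT * FROM oss_orders_ext");
            if (hasResult) {
                try (ResultSet rs = stmt.getResultSet()) {
                    if (rs.next()) {
                        System.out.println("Job ID: " + rs.getString(1));
                    }
                }
            }
        }
    }
}
```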
Optimization of data import performance
For information about how to improve the performance of the external table-based data import method, see the "Optimize the performance when you use external tables to import data" section of the Optimize data import performance topic.
Use DataWorks to import data
DataWorks provides a visualized data import method that allows you to import data from a variety of data sources to AnalyticDB for MySQL. Compared with external tables, DataWorks is more suitable when you want to import only a small amount of data.
To import several hundred gigabytes of data, we recommend that you use external tables instead of DataWorks. For more information, see the "Use external tables to import data" section of this topic.
Common scenarios
Data import every minute or hour
You want to import small amounts of data to AnalyticDB for MySQL every minute or hour for analysis.
Data import from multiple external data sources
You want to import data from multiple data sources such as Tablestore, Redis, and PostgreSQL to AnalyticDB for MySQL.
Methods
To use DataWorks to import data, perform the following steps:
Configure the data source. The following data sources are supported: RDS for MySQL, Oracle, RDS for SQL Server, OSS, MaxCompute, and HDFS.
Configure the source and destination connections for a synchronization job.
Optimization of data import performance
For information about how to improve the performance of the DataWorks-based data import method, see the "Optimize the performance when you use DataWorks to import data" section of the Optimize data import performance topic.
Use DTS to import data
Data Transmission Service (DTS) is a real-time data streaming service. DTS supports data transmission between data sources such as relational databases, NoSQL databases, and online analytical processing (OLAP) databases. Compared with traditional data migration and synchronization tools, DTS provides diverse features, high performance, security, reliability, and ease of use, which simplifies data transmission and allows you to focus on business development. You can use DTS to import data from a variety of data sources to AnalyticDB for MySQL for real-time analysis.
Common scenarios
Real-time data synchronization within seconds
You want to synchronize data in real time from RDS for MySQL or PolarDB for MySQL to AnalyticDB for MySQL for analysis.
Multi-source aggregation
You want to synchronize data from multiple RDS for MySQL instances or PolarDB for MySQL clusters to an AnalyticDB for MySQL cluster for analysis. You can use the multi-table merging feature of DTS to synchronize multiple source tables that use the same schema to a single table of an AnalyticDB for MySQL cluster.
Methods
For information about how to synchronize data from data sources such as RDS for MySQL or PolarDB for MySQL by using DTS, see the corresponding DTS data synchronization topics.
Use a JDBC-based program to import data
If external tables or DataWorks cannot meet your business requirements, you can write a JDBC-based program to import data. This method is suitable for data cleansing and other complex scenarios that involve unstructured data.
Common scenarios
Import after data preprocessing
You want to parse log files as they are generated and import the parsed data to AnalyticDB for MySQL in real time.
On-premises data import
You want to import on-premises data to AnalyticDB for MySQL when the data cannot be uploaded to OSS, HDFS, or MaxCompute.
Usage notes
When you use a JDBC-based program to connect to AnalyticDB for MySQL, you must configure an appropriate JDBC driver. For more information, see the "Supported MySQL JDBC driver versions" section of the Java topic.
If you want to import a large amount of data and reduce the amount of time that is required to complete the import job, we recommend that you configure a connection pool. For more information, see Druid connection pool.
This data import method supports batch import and concurrent import for higher import performance. For an example, see the sketch after these usage notes.
For information about how to import streaming data, see Import data from Apache Flink.
For information about how to import on-premises data without writing a custom program, see Use LOAD DATA to import data to Data Warehouse Edition and Use the AnalyticDB for MySQL import tool to import data to Data Warehouse Edition.
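The following sketch combines the two recommendations above: it obtains connections from a Druid connection pool and sends rows in batches. The endpoint, credentials, and the log_events table are placeholder assumptions, and the batch size and pool settings would need tuning for a real workload.
```java
import com.alibaba.druid.pool.DruidDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class BatchImport {
    public static void main(String[] args) throws Exception {
        // Druid connection pool; placeholder endpoint and credentials.
        DruidDataSource ds = new DruidDataSource();
        ds.setUrl("jdbc:mysql://<cluster-endpoint>:3306/demo_db?rewriteBatchedStatements=true");
        ds.setUsername("<user>");
        ds.setPassword("<password>");
        ds.setInitialSize(4);
        ds.setMaxActive(16);

        // Batch INSERT: accumulate rows and flush them in one round trip.
        // Several threads can each run this loop with its own pooled
        // connection to import data concurrently.
        String sql = "INSERT INTO log_events (ts, level, message) VALUES (?, ?, ?)";
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < 10_000; i++) {
                ps.setLong(1, System.currentTimeMillis());
                ps.setString(2, "INFO");
                ps.setString(3, "event-" + i);
                ps.addBatch();
                if (i % 1_000 == 999) {
                    ps.executeBatch(); // flush every 1,000 rows
                }
            }
            ps.executeBatch(); // flush the remaining rows
        } finally {
            ds.close();
        }
    }
}
```
The rewriteBatchedStatements=true parameter lets the MySQL driver coalesce each batch into multi-row INSERT statements, which typically contributes more to import throughput than the pool settings.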
Optimization of data import performance
For information about how to improve the performance of the data import method that uses a JDBC-based program, see the "Optimize the performance when you use a JDBC-based program to import data" section of the Optimize data import performance topic.
Use data synchronization to import data
AnalyticDB for MySQL provides a variety of data synchronization features, such as AnalyticDB Pipeline Service (APS), metadata discovery, and Hive data migration. The data synchronization features allow you to discover and organize OSS metadata, synchronize data in real time from Simple Log Service (SLS) and ApsaraMQ for Kafka to AnalyticDB for MySQL, and migrate Hive data to OSS for subsequent data analysis.
Common scenarios
Low-cost storage and analysis
If you want to store and analyze a large amount of logs such as SLS data and messages such as Kafka data in a cost-effective manner, you can use the APS feature to synchronize data in real time from SLS and Kafka to AnalyticDB for MySQL.
Metadata discovery
A large amount of OSS data is difficult to manage and analyze due to the absence of metadata. AnalyticDB for MySQL provides the metadata discovery feature to build metadata layers, such as databases, tables, and partitions, and to identify data formats and field information to generate a table schema. The metadata built by using the metadata discovery feature also allows other analysis engines to identify the data for subsequent analysis.
Hive data migration
If you want AnalyticDB for MySQL to perform operations on the data stored in Hive, including metadata management and subsequent analysis, you can use the Hive data migration feature to migrate Hive data to OSS. Then, AnalyticDB for MySQL automatically organizes the metadata and performs subsequent analysis.
Methods
Automatically synchronize the metadata of a PolarDB-X instance to a Data Lakehouse Edition cluster
Use data synchronization to synchronize data from Simple Log Service to Data Warehouse Edition
Use data synchronization to synchronize data from Simple Log Service to Data Lakehouse Edition
Use metadata discovery to import data to Data Lakehouse Edition