MaxCompute provides a variety of tools for uploading and downloading data. The source code for most of these tools is maintained on GitHub. Select a tool based on your data migration scenario.
Alibaba Cloud services
- MaxCompute client (Tunnel)
- The MaxCompute client provides built-in Tunnel commands for data upload and download based on the Tunnel SDK. For more information about Tunnel commands, see Tunnel commands.
- For more information about how to install and use the MaxCompute client, see MaxCompute client.
Note: This is an open-source program. You can visit aliyun-odps-console to view the source code.
- Data Integration of DataWorks (Tunnel)
Data Integration of DataWorks is a stable, efficient, and scalable data synchronization platform. You can use Data Integration for full offline and incremental real-time data synchronization, integration, and exchange between heterogeneous data storage systems on Alibaba Cloud.
Data synchronization tasks support the following data sources: MaxCompute, ApsaraDB RDS (MySQL, SQL Server, and PostgreSQL), Oracle, FTP, AnalyticDB, Object Storage Service (OSS), ApsaraDB for Memcache, and PolarDB-X. For more information, see Data Integration overview.
- DTS (Tunnel)
Data Transmission Service (DTS) is an Alibaba Cloud data service that supports data exchange between data sources, such as relational database management system (RDBMS), NoSQL, and online analytical processing (OLAP) databases. DTS provides data transmission features, such as data migration, real-time data subscription, and real-time data synchronization.
DTS can synchronize data from ApsaraDB RDS and MySQL instances to MaxCompute tables in real time. Other data source types are not supported.
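As a rough illustration of the Tunnel commands that the MaxCompute client exposes, the following Python sketch assembles `tunnel upload` and `tunnel download` command strings. The helper name `tunnel_command` and the file, project, table, and partition names are hypothetical. Only the basic `tunnel upload <path> <table>` and `tunnel download <table> <path>` shapes are assumed, and all options are left out. See Tunnel commands for the authoritative syntax.

```python
def tunnel_command(action, local_path, table, project=None, partition=None):
    """Build a Tunnel command string for the MaxCompute client (illustrative only)."""
    if action not in ("upload", "download"):
        raise ValueError("action must be 'upload' or 'download'")
    # The target has the form [project.]table[/partition_spec].
    target = f"{project}.{table}" if project else table
    if partition:
        target = f"{target}/{partition}"
    # upload: local file first, then the table.
    # download: table first, then the local file.
    if action == "upload":
        return f"tunnel upload {local_path} {target};"
    return f"tunnel download {target} {local_path};"


if __name__ == "__main__":
    # Hypothetical file, project, and table names.
    print(tunnel_command("upload", "log.txt", "test_table", project="test_project"))
    print(tunnel_command("download", "result.txt", "test_table", partition='dt="20240101"'))
```

The generated strings are meant to be pasted into the MaxCompute client. The sketch does not run them or validate that the table exists.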
Open-source software
- Sqoop (Tunnel)
The MaxCompute version of Sqoop is built on community Sqoop 1.4.6 and adds enhanced support for MaxCompute. You can use Sqoop to import data from relational databases such as MySQL, and data from HDFS or Hive, to MaxCompute tables. You can also use Sqoop to export data from MaxCompute tables to relational databases such as MySQL.
Note: This is an open-source program. You can visit aliyun-maxcompute-data-collectors to view the source code.
- Kettle (Tunnel)
Kettle is an open-source extract, transform, load (ETL) tool that is developed in Java. Kettle runs on Windows, UNIX, or Linux and provides a graphical interface in which you define a data transmission topology by dragging and dropping components.
Note: This is an open-source program. You can visit aliyun-maxcompute-data-collectors to view the source code.
- Apache Flume (DataHub)
Apache Flume is a distributed and reliable system that collects large amounts of log data from data sources, aggregates the data, and stores it in a centralized data store. Apache Flume supports various Source and Sink plug-ins.
The DataHub Sink plug-in of Apache Flume allows you to upload log data to DataHub in real time and archive the data into MaxCompute tables.
Note: This is an open-source program. You can visit aliyun-maxcompute-data-collectors to view the source code.
- Fluentd (DataHub)
Fluentd is open-source software that can collect logs such as application logs, system logs, and access logs from various data sources. Fluentd allows you to use plug-ins to filter log data and store data in data processors, such as MySQL, Oracle, MongoDB, Hadoop, and Treasure Data.
The DataHub plug-in of Fluentd allows you to upload log data to DataHub in real time and archive the data into MaxCompute tables.
- Logstash (DataHub)
Logstash is an open-source log collection and processing framework. The logstash-output-datahub plug-in allows you to import data to DataHub, so you can collect and transmit data with only simple configuration. You can use Logstash with MaxCompute or StreamCompute to build an end-to-end streaming data solution that covers everything from data collection to analysis.
The DataHub plug-in of Logstash allows you to upload log data to DataHub in real time and archive the data into MaxCompute tables.
- OGG (DataHub)
The DataHub plug-in of Oracle GoldenGate (OGG) allows you to synchronize incremental data in an Oracle database to DataHub in real time and archive the data into MaxCompute tables.
Note: This is an open-source program. You can visit aliyun-maxcompute-data-collectors to view the source code.
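
The name in parentheses after each tool above indicates the channel that the tool uses to reach MaxCompute: Tunnel or DataHub. As a compact summary, the following Python sketch records that mapping as a lookup table. The dictionary contents come from the list above. The names `TOOL_CHANNELS` and `tools_by_channel` are hypothetical and exist only for illustration.

```python
# Channel used by each tool, as listed in this topic.
TOOL_CHANNELS = {
    "MaxCompute client": "Tunnel",
    "Data Integration of DataWorks": "Tunnel",
    "DTS": "Tunnel",
    "Sqoop": "Tunnel",
    "Kettle": "Tunnel",
    "Apache Flume": "DataHub",
    "Fluentd": "DataHub",
    "Logstash": "DataHub",
    "OGG": "DataHub",
}


def tools_by_channel(channel):
    """Return the names of the tools that use the given channel, sorted alphabetically."""
    return sorted(name for name, ch in TOOL_CHANNELS.items() if ch == channel)


if __name__ == "__main__":
    for channel in ("Tunnel", "DataHub"):
        print(channel, "->", ", ".join(tools_by_channel(channel)))
```

The DataHub-based tools are the ones described above as uploading data to DataHub in real time and then archiving it into MaxCompute tables.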