This topic describes how to migrate data to Object Storage Service (OSS) or OSS-HDFS.
Migrate data to OSS
You can migrate data from local devices, third-party storage devices, or a source bucket to a destination bucket. The following table describes the methods that you can use to migrate data to OSS.
| Migration method | Description | References |
| --- | --- | --- |
| Data Online Migration | Use Data Online Migration to migrate data from third-party storage devices to OSS without setting up a migration environment. You can submit a migration job online and monitor the migration progress. | |
| ossimport | Migrate historical data in batches from various sources to OSS, including local storage devices, Qiniu Cloud Object Storage (KODO), Baidu Object Storage (BOS), Amazon Simple Storage Service (Amazon S3), Azure Blob Storage, UPYUN Storage Service (USS), Tencent Cloud Object Storage (COS), Kingsoft Standard Storage Service (KS3), HTTP origins, and OSS. You can specify additional sources based on your business requirements. | |
| ossutil | Migrate large amounts of historical data from various sources to OSS in batches. | |
| Mirroring-based back-to-origin | Seamlessly migrate your business from a self-managed origin or another cloud service to OSS without service interruption. After ossimport migrates historical data to OSS and your business runs on OSS, any request for data that does not yet exist in OSS triggers mirroring-based back-to-origin, which retrieves the requested data from the origin and stores it in OSS. This way, data that has not yet been migrated remains accessible, which ensures business continuity during migration. | |
| Data replication | Use the OSS data replication features to replicate objects between buckets in the same region or across regions, within the same account or across accounts. | |
| Data Transport | Migrate terabytes to petabytes of data from a local data center to OSS. | |
| OSS API operations or OSS SDKs | Call OSS API operations or use OSS SDKs to programmatically migrate data to OSS. This migration method is especially suitable for developers. | |
| OSS external tables (gpossext) | Use the OSS external table (gpossext) feature of AnalyticDB for PostgreSQL to import data from or export data to OSS tables. | |
| Jindo DistCp | Copy files within or between large-scale clusters. Jindo DistCp uses MapReduce to distribute files, handle errors, and recover from failures. It takes lists of files and directories as the input of the MapReduce tasks, and each task copies a portion of the files and directories from the input list. | |
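As a minimal illustration of the programmatic path in the table above (OSS API operations or OSS SDKs), the following sketch uses the OSS Python SDK (`oss2`) to upload a local directory tree to a bucket. The bucket name, endpoint, and credential placeholders are assumptions for illustration, and `local_files_to_keys` is a hypothetical helper, not part of the SDK.

```python
import os


def local_files_to_keys(root, prefix=""):
    """Map every file under `root` to an OSS object key.

    Hypothetical helper for illustration: keys use forward slashes,
    are relative to `root`, and sit under an optional `prefix`.
    """
    mapping = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            mapping[path] = prefix + rel
    return mapping


def migrate_directory(root, bucket_name, endpoint, access_key_id, access_key_secret):
    """Upload a local directory tree to OSS (placeholder credentials and endpoint)."""
    import oss2  # OSS Python SDK; imported lazily so the helper above works without it

    auth = oss2.Auth(access_key_id, access_key_secret)
    bucket = oss2.Bucket(auth, endpoint, bucket_name)
    for path, key in local_files_to_keys(root).items():
        bucket.put_object_from_file(key, path)  # one PUT request per local file


if __name__ == "__main__":
    # Placeholder values -- replace with your own bucket, region endpoint, and credentials.
    migrate_directory(
        "/data/to-migrate",
        bucket_name="example-bucket",
        endpoint="https://oss-cn-hangzhou.aliyuncs.com",
        access_key_id="<AccessKeyId>",
        access_key_secret="<AccessKeySecret>",
    )
```

For large datasets, ossutil or ossimport from the table above is usually a better fit than a per-file upload loop, because those tools handle concurrency, retries, and incremental runs for you.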
Migrate data to OSS-HDFS
OSS-HDFS (JindoFS) is a cloud-native data lake storage service. It provides unified metadata management, is fully compatible with the Hadoop Distributed File System (HDFS) API, and supports the Portable Operating System Interface (POSIX) standard. OSS-HDFS is suitable for managing data in data lake-based computing scenarios in the big data and AI fields. You can migrate data to OSS-HDFS or between buckets for which OSS-HDFS is enabled. The following table describes the methods that you can use to migrate data to OSS-HDFS.
| Migration method | Description | References |
| --- | --- | --- |
| Jindo DistCp | Copy files within or between large-scale clusters. Jindo DistCp uses MapReduce to distribute files, handle errors, and recover from failures. It takes lists of files and directories as the input of the MapReduce tasks, and each task copies a portion of the files and directories from the input list. | |
| JindoDistJob | Migrate full or incremental metadata of files from a semi-hosted JindoFS cluster to OSS-HDFS without the need to copy data blocks. | |
| MoveTo command of JindoTable | Automatically update metadata after the command copies the underlying data, so that data in a table or in partitions is fully migrated to the destination path. If you want to migrate data in a large number of partitions at a time, you can specify filter conditions in the MoveTo command. JindoTable also provides protective measures to ensure data integrity and security when the MoveTo command is used to migrate data. | Use the JindoTable MoveTo command to migrate Hive tables and partitions to OSS-HDFS |
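To make the Jindo DistCp row above concrete, the following sketch assembles a `hadoop jar` invocation for a Jindo DistCp copy from HDFS to an OSS-HDFS bucket. The `--src`, `--dest`, and `--parallelism` flags follow common Jindo DistCp usage, but the jar path, bucket name, and the `.oss-dls.aliyuncs.com` endpoint format are placeholder assumptions; check the tool version and endpoint for your own cluster.

```python
import subprocess


def build_distcp_command(jar_path, src, dest, parallelism=10):
    """Assemble a `hadoop jar` invocation for Jindo DistCp.

    Flag names follow common Jindo DistCp usage (--src/--dest/--parallelism);
    the jar path is a placeholder -- use the version installed on your cluster.
    """
    return [
        "hadoop", "jar", jar_path,
        "--src", src,
        "--dest", dest,
        "--parallelism", str(parallelism),
    ]


if __name__ == "__main__":
    cmd = build_distcp_command(
        "/opt/jindo/jindo-distcp-tool.jar",  # placeholder path to the DistCp jar
        "hdfs:///warehouse/logs",            # source directory on the HDFS cluster
        "oss://example-bucket.cn-hangzhou.oss-dls.aliyuncs.com/logs",  # assumed OSS-HDFS endpoint format
    )
    subprocess.run(cmd, check=True)  # submits the MapReduce copy job to the cluster
```

Because the copy runs as a MapReduce job, `--parallelism` controls how many tasks copy files concurrently; raising it speeds up migration of many small files at the cost of more cluster resources.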