Data Online Migration: Preparations

Last Updated: Nov 05, 2024

This topic describes the operations that you must perform before you migrate data.

Step 1: Upload list files

HTTP or HTTPS list files consist of two types of files: one manifest.json file and one or more example.csv.gz files. An example.csv.gz file is a compressed CSV list file and cannot exceed 50 MB in size. The manifest.json file describes the columns of each CSV file. You can upload the list files to Alibaba Cloud Object Storage Service (OSS) or Amazon Simple Storage Service (Amazon S3).

  1. Create a CSV list file.

    Create a CSV list file on your on-premises machine. A list file can contain up to eight columns that are separated by commas (,). Each line represents one file to be migrated, and lines are separated by line feeds (\n). The following sections describe the columns.

    Important

    The Key and Url columns are required. Other columns are optional.

    • Required columns

      • Url (required)

        The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files.

      Note

      Make sure that the Url column can be accessed by using commands such as [curl --head "$Url"] and [curl "$Url"] (a plain GET request). Data Online Migration does not support URL redirection.

      The values of the Url and Key columns must be encoded. Otherwise, the migration may fail due to the special characters contained in the values.

      • Before you encode the value of the Url column, make sure that the URL can be accessed by using a CLI tool such as curl without redirection. Then, perform URL encoding.

      • Before you encode the value of the Key column, make sure that you can obtain the required object name in OSS after the migration. Then, perform URL encoding.

      Important

      After you encode the values of the Url and Key columns, make sure that the following requirements are met. Otherwise, the migration may fail, or the source files may not be migrated to the specified destination path.

      • A plus sign (+) in the original string is encoded as %2B.

      • A percent sign (%) in the original string is encoded as %25.

      • A comma (,) in the original string is encoded as %2C.

      For example, if the original string is a+b%c,d.file, the encoded string is a%2Bb%25c%2Cd.file.

      • Key (required)

        The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name.

      The following example shows a CSV file named plain_example.csv that contains unencoded URLs. The file has two columns: Url and Key. The Url column contains URLs that can be directly accessed by using the curl command. The Key column contains the OSS object names that correspond to the URLs. Example:

      https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/1354977961/p486238.jpg,assets/img/zh-CN/1354977961/p486238.jpg
      Important

      Do not use the built-in Notepad application in Windows to edit the manifest.json or plain_example.csv file. Notepad may add a UTF-8 byte order mark (BOM, 0xEFBBBF) to the first three bytes of these files, which may cause a parsing error in Data Online Migration. You can run the od -c plain_example.csv | less command in Linux or macOS to check whether the first three bytes of the plain_example.csv file contain the BOM. In Windows, we recommend that you use software such as Notepad++ or Visual Studio Code to create or edit files.
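As a cross-platform alternative to the od command, the following Python sketch checks whether a file starts with a UTF-8 BOM and removes it if present. The file name is only an example; adapt the snippet to your own paths.

```python
# Check whether a list file starts with a UTF-8 BOM (0xEF 0xBB 0xBF)
# and remove it if present. plain_example.csv is the example file name.
BOM = b"\xef\xbb\xbf"

def strip_bom(path):
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(BOM):
        with open(path, "wb") as f:
            f.write(data[len(BOM):])
        return True   # A BOM was found and removed.
    return False      # The file was already BOM-free.
```

If strip_bom returns True, re-create or re-upload the cleaned file before you start the migration.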

      The following Python code provides an example on how to read the plain_example.csv file line by line, URL-encode the values in each line, and write the encoded results to the example.csv file. The code is for your reference only. You can modify the code based on your business requirements.

      # -*- coding: utf-8 -*-
      import sys
      
      if sys.version_info.major == 3:
          from urllib.parse import quote_plus
      else:
          from urllib import quote_plus
          reload(sys)
          sys.setdefaultencoding("utf-8")
      
      # Source CSV file path.
      src_path = "plain_example.csv"
      # URL-encoded file path.
      out_path = "example.csv"
      
      # The sample CSV contains only two columns: url and key.
      with open(src_path) as fin, open(out_path, "w") as fout:
          for line in fin:
              line = line.strip()
              if not line:
                  continue  # Skip empty lines, such as a trailing newline.
              url, key = line.split(",")[:2]
              enc_url = quote_plus(url.encode("utf-8"))
              enc_key = quote_plus(key.encode("utf-8"))
              # enc_url and enc_key hold the URL-encoded values.
              fout.write(enc_url + "," + enc_key + "\n")
      

      The following code shows the encoded results in the example.csv file:

      https%3A%2F%2Fhelp-static-aliyun-doc.aliyuncs.com%2Fassets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg,assets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg
      https%3A%2F%2Fwww.example-fake1.com%2F%25E7%25BC%2596%25E7%25A0%2581%25E5%2590%258E%25E6%2589%258D%25E8%2583%25BD%25E8%25AE%25BF%25E9%2597%25AE%25E7%259A%2584url%2F123.png,%E7%BC%96%E7%A0%81%E5%90%8E%E6%89%8D%E8%83%BD%E8%AE%BF%E9%97%AE%E7%9A%84url%2F123.png
      https%3A%2F%2Fwww.example-fake2.com%2F%E6%97%A0%E9%9C%80%E7%BC%96%E7%A0%81%E5%8D%B3%E5%8F%AF%E8%AE%BF%E9%97%AE%E7%9A%84url%2F123.png,%E6%97%A0%E9%9C%80%E7%BC%96%E7%A0%81%E5%8D%B3%E5%8F%AF%E8%AE%BF%E9%97%AE%E7%9A%84url%2F123.png
      https%3A%2F%2Fwww.example-fake3.com%2F%E6%B1%89%E8%AF%AD%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AB%E3%81%BB%E3%82%93%E3%81%94%2F%ED%95%9C%EA%B5%AD%EC%96%B4%2F123.png,%E6%B1%89%E8%AF%AD%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AB%E3%81%BB%E3%82%93%E3%81%94%2F%ED%95%9C%EA%B5%AD%EC%96%B4%2F123.png
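To sanity-check your encoding, you can confirm with the Python standard library that a value round-trips: quote_plus produces the encoded form described above, and unquote_plus restores the original string. This is a reference snippet only.

```python
from urllib.parse import quote_plus, unquote_plus

original = "a+b%c,d.file"
encoded = quote_plus(original)
# '+' becomes %2B, '%' becomes %25, and ',' becomes %2C, as required.
assert encoded == "a%2Bb%25c%2Cd.file"
# Decoding must restore the original string exactly.
assert unquote_plus(encoded) == original
```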

    • All columns

      • Key (required): The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name.

      • Url (required): The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files.

      • Size (optional): The size of the file to be migrated. Unit: bytes.

      • StorageClass (optional): The storage class of the file in the source bucket.

      • LastModifiedDate (optional): The time when the file to be migrated was last modified.

      • ETag (optional): The entity tag (ETag) of the file to be migrated.

      • HashAlg (optional): The hash algorithm of the file to be migrated.

      • HashValue (optional): The hash value of the file to be migrated.

      Note

      The preceding columns can appear in any order in a CSV file. You only need to ensure that the column order in each CSV file matches the order specified in the fileSchema field of the manifest.json file.
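The consistency requirement above can be checked before you upload anything. The following sketch (file paths are assumptions based on the examples in this topic) verifies that every line of a compressed list file has exactly as many columns as the fileSchema field declares.

```python
import csv
import gzip
import json

def validate_list_file(manifest_path, csv_gz_path):
    """Check that every line of a CSV GZ list file has exactly as many
    columns as the fileSchema field of manifest.json declares."""
    with open(manifest_path) as f:
        schema = [c.strip() for c in json.load(f)["fileSchema"].split(",")]
    with gzip.open(csv_gz_path, "rt", encoding="utf-8") as f:
        for number, row in enumerate(csv.reader(f), start=1):
            if len(row) != len(schema):
                raise ValueError("line %d has %d columns, expected %d"
                                 % (number, len(row), len(schema)))
    return len(schema)
```

Note that this checks only the column count, not the column order itself, which the CSV file does not record.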

  2. Compress the CSV list file.

    Compress each CSV file into a .csv.gz file. The following examples show how to compress one or more CSV files:

    • Compress a CSV file

      In this example, a file named example.csv resides in the dir directory. Run the following command to compress the file:

      gzip -c example.csv > example.csv.gz
      Note

      By default, the gzip command replaces the source file with the compressed file. The -c option in the preceding command writes the compressed output to a new file, so both example.csv and example.csv.gz are retained.

      The .csv.gz file is generated.

    • Compress multiple CSV files

      In this example, the example1.csv, example2.csv, and example3.csv files reside in the dir directory. Run the following command to compress the files:

      gzip -r dir
      Note

      The gzip -r command does not package the directory into a single archive. It compresses each file in the specified directory separately and does not retain the source files.

      The example1.csv.gz, example2.csv.gz, and example3.csv.gz files are generated in the dir directory.
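If the gzip command is not available, for example on Windows, you can produce the same .csv.gz files with Python's gzip module, which always leaves the source file in place. This is a reference sketch; the file names follow the examples above.

```python
import gzip
import shutil

def compress_csv(src_path):
    """Compress src_path into src_path + '.gz' and keep the source file."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path

# Example: compress_csv("dir/example1.csv") creates dir/example1.csv.gz.
```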

  3. Create a manifest.json file.

    A manifest.json file describes one or more CSV list files. The file contains the following fields:

    • fileFormat: the format of the list file. Example: CSV.

    • fileSchema: the columns of the CSV files, listed in the order in which they appear in each file.

    • files:

      • key: the location of the CSV file in the source bucket.

      • MD5checksum: the MD5 value of the CSV file. The value is a hexadecimal MD5 string, which is not case-sensitive. Example: 91A76757B25C8BE78BC321DEEBA6A5AD. If you do not specify this parameter, the CSV file is not verified.

      • size: the size of the CSV file.

    The following code provides an example:

    {
        "fileFormat":"CSV",
        "fileSchema":"Url, Key",
        "files":[{
            "key":"dir/example1.csv.gz",
            "MD5checksum":"",
            "size":0
        },{
            "key":"dir/example2.csv.gz",
            "MD5checksum":"",
            "size":0
        }]
    }
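Rather than filling in MD5checksum and size by hand, you can generate the manifest from the compressed list files. This is a sketch, not part of the product: it assumes the .csv.gz files already exist locally at the paths you pass in, and it uses the field names from the example above.

```python
import hashlib
import json
import os

def build_manifest(keys, file_schema="Url, Key"):
    """Build the manifest dict for the given CSV GZ files. keys are
    paths such as dir/example1.csv.gz, as in the example above."""
    files = []
    for key in keys:
        with open(key, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest().upper()
        files.append({
            "key": key,
            "MD5checksum": digest,          # Hexadecimal, case-insensitive.
            "size": os.path.getsize(key),   # Unit: bytes.
        })
    return {"fileFormat": "CSV", "fileSchema": file_schema, "files": files}

# Example:
# with open("manifest.json", "w") as f:
#     json.dump(build_manifest(["dir/example1.csv.gz"]), f, indent=4)
```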
  4. Upload the list files that you create to OSS or Amazon S3.

    • Upload the list files to OSS. For more information, see Simple upload.

      Note
      • After a list file is uploaded to OSS, Data Online Migration downloads the list file and migrates the files based on the specified URLs.

      • When you create a migration task, specify the bucket in which a list file is stored. You must specify the path of the list file in the Directory in which the list file resides/manifest.json format. Example: dir/manifest.json.

    • Upload the list files to Amazon S3.

      Note
      • After a list file is uploaded to Amazon S3, Data Online Migration downloads the list file and migrates the files based on the specified URLs.

      • When you create a migration task, specify the bucket in which a list file is stored. You must specify the path of the list file in the Directory in which the list file resides/manifest.json format. Example: dir/manifest.json.

Step 2: Create a destination bucket

Create an Object Storage Service (OSS) bucket as the destination to store the migrated data. For more information, see Create buckets.

Step 3: Create a RAM user and grant permissions to the RAM user

Important
  • A Resource Access Management (RAM) user is used to perform the data migration task. You must create the required RAM roles and perform the migration as the RAM user. We recommend that you create the RAM user within the Alibaba Cloud account that owns the source or destination OSS bucket.

  • For more information, see Create a RAM user and grant permissions to the RAM user.

Log on to the RAM console with an Alibaba Cloud account. On the Users page, find the RAM user that you created and click Add Permissions in the Actions column.

  1. System policy: Attach the AliyunOSSImportFullAccess policy to the RAM user.

  2. Custom policy: Attach a custom policy that includes the ram:CreateRole, ram:CreatePolicy, ram:AttachPolicyToRole, and ram:ListRoles permissions to the RAM user.

    For more information about how to attach a custom policy, see Create custom policies. The following sample code provides an example of the custom policy:

    {
        "Version":"1",
        "Statement":[
            {
                "Effect":"Allow",
                "Action":[
                    "ram:CreateRole",
                    "ram:CreatePolicy",
                    "ram:AttachPolicyToRole",
                    "ram:ListRoles"
                ],
                "Resource":"*"
            }
        ]
    }

Step 4: Grant permissions on the inventory storage bucket

Perform the corresponding operations based on whether the inventory storage bucket belongs to the current Alibaba Cloud account.

The inventory storage bucket belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you use automatic authorization in the Data Online Migration console. For more information, see the "Step 2: Create a source data address" section of the Migrate data topic.

  • Manual authorization

    1. Grant permissions on the inventory storage bucket

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List* and oss:Get* permissions to the RAM role.

    For more information about how to attach a custom policy, see Create custom policies. The following sample code provides an example of the custom policy:

    Note

    The following policy is for reference only. Replace <myInvBucket> with the name of the inventory storage bucket.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Resource": [
            "acs:oss:*:*:<myInvBucket>",
            "acs:oss:*:*:<myInvBucket>/*"
          ]
        }
      ]
    }

The inventory storage bucket does not belong to the current Alibaba Cloud account

1. Grant permissions on the inventory storage bucket

  1. Log on to the OSS console with the Alibaba Cloud account that owns the inventory storage bucket.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the inventory storage bucket.

  3. In the left-side navigation pane, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax. On the page that appears, click Edit, enter the custom bucket policy in the code editor, and then click Save.

  • Custom policy:

    Grant the RAM role the permissions to query and read all resources in the inventory storage bucket.

    Note

    The following policy is for reference only. Replace <otherInvBucket> with the name of the inventory storage bucket, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, <otherUid> with the ID of the Alibaba Cloud account that owns the inventory storage bucket, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Principal": [
             "arn:sts::<myUid>:assumed-role/<roleName>/*"
          ],
          "Resource": [
            "acs:oss:*:<otherUid>:<otherInvBucket>",
            "acs:oss:*:<otherUid>:<otherInvBucket>/*"
          ]
        }
      ]
    }

Step 5: Grant permissions on the destination bucket

Perform the corresponding operations based on whether the destination bucket belongs to the current Alibaba Cloud account.

The destination bucket belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you use automatic authorization in the Data Online Migration console. For more information, see the "Step 3: Create a destination data address" section of the Migrate data topic.

  • Manual authorization

    1. Grant permissions on the destination bucket to the RAM role

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List*, oss:Get*, oss:Put*, and oss:AbortMultipartUpload permissions to the RAM role.

    For more information about how to attach a custom policy, see Create custom policies. The following sample code provides an example of the custom policy:

    Note

    The following policy is for reference only. Replace <myDestBucket> with the name of the destination bucket.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*",
            "oss:Put*",
            "oss:AbortMultipartUpload"
          ],
          "Resource": [
            "acs:oss:*:*:<myDestBucket>",
            "acs:oss:*:*:<myDestBucket>/*"
          ]
        }
      ]
    }

The destination bucket does not belong to the current Alibaba Cloud account

1. Grant permissions on the destination bucket to the RAM role

Important

If you configure a bucket policy by specifying policy statements to grant the RAM role the required permissions, the new bucket policy overwrites the existing bucket policy. Make sure that the new bucket policy contains the content of the existing bucket policy. Otherwise, the authorization based on the existing bucket policy may fail.

  1. Log on to the OSS console with the Alibaba Cloud account that owns the destination bucket.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the destination bucket.

  3. In the left-side pane of the bucket details page, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax. On the page that appears, click Edit, enter the custom bucket policy in the code editor, and then click Save.

    • Grant the RAM role the permissions to query, read, delete, and write all resources in the destination bucket.

Note

The following policy is for reference only. Replace <otherDestBucket> with the name of the destination bucket, <otherUid> with the ID of the Alibaba Cloud account that owns the destination bucket, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*",
        "oss:Put*",
        "oss:AbortMultipartUpload"
      ],
      "Principal": [
         "arn:sts::<myUid>:assumed-role/<roleName>/*"
      ],
      "Resource": [
        "acs:oss:*:<otherUid>:<otherDestBucket>",
        "acs:oss:*:<otherUid>:<otherDestBucket>/*"
      ]
    }
  ]
}