Data Online Migration: Preparations

Last Updated: Nov 05, 2024

This topic describes the operations that you must perform before you migrate data.

Step 1: Upload list files

HTTP or HTTPS list files consist of two types of files: one manifest.json file and one or more example.csv.gz files. An example.csv.gz file is a compressed CSV list file and cannot exceed 50 MB in size. The manifest.json file describes the columns of each CSV file. You can upload the list files to Alibaba Cloud Object Storage Service (OSS) or Amazon Simple Storage Service (Amazon S3).

  1. Create a CSV list file.

    Create a CSV list file on your on-premises machine. A list file can contain up to eight columns that are separated by commas (,). Each line represents one file to be migrated, and lines are separated by line feeds (\n). The following sections describe the columns.

    Important

    The Key and Url columns are required. Other columns are optional.

    • Required columns

      • Url (required)

        The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files.

      Note

      Make sure that the Url column can be accessed by using commands such as [curl --head "$Url"] and [curl "$Url"] (a plain GET request). Data Online Migration does not support URL redirection.

      The values of the Url and Key columns must be encoded. Otherwise, the migration may fail due to the special characters contained in the values.

      • Before you encode the value of the Url column, make sure that the URL can be accessed by using a CLI tool such as curl without redirection. Then, perform URL encoding.

      • Before you encode the value of the Key column, make sure that you can obtain the required object name in OSS after the migration. Then, perform URL encoding.

      Important

      After you encode the values of the Url and Key columns, make sure that the following requirements are met. Otherwise, the migration may fail, or the source files may not be migrated to the specified destination path.

      • A plus sign (+) in the original string is encoded as %2B.

      • A percent sign (%) in the original string is encoded as %25.

      • A comma (,) in the original string is encoded as %2C.

      For example, if the original string is a+b%c,d.file, the encoded string is a%2Bb%25c%2Cd.file.

      • Key (required)

        The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name.

      The following example shows a CSV file named plain_example.csv that contains unencoded URLs. The file has two columns: Url and Key. The Url column contains URLs that can be directly accessed by using the curl command. The Key column contains the OSS object names that correspond to the URLs. Example:

      https://help-static-aliyun-doc.aliyuncs.com/assets/img/zh-CN/1354977961/p486238.jpg,assets/img/zh-CN/1354977961/p486238.jpg
      Important

      Do not use the built-in Notepad application in Windows to edit the manifest.json or plain_example.csv file. Notepad may add a UTF-8 byte order mark (BOM, 0xEFBBBF) to the first three bytes of these files, which may cause a parsing error in Data Online Migration. You can run the od -c plain_example.csv | less command in Linux or macOS to check whether the first three bytes of the plain_example.csv file contain the BOM. In Windows, we recommend that you use software such as Notepad++ or Visual Studio Code to create or edit files.
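As a cross-platform alternative to the od command, the following Python sketch checks whether a file starts with a UTF-8 BOM and removes it if present. The file name is only an example; adapt the snippet to your own paths.

```python
# Check whether a list file starts with a UTF-8 BOM (0xEF 0xBB 0xBF)
# and remove it if present. plain_example.csv is the example file name.
BOM = b"\xef\xbb\xbf"

def strip_bom(path):
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(BOM):
        with open(path, "wb") as f:
            f.write(data[len(BOM):])
        return True   # A BOM was found and removed.
    return False      # The file was already BOM-free.
```

If strip_bom returns True, re-create or re-upload the cleaned file before you start the migration.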

      The following Python code provides an example on how to read the plain_example.csv file line by line, URL-encode the values in each line, and write the encoded results to the example.csv file. The code is for your reference only. You can modify the code based on your business requirements.

      # -*- coding: utf-8 -*-
      import sys
      
      if sys.version_info.major == 3:
          from urllib.parse import quote_plus
      else:
          from urllib import quote_plus
          reload(sys)
          sys.setdefaultencoding("utf-8")
      
      # Source CSV file path.
      src_path = "plain_example.csv"
      # URL-encoded file path.
      out_path = "example.csv"
      
      # The sample CSV contains only two columns: url and key.
      with open(src_path) as fin, open(out_path, "w") as fout:
          for line in fin:
              line = line.strip()
              if not line:
                  continue  # Skip empty lines, such as a trailing newline.
              url, key = line.split(",")[:2]
              enc_url = quote_plus(url.encode("utf-8"))
              enc_key = quote_plus(key.encode("utf-8"))
              # enc_url and enc_key hold the URL-encoded values.
              fout.write(enc_url + "," + enc_key + "\n")
      

      The following code shows the encoded results in the example.csv file:

      https%3A%2F%2Fhelp-static-aliyun-doc.aliyuncs.com%2Fassets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg,assets%2Fimg%2Fzh-CN%2F1354977961%2Fp486238.jpg
      https%3A%2F%2Fwww.example-fake1.com%2F%25E7%25BC%2596%25E7%25A0%2581%25E5%2590%258E%25E6%2589%258D%25E8%2583%25BD%25E8%25AE%25BF%25E9%2597%25AE%25E7%259A%2584url%2F123.png,%E7%BC%96%E7%A0%81%E5%90%8E%E6%89%8D%E8%83%BD%E8%AE%BF%E9%97%AE%E7%9A%84url%2F123.png
      https%3A%2F%2Fwww.example-fake2.com%2F%E6%97%A0%E9%9C%80%E7%BC%96%E7%A0%81%E5%8D%B3%E5%8F%AF%E8%AE%BF%E9%97%AE%E7%9A%84url%2F123.png,%E6%97%A0%E9%9C%80%E7%BC%96%E7%A0%81%E5%8D%B3%E5%8F%AF%E8%AE%BF%E9%97%AE%E7%9A%84url%2F123.png
      https%3A%2F%2Fwww.example-fake3.com%2F%E6%B1%89%E8%AF%AD%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AB%E3%81%BB%E3%82%93%E3%81%94%2F%ED%95%9C%EA%B5%AD%EC%96%B4%2F123.png,%E6%B1%89%E8%AF%AD%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AB%E3%81%BB%E3%82%93%E3%81%94%2F%ED%95%9C%EA%B5%AD%EC%96%B4%2F123.png
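To sanity-check your encoding, you can confirm with the Python standard library that a value round-trips: quote_plus produces the encoded form described above, and unquote_plus restores the original string. This is a reference snippet only.

```python
from urllib.parse import quote_plus, unquote_plus

original = "a+b%c,d.file"
encoded = quote_plus(original)
# '+' becomes %2B, '%' becomes %25, and ',' becomes %2C, as required.
assert encoded == "a%2Bb%25c%2Cd.file"
# Decoding must restore the original string exactly.
assert unquote_plus(encoded) == original
```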

    • All columns

      • Key (required): The name of the file to be migrated. After a file is migrated, the name of the object that corresponds to the file consists of a prefix and the file name.

      • Url (required): The download URL of the file to be migrated. Data Online Migration uses HTTP GET requests to download files from the HTTP or HTTPS URLs and uses HTTP HEAD requests to obtain the metadata of the files.

      • Size (optional): The size of the file to be migrated. Unit: bytes.

      • StorageClass (optional): The storage class of the file in the source bucket.

      • LastModifiedDate (optional): The time when the file to be migrated was last modified.

      • ETag (optional): The entity tag (ETag) of the file to be migrated.

      • HashAlg (optional): The hash algorithm of the file to be migrated.

      • HashValue (optional): The hash value of the file to be migrated.

      Note

      The preceding columns can appear in any order in a CSV file. You only need to ensure that the column order in each CSV file matches the order specified in the fileSchema field of the manifest.json file.
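The consistency requirement above can be checked before you upload anything. The following sketch (file paths are assumptions based on the examples in this topic) verifies that every line of a compressed list file has exactly as many columns as the fileSchema field declares.

```python
import csv
import gzip
import json

def validate_list_file(manifest_path, csv_gz_path):
    """Check that every line of a CSV GZ list file has exactly as many
    columns as the fileSchema field of manifest.json declares."""
    with open(manifest_path) as f:
        schema = [c.strip() for c in json.load(f)["fileSchema"].split(",")]
    with gzip.open(csv_gz_path, "rt", encoding="utf-8") as f:
        for number, row in enumerate(csv.reader(f), start=1):
            if len(row) != len(schema):
                raise ValueError("line %d has %d columns, expected %d"
                                 % (number, len(row), len(schema)))
    return len(schema)
```

Note that this checks only the column count, not the column order itself, which the CSV file does not record.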

  2. Compress the CSV list file.

    Compress each CSV file into a .csv.gz file. The following examples show how to compress one or more CSV files:

    • Compress a CSV file

      In this example, a file named example.csv resides in the dir directory. Run the following command to compress the file:

      gzip -c example.csv > example.csv.gz
      Note

      By default, the gzip command replaces the source file with the compressed file. The -c option in the preceding command writes the compressed output to a new file, so both example.csv and example.csv.gz are retained.

      The .csv.gz file is generated.

    • Compress multiple CSV files

      In this example, the example1.csv, example2.csv, and example3.csv files reside in the dir directory. Run the following command to compress the files:

      gzip -r dir
      Note

      The gzip -r command does not package the directory into a single archive. It compresses each file in the specified directory separately and does not retain the source files.

      The example1.csv.gz, example2.csv.gz, and example3.csv.gz files are generated in the dir directory.
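If the gzip command is not available, for example on Windows, you can produce the same .csv.gz files with Python's gzip module, which always leaves the source file in place. This is a reference sketch; the file names follow the examples above.

```python
import gzip
import shutil

def compress_csv(src_path):
    """Compress src_path into src_path + '.gz' and keep the source file."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path

# Example: compress_csv("dir/example1.csv") creates dir/example1.csv.gz.
```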

  3. Create a manifest.json file.

    A manifest.json file describes one or more CSV list files. The file contains the following fields:

    • fileFormat: the format of the list file. Example: CSV.

    • fileSchema: the columns of the CSV files, listed in the order in which they appear in each file.

    • files:

      • key: the location of the CSV file in the source bucket.

      • MD5checksum: the MD5 value of the CSV file. The value is a hexadecimal MD5 string, which is not case-sensitive. Example: 91A76757B25C8BE78BC321DEEBA6A5AD. If you do not specify this parameter, the CSV file is not verified.

      • size: the size of the CSV file.

    The following code provides an example:

    {
        "fileFormat":"CSV",
        "fileSchema":"Url, Key",
        "files":[{
            "key":"dir/example1.csv.gz",
            "MD5checksum":"",
            "size":0
        },{
            "key":"dir/example2.csv.gz",
            "MD5checksum":"",
            "size":0
        }]
    }
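Rather than filling in MD5checksum and size by hand, you can generate the manifest from the compressed list files. This is a sketch, not part of the product: it assumes the .csv.gz files already exist locally at the paths you pass in, and it uses the field names from the example above.

```python
import hashlib
import json
import os

def build_manifest(keys, file_schema="Url, Key"):
    """Build the manifest dict for the given CSV GZ files. keys are
    paths such as dir/example1.csv.gz, as in the example above."""
    files = []
    for key in keys:
        with open(key, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest().upper()
        files.append({
            "key": key,
            "MD5checksum": digest,          # Hexadecimal, case-insensitive.
            "size": os.path.getsize(key),   # Unit: bytes.
        })
    return {"fileFormat": "CSV", "fileSchema": file_schema, "files": files}

# Example:
# with open("manifest.json", "w") as f:
#     json.dump(build_manifest(["dir/example1.csv.gz"]), f, indent=4)
```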
  4. Upload the list files that you create to OSS or Amazon S3.

    • Upload the list files to OSS. For more information, see Simple upload.

      Note
      • After a list file is uploaded to OSS, Data Online Migration downloads the list file and migrates the files based on the specified URLs.

      • When you create a migration task, specify the bucket in which a list file is stored. You must specify the path of the list file in the Directory in which the list file resides/manifest.json format. Example: dir/manifest.json.

    • Upload the list files to Amazon S3.

      Note
      • After a list file is uploaded to Amazon S3, Data Online Migration downloads the list file and migrates the files based on the specified URLs.

      • When you create a migration task, specify the bucket in which a list file is stored. You must specify the path of the list file in the Directory in which the list file resides/manifest.json format. Example: dir/manifest.json.

Step 2: Create a destination bucket

Create an Object Storage Service (OSS) bucket as the destination to store the migrated data. For more information, see Create buckets.

Step 3: Create a RAM user and grant permissions to the RAM user

Important
  • A Resource Access Management (RAM) user is used to perform the data migration task. You must create the required RAM roles and perform the migration as the RAM user. We recommend that you create the RAM user within the Alibaba Cloud account that owns the source or destination OSS bucket.

  • For more information, see Create a RAM user and grant permissions to the RAM user.

Log on to the RAM console with an Alibaba Cloud account. On the Users page, find the RAM user that you created and click Add Permissions in the Actions column.

  1. System policy: Attach the AliyunOSSImportFullAccess policy to the RAM user.

  2. Custom policy: Attach a custom policy that includes the ram:CreateRole, ram:CreatePolicy, ram:AttachPolicyToRole, and ram:ListRoles permissions to the RAM user.

    For more information about how to attach a custom policy, see Create custom policies. The following sample code provides an example of the custom policy:

    {
        "Version":"1",
        "Statement":[
            {
                "Effect":"Allow",
                "Action":[
                    "ram:CreateRole",
                    "ram:CreatePolicy",
                    "ram:AttachPolicyToRole",
                    "ram:ListRoles"
                ],
                "Resource":"*"
            }
        ]
    }

Step 4: Grant permissions on the inventory storage bucket

Perform the corresponding operations based on whether the inventory storage bucket belongs to the current Alibaba Cloud account.

The inventory storage bucket belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you use automatic authorization in the Data Online Migration console. For more information, see the "Step 2: Create a source data address" section of the Migrate data topic.

  • Manual authorization

    1. Grant permissions on the inventory storage bucket

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List* and oss:Get* permissions to the RAM role.

    For more information about how to attach a custom policy, see Create custom policies. The following sample code provides an example of the custom policy:

    Note

    The following policy is for reference only. Replace <myInvBucket> with the name of the inventory storage bucket.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Resource": [
            "acs:oss:*:*:<myInvBucket>",
            "acs:oss:*:*:<myInvBucket>/*"
          ]
        }
      ]
    }

The inventory storage bucket does not belong to the current Alibaba Cloud account

1. Grant permissions on the inventory storage bucket

  1. Log on to the OSS console with the Alibaba Cloud account that owns the inventory storage bucket.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the inventory storage bucket.

  3. In the left-side navigation pane, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax. On the page that appears, click Edit, enter the custom bucket policy in the code editor, and then click Save.

  • Custom policy:

    Grant the RAM role the permissions to query and read all resources in the inventory storage bucket.

    Note

    The following policy is for reference only. Replace <otherInvBucket> with the name of the inventory storage bucket, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, <otherUid> with the ID of the Alibaba Cloud account that owns the inventory storage bucket, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*"
          ],
          "Principal": [
             "arn:sts::<myUid>:assumed-role/<roleName>/*"
          ],
          "Resource": [
            "acs:oss:*:<otherUid>:<otherInvBucket>",
            "acs:oss:*:<otherUid>:<otherInvBucket>/*"
          ]
        }
      ]
    }

Step 5: Grant permissions on the destination bucket

Perform the corresponding operations based on whether the destination bucket belongs to the current Alibaba Cloud account.

The destination bucket belongs to the current Alibaba Cloud account

  • Automatic authorization

    We recommend that you use automatic authorization in the Data Online Migration console. For more information, see the "Step 3: Create a destination data address" section of the Migrate data topic.

  • Manual authorization

    1. Grant permissions on the destination bucket to the RAM role

    On the Roles page, find the created RAM role and click Grant Permission in the Actions column.

    • Custom policy: Attach a custom policy that includes the oss:List*, oss:Get*, oss:Put*, and oss:AbortMultipartUpload permissions to the RAM role.

    For more information about how to attach a custom policy, see Create custom policies. The following sample code provides an example of the custom policy:

    Note

    The following policy is for reference only. Replace <myDestBucket> with the name of the destination bucket.

    For more information about RAM policies for OSS, see Common examples of RAM policies.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:List*",
            "oss:Get*",
            "oss:Put*",
            "oss:AbortMultipartUpload"
          ],
          "Resource": [
            "acs:oss:*:*:<myDestBucket>",
            "acs:oss:*:*:<myDestBucket>/*"
          ]
        }
      ]
    }

The destination bucket does not belong to the current Alibaba Cloud account

1. Grant permissions on the destination bucket to the RAM role

Important

If you configure a bucket policy by specifying policy statements to grant the RAM role the required permissions, the new bucket policy overwrites the existing bucket policy. Make sure that the new bucket policy contains the content of the existing bucket policy. Otherwise, the authorization based on the existing bucket policy may fail.

  1. Log on to the OSS console with the Alibaba Cloud account that owns the destination bucket.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the destination bucket.

  3. In the left-side pane of the bucket details page, choose Permission Control > Bucket Policy.

  4. On the Bucket Policy tab, click Add by Syntax. On the page that appears, click Edit, enter the custom bucket policy in the code editor, and then click Save.

    • Grant the RAM role the permissions to query, read, delete, and write all resources in the destination bucket.

Note

The following policy is for reference only. Replace <otherDestBucket> with the name of the destination bucket, <otherUid> with the ID of the Alibaba Cloud account that owns the destination bucket, <myUid> with the ID of the Alibaba Cloud account that is used to log on to the Data Online Migration console, and <roleName> with the name of the RAM role that you created. For more information about RAM policies for OSS, see Common examples of RAM policies.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:List*",
        "oss:Get*",
        "oss:Put*",
        "oss:AbortMultipartUpload"
      ],
      "Principal": [
         "arn:sts::<myUid>:assumed-role/<roleName>/*"
      ],
      "Resource": [
        "acs:oss:*:<otherUid>:<otherDestBucket>",
        "acs:oss:*:<otherUid>:<otherDestBucket>/*"
      ]
    }
  ]
}