How to Migrate Data from an Amazon ES Domain to an Alibaba Cloud Elasticsearch Cluster

Released by ELK Geek

In China's cloud service market, Alibaba Cloud has become popular among developers due to its convenience and stability. This article is intended for customers who want to migrate data from an Amazon Elasticsearch Service (Amazon ES) domain to an Alibaba Cloud Elasticsearch cluster. The following figure shows the reference architecture for the migration.

Introduction to Migration

Terms

Elasticsearch: This is a distributed RESTful search and analysis engine designed for a wide range of scenarios. As the core of Elastic Stack, Elasticsearch stores data in a centralized manner and helps to search for expected and unexpected data.

Kibana: This visualizes Elasticsearch data and provides a user interface for managing Elastic Stack.

Amazon ES: This is a fully managed service that offers easy-to-use Elasticsearch API operations and real-time analytics capabilities. This service also provides the availability, scalability, and security required for production workloads. You may use Amazon ES to deploy, protect, manage, and scale Elasticsearch clusters for scenarios such as log analysis, full-text search, and application monitoring.

Alibaba Cloud Elasticsearch: This service is not yet available on the international site, so this article discusses the service provided on the China site.

Snapshot and Restore: This feature stores snapshots of individual indexes or an entire cluster in a remote repository, such as a shared file system, Amazon Simple Storage Service (Amazon S3), or Hadoop Distributed File System (HDFS). The snapshots are used to restore data quickly; however, data can be restored only to Elasticsearch clusters of specific versions:

1) Data in a snapshot created in an Elasticsearch 5.x cluster can be restored to an Elasticsearch 6.x cluster.
2) Data in a snapshot created in an Elasticsearch 2.x cluster can be restored to an Elasticsearch 5.x cluster.
3) Data in a snapshot created in an Elasticsearch 1.x cluster can be restored to an Elasticsearch 2.x cluster.

Note: Data in a snapshot created in an Elasticsearch 1.x cluster cannot be restored to an Elasticsearch 5.x or 6.x cluster. Data in a snapshot created in an Elasticsearch 2.x cluster cannot be restored to an Elasticsearch 6.x cluster. Snapshots are incremental and can contain indexes that were created in multiple versions of Elasticsearch. If any index in a snapshot was created in an incompatible Elasticsearch version, the snapshot cannot be restored.
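
The version rules above can be expressed as a small lookup. The following Python snippet is a minimal sketch, not part of the original procedure: the function name can_restore is hypothetical, and the assumption that a snapshot can also be restored to a cluster of the same major version is added here (the article itself restores a 5.5.2 snapshot to a 5.5.3 cluster).

```python
# Restore targets per snapshot major version, as listed in this article.
# Same-major restores (for example 5.x to 5.x) are an assumption added here.
RESTORABLE_TO = {
    1: {1, 2},
    2: {2, 5},
    5: {5, 6},
}


def can_restore(index_created_majors, target_major):
    """Return True only if every index in the snapshot can go to the target.

    index_created_majors: major versions in which the snapshot's indexes
    were created. A single incompatible index blocks the whole restore.
    """
    return all(
        target_major in RESTORABLE_TO.get(major, set())
        for major in index_created_majors
    )


if __name__ == "__main__":
    print(can_restore([5], 6))     # all indexes created in 5.x, target 6.x
    print(can_restore([2, 5], 6))  # one index was created in 2.x
```

Checking every index, not only the cluster version, reflects the note above: one index created in an incompatible version is enough to block the restore.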

Migration Plan

The procedure to migrate data to an Alibaba Cloud Elasticsearch cluster is as follows:

1) Create a Baseline Index

1) Create a snapshot repository.

2) Create the first full snapshot for the index data to be migrated. This snapshot is automatically stored in the S3 bucket.

3) Create an Object Storage Service (OSS) bucket in Alibaba Cloud and register it as a snapshot repository of your Alibaba Cloud Elasticsearch cluster.

4) Use OSSImport to transfer the full snapshot from the S3 bucket to the OSS bucket.

5) Restore data from the full snapshot to your Alibaba Cloud Elasticsearch cluster.

2) Process Incremental Snapshots on a Regular Basis

Repeat the preceding steps to restore data from incremental snapshots.

3) Identify the Final Snapshot and Switch the Service

1) Stop services that may modify index data.

2) Create the final snapshot for your Amazon ES domain.

3) Transfer the final snapshot to your OSS bucket. Then, restore data from the snapshot to your Alibaba Cloud Elasticsearch cluster.

4) Switch the service to the Alibaba Cloud Elasticsearch cluster.

Prerequisites

Elasticsearch Service

Create an Amazon ES 5.5.2 domain in the Singapore region.
Create an Alibaba Cloud Elasticsearch v5.5.3 cluster in the China (Hangzhou) region.
Use a sample index named "movies".

Prerequisites for Creating Manual Snapshots in an Amazon ES Domain

Amazon ES automatically creates snapshots for the primary index shards in a domain every day and stores them in a pre-configured S3 bucket. These snapshots are retained for a maximum of 14 days without additional charge. Use these snapshots to restore data to the domain.

However, these cannot be used to migrate data to other domains. Automatic snapshots can only be read from the specified domain. To migrate data, use manual snapshots stored in the S3 bucket. Standard S3 charges apply to manual snapshots.

To create manual snapshots and restore data from the snapshots, use AWS Identity and Access Management (IAM) and S3. Before creating snapshots, perform the operations listed in the following table.

Operation	Description
Create an S3 bucket	The bucket stores manual snapshots of your Amazon ES domain.
Create an IAM role	The role is used to grant permissions to Amazon ES. When you add a trust relationship for the role, you must specify Amazon ES in the Principal element. This role is also required when you register a snapshot repository with Amazon ES. Only IAM users assigned this role can register the snapshot repository.
Create an IAM policy	This policy specifies the actions that Amazon ES can perform on your S3 bucket. The policy must be attached to the IAM role that is used to grant permissions to Amazon ES. You must specify your S3 bucket in the Resource element of the policy.

Create an S3 Bucket

An S3 bucket is required to store manual snapshots. Record its Amazon Resource Name (ARN). The ARN is used by the following items:

1) The resource element of the IAM policy that is attached to the specific IAM role

2) The Python client that is used to register a snapshot repository

The following example shows the ARN of an S3 bucket:

 arn:aws:s3:::eric-es-index-backups

Create an IAM Role

An IAM role is required. In the trust relationship of the role, Amazon ES (es.amazonaws.com) must be specified in the Service element.

Example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "es.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

View the trust relationship details in the AWS IAM console.

While creating a role in the IAM console, Amazon ES is not included in the "Select role type" drop-down list. Select Amazon EC2 from the drop-down list and create the role as prompted. Then, change ec2.amazonaws.com in the trust relationship of the role to es.amazonaws.com.
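
The principal change described above can also be applied to a trust policy document programmatically. The following Python snippet is a minimal sketch under stated assumptions: the function name retarget_trust_policy is hypothetical, and the snippet only edits a policy held as a local dictionary; it does not call any AWS API.

```python
import copy
import json


def retarget_trust_policy(policy, old="ec2.amazonaws.com", new="es.amazonaws.com"):
    """Return a copy of the trust policy with the service principal swapped."""
    updated = copy.deepcopy(policy)
    for statement in updated.get("Statement", []):
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # for example "*", which has no Service entry
        service = principal.get("Service")
        if service == old:
            principal["Service"] = new
        elif isinstance(service, list):
            principal["Service"] = [new if s == old else s for s in service]
    return updated


# Trust policy as generated when Amazon EC2 is selected as the role type.
ec2_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(retarget_trust_policy(ec2_trust_policy), indent=2))
```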

Create an IAM Policy

Attach an IAM policy to the IAM role. The policy specifies the S3 bucket used to store the manual snapshots of your Amazon ES domain. The following example specifies the ARN of the eric-es-index-backups bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::eric-es-index-backups"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::eric-es-index-backups/*"
            ]
        }
    ]
}

Copy the policy content to the "Edit Policy" section.

Click "Policy Summary" to check whether the policy is correct.

Attach the IAM policy to the IAM role.
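
If you manage more than one snapshot bucket, the policy document can be generated from the bucket name instead of being edited by hand. The following Python snippet is a minimal sketch: the function name snapshot_bucket_policy is hypothetical, and the output mirrors the example policy in this section. The bucket-level action applies to the bucket ARN, whereas the object-level actions apply to the ARN followed by /*.

```python
import json


def snapshot_bucket_policy(bucket_name):
    """Build the S3 access policy for a snapshot bucket from its name."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": [bucket_arn],
            },
            {
                # Reading, writing, and deleting apply to objects in the bucket.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Effect": "Allow",
                "Resource": [bucket_arn + "/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(snapshot_bucket_policy("eric-es-index-backups"), indent=4))
```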

Register a Manual Snapshot Repository

Register a snapshot repository with Amazon ES before you create manual snapshots. The registration request must be signed with the AWS credentials of the IAM user or role specified in the trust relationship of the IAM role.

You cannot run a curl command to register a snapshot repository because this command does not support AWS request signing. Use the sample Python client to register a snapshot repository.

1) Modify the Sample Python Client File

Download the sample Python client file and change the values highlighted in yellow in the file based on actual conditions. Then, copy the content into a Python file named snapshot.py.

The following table describes the variables in the sample Python client file.

Variable	Description
region	The AWS region where the snapshot repository is created.
host	The endpoint of your Amazon ES domain.
aws_access_key_id	The access key ID of your IAM credentials.
aws_secret_access_key	The secret access key of your IAM credentials.
path	The name of the snapshot repository.
data:bucket;region;role_arn	This value must include the name of the S3 bucket and the ARN of the IAM role that you created in Prerequisites for Creating Manual Snapshots in an Amazon ES Domain. If you want to enable server-side encryption with S3-managed keys for the snapshot repository, add "server_side_encryption": true to the settings JSON object. If the S3 bucket resides in the us-east-1 region, replace "region":"us-east-1" with "endpoint":"s3.amazonaws.com".
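
The rules for the data variable can be captured in a helper that builds the registration request body. The following Python snippet is a minimal sketch and does not reproduce the sample client: the function name repository_payload, the ap-southeast-1 region value, and the role ARN are hypothetical placeholders, and the snippet only builds the JSON body. Signing and sending the request are still done by the sample Python client.

```python
import json


def repository_payload(bucket, region, role_arn, encrypt=False):
    """Build the request body used to register an S3 snapshot repository."""
    settings = {"bucket": bucket, "role_arn": role_arn}
    if region == "us-east-1":
        # For us-east-1, the endpoint replaces the region setting.
        settings["endpoint"] = "s3.amazonaws.com"
    else:
        settings["region"] = region
    if encrypt:
        # Server-side encryption with S3-managed keys.
        settings["server_side_encryption"] = True
    return {"type": "s3", "settings": settings}


if __name__ == "__main__":
    body = repository_payload(
        bucket="eric-es-index-backups",
        region="ap-southeast-1",  # assumed region code for Singapore
        role_arn="arn:aws:iam::123456789012:role/example-snapshot-role",  # placeholder
    )
    print(json.dumps(body, indent=2))
```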

2) Install Amazon Web Services Library boto-2.48.0

The sample Python client requires installing the boto package of version 2.x on the computer where the snapshot repository is registered.

# wget https://pypi.python.org/packages/66/e7/fe1db6a5ed53831b53b8a6695a8f134a58833cadb5f2740802bc3730ac15/boto-2.48.0.tar.gz#md5=ce4589dd9c1d7f5d347363223ae1b970 
# tar zxvf boto-2.48.0.tar.gz
# cd boto-2.48.0
# python setup.py install

3) Run the Python Client to Register the Snapshot Repository

# python snapshot.py

Log on to the Kibana console of the Amazon ES domain. In the left-side navigation pane, click "Dev Tools". On the "Console" tab, run the following command to view the registration result:

GET _snapshot

Create the First Snapshot and Restore Data from the Snapshot

1) Create a Snapshot in the Amazon ES Domain

Run the following commands in the Kibana console, or run the equivalent curl commands in the Linux or Mac OS X command-line interface (CLI).

Create a snapshot named snapshot_movies_1 for the movies index in the eric-snapshot-repository snapshot repository.

PUT _snapshot/eric-snapshot-repository/snapshot_movies_1
{
"indices": "movies"
}

View the snapshot status.

GET _snapshot/eric-snapshot-repository/snapshot_movies_1

In the S3 console, view snapshot objects.

2) Transfer the Created Snapshot From S3 Bucket to OSS Bucket

In this step, pull snapshot data from the AWS S3 bucket to Alibaba Cloud OSS bucket. For more information, see Migrate data from Amazon S3 to Alibaba Cloud OSS.

After the snapshot is transferred, view the snapshot in the OSS console.

3) Restore Data From the Snapshot to Alibaba Cloud Elasticsearch Cluster

Create a Snapshot Repository

Log on to the Kibana console of the Elasticsearch cluster. In the left-side navigation pane, click "Dev Tools". On the "Console" tab, run the following command to create a snapshot repository. The name of the snapshot repository must be the same as that of the snapshot repository registered with Amazon ES. Enter the actual values according to the parameter description.

PUT _snapshot/eric-snapshot-repository
{
"type": "oss",
"settings": {
            "endpoint": "http://oss-cn-hangzhou-internal.aliyuncs.com",  
            "access_key_id": "Put your AccessKey id here.",
            "secret_access_key": "Put your secret AccessKey here.",
            "bucket": "eric-oss-aws-es-snapshot-s3",
            "compress": true
      }
}

View the status of the snapshot named snapshot_movies_1.

GET _snapshot/eric-snapshot-repository/snapshot_movies_1

Note: Record the start time and end time of the snapshot creation operation. You need these values when you use OSSImport to migrate data in incremental snapshots.

Example:

"start_time_in_millis": 1519786844591
"end_time_in_millis": 1519786846236
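
The two timestamps can be extracted from the snapshot status response with a short helper. The following Python snippet is a minimal sketch: the function name snapshot_summary is hypothetical, and the sample response is trimmed to the fields used here. The state value in the sample is an assumed illustration; only the two timestamps come from this article.

```python
def snapshot_summary(response, name):
    """Extract state and timing for one snapshot from a GET _snapshot response."""
    for snap in response.get("snapshots", []):
        if snap.get("snapshot") == name:
            start = snap["start_time_in_millis"]
            end = snap["end_time_in_millis"]
            return {
                "state": snap.get("state"),
                "start_time_in_millis": start,
                "end_time_in_millis": end,
                "duration_in_millis": end - start,
            }
    raise KeyError("snapshot not found: " + name)


# Trimmed sample response; "state" is an assumed value for illustration.
sample_response = {
    "snapshots": [
        {
            "snapshot": "snapshot_movies_1",
            "state": "SUCCESS",
            "start_time_in_millis": 1519786844591,
            "end_time_in_millis": 1519786846236,
        }
    ]
}

if __name__ == "__main__":
    print(snapshot_summary(sample_response, "snapshot_movies_1"))
```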

4) Restore Data From the Snapshot

Log on to the Kibana console of the Elasticsearch cluster. In the left-side navigation pane, click "Dev Tools". On the "Console" tab, run the following commands to restore the movies index from the snapshot and check the recovery progress.

POST _snapshot/eric-snapshot-repository/snapshot_movies_1/_restore
{
    "indices": "movies"
}
GET movies/_recovery

After the recovery is complete, check the availability of the movies index. The index contains three documents, which are the same as those in the Amazon ES domain.
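
The output of GET movies/_recovery can be checked with a few lines of code instead of by eye. The following Python snippet is a minimal sketch: the function name recovery_complete is hypothetical, and the sample response is trimmed to the shard stage field, which is the only field the check reads.

```python
def recovery_complete(response, index):
    """Return True if every shard of the index reports the DONE stage."""
    shards = response.get(index, {}).get("shards", [])
    # An empty shard list means that no recovery information is available yet.
    return bool(shards) and all(shard.get("stage") == "DONE" for shard in shards)


# Trimmed sample responses for illustration.
finished = {"movies": {"shards": [{"id": 0, "stage": "DONE"}, {"id": 1, "stage": "DONE"}]}}
in_progress = {"movies": {"shards": [{"id": 0, "stage": "DONE"}, {"id": 1, "stage": "INDEX"}]}}

if __name__ == "__main__":
    print(recovery_complete(finished, "movies"))
    print(recovery_complete(in_progress, "movies"))
```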

Create the Final Snapshot and Restore Data from the Snapshot

1) Insert Data to the Movies Index in Amazon ES Domain

The movies index contains three documents. Insert two more documents.

Run the GET movies/_count command to view the number of documents in the index.

2) Create a Snapshot

For more information, see step 1 in the "Create the First Snapshot and Restore Data from the Snapshot" section.

View objects in the S3 bucket.

Also, note the differences in the index folder.

3) Transfer the Snapshot From S3 Bucket to OSS Bucket

Use OSSImport to transfer the snapshot from the S3 bucket to the OSS bucket. The S3 bucket now stores two snapshot objects. To migrate only the incremental snapshot object, change the value of the isSkipExistFile variable in the local_job.cfg file to true so that objects that already exist in the OSS bucket are skipped.

Variable	Description	Setting
isSkipExistFile	Indicates whether existing objects are skipped during data migration. The value of this variable is of the Boolean type.	If you set it to true, objects are skipped based on the size and LastModifiedTime settings. If you set it to false, existing objects are overwritten. The default value is false. If jobType is set to audit, this variable is invalid.
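
The effect of isSkipExistFile can be illustrated with a simplified model of the skip decision. The following Python snippet is a sketch only and is not OSSImport code: the function name objects_to_copy and the object keys are hypothetical, and the exact comparison (same size, and a destination LastModifiedTime that is not older than the source) is an assumption, because this article states only that size and LastModifiedTime are used.

```python
def objects_to_copy(source, destination, skip_existing=True):
    """List the object keys that a migration run would transfer.

    source and destination map an object key to a tuple of
    (size_in_bytes, last_modified_epoch_seconds).
    """
    if not skip_existing:
        # Existing objects are overwritten, so everything is transferred.
        return sorted(source)
    pending = []
    for key, (size, modified) in source.items():
        existing = destination.get(key)
        # Assumed rule: unchanged means same size and a destination copy
        # that is at least as recent as the source object.
        unchanged = (
            existing is not None
            and existing[0] == size
            and existing[1] >= modified
        )
        if not unchanged:
            pending.append(key)
    return sorted(pending)


# Hypothetical listings: the first snapshot is already in the OSS bucket.
s3_listing = {"snapshot-1-object": (1024, 1519786846), "snapshot-2-object": (2048, 1519790000)}
oss_listing = {"snapshot-1-object": (1024, 1519787000)}

if __name__ == "__main__":
    print(objects_to_copy(s3_listing, oss_listing, skip_existing=True))
    print(objects_to_copy(s3_listing, oss_listing, skip_existing=False))
```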

Then, view the incremental snapshot object in the OSS bucket.

Alibaba Cloud OSS bucket:

AWS S3 bucket:

4) Restore Data from the Snapshot

For more information, see step 4 in the "Create the First Snapshot and Restore Data from the Snapshot" section. Before restoring data, close the movies index. After restoration, open the index.

POST /movies/_close
GET movies/_stats
POST _snapshot/eric-snapshot-repository/snapshot_movies_2/_restore
{
    "indices": "movies"
}
POST /movies/_open

After data is restored from the snapshot, there will be five documents in the movies index of the Elasticsearch cluster. This number is the same as that in the index of the Amazon ES domain.
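
The close, restore, and open sequence can be written down as an ordered request plan so that the steps are not run out of order. The following Python snippet is a minimal sketch: the function name restore_plan is hypothetical, and the snippet only builds the list of requests without sending them. The request body uses "indices", which is the field name that the Elasticsearch restore API expects.

```python
def restore_plan(repository, snapshot, index):
    """Return the ordered (method, path, body) requests for an in-place restore."""
    return [
        # An existing index must be closed before it is restored.
        ("POST", "/{0}/_close".format(index), None),
        (
            "POST",
            "/_snapshot/{0}/{1}/_restore".format(repository, snapshot),
            {"indices": index},
        ),
        # Reopen the index after the restoration.
        ("POST", "/{0}/_open".format(index), None),
    ]


if __name__ == "__main__":
    for method, path, body in restore_plan(
        "eric-snapshot-repository", "snapshot_movies_2", "movies"
    ):
        print(method, path, body if body is not None else "")
```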

Summary

Use the snapshot and restore feature to migrate data from an Amazon ES domain to an Alibaba Cloud Elasticsearch cluster. When you restore a snapshot to an index that already exists in the destination cluster, you must close the index first, so the index cannot serve search requests or write operations during the restoration.