Released by ELK Geek
In China's cloud service market, Alibaba Cloud has become popular among developers due to its convenience and stability. This article is intended for customers who want to migrate data from an Amazon Elasticsearch Service (Amazon ES) domain to an Alibaba Cloud Elasticsearch cluster. The following figure shows the reference architecture for the migration.
Elasticsearch: This is a distributed RESTful search and analytics engine designed for a wide range of scenarios. As the core of the Elastic Stack, Elasticsearch stores data centrally and lets you search it for both expected and unexpected results.
Kibana: This visualizes Elasticsearch data and provides a user interface for managing Elastic Stack.
Amazon ES: This is a fully managed service that offers easy-to-use Elasticsearch API operations and real-time analytics capabilities. This service also provides the availability, scalability, and security required for production workloads. You can use Amazon ES to deploy, protect, manage, and scale Elasticsearch clusters for scenarios such as log analysis, full-text search, and application monitoring.
Alibaba Cloud Elasticsearch: This service is not yet available on the international site, so this article discusses the service provided on the China site.
Snapshot and Restore: This feature stores snapshots of individual indexes or an entire cluster in a remote repository, such as a shared file system, Amazon Simple Storage Service (Amazon S3), or Hadoop Distributed File System (HDFS). Snapshots let you restore data quickly. However, data in a snapshot can be restored only to clusters that run a compatible Elasticsearch version:
1) Data in a snapshot created in an Elasticsearch 5.x cluster can be restored to an Elasticsearch 6.x cluster.
2) Data in a snapshot created in an Elasticsearch 2.x cluster can be restored to an Elasticsearch 5.x cluster.
3) Data in a snapshot created in an Elasticsearch 1.x cluster can be restored to an Elasticsearch 2.x cluster.
Note: Data in a snapshot created in an Elasticsearch 1.x cluster cannot be restored to an Elasticsearch 5.x or 6.x cluster, and data in a snapshot created in an Elasticsearch 2.x cluster cannot be restored to an Elasticsearch 6.x cluster. Because snapshots are incremental, a single snapshot can contain indexes that were created in different Elasticsearch versions. If any index in a snapshot was created in an incompatible version, the snapshot cannot be restored.
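The compatibility rules above can be expressed as a small check that you run against the creation versions of the indexes in a snapshot. This is a minimal sketch, assuming that an index can be restored only to a cluster of the same major version or of the immediately following major version (Elasticsearch went from 2.x directly to 5.x). The same-version case is a standard Elasticsearch rule that the list above does not spell out, and the function names are hypothetical.

```python
# Major versions in release order; Elasticsearch skipped 3.x and 4.x.
MAJOR_VERSIONS = [1, 2, 5, 6]


def index_restorable(index_created_major: int, target_major: int) -> bool:
    """Return True if an index created in index_created_major can be
    restored to a cluster running target_major (same major version or
    exactly one major version newer)."""
    src = MAJOR_VERSIONS.index(index_created_major)
    dst = MAJOR_VERSIONS.index(target_major)
    return dst - src in (0, 1)


def snapshot_restorable(index_created_majors, target_major: int) -> bool:
    """A snapshot can be restored only if every index in it is compatible."""
    return all(index_restorable(m, target_major) for m in index_created_majors)


if __name__ == "__main__":
    # An index created in 2.x that is still present in a snapshot taken on a
    # 5.x cluster makes the whole snapshot unrestorable on 6.x.
    print(snapshot_restorable([2, 5], 6))  # False
    print(snapshot_restorable([5], 6))     # True
```

Checking every index, not just the cluster version that took the snapshot, mirrors the note above: one old index is enough to block the restore.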
The procedure to migrate data to an Alibaba Cloud Elasticsearch cluster is as follows:
1) Create a snapshot repository.
2) Create the first full snapshot for the index data to be migrated. This snapshot is automatically stored in the S3 bucket.
3) Create an Object Storage Service (OSS) bucket in Alibaba Cloud and register it as a snapshot repository of the Alibaba Cloud Elasticsearch cluster.
4) Use OSSImport to transfer the full snapshot from the S3 bucket to the OSS bucket.
5) Restore data from the full snapshot to your Alibaba Cloud Elasticsearch cluster.
Repeat the preceding steps to restore data from incremental snapshots. Then, complete the migration as follows:
1) Stop services that may modify index data.
2) Create the final snapshot for your Amazon ES domain.
3) Transfer the final snapshot to your OSS bucket. Then, restore data from the snapshot to your Alibaba Cloud Elasticsearch cluster.
4) Switch your services to the Alibaba Cloud Elasticsearch cluster.
Amazon ES automatically creates snapshots for the primary index shards in a domain every day and stores them in a pre-configured S3 bucket. These snapshots are retained for a maximum of 14 days without additional charge. Use these snapshots to restore data to the domain.
However, automatic snapshots cannot be used to migrate data to other domains, because they can be read only by the domain that created them. To migrate data, use manual snapshots stored in an S3 bucket. Standard S3 charges apply to manual snapshots.
To create manual snapshots and restore data from the snapshots, use AWS Identity and Access Management (IAM) and S3. Before creating snapshots, perform the operations listed in the following table.
Operation | Description |
---|---|
Create an S3 bucket | The bucket stores manual snapshots of your Amazon ES domain. |
Create an IAM role | The role grants Amazon ES permission to access S3. When you add a trust relationship for the role, you must specify Amazon ES in the Principal element. This role is also required when you register a snapshot repository with Amazon ES. Only IAM users or roles that are allowed to pass this role to Amazon ES can register the snapshot repository. |
Create an IAM policy | This policy specifies the actions that the role can perform on your S3 bucket. Attach the policy to the IAM role that grants Amazon ES access to S3. You must specify your S3 bucket in the Resource element of the policy. |
An S3 bucket is required to store manual snapshots. Record its Amazon Resource Name (ARN). The ARN is used by the following items:
1) The Resource element of the IAM policy that is attached to the IAM role
2) The Python client that is used to register a snapshot repository
The following example shows the ARN of an S3 bucket:
arn:aws:s3:::eric-es-index-backups
Create an IAM role whose trust relationship specifies Amazon ES (es.amazonaws.com) in the Service element, so that Amazon ES can assume the role.
Example:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "es.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
View the trust relationship details in the AWS IAM console.
When you create a role in the IAM console, Amazon ES is not included in the "Select role type" drop-down list. Select Amazon EC2, create the role as prompted, and then change ec2.amazonaws.com in the trust relationship of the role to es.amazonaws.com.
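If you prefer to script this step instead of editing the role in the console, the following is a minimal sketch that builds the same trust relationship and passes it to IAM. It assumes the boto3 SDK, whose IAM client provides create_role(RoleName=..., AssumeRolePolicyDocument=...). The helper names and the role name in the usage comment are hypothetical, and the IAM client is passed in so that the function can be exercised without AWS credentials.

```python
import json


def es_trust_policy() -> dict:
    """Trust relationship that lets Amazon ES (es.amazonaws.com) assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"Service": "es.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_snapshot_role(iam_client, role_name: str) -> str:
    """Create the role and return its ARN.

    iam_client is expected to behave like boto3.client("iam").
    """
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(es_trust_policy()),
    )
    return response["Role"]["Arn"]


# Usage (requires boto3 and AWS credentials; the role name is hypothetical):
#   import boto3
#   role_arn = create_snapshot_role(boto3.client("iam"), "TheSnapshotRole")
```

The returned role ARN is the value you later supply as role_arn when you register the snapshot repository.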
Attach an IAM policy to the IAM role. The policy specifies the S3 bucket used to store the manual snapshots of your Amazon ES domain. The following example specifies the ARN of the eric-es-index-backups bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::eric-es-index-backups"
]
},
{
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::eric-es-index-backups/*"
]
}
]
}
Copy the policy content to the "Edit Policy" section.
Click "Policy Summary" to check whether the policy is correct.
Then, attach the IAM policy to the IAM role.
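The policy has two statements because s3:ListBucket applies to the bucket itself, while the object actions apply to the keys inside it (bucket ARN followed by /*). A minimal sketch that generates the policy for any bucket name and attaches it with boto3 follows. Note that it uses an inline role policy via put_role_policy (a real boto3 IAM call) rather than the separately created policy used in the console steps above; the helper names and the default policy name are hypothetical.

```python
import json


def snapshot_bucket_policy(bucket: str) -> dict:
    """Permissions that let the snapshot role use one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket ARN itself.
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions: the resource is every key in the bucket.
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Effect": "Allow",
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


def attach_snapshot_policy(iam_client, role_name: str, bucket: str,
                           policy_name: str = "es-snapshot-s3-access") -> None:
    """Embed the policy in the role as an inline policy.

    iam_client is expected to behave like boto3.client("iam").
    """
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(snapshot_bucket_policy(bucket)),
    )
```

For example, snapshot_bucket_policy("eric-es-index-backups") reproduces the policy shown above.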
You can create manual snapshots only after you register a snapshot repository with Amazon ES. The registration request must be signed with the AWS credentials of an IAM user or role that has permission to pass the snapshot role to Amazon ES.
A plain curl command cannot register the snapshot repository because it does not sign the request with AWS credentials. Use the sample Python client instead.
Download the sample Python client file and change the highlighted values to match your environment. Then, save the content as a Python file named snapshot.py.
The following table describes the variables in the sample Python client file.
Variable | Description |
---|---|
region | The AWS region where the snapshot repository is created. |
host | The endpoint of your Amazon ES domain. |
aws_access_key_id | The access key ID of your IAM credentials. |
aws_secret_access_key | The secret access key of your IAM credentials. |
path | The request path, which is _snapshot/ followed by the name of the snapshot repository. |
data (bucket, region, role_arn) | The request body. It must specify the name of the S3 bucket, its region, and the ARN of the IAM role that you created in Prerequisites for Creating Manual Snapshots in an Amazon ES Domain. To enable server-side encryption with S3-managed keys for the snapshot repository, add "server_side_encryption": true to the settings object. If the S3 bucket resides in the us-east-1 region, replace "region":"us-east-1" with "endpoint":"s3.amazonaws.com". |
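To make the path and data values concrete, here is a sketch that assembles them from the settings in the table, including the us-east-1 and server-side encryption rules. The helper name and the role ARN in the usage comment are hypothetical. The resulting strings are what you would place in the path and data values of snapshot.py; the client's request-signing code is not reproduced here.

```python
import json


def repository_request(repo_name: str, bucket: str, region: str, role_arn: str,
                       server_side_encryption: bool = False):
    """Return (path, data) for registering an S3 snapshot repository."""
    settings = {"bucket": bucket, "role_arn": role_arn}
    if region == "us-east-1":
        # Per the table: use the global endpoint instead of "region" here.
        settings["endpoint"] = "s3.amazonaws.com"
    else:
        settings["region"] = region
    if server_side_encryption:
        # Server-side encryption with S3-managed keys.
        settings["server_side_encryption"] = True
    body = {"type": "s3", "settings": settings}
    return f"/_snapshot/{repo_name}", json.dumps(body)


# Usage (hypothetical account ID and role name):
#   path, data = repository_request(
#       "eric-snapshot-repository", "eric-es-index-backups", "us-west-2",
#       "arn:aws:iam::123456789012:role/TheSnapshotRole")
```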
The sample Python client requires the boto package (version 2.x) on the computer from which you register the snapshot repository. Install boto and run the client as follows:
# wget https://pypi.python.org/packages/66/e7/fe1db6a5ed53831b53b8a6695a8f134a58833cadb5f2740802bc3730ac15/boto-2.48.0.tar.gz#md5=ce4589dd9c1d7f5d347363223ae1b970
# tar zxvf boto-2.48.0.tar.gz
# cd boto-2.48.0
# python setup.py install
# python snapshot.py
Log on to the Kibana console of the Amazon ES domain. In the left-side navigation pane, click "Dev Tools". On the "Console" tab, run the following command to view the registration result:
GET _snapshot
Run the following commands in the Kibana console, or run them as curl commands from a Linux or macOS command-line interface (CLI).
Create a snapshot named snapshot_movies_1 for the movies index in the eric-snapshot-repository snapshot repository.
PUT _snapshot/eric-snapshot-repository/snapshot_movies_1
{
"indices": "movies"
}
View the snapshot status.
GET _snapshot/eric-snapshot-repository/snapshot_movies_1
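If you poll this status from a script, only a few fields of the response matter: the snapshot name, its state (IN_PROGRESS, SUCCESS, FAILED, PARTIAL), and the start and end times in milliseconds since the Unix epoch. The following minimal sketch parses an already-decoded response; the helper names are hypothetical, and treating PARTIAL as a failure is a choice made for migrations rather than an Elasticsearch rule.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(ms: int) -> datetime:
    """Convert an Elasticsearch *_in_millis value to an aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ms)


def find_snapshot(response: dict, name: str) -> dict:
    """Pick one snapshot entry out of a GET _snapshot/<repo>/<name> response."""
    for snap in response.get("snapshots", []):
        if snap.get("snapshot") == name:
            return snap
    raise KeyError(f"snapshot {name!r} not found in response")


def snapshot_finished(response: dict, name: str) -> bool:
    """True once the snapshot succeeded, False while it is still running."""
    state = find_snapshot(response, name)["state"]
    if state in ("FAILED", "PARTIAL"):
        # Treat a partial snapshot as unusable for a migration.
        raise RuntimeError(f"snapshot {name!r} ended in state {state}")
    return state == "SUCCESS"


def snapshot_window(response: dict, name: str):
    """Return the (start, end) UTC times of a snapshot for your migration notes."""
    snap = find_snapshot(response, name)
    return (millis_to_utc(snap["start_time_in_millis"]),
            millis_to_utc(snap["end_time_in_millis"]))


if __name__ == "__main__":
    # Abbreviated response using the example timestamps from this article.
    sample = {"snapshots": [{
        "snapshot": "snapshot_movies_1",
        "state": "SUCCESS",
        "start_time_in_millis": 1519786844591,
        "end_time_in_millis": 1519786846236,
    }]}
    start, end = snapshot_window(sample, "snapshot_movies_1")
    print(snapshot_finished(sample, "snapshot_movies_1"))  # True
    print(start.isoformat())  # 2018-02-28T03:00:44.591000+00:00
    print(end - start)        # 0:00:01.645000
```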
In the S3 console, view snapshot objects.
In this step, pull the snapshot data from the AWS S3 bucket to the Alibaba Cloud OSS bucket. For more information, see Migrate data from Amazon S3 to Alibaba Cloud OSS.
After the snapshot is transferred, view the snapshot in the OSS console.
Log on to the Kibana console of the Alibaba Cloud Elasticsearch cluster. In the left-side navigation pane, click "Dev Tools". On the "Console" tab, run the following command to create a snapshot repository. The name of the snapshot repository must be the same as that of the snapshot repository registered with Amazon ES. Replace the endpoint, AccessKey pair, and bucket name with your own values.
PUT _snapshot/eric-snapshot-repository
{
"type": "oss",
"settings": {
"endpoint": "http://oss-cn-hangzhou-internal.aliyuncs.com",
"access_key_id": "Put your AccessKey id here.",
"secret_access_key": "Put your secret AccessKey here.",
"bucket": "eric-oss-aws-es-snapshot-s3",
"compress": true
}
}
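The same repository can be registered without Kibana. The following is a minimal sketch using only the Python standard library; it assumes the cluster endpoint accepts HTTP basic authentication with the elastic user, which is the usual way to access an Alibaba Cloud Elasticsearch cluster. The helper names, the cluster URL, the port, and the credentials in the usage comment are placeholders, not values from this article.

```python
import base64
import json
import urllib.request


def oss_repository_body(endpoint: str, access_key_id: str,
                        secret_access_key: str, bucket: str) -> dict:
    """Settings for an OSS snapshot repository, matching the request above."""
    return {
        "type": "oss",
        "settings": {
            "endpoint": endpoint,
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
            "bucket": bucket,
            "compress": True,
        },
    }


def put_json(base_url: str, path: str, body: dict, user: str, password: str) -> dict:
    """Send a PUT request with HTTP basic authentication and return the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Usage (placeholders; requires network access to the cluster):
#   body = oss_repository_body("http://oss-cn-hangzhou-internal.aliyuncs.com",
#                              "<AccessKey ID>", "<AccessKey secret>",
#                              "eric-oss-aws-es-snapshot-s3")
#   put_json("http://<cluster-endpoint>:9200", "/_snapshot/eric-snapshot-repository",
#            body, "elastic", "<password>")
```

Keeping the body builder separate from the HTTP call makes it easy to check the settings before anything is sent.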
View the status of the snapshot named snapshot_movies_1.
GET _snapshot/eric-snapshot-repository/snapshot_movies_1
Note: Record the start time and end time of the snapshot creation operation. You need them when you use OSSImport to migrate the data of incremental snapshots.
Example:
"start_time_in_millis": 1519786844591
"end_time_in_millis": 1519786846236
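Log on to the Kibana console of the Alibaba Cloud Elasticsearch cluster. In the left-side navigation pane, click "Dev Tools". On the "Console" tab, run the following commands to restore the movies index from the snapshot and check the recovery progress: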
POST _snapshot/eric-snapshot-repository/snapshot_movies_1/_restore
{
"indices": "movies"
}
GET movies/_recovery
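The _recovery response maps each index name to a list of shards, and each shard reports a stage that ends at DONE. A minimal sketch for deciding from a script whether the restore has finished follows; the helper names are hypothetical and the function expects the already-parsed JSON response.

```python
def restore_progress(recovery: dict, index: str):
    """Return (finished_shards, total_shards) for one index.

    recovery is the parsed JSON of GET <index>/_recovery, which maps each
    index name to {"shards": [...]} and gives every shard a "stage".
    """
    shards = recovery.get(index, {}).get("shards", [])
    finished = sum(1 for shard in shards if shard.get("stage") == "DONE")
    return finished, len(shards)


def restore_complete(recovery: dict, index: str) -> bool:
    """True when the index has shards and all of them have reached DONE."""
    finished, total = restore_progress(recovery, index)
    return total > 0 and finished == total
```

Requiring at least one shard avoids reporting success for an index that does not appear in the response yet.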
On the "Console" tab, query the movies index (for example, with GET movies/_search) to verify that it is available. The index contains three documents, the same data as in the Amazon ES domain.
In the Amazon ES domain, the movies index contains three documents. Insert two more documents into it.
Run the GET movies/_count command to view the number of documents in the index.
To create the incremental snapshot snapshot_movies_2, follow step 1 in the "Create the First Snapshot and Restore Data from the Snapshot" section.
View objects in the S3 bucket.
Note the changes in the indices folder.
Use OSSImport to transfer the snapshot from the S3 bucket to the OSS bucket. The S3 bucket now stores two snapshots. To migrate only the objects of the incremental snapshot, set the isSkipExistFile variable in the local_job.cfg file to true.
Variable | Description | Setting |
---|---|---|
isSkipExistFile | Indicates whether existing objects are skipped during data migration. The value is a Boolean. | If you set it to true, objects are skipped based on their size and LastModifiedTime. If you set it to false, existing objects are overwritten. The default value is false. If jobType is set to audit, this variable has no effect. |
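The effect of isSkipExistFile on an incremental transfer can be illustrated with a small model: files already copied for the first snapshot are skipped, and only new or changed objects move. This sketch is not OSSImport code. It assumes that an object is skipped when the destination copy has the same size and a LastModifiedTime no earlier than the source object, and the object keys, sizes, and times are hypothetical.

```python
from typing import Dict, List, Tuple

# Object key -> (size in bytes, LastModifiedTime as epoch seconds)
Listing = Dict[str, Tuple[int, int]]


def objects_to_copy(source: Listing, destination: Listing,
                    skip_existing: bool) -> List[str]:
    """Return the source keys that a migration job would transfer."""
    if not skip_existing:
        # isSkipExistFile=false: copy everything, overwriting existing objects.
        return sorted(source)
    to_copy = []
    for key, (size, mtime) in source.items():
        dest = destination.get(key)
        # Assumed rule: skip when the destination copy has the same size and
        # is at least as new as the source object.
        if dest is not None and dest[0] == size and dest[1] >= mtime:
            continue
        to_copy.append(key)
    return sorted(to_copy)


if __name__ == "__main__":
    src = {"snap-1.dat": (100, 10), "snap-2.dat": (120, 20), "index.latest": (8, 20)}
    dst = {"snap-1.dat": (100, 15), "index.latest": (8, 12)}
    print(objects_to_copy(src, dst, skip_existing=True))
    # ['index.latest', 'snap-2.dat']
```

Because later snapshots reuse the files of earlier ones, skipping unchanged objects keeps each incremental transfer small.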
Then, view the incremental snapshot object in the OSS bucket.
(Figures: the snapshot objects in the Alibaba Cloud OSS bucket and in the AWS S3 bucket.)
To restore data from the incremental snapshot, follow step 4 in the "Create the First Snapshot and Restore Data from the Snapshot" section. Close the movies index before the restore and open it again afterward.
POST /movies/_close
GET movies/_stats
POST _snapshot/eric-snapshot-repository/snapshot_movies_2/_restore
{
"indices": "movies"
}
POST /movies/_open
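The order of these requests matters, because a restore cannot write into an open index. A minimal sketch that produces the close, restore, and open sequence for any repository, snapshot, and index follows, so that the same plan can be reused for later incremental snapshots; the helper name is hypothetical, and sending the requests is left to whatever client you use.

```python
import json
from typing import List, Optional, Tuple

Step = Tuple[str, str, Optional[str]]  # (method, path, JSON body or None)


def incremental_restore_steps(repo: str, snapshot: str, index: str) -> List[Step]:
    """Requests that restore an existing index from a newer snapshot."""
    restore_body = json.dumps({"indices": index})
    return [
        # Close the existing index so the restore can overwrite it.
        ("POST", f"/{index}/_close", None),
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", restore_body),
        # Reopen the index, as in the last command above.
        ("POST", f"/{index}/_open", None),
    ]


if __name__ == "__main__":
    for method, path, body in incremental_restore_steps(
            "eric-snapshot-repository", "snapshot_movies_2", "movies"):
        print(method, path, body or "")
```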
After data is restored from the snapshot, there will be five documents in the movies index of the Elasticsearch cluster. This number is the same as that in the index of the Amazon ES domain.
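To automate this final check, you can compare the count field of GET movies/_count on both sides. A minimal sketch that takes the two parsed responses follows; the helper name is hypothetical. A matching count only shows that no documents are missing or extra, not that every document is identical.

```python
def compare_counts(source: dict, target: dict) -> str:
    """Summarize two parsed GET <index>/_count responses."""
    src, dst = source["count"], target["count"]
    if src == dst:
        return f"OK: both indexes contain {src} documents"
    return f"MISMATCH: source has {src} documents, target has {dst}"


if __name__ == "__main__":
    # After the incremental restore described above, both sides should report 5.
    print(compare_counts({"count": 5}, {"count": 5}))
```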
You can use the snapshot and restore feature to migrate data from an Amazon ES domain to an Alibaba Cloud Elasticsearch cluster. To avoid requests and write operations during the migration, stop services that write to the source index before you take the final snapshot, and close the target index before you restore data into it.