Engineers often need to migrate data between ElasticSearch clusters, whether to back up data or to upgrade a system. There are as many methods as there are reasons to migrate; for example, you can use elasticsearch-dump, snapshots, or the reindex API. In this article, we will introduce another method: quickly migrating an ElasticSearch cluster using Logstash.
I hope that with this explanation, you will understand the theory behind using Logstash to migrate data. In essence, the operation consists of using Logstash to read data from the source ElasticSearch cluster and write it into the target ElasticSearch cluster. I have outlined the exact steps in the following section.
Step 1: Create a data sync conf file in the Logstash directory
vim ./logstash-5.5.3/es-es.conf
Step 2: Ensure Identical Names: When configuring the conf file, ensure that the index name is identical in both the source and target clusters. Refer to the configuration below.
input {
  elasticsearch {
    hosts => ["********your host**********"]
    user => "*******"
    password => "*********"
    index => "logstash-2017.11.07"
    size => 1000
    scroll => "1m"
  }
}
# The filter section is optional and can be left empty
filter {
}
output {
  elasticsearch {
    hosts => ["***********your host**************"]
    user => "********"
    password => "**********"
    index => "logstash-2017.11.07"
  }
}
Step 3: Run Logstash: Once you have configured the conf file, run Logstash:
bin/logstash -f es-es.conf
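If Logstash starts successfully, the log output should look roughly like this (illustrative lines from a 5.x install; the exact wording varies by version):
[INFO ][logstash.pipeline] Pipeline main started
[INFO ][logstash.agent] Successfully started Logstash API endpoint {:port=>9600}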
Sometimes running this command will generate the following error message:
[FATAL][logstash.runner] Logstash could not be started because there is already another instance using the configured data directory. If you wish to run multiple instances, you must change the "path.data" setting.
This is because the current version of Logstash does not allow multiple instances to share the same path.data. Therefore, when starting additional instances, include "--path.data PATH" in the command to define a different data directory for each instance:
bin/logstash -f es-es.conf --path.data ./logs/
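For example, to run a second migration in parallel, give it its own conf file and its own data directory (the file name es-es-2.conf here is just a hypothetical example):
bin/logstash -f es-es-2.conf --path.data ./logs2/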
If all goes as intended, you can use the following command to view the corresponding index in the target ElasticSearch cluster:
curl -u username:password host:port/_cat/indices
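Each line of the output shows an index's health, status, name, shard counts, document count, and store size. If the migration succeeded, you should see the migrated index listed, along these lines (values are illustrative):
green open logstash-2017.11.07 xxxxxxxxxxxxxxxxxxxxxx 5 1 1000000 0 1.2gb 600mb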
Let us now look at a sample use case.
Many clients running their own self-built ElasticSearch deployments have been paying close attention to the Alibaba Cloud ElasticSearch products. They want to use them but have difficulty migrating their data from their self-built ElasticSearch to Alibaba Cloud ElasticSearch. The following explains how to use Logstash to quickly migrate self-built ElasticSearch index data to the cloud.
The logic behind this solution is quite simple, but configuring a separate es-to-es conf file for every index is cumbersome. Logstash lets you avoid this with a single configuration. Before I explain how, let me cover three Logstash features that the solution relies on: wildcard index patterns in the input, the docinfo option, and the @metadata field.
With docinfo => true, the input plugin stores each document's _index, _type, and _id in the event's @metadata field. Because @metadata is passed from the input to the output (but never written into the document itself), the output can "inherit" this information and recreate the index, type, and id in the target cluster identical to those in the source cluster.
If at any point in the process you want to inspect and debug the metadata information, add the following setting to the output:
stdout { codec => rubydebug { metadata => true } }
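With that setting, every event printed to stdout includes its @metadata block. The field names below are what docinfo => true produces; the values are illustrative:
{
      "message" => "...",
    "@metadata" => {
        "_index" => "logstash-2017.11.07",
         "_type" => "logs",
           "_id" => "AV-xxxxxxxxxxxxxxxx"
    }
}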
Use the following configuration code:
input {
  elasticsearch {
    hosts => ["yourhost"]
    user => "**********"
    password => "*********"
    index => "*"       # the wildcard tells Logstash to read every index
    size => 1000
    scroll => "1m"
    codec => "json"
    docinfo => true    # store each document's _index, _type, and _id in @metadata
  }
}
# The filter section is optional and can be left empty
filter {
}
output {
  elasticsearch {
    hosts => ["yourhost"]
    user => "********"
    password => "********"
    index => "%{[@metadata][_index]}"           # write to an index with the same name as the source
    document_type => "%{[@metadata][_type]}"    # reuse the source document's type
    document_id => "%{[@metadata][_id]}"        # reuse the source document's id
  }
  stdout { codec => rubydebug { metadata => true } }
}
After running the command, Logstash reads every index in the source cluster and recreates it in the target cluster, then gradually migrates the documents each index contains. Note that index mappings are not copied; unless you create them on the target beforehand, they are generated by dynamic mapping.
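To track progress, you can compare the index lists of the two clusters with the _cat/indices API (hosts and credentials here are placeholders):
curl -u username:password source-host:port/_cat/indices?v
curl -u username:password target-host:port/_cat/indices?v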
While running the actual migration, the output section will still contain this setting:
stdout { codec => rubydebug { metadata => true } }
I recommend removing this setting once you have verified the configuration, to prevent your screen from being flooded with metadata information.
I hope this article helped you understand how to migrate ElasticSearch data using Logstash. I have also described the core Logstash concepts that you should be aware of before starting the migration.