
How to Migrate from ELK to Log Service

In this article, we show how to migrate data stored in Elasticsearch to Log Service with a one-line command.

By Bruce Wu

Overview

In comparison with self-built Elasticsearch, Logstash, and Kibana (ELK) services, Log Service has many advantages in terms of functionality, performance, scale, and cost. For more information, see Comprehensive comparison between self-built ELK and Log Service. You can migrate data stored in Elasticsearch to Log Service by using a one-line command.

Introduction to Data Migration

As the name suggests, data migration refers to moving data from one data source to another. Depending on whether the source and destination use the same storage engine, migration is classified as homogeneous or heterogeneous. Depending on the scope of the data being moved, it can be further divided into full migration and incremental migration.

Currently, many data migration solutions from different cloud computing vendors are available on the market, such as AWS DMS, Azure DMA, and Alibaba Cloud Data Transmission Service (DTS). These solutions mainly address migration between relational databases and do not yet cover the Elasticsearch scenario.

Elasticsearch Data Migration Solution

To migrate data from Elasticsearch, the Log Service team provides a solution based on aliyun-log-python-sdk and aliyun-log-cli. This solution mainly addresses full-data migration of historical data.

Mechanism

  • Log Service uses the Scroll API to pull data from Elasticsearch. The Scroll API can efficiently retrieve large volumes of data without the cost of deep pagination (a minimal sketch of this pull loop follows below).
  • Log Service creates a data migration task for each shard of each index in Elasticsearch and submits these tasks to an internal process pool for execution. This improves the parallelism and throughput of the migration.

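The following is a minimal sketch of this scroll-based pull loop, assuming the open-source elasticsearch Python client. The index pattern, page size, and the process() helper are illustrative and not part of the migration tool itself.

from elasticsearch import Elasticsearch

# Connect to the source Elasticsearch cluster (address is illustrative).
es = Elasticsearch(hosts=["http://localhost:9200"])

# Open a scroll context. The Scroll API pages through large result sets
# without the deep-pagination cost of from/size queries.
resp = es.search(index="filebeat-*", scroll="2m", size=1000,
                 body={"query": {"match_all": {}}})
scroll_id = resp["_scroll_id"]

while resp["hits"]["hits"]:
    for doc in resp["hits"]["hits"]:
        process(doc["_source"])  # hypothetical: transform and write to Log Service
    # Fetch the next page under the same scroll context.
    resp = es.scroll(scroll_id=scroll_id, scroll="2m")
    scroll_id = resp["_scroll_id"]

# Release the scroll context when done.
es.clear_scroll(scroll_id=scroll_id)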

Features

  • Allows you to migrate all or some indexed documents from Elasticsearch to a specified Log Service project (the CLI creates logstores with the same names as the Elasticsearch indexes).
  • Allows you to customize filter conditions so that only the qualified documents are migrated to Log Service.
  • Allows you to customize the mapping relationships between Elasticsearch indexes and Log Service logstores.
  • Allows you to control the parallelism of data migration tasks by using the pool_size parameter.
  • Allows you to customize the values of some log fields: time, __source__, and __topic__.
  • Allows you to use HTTP basic access authentication when migrating data from Elasticsearch (see the example after this list).
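For example, the following command sketch combines several of these options: it migrates only documents that match a filter while authenticating with HTTP basic access authentication. The user:password@host syntax, the --query parameter, and the filter itself are assumptions for illustration; check your CLI version's documentation for the exact option names.

aliyun log es_migration --hosts="user:password@localhost:9200" --project_name=<your_project> --indexes=filebeat-* --query='{"query": {"term": {"nginx.access.response_code": 404}}}' --pool_size=10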

Mapping Relationship

The Elasticsearch data model consists of several concepts, such as index, type, document, mapping, and field datatypes. The following table shows how these concepts map to the Log Service data model.

[Table: mapping between Elasticsearch concepts (index, type, document, mapping, field datatypes) and the Log Service data model]

For more information about the mapping relationships, see Data type mapping.

Example

The following example shows how to migrate NGINX access logs from Elasticsearch to Log Service, and how to query and analyze these logs by using the CLI.

Migration Command

aliyun log es_migration --hosts=<your_es> --project_name=<your_project> --indexes=filebeat-* --logstore_index_mappings='{"nginx-access": "filebeat-*"}' --time_reference=@timestamp
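In this command, --hosts points to the source Elasticsearch cluster, --project_name specifies the target Log Service project, --indexes selects the indexes to migrate (wildcard patterns such as filebeat-* are supported), --logstore_index_mappings maps all matched indexes to the single nginx-access logstore instead of creating one logstore per index, and --time_reference uses the @timestamp field of each document as the log time.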

Query and Analysis

1. Query for the status code count on a daily basis.

* | SELECT date_trunc('day', __time__) AS t, "nginx.access.response_code" AS status, COUNT(1) AS count GROUP BY status, t ORDER BY t ASC

2. Query for the countries and regions where the requests originated.

* | SELECT ip_to_country("nginx.access.remote_ip") AS country, COUNT(1) AS count GROUP BY country

Performance Tuning

The performance of the CLI mainly depends on two factors: the speed of reading data from Elasticsearch and the speed of writing data to Log Service.

The Speed of Reading Data from Elasticsearch

Each Elasticsearch index consists of one or more shards. The CLI creates a data migration task for every shard of each index and submits these tasks to the internal process pool for execution. You can specify the size of this pool by setting the pool_size parameter. In theory, an index with more shards allows a higher overall migration throughput.
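For example, to allow up to ten shard-level migration tasks to run in parallel (the value 10 is illustrative; tune it to your machine and cluster):

aliyun log es_migration --hosts=<your_es> --project_name=<your_project> --indexes=filebeat-* --pool_size=10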

The Speed of Writing Data to Log Service

Log Service also uses shards. Each shard provides a write capacity of 5 MB/s or 500 writes per second. You can enable more shards for your logstore to increase the speed of writing data to Log Service.
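For example, to sustain writes of roughly 15 MB/s during the migration, the target logstore needs at least three shards (3 × 5 MB/s); the 15 MB/s figure is purely illustrative.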

Performance Data

Assume that the source Elasticsearch index and the target Log Service logstore each have only one shard, and that each document to be migrated is about 100 bytes in size. Under these conditions, the average migration speed is about 3 MB/s, that is, roughly 30,000 documents per second.
