In the past, data enrichment was available only in Logstash. Since the enrich processor was introduced in Elasticsearch 7.5.0, we can enrich data in Elasticsearch without having to configure a separate service or system.
The master data used for enrichment is usually created in CSV files. In this article, we will explain step by step how to enrich data by using the data in CSV files and the enrich processor that runs on an ingest node.
First, you can import the following sample master data in CSV format by using Kibana on Alibaba Cloud Elasticsearch or user-created Elasticsearch, Logstash, and Kibana (ELK) on Elastic Compute Service (ECS). Then, enrich the target document with the sample master data while ingesting the document into Elasticsearch. In this example, we store the master data in a file named test.csv. The data indicates information about devices in an organization's inventory.
"Device ID","Device Location","Device Owner","Device Type"
"device1","London","Engineering","Computer"
"device2","Toronto","Consulting","Mouse"
"device3","Winnipeg","Sales","Computer"
"device4","Barcelona","Engineering","Phone"
"device5","Toronto","Consulting","Computer"
"device6","London","Consulting","Computer"
Note that the CSV data must not contain any spaces, because the current version of the Data Visualizer only accepts precisely formatted data. This issue is recorded on GitHub.
You can use Kibana to import the data into Elasticsearch. Open Kibana:
Click Import a CSV, NDJSON, or log file.
Click Select or drag and drop a file and then select the test.csv file that we just created.
Click Import.
In this example, we name the imported index master_data_from_csv. Click Import.
So far, we have created the master_data_from_csv index. You can select any of the four options at the lower part of the UI to view the imported data.
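Alternatively, you can check the imported data from the Kibana Dev Tools console. The following request is a minimal, standard search that is not specific to this example; it simply returns the rows that were imported into the master_data_from_csv index:
GET master_data_from_csv/_search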
In this section, we will show you how to use the enrich processor to merge master data into a document from an input data stream. For information about the enrich processor, see my previous article "Elasticsearch: Enrich Processor - New Feature in Elasticsearch 7.5."
First, create an enrich policy that defines the field used to match the master data with the document from the input data stream. The following code is a sample policy applicable to our data:
PUT /_enrich/policy/enrich-devices-policy
{
  "match": {
    "indices": "master_data_from_csv",
    "match_field": "Device ID",
    "enrich_fields": [
      "Device Location",
      "Device Owner",
      "Device Type"
    ]
  }
}
Run the preceding request to create the policy. Then, call the execute enrich policy API to create an enrich index for the policy.
PUT /_enrich/policy/enrich-devices-policy/_execute
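If you want to confirm that the policy was created and review its definition, you can call the get enrich policy API. This is only an optional verification step using the policy name we created above:
GET /_enrich/policy/enrich-devices-policy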
Next, create an ingest pipeline that uses the enrich policy.
PUT /_ingest/pipeline/device_lookup
{
  "description": "Enrich device information",
  "processors": [
    {
      "enrich": {
        "policy_name": "enrich-devices-policy",
        "field": "device_id",
        "target_field": "my_enriched_data",
        "max_matches": "1"
      }
    }
  ]
}
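Before indexing real documents, you can optionally dry-run the pipeline with the simulate pipeline API. The sample document below is made up for illustration; any document whose device_id appears in the master data should come back with the enriched fields attached:
POST /_ingest/pipeline/device_lookup/_simulate
{
  "docs": [
    {
      "_source": {
        "device_id": "device1",
        "other_field": "some value"
      }
    }
  ]
}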
Next, insert a document and apply the ingest pipeline to the document.
PUT /device_index/_doc/1?pipeline=device_lookup
{
  "device_id": "device1",
  "other_field": "some value"
}
You can call the GET API operation to view the imported document.
GET device_index/_doc/1
Output:
{
  "_index" : "device_index",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "my_enriched_data" : {
      "Device Location" : "London",
      "Device Owner" : "Engineering",
      "Device ID" : "device1",
      "Device Type" : "Computer"
    },
    "device_id" : "device1",
    "other_field" : "some value"
  }
}
As shown in the preceding output, a field named my_enriched_data is added to the returned document. This field includes Device Location, Device Owner, Device ID, and Device Type, which come from the test.csv file that we previously imported. The enrich processor looked up this information in the master_data_from_csv index based on the device_id value device1. In other words, our document now contains more information. This is the result of enrichment.
In the preceding process, we call the enrich processor by using the pipeline that we specified when we imported the data. However, in actual scenarios, we prefer to add this configuration to the index settings, instead of specifying a pipeline in the request URL. You can do this by adding index.default_pipeline to the index settings.
PUT device_index/_settings
{
  "index.default_pipeline": "device_lookup"
}
Now, all documents sent to device_index pass through the device_lookup pipeline. You no longer need to add pipeline=device_lookup to the URL. You can run the following PUT command to check whether the pipeline works properly.
PUT /device_index/_doc/2
{
  "device_id": "device2",
  "other_field": "some value"
}
You can run the following command to view the document we just ingested:
GET device_index/_doc/2
The following code shows the document:
{
  "_index" : "device_index",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "my_enriched_data" : {
      "Device Location" : "Toronto",
      "Device Owner" : "Consulting",
      "Device ID" : "device2",
      "Device Type" : "Mouse"
    },
    "device_id" : "device2",
    "other_field" : "some value"
  }
}
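Keep in mind that the enrich index created by the _execute call is a snapshot of the master data. If the CSV data changes and you re-import it into master_data_from_csv, you need to execute the policy again so that the enrich index is rebuilt; otherwise, the pipeline keeps enriching documents with the old values:
PUT /_enrich/policy/enrich-devices-policy/_execute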
In most cases, we need to enrich a document while importing it. This ensures that the document stored in Elasticsearch contains the information required for searching or viewing it. In this article, we showed how the enrich processor running on an ingest node enriches documents by using CSV data. This feature is useful for merging master data into documents as they are ingested into Elasticsearch.
Declaration: This article is adapted from "Elastic Helper in the China Community" with the authorization of the author Liu Xiaoguo. We reserve the right to investigate unauthorized use. Source: https://elasticstack.blog.csdn.net/