How to extract content from log fields in multiple modes - Simple Log Service

If you use Logtail to collect logs, you can add Logtail plug-ins to extract content from log fields in regex, anchor, CSV, single-character delimiter, multi-character delimiter, key-value pair, and Grok modes. This topic describes the parameters of Logtail plug-ins and provides examples on how to configure the plug-ins.

Limits

The input plug-ins for text logs and container stdout and stderr support only form configuration. Other input plug-ins support only editor configuration in JSON.

Entry point

If you want to use a Logtail plug-in to process logs, you can add a Logtail plug-in configuration when you create or modify a Logtail configuration. For more information, see Overview of Logtail plug-ins for data processing.

Regex mode

You can extract content from log fields by using a regular expression.

Form configuration

Parameters

Set Processor Type to Extract Field (Regex Mode). Then, configure other parameters based on the following table.

Parameter	Description
Original Field	The name of the original field.
Regular Expression	The regular expression. You must enclose the field from which you want to extract content in parentheses `()`.
New Field	The field name that you want to specify for the extracted content. You can specify multiple field names.
Report Original Field Missing Error	Specifies whether to report an error if the raw log does not contain the original field.
Report Regex Mismatch Error	Specifies whether to report an error if the value of the original field does not match the regular expression.
Retain Original Field	Specifies whether to retain the original field in the new log that is obtained after parsing.
Retain Original Field If Parsing Fails	Specifies whether to retain the original field in the new log that is obtained after the raw log fails to be parsed.
Full Regex Match	Specifies whether to extract the value of the original field in full match mode. If you select this option, the value of the original field is extracted only if all fields that are specified in New Field match the value of the original field based on the regular expression that is specified in Regular Expression.

Configuration example

Extract the value of the content field in regex mode and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.

Raw log

"content" : "10.200.**.** - - [10/Aug/2022:14:57:51 +0800] \"POST /PutData?
Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature> HTTP/1.1\" 0.024 18204 200 37 \"-\" \"aliyun-sdk-java"

Logtail plug-in configuration for data processing

Result

"ip" : "10.200.**.**"
"time" : "10/Aug/2022:14:57:51"
"method" : "POST"
"url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
"request_time" : "0.024"
"request_length" : "18204"
"status" : "200"
"length" : "27"
"ref_url" : "-"
"browser" : "aliyun-sdk-java"

Editor configuration in JSON

Parameters

Set type to processor_regex. Then, configure other parameters in the detail field based on the following table.

Parameter	Type	Required	Description
SourceKey	String	Yes	The name of the original field.
Regex	String	Yes	The regular expression. You must enclose the field from which you want to extract content in parentheses `()`.
Keys	String array	Yes	The field names that you want to specify for the extracted content. Example: ["ip", "time", "method"].
NoKeyError	Boolean	No	Specifies whether to report an error if the raw log does not contain the original field. Valid values: true and false (default).
NoMatchError	Boolean	No	Specifies whether to report an error if the value of the original field does not match the regular expression. Valid values: true (default) and false.
KeepSource	Boolean	No	Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true and false (default).
FullMatch	Boolean	No	Specifies whether to extract the value of the original field in full match mode. Valid values: true (default), which extracts the value only if all fields that are specified in Keys match the value of the original field based on the regular expression that is specified in Regex; and false, which extracts the value even if only some of the fields that are specified in Keys match.
KeepSourceIfParseError	Boolean	No	Specifies whether to retain the original field in the new log that is obtained after the raw log fails to be parsed. Valid values: true (default) and false.

Configuration example

Extract the value of the content field in regex mode and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.

Raw log

"content" : "10.200.**.** - - [10/Aug/2022:14:57:51 +0800] \"POST /PutData?
Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature> HTTP/1.1\" 0.024 18204 200 37 \"-\" \"aliyun-sdk-java"

Logtail plug-in configuration for data processing

{
    "type" : "processor_regex",
    "detail" : {
        "SourceKey" : "content",
        "Regex" : "([\\d\\.]+) \\S+ \\S+ \\[(\\S+) \\S+\\] \"(\\w+) (\\S+) [^\\\"]*\" ([\\d\\.]+) (\\d+) (\\d+) (\\d+|-) \"([^\\\"]*)\" \"([^\\\"]*)\"",
        "Keys" : ["ip", "time", "method", "url", "request_time", "request_length", "status", "length", "ref_url", "browser"],
        "NoKeyError" : true,
        "NoMatchError" : true,
        "KeepSource" : false
    }
}

Result

"ip" : "10.200.**.**"
"time" : "10/Aug/2022:14:57:51"
"method" : "POST"
"url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
"request_time" : "0.024"
"request_length" : "18204"
"status" : "200"
"length" : "27"
"ref_url" : "-"
"browser" : "aliyun-sdk-java"

Anchor mode

You can extract content from log fields by anchoring start and stop keywords. If you want to extract content from a JSON-formatted field, you can expand the field.

Form configuration

Parameters

Set Processor Type to Extract Field (Anchor Mode). Then, configure other parameters based on the following table.

Parameter	Description
Original Field	The name of the original field.
Anchor Parameters	The fields that are configured to anchor keywords.
Start Keyword	The keyword that specifies the start of anchoring. If you do not configure this parameter, the start of a string is matched.
End Keyword	The keyword that specifies the end of anchoring. If you do not configure this parameter, the end of a string is matched.
New Field	The field name that you want to specify for the extracted content.
Field Type	The type of the field. Valid values: string and json.
JSON Expansion	Specifies whether to expand the JSON-formatted field.
Character to Concatenate Expanded Keys	The character that is used to concatenate the expanded keys. The default value is an underscore (_).
Maximum Depth of JSON Expansion	The maximum depth of JSON expansion. Default value: 0, which indicates that the maximum depth is unlimited.
Report Original Field Missing Error	Specifies whether to report an error if the raw log does not contain the original field.
Report Keywords Missing Error	Specifies whether to report an error if no keywords are matched in the raw log.
Retain Original Field	Specifies whether to retain the original field in the new log that is obtained after parsing.

Configuration example

Extract the value of the content field in anchor mode and specify field names time, val_key1, val_key2, val_key3, val_key4_inner1, and val_key4_inner2 for the value.

Raw log

"content" : "time:2022.09.12 20:55:36\t json:{\"key1\" : \"xx\", \"key2\": false, \"key3\":123.456, \"key4\" : { \"inner1\" : 1, \"inner2\" : false}}"

Logtail plug-in configuration for data processing

Result

"time" : "2022.09.12 20:55:36"
"val_key1" : "xx"
"val_key2" : "false"
"val_key3" : "123.456"
"value_key4_inner1" : "1"
"value_key4_inner2" : "false"

Editor configuration in JSON

Parameters

Set type to processor_anchor. Then, configure other parameters in the detail field based on the following table.

Parameter	Type	Required	Description
SourceKey	String	Yes	The name of the original field.
Anchors	Anchor array	Yes	The list of anchor items. Each item specifies a pair of start and stop keywords and the field to which the content between the keywords is extracted.
Start	String	Yes	The keyword that specifies the start of anchoring. If you set this parameter to an empty string, the start of the string is matched.
Stop	String	Yes	The keyword that specifies the end of anchoring. If you set this parameter to an empty string, the end of the string is matched.
FieldName	String	Yes	The field name that you want to specify for the extracted content.
FieldType	String	Yes	The type of the field. Valid values: string and json.
ExpondJson	Boolean	No	Specifies whether to expand the JSON-formatted field. Valid values: true and false (default). This parameter takes effect only when you set FieldType to json.
ExpondConnecter	String	No	The character that is used to concatenate the expanded keys. The default value is an underscore (_).
MaxExpondDepth	Int	No	The maximum depth of JSON expansion. Default value: 0, which indicates that the maximum depth is unlimited.
NoAnchorError	Boolean	No	Specifies whether to report an error if no keywords are matched in the raw log. Valid values: true and false (default).
NoKeyError	Boolean	No	Specifies whether to report an error if the raw log does not contain the original field. Valid values: true and false (default).

Configuration example

Extract the value of the content field in anchor mode and specify field names time, val_key1, val_key2, val_key3, val_key4_inner1, and val_key4_inner2 for the value.

Raw log

"content" : "time:2022.09.12 20:55:36\t json:{\"key1\" : \"xx\", \"key2\": false, \"key3\":123.456, \"key4\" : { \"inner1\" : 1, \"inner2\" : false}}"

Logtail plug-in configuration for data processing

{
    "type" : "processor_anchor",
    "detail" : {
        "SourceKey" : "content",
        "Anchors" : [
            {
                "Start" : "time:",
                "Stop" : "\t",
                "FieldName" : "time",
                "FieldType" : "string",
                "ExpondJson" : false
            },
            {
                "Start" : "json:",
                "Stop" : "",
                "FieldName" : "val",
                "FieldType" : "json",
                "ExpondJson" : true
            }
        ]
    }
}

Result

"time" : "2022.09.12 20:55:36"
"val_key1" : "xx"
"val_key2" : "false"
"val_key3" : "123.456"
"value_key4_inner1" : "1"
"value_key4_inner2" : "false"

CSV mode

You can parse CSV-formatted logs and extract content from the fields in the logs in CSV mode.

Form configuration

Parameters

Set Processor Type to Extract Field (CSV Mode). Then, configure other parameters based on the following table.

Parameter	Description
Original Field	The name of the original field.
New Field	The field name that you want to specify for the extracted content. You can specify multiple field names. Important: If the system does not find a match for a field that is specified in New Field from the value of the original field, the field is skipped.
Delimiter	The delimiter. The default value is a comma (,).
Retain Excess Part	Specifies whether to retain the content that remains in the value of the original field after the system finds a match for each field that is specified in New Field from the value. For ease of understanding, the content is referred to as the excess part in this topic.
Parse Excess Part	Specifies whether to parse the excess part. If you select this option, you can configure Name Prefix of Field to which Excess Part is Assigned to specify a prefix for the names of fields to which the excess part is assigned. If you select Retain Excess Part but do not select Parse Excess Part, the excess part is stored in the _decode_preserve_ field. Note: If the excess part contains invalid data, you must standardize the data in the CSV format and then store the data.
Name Prefix of Field to which Excess Part is Assigned	The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to `expand_`, the fields are named expand_1 and expand_2.
Ignore Spaces before Field	Specifies whether to skip the spaces at the beginning of the value of the original field.
Retain Original Field	Specifies whether to retain the original field in the new log that is obtained after parsing.
Report Original Field Missing Error	Specifies whether to report an error if the raw log does not contain the original field.

Configuration example

Extract the value of the csv field.

Raw log

{
    "csv": "2022-06-09,192.0.2.0,\"{\"\"key1\"\":\"\"value\"\",\"\"key2\"\":{\"\"key3\"\":\"\"string\"\"}}\"",
    ......
}

Logtail plug-in configuration for data processing

Result

{
    "date": "2022-06-09",
    "ip": "192.0.2.0",
    "content": "{\"key1\":\"value\",\"key2\":{\"key3\":\"string\"}}"
    ......

}

Editor configuration in JSON

Parameters

Set type to processor_csv. Then, configure other parameters in the detail field based on the following table.

Parameter	Type	Required	Description
SourceKey	String	Yes	The name of the original field.
SplitKeys	String array	Yes	The field names that you want to specify for the extracted content. Example: ["date", "ip", "content"]. Important: If the system does not find a match for a field that is specified in SplitKeys from the value of the original field, the field is skipped.
PreserveOthers	Boolean	No	Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values: true and false (default).
ExpandOthers	Boolean	No	Specifies whether to parse the excess part. Valid values: true, which parses the excess part and lets you use ExpandKeyPrefix to specify a prefix for the names of fields to which the excess part is assigned; and false (default). If you set PreserveOthers to true and ExpandOthers to false, the excess part is stored in the _decode_preserve_ field. Note: If the excess part contains invalid data, you must standardize the data in the CSV format and then store the data.
ExpandKeyPrefix	String	No	The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to `expand_`, the fields are named expand_1 and expand_2.
TrimLeadingSpace	Boolean	No	Specifies whether to skip the spaces at the beginning of the value of the original field. Valid values: true and false (default).
SplitSep	String	No	The delimiter. The default value is a comma (,).
KeepSource	Boolean	No	Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true and false (default).
NoKeyError	Boolean	No	Specifies whether to report an error if the raw log does not contain the original field. Valid values: true and false (default).

Configuration example

Extract the value of the csv field.

Raw log

{
    "csv": "2022-06-09,192.0.2.0,\"{\"\"key1\"\":\"\"value\"\",\"\"key2\"\":{\"\"key3\"\":\"\"string\"\"}}\"",
    ......
}

Logtail plug-in configuration for data processing

{
    ......
    "type":"processor_csv",
    "detail":{
        "SourceKey":"csv",
        "SplitKeys":["date", "ip", "content"]
    }
    ......
}

Result

{
    "date": "2022-06-09",
    "ip": "192.0.2.0",
    "content": "{\"key1\":\"value\",\"key2\":{\"key3\":\"string\"}}"
    ......

}
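
To see why the third column in the example comes out as a clean JSON string, it helps to parse the same value with a standard CSV parser: doubled double quotation marks inside a quoted column collapse to a single one. The Python sketch below uses the standard csv module for this. It is an approximation for local checking, not the Logtail implementation; the function name parse_csv_field and the handling of the excess part (always expanded when a prefix is given) are assumptions.

```python
import csv


def parse_csv_field(value, split_keys, split_sep=",", expand_key_prefix=None):
    """Split one CSV-formatted value and name the columns after split_keys."""
    parts = next(csv.reader([value], delimiter=split_sep))
    fields = dict(zip(split_keys, parts))
    if expand_key_prefix is not None:
        # Name the excess columns prefix1, prefix2, ... (for example expand_1).
        for index, part in enumerate(parts[len(split_keys):], start=1):
            fields[f"{expand_key_prefix}{index}"] = part
    return fields


# The value of the csv field from the raw log above, after JSON unescaping.
raw = '2022-06-09,192.0.2.0,"{""key1"":""value"",""key2"":{""key3"":""string""}}"'

fields = parse_csv_field(raw, ["date", "ip", "content"])
print(fields)

# Hypothetical input with more columns than keys.
extra = parse_csv_field("a,b,c,d", ["x", "y"], expand_key_prefix="expand_")
print(extra)
```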

Single-character delimiter mode

You can extract content from log fields by using a single-character delimiter. You can use a quote to enclose field content that contains the delimiter.

Form configuration

Parameters

Set Processor Type to Extract Field (Single-character Delimiter Mode). Then, configure other parameters based on the following table.

Parameter	Description
Original Field	The name of the original field.
Delimiter	The delimiter. The delimiter must be a single character. You can specify a non-printable character as the single-character delimiter. Example: \u0001.
New Field	The field name that you want to specify for the extracted content.
Use Quote	Specifies whether to use a quote to enclose field content that contains the delimiter.
Quote	The quote. The quote must be a single character. You can specify a non-printable character as the quote. Example: \u0001.
Report Original Field Missing Error	Specifies whether to report an error if the raw log does not contain the original field.
Report Delimiter Mismatch Error	Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter.
Retain Original Field	Specifies whether to retain the original field in the new log that is obtained after parsing.

Configuration example

Extract the value of the content field by using a vertical bar (|) as the delimiter and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.

Raw log

"content" : "10.**.**.**|10/Aug/2022:14:57:51 +0800|POST|PutData?
Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|0.024|18204|200|37|-|
aliyun-sdk-java"

Logtail plug-in configuration for data processing

Result

"ip" : "10.**.**.**"
"time" : "10/Aug/2022:14:57:51 +0800"
"method" : "POST"
"url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
"request_time" : "0.024"
"request_length" : "18204"
"status" : "200"
"length" : "27"
"ref_url" : "-"
"browser" : "aliyun-sdk-java"

Editor configuration in JSON

Parameters

Set type to processor_split_char. Then, configure other parameters in the detail field based on the following table.

Parameter	Type	Required	Description
SourceKey	String	Yes	The name of the original field.
SplitSep	String	Yes	The delimiter. The delimiter must be a single character. You can specify a non-printable character as the single-character delimiter. Example: \u0001.
SplitKeys	String array	Yes	The field names that you want to specify for the extracted content. Example: ["ip", "time", "method"].
PreserveOthers	Boolean	No	Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values: true and false (default).
QuoteFlag	Boolean	No	Specifies whether to use a quote to enclose field content that contains the delimiter. Valid values: true and false (default).
Quote	String	No	The quote. The quote must be a single character. You can specify a non-printable character as the quote. Example: \u0001. This parameter takes effect only when QuoteFlag is set to true.
NoKeyError	Boolean	No	Specifies whether to report an error if the raw log does not contain the original field. Valid values: true and false (default).
NoMatchError	Boolean	No	Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter. Valid values: true and false (default).
KeepSource	Boolean	No	Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true and false (default).

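Configuration example

Extract the value of the content field by using a vertical bar (|) as the delimiter and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.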

Raw log

"content" : "10.**.**.**|10/Aug/2022:14:57:51 +0800|POST|PutData?
Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|0.024|18204|200|37|-|
aliyun-sdk-java"

Logtail plug-in configuration for data processing

{
    "type" : "processor_split_char",
    "detail" : {
        "SourceKey" : "content",
        "SplitSep" : "|",
        "SplitKeys" : ["ip", "time", "method", "url", "request_time", "request_length", "status", "length", "ref_url", "browser"]
    }
}

Result

"ip" : "10.**.**.**"
"time" : "10/Aug/2022:14:57:51 +0800"
"method" : "POST"
"url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
"request_time" : "0.024"
"request_length" : "18204"
"status" : "200"
"length" : "27"
"ref_url" : "-"
"browser" : "aliyun-sdk-java"

Multi-character delimiter mode

You can extract content from log fields by using a multi-character delimiter. Quotes are not supported in this mode.

Form configuration

Parameters

Set Processor Type to Extract Field (Multi-character Delimiter Mode). Then, configure other parameters based on the following table.

Parameter	Description
Original Field	The name of the original field.
Delimiter String	The delimiter. You can specify a non-printable character as the multi-character delimiter. Example: \u0001\u0002.
New Field	The field name that you want to specify for the extracted content. Important: If the system does not find a match for a field that is specified in New Field from the value of the original field, the field is skipped.
Report Original Field Missing Error	Specifies whether to report an error if the raw log does not contain the original field.
Report Delimiter Mismatch Error	Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter.
Retain Original Field	Specifies whether to retain the original field in the new log that is obtained after parsing.
Retain Excess Part	Specifies whether to retain the excess part after the system finds a match for each field that is specified in New Field from the value.
Parse Excess Part	Specifies whether to parse the excess part. If you select this option, you can configure Name Prefix of Field to which Excess Part is Assigned to specify a prefix for the names of fields to which the excess part is assigned.
Name Prefix of Field to which Excess Part is Assigned	The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.

Configuration example

Extract the value of the content field by using the delimiter |#| and specify field names ip, time, method, url, request_time, request_length, status, expand_1, expand_2, and expand_3 for the value.

Raw log

"content" : "10.**.**.**|#|10/Aug/2022:14:57:51 +0800|#|POST|#|PutData?
Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|#|0.024|#|18204|#|200|#|27|#|-|#|
aliyun-sdk-java"

Logtail plug-in configuration for data processing

Result

"ip" : "10.**.**.**"
"time" : "10/Aug/2022:14:57:51 +0800"
"method" : "POST"
"url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
"request_time" : "0.024"
"request_length" : "18204"
"status" : "200"
"expand_1" : "27"
"expand_2" : "-"
"expand_3" : "aliyun-sdk-java"

Editor configuration in JSON

Parameters

Set type to processor_split_string. Then, configure other parameters in the detail field based on the following table.

Parameter	Type	Required	Description
SourceKey	String	Yes	The name of the original field.
SplitSep	String	Yes	The delimiter. You can specify a non-printable character as the multi-character delimiter. Example: \u0001\u0002.
SplitKeys	String array	Yes	The field names that you want to specify for the extracted content. Example: ["key1","key2"]. Note: If the system does not find a match for a field that is specified in SplitKeys from the value of the original field, the field is skipped.
PreserveOthers	Boolean	No	Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values: true and false (default).
ExpandOthers	Boolean	No	Specifies whether to parse the excess part. Valid values: true, which parses the excess part and lets you use ExpandKeyPrefix to specify a prefix for the names of fields to which the excess part is assigned; and false (default).
ExpandKeyPrefix	String	No	The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
NoKeyError	Boolean	No	Specifies whether to report an error if the raw log does not contain the original field. Valid values: true and false (default).
NoMatchError	Boolean	No	Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter. Valid values: true and false (default).
KeepSource	Boolean	No	Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true and false (default).

Configuration example

Extract the value of the content field by using the delimiter |#| and specify field names ip, time, method, url, request_time, request_length, status, expand_1, expand_2, and expand_3 for the value.

Raw log

"content" : "10.**.**.**|#|10/Aug/2022:14:57:51 +0800|#|POST|#|PutData?
Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|#|0.024|#|18204|#|200|#|27|#|-|#|
aliyun-sdk-java"

Logtail plug-in configuration for data processing

{
    "type" : "processor_split_string",
    "detail" : {
        "SourceKey" : "content",
        "SplitSep" : "|#|",
        "SplitKeys" : ["ip", "time", "method", "url", "request_time", "request_length", "status"],
        "PreserveOthers" : true,
        "ExpandOthers" : true,
        "ExpandKeyPrefix" : "expand_"
    }
}

Result

"ip" : "10.**.**.**"
"time" : "10/Aug/2022:14:57:51 +0800"
"method" : "POST"
"url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
"request_time" : "0.024"
"request_length" : "18204"
"status" : "200"
"expand_1" : "27"
"expand_2" : "-"
"expand_3" : "aliyun-sdk-java"

Key-value pair mode

You can extract content from log fields by splitting key-value pairs.

Note

Logtail V0.16.26 and later support the processor_split_key_value plug-in.

Form configuration

Parameters

Set Processor Type to Extract Field (Key-value Pair Mode). Then, configure other parameters based on the following table.

Parameter	Description
Original Field	The name of the original field.
Key-value Pair Delimiter	The delimiter that is used to separate key-value pairs. The default value is a tab character (`\t`).
Key and Value Delimiter	The delimiter that is used to separate the key and the value in a single key-value pair. The default value is a colon (:).
Retain Original Field	Specifies whether to retain the original field.
Report Original Field Missing Error	Specifies whether to report an error if the raw log does not contain the original field.
Drop Key-value Pairs That Fail to Match Delimiter	Specifies whether to discard a key-value pair if the delimiter in the raw log does not match the specified delimiter.
Report Key and Value Delimiter Missing Error	Specifies whether to report an error if the raw log does not contain the specified delimiter.
Report Empty Key Error	Specifies whether to report an error if the key is empty after delimiting.
Quote	The quote. If a value is enclosed in the specified quote, the content within the quote is extracted as the value. You can specify multiple characters as a quote. Important: If a value that is enclosed in the specified quote contains a backslash (\) that is adjacent to the quote, the backslash (\) is extracted as part of the value.

Configuration example
- Example 1: Extract the value of a specified field in key-value pair mode.
  Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a tab character (\t). The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:).
  - Raw log
```
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
```
  - Logtail plug-in configuration for data processing
  - Result
```
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
"class": "main"
"userid": "123456"
"method": "get"
"message": "\"wrong user\""
```
- Example 2: Extract the value of a specified field in key-value pair mode when a quote is used.
  Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is a double quotation mark (").
  - Raw log
```
"content": "class:main http_user_agent:\"User Agent\" \"Chinese\" \"hello\\t\\\"ilogtail\\\"\\tworld\""
```
  - Logtail plug-in configuration for data processing
  - Result
```
"class": "main",
"http_user_agent": "User Agent",
"no_separator_key_0": "Chinese",
"no_separator_key_1": "hello\t\"ilogtail\"\tworld",
```
- Example 3: Extract the value of a specified field in key-value pair mode when a multi-character quote is used.
  Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is three consecutive double quotation marks (""").
  - Raw log
```
"content": "class:main http_user_agent:\"\"\"User Agent\"\"\" \"\"\"Chinese\"\"\""
```
  - Logtail plug-in configuration for data processing
  - Result
```
"class": "main",
"http_user_agent": "User Agent",
"no_separator_key_0": "Chinese",
```
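
The basic splitting in Example 1 (no quote configured) can be modeled with a few lines of Python. This is a minimal sketch, not the Logtail implementation: the function name split_key_value is hypothetical, quote handling is omitted, and naming pairs that lack a key and value delimiter no_separator_key_0, no_separator_key_1, and so on follows the results shown in Examples 2 and 3.

```python
def split_key_value(value, delimiter="\t", separator=":"):
    """Split value into key-value pairs; keep pairs without a separator."""
    fields = {}
    missing = 0
    for pair in value.split(delimiter):
        # Split on the first separator only, so values may contain it.
        key, found, val = pair.partition(separator)
        if found:
            fields[key] = val
        else:
            fields[f"no_separator_key_{missing}"] = pair
            missing += 1
    return fields


# The value of the content field from Example 1, after JSON unescaping.
content = 'class:main\tuserid:123456\tmethod:get\tmessage:"wrong user"'
fields = split_key_value(content)
print(fields)

# Hypothetical input in which one pair has no key and value delimiter.
orphan = split_key_value("a:1\torphan\tt:12:30")
print(orphan)
```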

Editor configuration in JSON

Parameters

Set type to processor_split_key_value. Then, configure other parameters in the detail field based on the following table.

Parameter	Type	Required	Description
SourceKey	String	Yes	The name of the original field.
Delimiter	String	No	The delimiter that is used to separate key-value pairs. The default value is a tab character (`\t`).
Separator	String	No	The delimiter that is used to separate the key and the value in a single key-value pair. The default value is a colon (:).
KeepSource	Boolean	No	Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true and false (default).
ErrIfSourceKeyNotFound	Boolean	No	Specifies whether to report an error if the raw log does not contain the original field. Valid values: true (default) and false.
DiscardWhenSeparatorNotFound	Boolean	No	Specifies whether to discard a key-value pair if the delimiter in the raw log does not match the specified delimiter. Valid values: true and false (default).
ErrIfSeparatorNotFound	Boolean	No	Specifies whether to report an error if the raw log does not contain the specified delimiter. Valid values: true (default) and false.
ErrIfKeyIsEmpty	Boolean	No	Specifies whether to report an error if the key is empty after delimiting. Valid values: true (default) and false.
Quote	String	No	The quote. If a value is enclosed in the specified quote, the content within the quote is extracted as the value. You can specify multiple characters as a quote. By default, the quote feature is disabled. Important: If you specify a double quotation mark (") as the quote, you must escape each double quotation mark with a backslash (\) in the JSON configuration. If a value that is enclosed in the specified quote contains a backslash (\) that is adjacent to the quote, the backslash (\) is extracted as part of the value.

Configuration example
- Example 1: Extract the value of a specified field in key-value pair mode.
  Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a tab character (\t). The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:).
  - Raw log
```
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
```
  - Logtail plug-in configuration for data processing
```
{
  "processors":[
    {
      "type":"processor_split_key_value",
      "detail": {
        "SourceKey": "content",
        "Delimiter": "\t",
        "Separator": ":",
        "KeepSource": true
      }
    }
  ]
}
```
  - Result
```
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
"class": "main"
"userid": "123456"
"method": "get"
"message": "\"wrong user\""
```
- Example 2: Extract the value of a specified field in key-value pair mode by using a quote.
  Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is a double quotation mark (").
  - Raw log
```
"content": "class:main http_user_agent:\"User Agent\" \"Chinese\" \"hello\\t\\\"ilogtail\\\"\\tworld\""
```
  - Logtail plug-in configuration for data processing
```
{
  "processors":[
    {
      "type":"processor_split_key_value",
      "detail": {
        "SourceKey": "content",
        "Delimiter": " ",
        "Separator": ":",
        "Quote": "\""
      }
    }
  ]
}
```
  - Result
```
"class": "main",
"http_user_agent": "User Agent",
"no_separator_key_0": "Chinese",
"no_separator_key_1": "hello\t\"ilogtail\"\tworld",
```
- Example 3: Extract the value of a specified field in key-value pair mode by using a multi-character quote.
  Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is three consecutive double quotation marks (""").
  - Raw log
```
"content": "class:main http_user_agent:\"\"\"User Agent\"\"\" \"\"\"Chinese\"\"\""
```
  - Logtail plug-in configuration for data processing
```
{
  "processors":[
    {
      "type":"processor_split_key_value",
      "detail": {
        "SourceKey": "content",
        "Delimiter": " ",
        "Separator": ":",
        "Quote": "\"\"\""
      }
    }
  ]
}
```
  - Result
```
"class": "main",
"http_user_agent": "User Agent",
"no_separator_key_0": "Chinese",
```
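The basic splitting rule that the preceding examples rely on can be sketched in Python. This is only an illustration of the behavior shown in the results, not the implementation of the processor_split_key_value plug-in. The function name split_key_value is hypothetical, the Quote parameter is not handled, and the no_separator_key_N naming is taken from the results of Example 2 and Example 3.

```python
def split_key_value(value, delimiter="\t", separator=":"):
    """Split a field value into key-value pairs (illustrative, no Quote support)."""
    fields = {}
    missing = 0  # number of pairs seen so far that contain no separator
    for pair in value.split(delimiter):
        if not pair:
            continue  # skip empty segments produced by consecutive delimiters
        key, found, val = pair.partition(separator)
        if found:
            fields[key] = val
        else:
            # Assumption: mirrors the no_separator_key_N names in the examples.
            fields[f"no_separator_key_{missing}"] = pair
            missing += 1
    return fields


# Value of the content field from Example 1.
raw = 'class:main\tuserid:123456\tmethod:get\tmessage:"wrong user"'
result = split_key_value(raw)
print(result)
```

In this sketch, only the first separator in each pair splits the key from the value, so a value may itself contain colons.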

Grok mode

You can extract content from log fields by using Grok expressions.

Important

Logtail V1.2.0 and later support the processor_grok plug-in.

Form configuration

Parameters

Set Processor Type to Extract Field (Grok Mode). Then, configure other parameters based on the following table.

Parameter	Description
Original Field	The name of the original field.
Grok Expression Array	The array of Grok expressions. The processor_grok plug-in matches a log field based on the specified expressions in sequence and returns the content that is extracted based on the first match. For more information about the default expressions that are supported by processor_grok, see processor_grok. If the expressions that are provided on the linked page do not meet your business requirements, you can specify a custom Grok expression in Custom Grok Pattern. Note If you specify multiple Grok expressions, the processing performance may be affected. We recommend that you specify no more than five expressions.
Custom Grok Pattern	The custom Grok pattern, which consists of the rule name and Grok expression.
Custom Grok Pattern File Directory	The directory where the custom Grok pattern file is stored. The processor_grok plug-in reads all files in the directory. Important If you update the custom Grok pattern file, the update can take effect only after you restart Logtail.
Maximum Timeout	The timeout period to extract content from the original field by using a Grok expression. Unit: milliseconds. If you do not include this parameter in the configuration or set this parameter to 0, the extraction never times out.
Retain Logs that Fails to be Parsed	Specifies whether to retain the raw log if the raw log fails to be parsed.
Retain Original Field	Specifies whether to retain the original field in the new log that is obtained after parsing.
Report Original Field Missing Error	Specifies whether to report an error if the raw log does not contain the original field.
Report No Expressions Matched Error	Specifies whether to report an error if the value of the original field does not match any expression that is specified in Grok Expression Array.
Report Match Timeout Error	Specifies whether to report an error if the match times out.

Configuration example
Extract the value of the content field in Grok mode and specify field names year, month, and day for the value.
- Raw log
```
"content" : "2022 October 17"
```
- Logtail plug-in configuration for data processing
- Result
```
"year":"2022"
"month":"October"
"day":"17"
```

Editor configuration in JSON

Parameters

Set type to processor_grok. Then, configure other parameters in detail based on the following table.

Parameter	Type	Required	Description
CustomPatternDir	String array	No	The directory where the custom Grok pattern file is stored. The processor_grok plug-in reads all files in the directory. If you do not include this parameter in the configuration, the system does not import custom Grok pattern files. Important If you update the custom Grok pattern file, the update can take effect only after you restart Logtail.
CustomPatterns	Map	No	The custom Grok pattern. key specifies the rule name and value specifies the Grok expression. For more information about the default expressions that are supported by processor_grok, see processor_grok. If the expressions that are provided on the linked page do not meet your business requirements, you can specify a custom Grok expression in Match. If you do not include this parameter in the configuration, the system does not use custom Grok patterns.
SourceKey	String	No	The name of the original field. The default value is content.
Match	String array	Yes	The array of Grok expressions. The processor_grok plug-in matches a log field based on the specified expressions in sequence and returns the content that is extracted based on the first match. Note If you specify multiple Grok expressions, the processing performance may be affected. We recommend that you specify no more than five expressions.
TimeoutMilliSeconds	Long	No	The timeout period to extract content from the original field by using a Grok expression. Unit: milliseconds. If you do not include this parameter in the configuration or set this parameter to 0, the extraction never times out.
IgnoreParseFailure	Boolean	No	Specifies the action to take if the raw log fails to be parsed. Valid values: true (default): retains the raw log without parsing it. false: discards the raw log.
KeepSource	Boolean	No	Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true (default): retains the original field. false: discards the original field.
NoKeyError	Boolean	No	Specifies whether to report an error if the raw log does not contain the original field. Valid values: true and false (default).
NoMatchError	Boolean	No	Specifies whether to report an error if the value of the original field does not match any expression that is specified in Match. Valid values: true (default) and false.
TimeoutError	Boolean	No	Specifies whether to report an error if the match times out. Valid values: true (default) and false.

Example 1

Extract the value of the content field in Grok mode and specify field names year, month, and day for the value.

Raw log
```
"content" : "2022 October 17"
```

Logtail plug-in configuration for data processing

```
{
   "type" : "processor_grok",
   "detail" : {
      "KeepSource" : false,
      "Match" : [
         "%{YEAR:year} %{MONTH:month} %{MONTHDAY:day}"
      ],
      "IgnoreParseFailure" : false
   }
}
```

Result
```
"year":"2022"
"month":"October"
"day":"17"
```

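The way a Grok expression such as the one in Example 1 names the extracted fields can be sketched in Python by expanding each %{PATTERN:field} reference into a named capture group. This is a minimal sketch under stated assumptions: the GROK_PATTERNS table and the grok_to_regex function are hypothetical, and the three regular expressions are simplified stand-ins for the default patterns that processor_grok supports, which are stricter.

```python
import re

# Hypothetical, simplified pattern table. The default Grok definitions of
# YEAR, MONTH, and MONTHDAY are stricter than these stand-ins.
GROK_PATTERNS = {
    "YEAR": r"\d{4}",
    "MONTH": r"[A-Z][a-z]+",
    "MONTHDAY": r"\d{1,2}",
}


def grok_to_regex(expression):
    """Expand every %{PATTERN:field} reference into a named capture group."""
    def expand(ref):
        pattern_name, field_name = ref.group(1), ref.group(2)
        return f"(?P<{field_name}>{GROK_PATTERNS[pattern_name]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


regex = grok_to_regex("%{YEAR:year} %{MONTH:month} %{MONTHDAY:day}")
fields = regex.search("2022 October 17").groupdict()
print(fields)
```

The field names after the colon in the expression become the keys of the extracted content, which is why the result of Example 1 contains year, month, and day.
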
Example 2
Extract the value of the content field from multiple logs in Grok mode and parse the extracted values into different results based on different Grok expressions.
- Raw log
```
{
    "content" : "begin 123.456 end"
}
{
    "content" : "2019 June 24 \"I am iron man\""
}
{
    "content" : "WRONG LOG"
}
{
    "content" : "10.0.0.0 GET /index.html 15824 0.043"
}
```
- Logtail plug-in configuration for data processing
```
{
        "type" : "processor_grok",
        "detail" : {
                "CustomPatterns" : {
                        "HTTP" : "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
                },
                "IgnoreParseFailure" : false,
                "KeepSource" : false,
                "Match" : [
                        "%{HTTP}",
                        "%{WORD:word1} %{NUMBER:request_time} %{WORD:word2}",
                        "%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}"
                ],
                "SourceKey" : "content"
        }
}
```
- Result
  - First log: The processor_grok plug-in matches the log against the first expression %{HTTP} that is specified in Match, and the match fails. The plug-in then matches the log against the second expression %{WORD:word1} %{NUMBER:request_time} %{WORD:word2}, and the match is successful. The content that is extracted based on the second expression is returned.
    In the result, the content field in the raw log is discarded because KeepSource is set to false.
  - Second log: The processor_grok plug-in matches the log against the first expression %{HTTP} and the second expression %{WORD:word1} %{NUMBER:request_time} %{WORD:word2} in sequence, and both matches fail. The plug-in then matches the log against the third expression %{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}, and the match is successful. The content that is extracted based on the third expression is returned.
  - Third log: The processor_grok plug-in matches the log against the three expressions that are specified in Match in sequence, and all matches fail. The log is discarded because IgnoreParseFailure is set to false, so no result is returned for this log.
  - Fourth log: The processor_grok plug-in matches the log against the first expression %{HTTP} that is specified in Match, and the match is successful. The content that is extracted based on the first expression is returned.
```
{
  "word1":"begin",
  "request_time":"123.456",
  "word2":"end"
}
{
  "year":"2019",
  "month":"June",
  "day":"24",
  "motto":"\"I am iron man\""
}
{
  "client":"10.0.0.0",
  "method":"GET",
  "request":"/index.html",
  "bytes":"15824",
  "duration":"0.043"
}
```
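The first-match behavior described in Example 2 can be sketched in Python as follows. This is an illustration only, not the implementation of processor_grok. The regular expressions in EXPRESSIONS are simplified stand-ins for the three Grok expressions in Match, and the function name first_match is hypothetical. Returning None represents a log that is discarded because IgnoreParseFailure is set to false.

```python
import re

# Simplified stand-ins for the expressions in Match, in the same order.
# Assumption: the real Grok patterns are stricter than these.
EXPRESSIONS = [
    # %{HTTP}
    re.compile(r"(?P<client>\d+\.\d+\.\d+\.\d+) (?P<method>\w+) (?P<request>\S+) "
               r"(?P<bytes>\d+) (?P<duration>\d+(?:\.\d+)?)"),
    # %{WORD:word1} %{NUMBER:request_time} %{WORD:word2}
    re.compile(r"(?P<word1>\w+) (?P<request_time>\d+(?:\.\d+)?) (?P<word2>\w+)"),
    # %{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}
    re.compile(r'(?P<year>\d{4}) (?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}) '
               r'(?P<motto>"[^"]*")'),
]


def first_match(content):
    """Try each expression in order and return the fields of the first match."""
    for regex in EXPRESSIONS:
        found = regex.search(content)
        if found:
            return found.groupdict()
    return None  # no expression matched: the log is discarded in this sketch


logs = [
    "begin 123.456 end",
    '2019 June 24 "I am iron man"',
    "WRONG LOG",
    "10.0.0.0 GET /index.html 15824 0.043",
]
results = [first_match(log) for log in logs]
for log, fields in zip(logs, results):
    print(log, "->", fields)
```

Because the expressions are tried in order, placing the expression that matches most logs first reduces the number of attempts per log, which is in line with the recommendation to keep the Match array short.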