If you use Logtail to collect logs, you can add Logtail plug-ins to extract content from log fields in regex, anchor, CSV, single-character delimiter, multi-character delimiter, key-value pair, and Grok modes. This topic describes the parameters of Logtail plug-ins and provides examples on how to configure the plug-ins.
Limits
Form configuration is supported only when you use Logtail to collect text logs or container stdout and stderr. Other input plug-ins support only editor configuration in JSON.
Entry point
If you want to use a Logtail plug-in to process logs, you can add a Logtail plug-in configuration when you create or modify a Logtail configuration. For more information, see Overview of Logtail plug-ins for data processing.
Regex mode
You can extract content from log fields by using a regular expression.
Form configuration
Parameters
Set Processor Type to Extract Field (Regex Mode). Then, configure other parameters based on the following table.
Parameter
Description
Original Field
The name of the original field.
Regular Expression
The regular expression. You must enclose the content that you want to extract in parentheses ().
New Field
The field name that you want to specify for the extracted content. You can specify multiple field names.
Report Original Field Missing Error
Specifies whether to report an error if the raw log does not contain the original field.
Report Regex Mismatch Error
Specifies whether to report an error if the value of the original field does not match the regular expression.
Retain Original Field
Specifies whether to retain the original field in the new log that is obtained after parsing.
Retain Original Field If Parsing Fails
Specifies whether to retain the original field in the new log that is obtained after the raw log fails to be parsed.
Full Regex Match
Specifies whether to extract the value of the original field in full match mode. If you select this option, the value of the original field is extracted only if all fields that are specified in New Field match the value of the original field based on the regular expression that is specified in Regular Expression.
Configuration example
Extract the value of the content field in regex mode and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.
Raw log
"content" : "10.200.**.** - - [10/Aug/2022:14:57:51 +0800] \"POST /PutData? Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature> HTTP/1.1\" 0.024 18204 200 37 \"-\" \"aliyun-sdk-java"
Logtail plug-in configuration for data processing
Result
"ip" : "10.200.**.**" "time" : "10/Aug/2022:14:57:51" "method" : "POST" "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>" "request_time" : "0.024" "request_length" : "18204" "status" : "200" "length" : "27" "ref_url" : "-" "browser" : "aliyun-sdk-java"
Editor configuration in JSON
Parameters
Set type to processor_regex. Then, configure the other parameters based on the following table.
Parameter
Type
Required
Description
SourceKey
String
Yes
The name of the original field.
Regex
String
Yes
The regular expression. You must enclose the content that you want to extract in parentheses ().
Keys
String array
Yes
The field names that you want to specify for the extracted content. Example: ["ip", "time", "method"].
NoKeyError
Boolean
No
Specifies whether to report an error if the raw log does not contain the original field. Valid values:
true
false (default)
NoMatchError
Boolean
No
Specifies whether to report an error if the value of the original field does not match the regular expression. Valid values:
true (default)
false
KeepSource
Boolean
No
Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:
true
false (default)
FullMatch
Boolean
No
Specifies whether to extract the value of the original field in full match mode. Valid values:
true (default): The value of the original field is extracted only if all fields that are specified in Keys match the value of the original field based on the regular expression that is specified in Regex.
false: The value of the original field is extracted even if only some fields that are specified in Keys match the value of the original field based on the regular expression that is specified in Regex.
KeepSourceIfParseError
Boolean
No
Specifies whether to retain the original field in the new log that is obtained after the raw log fails to be parsed. Valid values:
true (default)
false
Configuration example
Extract the value of the content field in regex mode and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.
Raw log
"content" : "10.200.**.** - - [10/Aug/2022:14:57:51 +0800] \"POST /PutData? Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature> HTTP/1.1\" 0.024 18204 200 37 \"-\" \"aliyun-sdk-java"
Logtail plug-in configuration for data processing
{ "type" : "processor_regex", "detail" : {"SourceKey" : "content", "Regex" : "([\\d\\.]+) \\S+ \\S+ \\[(\\S+) \\S+\\] \"(\\w+) ([^\\\"]*)\" ([\\d\\.]+) (\\d+) (\\d+) (\\d+|-) \"([^\\\"]*)\" \"([^\\\"]*)\" (\\d+)", "Keys" : ["ip", "time", "method", "url", "request_time", "request_length", "status", "length", "ref_url", "browser"], "NoKeyError" : true, "NoMatchError" : true, "KeepSource" : false } }
Result
"ip" : "10.200.**.**" "time" : "10/Aug/2022:14:57:51" "method" : "POST" "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>" "request_time" : "0.024" "request_length" : "18204" "status" : "200" "length" : "27" "ref_url" : "-" "browser" : "aliyun-sdk-java"
Anchor mode
You can extract content from log fields by anchoring start and stop keywords. If you want to extract content from a JSON-formatted field, you can expand the field.
Form configuration
Parameters
Set Processor Type to Extract Field (Anchor Mode). Then, configure other parameters based on the following table.
Parameter
Description
Original Field
The name of the original field.
Anchor Parameters
The fields that are configured to anchor keywords.
Start Keyword
The keyword that specifies the start of anchoring. If you do not configure this parameter, the start of a string is matched.
End Keyword
The keyword that specifies the end of anchoring. If you do not configure this parameter, the end of a string is matched.
New Field
The field name that you want to specify for the extracted content.
Field Type
The type of the field. Valid values: string and json.
JSON Expansion
Specifies whether to expand the JSON-formatted field.
Character to Concatenate Expanded Keys
The character that is used to concatenate the expanded keys. The default value is an underscore (_).
Maximum Depth of JSON Expansion
The maximum depth of JSON expansion. Default value: 0, which indicates that the maximum depth is unlimited.
Report Original Field Missing Error
Specifies whether to report an error if the raw log does not contain the original field.
Report Keywords Missing Error
Specifies whether to report an error if no keywords are matched in the raw log.
Retain Original Field
Specifies whether to retain the original field in the new log that is obtained after parsing.
Configuration example
Extract the value of the content field in anchor mode and specify field names time, val_key1, val_key2, val_key3, val_key4_inner1, and val_key4_inner2 for the value.
Raw log
"content" : "time:2022.09.12 20:55:36\t json:{\"key1\" : \"xx\", \"key2\": false, \"key3\":123.456, \"key4\" : { \"inner1\" : 1, \"inner2\" : false}}"
Logtail plug-in configuration for data processing
Result
"time" : "2022.09.12 20:55:36" "val_key1" : "xx" "val_key2" : "false" "val_key3" : "123.456" "value_key4_inner1" : "1" "value_key4_inner2" : "false"
Editor configuration in JSON
Parameters
Set type to processor_anchor. Then, configure the other parameters based on the following table.
Parameter
Type
Required
Description
SourceKey
String
Yes
The name of the original field.
Anchors
Anchor array
Yes
The fields that are configured to anchor keywords.
Start
String
Yes
The keyword that specifies the start of anchoring. If you do not configure this parameter, the start of a string is matched.
Stop
String
Yes
The keyword that specifies the end of anchoring. If you do not configure this parameter, the end of a string is matched.
FieldName
String
Yes
The field name that you want to specify for the extracted content.
FieldType
String
Yes
The type of the field. Valid values: string and json.
ExpondJson
Boolean
No
Specifies whether to expand the JSON-formatted field. Valid values:
true
false (default)
This parameter takes effect only when you set FieldType to json.
ExpondConnecter
String
No
The character that is used to concatenate the expanded keys. The default value is an underscore (_).
MaxExpondDepth
Int
No
The maximum depth of JSON expansion. Default value: 0, which indicates that the maximum depth is unlimited.
NoAnchorError
Boolean
No
Specifies whether to report an error if no keywords are matched in the raw log. Valid values:
true
false (default)
NoKeyError
Boolean
No
Specifies whether to report an error if the raw log does not contain the original field. Valid values:
true
false (default)
Configuration example
Extract the value of the content field in anchor mode and specify field names time, val_key1, val_key2, val_key3, val_key4_inner1, and val_key4_inner2 for the value.
Raw log
"content" : "time:2022.09.12 20:55:36\t json:{\"key1\" : \"xx\", \"key2\": false, \"key3\":123.456, \"key4\" : { \"inner1\" : 1, \"inner2\" : false}}"
Logtail plug-in configuration for data processing
{ "type" : "processor_anchor", "detail" : {"SourceKey" : "content", "Anchors" : [ { "Start" : "time", "Stop" : "\t", "FieldName" : "time", "FieldType" : "string", "ExpondJson" : false }, { "Start" : "json:", "Stop" : "", "FieldName" : "val", "FieldType" : "json", "ExpondJson" : true } ] } }
Result
"time" : "2022.09.12 20:55:36" "val_key1" : "xx" "val_key2" : "false" "val_key3" : "123.456" "value_key4_inner1" : "1" "value_key4_inner2" : "false"
CSV mode
You can parse CSV-formatted logs and extract content from the fields in the logs in CSV mode.
Form configuration
Parameters
Set Processor Type to Extract Field (CSV Mode). Then, configure other parameters based on the following table.
Parameter
Description
Original Field
The name of the original field.
New Field
The field name that you want to specify for the extracted content. You can specify multiple field names.
Important: If the system does not find a match for a field that is specified in New Field from the value of the original field, the field is skipped.
Delimiter
The delimiter. The default value is a comma (,).
Retain Excess Part
Specifies whether to retain the content that remains in the value of the original field after the system finds a match for each field that is specified in New Field from the value. For ease of understanding, the content is referred to as the excess part in this topic.
Parse Excess Part
Specifies whether to parse the excess part. If you select this option, you can configure Name Prefix of Field to which Excess Part is Assigned to specify a prefix for the names of the fields to which the excess part is assigned.
If you select Retain Excess Part but do not select Parse Excess Part, the excess part is stored in the _decode_preserve_ field.
Note: If the excess part contains invalid data, you must standardize the data in the CSV format and then store the data.
Name Prefix of Field to which Excess Part is Assigned
The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
Ignore Spaces before Field
Specifies whether to skip the spaces at the beginning of the value of the original field.
Retain Original Field
Specifies whether to retain the original field in the new log that is obtained after parsing.
Report Original Field Missing Error
Specifies whether to report an error if the raw log does not contain the original field.
Configuration example
Extract the value of the csv field.
Raw log
{ "csv": "2022-06-09,192.0.2.0,\"{\"\"key1\"\":\"\"value\"\",\"\"key2\"\":{\"\"key3\"\":\"\"string\"\"}}\"", ...... }
Logtail plug-in configuration for data processing
Result
{ "date": "2022-06-09", "ip": "192.0.2.0", "content": "{\"key1\":\"value\",\"key2\":{\"key3\":\"string\"}}" ...... }
Editor configuration in JSON
Parameters
Set type to processor_csv. Then, configure the other parameters based on the following table.
Parameter
Type
Required
Description
SourceKey
String
Yes
The name of the original field.
SplitKeys
String array
Yes
The field names that you want to specify for the extracted content. Example: ["date", "ip", "content"].
Important: If the system does not find a match for a field that is specified in SplitKeys from the value of the original field, the field is skipped.
PreserveOthers
Boolean
No
Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values:
true
false (default)
ExpandOthers
Boolean
No
Specifies whether to parse the excess part. Valid values:
true: The excess part is parsed. You can configure ExpandKeyPrefix to specify a prefix for the names of the fields to which the excess part is assigned.
false (default)
If you set PreserveOthers to true and ExpandOthers to false, the excess part is stored in the _decode_preserve_ field.
Note: If the excess part contains invalid data, you must standardize the data in the CSV format and then store the data.
ExpandKeyPrefix
String
No
The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
TrimLeadingSpace
Boolean
No
Specifies whether to skip the spaces at the beginning of the value of the original field. Valid values:
true
false (default)
SplitSep
String
No
The delimiter. The default value is a comma (,).
KeepSource
Boolean
No
Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:
true
false (default)
NoKeyError
Boolean
No
Specifies whether to report an error if the raw log does not contain the original field. Valid values:
true
false (default)
Configuration example
Extract the value of the csv field.
Raw log
{ "csv": "2022-06-09,192.0.2.0,\"{\"\"key1\"\":\"\"value\"\",\"\"key2\"\":{\"\"key3\"\":\"\"string\"\"}}\"", ...... }
Logtail plug-in configuration for data processing
{ ...... "type":"processor_csv", "detail":{ "SourceKey":"csv", "SplitKeys":["date", "ip", "content"], } ...... }
Result
{ "date": "2022-06-09", "ip": "192.0.2.0", "content": "{\"key1\":\"value\",\"key2\":{\"key3\":\"string\"}}" ...... }
Single-character delimiter mode
You can extract content from log fields by using a single-character delimiter. You can use a quote to enclose the delimiter.
Form configuration
Parameters
Set Processor Type to Extract Field (Single-character Delimiter Mode). Then, configure other parameters based on the following table.
Parameter
Description
Original Field
The name of the original field.
Delimiter
The delimiter. The delimiter must be a single character. You can specify a non-printable character as the single-character delimiter. Example: \u0001.
New Field
The field name that you want to specify for the extracted content.
Use Quote
Specifies whether to use a quote to enclose the specified delimiter.
Quote
The quote. The quote must be a single character. You can specify a non-printable character as the quote. Example: \u0001.
Report Original Field Missing Error
Specifies whether to report an error if the raw log does not contain the original field.
Report Delimiter Mismatch Error
Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter.
Retain Original Field
Specifies whether to retain the original field in the new log that is obtained after parsing.
Configuration example
Extract the value of the content field by using a vertical bar (|) as the delimiter and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.
Raw log
"content" : "10.**.**.**|10/Aug/2022:14:57:51 +0800|POST|PutData? Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|0.024|18204|200|37|-| aliyun-sdk-java"
Logtail plug-in configuration for data processing
Result
"ip" : "10.**.**.**" "time" : "10/Aug/2022:14:57:51 +0800" "method" : "POST" "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>" "request_time" : "0.024" "request_length" : "18204" "status" : "200" "length" : "27" "ref_url" : "-" "browser" : "aliyun-sdk-java"
Editor configuration in JSON
Parameters
Set type to processor_split_char. Then, configure the other parameters based on the following table.
Parameter
Type
Required
Description
SourceKey
String
Yes
The name of the original field.
SplitSep
String
Yes
The delimiter. The delimiter must be a single character. You can specify a non-printable character as the single-character delimiter. Example: \u0001.
SplitKeys
String array
Yes
The field names that you want to specify for the extracted content. Example: ["ip", "time", "method"].
PreserveOthers
Boolean
No
Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values:
true
false (default)
QuoteFlag
Boolean
No
Specifies whether to use a quote to enclose the specified delimiter. Valid values:
true
false (default)
Quote
String
No
The quote. The quote must be a single character. You can specify a non-printable character as the quote. Example: \u0001.
This parameter takes effect only when QuoteFlag is set to true.
NoKeyError
Boolean
No
Specifies whether to report an error if the raw log does not contain the original field. Valid values:
true
false (default)
NoMatchError
Boolean
No
Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter. Valid values:
true
false (default)
KeepSource
Boolean
No
Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:
true
false (default)
Configuration example
Extract the value of the content field by using a vertical bar (|) as the delimiter and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.
Raw log
"content" : "10.**.**.**|10/Aug/2022:14:57:51 +0800|POST|PutData? Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|0.024|18204|200|37|-| aliyun-sdk-java"
Logtail plug-in configuration for data processing
{ "type" : "processor_split_char", "detail" : {"SourceKey" : "content", "SplitSep" : "|", "SplitKeys" : ["ip", "time", "method", "url", "request_time", "request_length", "status", "length", "ref_url", "browser"] } }
Result
"ip" : "10.**.**.**" "time" : "10/Aug/2022:14:57:51 +0800" "method" : "POST" "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>" "request_time" : "0.024" "request_length" : "18204" "status" : "200" "length" : "27" "ref_url" : "-" "browser" : "aliyun-sdk-java"
Multi-character delimiter mode
You can extract content from log fields by using a multi-character delimiter. You cannot use a quote to enclose the delimiter.
Form configuration
Parameters
Set Processor Type to Extract Field (Multi-character Delimiter Mode). Then, configure other parameters based on the following table.
Parameter
Description
Original Field
The name of the original field.
Delimiter String
The delimiter. You can specify a non-printable character as the multi-character delimiter. Example: \u0001\u0002.
New Field
The field name that you want to specify for the extracted content.
Important: If the system does not find a match for a field that is specified in New Field from the value of the original field, the field is skipped.
Report Original Field Missing Error
Specifies whether to report an error if the raw log does not contain the original field.
Report Delimiter Mismatch Error
Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter.
Retain Original Field
Specifies whether to retain the original field in the new log that is obtained after parsing.
Retain Excess Part
Specifies whether to retain the excess part after the system finds a match for each field that is specified in New Field from the value.
Parse Excess Part
Specifies whether to parse the excess part. If you select this option, you can configure Name Prefix of Field to which Excess Part is Assigned to specify a prefix for the names of fields to which the excess part is assigned.
Name Prefix of Field to which Excess Part is Assigned
The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
Configuration example
Extract the value of the content field by using the delimiter |#| and specify field names ip, time, method, url, request_time, request_length, status, expand_1, expand_2, and expand_3 for the value.
Raw log
"content" : "10.**.**.**|#|10/Aug/2022:14:57:51 +0800|#|POST|#|PutData? Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|#|0.024|#|18204|#|200|#|27|#|-|#| aliyun-sdk-java"
Logtail plug-in configuration for data processing
Result
"ip" : "10.**.**.**" "time" : "10/Aug/2022:14:57:51 +0800" "method" : "POST" "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>" "request_time" : "0.024" "request_length" : "18204" "status" : "200" "expand_1" : "27" "expand_2" : "-" "expand_3" : "aliyun-sdk-java"
Editor configuration in JSON
Parameters
Set type to processor_split_string. Then, configure the other parameters based on the following table.
Parameter
Type
Required
Description
SourceKey
String
Yes
The name of the original field.
SplitSep
String
Yes
The delimiter. You can specify a non-printable character as the multi-character delimiter. Example: \u0001\u0002.
SplitKeys
String array
Yes
The field names that you want to specify for the extracted content. Example: ["key1","key2"].
Note: If the system does not find a match for a field that is specified in SplitKeys from the value of the original field, the field is skipped.
PreserveOthers
Boolean
No
Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values:
true
false (default)
ExpandOthers
Boolean
No
Specifies whether to parse the excess part. Valid values:
true: The excess part is parsed. You can configure ExpandKeyPrefix to specify a prefix for the names of the fields to which the excess part is assigned.
false (default)
ExpandKeyPrefix
String
No
The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
NoKeyError
Boolean
No
Specifies whether to report an error if the raw log does not contain the original field. Valid values:
true
false (default)
NoMatchError
Boolean
No
Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter. Valid values:
true
false (default)
KeepSource
Boolean
No
Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:
true
false (default)
Configuration example
Extract the value of the content field by using the delimiter |#| and specify field names ip, time, method, url, request_time, request_length, status, expand_1, expand_2, and expand_3 for the value.
Raw log
"content" : "10.**.**.**|#|10/Aug/2022:14:57:51 +0800|#|POST|#|PutData? Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|#|0.024|#|18204|#|200|#|27|#|-|#| aliyun-sdk-java"
Logtail plug-in configuration for data processing
{ "type" : "processor_split_string", "detail" : {"SourceKey" : "content", "SplitSep" : "|#|", "SplitKeys" : ["ip", "time", "method", "url", "request_time", "request_length", "status"], "PreserveOthers" : true, "ExpandOthers" : true, "ExpandKeyPrefix" : "expand_" } }
Result
"ip" : "10.**.**.**" "time" : "10/Aug/2022:14:57:51 +0800" "method" : "POST" "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>" "request_time" : "0.024" "request_length" : "18204" "status" : "200" "expand_1" : "27" "expand_2" : "-" "expand_3" : "aliyun-sdk-java"
Key-value pair mode
You can extract content from log fields by splitting key-value pairs.
Logtail V0.16.26 and later support the processor_split_key_value plug-in.
Form configuration
Parameters
Set Processor Type to Extract Field (Key-value Pair Mode). Then, configure other parameters based on the following table.
Parameter
Description
Original Field
The name of the original field.
Key-value Pair Delimiter
The delimiter that is used to separate key-value pairs. The default value is a tab character (\t).
Key and Value Delimiter
The delimiter that is used to separate the key and the value in a single key-value pair. The default value is a colon (:).
Retain Original Field
Specifies whether to retain the original field.
Report Original Field Missing Error
Specifies whether to report an error if the raw log does not contain the original field.
Drop Key-value Pairs That Fail to Match Delimiter
Specifies whether to discard a key-value pair if the delimiter in the raw log does not match the specified delimiter.
Report Key and Value Delimiter Missing Error
Specifies whether to report an error if the raw log does not contain the specified delimiter.
Report Empty Key Error
Specifies whether to report an error if the key is empty after delimiting.
Quote
The quote. If a key value is enclosed in the specified quote, the key value in the quote is extracted. You can specify multiple characters as a quote.
Important: If a key value that is enclosed in the specified quote contains a backslash (\) that is adjacent to the quote, the backslash (\) is extracted as part of the key value.
Configuration example
Example 1: Extract the value of a specified field in key-value pair mode.
Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a tab character (\t). The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:).
Raw log
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
Logtail plug-in configuration for data processing
Result
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\"" "class": "main" "userid": "123456" "method": "get" "message": "\"wrong user\""
Example 2: Extract the value of a specified field in key-value pair mode when a quote is used.
Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is a double quotation mark (").
Raw log
"content": "class:main http_user_agent:\"User Agent\" \"Chinese\" \"hello\\t\\\"ilogtail\\\"\\tworld\""
Logtail plug-in configuration for data processing
Result
"class": "main", "http_user_agent": "User Agent", "no_separator_key_0": "Chinese", "no_separator_key_1": "hello\t\"ilogtail\"\tworld",
Example 3: Extract the value of a specified field in key-value pair mode when a multi-character quote is used.
Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is three double quotation marks (""").
Raw log
"content": "class:main http_user_agent:\"\"\"User Agent\"\"\" \"\"\"Chinese\"\"\""
Logtail plug-in configuration for data processing
Result
"class": "main", "http_user_agent": "User Agent", "no_separator_key_0": "Chinese",
Editor configuration in JSON
Parameters
Set type to processor_split_key_value. Then, configure the other parameters based on the following table.
Parameter
Type
Required
Description
SourceKey
string
Yes
The name of the original field.
Delimiter
string
No
The delimiter that is used to separate key-value pairs. The default value is a tab character (\t).
Separator
string
No
The delimiter that is used to separate the key and the value in a single key-value pair. The default value is a colon (:).
KeepSource
Boolean
No
Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:
true
false (default)
ErrIfSourceKeyNotFound
Boolean
No
Specifies whether to report an error if the raw log does not contain the original field. Valid values:
true (default)
false
DiscardWhenSeparatorNotFound
Boolean
No
Specifies whether to discard a key-value pair if the delimiter in the raw log does not match the specified delimiter. Valid values:
true
false (default)
ErrIfSeparatorNotFound
Boolean
No
Specifies whether to report an error if the raw log does not contain the specified delimiter. Valid values:
true (default)
false
ErrIfKeyIsEmpty
Boolean
No
Specifies whether to report an error if the key is empty after delimiting. Valid values:
true (default)
false
Quote
String
No
The quote. If a key value is enclosed in the specified quote, the key value in the quote is extracted. You can specify multiple characters as a quote. By default, the quote feature is disabled.
Important: If you specify a quote that consists of double quotation marks ("), you must escape each double quotation mark with a backslash (\) in the JSON configuration. Example: "Quote": "\"".
If a key value that is enclosed in the specified quote contains a backslash (\) that is adjacent to the quote, the backslash (\) is extracted as part of the key value.
Configuration example
Example 1: Extract the value of a specified field in key-value pair mode.
Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a tab character (\t). The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:).
Raw log
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
Logtail plug-in configuration for data processing
{ "processors":[ { "type":"processor_split_key_value", "detail": { "SourceKey": "content", "Delimiter": "\t", "Separator": ":", "KeepSource": true } } ] }
Result
"content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\"" "class": "main" "userid": "123456" "method": "get" "message": "\"wrong user\""
Example 2: Extract the value of a specified field in key-value pair mode when a quote is used.
Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is a double quotation mark (").
Raw log
"content": "class:main http_user_agent:\"User Agent\" \"Chinese\" \"hello\\t\\\"ilogtail\\\"\\tworld\""
Logtail plug-in configuration for data processing
{
  "processors": [
    {
      "type": "processor_split_key_value",
      "detail": {
        "SourceKey": "content",
        "Delimiter": " ",
        "Separator": ":",
        "Quote": "\""
      }
    }
  ]
}
Result
"class": "main"
"http_user_agent": "User Agent"
"no_separator_key_0": "Chinese"
"no_separator_key_1": "hello\t\"ilogtail\"\tworld"
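The quote behavior in this example, including the no_separator_key_N naming for quoted tokens that contain no separator, can be approximated as follows. This is a simplified sketch with a space as the delimiter and only basic escape handling; it is not the plug-in's actual implementation:

```python
import re

# A token is a run of unquoted characters and quoted spans, so a quoted
# value can contain the delimiter (a space here) and escaped quotes.
TOKEN_RE = re.compile(r'(?:[^"\s]+|"(?:\\.|[^"\\])*")+')

def split_with_quotes(value, separator=":", quote='"'):
    fields, n = {}, 0
    for token in TOKEN_RE.findall(value):
        key, sep, val = token.partition(separator)
        if not sep:  # quoted token without a separator
            key, val = f"no_separator_key_{n}", key
            n += 1
        if len(val) >= 2 and val.startswith(quote) and val.endswith(quote):
            # strip the enclosing quotes and unescape inner quotes
            val = val[1:-1].replace("\\" + quote, quote)
        fields[key] = val
    return fields

log = 'class:main http_user_agent:"User Agent" "Chinese"'
print(split_with_quotes(log))
# {'class': 'main', 'http_user_agent': 'User Agent', 'no_separator_key_0': 'Chinese'}
```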
Example 3: Extract the value of a specified field in key-value pair mode.
Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is a sequence of three double quotation marks (""").
Raw log
"content": "class:main http_user_agent:\"\"\"User Agent\"\"\" \"\"\"Chinese\"\"\""
Logtail plug-in configuration for data processing
{
  "processors": [
    {
      "type": "processor_split_key_value",
      "detail": {
        "SourceKey": "content",
        "Delimiter": " ",
        "Separator": ":",
        "Quote": "\"\"\""
      }
    }
  ]
}
Result
"class": "main"
"http_user_agent": "User Agent"
"no_separator_key_0": "Chinese"
Grok mode
You can extract content from log fields by using Grok expressions.
Logtail V1.2.0 and later support the processor_grok plug-in.
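Under the hood, a Grok expression is shorthand for a regular expression with named capture groups. The expansion can be sketched as follows. The pattern definitions below are simplified stand-ins, not the plug-in's built-in pattern library:

```python
import re

# Simplified stand-ins for a few built-in Grok patterns.
GROK_PATTERNS = {
    "YEAR": r"\d{4}",
    "MONTH": r"[A-Z][a-z]+",
    "MONTHDAY": r"\d{1,2}",
}

def grok_to_regex(expr):
    """Replace each %{NAME:field} with a named regex group for NAME."""
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: f"(?P<{m.group(2)}>{GROK_PATTERNS[m.group(1)]})",
        expr,
    )

regex = grok_to_regex("%{YEAR:year} %{MONTH:month} %{MONTHDAY:day}")
match = re.fullmatch(regex, "2022 October 17")
print(match.groupdict())  # {'year': '2022', 'month': 'October', 'day': '17'}
```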
Form configuration
Parameters
Set Processor Type to Extract Field (Grok Mode). Then, configure other parameters based on the following table.
Parameter
Description
Original Field
The name of the original field.
Grok Expression Array
The array of Grok expressions. The processor_grok plug-in matches a log field based on the specified expressions in sequence and returns the content that is extracted based on the first match.
For more information about the default expressions that are supported by processor_grok, see processor_grok. If the expressions that are provided on the linked page do not meet your business requirements, you can specify a custom Grok expression in Custom Grok Pattern.
Note: If you specify multiple Grok expressions, the processing performance may be affected. We recommend that you specify no more than five expressions.
Custom Grok Pattern
The custom Grok pattern, which consists of the rule name and Grok expression.
Custom Grok Pattern File Directory
The directory where the custom Grok pattern file is stored. The processor_grok plug-in reads all files in the directory.
Important: If you update the custom Grok pattern file, the update can take effect only after you restart Logtail.
Maximum Timeout
The timeout period to extract content from the original field by using a Grok expression. Unit: milliseconds. If you do not include this parameter in the configuration or set this parameter to 0, the extraction never times out.
Retain Logs that Fail to be Parsed
Specifies whether to retain the raw log if the raw log fails to be parsed.
Retain Original Field
Specifies whether to retain the original field in the new log that is obtained after parsing.
Report Original Field Missing Error
Specifies whether to report an error if the raw log does not contain the original field.
Report No Expressions Matched Error
Specifies whether to report an error if the value of the original field does not match any expression that is specified in Grok Expression Array.
Report Match Timeout Error
Specifies whether to report an error if the match times out.
Configuration example
Extract the value of the content field in Grok mode and specify field names year, month, and day for the value.
Raw log
"content" : "2022 October 17"
Logtail plug-in configuration for data processing
Result
"year": "2022"
"month": "October"
"day": "17"
Editor configuration in JSON
Parameters
Set type to processor_grok. Then, configure other parameters in detail based on the following table.
Parameter
Type
Required
Description
CustomPatternDir
String array
No
The directory where the custom Grok pattern file is stored. The processor_grok plug-in reads all files in the directory.
If you do not include this parameter in the configuration, the system does not import custom Grok pattern files.
Important: If you update the custom Grok pattern file, the update can take effect only after you restart Logtail.
CustomPatterns
Map
No
The custom Grok pattern. key specifies the rule name and value specifies the Grok expression.
For more information about the default expressions that are supported by processor_grok, see processor_grok. If the expressions that are provided on the linked page do not meet your business requirements, you can specify a custom Grok expression in Match.
If you do not include this parameter in the configuration, the system does not use custom Grok patterns.
SourceKey
String
No
The name of the original field. The default value is content.
Match
String array
Yes
The array of Grok expressions. The processor_grok plug-in matches a log field based on the specified expressions in sequence and returns the content that is extracted based on the first match.
Note: If you specify multiple Grok expressions, the processing performance may be affected. We recommend that you specify no more than five expressions.
TimeoutMilliSeconds
Long
No
The timeout period to extract content from the original field by using a Grok expression. Unit: milliseconds.
If you do not include this parameter in the configuration or set this parameter to 0, the extraction never times out.
IgnoreParseFailure
Boolean
No
Specifies whether to ignore the raw log if the raw log fails to be parsed. Valid values:
true (default): ignores the raw log if the raw log fails to be parsed.
false: deletes the raw log if the raw log fails to be parsed.
KeepSource
Boolean
No
Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:
true (default): retains the original field.
false: discards the original field.
NoKeyError
Boolean
No
Specifies whether to report an error if the raw log does not contain the original field. Valid values:
true
false (default)
NoMatchError
Boolean
No
Specifies whether to report an error if the value of the original field does not match any expression that is specified in Match. Valid values:
true (default)
false
TimeoutError
Boolean
No
Specifies whether to report an error if the match times out. Valid values:
true (default)
false
Example 1
Extract the value of the content field in Grok mode and specify field names year, month, and day for the value.
Raw log
"content" : "2022 October 17"
Logtail plug-in configuration for data processing
{
  "type": "processor_grok",
  "detail": {
    "KeepSource": false,
    "Match": [
      "%{YEAR:year} %{MONTH:month} %{MONTHDAY:day}"
    ],
    "IgnoreParseFailure": false
  }
}
Result
"year": "2022"
"month": "October"
"day": "17"
Example 2
Extract the value of the content field from multiple logs in Grok mode and parse the extracted values into different results based on different Grok expressions.
Raw log
{ "content" : "begin 123.456 end" }
{ "content" : "2019 June 24 \"I am iron man\"" }
{ "content" : "WRONG LOG" }
{ "content" : "10.0.0.0 GET /index.html 15824 0.043" }
Logtail plug-in configuration for data processing
{
  "type": "processor_grok",
  "detail": {
    "CustomPatterns": {
      "HTTP": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
    },
    "IgnoreParseFailure": false,
    "KeepSource": false,
    "Match": [
      "%{HTTP}",
      "%{WORD:word1} %{NUMBER:request_time} %{WORD:word2}",
      "%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}"
    ],
    "SourceKey": "content"
  }
}
Result
In this example, the processor_grok plug-in processes the logs as follows:
The first log is matched against the first expression %{HTTP} in Match, and the match fails. The log is then matched against the second expression %{WORD:word1} %{NUMBER:request_time} %{WORD:word2}, and the match is successful. The content that is extracted based on the second expression is returned. The content field in the raw log is discarded because KeepSource is set to false.
The second log is matched against the first expression %{HTTP} and the second expression %{WORD:word1} %{NUMBER:request_time} %{WORD:word2} in sequence, and both matches fail. The log is then matched against the third expression %{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}, and the match is successful. The content that is extracted based on the third expression is returned.
The third log is matched against all three expressions in sequence, and all matches fail. The log is discarded because IgnoreParseFailure is set to false.
The fourth log is matched against the first expression %{HTTP}, and the match is successful. The content that is extracted based on the first expression is returned.
{ "word1": "begin", "request_time": "123.456", "word2": "end" }
{ "year": "2019", "month": "June", "day": "24", "motto": "\"I am iron man\"" }
{ "client": "10.0.0.0", "method": "GET", "request": "/index.html", "bytes": "15824", "duration": "0.043" }
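The first-match-wins behavior in this example can be sketched in Python, with hand-written regex equivalents of the Grok expressions (simplified approximations, not the plug-in's actual pattern definitions):

```python
import re

# Regex equivalents of the three expressions in Match, tried in order.
MATCH = [
    # %{HTTP} -> client, method, request, bytes, duration
    r"(?P<client>\d+\.\d+\.\d+\.\d+) (?P<method>\w+) (?P<request>\S+) (?P<bytes>\d+) (?P<duration>[\d.]+)",
    # %{WORD:word1} %{NUMBER:request_time} %{WORD:word2}
    r"(?P<word1>\w+) (?P<request_time>[\d.]+) (?P<word2>\w+)",
    # %{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}
    r'(?P<year>\d{4}) (?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}) (?P<motto>"[^"]*")',
]

def process(content):
    """Try each pattern in order; the first match determines the output."""
    for pattern in MATCH:
        m = re.fullmatch(pattern, content)
        if m:
            return m.groupdict()
    return None  # no match; with IgnoreParseFailure=false the log is discarded

print(process("begin 123.456 end"))
# {'word1': 'begin', 'request_time': '123.456', 'word2': 'end'}
print(process("WRONG LOG"))  # None -> discarded
print(process("10.0.0.0 GET /index.html 15824 0.043"))
```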