Simple Log Service: Extract content from log fields

Last Updated: Aug 28, 2024

If you use Logtail to collect logs, you can add Logtail plug-ins to extract content from log fields in regex, anchor, CSV, single-character delimiter, multi-character delimiter, key-value pair, and Grok modes. This topic describes the parameters of Logtail plug-ins and provides examples on how to configure the plug-ins.

Limits

The input plug-ins for text logs and container stdout and stderr support only form configuration. Other input plug-ins support only editor configuration in JSON.

Entry point

If you want to use a Logtail plug-in to process logs, you can add a Logtail plug-in configuration when you create or modify a Logtail configuration. For more information, see Overview of Logtail plug-ins for data processing.

Regex mode

You can extract content from log fields by using a regular expression.

Form configuration

  • Parameters

    Set Processor Type to Extract Field (Regex Mode). Then, configure other parameters based on the following table.

    Parameter | Description
    Original Field | The name of the original field.
    Regular Expression | The regular expression. You must enclose the field from which you want to extract content in parentheses ().
    New Field | The field name that you want to specify for the extracted content. You can specify multiple field names.
    Report Original Field Missing Error | Specifies whether to report an error if the raw log does not contain the original field.
    Report Regex Mismatch Error | Specifies whether to report an error if the value of the original field does not match the regular expression.
    Retain Original Field | Specifies whether to retain the original field in the new log that is obtained after parsing.
    Retain Original Field If Parsing Fails | Specifies whether to retain the original field in the new log that is obtained after the raw log fails to be parsed.
    Full Regex Match | Specifies whether to extract the value of the original field in full match mode. If you select this option, the value of the original field is extracted only if all fields that are specified in New Field match the value of the original field based on the regular expression that is specified in Regular Expression.

  • Configuration example

    Extract the value of the content field in regex mode and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.

    • Raw log

      "content" : "10.200.**.** - - [10/Aug/2022:14:57:51 +0800] \"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature> HTTP/1.1\" 0.024 18204 200 37 \"-\" \"aliyun-sdk-java\""
    • Logtail plug-in configuration for data processing: Extract Field (Regex Mode)

    • Result

      "ip" : "10.200.**.**"
      "time" : "10/Aug/2022:14:57:51"
      "method" : "POST"
      "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
      "request_time" : "0.024"
      "request_length" : "18204"
      "status" : "200"
      "length" : "37"
      "ref_url" : "-"
      "browser" : "aliyun-sdk-java"

Editor configuration in JSON

  • Parameters

    Set type to processor_regex. Then, configure other parameters in detail based on the following table.

    Parameter | Type | Required | Description
    SourceKey | String | Yes | The name of the original field.
    Regex | String | Yes | The regular expression. You must enclose the field from which you want to extract content in parentheses ().
    Keys | String array | Yes | The field names that you want to specify for the extracted content. Example: ["ip", "time", "method"].
    NoKeyError | Boolean | No | Specifies whether to report an error if the raw log does not contain the original field. Valid values: true, false (default).
    NoMatchError | Boolean | No | Specifies whether to report an error if the value of the original field does not match the regular expression. Valid values: true (default), false.
    KeepSource | Boolean | No | Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true, false (default).
    FullMatch | Boolean | No | Specifies whether to extract the value of the original field in full match mode. Valid values: true (default): the value of the original field is extracted only if all fields that are specified in Keys match the value of the original field based on the regular expression that is specified in Regex. false: the value of the original field is extracted even if only some fields that are specified in Keys match the value of the original field based on the regular expression that is specified in Regex.
    KeepSourceIfParseError | Boolean | No | Specifies whether to retain the original field in the new log that is obtained after the raw log fails to be parsed. Valid values: true (default), false.

  • Configuration example

    Extract the value of the content field in regex mode and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.

    • Raw log

      "content" : "10.200.**.** - - [10/Aug/2022:14:57:51 +0800] \"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature> HTTP/1.1\" 0.024 18204 200 37 \"-\" \"aliyun-sdk-java\""
    • Logtail plug-in configuration for data processing

      {
          "type" : "processor_regex",
          "detail" : {
              "SourceKey" : "content",
              "Regex" : "([\\d\\.]+) \\S+ \\S+ \\[(\\S+) \\S+\\] \"(\\w+) (\\S+) [^\\\"]*\" ([\\d\\.]+) (\\d+) (\\d+) (\\d+|-) \"([^\\\"]*)\" \"([^\\\"]*)\"",
              "Keys" : ["ip", "time", "method", "url", "request_time", "request_length", "status", "length", "ref_url", "browser"],
              "NoKeyError" : true,
              "NoMatchError" : true,
              "KeepSource" : false
          }
      }
    • Result

      "ip" : "10.200.**.**"
      "time" : "10/Aug/2022:14:57:51"
      "method" : "POST"
      "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
      "request_time" : "0.024"
      "request_length" : "18204"
      "status" : "200"
      "length" : "37"
      "ref_url" : "-"
      "browser" : "aliyun-sdk-java"

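To make the mechanism concrete, the following Python sketch mimics what regex mode does with an access log like the one in the example: a pattern with one capture group per field is applied to the value, and the captured text is paired with the field names. It is an illustration only, not Logtail's implementation. The pattern, the unmasked sample IP address 10.200.0.1, the shortened URL, and the helper name extract_fields are assumptions made for this sketch, and the sketch requires the pattern to cover the whole value.

```python
import re

# Assumed pattern written for this sketch: one capture group per field.
PATTERN = re.compile(
    r'([\d.]+) \S+ \S+ \[(\S+) \S+\] "(\w+) (\S+) [^"]*" '
    r'([\d.]+) (\d+) (\d+) (\d+|-) "([^"]*)" "([^"]*)"'
)
KEYS = ["ip", "time", "method", "url", "request_time",
        "request_length", "status", "length", "ref_url", "browser"]

# Hypothetical sample value with an unmasked IP address and a shortened URL.
SAMPLE = (
    '10.200.0.1 - - [10/Aug/2022:14:57:51 +0800] '
    '"POST /PutData?Category=YunOsAccountOpLog&Topic=raw HTTP/1.1" '
    '0.024 18204 200 37 "-" "aliyun-sdk-java"'
)


def extract_fields(value, pattern=PATTERN, keys=KEYS):
    """Return {field name: captured text}, or None if the value does not match."""
    match = pattern.fullmatch(value)  # whole value must match (sketch assumption)
    if match is None:
        return None
    return dict(zip(keys, match.groups()))


if __name__ == "__main__":
    for name, text in extract_fields(SAMPLE).items():
        print(f'"{name}" : "{text}"')
```

Because the time group captures only the first token inside the square brackets, the time zone offset is left out of the time field, which matches the result shown in the example.
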
Anchor mode

You can extract content from log fields by anchoring start and stop keywords. If you want to extract content from a JSON-formatted field, you can expand the field.

Form configuration

  • Parameters

    Set Processor Type to Extract Field (Anchor Mode). Then, configure other parameters based on the following table.

    Parameter | Description
    Original Field | The name of the original field.
    Anchor Parameters | The fields that are configured to anchor keywords.
    Start Keyword | The keyword that specifies the start of anchoring. If you do not configure this parameter, the start of a string is matched.
    End Keyword | The keyword that specifies the end of anchoring. If you do not configure this parameter, the end of a string is matched.
    New Field | The field name that you want to specify for the extracted content.
    Field Type | The type of the field. Valid values: string and json.
    JSON Expansion | Specifies whether to expand the JSON-formatted field.
    Character to Concatenate Expanded Keys | The character that is used to concatenate the expanded keys. The default value is an underscore (_).
    Maximum Depth of JSON Expansion | The maximum depth of JSON expansion. Default value: 0, which indicates that the maximum depth is unlimited.
    Report Original Field Missing Error | Specifies whether to report an error if the raw log does not contain the original field.
    Report Keywords Missing Error | Specifies whether to report an error if no keywords are matched in the raw log.
    Retain Original Field | Specifies whether to retain the original field in the new log that is obtained after parsing.

  • Configuration example

    Extract the value of the content field in anchor mode and specify field names time, val_key1, val_key2, val_key3, val_key4_inner1, and val_key4_inner2 for the value.

    • Raw log

      "content" : "time:2022.09.12 20:55:36\t json:{\"key1\" : \"xx\", \"key2\": false, \"key3\":123.456, \"key4\" : { \"inner1\" : 1, \"inner2\" : false}}"
    • Logtail plug-in configuration for data processing: Extract Field (Anchor Mode)

    • Result

      "time" : "2022.09.12 20:55:36"
      "val_key1" : "xx"
      "val_key2" : "false"
      "val_key3" : "123.456"
      "val_key4_inner1" : "1"
      "val_key4_inner2" : "false"

Editor configuration in JSON

  • Parameters

    Set type to processor_anchor. Then, configure other parameters in detail based on the following table.

    Parameter | Type | Required | Description
    SourceKey | String | Yes | The name of the original field.
    Anchors | Anchor array | Yes | The fields that are configured to anchor keywords.
    Start | String | Yes | The keyword that specifies the start of anchoring. If you do not configure this parameter, the start of a string is matched.
    Stop | String | Yes | The keyword that specifies the end of anchoring. If you do not configure this parameter, the end of a string is matched.
    FieldName | String | Yes | The field name that you want to specify for the extracted content.
    FieldType | String | Yes | The type of the field. Valid values: string and json.
    ExpondJson | Boolean | No | Specifies whether to expand the JSON-formatted field. Valid values: true, false (default). This parameter takes effect only when you set FieldType to json.
    ExpondConnecter | String | No | The character that is used to concatenate the expanded keys. The default value is an underscore (_).
    MaxExpondDepth | Int | No | The maximum depth of JSON expansion. Default value: 0, which indicates that the maximum depth is unlimited.
    NoAnchorError | Boolean | No | Specifies whether to report an error if no keywords are matched in the raw log. Valid values: true, false (default).
    NoKeyError | Boolean | No | Specifies whether to report an error if the raw log does not contain the original field. Valid values: true, false (default).

  • Configuration example

    Extract the value of the content field in anchor mode and specify field names time, val_key1, val_key2, val_key3, val_key4_inner1, and val_key4_inner2 for the value.

    • Raw log

      "content" : "time:2022.09.12 20:55:36\t json:{\"key1\" : \"xx\", \"key2\": false, \"key3\":123.456, \"key4\" : { \"inner1\" : 1, \"inner2\" : false}}"
    • Logtail plug-in configuration for data processing

      {
          "type" : "processor_anchor",
          "detail" : {
              "SourceKey" : "content",
              "Anchors" : [
                  {
                      "Start" : "time:",
                      "Stop" : "\t",
                      "FieldName" : "time",
                      "FieldType" : "string",
                      "ExpondJson" : false
                  },
                  {
                      "Start" : "json:",
                      "Stop" : "",
                      "FieldName" : "val",
                      "FieldType" : "json",
                      "ExpondJson" : true
                  }
              ]
          }
      }
    • Result

      "time" : "2022.09.12 20:55:36"
      "val_key1" : "xx"
      "val_key2" : "false"
      "val_key3" : "123.456"
      "val_key4_inner1" : "1"
      "val_key4_inner2" : "false"

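The anchor-mode behavior described above can be sketched in Python as follows: the text between a start keyword and a stop keyword is cut out of the value, and a JSON-typed field can be expanded into flat keys joined by a connector character. This is an approximation for illustration, not Logtail's code. The function names extract_anchor and flatten, the interpretation of the depth limit (a limit of 1 expands only the top-level keys), and the rendering of non-string values through json.dumps are assumptions of the sketch.

```python
import json

SAMPLE = ('time:2022.09.12 20:55:36\t json:{"key1" : "xx", "key2": false, '
          '"key3":123.456, "key4" : { "inner1" : 1, "inner2" : false}}')


def flatten(prefix, obj, connector="_", max_depth=0, depth=1):
    """Expand a JSON object into flat keys such as prefix_key_inner."""
    fields = {}
    for key, child in obj.items():
        name = f"{prefix}{connector}{key}"
        # max_depth == 0 means unlimited depth, as in the parameter table.
        if isinstance(child, dict) and (max_depth == 0 or depth < max_depth):
            fields.update(flatten(name, child, connector, max_depth, depth + 1))
        else:
            # Keep strings as they are; render other values in JSON form.
            fields[name] = child if isinstance(child, str) else json.dumps(child)
    return fields


def extract_anchor(value, start, stop, field_name,
                   field_type="string", expand_json=False):
    """Return the fields found between the start and stop keywords."""
    begin = 0  # an empty start keyword matches the start of the string
    if start:
        index = value.find(start)
        if index < 0:
            return {}
        begin = index + len(start)
    end = len(value)  # an empty stop keyword matches the end of the string
    if stop:
        end = value.find(stop, begin)
        if end < 0:
            return {}
    text = value[begin:end]
    if field_type == "json" and expand_json:
        return flatten(field_name, json.loads(text))
    return {field_name: text}


if __name__ == "__main__":
    result = extract_anchor(SAMPLE, "time:", "\t", "time")
    result.update(extract_anchor(SAMPLE, "json:", "", "val", "json", True))
    print(result)
```

In this sketch the start keyword includes the colon ("time:"), so the extracted time value does not begin with a colon.
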
CSV mode

You can parse CSV-formatted logs and extract content from the fields in the logs in CSV mode.

Form configuration

  • Parameters

    Set Processor Type to Extract Field (CSV Mode). Then, configure other parameters based on the following table.

    Parameter | Description
    Original Field | The name of the original field.
    New Field | The field name that you want to specify for the extracted content. You can specify multiple field names. Important: If the system does not find a match for a field that is specified in New Field from the value of the original field, the field is skipped.
    Delimiter | The delimiter. The default value is a comma (,).
    Retain Excess Part | Specifies whether to retain the content that remains in the value of the original field after the system finds a match for each field that is specified in New Field from the value. For ease of understanding, the content is referred to as the excess part in this topic.
    Parse Excess Part | Specifies whether to parse the excess part. If you select the option, you can configure Name Prefix of Field to which Excess Part is Assigned to specify a prefix for the names of fields to which the excess part is assigned. If you select Retain Excess Part but do not select Parse Excess Part, the excess part is stored in the _decode_preserve_ field. Note: If the excess part contains invalid data, you must standardize the data in the CSV format and then store the data.
    Name Prefix of Field to which Excess Part is Assigned | The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
    Ignore Spaces before Field | Specifies whether to skip the spaces at the beginning of the value of the original field.
    Retain Original Field | Specifies whether to retain the original field in the new log that is obtained after parsing.
    Report Original Field Missing Error | Specifies whether to report an error if the raw log does not contain the original field.

  • Configuration example

    Extract the value of the csv field.

    • Raw log

      {
          "csv": "2022-06-09,192.0.2.0,\"{\"\"key1\"\":\"\"value\"\",\"\"key2\"\":{\"\"key3\"\":\"\"string\"\"}}\"",
          ......
      }
    • Logtail plug-in configuration for data processing: Extract Field (CSV Mode)

    • Result

      {
          "date": "2022-06-09",
          "ip": "192.0.2.0",
          "content": "{\"key1\":\"value\",\"key2\":{\"key3\":\"string\"}}"
          ......
      
      }

Editor configuration in JSON

  • Parameters

    Set type to processor_csv. Then, configure other parameters in detail based on the following table.

    Parameter | Type | Required | Description
    SourceKey | String | Yes | The name of the original field.
    SplitKeys | String array | Yes | The field names that you want to specify for the extracted content. Example: ["date", "ip", "content"]. Important: If the system does not find a match for a field that is specified in SplitKeys from the value of the original field, the field is skipped.
    PreserveOthers | Boolean | No | Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values: true, false (default).
    ExpandOthers | Boolean | No | Specifies whether to parse the excess part. Valid values: true: you can set ExpandOthers to true to parse the excess part and configure ExpandKeyPrefix to specify a prefix for the names of fields to which the excess part is assigned. false (default): if you set PreserveOthers to true and ExpandOthers to false, the excess part is stored in the _decode_preserve_ field. Note: If the excess part contains invalid data, you must standardize the data in the CSV format and then store the data.
    ExpandKeyPrefix | String | No | The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
    TrimLeadingSpace | Boolean | No | Specifies whether to skip the spaces at the beginning of the value of the original field. Valid values: true, false (default).
    SplitSep | String | No | The delimiter. The default value is a comma (,).
    KeepSource | Boolean | No | Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true, false (default).
    NoKeyError | Boolean | No | Specifies whether to report an error if the raw log does not contain the original field. Valid values: true, false (default).

  • Configuration example

    Extract the value of the csv field.

    • Raw log

      {
          "csv": "2022-06-09,192.0.2.0,\"{\"\"key1\"\":\"\"value\"\",\"\"key2\"\":{\"\"key3\"\":\"\"string\"\"}}\"",
          ......
      }
    • Logtail plug-in configuration for data processing

      {
          ......
          "type":"processor_csv",
          "detail":{
              "SourceKey":"csv",
              "SplitKeys":["date", "ip", "content"]
          }
          ......
      }
    • Result

      {
          "date": "2022-06-09",
          "ip": "192.0.2.0",
          "content": "{\"key1\":\"value\",\"key2\":{\"key3\":\"string\"}}"
          ......
      
      }
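
As a rough illustration of CSV mode, the Python sketch below parses one CSV-formatted value with the standard csv module, names the columns after the configured field names, and optionally keeps the excess part in the _decode_preserve_ field. It is not Logtail's implementation. The function name extract_csv and its keyword arguments are hypothetical, and storing the excess part re-encoded as CSV is an assumption of the sketch.

```python
import csv
import io

# The value of the csv field from the example, written as a Python string.
SAMPLE = '2022-06-09,192.0.2.0,"{""key1"":""value"",""key2"":{""key3"":""string""}}"'


def extract_csv(value, split_keys, split_sep=",", preserve_others=False):
    """Parse one CSV-formatted value and name its columns after split_keys."""
    row = next(csv.reader([value], delimiter=split_sep))
    # zip() stops at the shorter input, so keys without a column are skipped.
    fields = dict(zip(split_keys, row))
    excess = row[len(split_keys):]
    if preserve_others and excess:
        # Assumption: the excess part is stored re-encoded as CSV.
        buffer = io.StringIO()
        csv.writer(buffer, delimiter=split_sep).writerow(excess)
        fields["_decode_preserve_"] = buffer.getvalue().rstrip("\r\n")
    return fields


if __name__ == "__main__":
    print(extract_csv(SAMPLE, ["date", "ip", "content"]))
    print(extract_csv("a,b,c,d", ["x", "y"], preserve_others=True))
```

The doubled quotation marks inside the quoted third column are the CSV escape for a literal quotation mark, which is why the content field comes out as plain JSON text.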

Single-character delimiter mode

You can extract content from log fields by using a single-character delimiter. You can also specify a quote character to enclose content that contains the delimiter.

Form configuration

  • Parameters

    Set Processor Type to Extract Field (Single-character Delimiter Mode). Then, configure other parameters based on the following table.

    Parameter | Description
    Original Field | The name of the original field.
    Delimiter | The delimiter. The delimiter must be a single character. You can specify a non-printable character as the single-character delimiter. Example: \u0001.
    New Field | The field name that you want to specify for the extracted content.
    Use Quote | Specifies whether to use a quote to enclose the specified delimiter.
    Quote | The quote. The quote must be a single character. You can specify a non-printable character as the quote. Example: \u0001.
    Report Original Field Missing Error | Specifies whether to report an error if the raw log does not contain the original field.
    Report Delimiter Mismatch Error | Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter.
    Retain Original Field | Specifies whether to retain the original field in the new log that is obtained after parsing.

  • Configuration example

    Extract the value of the content field by using a vertical bar (|) as the delimiter and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.

    • Raw log

      "content" : "10.**.**.**|10/Aug/2022:14:57:51 +0800|POST|/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|0.024|18204|200|37|-|aliyun-sdk-java"
    • Logtail plug-in configuration for data processing: Extract Field (Single-character Delimiter Mode)

    • Result

      "ip" : "10.**.**.**"
      "time" : "10/Aug/2022:14:57:51 +0800"
      "method" : "POST"
      "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
      "request_time" : "0.024"
      "request_length" : "18204"
      "status" : "200"
      "length" : "37"
      "ref_url" : "-"
      "browser" : "aliyun-sdk-java"

Editor configuration in JSON

  • Parameters

    Set type to processor_split_char. Then, configure other parameters in detail based on the following table.

    Parameter | Type | Required | Description
    SourceKey | String | Yes | The name of the original field.
    SplitSep | String | Yes | The delimiter. The delimiter must be a single character. You can specify a non-printable character as the single-character delimiter. Example: \u0001.
    SplitKeys | String array | Yes | The field names that you want to specify for the extracted content. Example: ["ip", "time", "method"].
    PreserveOthers | Boolean | No | Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values: true, false (default).
    QuoteFlag | Boolean | No | Specifies whether to use a quote to enclose the specified delimiter. Valid values: true, false (default).
    Quote | String | No | The quote. The quote must be a single character. You can specify a non-printable character as the quote. Example: \u0001. This parameter takes effect only when QuoteFlag is set to true.
    NoKeyError | Boolean | No | Specifies whether to report an error if the raw log does not contain the original field. Valid values: true, false (default).
    NoMatchError | Boolean | No | Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter. Valid values: true, false (default).
    KeepSource | Boolean | No | Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true, false (default).

  • Configuration example

    Extract the value of the content field by using a vertical bar (|) as the delimiter and specify field names ip, time, method, url, request_time, request_length, status, length, ref_url, and browser for the value.

    • Raw log

      "content" : "10.**.**.**|10/Aug/2022:14:57:51 +0800|POST|/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|0.024|18204|200|37|-|aliyun-sdk-java"
    • Logtail plug-in configuration for data processing

      {
          "type" : "processor_split_char",
          "detail" : {
              "SourceKey" : "content",
              "SplitSep" : "|",
              "SplitKeys" : ["ip", "time", "method", "url", "request_time", "request_length", "status", "length", "ref_url", "browser"]
          }
      }
    • Result

      "ip" : "10.**.**.**"
      "time" : "10/Aug/2022:14:57:51 +0800"
      "method" : "POST"
      "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
      "request_time" : "0.024"
      "request_length" : "18204"
      "status" : "200"
      "length" : "37"
      "ref_url" : "-"
      "browser" : "aliyun-sdk-java"

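One way to picture single-character delimiter mode with a quote is the Python sketch below. It assumes that a delimiter between two quote characters is treated as ordinary text and that the quote characters themselves are dropped from the extracted values; that reading, the function name split_char, and the sample value are assumptions made for illustration and are not taken from Logtail.

```python
# Hypothetical sample: the second part contains the delimiter inside quotes.
SAMPLE = '10.0.0.1|"GET|POST"|200'


def split_char(value, split_sep, split_keys, quote=None):
    """Split value on a one-character separator and name the parts.

    When quote is given, a separator between two quote characters is kept
    as ordinary text and the quote characters themselves are dropped.
    """
    parts, current, in_quote = [], [], False
    for char in value:
        if quote is not None and char == quote:
            in_quote = not in_quote
        elif char == split_sep and not in_quote:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    # zip() stops at the shorter input, so surplus parts or keys are ignored.
    return dict(zip(split_keys, parts))


if __name__ == "__main__":
    print(split_char(SAMPLE, "|", ["ip", "method", "status"], quote='"'))
    print(split_char("a\u0001b", "\u0001", ["first", "second"]))
```

Without the quote argument, the same sample is split at every vertical bar, so the quoted text ends up spread across two fields.
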
Multi-character delimiter mode

You can extract content from log fields by using a multi-character delimiter. Quote characters are not supported in this mode.

Form configuration

  • Parameters

    Set Processor Type to Extract Field (Multi-character Delimiter Mode). Then, configure other parameters based on the following table.

    Parameter | Description
    Original Field | The name of the original field.
    Delimiter String | The delimiter. You can specify a non-printable character as the multi-character delimiter. Example: \u0001\u0002.
    New Field | The field name that you want to specify for the extracted content. Important: If the system does not find a match for a field that is specified in New Field from the value of the original field, the field is skipped.
    Report Original Field Missing Error | Specifies whether to report an error if the raw log does not contain the original field.
    Report Delimiter Mismatch Error | Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter.
    Retain Original Field | Specifies whether to retain the original field in the new log that is obtained after parsing.
    Retain Excess Part | Specifies whether to retain the excess part after the system finds a match for each field that is specified in New Field from the value.
    Parse Excess Part | Specifies whether to parse the excess part. If you select this option, you can configure Name Prefix of Field to which Excess Part is Assigned to specify a prefix for the names of fields to which the excess part is assigned.
    Name Prefix of Field to which Excess Part is Assigned | The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.

  • Configuration example

    Extract the value of the content field by using the delimiter |#| and specify field names ip, time, method, url, request_time, request_length, status, expand_1, expand_2, and expand_3 for the value.

    • Raw log

      "content" : "10.**.**.**|#|10/Aug/2022:14:57:51 +0800|#|POST|#|/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|#|0.024|#|18204|#|200|#|27|#|-|#|aliyun-sdk-java"
    • Logtail plug-in configuration for data processing: Extract Field (Multi-character Delimiter Mode)

    • Result

      "ip" : "10.**.**.**"
      "time" : "10/Aug/2022:14:57:51 +0800"
      "method" : "POST"
      "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
      "request_time" : "0.024"
      "request_length" : "18204"
      "status" : "200"
      "expand_1" : "27"
      "expand_2" : "-"
      "expand_3" : "aliyun-sdk-java"

Editor configuration in JSON

  • Parameters

    Set type to processor_split_string. Then, configure other parameters in detail based on the following table.

    Parameter | Type | Required | Description
    SourceKey | String | Yes | The name of the original field.
    SplitSep | String | Yes | The delimiter. You can specify a non-printable character as the multi-character delimiter. Example: \u0001\u0002.
    SplitKeys | String array | Yes | The field names that you want to specify for the extracted content. Example: ["key1","key2"]. Note: If the system does not find a match for a field that is specified in SplitKeys from the value of the original field, the field is skipped.
    PreserveOthers | Boolean | No | Specifies whether to retain the excess part after the system finds a match for each field that is specified in SplitKeys from the value. Valid values: true, false (default).
    ExpandOthers | Boolean | No | Specifies whether to parse the excess part. Valid values: true: you can set ExpandOthers to true to parse the excess part and configure ExpandKeyPrefix to specify a prefix for the names of fields to which the excess part is assigned. false (default).
    ExpandKeyPrefix | String | No | The prefix for the names of fields to which the excess part is assigned. For example, if you set this parameter to expand_, the fields are named expand_1 and expand_2.
    NoKeyError | Boolean | No | Specifies whether to report an error if the raw log does not contain the original field. Valid values: true, false (default).
    NoMatchError | Boolean | No | Specifies whether to report an error if the delimiter in the raw log does not match the specified delimiter. Valid values: true, false (default).
    KeepSource | Boolean | No | Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values: true, false (default).

  • Configuration example

    Extract the value of the content field by using the delimiter |#| and specify field names ip, time, method, url, request_time, request_length, status, expand_1, expand_2, and expand_3 for the value.

    • Raw log

      "content" : "10.**.**.**|#|10/Aug/2022:14:57:51 +0800|#|POST|#|/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>|#|0.024|#|18204|#|200|#|27|#|-|#|aliyun-sdk-java"
    • Logtail plug-in configuration for data processing

      {
          "type" : "processor_split_string",
          "detail" : {
              "SourceKey" : "content",
              "SplitSep" : "|#|",
              "SplitKeys" : ["ip", "time", "method", "url", "request_time", "request_length", "status"],
              "PreserveOthers" : true,
              "ExpandOthers" : true,
              "ExpandKeyPrefix" : "expand_"
          }
      }
    • Result

      "ip" : "10.**.**.**"
      "time" : "10/Aug/2022:14:57:51 +0800"
      "method" : "POST"
      "url" : "/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>"
      "request_time" : "0.024"
      "request_length" : "18204"
      "status" : "200"
      "expand_1" : "27"
      "expand_2" : "-"
      "expand_3" : "aliyun-sdk-java"

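The interplay of SplitKeys, PreserveOthers, ExpandOthers, and ExpandKeyPrefix can be sketched in Python as shown below. The sketch splits the value on the multi-character delimiter, names the leading parts, and numbers the excess parts with the prefix starting from 1. It is a simplified illustration rather than Logtail's implementation: the function name split_string and the shortened sample value are hypothetical, and only the combination in which both flags are true is modeled.

```python
# Hypothetical sample with an unmasked IP address and a shortened URL.
SAMPLE = ("10.0.0.1|#|10/Aug/2022:14:57:51 +0800|#|POST|#|/PutData|#|0.024"
          "|#|18204|#|200|#|27|#|-|#|aliyun-sdk-java")
KEYS = ["ip", "time", "method", "url", "request_time", "request_length", "status"]


def split_string(value, split_sep, split_keys, preserve_others=False,
                 expand_others=False, expand_key_prefix="expand_"):
    """Split value on a multi-character separator and name the parts."""
    parts = value.split(split_sep)
    # zip() stops at the shorter input, so keys without a part are skipped.
    fields = dict(zip(split_keys, parts))
    excess = parts[len(split_keys):]
    # Only the "retain and parse the excess part" combination is modeled.
    if preserve_others and expand_others:
        for number, part in enumerate(excess, start=1):
            fields[f"{expand_key_prefix}{number}"] = part
    return fields


if __name__ == "__main__":
    result = split_string(SAMPLE, "|#|", KEYS,
                          preserve_others=True, expand_others=True)
    for name, text in result.items():
        print(f'"{name}" : "{text}"')
```
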
Key-value pair mode

You can extract content from log fields by splitting key-value pairs.

Note

Logtail V0.16.26 and later support the processor_split_key_value plug-in.

Form configuration

  • Parameters

    Set Processor Type to Extract Field (Key-value Pair Mode). Then, configure other parameters based on the following table.

    Parameter | Description
    Original Field | The name of the original field.
    Key-value Pair Delimiter | The delimiter that is used to separate key-value pairs. The default value is a tab character (\t).
    Key and Value Delimiter | The delimiter that is used to separate the key and the value in a single key-value pair. The default value is a colon (:).
    Retain Original Field | Specifies whether to retain the original field.
    Report Original Field Missing Error | Specifies whether to report an error if the raw log does not contain the original field.
    Drop Key-value Pairs That Fail to Match Delimiter | Specifies whether to discard a key-value pair if the delimiter in the raw log does not match the specified delimiter.
    Report Key and Value Delimiter Missing Error | Specifies whether to report an error if the raw log does not contain the specified delimiter.
    Report Empty Key Error | Specifies whether to report an error if the key is empty after delimiting.
    Quote | The quote. If a value is enclosed in the specified quote, the value inside the quote is extracted. You can specify multiple characters as a quote. Important: If a value that is enclosed in the specified quote contains a backslash (\) and the backslash (\) is adjacent to the quote, the backslash (\) is extracted as a part of the value.

  • Configuration example

    • Example 1: Extract the value of a specified field in key-value pair mode.

      Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a tab character (\t). The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:).

      • Raw log

        "content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
      • Logtail plug-in configuration for data processing: Extract Field (Key-value Pair Mode)

      • Result

        "content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
        "class": "main"
        "userid": "123456"
        "method": "get"
        "message": "\"wrong user\""
    • Example 2: Extract the value of a specified field in key-value pair mode when a quote is used.

      Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is a double quotation mark (").

      • Raw log

        "content": "class:main http_user_agent:\"User Agent\" \"Chinese\" \"hello\\t\\\"ilogtail\\\"\\tworld\""
      • Logtail plug-in configuration for data processing (figure: Key-value Pair)

      • Result

        "class": "main",
        "http_user_agent": "User Agent",
        "no_separator_key_0": "Chinese",
        "no_separator_key_1": "hello\t\"ilogtail\"\tworld"
    • Example 3: Extract the value of a specified field in key-value pair mode when a multi-character quote is used.

      Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is three double quotation marks (""").

      • Raw log

        "content": "class:main http_user_agent:\"\"\"User Agent\"\"\" \"\"\"Chinese\"\"\""
      • Logtail plug-in configuration for data processing (figure: Key-value Pair Mode)

      • Result

        "class": "main",
        "http_user_agent": "User Agent",
        "no_separator_key_0": "Chinese"
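
The splitting rules described by the parameters above can be approximated in a few lines of Python. This is only a sketch for reasoning about the examples, not the plug-in's implementation: the function names split_key_value and read_value and the discard_unmatched flag (modeled on Drop Key-value Pairs That Fail to Match Delimiter) are hypothetical, the no_separator_key_N naming is taken from the results shown above, and the special handling of a backslash adjacent to the quote is not modeled.

```python
def read_value(text, start, delimiter, quote):
    """Return (value, next_position) for the value that starts at `start`."""
    if quote and text.startswith(quote, start):
        body = start + len(quote)
        close = text.find(quote, body)
        if close != -1:
            # Quoted value: keep only the content inside the quote.
            return text[body:close], close + len(quote)
    # Unquoted (or unterminated) value: read up to the next pair delimiter.
    end = text.find(delimiter, start)
    if end == -1:
        end = len(text)
    return text[start:end], end


def split_key_value(text, delimiter="\t", separator=":", quote="",
                    discard_unmatched=False):
    """Split `text` into fields; a sketch under the assumptions stated above."""
    if not delimiter or not separator:
        raise ValueError("delimiter and separator must be non-empty")
    fields, unmatched, pos = {}, 0, 0
    while pos < len(text):
        if text.startswith(delimiter, pos):
            pos += len(delimiter)
            continue
        token_end = text.find(delimiter, pos)
        if token_end == -1:
            token_end = len(text)
        sep = text.find(separator, pos, token_end)
        starts_quoted = bool(quote) and text.startswith(quote, pos)
        if sep == -1 or starts_quoted:
            key, value_start = None, pos  # no key and value delimiter found
        else:
            key, value_start = text[pos:sep], sep + len(separator)
        value, pos = read_value(text, value_start, delimiter, quote)
        if key is None:
            if discard_unmatched:
                continue
            key = f"no_separator_key_{unmatched}"
            unmatched += 1
        fields[key] = value
    return fields


example_1 = split_key_value(
    'class:main\tuserid:123456\tmethod:get\tmessage:"wrong user"')
example_3 = split_key_value(
    'class:main http_user_agent:"""User Agent""" """Chinese"""',
    delimiter=" ", quote='"""')
```

With no quote configured, the quotation marks in Example 1 stay part of the message value. With a quote configured, the space inside User Agent no longer splits the pair.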

Editor configuration in JSON

  • Parameters

    Set type to processor_split_key_value. Then, configure other parameters in detail based on the following table.

    Parameter

    Type

    Required

    Description

    SourceKey

    String

    Yes

    The name of the original field.

    Delimiter

    String

    No

    The delimiter that is used to separate key-value pairs. The default value is a tab character (\t).

    Separator

    String

    No

    The delimiter that is used to separate the key and the value in a single key-value pair. The default value is a colon (:).

    KeepSource

    Boolean

    No

    Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:

    • true

    • false (default)

    ErrIfSourceKeyNotFound

    Boolean

    No

    Specifies whether to report an error if the raw log does not contain the original field. Valid values:

    • true (default)

    • false

    DiscardWhenSeparatorNotFound

    Boolean

    No

    Specifies whether to discard a key-value pair that does not contain the specified key and value delimiter. Valid values:

    • true

    • false (default)

    ErrIfSeparatorNotFound

    Boolean

    No

    Specifies whether to report an error if a key-value pair does not contain the specified key and value delimiter. Valid values:

    • true (default)

    • false

    ErrIfKeyIsEmpty

    Boolean

    No

    Specifies whether to report an error if the key is empty after delimiting. Valid values:

    • true (default)

    • false

    Quote

    String

    No

    The quote. If a value is enclosed in the specified quote, the content inside the quote is extracted as the value. You can specify multiple characters as a quote. By default, the quote feature is disabled.

    Important
    • If the quote contains double quotation marks ("), you must escape each double quotation mark with a backslash (\) in the JSON configuration. Example: "Quote": "\"".

    • If a value that is enclosed in the specified quote contains a backslash (\) and the backslash (\) is adjacent to the quote, the backslash (\) is extracted as part of the value.

  • Configuration example

    • Example 1: Extract the value of a specified field in key-value pair mode.

      Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a tab character (\t). The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:).

      • Raw log

        "content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
      • Logtail plug-in configuration for data processing

        {
          "processors":[
            {
              "type":"processor_split_key_value",
              "detail": {
                "SourceKey": "content",
                "Delimiter": "\t",
                "Separator": ":",
                "KeepSource": true
              }
            }
          ]
        }
      • Result

        "content": "class:main\tuserid:123456\tmethod:get\tmessage:\"wrong user\""
        "class": "main"
        "userid": "123456"
        "method": "get"
        "message": "\"wrong user\""
    • Example 2: Extract the value of a specified field in key-value pair mode when a quote is used.

      Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is a double quotation mark (").

      • Raw log

        "content": "class:main http_user_agent:\"User Agent\" \"Chinese\" \"hello\\t\\\"ilogtail\\\"\\tworld\""
      • Logtail plug-in configuration for data processing

        {
          "processors":[
            {
              "type":"processor_split_key_value",
              "detail": {
                "SourceKey": "content",
                "Delimiter": " ",
                "Separator": ":",
                "Quote": "\""
              }
            }
          ]
        }
      • Result

        "class": "main",
        "http_user_agent": "User Agent",
        "no_separator_key_0": "Chinese",
        "no_separator_key_1": "hello\t\"ilogtail\"\tworld"
    • Example 3: Extract the value of a specified field in key-value pair mode when a multi-character quote is used.

      Extract the value of the content field in key-value pair mode. The delimiter that is used to separate key-value pairs is a space. The delimiter that is used to separate the key and the value in a single key-value pair is a colon (:). The quote that is used is three double quotation marks (""").

      • Raw log

        "content": "class:main http_user_agent:\"\"\"User Agent\"\"\" \"\"\"Chinese\"\"\""
      • Logtail plug-in configuration for data processing

        {
          "processors":[
            {
              "type":"processor_split_key_value",
              "detail": {
                "SourceKey": "content",
                "Delimiter": " ",
                "Separator": ":",
                "Quote": "\"\"\""
              }
            }
          ]
        }
      • Result

        "class": "main",
        "http_user_agent": "User Agent",
        "no_separator_key_0": "Chinese"
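
Hand-escaping the Quote value is easy to get wrong, especially for a multi-character quote. One way to avoid mistakes is to generate the configuration with a JSON library and let it add the backslashes. The following Python sketch does this for the quotes used in Example 2 and Example 3. The helper name split_key_value_config is hypothetical; the parameter names come from the table above.

```python
import json


def split_key_value_config(delimiter, separator, quote):
    """Build a processor_split_key_value configuration as a Python dict."""
    return {
        "processors": [
            {
                "type": "processor_split_key_value",
                "detail": {
                    "SourceKey": "content",
                    "Delimiter": delimiter,
                    "Separator": separator,
                    "Quote": quote,
                },
            }
        ]
    }


# json.dumps escapes every double quotation mark in the Quote value.
single = json.dumps(split_key_value_config(" ", ":", '"'))
triple = json.dumps(split_key_value_config(" ", ":", '"""'))
print(single)
print(triple)
```

The serialized output contains "Quote": "\"" for the single-character quote and "Quote": "\"\"\"" for the three-character quote, which matches the configurations in the examples above.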

Grok mode

You can extract content from log fields by using Grok expressions.

Important

Logtail V1.2.0 and later support the processor_grok plug-in.

Form configuration

  • Parameters

    Set Processor Type to Extract Field (Grok Mode). Then, configure other parameters based on the following table.

    Parameter

    Description

    Original Field

    The name of the original field.

    Grok Expression Array

    The array of Grok expressions. The processor_grok plug-in matches a log field based on the specified expressions in sequence and returns the content that is extracted based on the first match.

    For more information about the default expressions that are supported by processor_grok, see processor_grok. If the expressions that are provided on the linked page do not meet your business requirements, you can specify a custom Grok expression in Custom Grok Pattern.

    Note

    If you specify multiple Grok expressions, the processing performance may be affected. We recommend that you specify no more than five expressions.

    Custom Grok Pattern

    The custom Grok pattern, which consists of the rule name and Grok expression.

    Custom Grok Pattern File Directory

    The directory where the custom Grok pattern file is stored. The processor_grok plug-in reads all files in the directory.

    Important

    If you update the custom Grok pattern file, the update can take effect only after you restart Logtail.

    Maximum Timeout

    The timeout period to extract content from the original field by using a Grok expression. Unit: milliseconds. If you do not include this parameter in the configuration or set this parameter to 0, the extraction never times out.

    Retain Logs that Fails to be Parsed

    Specifies whether to retain the raw log if the raw log fails to be parsed.

    Retain Original Field

    Specifies whether to retain the original field in the new log that is obtained after parsing.

    Report Original Field Missing Error

    Specifies whether to report an error if the raw log does not contain the original field.

    Report No Expressions Matched Error

    Specifies whether to report an error if the value of the original field does not match any expression that is specified in Grok Expression Array.

    Report Match Timeout Error

    Specifies whether to report an error if the match times out.

  • Configuration example

    Extract the value of the content field in Grok mode and specify field names year, month, and day for the value.

    • Raw log

      "content" : "2022 October 17"
    • Logtail plug-in configuration for data processing (figure: Grok mode)

    • Result

      "year":"2022"
      "month":"October"
      "day":"17"
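
A Grok expression is shorthand for a regular expression with named capture groups. Assuming a Grok expression such as %{YEAR:year} %{MONTH:month} %{MONTHDAY:day}, the extraction in this example behaves roughly like the Python sketch below. The three sub-patterns are simplified stand-ins chosen for illustration, not the definitions that processor_grok ships with, and the function name extract_date is hypothetical.

```python
import re

# Simplified stand-ins for the YEAR, MONTH, and MONTHDAY Grok patterns.
# Each %{PATTERN:field} becomes a named group (?P<field>...).
DATE_PATTERN = re.compile(
    r"(?P<year>\d{4}) (?P<month>[A-Za-z]+) (?P<day>\d{1,2})"
)


def extract_date(content):
    """Return the extracted fields, or None if the value does not match."""
    match = DATE_PATTERN.fullmatch(content)
    return match.groupdict() if match else None


fields = extract_date("2022 October 17")
```

Each group name becomes a field name in the result, which is why the output contains year, month, and day.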

Editor configuration in JSON

  • Parameters

    Set type to processor_grok. Then, configure other parameters in detail based on the following table.

    Parameter

    Type

    Required

    Description

    CustomPatternDir

    String array

    No

    The directory where the custom Grok pattern file is stored. The processor_grok plug-in reads all files in the directory.

    If you do not include this parameter in the configuration, the system does not import custom Grok pattern files.

    Important

    If you update the custom Grok pattern file, the update can take effect only after you restart Logtail.

    CustomPatterns

    Map

    No

    The custom Grok pattern. key specifies the rule name and value specifies the Grok expression.

    For more information about the default expressions that are supported by processor_grok, see processor_grok. If the expressions that are provided on the linked page do not meet your business requirements, you can specify a custom Grok expression in Match.

    If you do not include this parameter in the configuration, the system does not use custom Grok patterns.

    SourceKey

    String

    No

    The name of the original field. The default value is content.

    Match

    String array

    Yes

    The array of Grok expressions. The processor_grok plug-in matches a log field based on the specified expressions in sequence and returns the content that is extracted based on the first match.

    Note

    If you specify multiple Grok expressions, the processing performance may be affected. We recommend that you specify no more than five expressions.

    TimeoutMilliSeconds

    Long

    No

    The timeout period to extract content from the original field by using a Grok expression. Unit: milliseconds.

    If you do not include this parameter in the configuration or set this parameter to 0, the extraction never times out.

    IgnoreParseFailure

    Boolean

    No

    Specifies how to handle a raw log that fails to be parsed. Valid values:

    • true (default): retains the raw log without parsing it.

    • false: discards the raw log.

    KeepSource

    Boolean

    No

    Specifies whether to retain the original field in the new log that is obtained after parsing. Valid values:

    • true (default): retains the original field.

    • false: discards the original field.

    NoKeyError

    Boolean

    No

    Specifies whether to report an error if the raw log does not contain the original field. Valid values:

    • true

    • false (default)

    NoMatchError

    Boolean

    No

    Specifies whether to report an error if the value of the original field does not match any expression that is specified in Match. Valid values:

    • true (default)

    • false

    TimeoutError

    Boolean

    No

    Specifies whether to report an error if the match times out. Valid values:

    • true (default)

    • false

  • Example 1

    Extract the value of the content field in Grok mode and specify field names year, month, and day for the value.

    • Raw log

      "content" : "2022 October 17"
    • Logtail plug-in configuration for data processing

      {
         "type" : "processor_grok",
         "detail" : {
            "KeepSource" : false,
            "Match" : [
               "%{YEAR:year} %{MONTH:month} %{MONTHDAY:day}"
            ],
            "IgnoreParseFailure" : false
         }
      }
    • Result

      "year":"2022"
      "month":"October"
      "day":"17"
  • Example 2

    Extract the value of the content field from multiple logs in Grok mode and parse the extracted values into different results based on different Grok expressions.

    • Raw log

      {
          "content" : "begin 123.456 end"
      }
      {
          "content" : "2019 June 24 \"I am iron man\""
      }
      {
          "content" : "WRONG LOG"
      }
      {
          "content" : "10.0.0.0 GET /index.html 15824 0.043"
      }
    • Logtail plug-in configuration for data processing

      {
              "type" : "processor_grok",
              "detail" : {
                      "CustomPatterns" : {
                              "HTTP" : "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
                      },
                      "IgnoreParseFailure" : false,
                      "KeepSource" : false,
                      "Match" : [
                              "%{HTTP}",
                              "%{WORD:word1} %{NUMBER:request_time} %{WORD:word2}",
                              "%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}"
                      ],
                      "SourceKey" : "content"
              }
      }
    • Result

      • In this example, the processor_grok plug-in matches the first log against the first expression %{HTTP} that is specified in Match, and the match fails. Then, the processor_grok plug-in matches the log against the second expression %{WORD:word1} %{NUMBER:request_time} %{WORD:word2}, and the match is successful. In this case, the content that is extracted based on the second expression is returned.

        In the result, the content field in the raw log is discarded because KeepSource is set to false.

      • In this example, the processor_grok plug-in matches the second log against the first expression %{HTTP} and the second expression %{WORD:word1} %{NUMBER:request_time} %{WORD:word2} that are specified in Match in sequence, and both matches fail. Then, the processor_grok plug-in matches the second log against the third expression %{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}, and the match is successful. In this case, the content that is extracted based on the third expression is returned.

      • In this example, the processor_grok plug-in matches the third log against the three expressions that are specified in Match in sequence, and all matches fail. The log is discarded because IgnoreParseFailure is set to false.

      • In this example, the processor_grok plug-in matches the fourth log against the first expression %{HTTP} that is specified in Match, and the match is successful. In this case, the content that is extracted based on the first expression is returned.

      {
        "word1":"begin",
        "request_time":"123.456",
        "word2":"end"
      }
      {
        "year":"2019",
        "month":"June",
        "day":"24",
        "motto":"\"I am iron man\""
      }
      {
        "client":"10.0.0.0",
        "method":"GET",
        "request":"/index.html",
        "bytes":"15824",
        "duration":"0.043"
      }
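
The matching order described in this example can be traced with a short Python sketch. It is an approximation for illustration only: the base patterns are simplified stand-ins for the real Grok definitions, the helper names to_regex and process are hypothetical, and matching against the whole field value is an assumption. The keep_source and ignore_parse_failure arguments mirror KeepSource and IgnoreParseFailure.

```python
import re

# Simplified stand-ins, not the definitions shipped with processor_grok.
BASE_PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"/\S*",
    "NUMBER": r"\d+(?:\.\d+)?",
    "YEAR": r"\d{4}",
    "MONTH": r"[A-Z][a-z]+",
    "MONTHDAY": r"\d{1,2}",
    "QUOTEDSTRING": r'"[^"]*"',
}


def to_regex(expression, patterns):
    """Expand %{NAME} and %{NAME:field} references recursively."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = to_regex(patterns[name], patterns)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, expression)


def process(log, match, custom_patterns, source_key="content",
            keep_source=True, ignore_parse_failure=True):
    """Try each expression in order and return the first successful result."""
    patterns = {**BASE_PATTERNS, **custom_patterns}
    value = log.get(source_key, "")
    for expression in match:
        found = re.fullmatch(to_regex(expression, patterns), value)
        if found:
            result = dict(log)
            if not keep_source:
                result.pop(source_key, None)
            result.update(found.groupdict())
            return result
    # No expression matched: keep the log as is, or drop it (None).
    return dict(log) if ignore_parse_failure else None


CUSTOM = {"HTTP": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
                  "%{NUMBER:bytes} %{NUMBER:duration}"}
MATCH = [
    "%{HTTP}",
    "%{WORD:word1} %{NUMBER:request_time} %{WORD:word2}",
    "%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}",
]
LOGS = [
    {"content": "begin 123.456 end"},
    {"content": '2019 June 24 "I am iron man"'},
    {"content": "WRONG LOG"},
    {"content": "10.0.0.0 GET /index.html 15824 0.043"},
]
results = [process(log, MATCH, CUSTOM, keep_source=False,
                   ignore_parse_failure=False) for log in LOGS]
```

In this sketch, the third entry of results is None, which corresponds to the discarded log. The other three entries contain only the extracted fields because keep_source is False.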