
Simple Log Service:Collect text logs in regex mode

Last Updated:Nov 13, 2024

You can use a Logtail plug-in to extract log fields from logs based on a regular expression. The logs are parsed into key-value pairs. This topic describes how to create a Logtail configuration in regex mode in the Simple Log Service console.

Solution overview

Consider the following raw log:

127.0.0.1 - - [16/Aug/2024:14:37:52 +0800] "GET /wp-admin/admin-ajax.php?action=rest-nonce HTTP/1.1" 200 41 "http://www.example.com/wp-admin/post-new.php?post_type=page" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0"

After processing with the regex parsing plug-in, the log is parsed into key-value pairs, such as the client IP address, the request time, the request method and URI, the status code, and the user agent.

The regular expression applied is as follows:

(\S+)\s-\s(\S+)\s\[([^]]+)]\s"(\w+)\s(\S+)\s([^"]+)"\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+).*
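To illustrate, the regular expression can be applied to the sample log with Python's re module. The key names below are illustrative placeholders; the actual field names are the keys that you configure in the console.

```python
import re

# Sample raw log from above.
log = ('127.0.0.1 - - [16/Aug/2024:14:37:52 +0800] '
       '"GET /wp-admin/admin-ajax.php?action=rest-nonce HTTP/1.1" 200 41 '
       '"http://www.example.com/wp-admin/post-new.php?post_type=page" '
       '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
       '(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0"')

# The regular expression from above, split for readability.
pattern = (r'(\S+)\s-\s(\S+)\s\[([^]]+)]\s"(\w+)\s(\S+)\s([^"]+)"\s'
           r'(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+).*')

# Illustrative key names for the ten capturing groups.
keys = ['remote_addr', 'remote_user', 'time_local', 'request_method',
        'request_uri', 'http_protocol', 'status', 'body_bytes_sent',
        'http_referer', 'http_user_agent']

match = re.match(pattern, log)
fields = dict(zip(keys, match.groups()))
print(fields['remote_addr'], fields['request_method'], fields['status'])
# prints "127.0.0.1 GET 200"
```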

Prerequisites

  • A machine group has been created, and servers have been added to the machine group. We recommend that you create a custom identifier-based machine group. For more information, see Create a custom identifier-based machine group or Create an IP address-based machine group.

  • Ports 80 and 443 are enabled for the server on which Logtail is installed. If the server is an Elastic Compute Service (ECS) instance, you can reconfigure the related security group rules to enable the ports. For more information about how to configure a security group rule, see Add a security group rule.

  • The server from which you want to collect logs continuously generates logs. Logtail collects only incremental logs. If a log file on your server is not updated after a Logtail configuration is delivered and applied to the server, Logtail does not collect logs from the file. For more information, see Read log files.

1. Select a project and a logstore

  1. Log on to the Simple Log Service console.

  2. Click Quick Data Import on the right side of the console.


  3. On the Import Data page, click Regular Expression - Text Logs.

  4. Select the project and logstore that you want to use.

2. Configure a machine group

Apply the Logtail configuration to a machine group to collect data from the servers in the group. Select the scenario and installation environment that match your setup, because they determine the configuration options in subsequent steps.

3. Configure Logtail

3.1 Global configurations

Description of global configurations


Configuration Name

Enter a name for the Logtail configuration. The name must be unique in a project. After you create the Logtail configuration, you cannot change its name.

Log Topic Type

Select a method to generate log topics. For more information, see Log topics.

  • Machine Group Topic: The topics of the machine groups are used as log topics. If you want to distinguish the logs from different machine groups, select this option.

  • File Path Extraction: You must specify a custom regular expression. A part of the file path that matches the regular expression is used as the log topic. If you want to distinguish the logs from different sources, select this option.

  • Custom: You must specify a custom log topic.

Advanced Parameters

Optional. Configure the advanced parameters that are related to global configurations. For more information, see CreateLogtailPipelineConfig.

3.2 Input configurations


Description of input configurations


File Path

Specify the directory and name of log files based on the location of the logs on your server, such as an ECS instance.

  • If you specify a file path in a Linux operating system, the path must start with a forward slash (/). Example: /apsara/nuwa/**/app.Log.

  • If you specify a file path in a Windows operating system, the path must start with a drive letter. Example: C:\Program Files\Intel\**\*.Log.

You can specify an exact directory and an exact name. You can also use wildcard characters to specify the directory and name. For more information, see Wildcard matching. When you configure this parameter, you can use only the asterisk (*) or question mark (?) as wildcard characters.

Simple Log Service scans all levels of the specified directory to find the log files that match the specified conditions. Examples:

  • If you specify /apsara/nuwa/**/*.log, Simple Log Service collects logs from the log files whose names are suffixed by .log in the /apsara/nuwa directory and the recursive subdirectories of the directory.

  • If you specify /var/logs/app_*/**/*.log, Simple Log Service collects logs from the log files that meet the following conditions: The file name is suffixed by .log. The file is stored in a subdirectory of the /var/logs directory or in a recursive subdirectory of the subdirectory. The name of the subdirectory matches the app_* pattern.

  • If you specify /var/log/nginx/**/access*, Simple Log Service collects logs from the log files whose names start with access in the /var/log/nginx directory and the recursive subdirectories of the directory.
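The `**` matching described above can be sketched with Python's recursive glob, which also treats `**` as zero or more directory levels. This is only an approximation: Logtail additionally limits recursion with Maximum Directory Monitoring Depth, which glob does not model. The directory names below are illustrative.

```python
import glob
import os
import tempfile

# Build a small directory tree, then check what a pattern in the style of
# /apsara/nuwa/**/*.log would match.
root = tempfile.mkdtemp()
for rel in ('nuwa/app.log', 'nuwa/sub/deep/app.log', 'nuwa/app.txt'):
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, 'w').close()

# recursive=True lets ** match any number of directory levels, including zero.
matches = sorted(
    os.path.relpath(p, root)
    for p in glob.glob(os.path.join(root, 'nuwa', '**', '*.log'), recursive=True)
)
print(matches)  # both .log files, at any depth; app.txt is excluded
```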

Maximum Directory Monitoring Depth

Specify the maximum number of levels of subdirectories that you want to monitor. The subdirectories are in the log file directory that you specify. This parameter specifies the levels of subdirectories that can be matched by the ** wildcard characters included in the value of File Path. A value of 0 indicates that only the log file directory that you specify is monitored.

File Encoding

Select the encoding format of log files.

First Collection Size

Specify the size of data that Logtail can collect from a log file the first time Logtail collects logs from the file. The default value of First Collection Size is 1024. Unit: KB.

  • If the file size is less than 1,024 KB, Logtail collects data from the beginning of the file.

  • If the file size is greater than 1,024 KB, Logtail collects the last 1,024 KB of data in the file.

You can configure First Collection Size based on your business requirements. Valid values: 0 to 10485760. Unit: KB.
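The first-collection behavior described above can be sketched as a simple offset calculation. This is an illustrative sketch, not Logtail's actual implementation.

```python
def first_collection_start(file_size_kb, first_collection_kb=1024):
    """Return the KB offset at which the first collection starts.

    If the file is no larger than First Collection Size, collection starts
    at the beginning of the file; otherwise only the trailing
    first_collection_kb KB are collected.
    """
    if file_size_kb <= first_collection_kb:
        return 0
    return file_size_kb - first_collection_kb

print(first_collection_start(500))   # 0: the whole file is collected
print(first_collection_start(4096))  # 3072: only the last 1,024 KB are collected
```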

Collection Blacklist

If you turn on Collection Blacklist, you must configure a blacklist to specify the directories or files that you want Simple Log Service to skip when it collects logs. You can specify exact directories and file names. You can also use wildcard characters to specify directories and file names. When you configure this parameter, you can use only the asterisk (*) or question mark (?) as wildcard characters.

Important
  • If you use wildcard characters to specify a value for File Path and you want to skip some subdirectories in the specified directory, you must configure Collection Blacklist to specify the subdirectories. You must specify complete subdirectories.

    For example, if you set File Path to /home/admin/app*/log/*.log and you want to skip all subdirectories in the /home/admin/app1* directory, you must select Directory Blacklist and enter /home/admin/app1*/** in the Directory Name field. If you enter /home/admin/app1*, the blacklist does not take effect.

  • When a blacklist is in use, computational overhead is generated. We recommend that you add no more than 10 entries to a blacklist.

  • You cannot specify a directory that ends with a forward slash (/). For example, if you specify the /home/admin/dir1/ directory in a directory blacklist, the directory blacklist does not take effect.

The following types of blacklists are supported: File Path Blacklist, File Blacklist, and Directory Blacklist.

File Path Blacklist

  • If you select File Path Blacklist and enter /home/admin/private*.log in the File Path Name field, all files whose names are prefixed by private and suffixed by .log in the /home/admin/ directory are skipped.

  • If you select File Path Blacklist and enter /home/admin/private*/*_inner.log in the File Path Name field, all files whose names are suffixed by _inner.log in the subdirectories whose names are prefixed by private in the /home/admin/ directory are skipped. For example, the /home/admin/private/app_inner.log file is skipped, but the /home/admin/private/app.log file is not skipped.

File Blacklist

If you select File Blacklist and enter app_inner.log in the File Name field, all files whose names are app_inner.log are skipped.

Directory Blacklist

  • If you select Directory Blacklist and enter /home/admin/dir1 in the Directory Name field, all files in the /home/admin/dir1 directory are skipped.

  • If you select Directory Blacklist and enter /home/admin/dir* in the Directory Name field, all files in the subdirectories whose names are prefixed by dir in the /home/admin/ directory are skipped.

  • If you select Directory Blacklist and enter /home/admin/*/dir in the Directory Name field, all files in the dir subdirectory in each second-level subdirectory of the /home/admin/ directory are skipped. For example, the files in the /home/admin/a/dir directory are skipped, but the files in the /home/admin/a/b/dir directory are not skipped.
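The blacklist wildcard semantics above can be sketched by translating a pattern into a regular expression in which `*` and `?` stay within one path segment while `**` crosses segments. This is an illustration of the matching rules, not Logtail's implementation.

```python
import re

def blacklist_matches(pattern, path):
    """Illustrative sketch: * and ? match within one path segment,
    ** matches across segments."""
    escaped = re.escape(pattern)
    regex = (escaped.replace(r'\*\*', '.*')      # ** crosses directory levels
                    .replace(r'\*', '[^/]*')     # * stays within one level
                    .replace(r'\?', '[^/]'))     # ? matches one character
    return re.fullmatch(regex, path) is not None

# /home/admin/app1* alone does not cover files inside matching directories...
print(blacklist_matches('/home/admin/app1*', '/home/admin/app1/log/a.log'))     # False
# ...but /home/admin/app1*/** does, as the Important note above explains.
print(blacklist_matches('/home/admin/app1*/**', '/home/admin/app1/log/a.log'))  # True
```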

Allow File to Be Collected for Multiple Times

By default, you can use only one Logtail configuration to collect logs from a log file. If you want to collect multiple copies of logs from a log file, you must turn on Allow File to Be Collected for Multiple Times.

Advanced Parameters

Optional. Configure the advanced parameters that are related to input plug-ins. For more information, see CreateLogtailPipelineConfig.

3.3 Processor configurations

  1. Log Sample: Add one or more sample logs collected from an actual scenario. Sample logs make it easier to configure the log processing parameters. We recommend that you add log samples.

  2. Multi-line Mode: Enable this option for logs that span multiple lines.

    • Type: Select Custom. For example, the regular expression to match the beginning of a line is (\S+)\s-.*.

    • Processing Method If Splitting Fails: Select Keep Single Line.

  3. Processing Method

    Use the Data Parsing (Regex Mode) plug-in. For multi-line logs, enable Multi-line Mode to automatically generate a regular expression.


  4. Click Data Parsing (Regex Mode) to open the detailed configuration page for the plug-in. Configure the regular expression and the keys for the extracted values.

    Configuration description


    Original Field

    The original field that is used to store the content of a log before the log is parsed. Default value: content.

    Regular Expression

    The regular expression that is used to match logs.

    • If you specify a sample log, Simple Log Service can automatically generate a regular expression or use a regular expression that you manually specify.

      • Click Generate. In the Sample Log field, select the log content that you want to extract and click Generate Regular Expression. Simple Log Service generates a regular expression based on the content that you specified.

      • Click Manual to specify a regular expression. After you configure the settings, click Validate to check whether the regular expression can parse and extract log content as expected. For more information, see How do I test a regular expression?

    • If you do not specify a sample log, you must specify a regular expression based on the actual log content.

    Extracted Field

    The extracted fields. Configure the Key parameter for each Value parameter. The Key parameter specifies a new field name. The Value parameter specifies the content that is extracted from logs.

    Retain Original Field If Parsing Fails

    If you select the Retain Original Field If Parsing Fails parameter and parsing fails, the original field is retained.

    Retain Original Field If Parsing Succeeds

    If you select the Retain Original Field If Parsing Succeeds parameter and parsing is successful, the original field is retained.

    New Name of Original Field

    If you select the Retain Original Field If Parsing Fails or Retain Original Field If Parsing Succeeds parameter, you can rename the original field to store the original log content.

Description of processor configurations


Log Sample

Add a sample log that is collected from an actual scenario. You can use the sample log to configure parameters that are related to log processing with ease. You can add multiple sample logs. Make sure that the total length of the logs does not exceed 1,500 characters.

[2023-10-01T10:30:01,000] [INFO] java.lang.Exception: exception happened
    at TestPrintStackTrace.f(TestPrintStackTrace.java:3)
    at TestPrintStackTrace.g(TestPrintStackTrace.java:7)
    at TestPrintStackTrace.main(TestPrintStackTrace.java:16)

Multi-line Mode

  • Specify the type of multi-line logs. A multi-line log spans multiple consecutive lines. You can configure this parameter to identify each multi-line log in a log file.

    • Custom: A multi-line log is identified based on the value of Regex to Match First Line.

    • Multi-line JSON: Each JSON object is expanded into multiple lines. Example:

      {
        "name": "John Doe",
        "age": 30,
        "address": {
          "city": "New York",
          "country": "USA"
        }
      }
  • Configure Processing Method If Splitting Fails.

    Exception in thread "main" java.lang.NullPointerException
        at com.example.MyClass.methodA(MyClass.java:12)
        at com.example.MyClass.methodB(MyClass.java:34)
        at com.example.MyClass.main(MyClass.java:50)

    For the preceding sample log, Simple Log Service can discard the log or retain each single line as a log when it fails to split the log.

    • Discard: The log is discarded.

    • Retain Single Line: Each line of log text is retained as a log. Four logs are retained.
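The splitting behavior above can be sketched as follows, using a first-line regex that matches the bracketed timestamp of the sample log. Lines that appear before any first-line match fall back to the Retain Single Line behavior. This is an illustrative sketch, not Logtail's implementation.

```python
import re

# Assumed first-line pattern: a bracketed ISO-style timestamp.
FIRST_LINE = re.compile(r'\[\d{4}-\d{2}-\d{2}T')

def split_multiline(lines):
    """Group raw lines into logs: a line that matches FIRST_LINE starts a
    new log; lines seen before any match are kept one line per log."""
    logs, current = [], []
    for line in lines:
        if FIRST_LINE.match(line):
            if current:
                logs.append('\n'.join(current))
            current = [line]
        elif current:
            current.append(line)
        else:
            logs.append(line)  # splitting failed: retain the single line
    if current:
        logs.append('\n'.join(current))
    return logs

raw = [
    '[2023-10-01T10:30:01,000] [INFO] java.lang.Exception: exception happened',
    '    at TestPrintStackTrace.f(TestPrintStackTrace.java:3)',
    '    at TestPrintStackTrace.g(TestPrintStackTrace.java:7)',
]
print(len(split_multiline(raw)))  # 1: the stack trace is grouped into one log
```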

Processing Method

Select Processors. You can add native plug-ins and extended plug-ins for data processing. For more information about Logtail plug-ins for data processing, see Logtail plug-ins overview.

Important

You are subject to the limits of Logtail plug-ins for data processing. For more information, see the on-screen instructions in the Simple Log Service console.

  • Logtail earlier than V2.0

    • You cannot add native plug-ins and extended plug-ins at the same time.

    • You can use native plug-ins only to collect text logs. When you add native plug-ins, take note of the following items:

      • You must add one of the following Logtail plug-ins for data processing as the first plug-in: Data Parsing (Regex Mode), Data Parsing (Delimiter Mode), Data Parsing (JSON Mode), Data Parsing (NGINX Mode), Data Parsing (Apache Mode), and Data Parsing (IIS Mode).

      • After you add the first plug-in, you can add one Time Parsing plug-in, one Data Filtering plug-in, and multiple Data Masking plug-ins.

    • You can add extended plug-ins only after you add native plug-ins.

  • Logtail V2.0

    • You can arbitrarily combine native plug-ins for data processing.

    • You can combine native plug-ins and extended plug-ins. Make sure that extended plug-ins are added after native plug-ins.

Important

A Logtail configuration requires up to three minutes to take effect.

4. Configure query and analysis

By default, Simple Log Service enables full-text indexing. You can also manually create field indexes, or click Automatic Index Generation to automatically generate indexes. For more information, see Create indexes.


5. Query logs

Click Query Log to go to the query and analysis page.

Wait approximately one minute for the index to activate before viewing the collected logs on the Raw Logs tab. For more information, see Query and analyze logs.

Note

To query all fields in logs, use full-text indexes. To query only specific fields, use field indexes, which helps reduce index traffic. To perform field analysis, create field indexes and include a SELECT statement in your analysis query.
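For example, assuming field indexes exist on fields such as status and request_uri extracted earlier, a query with an analysis statement might look like the following sketch:

```sql
status: 200 | SELECT request_uri, count(*) AS pv GROUP BY request_uri ORDER BY pv DESC LIMIT 10
```

The part before the vertical bar (|) is the search statement against the field index; the SELECT clause is the analysis statement.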

What to do next

  • For more information about log index types, configuration examples, and index-related billing, see Create indexes.

  • For more information about log query syntax, see Search syntax.