Simple Log Service: GROK function

Last updated: Jul 27, 2024

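This topic describes the syntax of the GROK function, including its parameters and examples.

Function overview

Regular expression functions are complex to use, so we recommend that you use the GROK function first. You can also mix GROK patterns with regular expressions, as in the following example: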

e_match("content", grok(r"\w+: (%{IP})"))  # Matches strings such as abc: 192.0.2.0 or xyz: 192.0.2.2.
e_match("content", grok(r"\w+: (%{IP})", escape=True))  # Does not match abc: 192.0.2.0; matches the literal string \w+: 192.0.2.0 instead.
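The effect of escape can be approximated in plain Python. The following is a minimal sketch, not the Simple Log Service implementation: it assumes that %{IP} expands to the simplified regex shown later in this topic and that escape=True behaves like re.escape applied to the text around the GROK pattern. The capturing parentheses from the rule above are left out so that only the \w+: prefix is affected.

```python
import re

# Simplified stand-in for what %{IP} expands to (an assumption for this sketch).
IP = r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"

# escape=False: the text around %{IP} is interpreted as a regular expression.
as_regex = re.compile(r"\w+: (" + IP + r")")

# escape=True, modeled with re.escape: the text around %{IP} is taken literally.
as_literal = re.compile(re.escape(r"\w+: ") + IP)

print(as_regex.search("abc: 192.0.2.0").group(1))       # the captured IP address
print(as_literal.search("abc: 192.0.2.0"))               # no match
print(as_literal.search(r"\w+: 192.0.2.0") is not None)  # literal prefix matches
```

In this model, escaping changes only how the surrounding text is read; the expanded GROK pattern itself is still a regular expression.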

The GROK function extracts specific values based on regular expressions.

  • Function format

    grok(pattern, escape=False, extend=None)
  • GROK syntax

    %{SYNTAX} 
    %{SYNTAX:NAME}

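    SYNTAX is the name of a predefined regular expression pattern, and NAME is the name of the capturing group.

    "%{IP}"               # Equivalent to r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    "%{IP:source_id}"     # Equivalent to r"(?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    "(%{IP})"             # Equivalent to r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"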

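    GROK has two grouping modes:

    • Capturing group mode

      Some GROK patterns contain built-in named capturing groups, so they can be used only in the %{SYNTAX} form. Such patterns are common when parsing whole log statements. For details, see the Log formats module in GROK pattern reference.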

      "%{SYSLOGBASE}"        
      "%{COMMONAPACHELOG}" 
      "%{COMBINEDAPACHELOG}"
      "%{HTTPD20_ERRORLOG}"
      "%{HTTPD24_ERRORLOG}"
      "%{HTTPD_ERRORLOG}"
      ...
    • Non-capturing group mode

      Other GROK patterns are non-capturing groups, for example:

      "%{INT}"    
      "%{YEAR}"
      "%{HOUR}"
      ...
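  • Parameters

    Parameter   Type     Required   Description
    pattern     String   Yes        The GROK syntax. For the available GROK patterns, see GROK pattern reference.
    escape      Bool     No         Specifies whether to escape regex special characters that appear outside the GROK patterns. Default: False (no escaping).
    extend      Dict     No         User-defined GROK expressions.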
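To make the syntax and parameters above concrete, the following sketch shows one way a GROK expression could be expanded into a Python regular expression. It is illustrative only: grok_to_regex is a hypothetical helper, not the Simple Log Service implementation, and the entries in BASE_PATTERNS are simplified stand-ins for the built-in patterns. The sketch assumes that %{SYNTAX} becomes a non-capturing group, %{SYNTAX:NAME} becomes a named group, extend adds definitions that may reference other patterns, and escape applies re.escape to the text outside the pattern references.

```python
import re

# Simplified stand-ins for a few built-in patterns (assumptions for this
# sketch, not the definitions shipped with Simple Log Service).
BASE_PATTERNS = {
    "IP": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "WORD": r"\b\w+\b",
    "INT": r"[+-]?[0-9]+",
    "QUOTEDSTRING": r'"(?:[^"\\]|\\.)*"',
}

# Matches %{SYNTAX} and %{SYNTAX:NAME} references.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern, escape=False, extend=None):
    """Hypothetical helper: expand GROK references into a plain regex string."""
    patterns = dict(BASE_PATTERNS)
    patterns.update(extend or {})

    def expand(text, escape_literals):
        out, pos = [], 0
        for ref in GROK_REF.finditer(text):
            literal = text[pos:ref.start()]
            out.append(re.escape(literal) if escape_literals else literal)
            syntax, name = ref.group(1), ref.group(2)
            body = expand(patterns[syntax], False)  # definitions may nest
            if name:
                out.append("(?P<%s>%s)" % (name, body))  # %{SYNTAX:NAME}
            else:
                out.append("(?:%s)" % body)              # %{SYNTAX}
            pos = ref.end()
        tail = text[pos:]
        out.append(re.escape(tail) if escape_literals else tail)
        return "".join(out)

    return expand(pattern, escape)


# A user-defined ID pattern built from two existing patterns.
rule = "%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}"
regex = grok_to_regex(rule, extend={"ID": "%{WORD}-%{INT}"})
fields = re.search(regex, 'Beijing-1104,gary 25 "never quit"').groupdict()
print(fields)
```

Under these assumptions, grok_to_regex("%{IP}") and grok_to_regex("%{IP:source_id}") return the two equivalent regexes listed under GROK syntax, and each named group in the expanded regex becomes one extracted field.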

Examples

  • Example 1: Extract a date and a quoted string.

    • Raw log

      content: 2019 June 24 "I am iron man"
    • Transformation rule

      e_regex('content',grok('%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}'))
    • Result

      content: 2019 June 24 "I am iron man"
      year: 2019
      month: June
      day: 24
      motto: "I am iron man"
  • Example 2: Extract an HTTP request log.

    • Raw log

      content: 10.0.0.0 GET /index.html 15824 0.043
    • Transformation rule

      e_regex('content',grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}'))
    • Result

      content: 10.0.0.0 GET /index.html 15824 0.043
      client: 10.0.0.0
      method: GET
      request: /index.html
      bytes: 15824
      duration: 0.043
  • Example 3: Extract an Apache log.

    • Raw log

      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
    • Transformation rule

      e_regex('content',grok('%{COMBINEDAPACHELOG}'))
    • Result

      content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
      clientip: 127.0.0.1
      ident: -
      auth: -
      timestamp: 13/Apr/2015:17:22:03 +0800
      verb: GET
      request: /router.php
      httpversion: 1.1
      response: 404
      bytes: 285
      referrer: "-"
      agent: "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
  • Example 4: Parse a log in the default Syslog format.

    • Raw log

      content: May 29 16:37:11 sadness logger: hello world
    • Transformation rule

      e_regex('content',grok('%{SYSLOGBASE} %{DATA:message}'))
    • Result

      content: May 29 16:37:11 sadness logger: hello world
      timestamp: May 29 16:37:11
      logsource: sadness
      program: logger
      message: hello world
  • Example 5: Escape special characters.

    • Raw log

      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
    • Transformation rule

      e_regex('content',grok(r'%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}'))

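      Because the rule contains regex special characters (parentheses), you can add the escape=True parameter instead of escaping them by hand, as shown below: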

      e_regex('content',grok('%{SYSLOGBASE} pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}', escape=True))
    • Result

      content: Nov  1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
      timestamp: Nov  1 21:14:23
      logsource: scorn
      program: expect
      pid: 84558
      uid: 30206
      signal: 3
  • Example 6: Use a user-defined GROK expression.

    • Raw log

      content: Beijing-1104,gary 25 "never quit"
    • Transformation rule

      e_regex('content',grok('%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',extend={'ID': '%{WORD}-%{INT}'}))
    • Result

      content: Beijing-1104,gary 25 "never quit"
      user_id: Beijing-1104
      name: gary
      age: 25
      motto: "never quit"
  • Example 7: Match JSON data.

    • Raw log

      content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
    • Transformation rule

      e_regex('content',grok('%{EXTRACTJSON}'))
    • Result

      content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
      json:{"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
  • Example 8: Parse a log in the standard W3C format.

    • Raw log

      content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
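    • Transformation rule

      Fields that are absent from a W3C log are replaced with a hyphen (-), so the GROK expression also uses a hyphen (-) to match these fields.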

      e_regex("content",grok('%{DATE:data} %{TIME:time} %{WORD:s_sitename} %{WORD:s_computername} %{IP:s_ip} %{WORD:cs_method} %{NOTSPACE:cs_uri_stem} - %{NUMBER:s_port} - %{IP:c_ip} %{NOTSPACE:cs_version} - - - - %{NUMBER:sc_status} %{NUMBER:sc_substatus} %{NUMBER:sc_win32_status} %{NUMBER:sc_bytes} %{NUMBER:cs_bytes} %{NUMBER:time_taken}'))
    • Result

      content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0 
      data: 18-12-26
      time: 00:00:00
      s_sitename: W3SVC2
      s_computername: application001
      s_ip: 192.168.0.0
      cs_method: HEAD
      cs_uri_stem: /
      s_port: 8000
      c_ip: 10.0.0.0
      cs_version: HTTP/1.0
      sc_status: 404
      sc_substatus: 0
      sc_win32_status: 64
      sc_bytes: 0
      cs_bytes: 19
      time_taken: 0
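The data: 18-12-26 value in Example 8 may look surprising, because the raw log starts with 2018-12-26. A likely explanation is sketched below. It assumes that the built-in DATE pattern follows the widely used Logstash-style definitions (month-day-year or day-month-year), which is an assumption to verify against GROK pattern reference, and that the rule searches for the first position where the pattern matches. The atomic group in the usual YEAR definition is replaced by a plain group so that the sketch runs on older Python versions.

```python
import re

# Logstash-style definitions, assumed here to mirror the built-in patterns.
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])"
YEAR = r"(?:\d\d){1,2}"
DATE_US = MONTHNUM + r"[/-]" + MONTHDAY + r"[/-]" + YEAR    # month-day-year
DATE_EU = MONTHDAY + r"[./-]" + MONTHNUM + r"[./-]" + YEAR  # day-month-year
DATE = "(?:" + DATE_US + "|" + DATE_EU + ")"

line = "2018-12-26 00:00:00 W3SVC2 application001"

# Mirrors the start of the rule: %{DATE:data} followed by a space.
match = re.search("(?P<data>" + DATE + ") ", line)
print(match.group("data"))
```

Neither alternative can start at 20, because a separator must follow the first one or two digits. The first position that works is 18, read as a day, followed by month 12 and the two-digit year 26, which reproduces the value shown in the result.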