This topic describes the syntax and parameters of the Grok function and provides examples of how to use it.
Description
Regular expression functions are complicated. We recommend that you use the Grok function instead of regular expression functions. For more information, see Regular expression functions. You can use the Grok function together with regular expression functions. Examples:
e_match("content", grok(r"\w+: (%{IP})")) # The Grok pattern matches the abc: 192.168.0.0 or xyz: 192.168.1.1 pattern of log data.
e_match("content", grok(r"\w+: (%{IP})", escape=True)) # The Grok pattern does not match the abc: 192.168.0.0 pattern of log data but matches the \w+: 192.168.0.0 pattern of log data.
The Grok function extracts specified values based on a regular expression.
- Syntax
grok(pattern, escape=False, extend=None)
- Grok syntax
%{SYNTAX} %{SYNTAX:NAME}
In the Grok syntax, SYNTAX specifies a predefined regular expression, and NAME specifies the name of the group. Examples:
"%{IP}" # Equivalent to r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
"%{IP:source_id}" # Equivalent to r"(?P<source_id>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
"(%{IP})" # Equivalent to r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
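To make the equivalences above concrete, the following is a minimal sketch, in plain Python with the standard re module, of how a Grok reference could be expanded into a regular expression. PATTERNS, GROK_REF, and expand() are hypothetical names used only for illustration and are not part of the Grok function. Only the IP expression is taken from the examples above.

```python
import re

# Hypothetical, minimal pattern table; the real Grok library defines many more patterns.
PATTERNS = {
    "IP": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
}

# Matches %{SYNTAX} or %{SYNTAX:NAME}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern):
    """Replace each Grok reference with the regular expression it stands for."""
    def substitute(ref):
        syntax, name = ref.group(1), ref.group(2)
        body = PATTERNS[syntax]
        if name:
            return "(?P<%s>%s)" % (name, body)  # named capturing group
        return "(?:%s)" % body  # non-capturing group
    return GROK_REF.sub(substitute, pattern)


print(expand("%{IP}"))
print(expand("%{IP:source_id}"))
print(re.search(expand("%{IP:source_id}"), "abc: 192.168.0.0").groupdict())
```

The sketch keeps the two cases separate on purpose: a reference without NAME becomes a non-capturing group, and a reference with NAME becomes a named capturing group whose value can be read back by name.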
The Grok function supports the following grouping modes:
- Capturing group mode
Some Grok patterns already contain named capturing groups. You can use only the %{SYNTAX} syntax for these Grok patterns. These Grok patterns are commonly used to parse complete log statements. For more information, see the "Log formats" section in Grok patterns. Examples:
"%{SYSLOGBASE}" "%{COMMONAPACHELOG}" "%{COMBINEDAPACHELOG}" "%{HTTPD20_ERRORLOG}" "%{HTTPD24_ERRORLOG}" "%{HTTPD_ERRORLOG}" ...
- Non-capturing group mode
Some Grok patterns support non-capturing groups. Examples:
"%{INT}" "%{YEAR}" "%{HOUR}" ...
- Parameters
- pattern (String, required): The Grok expression. For more information, see Grok patterns.
- escape (Bool, optional): Specifies whether to escape regular expression special characters in the parts of the pattern that are not Grok references. Default value: False.
- extend (Dict, optional): The custom Grok expressions, specified as a dictionary that maps each custom pattern name to its definition.
Examples
- Example 1: Extract the date and the quoted content.
- Raw log
content: 2019 June 24 "I am iron man"
- Transformation rule
e_regex('content',grok('%{YEAR:year} %{MONTH:month} %{MONTHDAY:day} %{QUOTEDSTRING:motto}'))
- Result
content: 2019 June 24 "I am iron man"
year: 2019
month: June
day: 24
motto: "I am iron man"
- Example 2: Extract an HTTP request log.
- Raw log
content: 10.0.0.0 GET /index.html 15824 0.043
- Transformation rule
e_regex('content',grok('%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}'))
- Result
content: 10.0.0.0 GET /index.html 15824 0.043
client: 10.0.0.0
method: GET
request: /index.html
bytes: 15824
duration: 0.043
- Example 3: Extract an Apache log.
- Raw log
content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
- Transformation rule
e_regex('content',grok('%{COMBINEDAPACHELOG}'))
- Result
content: 127.0.0.1 - - [13/Apr/2015:17:22:03 +0800] "GET /router.php HTTP/1.1" 404 285 "-" "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
clientip: 127.0.0.1
ident: -
auth: -
timestamp: 13/Apr/2015:17:22:03 +0800
verb: GET
request: /router.php
httpversion: 1.1
response: 404
bytes: 285
referrer: "-"
agent: "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
- Example 4: Extract a log in the default syslog format.
- Raw log
content: May 29 16:37:11 sadness logger: hello world
- Transformation rule
e_regex('content',grok('%{SYSLOGBASE} %{DATA:message}'))
- Result
content: May 29 16:37:11 sadness logger: hello world
timestamp: May 29 16:37:11
logsource: sadness
program: logger
message: hello world
- Example 5: Escape special characters.
- Raw log
content: Nov 1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
- Transformation rule
e_regex('content',grok(r'%{SYSLOGBASE} pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}'))
The parentheses () in the log are special characters in regular expressions, so the transformation rule escapes them with backslashes. If you do not want to escape the parentheses manually, set the escape parameter to True. Example:
e_regex('content',grok('%{SYSLOGBASE} pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}: exited on signal %{NUMBER:signal}', escape=True))
- Result
content: Nov 1 21:14:23 scorn kernel: pid 84558 (expect), uid 30206: exited on signal 3
timestamp: Nov 1 21:14:23
logsource: scorn
program: expect
pid: 84558
uid: 30206
signal: 3
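The two transformation rules in this example can be compared with a small sketch in plain Python. It illustrates one plausible reading of escape=True, in which the text between Grok references is escaped so that it matches literally. It is not the implementation of the Grok function. PATTERNS, GROK_REF, and to_regex() are hypothetical names, the NUMBER and WORD definitions are simplified stand-ins, and the log line is shortened to the part after the syslog header.

```python
import re

# Simplified stand-ins for the real Grok definitions (assumptions for illustration).
PATTERNS = {"NUMBER": r"\d+", "WORD": r"\w+"}
GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def to_regex(pattern, escape=False):
    """Expand %{SYNTAX:NAME} references; optionally escape the text between them."""
    parts, last = [], 0
    for ref in GROK_REF.finditer(pattern):
        literal = pattern[last:ref.start()]
        parts.append(re.escape(literal) if escape else literal)
        parts.append("(?P<%s>%s)" % (ref.group(2), PATTERNS[ref.group(1)]))
        last = ref.end()
    tail = pattern[last:]
    parts.append(re.escape(tail) if escape else tail)
    return "".join(parts)


line = "pid 84558 (expect), uid 30206: exited on signal 3"

# Parentheses escaped by hand, escape left at its default.
manual = to_regex(r"pid %{NUMBER:pid} \(%{WORD:program}\), uid %{NUMBER:uid}")
# Parentheses written as-is, escaped automatically.
automatic = to_regex("pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}", escape=True)
# Parentheses written as-is without escaping: they become a regex group instead.
unescaped = to_regex("pid %{NUMBER:pid} (%{WORD:program}), uid %{NUMBER:uid}")

print(re.search(manual, line).groupdict())
print(re.search(automatic, line).groupdict())
print(re.search(unescaped, line))
```

In this sketch the first two expressions extract the same fields, while the third finds no match because the unescaped parentheses are treated as a group rather than as literal characters.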
- Example 6: Extract a log by using a custom Grok expression.
- Raw log
content: Beijing-1104,gary 25 "never quit"
- Transformation rule
e_regex('content',grok('%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',extend={'ID': '%{WORD}-%{INT}'}))
- Result
content: Beijing-1104,gary 25 "never quit"
user_id: Beijing-1104
name: gary
age: 25
motto: "never quit"
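The way a custom expression such as ID builds on other patterns can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation of the extend parameter. BUILTIN, GROK_REF, and expand() are hypothetical names, and the WORD, INT, and QUOTEDSTRING definitions are simplified stand-ins for the real ones.

```python
import re

# Simplified stand-ins for the built-in definitions (assumptions for illustration).
BUILTIN = {"WORD": r"\w+", "INT": r"[+-]?\d+", "QUOTEDSTRING": r'"[^"]*"'}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern, extend=None):
    """Expand Grok references, letting custom definitions add to the built-in ones."""
    table = dict(BUILTIN, **(extend or {}))

    def substitute(ref):
        syntax, name = ref.group(1), ref.group(2)
        # A custom definition may itself reference other patterns, so expand it too.
        body = expand(table[syntax], extend)
        return "(?P<%s>%s)" % (name, body) if name else "(?:%s)" % body

    return GROK_REF.sub(substitute, pattern)


regex = expand(
    '%{ID:user_id},%{WORD:name} %{INT:age} %{QUOTEDSTRING:motto}',
    extend={"ID": "%{WORD}-%{INT}"},
)
fields = re.search(regex, 'Beijing-1104,gary 25 "never quit"').groupdict()
print(fields)
```

With these simplified definitions, the sketch extracts the same four fields as the result above.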
- Example 7: Match JSON data.
- Raw log
content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
- Transformation rule
e_regex('content',grok('%{EXTRACTJSON}'))
- Result
content: 2019-10-29 16:41:39,218 - INFO: owt.AudioFrameConstructor - McsStats: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
json: {"event":"mediaStats","connectionId":"331578616547393100","durationMs":"5000","rtpPackets":"250","rtpBytes":"36945","nackPackets":"0","nackBytes":"0","rtpIntervalAvg":"20","rtpIntervalMax":"104","rtpIntervalVar":"4","rtcpRecvPackets":"0","rtcpRecvBytes":"0","rtcpSendPackets":"1","rtcpSendBytes":"32","frame":"250","frameBytes":"36945","timeStampOutOfOrder":"0","frameIntervalAvg":"20","frameIntervalMax":"104","frameIntervalVar":"4","timeStampIntervalAvg":"960","timeStampIntervalMax":"960","timeStampIntervalVar":"0"}
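The extracted json field is still a single string. As a rough illustration of a possible next step outside the transformation rule, the following sketch parses a shortened copy of that string with the Python standard library. The variable names are hypothetical, and the sample keeps only the first four keys of the log above.

```python
import json

# Shortened copy of the json field from the result above.
extracted = (
    '{"event":"mediaStats","connectionId":"331578616547393100",'
    '"durationMs":"5000","rtpPackets":"250"}'
)

stats = json.loads(extracted)

# Every value in this log is a JSON string, so convert explicitly where a number is needed.
duration_ms = int(stats["durationMs"])
rtp_packets = int(stats["rtpPackets"])

print(stats["event"], duration_ms, rtp_packets)
```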
- Example 8: Parse a log in the World Wide Web Consortium (W3C) format.
- Raw log
content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
- Transformation rule
In the W3C format, fields that have no value are recorded as hyphens (-). Therefore, literal hyphens (-) are used in the Grok pattern to match these fields.
e_regex("content",grok('%{DATE:data} %{TIME:time} %{WORD:s_sitename} %{WORD:s_computername} %{IP:s_ip} %{WORD:cs_method} %{NOTSPACE:cs_uri_stem} - %{NUMBER:s_port} - %{IP:c_ip} %{NOTSPACE:cs_version} - - - - %{NUMBER:sc_status} %{NUMBER:sc_substatus} %{NUMBER:sc_win32_status} %{NUMBER:sc_bytes} %{NUMBER:cs_bytes} %{NUMBER:time_taken}'))
- Result
content: 2018-12-26 00:00:00 W3SVC2 application001 192.168.0.0 HEAD / - 8000 - 10.0.0.0 HTTP/1.0 - - - - 404 0 64 0 19 0
data: 18-12-26
time: 00:00:00
s_sitename: W3SVC2
s_computername: application001
s_ip: 192.168.0.0
cs_method: HEAD
cs_uri_stem: /
s_port: 8000
c_ip: 10.0.0.0
cs_version: HTTP/1.0
sc_status: 404
sc_substatus: 0
sc_win32_status: 64
sc_bytes: 0
cs_bytes: 19
time_taken: 0