nginx access logs record detailed information about each user request. You can parse nginx access logs to monitor and analyze your business. This topic describes how to use regular expressions or the Grok function to parse nginx access logs.
Parsing methods
Simple Log Service allows you to parse nginx logs by using regular expressions or the Grok function.
Use regular expressions.
If you are unfamiliar with regular expressions, writing them to parse logs can be difficult, inefficient, and time-consuming. In that case, we recommend that you use the Grok function instead. For more information about regular expressions, see Regular expressions.
(Recommended) Use the Grok function.
Compared with regular expressions, the Grok function is easier to learn: you only need to know which field types the different Grok patterns match. The Grok function also compares favorably with hand-written regular expressions in flexibility, efficiency, and cost-effectiveness. Simple Log Service supports 400 Grok patterns for data transformation. For more information about Grok patterns, see Grok patterns.
You can combine regular expressions and the Grok function to parse logs.
You can customize regular expressions or the Grok function to parse nginx logs that are in a custom format.
Use regular expressions to parse nginx access logs that contain a success status code
The following example shows how to use regular expressions to parse nginx access logs that contain a success status code.
Raw log entry
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
Requirements
Requirement 1: Extract the code, ip, datetime, protocol, request, sendbytes, referer, useragent, and verb fields from the nginx logs.
Requirement 2: Extract the uri_proto, uri_domain, and uri_param fields from the request field.
Requirement 3: Extract the uri_path and uri_query fields from the uri_param field.
DSL orchestration
General orchestration
"""Step 1: Parse the nginx logs."""
e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
"""Step 2: Parse the request field obtained in Step 1."""
e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
"""Step 3: Parse the uri_param field obtained in Step 2."""
e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?])\?(?P<uri_query>(.+)$)')
Specific orchestration and the transformation results
Orchestration specific to Requirement 1:
e_regex("content",r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] \"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] (?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] ["](?P<useragent>[\S\s]+)["]')
Sub-result
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
refere: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
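The Step 1 expression can be checked outside Simple Log Service. The following is a minimal sketch that uses Python's standard re module rather than the e_regex function itself, so it only confirms which text each named group captures. The variable names content, ACCESS_LOG, and fields are chosen for illustration.

```python
import re

# Value of the content field, copied from the raw log entry.
content = (
    '192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] '
    '"GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" '
    '200 273932 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"'
)

# The Step 1 expression, split across lines only for readability.
ACCESS_LOG = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+)( - - \[)(?P<datetime>[\s\S]+)\] '
    r'\"(?P<verb>[A-Z]+) (?P<request>[\S]*) (?P<protocol>[\S]+)["] '
    r'(?P<code>\d+) (?P<sendbytes>\d+) ["](?P<refere>[\S]*)["] '
    r'["](?P<useragent>[\S\s]+)["]'
)

match = ACCESS_LOG.match(content)
# groupdict() returns only the named groups, one per extracted field.
fields = match.groupdict() if match else {}

for name, value in sorted(fields.items()):
    print(f"{name}: {value}")
```

Note that the expression has no named group for the HTTP version; the version stays inside the protocol field as HTTP/1.1.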
Orchestration specific to Requirement 2 (parse the request field):
e_regex('request',r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)')
Sub-result
uri_param: /_astats?application=&inf.name=eth0
uri_domain: example.aliyundoc.com
uri_proto: http
Orchestration specific to Requirement 3 (parse the uri_param field):
e_regex('uri_param',r'(?P<uri_path>\/\_[a-z]+[^?])\?(?P<uri_query>(.+)$)')
Sub-result
uri_path: /_astats
uri_query: application=&inf.name=eth0
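Steps 2 and 3 form a chain: the uri_param value produced by Step 2 is the input of Step 3. The sketch below reproduces that chain with Python's re module, as an approximation of the data flow and not of the e_regex function. The names REQUEST, URI_PARAM, step2, and step3 are illustrative.

```python
import re

# Step 2 expression: split the request URL into scheme, domain, and the rest.
REQUEST = re.compile(
    r'(?P<uri_proto>(\w+)):\/\/(?P<uri_domain>[a-z0-9.]*[^\/])(?P<uri_param>(.+)$)'
)
# Step 3 expression: split the remainder into path and query string.
URI_PARAM = re.compile(r'(?P<uri_path>\/\_[a-z]+[^?])\?(?P<uri_query>(.+)$)')

request = "http://example.aliyundoc.com/_astats?application=&inf.name=eth0"

step2 = REQUEST.match(request).groupdict()
# The output field of Step 2 is the input field of Step 3.
step3 = URI_PARAM.match(step2["uri_param"]).groupdict()

print(step2)
print(step3)

# The Step 3 expression expects a path that starts with "/_" followed by
# lowercase letters, so a differently shaped path does not match.
no_match = URI_PARAM.match("/index.html?a=1")
print(no_match)
```

Because the Step 3 expression is written for paths shaped like /_astats, logs with other path shapes would need a more general expression.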
Result
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
code: 200
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
datetime: 04/Jan/2019:16:06:38 +0800
ip: 192.168.0.2
protocol: HTTP/1.1
refere: -
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
sendbytes: 273932
uri_domain: example.aliyundoc.com
uri_proto: http
uri_param: /_astats?application=&inf.name=eth0
uri_path: /_astats
uri_query: application=&inf.name=eth0
useragent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)
verb: GET
Use the Grok function to parse nginx access logs that contain a success status code
The following example shows how to use the Grok function to parse nginx access logs that contain a success status code.
Raw log entry
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
Requirements
Requirement 1: Extract the clientip, bytes, agent, auth, verb, request, ident, timestamp, httpversion, response, and referrer fields from the nginx logs.
Requirement 2: Extract the uri_proto, uri_domain, and uri_param fields from the request field.
Requirement 3: Extract the uri_path and uri_query fields from the uri_param field.
DSL orchestration
General orchestration
"""Step 1: Parse the nginx logs."""
e_regex('content',grok('%{COMBINEDAPACHELOG}'))
"""Step 2: Parse the request field obtained in Step 1."""
e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
"""Step 3: Parse the uri_param field obtained in Step 2."""
e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
To use the Grok function to parse the nginx logs, you only need to use the COMBINEDAPACHELOG pattern. It builds on the COMMONAPACHELOG pattern.

Pattern: COMMONAPACHELOG
Rule: %{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
Description: Parses the clientip, ident, auth, timestamp, verb, request, httpversion, response, and bytes fields.

Pattern: COMBINEDAPACHELOG
Rule: %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
Description: Parses all the fields in the COMMONAPACHELOG pattern, and parses the referrer and agent fields.
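Each %{NAME:field} token in a Grok rule stands for a named regular-expression fragment, which is why a single pattern name can replace a long hand-written expression. The Python sketch below illustrates that expansion idea only; it is not the Simple Log Service implementation. The PATTERNS table, the expand helper, and the COMMONLOG and COMBINEDLOG names are hypothetical, and the fragments are deliberately simpler than the real pattern definitions.

```python
import re

# Hypothetical, simplified pattern library; real Grok definitions are stricter.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "HTTPDATE": r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "QS": r'"[^"]*"',
    "COMMONLOG": (
        r'%{IP:clientip} %{NOTSPACE:ident} %{NOTSPACE:auth} '
        r'\[%{HTTPDATE:timestamp}\] '
        r'"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" '
        r'%{NUMBER:response} %{NUMBER:bytes}'
    ),
    "COMBINEDLOG": r"%{COMMONLOG} %{QS:referrer} %{QS:agent}",
}

# Matches %{NAME} and %{NAME:field}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str) -> str:
    """Recursively replace %{NAME} and %{NAME:field} tokens with regex text."""
    def substitute(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name])
        # A token with a field name becomes a named capture group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return TOKEN.sub(substitute, pattern)


content = (
    '192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] '
    '"GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" '
    '200 273932 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"'
)

regex = re.compile(expand("%{COMBINEDLOG}"))
match = regex.match(content)
fields = match.groupdict() if match else {}
print(fields)
```

In this sketch the referrer and agent values keep their surrounding double quotation marks, because the quoted-string fragment includes the quotation marks in its match.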
Specific orchestration and the transformation results
Orchestration specific to Requirement 1:
e_regex('content',grok('%{COMBINEDAPACHELOG}'))
Sub-result
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
auth: -
bytes: 273932
clientip: 192.168.0.2
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
httpversion: 1.1
ident: -
referrer: "-"
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
response: 200
timestamp: 04/Jan/2019:16:06:38 +0800
verb: GET
Orchestration specific to Requirement 2 (parse the request field):
e_regex('request',grok("%{URIPROTO:uri_proto}://(?:%{USER:user}(?::[^@]*)?@)?(?:%{URIHOST:uri_domain})?(?:%{URIPATHPARAM:uri_param})?"))
Sub-result
uri_proto: http
uri_domain: example.aliyundoc.com
uri_param: /_astats?application=&inf.name=eth0
You can use the Grok patterns to parse the request field. The patterns are described below.

Pattern: URIPROTO
Rule: [A-Za-z]+(\+[A-Za-z+]+)?
Description: Matches URI schemes. For example, in http://hostname.domain.tld/_astats?application=&inf.name=eth0, the matched content is http.

Pattern: USER
Rule: [a-zA-Z0-9._-]+
Description: Matches content that contains letters, digits, and the characters ._-

Pattern: URIHOST
Rule: %{IPORHOST}(?::%
Description: Matches IP addresses, hostnames, or positive integers.

Pattern: URIPATHPARAM
Rule: %{URIPATH}(?:%{URIPARAM})?
Description: Matches the uri_param field.
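As a cross-check of the values that the request patterns are expected to produce, the same URL can be split with Python's standard urllib.parse module. This is a different technique from Grok, shown only to confirm the expected field values; the mapping of its attributes onto the uri_proto, uri_domain, and uri_param names is an assumption made for this illustration.

```python
from urllib.parse import urlsplit

request = "http://example.aliyundoc.com/_astats?application=&inf.name=eth0"
parts = urlsplit(request)

# Map the standard URL components onto the field names used in this topic.
fields = {
    "uri_proto": parts.scheme,
    "uri_domain": parts.netloc,
    # uri_param covers the path plus the query string, if any.
    "uri_param": parts.path + ("?" + parts.query if parts.query else ""),
}

for name, value in fields.items():
    print(f"{name}: {value}")
```

For URLs that carry user information or a port, netloc includes those parts as well, so the mapping above would need adjusting before it could be compared with the Grok output.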
Orchestration specific to Requirement 3 (parse the uri_param field):
e_regex('uri_param',grok("%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}"))
Sub-result
uri_path: /_astats
uri_query: application=&inf.name=eth0
The Grok pattern that is used to parse the uri_param field is described below.

Pattern: GREEDYDATA
Rule: .*
Description: Matches zero or more characters that are not line breaks.
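Because the GREEDYDATA rule is a greedy .*, a pattern of the form path, question mark, query splits the input at the last question mark rather than the first. The sketch below shows this with Python's re module and plain .* groups, which is an approximation of the Grok behavior. The second input string and the lazy variant are hypothetical and are included only for contrast.

```python
import re

# Greedy form, equivalent in spirit to two GREEDYDATA captures around "?".
greedy = re.compile(r"(?P<uri_path>.*)\?(?P<uri_query>.*)")
# Lazy form (hypothetical alternative): stops at the first "?".
lazy = re.compile(r"(?P<uri_path>.*?)\?(?P<uri_query>.*)")

sample = "/_astats?application=&inf.name=eth0"
tricky = "/search?q=what?&lang=en"  # hypothetical input with two "?" characters

sample_fields = greedy.match(sample).groupdict()
greedy_fields = greedy.match(tricky).groupdict()
lazy_fields = lazy.match(tricky).groupdict()

print(sample_fields)
print(greedy_fields)
print(lazy_fields)
```

For the sample log the two forms agree, because the uri_param value contains a single question mark.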
Result
__source__: 192.168.0.1
__tag__:__receive_time__: 1563443076
agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
auth: -
bytes: 273932
clientip: 192.168.0.2
content: 192.168.0.2 - - [04/Jan/2019:16:06:38 +0800] "GET http://example.aliyundoc.com/_astats?application=&inf.name=eth0 HTTP/1.1" 200 273932 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.example.com/bot.html)"
httpversion: 1.1
ident: -
referrer: "-"
request: http://example.aliyundoc.com/_astats?application=&inf.name=eth0
response: 200
timestamp: 04/Jan/2019:16:06:38 +0800
uri_domain: example.aliyundoc.com
uri_param: /_astats?application=&inf.name=eth0
uri_path: /_astats
uri_proto: http
uri_query: application=&inf.name=eth0
verb: GET
Use the Grok function to parse nginx access logs that contain an error status code
The following example shows how to use the Grok function to parse nginx access logs that contain an error status code.
Raw log entry
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on a closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, client: 1.2.3.4, server: sls.aliyun.com, request: "GET /favicon.ico HTTP/1.1", host: "sls.aliyun.com", referrer: "https://sls.aliyun.com/question/answer/123.html?from=singlemessage"
Requirement:
Parse the host, http_version, log_level, pid, referrer, request, request_time, server, and verb fields from the content field.
DSL orchestration:
e_regex('content',grok('%{DATESTAMP:request_time} \[%{LOGLEVEL:log_level}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<client>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server})(?:, request: "%{WORD:verb} %{NOTSPACE:request}( HTTP/%{NUMBER:http_version})")(?:, host: "%{HOSTNAME:host}")?(?:, referrer: "%{NOTSPACE:referrer}")?'))
Result
__source__: 192.168.0.1
__tag__:__client_ip__: 192.168.254.254
__tag__:__receive_time__: 1563443076
content: 2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on a closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, client: 1.2.3.4, server: sls.aliyun.com, request: "GET /favicon.ico HTTP/1.1", host: "sls.aliyun.com", referrer: "https://sls.aliyun.com/question/answer/123.html?from=singlemessage"
host: sls.aliyun.com
http_version: 1.1
log_level: error
pid: 1234
referrer: https://sls.aliyun.com/question/answer/123.html?from=singlemessage
request: /favicon.ico
request_time: 19/08/07 16:05:17
server: sls.aliyun.com
verb: GET
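To see how the error-log rule is structured, it can help to write a comparable expression by hand. The following Python sketch is a simplified stand-in for the Grok rule, not an expansion of it: the fragments for the timestamp, log level, and host names are looser approximations chosen for this example, and the name ERROR_LOG is illustrative.

```python
import re

# Value of the content field, copied from the raw log entry.
content = (
    '2019/08/07 16:05:17 [error] 1234#1234: *1234567 attempt to send data on '
    'a closed socket: u:111111ddd, c:0000000000000000, ft:0 eof:0, '
    'client: 1.2.3.4, server: sls.aliyun.com, '
    'request: "GET /favicon.ico HTTP/1.1", host: "sls.aliyun.com", '
    'referrer: "https://sls.aliyun.com/question/answer/123.html?from=singlemessage"'
)

# Hand-written approximation; the trailing host and referrer parts are optional,
# mirroring the optional groups at the end of the Grok rule.
ERROR_LOG = re.compile(
    r'(?P<request_time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<log_level>\w+)\] '
    r'(?P<pid>\d+)#\d+: '
    r'(?P<errormessage>.*)'
    r', client: (?P<client>[^,]+)'
    r', server: (?P<server>[^,]+)'
    r', request: "(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<http_version>[\d.]+)"'
    r'(?:, host: "(?P<host>[^"]+)")?'
    r'(?:, referrer: "(?P<referrer>[^"]+)")?'
)

match = ERROR_LOG.match(content)
fields = match.groupdict() if match else {}

for name, value in sorted(fields.items()):
    print(f"{name}: {value}")
```

This sketch captures the full four-digit year in request_time, whereas the documented result shows 19/08/07 16:05:17. The difference presumably comes from how the DATESTAMP pattern matches dates, so the two timestamp captures should not be treated as equivalent.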