This topic describes the matching modes of regular expressions and the methods that can be used to escape special characters in regular expressions.
Full match
If a regular expression matches an entire string, a full match is performed. For example, \d+ fully matches 1234.
Some functions perform partial matches for regular expressions. To force a full match with these functions, enclose the regular expression in a caret (^) and a dollar sign ($), in the ^Regular expression$ format. For more information, see Regular expression operations.
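The difference between the two modes can be sketched with Python's standard re module. This is only a stand-in for the Simple Log Service functions, on the assumption that they follow Python-style regular expression semantics. The helper names partial_match and full_match are hypothetical and exist only for this illustration.

```python
import re


def partial_match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


def full_match(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# The digits occur somewhere in the text, so a partial match succeeds.
print(partial_match(r"\d+", "abc123"))
# The leading letters prevent a full match.
print(full_match(r"\d+", "abc123"))
# Anchoring with ^ and $ turns a partial-match call into a full match.
print(partial_match(r"^\d+$", "abc123"))
# The whole string consists of digits, so a full match succeeds.
print(full_match(r"\d+", "1234"))
```

Anchoring is useful when a function offers no parameter for switching modes: the same pattern can be made strict without changing the call.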
The following table describes the matching modes for different functions.
Category | Function | Matching mode |
Global processing functions | | Partial match |
| | Full match |
| | Full match |
| | Full match |
| | Partial match |
| | Partial match |
| | Partial match |
Expression functions | | Full match by default (configurable by using a parameter) |
| | Partial match |
| | Partial match |
| | Partial match |
| | Partial match by default (configurable by using a parameter) |
| | Partial match |
| | Partial match |
The following examples are based on different matching modes:
regex_match("abc123", r"\d+"): The string matches the regular expression. In this example, the default matching mode, partial match, is used.
regex_match("abc123", r"\d+", full=True): The string does not match the regular expression. In this example, the matching mode is set to full match.
regex_match("abc123", r"^\d+$"): The string does not match the regular expression. In this example, the caret (^) and dollar sign ($) make the match equivalent to a full match.
e_search(r'status~="\d+"'): Whether the status field matches the regular expression depends on the actual value of the field. In this example, a partial match is performed.
e_search(r'status~="^\d+$"'): Whether the status field matches the regular expression depends on the actual value of the field. In this example, the caret (^) and dollar sign ($) make the match equivalent to a full match.
Character escape
Regular expressions may contain special characters. If you want to retain the literal meanings of the characters, you must escape the characters. You can use the following methods to escape special characters:
Use backslashes (\). For more information, see Escape special characters.
Use the str_regex_escape function.
Example 1: If you use e_drop_fields(str_regex_escape("abc.test")), only the abc.test field is discarded.
Example 2: If you use e_drop_fields("abc.test"), all fields whose names match the regular expression abc.test are discarded, for example, abc.test and abcXtest. The unescaped period (.) matches any single character.
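The effect of escaping can be illustrated with Python's re.escape, used here as a stand-in for str_regex_escape on the assumption that the two behave similarly for this input. The field names in the list are made up for the illustration.

```python
import re

# Hypothetical field names used only for this illustration.
field_names = ["abc.test", "abcXtest", "abc_test"]

unescaped = "abc.test"           # the period matches any single character
escaped = re.escape("abc.test")  # the period is matched literally

# Full matches against each field name, first unescaped, then escaped.
matched_unescaped = [n for n in field_names if re.fullmatch(unescaped, n)]
matched_escaped = [n for n in field_names if re.fullmatch(escaped, n)]

print(escaped)
print(matched_unescaped)
print(matched_escaped)
```

Escaping the whole string programmatically is less error-prone than inserting backslashes by hand when a field name contains several special characters.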
Group
You can use parentheses () to enclose subexpressions in a regular expression to create a group. The group can be repeatedly referenced. The following example shows the difference between a regular expression before and after a group is created:
"""
Log before processing:
SourceIP: 192.0.2.1
Log after processing:
SourceIP: 192.0.2.1
ip: 192.0.2.1
"""
# Before a group is created:
e_regex("SourceIP",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}","ip")
# After a group is created:
e_regex("SourceIP", r"\d{1,3}(\.\d{1,3}){3}", "ip")
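As a quick check of the two forms, the following sketch uses Python's re module (a stand-in, not the e_regex function itself) to confirm that the grouped pattern, written here as a raw string with the period escaped, matches the same address as the expanded pattern.

```python
import re

# Expanded form: the ".\d{1,3}" part is written out three times.
without_group = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
# Grouped form: the repeated part is enclosed in parentheses and quantified.
with_group = r"\d{1,3}(\.\d{1,3}){3}"

text = "192.0.2.1"
m1 = re.fullmatch(without_group, text)
m2 = re.fullmatch(with_group, text)

# Both patterns match the whole address.
print(m1.group(0), m2.group(0))
# In Python, a repeated capturing group keeps only its last repetition.
print(m2.group(1))
```

Note that in Python the repeated group retains only its final repetition. If only the whole match is needed, a non-capturing group avoids storing that partial value.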
Capturing group
The text content that matches a capturing group is stored in memory. The matched text content can be reused in other regular expressions by using backreferences. If the content that is enclosed in the parentheses () of a group does not start with ?:, the group is a capturing group.
By default, all capturing groups are numbered from left to right based on an opening parenthesis. The first group is numbered 1, the second group is numbered 2, and so on. In the following example, three capturing groups are created:
(\d{4})-(\d{2}-(\d{2}))
1     1 2      3     32
If a regular expression contains both common capturing groups and named capturing groups, the named capturing groups are numbered after the common capturing groups. Simple Log Service allows you to directly reference the custom name of a capturing group in regular expressions or programs.
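The numbering and referencing rules can be tried out with Python's re module, used here as a stand-in for the Simple Log Service functions. The sample date, the sample sentence, and the group names year and month are assumptions made for the illustration.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
date_pattern = r"(\d{4})-(\d{2}-(\d{2}))"
m = re.fullmatch(date_pattern, "2024-05-17")
print(m.group(1))  # group 1: the year
print(m.group(2))  # group 2: the month and day
print(m.group(3))  # group 3: the day, nested inside group 2

# The backreference \1 reuses the text captured by group 1,
# which finds a word that is immediately repeated.
repeated_word = re.search(r"\b(\w+) \1\b", "this is is a test")
print(repeated_word.group(1))

# Named capturing groups can be referenced by name.
named = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05")
print(named.group("year"), named.group("month"))
```

Referencing groups by name keeps a program readable when a pattern changes, because inserting a new group does not shift the names.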
Non-capturing group
The text content that matches a non-capturing group is not stored in memory. If the content that is enclosed in the parentheses () of a group starts with ?:, the group is a non-capturing group.
For example, if you want to search for program and project, you can use the pro(gram|ject) regular expression. If you do not want to store the content that matches the group, you can use pro(?:gram|ject).
(?:x) matches x but does not store the matched content. You can define a subexpression in the (?:x) format and apply operators, such as quantifiers, to the subexpression as a whole.
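The difference between the two kinds of groups is visible in Python's re module, used here as a stand-in for the Simple Log Service functions. Both patterns match the same words, but only the capturing form stores the text matched inside the parentheses.

```python
import re

capturing = re.compile(r"pro(gram|ject)")
non_capturing = re.compile(r"pro(?:gram|ject)")

# Both patterns match the same words.
for word in ("program", "project"):
    print(word, bool(capturing.fullmatch(word)), bool(non_capturing.fullmatch(word)))

# Number of capturing groups defined by each pattern.
print(capturing.groups, non_capturing.groups)

# Only the capturing form stores the text matched inside the parentheses.
print(capturing.search("project").groups())
print(non_capturing.search("project").groups())

# A quantifier applies to a non-capturing group as a whole.
print(re.fullmatch(r"(?:ab)+", "ababab") is not None)
```

Non-capturing groups are a reasonable default when parentheses are needed only for alternation or repetition, because they leave the numbering of the remaining capturing groups unchanged.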