Regular expressions


This topic describes the matching modes of regular expressions and the methods that can be used to escape special characters in regular expressions.

Full match

If a regular expression matches an entire string, a full match is performed. For example, \d+ fully matches 1234.

Some functions perform partial matches by default. To force a full match with such a function, anchor the regular expression with a caret (^) at the start and a dollar sign ($) at the end, in the ^Regular expression$ format. For more information, see Regular expression operations.

The following table describes the matching modes for different functions.

Category	Function	Matching mode
Global processing functions	e_regex	Partial match
	e_keep_fields	Full match
	e_drop_fields	Full match
	e_rename	Full match
	e_kv	Partial match
Expression functions	e_match	Full match by default (configurable by using a parameter)
	e_search	Partial match
	regex_select	Partial match
	regex_findall	Partial match
	regex_match	Partial match by default (configurable by using a parameter)
	regex_replace	Partial match
	regex_split	Partial match

The following examples are based on different matching modes:

regex_match("abc123", r"\d+"): The string matches the regular expression. The default partial match mode only requires that part of the string (123) matches.
regex_match("abc123", r"\d+", full=True): The string does not match the regular expression, because full=True requires the entire string to match.
regex_match("abc123", r"^\d+$"): The string does not match the regular expression. The ^ and $ anchors make the partial match behave like a full match.
e_search(r'status~="\d+"'): The result depends on the value of the status field. The value matches if any part of it consists of digits (partial match).
e_search(r'status~="^\d+$"'): The result depends on the value of the status field. The value matches only if it consists entirely of digits, because the ^ and $ anchors make the partial match behave like a full match.
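
The same distinction can be reproduced outside Simple Log Service with Python's standard re module. The following is only an illustrative sketch: it assumes that the functions above follow Python-style regular expression semantics, and the helper names partial_match and full_match are hypothetical stand-ins, not Simple Log Service functions.

```python
import re


def partial_match(value, pattern):
    """Return True if the pattern matches anywhere in the value."""
    return re.search(pattern, value) is not None


def full_match(value, pattern):
    """Return True only if the pattern matches the entire value."""
    return re.fullmatch(pattern, value) is not None


print(partial_match("abc123", r"\d+"))    # True: "123" is found inside the string
print(full_match("abc123", r"\d+"))       # False: "abc" is not covered by \d+
print(partial_match("abc123", r"^\d+$"))  # False: the anchors force a full match
print(full_match("1234", r"\d+"))         # True: the whole string consists of digits
```

Anchoring the pattern with ^ and $ is the portable option when a function offers no parameter for switching to full match.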

Character escape

Regular expressions may contain special characters. If you want to retain the literal meanings of the characters, you must escape the characters. You can use the following methods to escape special characters:

Use backslashes (\).
For more information, see Escape special characters.
Use the str_regex_escape function.
- Example 1: If you use e_drop_fields(str_regex_escape("abc.test")), only the abc.test field is discarded.
- Example 2: If you use e_drop_fields("abc.test"), every field whose name is abc, followed by any single character, followed by test is discarded (for example, abc.test and abcXtest). This is because an unescaped period (.) matches any character.
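
The effect of escaping can be checked with Python's re module, where re.escape plays a role comparable to str_regex_escape. This is a sketch under the assumption that both functions escape a period in the same way. The field names and the matching_fields helper are made up for illustration.

```python
import re

# Hypothetical field names used only for this illustration.
field_names = ["abc.test", "abcXtest", "abc_test", "abctest"]


def matching_fields(pattern):
    """Return the field names that the pattern matches in full."""
    return [name for name in field_names if re.fullmatch(pattern, name)]


# Unescaped: the period matches any single character.
unescaped = matching_fields("abc.test")
# Escaped: the period matches only a literal period.
escaped = matching_fields(re.escape("abc.test"))

print(unescaped)  # ['abc.test', 'abcXtest', 'abc_test']
print(escaped)    # ['abc.test']
```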

Group

You can use parentheses () to enclose a subexpression in a regular expression to create a group. A group is treated as a single unit, so you can apply a quantifier to it to repeat the whole subexpression. The following example shows the same regular expression before and after a group is created:

"""
Log before processing:
SourceIP: 192.0.2.1
Log after processing:
SourceIP: 192.0.2.1
ip: 192.0.2.1
"""
# Before a group is created:
e_regex("SourceIP",r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}","ip")
# After a group is created:
e_regex("SourceIP", r"\d{1,3}(\.\d{1,3}){3}", "ip")
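
As a quick check that the grouped form matches the same text as the long form, both patterns can be run through Python's re module. This sketch only demonstrates the patterns themselves under Python semantics. It does not call e_regex and makes no claim about how e_regex assigns group contents to fields.

```python
import re

# The period is escaped in both patterns so that it matches a literal dot.
long_form = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
grouped = r"\d{1,3}(\.\d{1,3}){3}"

value = "192.0.2.1"
m_long = re.fullmatch(long_form, value)
m_grouped = re.fullmatch(grouped, value)

print(m_long.group(0))     # 192.0.2.1
print(m_grouped.group(0))  # 192.0.2.1
# In Python, a repeated group keeps only its last repetition.
print(m_grouped.group(1))  # .1
```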

Capturing group

The text that matches a capturing group is cached in memory and can be reused by using backreferences. A group is a capturing group if the content enclosed in its parentheses () does not start with ?:.

By default, all capturing groups are numbered from left to right based on an opening parenthesis. The first group is numbered 1, the second group is numbered 2, and so on. In the following example, three capturing groups are created:

(\d{4})-(\d{2}-(\d{2}))
1     1 2      3     32

If a regular expression contains both unnamed and named capturing groups, the named capturing groups are numbered after the unnamed capturing groups. Simple Log Service lets you reference a named capturing group directly by its name in regular expressions and in programs.
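
The left-to-right numbering can be observed with Python's re module, as in the following sketch. The sample date strings and the (?P<name>...) syntax for named groups are assumptions taken from Python, not from this topic. The example deliberately avoids mixing named and unnamed groups in one pattern.

```python
import re

# Three capturing groups, numbered by the position of their opening parenthesis.
pattern = r"(\d{4})-(\d{2}-(\d{2}))"
m = re.fullmatch(pattern, "2024-05-17")
print(m.group(1))  # 2024
print(m.group(2))  # 05-17
print(m.group(3))  # 17

# A named group can be referenced by its name (Python syntax).
named = re.fullmatch(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05")
print(named.group("year"))  # 2024

# The backreference \1 reuses the text captured by group 1.
repeated = re.search(r"(\w+) \1", "hello hello world")
print(repeated.group(0))  # hello hello
```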

Non-capturing group

The text that matches a non-capturing group is not cached in memory. A group is a non-capturing group if the content enclosed in its parentheses () starts with ?:.

For example, to search for program and project, you can use the pro(gram|ject) regular expression. If you do not need to cache the text that matches the group, you can use pro(?:gram|ject) instead.

Note

(?:x) matches x but does not cache the matched content. You can use the (?:x) format to define a subexpression and then apply operators, such as quantifiers or alternation, to the subexpression as a unit.
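
The practical difference between the two group types is visible in what a match exposes afterwards. The following sketch uses Python's re module and a made-up sample string. It assumes Python semantics and is not specific to Simple Log Service.

```python
import re

text = "program and project"

# With a capturing group, findall returns only the captured part.
capturing = re.findall(r"pro(gram|ject)", text)
# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"pro(?:gram|ject)", text)
print(capturing)      # ['gram', 'ject']
print(non_capturing)  # ['program', 'project']

# The non-capturing form leaves no numbered groups behind.
capturing_groups = re.search(r"pro(gram|ject)", text).groups()
non_capturing_groups = re.search(r"pro(?:gram|ject)", text).groups()
print(capturing_groups)      # ('gram',)
print(non_capturing_groups)  # ()
```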
