All Products
Search
Document Center

Simple Log Service:Regular expression functions

Last Updated:Dec 20, 2024

This topic describes the syntax of regular expression functions and provides examples of their usage.

Simple Log Service supports the following regular expression functions using RE2 syntax.

Important If you want to use strings in analytic statements, you must enclose strings in single quotation marks (''). Strings that are not enclosed or enclosed in double quotation marks ("") indicate field names or column names. For example, 'status' indicates the status string, and status or "status" indicates the status log field.

Function Name

Syntax

Description

Supported in SQL

Supported in SPL

regexp_extract_all function

regexp_extract_all(x, regular expression)

Extracts substrings from a target string that match the given regular expression and returns a collection of all such substrings.

×

regexp_extract_all(x, regular expression, n)

Extracts substrings from a target string that match the given regular expression and returns a collection of substrings that match the target capturing group.

×

regexp_extract function

regexp_extract(x, regular expression)

Extracts and returns the first substring from a target string that matches the given regular expression.

regexp_extract(x, regular expression, n)

Extracts substrings from a target string that match the given regular expression and returns the first substring that matches the target capturing group.

regexp_extract_bool function

regexp_extract_bool(x, regular expression)

Extracts substrings from a target string that match a given regular expression and converts them to Boolean type. If the conversion fails, null is returned.

×

regexp_extract_bool(x, regular expression, n)

Extracts substrings from a target string that match a given regular expression and returns the substrings corresponding to the nth capturing group, converting them to Boolean type. If the conversion fails, null is returned.

×

regexp_extract_long function

regexp_extract_long(x, regular expression)

Extracts substrings from a target string that match a given regular expression and converts them to bigint type. If the conversion fails, null is returned.

×

regexp_extract_long(x, regular expression, n)

Extracts substrings from a target string that match a given regular expression and returns the substring corresponding to the nth capturing group, converting it to bigint type. If the conversion fails, null is returned.

×

regexp_extract_double function

regexp_extract_double(x, regular expression)

Extracts the first substring from a target string that matches a given regular expression and converts it to double type. If the conversion fails, null is returned.

×

regexp_extract_double(x, regular expression, n)

Extracts substrings from a target string that match a given regular expression and returns the substring corresponding to the nth capturing group, converting it to double type. If the conversion fails, null is returned.

×

regexp_like function

regexp_like(x, regular expression)

Determines if a target string matches a specific regular expression.

regexp_replace function

regexp_replace(x, regular expression)

Deletes substrings from a target string that match a given regular expression and returns the remaining substrings.

regexp_replace(x, regular expression, replace string)

Replaces substrings in a target string that match a given regular expression with the replacement string, returning the modified string.

regexp_split function

regexp_split(x, regular expression)

Splits a target string into a collection of substrings using the given regular expression as the delimiter.

×

Note

To use a regular expression function for extracting single quotation marks (') from a string, you need to include an additional single quotation mark (') in the regular expression. For specific examples, see the regexp_extract function (Example 3).

regexp_extract_all function

The regexp_extract_all function retrieves all substrings from a given string that match a specified regular expression.

Syntax

  • Returns an array of substrings that match a specified regular expression within a given string.

    regexp_extract_all(x, regular expression)
  • Returns an array of substrings from a given string that match the nth capturing group in the specified regular expression.

    regexp_extract_all(x, regular expression, n)

Parameter description

Parameter

Description

x

The parameter value is of varchar type.

regular expression

The regular expression with capturing groups, such as (\d)(\d)(\d) representing three capturing groups.

n

The nth capturing group, where n is an integer starting from 1.

Return value type

An array.

Examples

  • Example 1: Extract all digits from the server_protocol field value.

    • Sample field

      server_protocol:HTTP/2.0
    • Query statement (Debug)

      *| SELECT regexp_extract_all(server_protocol, '\d+')
    • Query resultregexp_extract_all

  • Example 2: Extract the "Chrome" portion from the http_user_agent field value and count the number of Chrome browser requests.

    • Sample field

      http_user_agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.803.0 Safari/535.1
    • Query statement (Debug)

      *| SELECT regexp_extract_all(http_user_agent, '(Chrome)',1) AS Chrome, count(*) AS count GROUP BY Chrome
    • Query resultregexp_extract_all

regexp_extract function

The regexp_extract function is designed to extract substrings from a specified string that match a given regular expression.

Syntax

  • This function retrieves the first substring that matches the specified regular expression from the input string and returns it.

    regexp_extract(x, regular expression)
  • It also extracts substrings that match the specified regular expression from the input string and returns the first substring corresponding to the nth capturing group in the expression.

    regexp_extract(x, regular expression, n)

Parameter description

Parameter

Description

x

The parameter value is of varchar type.

regular expression

A regular expression that includes capturing groups, such as (\d)(\d)(\d), which indicates three capturing groups.

n

The nth capturing group, where n is an integer starting from 1.

Return value type

The return value is of the varchar type.

Examples

SQL

  • Example 1: Extract the first digit from the server_protocol field value.

    • Sample field

      server_protocol:HTTP/2.0
    • Query statement (test)

      *|SELECT regexp_extract(server_protocol, '\d+')
    • Query resultregexp_extract

  • Example 2: Extract the file part from the request_uri field value and count the access frequency for each file.

    • Sample field

      request_uri:/request/path-3/file-5
    • Query statement (test)

      * | SELECT regexp_extract(request_uri, '.*\/(file.*)', 1) AS file, count(*) AS count GROUP BY file
    • Query result分析uri

  • Example 3: Extract the single quotation mark (') and digit part from the message field value.

    • Sample field

      message:error'1232
    • Query statement

      * | SELECT regexp_extract(message, '''\d+') 
      Note

      To extract single quotation marks (') using a regular expression function, you must escape them by adding an additional single quotation mark (').

    • Query resultregexp_extract函数

SPL

  • Example 1: Extract the first digit from the server_protocol field value.

    • Sample field

server_protocol:HTTP/2.0
  • Query statement

* | extend a = regexp_extract(server_protocol, '\d+')
  • Query result

image.png

  • Example 2: Extract the file part from the request_uri field value.

    • Sample field

request_uri:/request/path-3/file-5
  • Query statement

* | extend a = regexp_extract(request_uri, '.*\/(file.*)',1)
  • Query result

image.png

  • Example 3: Extract the single quotation mark (') and digit part from the message field value.

    • Sample field

message:error'1232
  • Query statement

* | extend a = regexp_extract(message, '''\d+') 

Description

To extract single quotation marks (') using a regular expression function, you must escape them by adding an additional single quotation mark (').

  • Query result

image.png

regexp_extract_bool function

The regexp_extract_bool function extracts substrings that match a specified regular expression from a given string and converts them to Boolean type. If the conversion fails, it returns null. The conversion succeeds only if the substring is "true" or "false," regardless of case sensitivity.

Syntax

  • Extracts substrings matching a specified regular expression from a given string and converts them to Boolean type. If the conversion fails, it returns null.

    regexp_extract_bool(x, regular expression)
  • Extracts substrings matching a specified regular expression from a given string, returns the substring that matches the specified capturing group, and converts it to Boolean type. If the conversion fails, it returns null.

    regexp_extract_bool(x, regular expression, n)

Parameter description

Parameter

Description

x

The parameter value is of the varchar type.

regular expression

A regular expression that includes capturing groups. For instance, (\d)(\d)(\d) represents three capturing groups.

n

The nth capturing group, where n is an integer starting from 1.

Return value type

Boolean type.

Examples

  • Extract the Boolean value from the field value.

    • Sample Field

      false 
    • Query Statement (test)

      *| select regexp_extract_bool('false', '[a-zA-Z]+')
    • Query and Analysis Results

      image

regexp_extract_long function

The regexp_extract_long function retrieves substrings from a given string that match a specified regular expression and converts them into the bigint type. Should the conversion be unsuccessful, it returns null.

Syntax

  • Extracts substrings matching a specified regular expression from a given string and converts them into the bigint type. Should the conversion fail, it returns null.

    regexp_extract_long(x, regular expression)
  • Extracts substrings matching a specified regular expression from a given string, returns those corresponding to the nth capturing group, and converts them into the bigint type. Should the conversion fail, it returns null.

    regexp_extract_long(x, regular expression, n)

Parameter description

Parameter

Description

x

The parameter value is of varchar type.

regular expression

The regular expression includes capturing groups. For instance, (\d)(\d)(\d) denotes three capturing groups.

n

The nth capturing group. n is an integer starting from 1.

Return value type

The return value is of bigint type.

Examples

  • Extract digits from the time field.

    • Sample field

      time:19/Dec/2024:06:16:06
    • Query statement and analysis (test)

      *|SELECT regexp_extract_long(time, '(\d{2})/', 1) 
    • Query and analysis results

      image

regexp_extract_double function

The regexp_extract_double function extracts substrings that match a specified regular expression from a given string and converts them to the double type. If the conversion is unsuccessful, it returns null.

Syntax

  • Extracts substrings matching a specified regular expression from a given string and converts them to the double type. If the conversion is unsuccessful, it returns null.

    regexp_extract_double(x, regular expression)
  • Extracts substrings matching a specified regular expression from a given string, returns the substring that matches the nth capturing group, and converts it to the double type. If the conversion is unsuccessful, it returns null.

    regexp_extract_double(x, regular expression, n)

Parameter description

Parameter

Description

x

The parameter value is of varchar type.

regular expression

The regular expression that includes capturing groups. For instance, (\d)(\d)(\d) denotes three capturing groups.

n

The nth capturing group, where n is an integer starting from 1.

Return value type

The type returned is double.

Examples

  • Extract digits from the server_protocol field.

    • Sample Field

      server_protocol:HTTP/1.1
    • Query Statement and Analysis (test)

      *|SELECT regexp_extract_double(server_protocol, '\d+') 
    • Query and Analysis Results

      image

Regexp_like function

The regexp_like function determines if a given string matches a specific regular expression.

Syntax

regexp_like(x, regular expression)

Parameter description

Parameter

Description

x

This parameter is a varchar type value.

regular expression

Regular expression.

Return value type

Boolean.

Examples

SQL

Determine if the server_protocol field contains any digits.

  • Sample Field

    server_protocol:HTTP/2.0
  • Query Statement and Analysis (debug)

    *| select regexp_like(server_protocol, '\d+')
  • Query and Analysis Resultsregexp_like

SPL

Verify if the server_protocol field includes digits.

  • Sample Field

server_protocol:HTTP/2.0
  • Query Statement and Analysis

* |extend a = regexp_like(server_protocol, '\d+')
  • Query and Analysis Results

image.png

regexp_replace function

The regexp_replace function deletes or replaces substrings in a given string that match a specified regular expression.

Syntax

  • This function removes substrings that match a specified regular expression from a given string and returns the remaining substrings.

    regexp_replace(x, regular expression)
  • It replaces substrings that match a specified regular expression in a given string and returns the modified string.

    regexp_replace(x, regular expression, replace string)

Parameter description

Parameter

Description

x

The parameter value is of varchar type.

regular expression

Regular expression.

replace string

The substring used for replacement.

Return value type

The return value is of the varchar type.

Examples

SQL

  • Example 1: Replace all region names starting with cn in the region field with China and count the number of requests from China.

    • Sample fields

      region:cn-shanghai
    • Query statements (debug)

      * | select regexp_replace(region, 'cn.*','中国') AS region, count(*) AS count GROUP BY region
    • Query resultsregexp_replace

  • Example 2: Delete the version number from the server_protocol field's value and count the number of requests for each communication protocol.

    • Sample fields

      server_protocol:HTTP/2.0
    • Query statements (debug)

      *| select regexp_replace(server_protocol, '.\d+') AS server_protocol, count(*) AS count GROUP BY server_protocol
    • Query resultsregexp_replace

SPL

  • Example 1: Replace all region names starting with cn in the region field with China.

    • Sample fields

region:cn-shanghai
  • Query statements

* | extend a = regexp_replace(region, 'cn.*','中国')
  • Query results

    image.png

  • Example 2: Delete the version number from the server_protocol field's value.

    • Sample fields

server_protocol:HTTP/2.0
  • Query statements (debug)

* | extend a = regexp_replace(server_protocol, '.\d+')
  • Query results

image.png

regexp_split function

The regexp_split function divides a given string into an array of substrings based on a specified regular expression.

Syntax

regexp_split(x, regular expression)

Parameter description

Parameter

Description

x

The value of this parameter is of the varchar type.

regular expression

Regular expression.

Return value type

An array.

Example

Split the request_uri field value using a forward slash (/).

  • Sample Field

    request_uri:/request/path-0/file-7
  • Query Statement (test)

    * | SELECT regexp_split(request_uri,'/')
  • Query and Analysis Resultsregexp_split