When you process a large amount of text data, you may need to calculate the number of occurrences of a specific word or phrase. MaxCompute allows you to use the REGEXP_COUNT function to calculate the number of occurrences of a specified substring in a string. This topic describes the syntax of the REGEXP_COUNT function and provides examples on how to use the function.
Syntax
Returns the number of substrings that match a specified pattern
in the source
string from the start position specified by start_position
.
bigint regexp_count(string <source>, string <pattern>[, bigint <start_position>])
Parameters
source: required. A value of the STRING type. This parameter specifies the string that contains the substring you want to search for. If the value is not a string, an error is returned.
pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern that a specified substring must match. For more information about regular expressions, see Regular expressions. If pattern is an empty string or is of another data type, an error is returned.
start_position: optional. A constant of the BIGINT type. The value of this parameter must be greater than 0. If the value is of another data type or is less than or equal to 0, an error is returned. If you do not specify this parameter, the default value is 1. This value indicates that the search starts from the first character of the source string.
Return value
A value of the BIGINT type is returned. The return value varies based on the following rules:
If no substring is matched, 0 is returned.
If the value of source, pattern, or start_position is null, null is returned.
Examples
Example 1: Calculate the number of substrings that match a specified pattern in
abababc
from the specified position. Sample statements:Calculate the number of substrings that match the regular expression
a.c
inabababc
. The regular expression specifies substrings that start witha
, end withc
, and contain one character between a and c.-- The return value is 1. SELECT regexp_count('abababc', 'a.c');
Calculate the number of substrings that match the regular expression
[[:alpha:]]{2}
inabababc
from the third character. The regular expression specifies two consecutive letters.-- The return value is 2. SELECT regexp_count('abababc', '[[:alpha:]]{2}', 3);
Example 2: An input parameter is set to null. Sample statement:
-- The return value is null. SELECT regexp_count('abababc', null);
Example 3: Calculate the number of
colons (:)
in the JSON string{"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}
. Sample statement:-- The return value is 4. SELECT regexp_count('{"account_id":123456789,"account_name":"allen","location":"hangzhou","bill":100}',':');
Related functions
REGEXP_COUNT is a string function. For more information about functions related to string searches and conversion, see String functions.