RLIKE is an operator that is provided by MaxCompute SQL based on the Perl Compatible Regular Expressions (PCRE) standards. You can use the RLIKE operator to perform more accurate and complex pattern matching or replacement in MaxCompute SQL. This topic describes metacharacters supported by RLIKE, POSIX character groups, Chinese character groups, and escape characters. This topic also provides examples of the related characters.
Metacharacter
The RLIKE operator supports various metacharacters. The following table describes commonly used metacharacters that are supported by RLIKE.
Metacharacter | Description |
^ | Matches the beginning of a string. |
$ | Matches the end of a string. |
. | Matches any single character. |
* | Matches the preceding character or character pattern zero or more times. |
+ | Matches the preceding character or character pattern one or more times. |
? | Matches the preceding character or character pattern zero or one time. |
? | Matches a modifier. If this character follows one of other characters (*, +, ?, {n}, {n,}, or {n,m}), the match pattern is non-greedy. In the non-greedy algorithm, a character string matches as few characters as possible. In the greedy algorithm, a character string matches as many characters as possible. By default, the greedy algorithm is used. |
A|B | Matches A or B. |
(abc)* | Matches the abc sequence zero or more times. |
{n} or {m,n} | The number of matches. |
[ab] | Matches any character in the brackets. Fuzzy match is used. |
[a-d] | Matches one of the following characters: a, b, c, and d. |
[^ab] | Matches any character except those in the brackets. |
[::] | For more information, see POSIX character groups. |
\ | The escape character. For more information, see Escape characters. |
\n | n is a digit ranging from 1 to 9 and indicates back reference. |
\d | A digit. |
\D | A non-digit character. |
For more information about regular expressions, visit pcre2syntax man page.
If the matching result does not meet your expectations, escape characters may be required. For more information, see Escape characters.
Examples
Example 1: Match the beginning and end of a string.
select 'aa123bb' rlike '^a'; -- Match the beginning of the string aa123bb with a. true is returned. select 'aa123bb' rlike '^a.*b$'; -- Match the beginning of the string aa123bb with a and the end of the string aa123bb with b. true is returned. select 'footerbar' rlike 'foo(.*?)(bar)'; -- true is returned.
Example 2: Use an escape character to match a string that has special characters.
select 'a+b' rlike 'a\\\+b'; -- true is returned.
Example 3: Match any single character.
select 'cc123bb' rlike '^[a-d]'; -- Match the beginning of the string cc123bb with one of the characters from a to d. true is returned.
Example 4: Match any digit and any non-digit character.
select '123bb' rlike '^\\\d'; -- Match the beginning of the string 123bb with any digit. true is returned. select 'cc123bb' rlike '^\\\D'; -- Match the beginning of the string cc123bb with any non-digit character. true is returned.
Character groups
POSIX character groups
Character group | Description | Valid value |
[[:alnum:]] | Letters and digits | [a-zA-Z0-9] |
[[:alpha:]] | Letters | [a-zA-Z] |
[[:ascii:]] | ASCII characters | [\x00-\x7F] |
[[:blank:]] | Spaces and tab characters | [ \t] |
[[:cntrl:]] | Control characters | [\x00-\x1F\x7F] |
[[:digit:]] | Digits | [0-9] |
[[:graph:]] | Characters other than whitespace characters | [\x21-\x7E] |
[[:lower:]] | Lowercase letters | [a-z] |
[[:print:]] | [:graph:] and whitespace characters | [\x20-\x7E] |
[[:punct:]] | Punctuations | [][!”#$%&’()*+,./:;<=>? @\^_`{|}~-] |
[[:space:]] | Whitespace characters | [ \t\r\n\v\f] |
[[:upper:]] | Uppercase letters | [A-Z] |
[[:xdigit:]] | Hexadecimal characters | [A-Fa-f0-9] |
MaxCompute does not support the \f
and \v
escape characters. For more information about the escape characters that are supported by MaxCompute, see Escape characters.
Chinese character groups
Character group | Valid value |
Double-byte characters such as Chinese characters | [^\\x{00}-\\x{ff}] |
Chinese characters | [\\x{4e00}-\\x{9fa5}] |
Chinese punctuations | Chinese punctuations do not have unified Unicode range. You can use a search engine to search for the Unicode of a Chinese punctuation and use operators to match Chinese punctuations. For example, the Chinese period (。) is expressed as Note If you want to match Chinese single quotation marks (’), you can use |
Escape characters
RLIKE allows you to use backslashes (\
) as escape characters. Therefore, backslashes (\
) in regular expressions must also be escaped.
Examples
Example 1
Use a regular expression to match the
a+b
string. The plus sign (+
) in the expression is a special character in the expression and must be escaped. In the regular expression engine, the string is expressed asa\\+b
. The expression must be escaped again. Therefore, the expression that can match the string isa\\\+b
.select 'a+b' rlike 'a\\\+b'; +------+ | _c1 | +------+ | true | +------+
Example 2
Match the backslash
(\)
character. In specific cases, The backslash(\)
character is a special character in the regular expression engine and must be expressed as\\
in the engine. Then, the expression must be escaped again. As a result, the backslash (\) character is expressed as\\\\
.select 'a\\b', 'a\\b' rlike 'a\\\b'; +-----+------+ | _c0 | _c1 | +-----+------+ | a\b | false | +-----+------+ select 'a\\b', 'a\\b' rlike 'a\\\\b'; +-----+------+ | _c0 | _c1 | +-----+------+ | a\b | true | +-----+------+
NoteIf a MaxCompute SQL statement contains
a\\b
, MaxCompute returnsa\b
because MaxCompute escapes the expression.Example 3
Match a tab character. If a string contains tab characters, MaxCompute stores
\t
as one character when MaxCompute reads this expression. Therefore, \t is a common character in regular expressions.select 'a\tb', 'a\tb' rlike 'a\tb'; +---------+------+ | _c0 | _c1 | +---------+------+ | a b | true | +---------+------+