RLIKE is an operator provided by MaxCompute SQL based on the Perl Compatible Regular Expressions (PCRE) standards. You can use the RLIKE operator to perform more accurate and complex pattern matching or replacement in MaxCompute SQL. This topic describes metacharacters supported by RLIKE, POSIX character groups, Chinese character groups, and escape characters. This topic also provides examples of the related characters.
Metacharacters
The RLIKE operator supports various metacharacters. The following table describes commonly used metacharacters that are supported by RLIKE.
Metacharacter | Description |
^ | Matches the beginning of a string. |
$ | Matches the end of a string. |
. | Matches any single character. |
* | Matches the preceding character or character pattern zero or more times. |
+ | Matches the preceding character or character pattern one or more times. |
? | Matches the preceding character or character pattern zero or one time. Serves as a matching modifier. When this character follows any other limiting character (* , +, ?, {n}, {n,}, or {n,m}), the matching pattern is non-greedy. The non-greedy pattern matches as few characters as possible in the searched string, whereas the default greedy pattern matches as many characters as possible in the searched string.
|
A|B | Matches A or B. |
(abc)* | Matches the abc sequence zero or more times. |
{n} or {m,n} | The number of matches. |
[ab] | Matches any character in the brackets. The character can be a or b. |
[a-d] | Matches one of the following characters: a, b, c, and d. |
[^ab] | ^ indicates NOT. This expression matches any character that is not a or b. |
[::] | For more information, see POSIX character groups later in this topic. |
\ | Escape character. For more information, see Escape character matching later in this topic. |
\n | n is a digit ranging from 1 to 9 and indicates back reference. |
\d | Digits. |
\D | Non-digit characters. |
Examples
Escape character matching
The RLIKE operator supports the backslash \
as an escape character. Therefore, all \
characters in the pattern of a regular expression must be escaped twice.
Example 1
The regular expression needs to match the string a+b
, where +
is a special character in regular expressions. Therefore, it must be expressed as an escape character. In the regular expression engine, it is expressed as a\+b
. Because another layer of escaping is required, the expression that can match this string is a\\+b
.
SELECT 'a+b' RLIKE 'a\\+b';
+
| _c1 |
+
| true |
+
Example 2
To match the character \
, in the regular expression engine \
is a special character, so it needs to be represented as \\
, and because another layer of escaping is required, it is written as \\\\
.
Note
In MaxCompute SQL, a\\b
is written, but a\b
is displayed in the output. This is because MaxCompute escapes the expression.
SELECT 'a\\b', 'a\\b' RLIKE 'a\\\b';
+
| _c0 | _c1 |
+
| a\b | false |
+
SELECT 'a\\b', 'a\\b' RLIKE 'a\\\\b';
+
| _c0 | _c1 |
+
| a\b | true |
+
Example 3
If a string contains a tab character (TAB), the system has already stored \t
as a single character when it reads these two characters. Therefore, it is also a regular character in the pattern of the regular expression.
SELECT 'a\tb', 'a\tb' RLIKE 'a\tb';
+
| _c0 | _c1 |
+
| a b | true |
+
Example 4
Match strings that contain digits \d
, non-digit characters \D
, and back references \n
.
SELECT '2025maxcompute' RLIKE '^\\\d';
SELECT 'maxcompute2025test' RLIKE '^\\\D';
SELECT 'alibaba-cloud-MC2025-test' RLIKE 'MC\\\d';
SELECT 'alibaba-cloud-MC2025-test' RLIKE 'MC\\\d{4}-';
SELECT 'mmabcdefdefgg' RLIKE '(abc)(def)\\\2';
Character groups
Posix character groups
Character group | Description | Range |
Character group | Description | Range |
[[:alnum:]] | Letters and digits | [a-zA-Z0-9] |
[[:alpha:]] | Letters | [a-zA-Z] |
[[:ascii:]] | ASCII characters | [\x00-\x7F] |
[[:blank:]] | Spaces and tab characters | [ \t] |
[[:cntrl:]] | Control characters | [\x00-\x1F\x7F] |
[[:digit:]] | Digits | [0-9] |
[[:graph:]] | Characters other than whitespace characters | [\x21-\x7E] |
[[:lower:]] | Lowercase letters | [a-z] |
[[:print:]] | [:graph:] and whitespace characters | [\x20-\x7E] |
[[:punct:]] | Punctuations | [][!"#$%&'()*+,./:;<=>? @\^_`{|}~-] |
[[:space:]] | Whitespace characters | [ \t\r\n\v\f] |
[[:upper:]] | Uppercase letters | [A-Z] |
[[:xdigit:]] | Hexadecimal characters | [A-Fa-f0-9] |
Note
MaxCompute does not support \f
and \v
. For more information about supported escape characters, see Escape characters.
Chinese character groups
Chinese character group | Range |
Chinese character group | Range |
Double-byte characters such as Chinese characters | [^\\x{00}-\\x{ff}] |
Chinese characters | [\\x{4e00}-\\x{9fa5}] |
Chinese punctuations | Chinese punctuations do not have a unified encoding range. You can search for Chinese punctuation Unicode in a search engine and then use operators to exclude them one by one. |
Examples
For example, the regular expression for the period (。
) is [\\x{3002}]
.
SELECT * FROM VALUES ('Hello.'),('nihao!') t(d) WHERE d RLIKE '[\\x{3002}]';
+
| d |
+
| Hello. |
+
Note
To match a single quotation mark, you can use unicode0027
, and the corresponding regular expression is [\\x{0027}]
.