Splits the string source into groups based on pattern and returns the substring that matches the group specified by groupid.
Precautions
In the Hive-compatible data type edition, the REGEXP_EXTRACT function follows the Java regular expression specification. In the 1.0 and 2.0 data type editions, it follows the MaxCompute regular expression specification.
Syntax
string regexp_extract(string <source>, string <pattern>[, bigint <groupid>])
Parameters
source: required. A value of the STRING type. This parameter specifies the string that you want to split.
pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern based on which you split a string. For more information about regular expressions, see Regular expressions.
groupid: optional. A constant of the BIGINT type. The value of this parameter must be greater than or equal to 0.
Data is stored in UTF-8. Chinese characters can be written in hexadecimal notation; they fall in the range [\\x{4e00},\\x{9fa5}].
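As a brief illustration of how groupid selects a group, the following statements are a minimal sketch. The sample string 'foothebar' and the pattern 'foo(.*?)(bar)' are illustrative values chosen for this example, and the results in the comments assume the behavior described in this topic.

```sql
-- The pattern defines two groups: (.*?) and (bar).
-- groupid = 1 selects the first group. Expected result: the
select regexp_extract('foothebar', 'foo(.*?)(bar)', 1);

-- groupid = 2 selects the second group. Expected result: bar
select regexp_extract('foothebar', 'foo(.*?)(bar)', 2);

-- groupid = 0 selects the text matched by the whole pattern. Expected result: foothebar
select regexp_extract('foothebar', 'foo(.*?)(bar)', 0);
```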
Return value
A value of the STRING type is returned. The return value varies based on the following rules:
If pattern is an empty string or no group is specified in pattern, an error is returned.
If the value of groupid is not of the BIGINT type or is less than 0, an error is returned. If you do not specify groupid, the default value 1 is used, which means that the substring in the first group is returned. If groupid is set to 0, the substring that matches the entire pattern is returned.
If the value of source, pattern, or groupid is null, null is returned.
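The rules above can be sketched with a few statements. The input values are illustrative. The example assumes that backslashes in string constants are doubled, as in the \\x{4e00} notation used in this topic, and the comments describe the outcome that the rules predict rather than captured output.

```sql
-- groupid is omitted, so the default value 1 applies and the first group is returned.
-- Expected result: 99
select regexp_extract('8d99d8', '8d(\\d+)d8');

-- source is null, so null is expected.
select regexp_extract(cast(null as string), 'foo(.*?)(bar)', 1);

-- pattern contains no group, so an error is expected.
select regexp_extract('foothebar', 'foothebar');
```

If the last statement is run as part of a larger script, the error is expected to stop the job, so it may be preferable to validate pattern before it is passed to the function.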
Related functions
REGEXP_EXTRACT is a string function. For more information about functions related to string searches and conversion, see String functions.