All Products
Search
Document Center

MaxCompute:REGEXP_REPLACE

Last Updated:Sep 30, 2024

Uses a string specified by replace_string to replace the substring that matches a given pattern at the nth occurrence specified by occurrence in the source string and returns a result.

Precautions

In versions of the data types compatible with Hive, the REGEXP_REPLACE function follows the Java regex specification. However, in data type versions 1.0 and 2.0, it follows the MaxCompute specification.

Syntax

string regexp_replace(string <source>, string <pattern>, string <replace_string>[, bigint <occurrence>])

Parameters

  • source: required. A value of the STRING type. This parameter specifies the string that you want to replace.

  • pattern: required. A constant of the STRING type or a regular expression. This parameter specifies the pattern that the specified substring must match. For more information about regular expressions, see Regular expressions. If pattern is an empty string, an error is returned.

  • replace_string: required. A value of the STRING type. The value is used to replace the string that matches the pattern.

    Note
    • If the replace_string is an empty string, the function returns after removing the string matching the pattern.

    • The value of replace_string can contain a backward reference \n. If the value of replace_string contains a backward reference, a substring that matches the nth capturing group specified by pattern is inserted. n is a digit ranging from 1 to 9. If the value of replace_string contains the backward reference \0, the substring that matches the entire pattern is inserted. Backslashes (\) must be escaped. For example, if the value of replace_string contains the backward reference \1, the backslash (\) of the backward reference must be escaped and the backward reference \1 is expressed as (\\1). You can also use the raw string R'(\1)' to express the backward reference \1.

  • occurrence: optional. A constant of the BIGINT type, which must be greater than or equal to 0. The value of this parameter indicates that the string that matches the specified pattern at the nth occurrence specified by occurrence is replaced with the string specified by replace_string. If this parameter is set to 0, all the substrings that match the specified pattern are replaced. If this parameter is of another data type or is less than 0, an error is returned. Default value: 0.

Return value

A value of the STRING type is returned. The return value varies based on the following rules:

  • If the referenced group does not exist, the returned result is undefined.

  • If the value of replace_string is null and a substring matches the given pattern, null is returned.

  • If the value of replace_string is null but no substring matches the given pattern, the original string is returned.

  • If the value of source, pattern, or occurrence is null, null is returned.

Examples

  • Example 1: Replace a string based on the specified pattern. Sample statements:

    -- The return value is Abcd. 
    select regexp_replace("abcd", "a", "A", 0);
    -- The return value is bcd. 
    select regexp_replace("abcd", "a", "", 0);
    -- The return value is 19700101. 
    select regexp_replace("1970-01-01", "-", "", 0);
    -- The return value is abc. 
    select regexp_replace("a1b2c3", "[0-9]", "", 0);
    -- The return value a1b2c. 
    select regexp_replace("a1b2c3", "[0-9]", "", 3);
  • Example 2: Use (\\1)\\2-\\3 to replace all the substrings that match ([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4}) in the 123.456.7890 string. Sample statement:

    -- The return value is (123)456-7890. 
    select regexp_replace('123.456.7890', '([[:digit:]]{3})\\.([[:digit:]]{3})\\.([[:digit:]]{4})',
    '(\\1)\\2-\\3', 0);
  • Example 3: Replace the substring that matches the specified pattern in the abcd string. Sample statements:

    -- The return value is a b c d. 
    select regexp_replace('abcd', '(.)', '\\1 ', 0);
    -- The return value is a bcd. 
    select regexp_replace('abcd', '(.)', '\\1 ', 1);
    -- The return value is d. 
    select regexp_replace("abcd", "(.*)(.)$", "\\2", 0);
  • Example 4: The data in the URL column of the url_set table is in the www.simple@xxx.com format and xxx in www.simple@xxx.com of each row is different. Replace all characters after www with strings in the data of the URL column. Sample statement:

    -- The return value is wwwtest. 
    select regexp_replace(url,'(www)(.*)','wwwtest',0) from url_set;
  • Example 5: An input parameter is set to null. Sample statement:

    -- The return value is null. 
    select regexp_replace('abcd', '(.)', null, 0);
  • Example 6: The group that you want to reference does not exist. Sample statements:

    -- Only one group is defined in the pattern, and the referenced group does not exist. 
    -- We recommend that you do not use this function in this way. The result of referencing a nonexistent group is not defined. 
    regexp_replace("abcd", "(.)", "\\2", 0) = "" or "abcd"
    -- The referenced group is defined in the pattern. Therefore, "\1" references a nonexistent group. 
    -- We recommend that you do not use this function in this way. The result of referencing a nonexistent group is not defined. 
    regexp_replace("abcd", "a", "\\1", 0) = "bcd" or "abcd" 

Related functions

REGEXP_REPLACE is a string function. For more information about functions related to string searches and conversion, see String functions.