This topic describes how to use a Java user-defined function (UDF) or Python UDF to obtain the value that corresponds to a specific key in a key-value pair string in which a key or value itself contains delimiters.
Description
In this example, a UDF named UDF_EXTRACT_KEY_VALUE_WITH_SPLIT is registered. The following section describes the function syntax and input parameters.
string UDF_EXTRACT_KEY_VALUE_WITH_SPLIT(string <s>, string <split1>, string <split2>, string <keyname>)
Function description: This function splits the string specified by s into key-value pairs by using the delimiter specified by split1. It then looks for the pair that starts with the key specified by keyname followed by the delimiter specified by split2, and returns the remainder of that pair as the value. If no pair matches, NULL is returned. Unlike the UDF that is used to obtain the values of strings that have no delimiters, this UDF applies to strings that have delimiters in keys or values.
Parameters:
s: the string that you want to split, which is of the STRING type. This parameter is required.
split1: the string delimiter that you use to obtain key-value pairs, which is of the STRING type. This parameter is required.
split2: the key/value delimiter that you use to obtain the keys and values, which is of the STRING type. This parameter is required.
keyname: the key whose value you want to obtain, which is of the STRING type. This parameter is required.
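To illustrate why the function matches on the prefix keyname + split2 instead of splitting each pair on split2, the following plain-Python sketch compares the two approaches. It is meant to run outside MaxCompute, and the helper names naive_lookup and prefix_lookup are hypothetical and are not part of the UDF.

```python
def naive_lookup(s, split1, split2, keyname):
    """Split every pair on split2 and compare the first field with keyname."""
    for pair in s.split(split1):
        fields = pair.split(split2)
        if fields[0] == keyname and len(fields) > 1:
            # Only the second field is kept, so a value that contains
            # split2 is truncated.
            return fields[1]
    return None


def prefix_lookup(s, split1, split2, keyname):
    """Match the prefix keyname + split2 and return the rest of the pair."""
    prefix = keyname + split2
    for pair in s.split(split1):
        if pair.startswith(prefix):
            return pair[len(prefix):]
    return None


text = "name:zhangsan:man;age:2;"
print(naive_lookup(text, ";", ":", "name"))   # zhangsan
print(prefix_lookup(text, ";", ":", "name"))  # zhangsan:man
```

The prefix-based lookup keeps the full value because it never splits the value itself on split2.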
Development and usage procedure
1. Write a UDF
The following samples implement the UDF in Java, Python 3, and Python 2, in that order.
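package com.aliyun.rewrite;

import com.aliyun.odps.udf.UDF;
import java.util.regex.Pattern;

// Java UDF: returns the value of keyname from a delimited key-value string.
public class ExtractKeyValueWithSplit extends UDF {
    public String evaluate(String str, String split1, String split2, String keyname) {
        if (str == null || split1 == null || split2 == null || keyname == null) {
            return null;
        }
        try {
            // A matching pair starts with the key followed by the key/value delimiter.
            String keySplit = keyname + split2;
            // String.split takes a regular expression, so quote split1 to treat
            // delimiters such as | or . as literal text.
            for (String subStr : str.split(Pattern.quote(split1))) {
                if (subStr.startsWith(keySplit)) {
                    // Everything after the prefix is the value, even if it contains split2.
                    return subStr.substring(keySplit.length());
                }
            }
        } catch (Exception e) {
            return null;
        }
        // No pair starts with the requested key.
        return null;
    }
}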
If you write a Java UDF, you must inherit the UDF class. In this example, the evaluate method defines four input parameters of the STRING type and a return value of the STRING type. The data types of the input parameters and return value are used as the function signature of the UDF in SQL statements. For information about other code specifications and requirements, see Java UDFs.
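# Python 3 UDF: returns the value of keyname from a delimited key-value string.
from odps.udf import annotate

@annotate("string,string,string,string->string")
class ExtractKeyValueWithSplit(object):
    def evaluate(self, s, split1, split2, keyname):
        # Return NULL for an empty input or a missing argument.
        if not s or split1 is None or split2 is None or keyname is None:
            return None
        # A matching pair starts with the key followed by the key/value delimiter.
        key_split = keyname + split2
        for subStr in s.split(split1):
            if subStr.startswith(key_split):
                # Everything after the prefix is the value, even if it contains split2.
                return subStr[len(key_split):]
        # No pair starts with the requested key.
        return None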
By default, Python 2 is used to run UDFs in MaxCompute projects. If you want to run UDFs in Python 3, run the set odps.sql.python.version=cp37; command at the session level. For more information about Python 3 UDF specifications, see Python 3 UDFs.
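# Python 2 UDF: returns the value of keyname from a delimited key-value string.
from odps.udf import annotate

@annotate("string,string,string,string->string")
class ExtractKeyValueWithSplit(object):
    def evaluate(self, s, split1, split2, keyname):
        # Return NULL for an empty input or a missing argument.
        if not s or split1 is None or split2 is None or keyname is None:
            return None
        # A matching pair starts with the key followed by the key/value delimiter.
        key_split = keyname + split2
        for subStr in s.split(split1):
            if subStr.startswith(key_split):
                # Everything after the prefix is the value, even if it contains split2.
                return subStr[len(key_split):]
        # No pair starts with the requested key.
        return None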
If Chinese characters appear in UDF code that is written in Python 2, an error is returned when you run the UDF. To address this issue, you must add an encoding declaration to the header of the code, in the format #coding:utf-8 or # -*- coding: utf-8 -*-. The two formats are equivalent. For more information about Python 2 UDF specifications, see Python 2 UDFs.
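As a rough illustration of where the encoding declaration goes, the following sketch places it on the first line of a file that contains Chinese characters. The full-width delimiters, the sample string, and the extract helper are hypothetical example data and are not part of the UDF in this topic.

```python
# -*- coding: utf-8 -*-
# The declaration must appear on the first or second line of the file.

# Hypothetical example data: full-width delimiters written as Chinese punctuation.
PAIR_DELIMITER = "；"
KEY_VALUE_DELIMITER = "："


def extract(s, keyname):
    """Return the value of keyname, using the full-width delimiters above."""
    prefix = keyname + KEY_VALUE_DELIMITER
    for pair in s.split(PAIR_DELIMITER):
        if pair.startswith(prefix):
            return pair[len(prefix):]
    return None


print(extract("姓名：张三：男；年龄：2；", "姓名"))
```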
2. Upload resources and register the UDF
After you write and debug a UDF, you must upload the UDF code to MaxCompute and register the UDF. In this example, the UDF named UDF_EXTRACT_KEY_VALUE_WITH_SPLIT is registered. For more information about how to upload and register a Java UDF, see Package a Java program, upload the package, and create a MaxCompute UDF. For more information about how to upload and register a Python UDF, see Upload a Python program and create a MaxCompute UDF.
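One possible way to debug the Python UDF before you upload it is to call its evaluate method directly on a local machine. The sketch below assumes that the odps.udf module may not be installed locally, so it falls back to a stand-in annotate decorator that does nothing. The stand-in and the test cases are illustrative assumptions, not part of MaxCompute.

```python
try:
    # Available in MaxCompute, or locally if PyODPS is installed.
    from odps.udf import annotate
except ImportError:
    def annotate(signature):
        """Local stand-in (assumption): returns the class unchanged."""
        def decorator(cls):
            return cls
        return decorator


@annotate("string,string,string,string->string")
class ExtractKeyValueWithSplit(object):
    def evaluate(self, s, split1, split2, keyname):
        if not s:
            return None
        key_split = keyname + split2
        for sub_str in s.split(split1):
            if sub_str.startswith(key_split):
                return sub_str[len(key_split):]
        return None


udf = ExtractKeyValueWithSplit()
# Each case is (arguments, expected return value).
cases = [
    (("name:zhangsan:man;age:2;", ";", ":", "name"), "zhangsan:man"),
    (("name:zhangsan:man;age:2;", ";", ":", "age"), "2"),
    (("name:zhangsan:man;age:2;", ";", ":", "gender"), None),
    ((None, ";", ":", "name"), None),
]
for args, expected in cases:
    actual = udf.evaluate(*args)
    assert actual == expected, (args, actual, expected)
print("all cases passed")
```

Checking a missing key and a NULL input locally helps confirm that the UDF returns NULL in those cases instead of raising an error.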
3. Use the UDF
After you register the UDF, run the following commands to obtain the value that corresponds to a key name from a key-value pair string. In this example, the value of the name key contains the key/value delimiter (:). The set command is required only if you registered the Python 3 UDF.
set odps.sql.python.version=cp37;
SELECT UDF_EXTRACT_KEY_VALUE_WITH_SPLIT('name:zhangsan:man;age:2;', ';', ':', 'name');
The following result is returned:
+--------------+
| _c0          |
+--------------+
| zhangsan:man |
+--------------+