Extracts a single string from a standard JSON string based on a specified method.
Usage notes
This function is used to extract a single string from a standard JSON string by using JSONPath. This function supports input parameters of the following data types:
JSON types: If input parameters are of JSON types, standard JSONPath expressions are used.
STRING type: If input parameters are of the STRING type, the original JSONPath expressions are used.
The methods and usage notes for using this function vary based on input parameter data types. This topic describes how to use the GET_JSON_OBJECT function when input parameters are of the JSON and STRING types.
The JSONPath expressions that are used when input parameters are of JSON types and those that are used when input parameters are of the STRING type comply with different rules. This may cause incompatibility issues.
The GET_JSON_OBJECT function does not support the syntax of JSONPath expressions.
Input parameters of JSON types
Syntax
string get_json_object(json <json>, string <json_path>)
Parameters
json: required. A JSON string from which you want to extract a single string.
json_path: required. A JSONPath expression based on which you want to extract a single string.
Return value
A value of a STRING type is returned.
Examples
Example 1: Extract the value that corresponds to the key a from a JSON string.
select get_json_object(json '{"a":1, "b":2}', '$.a');
The following result is returned:
+-----+ | _c0 | +-----+ | 1 | +-----+
Example 2: Extract the value that corresponds to the key c from a JSON string.
select get_json_object(json '{"a":1, "b":2}', '$.c');
The following result is returned:
+-----+ | _c0 | +-----+ | NULL | +-----+
Example 3: If an invalid JSON path is specified, the return value is NULL.
select get_json_object(json '{"a":1, "b":2}', '$invalid_json_path');
The following result is returned:
+-----+ | _c0 | +-----+ | NULL | +-----+
Input parameters of the STRING type
Background information
This function is used to extract a single string from a standard JSON string by using path. The original data is read each time this function is called. Repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the GET_JSON_OBJECT
function together with user-defined table-valued functions (UDTFs). For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.
Parameters
json: required. The value must be of the STRING type. This parameter specifies a standard JSON object in the
{Key:Value, Key:Value,...}
format. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark before extraction.path: required. The value must be of the STRING type. This parameter specifies the path in the value of the json parameter and starts with
$
. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Meanings of different characters:$
: indicates the root node..
or['']
: indicates a child node. MaxCompute parses JSON objects by using.
or['']
. If a key in a JSON object contains a period (.), you can use [''].[]
([number]
): indicates an array subscript, which starts from 0.*
: indicates the wildcard for[]
. If this character is used in the path parameter, an entire array is returned. An asterisk (*
) cannot be escaped.
Limits
Only the function of the latest version allows you to extract data by using ['']
in the path parameter. To use [''], you must add the set odps.sql.udf.getjsonobj.new=true;
configuration.
Syntax
string get_json_object(string <json>, string <path>)
Return value
If the json parameter is left empty or invalid, NULL is returned.
If the json parameter is valid and path exists, the related string is returned.
You can configure the
flagodps.sql.udf.getjsonobj.new
parameter for a session to specify how the function returns a value.If you run the
set odps.sql.udf.getjsonobj.new=true;
command, the function of the latest version is used and the function return behavior is more standard. This facilitates data processing and improves data processing performance. We recommend that you use the latest function version. In this case, the function complies with the following rules when it returns a value:The return value is a JSON string, which can be parsed as JSON data, without the need to use the REPLACE or REGEXP_REPLACE function to replace backslashes (\).
Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed, and the first value is returned.
-- The return value is 1. select get_json_object('{"a":"1","a":"2"}', '$.a');
The output results are displayed in alphabetical order.
-- The return value is {"b":"1","a":"2"}. select get_json_object('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');
If you run the set odps.sql.udf.getjsonobj.new=false; command, the function complies with the following rules when it returns a value:
NoteIf the Hive-compatible data type edition is enabled, the function of the latest version is used.
JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as
'\n'
and'\"'
.Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed.
-- The return value is NULL. select get_json_object('{"a":"1","a":"2"}', '$.a');
The output results are displayed in alphabetical order.
-- The return value is {"a":"2","b":"1"}. select get_json_object('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');
Examples
Example 1: Extract information from the JSON object
src_json.json
. Sample statements:-- The JSON string src_json.json contains the following content: +----+ json +----+ {"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" } -- Extract the information of the owner field and return amy. select get_json_object(src_json.json, '$.owner') from src_json; -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. select get_json_object(src_json.json, '$.store.fruit[0]') from src_json; -- Extract the information of a non-existent field and return NULL. select get_json_object(src_json.json, '$.non_exist_key') from src_json;
Example 2: Extract information from a JSON object of the ARRAY type. Sample statements:
-- The return value is 2222. select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]'); -- The return value is ["h0","h1","h2"]. set odps.sql.udf.getjsonobj.new=true; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]'); -- The return value is ["h0","h1","h2"]. set odps.sql.udf.getjsonobj.new=false; select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh'); -- The return value is h1. select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
Example 3: Extract information from a JSON object that includes a period (
.
). Sample statements:-- Create a table. create table mf_json (id string, json string); -- Insert data into the table. The key in the data contains a period (.). insert into table mf_json (id, json) values ("1", "{ \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Insert data into the table. The key in the data does not contain a period (.). insert into table mf_json (id, json) values ("2", "{ \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); -- Query the value of id in the JSON object whose key is China.beijing. The value 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1; -- Query the value of id in the JSON object whose key is China_beijing. The value 0 is returned. You can use one of the following statements: select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2; select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2;
Example 4: The json parameter is empty or invalid. Sample statements:
-- The return value is NULL. select get_json_object('','$.array[1][1]'); -- The return value is NULL. select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
Example 5: Escape a JSON string. Sample statements:
set odps.sql.udf.getjsonobj.new=true; -- The return value is "1". select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); -- The return value is '1'. select get_json_object('{"a":"\'1\'","b":"2"}', '$.a');
Example 6: If a key in a JSON object contains a period (
.
), use['']
. Sample statements:set odps.sql.udf.getjsonobj.new=true; -- The return value is "1". SELECT get_json_object('{"a.1":"1","a":"2"}', '$[\'a.1\']');
Related functions
GET_JSON_OBJECT is a complex type function or a string function.
For more information about the functions that are used to process data of complex data types, such as ARRAY, MAP, STRUCT, and JSON, see Complex type functions.
For more information about functions related to string searches and conversion, see String functions.