The GET_JSON_OBJECT function extracts a string from a JSON string or a value of the JSON data type based on a specified JSON path, json_path.
Syntax
STRING GET_JSON_OBJECT(JSON|STRING <json>, STRING <json_path>)
-- Example: Returns Alice.
SELECT GET_JSON_OBJECT(JSON '{"name": "Alice", "age": 30}', '$.name');Usage notes
The
GET_JSON_OBJECTfunction does not support regular expression syntax in JSON paths.The JSON path syntax for the new JSON data type differs from the original specification. This may cause compatibility issues.
If a query contains multiple GET_JSON_OBJECT functions that process the same JSON data, the function repeatedly parses the same JSON string. This can negatively affect performance and increase costs. To avoid this, you can use
GET_JSON_OBJECTwith a user-defined table-valued function (UDTF) to transform JSON log data. For more information, see Convert JSON log data using MaxCompute built-in functions and UDTFs.
Parameters
json: Required. The JSON data that you want to process. This parameter supports two input types: JSON and STRING.
JSON type: A value of the JSON data type. The value must be in the format
{"Key":"Value", "Key":"Value",...}, such asJSON '{"name": "Alice", "age": 30}'.STRING type: If the input is a STRING, it must meet the following format requirements:
The string must be in the format
'{"Key":"Value", "Key":"Value",...}', such as'{"name": "Alice", "age": 30}'.A double quotation mark (") must be escaped with two backslashes (\\).
A single quotation mark (') must be escaped with one backslash (\).
json_path: Required. A STRING that specifies the JSON path expression used to extract data. The path must start with a
$character, such as$.aliyun.test[0].demo. The path expression uses the following characters:$: Indicates the root node..or['']: Indicates a child node. This is used to parse JSON objects, such as$.store.book. If a JSON key contains a period (.), you can use['']instead.Extracting data using
['']is supported only if you run theSET odps.sql.udf.getjsonobj.new=true;statement.[]:[number]indicates an array subscript. The subscript starts from 0.*: A wildcard character for[]. It returns the entire array. The asterisk (*) cannot be escaped.
Return value
Returns a value of the STRING type. This value is the data extracted from the specified path. The function follows these rules for its return value:
If json is valid and json_path exists, the corresponding string is returned.
If json is empty or has an invalid format, NULL is returned.
If json_path contains
[*], the return value is not in an array format. To force the return value to be in a unified array format, you can run theSET odps.sql.force.getjsonobj.array.format=true;statement.If json_path is invalid, NULL is returned.
Return behavior
You can control the function's return behavior by setting the
project-level or session-levelflag with the following command:SET odps.sql.udf.getjsonobj.new=true/false;.The two return behaviors that correspond to the different flag settings are as follows:
ImportantWe recommend that you use the
SET odps.sql.udf.getjsonobj.new=true;configuration. This configuration provides more standard function behavior, simplifies data processing, and improves performance. If your MaxCompute project has existing jobs that rely on the behavior of escaping JSON reserved characters, we recommend that you continue to use the original behavior. This prevents errors or correctness issues that might occur if you switch to the new behavior without verification.Parameter Settings
SET odps.sql.udf.getjsonobj.new=true;SET odps.sql.udf.getjsonobj.new=false;Return behavior
Outputs the original string without modification.
Outputs the string with JSON reserved characters escaped.
The return value is a JSON string that can be parsed directly. You do not need to use functions, such as
REPLACEorREGEXP_REPLACE, to replace backslashes.JSON reserved characters, such as line feeds (\n) and quotation marks ("), are returned as the strings
'\n'and'\"'.Duplicate key parsing
A JSON object can contain duplicate keys, which can be parsed successfully.
-- Returns 1. SELECT GET_JSON_OBJECT('{"a":"1","a":"2"}', '$.a');A JSON object cannot contain duplicate keys. If it does, parsing may fail.
-- Returns NULL. SELECT GET_JSON_OBJECT('{"a":"1","a":"2"}', '$.a');Output sort order
The output is sorted in the same order as the original JSON string.
-- Returns {"b":"1","a":"2"}. SELECT GET_JSON_OBJECT('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');The output is sorted in alphabetical order.
-- Returns {"a":"2","b":"1"}. SELECT GET_JSON_OBJECT('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');If Hive compatibility mode is enabled by running the
SET odps.sql.hive.compatible=true;command, theGET_JSON_OBJECTfunction preserves the original strings in its return value.For MaxCompute projects created on or after January 21, 2021, the default return behavior of the
GET_JSON_OBJECTfunction is to preserve the original strings.For MaxCompute projects created before January 21, 2021, the default return behavior of the
GET_JSON_OBJECTfunction is to escape JSON reserved characters.You can use the following example to determine which behavior the
GET_JSON_OBJECTfunction uses in your MaxCompute project. To do this, run the following command:SELECT GET_JSON_OBJECT('{"a":"[\\"1\\"]"}', '$.a'); --The return value if the behavior is to escape JSON reserved characters: [\"1\"] --The return value if the behavior is to preserve original strings: ["1"]To switch the default return behavior of the
GET_JSON_OBJECTfunction in your project to preserving original strings, you can submit a ticket. This avoids the need to set the property at the session level for every session.
Examples
JSON input parameter
Example 1: Get values for specific keys from JSON data
-- Returns 1.
SELECT GET_JSON_OBJECT(JSON '{"a":1, "b":2}', '$.a');
-- Returns NULL.
SELECT GET_JSON_OBJECT(JSON '{"a":1, "b":2}', '$.c');Example 2: An invalid json_path returns NULL.
-- Returns NULL.
SELECT GET_JSON_OBJECT(JSON '{"a":1, "b":2}', '$invalid_json_path');STRING input parameter
Example 1: Extract information from thesrc_json.json JSON object
-- Prepare the test data.
CREATE TABLE IF NOT EXISTS src_json (
json STRING
);
INSERT OVERWRITE TABLE src_json
VALUES
('{"store":
{"fruit":[{"weight":8,"type":"apple"},
{"weight":9,"type":"pear"}],
"bicycle":{"price":19.95,
"color":"red"}},
"email":"amy@only_for_json_udf_test.net",
"owner":"amy"}');
-- Extract the information of the owner field. The return value is amy.
SELECT GET_JSON_OBJECT(src_json.json, '$.owner') FROM src_json;
-- Optional. Output by preserving the original string.
SET odps.sql.udf.getjsonobj.new=true;
-- Extract the information of the first array in the store.fruit field. The return value is {"weight":8,"type":"apple"}.
SELECT GET_JSON_OBJECT(src_json.json, '$.store.fruit[0]') FROM src_json;
-- Extract the information of a non-existent field. The return value is NULL.
SELECT GET_JSON_OBJECT(src_json.json, '$.non_exist_key') FROM src_json;Example 2: Extract information from JSON array data
-- Returns 2222.
SELECT GET_JSON_OBJECT('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]');
-- Output by preserving the original string.
SET odps.sql.udf.getjsonobj.new=true;
-- Returns ["h0","h1","h2"].
SELECT GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
-- Output by escaping JSON reserved characters.
SET odps.sql.udf.getjsonobj.new=false;
-- Returns ["h0","h1","h2"].
SELECT GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
-- Returns h1.
SELECT GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');Example 3: Extract information from JSON data with a period (.) in the key
-- Prepare the test data.
CREATE TABLE json_test (id STRING, json STRING);
-- Insert data where the key contains a period (.).
INSERT INTO TABLE json_test (id, json) VALUES
("1",
"{
\"China.beijing\":
{\"school\":
{\"id\":0,\"book\":
[{\"title\": \"A\",\"price\": 8.95},
{\"title\": \"B\",\"price\": 10.2}]
}
}
}"
);
-- Insert data where the key does not contain a period (.).
INSERT INTO TABLE json_test (id, json) VALUES
("2",
"{
\"China_beijing\":
{\"school\":
{\"id\":0,\"book\":
[{\"title\": \"A\",\"price\": 8.95},
{\"title\": \"B\",\"price\": 10.2}]
}
}
}"
);
-- Use square brackets [''] to parse data that contains a period (.).
-- This extracts the 'id' value under 'China.beijing'. The return value is 0.
SELECT GET_JSON_OBJECT(json, "$['China.beijing'].school['id']") FROM json_test WHERE id =1;
-- For data without special characters, both '.' and [''] are valid and equivalent.
-- This extracts the 'id' value under 'China_beijing'. The return value is 0.
SELECT GET_JSON_OBJECT(json, "$['China_beijing'].school['id']") FROM json_test WHERE id =2;
SELECT GET_JSON_OBJECT(json, "$.China_beijing.school['id']") FROM json_test WHERE id =2;Example 4: Use [''] for keys that contain a period (.)
SET odps.sql.udf.getjsonobj.new=true;
-- Returns 1.
SELECT GET_JSON_OBJECT('{"a.1":"1","a":"2"}', '$[\'a.1\']');Example 5: Empty or invalid JSON input
-- Returns NULL.
SELECT GET_JSON_OBJECT('','$.array[1][1]');
-- Returns NULL.
SELECT GET_JSON_OBJECT('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');Example 6: Escaped JSON strings
SET odps.sql.udf.getjsonobj.new=true;
--Returns "1".
SELECT GET_JSON_OBJECT('{"a":"\\"1\\"","b":"2"}', '$.a');
--Returns '1'.
SELECT GET_JSON_OBJECT('{"a":"\'1\'","b":"2"}', '$.a');Example 7: Emoji support
-- Returns the emoji symbol.
SELECT GET_JSON_OBJECT('{"a":"<Emoji symbol>"}', '$.a');Note: DataWorks does not support entering emoji characters directly. You can use a tool, such as Data Integration, to write the encoded strings that correspond to emoji characters to MaxCompute. Then, you can use the GET_JSON_OBJECT function to process them.References
For more information about related functions, see JSON functions.
For best practices, see Migrate JSON data from OSS to MaxCompute.
For more information about json_path, see LanguageManual UDF.