This topic describes the syntax supported by the JSON and JSONB data types and the methods of using these data types.
Overview of the JSON and JSONB data types
In recent years, the popularity of mobile apps has increased the requirements for scenarios such as application tracking and user tag computing. To better meet these requirements, more and more big data systems begin to use semi-structured formats to store the related data for more flexible development and processing. JSON and JSONB are the commonly used semi-structured data types. The following content describes the differences between the JSON and JSONB data types:
Data of the JSON type is stored in the form of text, and data of the JSONB type is stored as binary data.
Data of the JSON type can be written at a fast pace, but data reads are time-consuming. This is because each time data of the JSON type is queried, processing functions need to parse the data. Data of the JSONB type can be read at a fast pace, but data writes are time-consuming. Data of the JSONB type is stored in a decomposed binary format. The system needs to perform additional conversion for data of the JSONB type when the system writes such data to Hologres. Therefore, more time is required to write data of the JSONB type. However, the query of JSONB data takes less time because no reparsing is required.
The JSON data type stores an exact copy of the input text. Spaces, duplicate object keys, and the order of object keys are retained. If a JSON object within a value contains the same key more than once, all the key-value pairs are retained. The JSONB data type removes unnecessary spaces, duplicate object keys, and the order of object keys when the input text is parsed. If duplicate object keys are specified in the input text, only the last value is retained.
The major difference between the two data types is the data processing efficiency.
The JSON data type stores an exact copy of the input text. The processing functions must reparse the copy during each execution. In addition, insignificant characters such as spaces may exist in data of the JSON type to comply with the semantic constraints of the input text. If a JSON object within a value contains the same key more than once, all the key-value pairs are retained. The processing functions consider the last value as a valid value.
Data of the JSONB type is stored in a decomposed binary format. Compared with data of the JSON type, data of the JSONB type takes more time to be written due to the additional conversion process. However, the processing of JSONB data takes less time because no reparsing is required. Spaces, the order of object keys, and duplicate object keys do not exist in JSONB data. If duplicate object keys are specified in the input text, only the last value is retained.
Limits
Hologres supports the JSON and JSONB data types. When you use these data types, take note of the following limits:
Only Hologres V0.9 and later support the JSON data type. If the version of your Hologres instance is earlier than V0.9, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.
Only Hologres V1.1 and later support GIN indexes for fields of the JSONB data type.
Only Hologres V1.3 and later support column-oriented storage for the JSONB data type. The column-oriented storage for the JSONB data type can be used only for column-oriented tables but not row-oriented tables. Column-oriented storage can be triggered only if a column-oriented table contains 1,000 or more data records.
Hologres does not support the following JSON-specific functions: json_each, jsonb_each, json_each_text, jsonb_each_text, json_extract_path, jsonb_extract_path, and jsonb_to_record.
If you want to use the jsonb_extract_path and json_extract_path functions, execute the following statements:
SELECT json_extract_path( '{"key":{"key1":"key1","key2":"key2"}}'::json , 'key' , 'key1' ); -- Statement equivalent to the json_extract_path function: SELECT '{"key":{"key1":"key1","key2":"key2"}}'::json #> '{"key","key1"}';
SELECT jsonb_extract_path( '{"key":{"key1":"key1","key2":"key2"}}'::jsonb , 'key' , 'key1' ); -- Statement equivalent to the jsonb_extract_path function: SELECT '{"key":{"key1":"key1","key2":"key2"}}'::jsonb #> '{"key","key1"}';
Operators supported by data of the JSON and JSONB types
Commonly used operators supported by data of the JSON and JSONB types
The following table describes commonly used operators that are supported by data of the JSON and JSONB types.
Operator | Right operand type | Description | Example | Execution result |
-> | int | Obtains a JSON array element with indexes starting from zero. A negative integer indicates that elements are counted backward from the end. |
|
|
-> | text | Obtains a JSON object field based on a key. |
|
|
->> | int | Obtains a JSON array element as text. |
|
|
->> | text | Obtains a JSON object field as text. |
|
|
#> | text[] | Obtains a JSON object from the specified path. |
|
|
#>> | text[] | Obtains a JSON object as text from the specified path. |
|
|
Additional operators supported by data of the JSON and JSONB types
The following table describes the additional operators that are supported by data of the JSON and JSONB types. The additional operators are used to meet different business development requirements.
Operator | Right operand type | Description | Example | Execution result |
@> | jsonb | Specifies whether the left JSON value contains the right JSON path or value. |
|
|
<@ | jsonb | Specifies whether the left JSON path or value is contained in the right JSON value. |
|
|
? | text | Specifies whether a key or an element string exists in the JSON value. |
|
|
?| | text[] | Specifies whether a key or an element string in an array string exists in the JSON value. |
|
|
?& | text[] | Specifies whether all array strings exist in the JSON value. |
|
|
|| | jsonb | Concatenates two JSONB values into a new JSONB value. Note The |
|
|
- | text | Deletes a key or a value from the left operand. The key or value is matched based on the key value. |
|
|
- | text[] | Deletes multiple keys or values from the left operand. The keys or values are matched based on the key values. |
|
|
- | integer | Deletes an array element that is in the specified position. A negative integer indicates that elements are counted backward from the end. An error is returned if the JSON data is not stored as an array. |
|
|
#- | text[] | Deletes an element with the specified path. For JSON arrays, a negative integer indicates that elements are counted backward from the end. |
|
|
Functions that support JSON and JSON data types
Functions for processing JSON and JSONB values
The following table describes the functions that are used to process JSON and JSONB values.
Function | Type of the return value | Description | Example | Execution result |
json_array_length(json) | int | Returns the number of elements in the outermost JSON array. |
|
|
jsonb_array_length(jsonb) | ||||
json_object_keys(json) | setof text | Returns a set of keys in the outermost JSON object. |
|
|
jsonb_object_keys(jsonb) | ||||
json_populate_record(base anyelement, from_json json) | anyelement | Expands the objects in the from_json parameter into a row with columns that match the record type defined by the base parameter. |
|
|
jsonb_populate_record(base anyelement, from_json jsonb) | ||||
json_populate_recordset(base anyelement, from_json json) | setof anyelement | Expands the outermost array of objects in the from_json parameter into a set of rows with columns that match the record type defined by the base parameter. |
|
|
jsonb_populate_recordset(base anyelement, from_json jsonb) | ||||
json_array_elements(json) | setof json | Expands a JSON array into a set of JSON values. |
|
|
jsonb_array_elements(jsonb) | setof jsonb | |||
json_array_elements_text(json) | setof text | Expands a JSON array into a set of text values. |
|
|
jsonb_array_elements_text(jsonb) | ||||
json_typeof(json) | text | Returns the data type of the outermost JSON value as a text string. Possible data types include OBJECT, ARRAY, STRING, NUMBER, BOOLEAN, and NULL. |
|
|
jsonb_typeof(jsonb) | ||||
json_strip_nulls(from_json json) | json | Returns the objects in the from_json parameter. Object fields that have null values are omitted. Other null values are retained. |
|
|
jsonb_strip_nulls(from_json jsonb) | jsonb | |||
jsonb_set(target jsonb, path text[], new_value jsonb[,create_missing boolean]) | jsonb | Returns the objects in the target parameter. The section specified by the path parameter is replaced by the value of the new_value parameter. If the value of the create_missing parameter is true and the item specified by the path parameter does not exist, the value of the new_value parameter is inserted. The default value of the create_missing parameter is true. As required by the path-oriented operators, a negative integer in the value of the path parameter indicates that elements are counted backward from the end of JSON arrays. |
|
|
|
| |||
jsonb_insert(target jsonb, path text[], new_value jsonb, [insert_after boolean]) | jsonb | Returns the objects in the target parameter with the value of the new_value parameter inserted. The section specified by the path parameter may be in a JSONB array. In this case, if the value of the insert_after parameter is false, which is the default value, the value of the new_value parameter is inserted before the value of the target parameter. Otherwise, the value of the new_value parameter is inserted after the value of the target parameter. The section specified by the path parameter may be in a JSONB object. In this case, the value of the new_value parameter is inserted only if the value of the target parameter does not exist. As required by the path-oriented operators, a negative integer in the value of the path parameter indicates that elements are counted backward from the end of JSON arrays. |
|
|
|
| |||
jsonb_pretty(from_json jsonb) | text | Returns the objects in the from_json parameter as indented JSON text. |
|
|
jsonb_agg | jsonb | Aggregates values, including null values, into a JSON array. |
|
|
jsonb_object_agg | jsonb | Aggregates key-value pairs into JSON objects. The value can be empty, but the key cannot be empty. |
|
|
is_valid_json | BOOLEAN | Validates a JSON string. The Boolean value Note
|
|
|
Parsing functions
Function | Description | Example | Execution result |
to_json(anyelement) | Returns a value as a valid JSON object. Arrays and composites are recursively converted into arrays and objects. For a value that is not an array or a composite, if a cast function is provided, the cast function is invoked to convert the input value into a JSON object. Otherwise, a scalar value is generated. If the scalar value is not the NUMBER, BOOLEAN, or NULL data type, the scalar value is represented by JSON text. In this case, the scalar value is a valid JSON string. |
|
|
to_jsonb(anyelement) | |||
array_to_json(anyarray [, pretty_bool]) | Returns an array as a JSON array. If you enter a multidimensional array, a JSON array of arrays is returned. If the value of the pretty_bool parameter is true, line feeds are added between dimension-1 elements. |
|
|
json_build_array(VARIADIC "any") | Builds a JSON array that may contain heterogeneous data based on a list of variable arguments. |
|
|
jsonb_build_array(VARIADIC "any") | |||
json_build_object(VARIADIC "any") | Builds a JSON object based on a list of variable arguments. The argument list consists of alternating keys and values. |
|
|
jsonb_build_object(VARIADIC "any") | |||
json_object(text[]) | Builds a JSON object based on a text array. The array can be a one-dimensional array that contains an even number of members. The members are taken as alternating key-value pairs. The array can also be a two-dimensional array. Each inner array has two elements, which are taken as a key-value pair. |
|
|
jsonb_object(text[]) |
|
| |
json_object(keys text[], values text[]) | Obtains a key-value pair based on two separate arrays. In other respects, the key-value pair is in the same format as a single argument. |
|
|
jsonb_object(keys text[], values text[]) |
JSONB field indexing
In Hologres V1.1 and later, GIN indexes and B-tree indexes can be created for fields of the JSONB data type to accelerate data queries. You can use one of the following operators to create GIN indexes for fields of the JSONB data type: the default jsonb_ops operator and the jsonb_path_ops operator.
In this topic, indexes are created for fields of the JSONB data type. We recommend that you create indexes for fields of the JSONB data type rather than the JSON data type.
Use the default jsonb_ops operator to create an index.
CREATE INDEX idx_name ON table_name USING gin (idx_col);
Use the jsonb_path_ops operator to create an index.
CREATE INDEX idx_name ON table_name USING gin (idx_col jsonb_path_ops);
The difference between jsonb_ops and jsonb_path_ops is that jsonb_ops creates independent index items for each key and value in the data, whereas jsonb_path_ops creates an index item only for each value in the data.
GIN indexes can be created by using native PostgreSQL operators and Hologres operators. The following sections provide the examples.
Native operators provided by PostgreSQL
Use the jsonb_ops operator to create an index.
-- 1. Create a table. BEGIN; DROP TABLE IF EXISTS json_table; CREATE TABLE IF NOT EXISTS json_table ( id INT ,j jsonb ); COMMIT; -- 2. Use the jsonb_ops operator to create an index. CREATE INDEX index_json on json_table USING GIN(j); -- 3. Insert data into the table. INSERT INTO json_table VALUES (1, '{"key1": 1, "key2": [1, 2], "key3": {"a": "b"}}') , (1, '{"key1": 1}'), (2, '{"key2": [1, 2], "key3": {"a": "b"}}') ; -- 4. Query data from the table. SELECT * FROM json_table WHERE j ? 'key1'; -- The following result is returned: id | j ----+------------------------------------------------- 1 | {"key1": 1, "key2": [1, 2], "key3": {"a": "b"}} 1 | {"key1": 1}
Execute the
EXPLAIN
statement to query the execution plan. Sample code:explain SELECT * FROM json_table WHERE j ? 'key1'; QUERY PLAN Gather (cost=0.00..0.26 rows=1000 width=12) -> Local Gather (cost=0.00..0.23 rows=1000 width=12) -> Decode (cost=0.00..0.23 rows=1000 width=12) -> Bitmap Heap Scan on json_table (cost=0.00..0.13 rows=1000 width=12) Recheck Cond: (j ? 'key1'::text) -> Bitmap Index Scan on index_json (cost=0.00..0.00 rows=0 width=0) Index Cond: (j ? 'key1'::text) Optimizer: HQO version 1.3.0
The preceding results show that the execution plan contains the
Index Scan
step. This indicates that an index is used during the query.Use the jsonb_path_ops operator to create an index.
-- 1. Create a table. BEGIN; DROP TABLE IF EXISTS json_table; CREATE TABLE IF NOT EXISTS json_table ( id INT ,j jsonb ); COMMIT; -- 2. Use the jsonb_ops operator to create an index. CREATE INDEX index_json on json_table USING GIN(j jsonb_path_ops); -- 3. Insert data into the table. INSERT INTO json_table ( SELECT i, ('{ "key1": "'||i||'" ,"key2": "'||i%100||'" ,"key3": "'||i%1000 ||'" ,"key4": "'||i%10000||'" ,"key5": "'||i%100000||'" }')::jsonb FROM generate_series(1, 1000000) i ) ; -- 4. Query data that contains '{"key1": "10"}' from the table. SELECT * FROM json_table WHERE j @> '{"key1": "10"}'::JSONB; -- The following result is returned: id | j ----+------------------------------------------------------------------------ 10 | {"key1": "10", "key2": "10", "key3": "10", "key4": "10", "key5": "10"} (1 row)
Execute the
EXPLAIN
statement to query the execution plan. Sample code:explain SELECT * FROM json_table WHERE j @> '{"key1": "10"}'::JSONB; QUERY PLAN ------------------------------------------------------------------------------------------- Gather (cost=0.00..0.26 rows=1000 width=12) -> Local Gather (cost=0.00..0.23 rows=1000 width=12) -> Decode (cost=0.00..0.23 rows=1000 width=12) -> Bitmap Heap Scan on json_table (cost=0.00..0.13 rows=1000 width=12) Recheck Cond: (j @> '{"key1": "10"}'::jsonb) -> Bitmap Index Scan on index_json (cost=0.00..0.00 rows=0 width=0) Index Cond: (j @> '{"key1": "10"}'::jsonb) Optimizer: HQO version 1.3.0 (8 rows)
The preceding results show that the execution plan contains the
Index Scan
step. This indicates that an index is used during the query.
Operators provided by Hologres
GIN indexes for fields of the JSONB data type provided by native PostgreSQL cannot be used to retrieve accurate data. Data must be rechecked after it is retrieved. Query performance may fail to be improved even if an index is used. Hologres provides the ops_class operator, and this operator does not require data recheck. If you do not specify an operator, the ops_class operator is used by default.
The ops_class operator supports indexes that contain 1 to 127 bytes in length. If an index exceeds 127 bytes in length, the index is truncated. Therefore, fields of the JSONB data type for which you want to create indexes are also truncated if the index length exceeds 127 bytes. In this case, data needs to be rechecked. You can execute the EXPLAIN ANALYZE statement to determine whether data is rechecked.
The jsonb_holo_ops operator class corresponds to the jsonb_ops operator class and supports filtering operations by using the following operators: ?, ?|, ?&, @>
. The jsonb_holo_path_ops operator class corresponds to the jsonb_path_ops operator class. Only the filtering operation by using the @>
operator is supported.
Use the jsonb_holo_ops operator to create an index.
-- 1. Create a table. BEGIN ; DROP TABLE IF EXISTS json_table; CREATE TABLE IF NOT EXISTS json_table ( id INT ,j jsonb ); COMMIT ; -- 2. Use the jsonb_holo_ops operator to create an index. CREATE INDEX index_json on json_table USING GIN(j jsonb_holo_ops); -- 3. Insert data into the table. INSERT INTO json_table VALUES (1, '{"key1": 1, "key2": [1, 2], "key3": {"a": "b"}}') , (1, '{"key1": 1}'), (2, '{"key2": [1, 2], "key3": {"a": "b"}}') ; -- 4. Query data from the table. SELECT * FROM json_table WHERE j ? 'key1'; -- The following result is returned: id | j ----+------------------------------------------------- 1 | {"key1": 1} 1 | {"key1": 1, "key2": [1, 2], "key3": {"a": "b"}} (2 rows)
Use the jsonb_holo_path_ops operator to create an index.
-- 1. Create a table. BEGIN ; DROP TABLE IF EXISTS json_table; CREATE TABLE IF NOT EXISTS json_table ( id INT ,j jsonb ); -- 2. Use the jsonb_holo_path_ops operator to create an index. CREATE INDEX index_json on json_table USING GIN(j jsonb_holo_path_ops); -- 3. Insert data into the table. INSERT INTO json_table ( SELECT i, ('{ "key1": "'||i||'" ,"key2": "'||i%100||'" ,"key3": "'||i%1000 ||'" ,"key4": "'||i%10000||'" ,"key5": "'||i%100000||'" }')::jsonb FROM generate_series(1, 1000000) i ) ; -- 4. Query data that contains '{"key1": "10"}' from the table. SELECT * FROM json_table WHERE j @> '{"key1": "10"}'::JSONB ; -- The following result is returned: id | j ----+------------------------------------------------------------------------ 10 | {"key1": "10", "key2": "10", "key3": "10", "key4": "10", "key5": "10"} (1 row)
Data import example: Import JSONB data from Realtime Compute for Apache Flink to Hologres in real time
When you import data from Realtime Compute for Apache Flink to Hologres, you must define the data types of fields to those supported by Realtime Compute for Apache Flink in SQL deployments. When you create internal tables in Hologres, you must define the data types of fields to those supported by Hologres. For more information about data type mappings between Realtime Compute for Apache Flink and Hologres, see Data type mappings between Realtime Compute for Apache Flink or Blink and Hologres.
If you want to import JSON data from Realtime Compute for Apache Flink to Hologres, you must define VARCHAR as the data type of JSON data in source tables and result tables in SQL deployments of Realtime Compute for Apache Flink. In Hologres internal tables, define JSONB as the data type. Examples:
Create an internal table in Hologres and define JSONB as the data type of the
message
field.BEGIN ; DROP TABLE IF EXISTS holo_internal_table; CREATE TABLE IF NOT EXISTS holo_internal_table ( id BIGINT NOT NULL, message JSONB NOT NULL ); CALL set_table_property('holo_internal_table', 'distribution_key', 'id'); COMMIT ;
In SQL deployments of Realtime Compute for Apache Flink, define VARCHAR as the data type of the
message
field in source table and result tables. Then, write data to Hologres.CREATE TEMPORARY TABLE randomSource ( id BIGINT, message VARCHAR ) WITH ('connector' = 'datagen'); CREATE TEMPORARY TABLE sink_holo ( id BIGINT, message VARCHAR ) WITH ( 'connector' = 'hologres', 'dbname'='<yourDBname>', -- The name of the Hologres database to which you want to connect. 'tablename'='<holo_internal_table>', -- The name of the Hologres table to which data is written. 'username'='<yourUsername>', -- The AccessKey ID of your Alibaba Cloud account. 'password'='<yourPassword>', -- The AccessKey secret of your Alibaba Cloud account. 'endpoint'='<yourEndpoint>', --The virtual private cloud (VPC) endpoint of your Hologres instance. ); INSERT INTO sink_holo SELECT 1, '{"k":"v"}' FROM randomSource;
Support for column-oriented storage for data of the JSONB data type
GIN indexes are used to improve performance only at the computing layer. In the computing process, the entire JSON content needs to be scanned. Hologres V1.3 and later support column-oriented storage for data of the JSONB type based on optimization at the storage layer. Data of the JSONB type can be stored in columns like structured data. This increases the efficiency of data compression and accelerates queries of JSONB-formatted data.
For more information about column-oriented storage of JSONB-formatted data, see Column-oriented storage for JSONB-formatted data.