Use column-oriented storage for JSONB-formatted data in Hologres and related limits and examples - Hologres

Hologres V1.3 and later support column-oriented storage for JSONB-formatted data in column-oriented tables. This reduces the storage resources that are required to store JSONB-formatted data and accelerates queries of JSONB-formatted data. This topic describes how to implement column-oriented storage for JSONB-formatted data in Hologres.

How it works

The following figure shows the implementation of column-oriented storage for JSONB-formatted data in Hologres. After column-oriented storage is enabled for the JSONB data type, the system converts the column of the JSONB data type into multiple columns with strong schemas at the underlying layer. This way, the column in which the value that you want to query resides can be directly located to improve query performance. Column-oriented storage enables JSONB-formatted data to be stored and compressed with the same efficiency as structured data at the storage layer because the values of different keys are stored in different columns. This helps improve efficiency and reduces costs.

Note

Column-oriented storage is not suitable for JSON-formatted data. We recommended that you do not enable this feature for JSON-formatted data.

Limits

Only Hologres V1.3 and later support column-oriented storage for the JSONB data type. If you want to use this feature, we recommend that you upgrade your Hologres instance to V1.3.37 or later for better performance. You can manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.
Column-oriented storage for the JSONB data type can be used only for column-oriented tables but not row-oriented tables. Column-oriented storage for the JSONB data type can be triggered only if a column-oriented table contains 1,000 or more data records.

The following table describes operators that are supported by column-oriented storage for the JSONB data type. If you use unsupported operators in queries, the query performance may deteriorate.

Operator	Right operand type	Description	Operation and result
->	text	Obtains a JSON object field based on a key.	Sample operation: `select '{"a": {"b":"foo"}}'::json->'a'` Returned result: `{"b":"foo"}`
->>	text	Obtains a JSON object field as text.	Sample operation: `select '{"a":1,"b":2}'::json->>'b'` Returned result: `2`

Use column-oriented storage for the JSONB data type

Enable column-oriented storage for the JSONB data type

Execute the following statement to enable column-oriented storage for a column of the JSONB data type in a table.

-- Enable column-oriented storage for a column of the JSONB data type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_columnar_type = ON);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Important

After column-oriented storage is enabled for a column of the JSONB data type, the system converts the storage mode of all historical data in the column into the column-oriented storage mode during compaction.
Compaction consumes system resources, such as memory resources. We recommend that you perform compaction during off-peak hours. You can run the vacuum table_name; command to forcefully trigger compaction. The compaction operation is complete when the vacuum command finishes running.
After compaction is complete, newly written data is stored in column-oriented storage mode.

Enable DECIMAL type inference

Important

Before you enable DECIMAL type inference, you must make sure that column-oriented storage is enabled for the JSONB data type.

Hologres V2.0.11 and later support column-oriented storage of data of the DECIMAL type. Sample JSON data:

{
  "name":"Mike",
  "statistical_period":"2023-01-01 00:00:00+08",
  "balance":123.45
}

After you enable DECIMAL type inference for data of balance, column-oriented storage is supported for the data. Sample statement:

-- Enable column-oriented storage for a column of the DECIMAL data type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_decimal = ON);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Check the storage mode of the JSONB data type

The statements that are used to check the storage mode of the JSONB data type differ based on Hologres versions.

For Hologres V1.3.37 and later, execute the following statement:
Note
In Hologres V2.0.17 and earlier, the statement can be used to check the storage modes of only tables in the public schema. In Hologres V2.0.18 and later, the statement can be used to check the storage modes of tables in all schemas.
```
-- In Hologres V2.0.17 and earlier, the statement can be used to check the storage modes of only tables in the public schema. In Hologres V2.0.18 and later, the statement can be used to check the storage modes of tables in all schemas.
SELECT * FROM hologres.hg_column_options WHERE schema_name='<schema_name>' AND table_name = '<table_name>';
```
The schema_name parameter specifies the schema name, and the table_name parameter specifies the table name.

For Hologres V1.3.10 to V1.1.36, execute the following statements:

SELECT DISTINCT
    a.attnum as num,
    a.attname as name,
    format_type(a.atttypid, a.atttypmod) as type,
    a.attnotnull as notnull, 
    com.description as comment,
    coalesce(i.indisprimary,false) as primary_key,
    def.adsrc as default,
    a.attoptions
FROM pg_attribute a 
JOIN pg_class pgc ON pgc.oid = a.attrelid
LEFT JOIN pg_index i ON 
    (pgc.oid = i.indrelid AND i.indkey[0] = a.attnum)
LEFT JOIN pg_description com on 
    (pgc.oid = com.objoid AND a.attnum = com.objsubid)
LEFT JOIN pg_attrdef def ON 
    (a.attrelid = def.adrelid AND a.attnum = def.adnum)
WHERE a.attnum > 0 AND pgc.oid = a.attrelid
AND pg_table_is_visible(pgc.oid)
AND NOT a.attisdropped
AND pgc.relname = '<table_name>' 
ORDER BY a.attnum;

The table_name parameter specifies the table name.

Sample result
In the following figure, the attoptions or option property of a column is enable_columnar_type = ON, which indicates that column-oriented storage is enabled for this column.

Disable column-oriented storage for the JSONB data type

Execute the following statement to disable column-oriented storage for a column of the JSONB data type in a table.

-- Disable column-oriented storage for a column of the JSONB data type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_columnar_type = OFF);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Important

After column-oriented storage is disabled for a column of the JSONB data type, the system converts the storage mode of all historical data in the column into the standard JSONB data storage mode during compaction. After compaction is complete, the conversion is complete.
Compaction consumes system resources, such as memory resources. We recommend that you perform compaction during off-peak hours. You can run the vacuum table_name; command to forcefully trigger compaction. The compaction operation is complete when the vacuum command finishes running.
After compaction is complete, newly written data is stored in the standard JSONB data storage mode.

Disable DECIMAL type inference

To disable DECIMAL type inference for a column of a table, execute the following statement:

-- Disable column-oriented storage for a column of the DECIMAL type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_decimal = OFF);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Note

After you disable DECIMAL type inference, compaction is immediately triggered to convert the storage mode of data of the DECIMAL type from column-oriented storage mode into the original mode.

Create bitmap indexes

In Hologres, the bitmap_columns property specifies a bitmap index. This property uses an index schema that is independent of data storage. You can use the bitmap_columns property to accelerate equivalent comparisons based on a bitmap vector structure. Bitmap indexes can help filter data that equals a specified value in a stored file. Therefore, the bitmap_columns property is applicable to point queries. In Hologres V2.0 and later, you can create bitmap indexes for columns with JSONB-formatted data that is stored in column-oriented storage mode. After column-oriented storage is enabled for the JSONB data type, the system can parse data of the following data types: INT, INT[], BIGINT, BIGINT[], TEXT, TEXT[], and JSONB. After the bitmap index feature is enabled, the system creates bitmap indexes for columns with parsed data of the INT, INT[], BIGINT, BIGINT[], TEXT, and TEXT[] data types.

Syntax:

call set_table_property('<table_name>', 'bitmap_columns', '[<columnName>{:[on|off]}[,...]]');

The following table describes the parameters in the preceding syntax.

Parameter	Description
table_name	The name of the table.
columnName	The name of the column
on	Indicates that a bitmap index is specified for the current column. Important You can create a bitmap index only for a column with JSONB-formatted data that is stored in column-oriented storage mode.
off	Indicates that no bitmap index is created for the current column.

Example

Create a table.

DROP TABLE IF EXISTS user_tags;

-- Create a table.
BEGIN;
CREATE TABLE IF NOT EXISTS user_tags (
    ds timestamptz,
    tags jsonb
);
COMMIT;

Enable column-oriented storage for the tags column of the JSONB data type.

ALTER TABLE user_tags ALTER COLUMN tags SET (enable_columnar_type = ON);

View the storage mode of the tags column.

select * from hologres.hg_column_options where table_name = 'user_tags';

In the following returned result, the options property of the tags column is enable_columnar_type = on, which indicates that column-oriented storage is enabled.

 schema_name | table_name | column_id | column_name |       column_type        | notnull | comment | default |          options
-------------+------------+-----------+-------------+--------------------------+---------+---------+---------+---------------------------
 public      | user_tags  |         1 | ds          | timestamp with time zone | f       |         |         |
 public      | user_tags  |         2 | tags        | jsonb                    | f       |         |         | {enable_columnar_type=on}
(2 rows)

Import data.

INSERT INTO user_tags (ds, tags)
SELECT
    '2022-01-01 00:00:00+08'
    , ('{"id":' || i || ',"first_name" :"Sig",  "gender" :"Male"}')::jsonb
FROM
    generate_series(1, 10001) i;

Optional. Forcefully write data to disks.
When data is written to disks, Hologres enables column-oriented storage for JSONB-formatted data. To view the storage effect at the earliest opportunity, you can execute the following statement to forcefully write data to disks:
```
VACUUM user_tags;
```
Query data from the table.
Execute the following statement to query data in the first_name column whose value in the id column is 10:
```
SELECT
    (tags -> 'first_name')::text AS first_name
FROM
    user_tags
WHERE (tags -> 'id')::int = 10;
```

Check whether column-oriented storage is enabled for the JSONB data type based on the execution plan.

-- Query detailed statistics.
SET hg_experimental_show_execution_statistics_in_explain = ON;
-- Query the execution plan.
EXPLAIN ANALYZE
SELECT
    (tags -> 'first_name')::text AS first_name
FROM
    user_tags
WHERE (tags -> 'id')::int = 10;

In the following figure, columnar_access_used is returned, which indicates that column-oriented storage is enabled for the JSONB data type.

For the query in Step 6, you can create a bitmap index for the tags column to improve the efficiency of a key-based point query. Sample statement:
```
call set_table_property('user_tags', 'bitmap_columns', 'tags');
```
View the execution plan to check whether the bitmap index is used.
```
-- View the execution plan.
EXPLAIN ANALYZE
SELECT
    (tags -> 'first_name')::text AS first_name
FROM
    user_tags
WHERE (tags -> 'id')::int = 10;
```
The following figure shows the returned result.
The returned result contains bitmap_used, which indicates that the bitmap index is used.

Scenarios in which column-oriented storage is not recommended for the JSONB data type

Column-oriented storage for the JSONB data type helps save the required storage resources for JSONB-formatted data and significantly improves query performance. However, this feature is not suitable in the following scenarios.

Queries that require complete columns of the JSONB data type

Column-oriented storage for the JSONB data type provided by Hologres delivers better performance in most scenarios. However, for queries that require complete columns of the JSONB data type, the original storage mode of JSONB-formatted data delivers better performance. Sample SQL statements:

-- Execute the following data definition language (DDL) statement to create a table:
CREATE TABLE TBL(key int, json_data jsonb); 
SELECT json_data FROM TBL WHERE key = 123;
SELECT * FROM TBL limit 10;

Column-oriented storage for the JSONB data type is implemented at the underlying layer. If you want to obtain complete columns of the JSONB data type, the system needs to convert data in column-oriented storage into data in the original JSONB format.

The conversion process generates a large number of I/O operations and incurs high overheads. The conversion may cause performance to deteriorate if a large number of columns and a large amount of data are involved. We recommend that you do not enable column-oriented storage for the JSONB data type in this scenario.

Storage of extremely sparse data of the JSONB data type

If the JSONB-formatted data that you want to store in column-oriented storage contains sparse fields, Hologres places the sparse fields in a column named holo.remaining to prevent an excessive number of columns from being generated. If all the JSONB-formatted fields are sparse, for example, each field appears only once, column-oriented storage does not take effect. This is because all fields are placed in the column holo.remaining. In this case, column-oriented storage for the JSONB data type does not improve query performance.

Storage of JSONB-formatted data with complex nested structures

In this example, the root node of the JSONB-formatted data is an array, which contains JSONB-formatted data of different structures. Hologres stores data with the complex nested structures in one column. In this case, column-oriented storage for the JSONB data type does not improve query performance.

'[
  {"key1": "value1"}, 
  {"key2": 123},
  {"key3": 123.01}
]'

Best practices of column-oriented storage for the JSONB data type

Diagnostics of slow queries

If the query performance deteriorates after you enable column-oriented storage for the JSONB data type, check whether the query requires complete columns of the JSONB data type. For complex SQL statements, you can execute the Explain Analyze statement to check whether the query requires complete columns of the JSONB data type. Sample statements:

CREATE TABLE TBL(key int, json_data json); -- Create a table.
ALTER TABLE TBL ALTER COLUMN json_data SET (enable_columnar_type = on);
Explain Analyze SELECT json_data FROM TBL WHERE key = 123;

The return value of the Explain Analyze statement contains the Hint keyword. If the Hint keyword contains the following information, the query requires complete columns of the JSONB data type. As a result, the query performance deteriorates.

Column 'json_data' has enabled columnar jsonb, but the query scanned the entire Jsonb value

SQL syntax with better performance

JSONB-formatted data can be converted into the TEXT format by using different syntaxes. In this example, the ->> operator provides better performance if you want to obtain the name attribute of the json_data column.
```
-- Better performance
SELECT json_data->>'name' FROM tbl; 
-- Ordinary performance
SELECT (json_data->'name')::text FROM tbl;
```
If a JSONB-formatted field contains a TEXT array and you want to check whether the TEXT array contains the specified value, we recommend that you use the following syntax:
```
SELECT key FROM tbl WHERE jsonb_to_textarray(json_data->'phones') && ARRAY['123456'];
```

FAQ

Why does the storage usage increase after I enable column-oriented storage for the JSONB data type?

After column-oriented storage is enabled for the JSONB data type, keys in the original JSONB-formatted data are no longer stored. Only the values that correspond to the keys are stored, and the data type of each column is the same. Column-oriented storage for the JSONB data type provides a higher compression ratio. Theoretically, the storage usage is significantly reduced.

However, if the fields in the JSONB-formatted data are sparse, the number of columns significantly increases after column-oriented storage. Each column incurs additional storage overhead due to operations such as column statistics collection and index creation. If the data type of each column after column-oriented storage is TEXT, the compression performance is not good. The actual compression efficiency differs based on data features, such as data sparsity. Column-oriented storage for the JSONB data type might not deliver better compression performance in some scenarios.