By Zhihui Fan
PostgreSQL has two data types related to JSON: JSON and JSONB. This article introduces JSONB which is also recommended for you.
Suppose that all records in a table have attributes attr1..attr10, but only some records have attributes attr11..attr20, and attributes attr21 and attr22 may be added in the future. In this case, creating a table with a JSON column would be very suitable.
create table t1 (attr1 numeric, attr2 int, .. attr10 text, extra jsonb);
The data of attr11..attr20 is stored as key-value pairs in the extra attribute. If attr21 and attr22 need to be added in the future, they can be directly stored in the extra attribute without any modification to the table structure.
In addition to using the JSON data type, there are other solutions available. For example, set extra to a text/binary type, and serialize and deserialize on the client. The drawback of this approach is that the data needs to be transferred to the client for computation, which prevents the use of more advanced scanning methods and increases network transmission. However, using JSON/JSONB can avoid these issues.
select extra->'attr11' from t1 where extra->>'att20' = 'apple';
We can filter and project data on the server to reduce network transmission. In addition, we can also create an index in extra->>'att20' to accelerate the filter conditions.
Another extreme design is to store all the attributes in a JSONB type and create a table structure as follows:
create table t2(data jsonb)
This solution is also not optimal, because all records need to record the attributes from attr1 to attr10 repeatedly. For example, for data in the same row, the storage methods of t1 and t2 are as follows:
t1: 1|2|3|..|'20'|{'attr21': 2, 'attr22': 3}
t2: {'attr1': 1, 'attr2': 2, 'attr3': 3, ... 'attr21': 2, 'attr22': 3}
The design of t2 will result in storage and network transmission overhead. Both of these two problems can be mitigated by compression, but compression/decompression comes at a cost.
PostgreSQL supports the simultaneous use of JSONB data type and traditional data types, which is also a major feature, and leveraging this flexibility can bring significant benefits. Let's continue with the previous example: Suppose that later we find that attr11 exists in all records, so we can modify the table structure to:
create table t1 (attr1 numeric, attr2 int, .. attr10 text, attr11 text, extra jsonb);
Then, just a few minor changes to the application are needed. Compared with the database that only supports document type, this modification does not require the application to add new data sources or manage new connections but only needs to modify the access mode of attr11.
For a JSON data type, the toast capability should be used even if the data is slightly larger. Take the data of t1 as an example. The general working principle can be simplified as follows:
t1:
1|2|3|..|{pointer-x}
pg_toast_{t1}
pointer-x| 1 | {attr11:.., attr12: ...}
pointer-x| 2 | {attr16:.., attr20: ...}
Only when we need to access the real value of the pointer-x, do we "assemble" the value of {attr11: 11, ..., attr20: 20}, and this process is called detoast, which is also relatively resource-consuming.
For the following SQL:
select extra->'attr11' from t1 where extra->>'att20' = 'apple';
Both extra->>'att20'
and extra->'attr11'
need to access the complete extra data. In the native PostgreSQL, the detoast is performed twice. The following statement will detoast 4 times. When extra is slightly larger, a lot of time will be spent on the detoast process. shared detoast datum is designed to solve this problem. For the same data, we will only detoast once.
select extra->'attr11', extra->'attr12'
from t1
where extra->>'att20' = 'apple'
and extra->>'attr19' = 'cloud';
PolarDB for PostgreSQL is a cloud-native relational database service developed by Alibaba Cloud. It is 100% compatible with PostgreSQL and highly compatible with Oracle syntax (supported by the public cloud edition). It utilizes a Shared-Storage architecture, decoupling computing from storage. PolarDB for PostgreSQL offers ultrahigh elasticity, millisecond-level latency, HTAP capabilities, and enterprise-level database features such as high reliability, high availability, and elastic scaling. Additionally, with large-scale parallel computing capability, it can manage mixed OLTP and OLAP workloads.
Discover what and how you can use our products to build >>
[Infographic] Highlights | Database New Features in June 2024
digoal - May 28, 2019
- January 12, 2018
Alibaba Clouder - December 11, 2017
digoal - May 28, 2019
Hologres - June 30, 2021
Alibaba Clouder - December 12, 2017
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAlibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.
Learn MoreAlibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreMore Posts by ApsaraDB