×
Community Blog PostgreSQL gzip Plug-in Function Interface: Compress and Decompress Text and Bytea Files

PostgreSQL gzip Plug-in Function Interface: Compress and Decompress Text and Bytea Files

In this article, the author discusses how to use PostgreSQL gzip plug-in to compress and decompress text and bytea files to reduce bandwidth utilization during data transmission.

By digoal

Background

PostgreSQL (PG) supports variable-length file types such as text and bytea. The maximum size of these files is 1 GB, so you can store large texts or files in the database. PG uses pglz compression algorithm to compress and store data. However, if the user wants to query the client’s data, the transmitted content is not compressed. This leads to consuming a large bandwidth during transmission.

The following two methods are commonly used to solve the compression problem during network transmission:

1) Use Secure Socket Layer (SSL) to support data compression. With SSL, the data transmitted between the client and the database is encrypted and compressed. However, this will incur extra CPU overhead for encryption and decryption.

2) Use gzip plug-in. It uses the standard compression library, so it supports decompression as long as the client supports gzip. You can use the compressed data for storage and transmission.

client - search(gzip result) - pgdb

The gzip plug-in is a compression and decompression plug-in. The interfaces for calling this plug-in are as follows:

gzip(uncompressed BYTEA, [compression_level INTEGER]) returns BYTEA  
gzip(uncompressed TEXT, [compression_level INTEGER]) returns BYTEA  
gunzip(compressed BYTEA) returns BYTEA 

To solve the transmission problem, return the data compressed by gzip plug-in to the client.

select gzip (column, 压缩级别)  

To solve the storage problem, use the values compressed by gzip plug-in for storage in bytea type.

insert into table values (gzip (column, 压缩级别))  

For data query, the decompression is supported both on the database and the client, which requires gzipLIB on the client.

gzip Plug-in Installation

wget http://api.pgxn.org/dist/gzip/1.0.0/gzip-1.0.0.zip  
unzip gzip-1.0.0.zip  
cd gzip-1.0.0/  
USE_PGXS=1 make  
USE_PGXS=1 make install  
pg12@pg11-test-> psql  
psql (12beta2)  
Type "help" for help.  
  
postgres=# create extension gzip ;  
CREATE EXTENSION  

Example

This example is quoted from this site

> SELECT gzip('this is my this is my this is my this is my text');  
  
                                   gzip  
--------------------------------------------------------------------------  
 \x1f8b08000000000000132bc9c82c5600a2dc4a851282ccd48a12002e7a22ff30000000  

What, the compressed output is longer? No, it looks that way because in hex every character requires two hex digits. The original string looks like this in hex:

> SELECT 'this is my this is my this is my this is my text'::bytea;  
  
                                               bytea  
----------------------------------------------------------------------------------------------------  
 \x74686973206973206d792074686973206973206d792074686973206973206d792074686973206973206d792074657874  

For really long, repetitive things, compression naturally works like a charm:

> SELECT gzip(repeat('this is my ', 100));  
  
                                               bytea  
----------------------------------------------------------------------------------------------------  
 \x1f8b08000000000000132bc9c82c5600a2dc4a859251e628739439ca24970900d1341c5c4c040000  

Converting a bytea back into an equivalent text uses the encode() function with the escape encoding:

> SELECT encode(gunzip(gzip('this is my this is my this is my this is my text')), 'escape')  
  
                      encode  
--------------------------------------------------  
 this is my this is my this is my this is my text  

References

0 0 0
Share on

digoal

284 posts | 25 followers

You may also like

Comments