×
Community Blog PostgreSQL Row-Level Full-Text Search

PostgreSQL Row-Level Full-Text Search

This article looks at row-level full-text searches in PostgreSQL and walks through how you can create one yourself.

By Digoal.

In some applications, you may need to search all the fields in a table, and other fields may still require a precise query, while others in contrast may require a fuzzy query or a full-text search.

Now consider this scenario: the selection of drop-down boxes on some front-end pages. The selection of these boxes can actually be a headache for application developers because writing SQL statements can turn out to be quite a big hassle sometimes.

Now consider the code below:

postgres=# create table t(phonenum text, info text, c1 int, c2 text, c3 text, c4 timestamp);  
CREATE TABLE  
postgres=# insert into t values ('13888888888','i am digoal, a postgresqler',123,'china','PRC, Alibaba, ah',now());  
INSERT 0 1  
postgres=# select * from t;  
  phonenum   |            info             | c1  |  c2   |              c3              |             c4               
-------------+-----------------------------+-----+-------+------------------------------+----------------------------  
 13888888888 | i am digoal, a postgresqler | 123 | china | PRC, Alibaba, Ah | 2016-04-19 11:15:55.208658  
(1 row)  

Now, based on this code, the following is a query the t table, shown above. If any field matches digoal, this following record is returned:

select * from t where phonenum='digoal' or info ~ 'digoal' or c1='digoal' or ......;  

Of course, though, each field requires a condition, such as the precise matching condition, and the full-text search condition. However, with that said, Row-level full-text search can greatly simplify this query.

Below, we will go over how you can create a row-level text index to make things a bit easier.

Create a Row-Level Text Index

Consider Jieba as an example. The source code can be found here, and the pg_scws based on SCWS.

All of the above support custom dictionaries. The Installation process for these dictionaries is omitted here. The usage of these dictionaries is as follows:

postgres=# select t::text from t;  
                                                        t                                                          
-----------------------------------------------------------------------------------------------------------------  
 (13888888888,"i am digoal, a postgresqler",123,china,PRC, Alibaba, Ah,"2016-04-19 11:15:55.208658")  
(1 row)  
  
postgres=# select to_tsvector('jiebacfg',t::text) from t;  
                                                                                 to_tsvector                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 ' ':6,8,11,13,33 '04':30 '11':34 '123':17 '13888888888':2 '15':36 '19':32 '2016':28 '55.208658':38 'china':19 'digoal':9 'postgresqler':14 'PRC':21 'Alibaba':23  
(1 row)  

Use t::text to convert rows into a large text.

postgres=# select to_tsvector('jiebacfg',t::text) @@ to_tsquery('digoal & china') from t;  
 ?column?   
----------  
 t  
(1 row)  
  
postgres=# select to_tsvector('jiebacfg',t::text) @@ to_tsquery('digoal & post') from t;  
 ?column?   
----------  
 f  
(1 row)  

To create a row-level text index, an immutable function index is required.

postgres=# create or replace function f1(regconfig,text) returns tsvector as 
$$
  
 select to_tsvector($1,$2);  
 
$$
 language sql immutable strict;  
CREATE FUNCTION  
  
postgres=# create or replace function f1(text) returns tsvector as 
$$
            
select to_tsvector($1);     

$$
 language sql immutable strict;  
CREATE FUNCTION  
  
postgres=# alter function record_out(record) immutable;  
ALTER FUNCTION  
postgres=# alter function textin(cstring) immutable;  
ALTER FUNCTION  
postgres=# create index idx_t_1 on t using gin (f1('jiebacfg'::regconfig,t::text)) ;  
CREATE INDEX  

Verification:

postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & post') ;  
 phonenum | info | c1 | c2 | c3 | c4   
----------+------+----+----+----+----  
(0 rows)  
postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & china') ;  
  phonenum   |            info             | c1  |  c2   |              c3              |             c4               
-------------+-----------------------------+-----+-------+------------------------------+----------------------------  
 13888888888 | i am digoal, a postgresqler | 123 | china | PRC, Alibaba | 2016-04-19 11:15:55.208658  
(1 row)  
  
postgres=# select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & Alibaba') ;  
  phonenum   |            info             | c1  |  c2   |              c3              |             c4               
-------------+-----------------------------+-----+-------+------------------------------+----------------------------  
 13888888888 | i am digoal, a postgresqler | 123 | china | PRC, Alibaba, Ah | 2016-04-19 11:15:55.208658  
(1 row)  
  
postgres=# explain select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & Alibaba') ;  
                                              QUERY PLAN                                                
------------------------------------------------------------------------------------------------------  
 Seq Scan on t  (cost=0.00..1.52 rows=1 width=140)  
   Filter: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & Alibaba'::text))  
(2 rows)  

If the number of records is large, the indexes will be used. When the number of records is small, we can use Hint or Switch to force the index:

postgres=# set enable_seqscan=off;  
SET  
postgres=# explain select * from t where f1('jiebacfg'::regconfig,t::text) @@ to_tsquery('digoal & Alibaba') ;  
                                                   QUERY PLAN                                                     
----------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on t  (cost=12.25..16.77 rows=1 width=140)  
   Recheck Cond: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & Alibaba'::text))  
   ->  Bitmap Index Scan on idx_t_1  (cost=0.00..12.25 rows=1 width=0)  
         Index Cond: (to_tsvector('jiebacfg'::regconfig, (t.*)::text) @@ to_tsquery('digoal & Alibaba'::text))  
(4 rows)  
0 0 0
Share on

digoal

281 posts | 24 followers

You may also like

Comments