By digoal
The structure of GIN is listed below:
https://www.postgresql.org/docs/12/gin-implementation.html
The TREE constructed by the index field entry stores the row number corresponding to the entry in posting tree|list at the end. There is no other information.
Therefore, GIN indexes support retrievals, such as inclusion and intersection.
However, besides inclusion and intersection, ranking is also required for full-text retrieval, which requires additional information, such as the number of target words in this row.
GIN finds the corresponding heap tuples, extracts them one by one, calculates the number of target words in each matched row, and ranks the results.
Are there any better methods?
The RUM index is similar to the GIN index, but some attribute values are appended to each ctid(itempoint) of posting list|tree. For example, if the ctid(1,10) row contains the word "alibaba" with a total quantity of 100, the number 100 will be stored.
RUM access method - inverted index with additional information in posting lists
Here, iptr refers to the heap tuple row number:
Therefore, RUM can solve the low-efficiency problem of GIN (such as in ranking sort):
RUM solves these problems by storing additional information in posting tree, such as positional information of lexemes or timestamps. You can get an idea of RUM from the following picture:
https://github.com/postgrespro/rum
SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'beautiful | place')
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
t | rank
---------------------------------+---------
It looks like a beautiful place | 8.22467
The situation is most beautiful | 16.4493
It is a beautiful | 16.4493
(3 rows)
It also stores other content (such as values of other fields) in the additional information of the itempoint and supports various types of sorting.
For example, use the following method to store the t (time) value in addition to the itempoint:
Construct an inverted tree based on the t (tsvector) field and store the content of field d of this record with the additional content corresponding to the ctid of the posting list|tree of the inverted tree
We can conduct a full-text search by t and sorting by d.
CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d)
WITH (attach = 'd', to = 't');
Now we can execute the following queries:
EXPLAIN (costs off)
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit
-> Index Scan using tsts_idx on tsts
Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
Order By: (d <=> 'Mon May 16 14:21:25 2016'::timestamp without time zone)
(4 rows)
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
id | d | ?column?
-----+---------------------------------+---------------
355 | Mon May 16 14:21:22.326724 2016 | 2.673276
354 | Mon May 16 13:21:22.326724 2016 | 3602.673276
371 | Tue May 17 06:21:22.326724 2016 | 57597.326724
406 | Wed May 18 17:21:22.326724 2016 | 183597.326724
415 | Thu May 19 02:21:22.326724 2016 | 215997.326724
(5 rows)
An Introduction to the Machine Learning Algorithms of the PostgreSQL MADlib Graph
digoal - December 11, 2019
digoal - September 12, 2019
Alibaba Clouder - December 11, 2017
digoal - May 28, 2024
digoal - February 5, 2020
digoal - December 6, 2023
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by digoal