The RUM
extension expands general inverted indexes (GINs) and implements faster full-text search based on the GIN access method code.
Prerequisites
Your PolarDB for PostgreSQL cluster must meet the following requirements:
PostgreSQL 15 (revision version 15.7.1.1 or later).
PostgreSQL 14 (revision version 14.5.3.0 or later).
You can execute the following statement to query the minor engine version of your PolarDB for PostgreSQL cluster:
SELECT version();
Background information
GINs support tsvector
and tsquery
data types to implement full-text search, but encounters the following problems:
Slow sorting: The location information of words is required for sorting. However, GINs do not store word locations. Therefore, an additional scan is required to retrieve the location information of words after the index scan.
Slow phrase queries: GINs require location information to perform phrase search.
Slow timestamp sorting: GINs do not store related information in indexes that contain morphemes. Therefore, an additional scan is required.
The RUM
extension is based on GINs and can solve these problems by storing additional information (word location information or timestamp location information) in RUM indexes.
However, the RUM extension requires more time than GIN to construct and insert indexes. This is because the RUM extension generates indexes based on WAL logs and the generated RUM indexes contain more information than the keys that are used for encryption.
Common RUM operators
The RUM extension provides the following operators.
Operator | Data type | Description |
| float4 | Calculates the distance between |
| float8 | Calculates the distance between two timestamp values. |
| float8 | Calculates the distance between values that are less than the current timestamp. |
| float8 | Calculates the distance between values that are greater than the current timestamp. |
The <=>
, <=|
, and |=>
operators also apply to the following data types:
timestamptz
int2
int4
int8
float4
float8
money
oid
Use the RUM extension
Create the RUM extension
You can execute the following SQL statement to install the RUM extension.
CREATE EXTENSION rum;
If the extension fails to be created when your cluster meets the version requirements, contact us.
Operator classes
The
rum_tsvector_ops
class storestsvector
phrases with location information, and supports sorting by the<=>
operator and prefix search.Examples:
Prepare data:
CREATE TABLE test_rum(t text, a tsvector); CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON test_rum FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't'); INSERT INTO test_rum(t) VALUES ('The situation is most beautiful'); INSERT INTO test_rum(t) VALUES ('It is a beautiful'); INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');
Create RUM indexes:
CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
Execute the following query statements:
Query statement 1:
SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank FROM test_rum WHERE a @@ to_tsquery('english', 'beautiful | place') ORDER BY a <=> to_tsquery('english', 'beautiful | place');
Sample result:
t | rank ---------------------------------+---------- It looks like a beautiful place | 8.22467 The situation is most beautiful | 16.44934 It is a beautiful | 16.44934 (3 rows)
Query statement 2:
SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank FROM test_rum WHERE a @@ to_tsquery('english', 'place | situation') ORDER BY a <=> to_tsquery('english', 'place | situation');
Sample result:
t | rank ---------------------------------+---------- The situation is most beautiful | 16.44934 It looks like a beautiful place | 16.44934 (2 rows)
The
rum_tsvector_hash_ops
class stores the hash values and location information oftsvector
phrases. This class supports sorting by the<=>
operator, but not prefix search.NoteThe
rum_tsvector_hash_ops
class supports sorting by the<=>
,<=|
, and|=>
operators. It can be used with therum_tsvector_addon_ops
,rum_tsvector_hash_addon_ops
, andrum_anyarray_addon_ops
classes.The
rum_TYPE_ops
class supports the following data types and operators:Data types:
int2
,int4
,int8
,float4
,float8
,money
,oid
,time
,timetz
,date
,interval
,macaddr
,inet
,cidr
,text
,varchar
,char
,bytea
,bit
,varbit
,numeric
,timestamp
, andtimestamptz
.Operators: The
<
,<=
,=
,>=
, and>
operators are applicable to all data types. The<=>
,<=|
, and|=>
operators are applicable toint2
,int4
,int8
,float4
,float8
,money
,oid
,timestamp
, andtimestamptz
data types.The
rum_tsvector_addon_ops
class storestsvector
lexemes and any lexemes supported by the module field.Examples:
Prepare data:
CREATE TABLE tsts (id int, t tsvector, d timestamp); \copy tsts from 'external/rum/data/tsts.data' CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d) WITH (attach = 'd', to = 't');
Execute the following statements:
EXPLAIN (costs off) SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5; QUERY PLAN ----------------------------------------------------------------------------------- Limit -> Index Scan using tsts_idx on tsts Index Cond: (t @@ '''wr'' & ''qh'''::tsquery) Order By: (d <=> '2016-05-16 14:21:25'::timestamp without time zone) (4 rows)
Execute the following query statement:
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
Sample result:
id | d | ?column? -----+----------------------------+--------------- 355 | 2016-05-16 14:21:22.326724 | 2.673276 354 | 2016-05-16 13:21:22.326724 | 3602.673276 371 | 2016-05-17 06:21:22.326724 | 57597.326724 406 | 2016-05-18 17:21:22.326724 | 183597.326724 415 | 2016-05-19 02:21:22.326724 | 215997.326724 (5 rows)
NoteBecause posting trees have fixed length right bound and fixed length non-leaf posting items, RUM may be flawed in creating an index using sorting over pass-by-reference additional information.
The
rum_tsvector_hash_addon_ops
class stores the hash values oftsvector
lexemes and any lexemes supported by the module field.The
rum_tsquery_ops
class is applicable to thetsquery
data type and stores branches of the query tree in additional information.Examples:
Prepare data:
CREATE TABLE test_array (i int2[]); INSERT INTO test_array VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}'); CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);
Execute the following statements:
SET enable_seqscan TO off; EXPLAIN (COSTS OFF) SELECT * FROM test_array WHERE i && '{1}' ORDER BY i <=> '{1}' ASC; QUERY PLAN ------------------------------------------ Index Scan using idx_array on test_array Index Cond: (i && '{1}'::smallint[]) Order By: (i <=> '{1}'::smallint[]) (3 rows)
Execute the following query statement:
SELECT * FROM test_array WHERE i && '{1}' ORDER BY i <=> '{1}' ASC;
Sample result:
i ----------- {1} {1,2} {1,2,3} {1,2,3,4} (4 rows)
The
rum_anyarray_ops
class storesanyarrray
elements with the array length. Supported operators include&&
,@>
,<@
,=
, and%
. Sorting by the<=>
operators is supported.Examples:
Prepare data:
CREATE TABLE test_array (i int2[]); INSERT INTO test_array VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}'); CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);
Execute the following statements:
SET enable_seqscan TO off; EXPLAIN (COSTS OFF) SELECT * FROM test_array WHERE i && '{1}' ORDER BY i <=> '{1}' ASC; QUERY PLAN ------------------------------------------ Index Scan using idx_array on test_array Index Cond: (i && '{1}'::smallint[]) Order By: (i <=> '{1}'::smallint[]) (3 rows)
Execute the following query statement:
SELECT * FROM test_array WHERE i && '{1}' ORDER BY i <=> '{1}' ASC;
Sample result:
i ----------- {1} {1,2} {1,2,3} {1,2,3,4} (4 rows)
The
rum_anyarray_addon_ops
class storesanyarrray
elements and any elements supported by the module field.
Uninstall the RUM extension
You can execute the following SQL statement to uninstall the RUM extension.
DROP EXTENSION rum;
References
For more information, see rum.