RUM - PolarDB - Alibaba Cloud Documentation Center

The RUM extension expands general inverted indexes (GINs) and implements faster full-text search based on the GIN access method code.

Prerequisites

Your PolarDB for PostgreSQL cluster must meet the following requirements:

PostgreSQL 15 (revision version 15.7.1.1 or later).
PostgreSQL 14 (revision version 14.5.3.0 or later).

Note

You can execute the following statement to query the minor engine version of your PolarDB for PostgreSQL cluster:

SELECT version();

Background information

GINs support tsvector and tsquery data types to implement full-text search, but encounters the following problems:

Slow sorting: The location information of words is required for sorting. However, GINs do not store word locations. Therefore, an additional scan is required to retrieve the location information of words after the index scan.
Slow phrase queries: GINs require location information to perform phrase search.
Slow timestamp sorting: GINs do not store related information in indexes that contain morphemes. Therefore, an additional scan is required.

The RUM extension is based on GINs and can solve these problems by storing additional information (word location information or timestamp location information) in RUM indexes.

Note

However, the RUM extension requires more time than GIN to construct and insert indexes. This is because the RUM extension generates indexes based on WAL logs and the generated RUM indexes contain more information than the keys that are used for encryption.

Common RUM operators

The RUM extension provides the following operators.

Operator	Data type	Description
`tsvector` `<=>` `tsquery`	float4	Calculates the distance between `tsvector` and `tsquery` values.
`timestamp` `<=>` `timestamp`	float8	Calculates the distance between two timestamp values.
`timestamp` `<=\|` `timestamp`	float8	Calculates the distance between values that are less than the current timestamp.
`timestamp` `\|=>` `timestamp`	float8	Calculates the distance between values that are greater than the current timestamp.

The <=>, <=|, and |=> operators also apply to the following data types:

timestamptz
int2
int4
int8
float4
float8
money
oid

Use the RUM extension

Create the RUM extension

You can execute the following SQL statement to install the RUM extension.

CREATE EXTENSION rum;

Note

If the extension fails to be created when your cluster meets the version requirements, contact us.

Operator classes

The rum_tsvector_ops class stores tsvector phrases with location information, and supports sorting by the <=> operator and prefix search.

Examples:

Prepare data:

CREATE TABLE test_rum(t text, a tsvector);
CREATE TRIGGER tsvectorupdate
    BEFORE UPDATE OR INSERT ON test_rum
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');
INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');
INSERT INTO test_rum(t) VALUES ('It is a beautiful');
INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');

Create RUM indexes:

CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);

Execute the following query statements:

Query statement 1:

SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'beautiful | place')
ORDER BY a <=> to_tsquery('english', 'beautiful | place');

Sample result:

                t                |   rank
---------------------------------+----------
 It looks like a beautiful place |  8.22467
 The situation is most beautiful | 16.44934
 It is a beautiful               | 16.44934
(3 rows)

Query statement 2:

SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'place | situation')
ORDER BY a <=> to_tsquery('english', 'place | situation');

Sample result:

                t                |   rank
---------------------------------+----------
 The situation is most beautiful | 16.44934
 It looks like a beautiful place | 16.44934
(2 rows)

The rum_tsvector_hash_ops class stores the hash values and location information of tsvector phrases. This class supports sorting by the <=> operator, but not prefix search.
Note
The rum_tsvector_hash_ops class supports sorting by the <=>, <=|, and |=> operators. It can be used with the rum_tsvector_addon_ops, rum_tsvector_hash_addon_ops, and rum_anyarray_addon_ops classes.
The rum_TYPE_ops class supports the following data types and operators:
Data types: int2, int4, int8, float4, float8, money, oid, time, timetz, date, interval, macaddr, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, timestamp, and timestamptz.
Operators: The <, <=,=, >=, and > operators are applicable to all data types. The <=>, <=|, and |=> operators are applicable to int2, int4, int8, float4, float8, money, oid, timestamp, and timestamptz data types.

The rum_tsvector_addon_ops class stores tsvector lexemes and any lexemes supported by the module field.

Examples:

Prepare data:

CREATE TABLE tsts (id int, t tsvector, d timestamp);
\copy tsts from 'external/rum/data/tsts.data'
CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d) WITH (attach = 'd', to = 't');

Execute the following statements:

EXPLAIN (costs off)
SELECT id, d, d <=> '2016-05-16 14:21:25'
FROM tsts
WHERE t @@ 'wr&qh'
ORDER BY d <=> '2016-05-16 14:21:25'
LIMIT 5;

                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit
   ->  Index Scan using tsts_idx on tsts
         Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
         Order By: (d <=> '2016-05-16 14:21:25'::timestamp without time zone)
(4 rows)

Execute the following query statement:

SELECT id, d, d <=> '2016-05-16 14:21:25'
FROM tsts
WHERE t @@ 'wr&qh'
ORDER BY d <=> '2016-05-16 14:21:25'
LIMIT 5;

Sample result:

 id  |             d              |   ?column?
-----+----------------------------+---------------
 355 | 2016-05-16 14:21:22.326724 |      2.673276
 354 | 2016-05-16 13:21:22.326724 |   3602.673276
 371 | 2016-05-17 06:21:22.326724 |  57597.326724
 406 | 2016-05-18 17:21:22.326724 | 183597.326724
 415 | 2016-05-19 02:21:22.326724 | 215997.326724
(5 rows)

Note

Because posting trees have fixed length right bound and fixed length non-leaf posting items, RUM may be flawed in creating an index using sorting over pass-by-reference additional information.

The rum_tsvector_hash_addon_ops class stores the hash values of tsvector lexemes and any lexemes supported by the module field.

The rum_tsquery_ops class is applicable to the tsquery data type and stores branches of the query tree in additional information.

Examples:

Prepare data:

CREATE TABLE test_array (i int2[]);
INSERT INTO test_array VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}');
CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);

Execute the following statements:

SET enable_seqscan TO off;
EXPLAIN (COSTS OFF)
SELECT *
FROM test_array
WHERE i && '{1}'
ORDER BY i <=> '{1}' ASC;

                QUERY PLAN
------------------------------------------
 Index Scan using idx_array on test_array
   Index Cond: (i && '{1}'::smallint[])
   Order By: (i <=> '{1}'::smallint[])
(3 rows)

Execute the following query statement:

SELECT *
FROM test_array
WHERE i && '{1}'
ORDER BY i <=> '{1}' ASC;

Sample result:

     i
-----------
 {1}
 {1,2}
 {1,2,3}
 {1,2,3,4}
(4 rows)

The rum_anyarray_ops class stores anyarrray elements with the array length. Supported operators include &&, @>, <@,=, and %. Sorting by the <=> operators is supported.

Examples:

Prepare data:

CREATE TABLE test_array (i int2[]);
INSERT INTO test_array VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}');
CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);

Execute the following statements:

SET enable_seqscan TO off;

EXPLAIN (COSTS OFF)
SELECT *
FROM test_array
WHERE i && '{1}'
ORDER BY i <=> '{1}' ASC;

                QUERY PLAN
------------------------------------------
 Index Scan using idx_array on test_array
   Index Cond: (i && '{1}'::smallint[])
   Order By: (i <=> '{1}'::smallint[])
(3 rows)

Execute the following query statement:

SELECT *
FROM test_array
WHERE i && '{1}'
ORDER BY i <=> '{1}' ASC;

Sample result:

     i
-----------
 {1}
 {1,2}
 {1,2,3}
 {1,2,3,4}
(4 rows)

The rum_anyarray_addon_ops class stores anyarrray elements and any elements supported by the module field.

Uninstall the RUM extension

You can execute the following SQL statement to uninstall the RUM extension.

DROP EXTENSION rum;

References

For more information, see rum.