All Products
Search
Document Center

PolarDB:rum (full-text search acceleration)

Last Updated:Jan 05, 2026

The RUM extension provides an alternative to the built-in Generalized Inverted Index (GIN) in PostgreSQL for full-text searches on large volumes of text. You can use the RUM extension to sort search results by relevance, timestamps, or other fields. Traditional GIN indexes require table lookups to sort results, which can cause performance bottlenecks. The RUM extension avoids these lookups by storing position and other required information directly in the index. This can improve query performance by several times in specific scenarios.

Applicability

  • PolarDB for PostgreSQL supports the following versions:

    • PostgreSQL 18 (minor engine version 2.0.18.1.2.0 and later)

    • PostgreSQL 17 (minor engine version 2.0.17.6.4.0 and later)

    • PostgreSQL 16 (minor engine version 2.0.16.8.3.0 and later)

    • PostgreSQL 15 (minor engine version 2.0.15.7.1.1 and later)

    • PostgreSQL 14 (minor engine version 2.0.14.5.3.0 and later)

    Note

    You can view the minor engine version in the console or by running the SHOW polardb_version; statement. If your version does not meet the requirements, upgrade the minor engine version. For more information, see View the minor engine version.

  • Extension compatibility: The % operator of the RUM extension conflicts with the % operator of the smlar extension. Because of this conflict, you cannot create and use both extensions in the same database schema. Before you install the RUM extension, confirm that the smlar extension is not used in your environment, or install the two extensions in different schemas.

Scenarios

Review the following table to understand the core differences between the RUM extension and the native GIN index. This information can help you choose the best option for your business scenario.

Comparison Dimension

GIN Index (Built-in)

RUM Index

Recommendation

Core Advantage

Good write performance. Relatively small index size.

High sorting performance. Supports phrase searches and sorting by attached columns.

For scenarios that require sorting of search results, the RUM extension is the preferred choice.

Sorting Performance

Slow. Requires table lookups to obtain sorting data. Performance drops sharply as the data volume increases.

Fast. Sorts directly within the index. No table lookups are needed.

Use the RUM extension if you frequently sort full-text search results, such as by relevance, time, or price.

Phrase Search

Slow. Requires table lookups to obtain word position information to verify phrases.

Fast. Word position information is already stored in the index. No table lookups are needed.

Use the RUM extension for scenarios that require high-performance phrase searches.

Sorting by Attached Columns

Not supported. Cannot store information from extra columns, such as timestamps, in the index.

Supported. You can attach other columns to the index for high-performance custom sorting.

The RUM extension has a significant advantage in scenarios that require sorting by fields such as article publication time or update time.

Write Performance

Relatively fast.

Relatively slow because a more complex index schema must be maintained.

For write-intensive tables with frequent INSERT or UPDATE operations, carefully evaluate the write overhead that the RUM extension introduces.

Index Size

Relatively small.

Relatively large because it requires extra space to store position and additional information.

Evaluate your storage costs. If the index size is a major bottleneck, consider using a RUM hash index or continuing to use the GIN index.

Prefix Search

Supported.

Supported by rum_tsvector_ops. Not supported by the hash series.

If prefix search is a core requirement, avoid using RUM hash indexes.

The RUM extension trades space for time. It increases the index size and write overhead to significantly improve performance for specific query scenarios, especially sorting. If your core business involves full-text searches with complex sorting, the RUM extension is an ideal choice. If your business is primarily write-intensive or only requires simple text matching, the GIN index is more cost-effective.

Precautions

  • Write performance and index size: To speed up queries, a RUM index stores extra information, such as word positions. This makes the index size larger than that of a GIN index and increases the index creation overhead during data write and update operations. Therefore, you must carefully evaluate the cost of the RUM extension in write-intensive scenarios where storage space is a concern.

  • Scenarios that do not support prefix search: Indexes created with the rum_tsvector_hash_ops or rum_tsvector_hash_addon_ops operator classes do not support prefix search. This is because they store the hash values of lexemes, not the original text.

Install and uninstall the extension

Install the extension

Run the following command in your database to create the RUM extension.

CREATE EXTENSION rum;

Uninstall the extension

If you no longer need the RUM extension, run the following command to uninstall it.

DROP EXTENSION rum;

Usage

The RUM extension provides multiple operator classes to support different data types and query scenarios. You can choose the appropriate operator class to create an index based on your specific needs.

Note

An operator class defines a set of operations that a RUM index uses to process a specific data type. This allows the index to correctly store and retrieve data of that type. Each operator class includes a specific set of operators. When you use these supported operators in a WHERE or ORDER BY clause, PostgreSQL can use the RUM index to speed up the query.

Therefore, choosing the correct operator class is key to ensuring the effectiveness of the RUM index. For more information, see Operator Classes and Operator Families.

Operators

Operator

Supported Types

Return Value Type

Description

A @@ B

Left: tsvector, Right: tsquery

bool

Returns whether the full-text vector matches the query condition. It performs a distance calculation.

A <=> B

Left: tsvector, Right: tsquery

float4

Returns the distance value between the full-text vector and the query condition. A smaller value indicates higher relevance.

timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid

float8

Returns the absolute difference between two values.

  • The time difference is in seconds, accurate to six decimal places (microseconds).

  • The money difference is in cents, accurate to the cent.

A <=| B

timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid

float8

Returns B - A only if A ≤ B. Otherwise, returns infinity.

A |=> B

timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid

float8

Returns A - B only if A > B. Otherwise, returns infinity.

Operator Class

Operator Class

Applicable Data Types

Key Supported Operators

Core Function and Description

rum_tsvector_ops

tsvector

  • WHERE: A @@ B

  • ORDER BY: A <=> B

Stores the lexemes and their position information from a tsvector. It supports full-text search, prefix search, and relevance sorting. This is the most commonly used operator class for full-text search.

rum_tsvector_hash_ops

tsvector

  • WHERE: A @@ B

  • ORDER BY: A <=> B

Stores the hash values and position information of tsvector lexemes.

  • Supports full-text search and relevance sorting, but not prefix search.

  • The index size may be smaller than with rum_tsvector_ops.

  • Hash collisions can occur during a search, which requires a recheck.

rum_<TYPE>_ops

int2, int4, int8, float4, float8, money, oid, time, timetz, date, interval, macaddr, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, timestamp, timestamptz

  • WHERE: For applicable data types, supports <, <=, =, >=, and > between values of the same type.

  • ORDER BY: For int2, int4, int8, float4, float8, money, oid, timestamp, and timestamptz, supports <=>, <=|, and |=> operations between values of the same type.

Performs range queries and distance sorting on non-text, non-array data types.

rum_tsvector_addon_ops

tsvector

WHERE: A @@ B

Attaches data from an additional column (such as a timestamp) to a tsvector index. This supports full-text search on the primary column while allowing efficient sorting on the attached column.

Note

The data type of the attached column must be supported by a corresponding rum_<TYPE>_ops operator class. To use the index for acceleration, sorting must use the <=>, <=|, or |=> operator.

rum_tsvector_hash_addon_ops

tsvector

WHERE: A @@ B

Same function as rum_tsvector_addon_ops.

  • Because it stores hash values of lexemes, it does not support prefix search.

  • The index size may be smaller than with rum_tsvector_addon_ops.

  • Hash collisions can occur, requiring a recheck.

  • The search may be slower than with rum_tsvector_addon_ops.

rum_tsquery_ops

tsquery

WHERE: A @@ B

Used to index a tsquery column. It can accelerate queries in reverse, quickly finding which stored query conditions (tsquery) match a given document (tsvector).

rum_anyarray_ops

anyarray, for example, int[], text[], varchar[]

  • WHERE:

    • &&: Do the arrays overlap (have common elements)?

    • @>: Does the left array contain all elements of the right array?

    • <@: Contained in.

    • =: Are the arrays equal?

    • : Are the arrays similar? (Similarity is calculated. If it exceeds a threshold, the arrays are considered similar.)

  • ORDER BY

    • <=>: The distance between two arrays.

Indexes array types. Supports array operations such as contains and overlap, and supports sorting by the distance between arrays.

rum_anyarray_addon_ops

anyarray, for example, int[], text[], varchar[]

  • WHERE:

    • &&: Do the arrays overlap (have common elements)?

    • @>: Does the left array contain all elements of the right array?

    • <@: Contained within.

    • =: Are the arrays equal?

    • : Are the arrays similar? (Similarity is calculated. If it exceeds a threshold, the arrays are considered similar.)

  • ORDER BY

    • <=>: The distance between two arrays.

Attaches data from an additional column to an array index to support more complex query scenarios.

Note

The data type of the attached column must be supported by a corresponding rum_<TYPE>_ops operator class. To use the index for acceleration, sorting must use the <=>, <=|, or |=> operator.

Scenario 1: Accelerate relevance sorting for full-text search results

When you need to sort full-text search results by relevance, you can use a RUM index to avoid the extra sorting overhead that is required by GIN indexes and achieve high-performance sorting.

  1. Prepare the data: First, create a test table.

    CREATE TABLE t1(
      t text,
      t_vec tsvector GENERATED ALWAYS AS (to_tsvector('pg_catalog.english', t)) STORED
    );
    
    -- Insert test data
    INSERT INTO t1(t) VALUES ('The situation is most beautiful');
    INSERT INTO t1(t) VALUES ('It is a beautiful');
    INSERT INTO t1(t) VALUES ('It looks like a beautiful place');
  2. Create a RUM index: Use the rum_tsvector_ops operator class to create a RUM index for the tsvector column.

    CREATE INDEX t1_t_vec_idx ON t1 USING rum (t_vec rum_tsvector_ops);
  3. Run a relevance sorting query: Use the <=> operator to query and sort. This operator calculates the distance between the query and the text. A smaller distance indicates higher relevance. Therefore, using ORDER BY sorts the results by relevance.

    SET enable_seqscan TO off;
    
    SELECT t, t_vec <=> to_tsquery('english', 'beautiful | place') AS rank
    FROM t1
    WHERE t_vec @@ to_tsquery('english', 'beautiful | place')
    ORDER BY t_vec <=> to_tsquery('english', 'beautiful | place');

    The following result is returned:

                    t                |  rank   
    ---------------------------------+---------
     It looks like a beautiful place | 8.22467
     The situation is most beautiful | 16.4493
     It is a beautiful               | 16.4493

Scenario 2: Accelerate combined sorting of full-text search and an attached column

In scenarios such as log analysis or E-commerce search, you often need to perform a full-text search and sort the results by an additional field, such as a timestamp or price. The RUM add-on feature lets you store information from an attached column in the index. This enables efficient combined queries and sorting.

  1. Prepare the data: Create a table that contains a tsvector column and a timestamp column, and then insert sample data.

    CREATE TABLE tsts (id int, t tsvector, d timestamp);
    INSERT INTO tsts VALUES
    (354, to_tsvector('wr qh'), '2016-05-16 14:21:22.326724'),
    (355, to_tsvector('wr qh'), '2016-05-16 13:21:22.326724'),
    (356, to_tsvector('ts op'), '2016-05-16 18:21:22.326724'),
    (358, to_tsvector('ts op'), '2016-05-16 23:21:22.326724'),
    (371, to_tsvector('wr qh'), '2016-05-17 06:21:22.326724'),
    (406, to_tsvector('wr qh'), '2016-05-18 17:21:22.326724'),
    (415, to_tsvector('wr qh'), '2016-05-19 02:21:22.326724');
  2. Create a RUM index with an attached column: Use the rum_tsvector_addon_ops operator class and specify the attached column and the primary index column in the WITH clause.

    CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d) WITH (attach = 'd', to = 't');
    Note

    The key syntax WITH (attach = 'd', to = 't') attaches the values of the d column (the attached column, which is a timestamp) to the index entries of the t column (the primary index column of the tsvector type). This allows the database to use the index on the t column for full-text search and the attached d column information for efficient sorting in a single index scan. This process avoids table lookups and significantly improves performance.

  3. Run a combined sorting query: Query for records that contain specific words and sort them by how close their timestamp is to a target time.

    SET enable_seqscan TO off;
    
    EXPLAIN (costs off)
    SELECT id, d, d <=> '2016-05-16 14:21:25' AS distance
    FROM tsts
    WHERE t @@ 'wr&qh'
    ORDER BY d <=> '2016-05-16 14:21:25'
    LIMIT 5;

    The following execution plan shows that both sorting and filtering are completed in a single index scan.

                                      QUERY PLAN                                  
    ------------------------------------------------------------------------------
     Limit
       ->  Index Scan using tsts_idx on tsts
             Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
             Order By: (d <=> '2016-05-16 14:21:25'::timestamp without time zone)

    The following result is returned:

     id  |             d              |   distance    
    -----+----------------------------+---------------
     354 | 2016-05-16 14:21:22.326724 |      2.673276
     355 | 2016-05-16 13:21:22.326724 |   3602.673276
     371 | 2016-05-17 06:21:22.326724 |  57597.326724
     406 | 2016-05-18 17:21:22.326724 | 183597.326724
     415 | 2016-05-19 02:21:22.326724 | 215997.326724

Scenario 3: Accelerate array queries and similarity sorting

For scenarios such as tagging systems or user personas, you need to efficiently query arrays that contain specific elements and sort them by the degree of overlap or similarity.

  1. Prepare the data:

    CREATE TABLE test_array (id serial, i int2[]);
    INSERT INTO test_array(i) VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}');
  2. Create a RUM index for an array: Use the rum_anyarray_ops operator class to create an index for the array column.

    CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);
  3. Run an array query and sort: Query for records that contain the element 1 and sort them by similarity to {1}.

    SELECT *
    FROM test_array
    WHERE i && '{1}' -- The '&&' operator indicates array overlap.
    ORDER BY i <=> '{1}' ASC; -- The '<=>' operator calculates the distance between arrays. A smaller value indicates greater similarity.

    The following result is returned:

         i
    -----------
     {1}
     {1,2}
     {1,2,3}
     {1,2,3,4}

Scenario 4: Inverted index to quickly match query rules

When building systems for user subscriptions or alert rule matching, you need to quickly match new data, such as an article, against many existing query rules, such as user-subscribed keywords. The RUM extension supports creating an index on the tsquery type to perform efficient inverted matching.

  1. Prepare the query rule data:

    CREATE TABLE query (id serial, q tsquery, tag text);
    INSERT INTO query (q, tag) VALUES
    ('supernova & star', 'sn'),
    ('black', 'color'),
    ('big & bang & black & hole', 'bang'),
    ('spiral & galaxy', 'shape'),
    ('black & hole', 'color');
  2. Create a RUM index for tsquery:

    CREATE INDEX query_idx ON query USING rum(q rum_tsquery_ops);
  3. Run an inverted match query: Use the tsvector of a new article to match all qualifying tsquery rules.

    SELECT *
    FROM query
    WHERE to_tsvector('black holes never exists before we think about them') @@ q;

    The following result is returned:

    id   |    q     |  tag
    -----+----------+-------
    2    | 'black'  | color