This topic describes how to use the RUM extension of ApsaraDB RDS for PostgreSQL to run full-text searches.
Prerequisites
The RDS instance runs PostgreSQL 10 or later.
If the RDS instance runs PostgreSQL 14 or PostgreSQL 15, the minor engine version of the RDS instance must be 20221030 or later. For more information about how to view and update the minor engine version of your RDS instance, see Update the minor engine version of an ApsaraDB RDS for PostgreSQL instance.
This extension is not supported by ApsaraDB RDS for PostgreSQL instances that run PostgreSQL 17.
Background information
Generalized Inverted Index (GIN) allows you to run full-text searches by using the tsvector and tsquery data types. However, this may produce the following issues:
Slow ranking
To rank search results, ApsaraDB RDS for PostgreSQL needs the positions of the lexemes in each document. However, a GIN index does not store lexeme positions. After ApsaraDB RDS for PostgreSQL runs a scan based on a GIN index, it must run an additional heap scan to retrieve the positions.
Slow queries for phrases
A phrase search also needs the positions of the lexemes that make up the phrase. Because a GIN index does not store these positions, the same additional scan is required.
Slow sorting of timestamps
A GIN index cannot store related information, such as a timestamp, together with the lexemes. Therefore, an additional heap scan is required to sort the results by timestamp.
The RUM extension is designed based on GIN. A RUM index stores additional information, such as lexeme positions or timestamps, together with the keys.
However, RUM indexes take more time than GIN indexes to build and to update on insert. This is because a RUM index stores additional information besides the keys and because the RUM extension uses generic write-ahead logging (WAL) records.
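The difference between the two index types can be sketched as follows. This is a minimal illustration, not a benchmark: the table `docs_demo` and its columns are hypothetical names, and the block assumes that the rum extension is available on the instance. On a table this small the planner will most likely choose a sequential scan, so the example shows only the syntax.

```sql
-- Assumption: the rum extension is available on the instance.
CREATE EXTENSION IF NOT EXISTS rum;

-- Hypothetical table used only for illustration.
CREATE TABLE docs_demo (
    id   serial PRIMARY KEY,
    body text,
    tsv  tsvector
);

INSERT INTO docs_demo (body, tsv)
SELECT b, to_tsvector('english', b)
FROM (VALUES
    ('The situation is most beautiful'),
    ('It is a beautiful place to visit'),
    ('It looks like a beautiful place')
) AS v(b);

-- GIN index: stores the lexemes, but not their positions.
CREATE INDEX docs_demo_gin ON docs_demo USING gin (tsv);

-- RUM index: stores the lexemes together with their positions.
CREATE INDEX docs_demo_rum ON docs_demo USING rum (tsv rum_tsvector_ops);

-- Phrase search: <-> requires 'beautiful' to be immediately followed by 'place'.
SELECT id, body
FROM docs_demo
WHERE tsv @@ to_tsquery('english', 'beautiful <-> place');
```

To see which index a query actually uses on your own data, prefix the query with EXPLAIN.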
Enable or disable the extension
Enable the extension
CREATE EXTENSION rum;
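To confirm that the extension is enabled in the current database, you can query the standard pg_extension catalog. This is a small sketch; the version that is returned depends on your instance.

```sql
-- Returns one row if the rum extension is enabled in the current database,
-- and no rows otherwise.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'rum';
```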
Disable the extension
DROP EXTENSION rum;
Universal operators
The following table describes the operators provided by the RUM extension.
Operator | Return type | Description |
tsvector <=> tsquery | float4 | Returns the distance between a tsvector value and a tsquery value. |
timestamp <=> timestamp | float8 | Returns the distance between two timestamps. |
timestamp <=| timestamp | float8 | Returns only the distance to the left-side timestamp. |
timestamp |=> timestamp | float8 | Returns only the distance to the right-side timestamp. |
The last three operators are also supported for the following data types: timestamptz, int2, int4, int8, float4, float8, money, and oid.
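The following sketch shows how the distance operators might be used in ORDER BY clauses. The table `articles_demo`, its columns, and the sample rows are hypothetical. The operator classes `rum_tsvector_ops` and `rum_tsvector_addon_ops` and the `attach` and `to` index options are taken from the open source RUM documentation. The block assumes that the rum extension is already enabled.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE articles_demo (
    id        int,
    title     text,
    tsv       tsvector,
    published timestamp
);

INSERT INTO articles_demo (id, title, tsv, published)
SELECT v.id, v.title, to_tsvector('english', v.title), v.published
FROM (VALUES
    (1, 'The situation is most beautiful',  timestamp '2016-05-15 10:00:00'),
    (2, 'It is a beautiful place to visit', timestamp '2016-05-16 14:00:00'),
    (3, 'It looks like a beautiful place',  timestamp '2016-05-17 09:30:00')
) AS v(id, title, published);

-- Index for ranking: lexemes are stored together with their positions.
CREATE INDEX articles_demo_tsv_idx
    ON articles_demo USING rum (tsv rum_tsvector_ops);

-- Rank by the tsvector <=> tsquery distance (a smaller distance is a better match).
SELECT title,
       tsv <=> to_tsquery('english', 'beautiful | place') AS distance
FROM articles_demo
WHERE tsv @@ to_tsquery('english', 'beautiful | place')
ORDER BY tsv <=> to_tsquery('english', 'beautiful | place');

-- Index for ordering by timestamp: the published value is attached to each lexeme.
CREATE INDEX articles_demo_addon_idx
    ON articles_demo USING rum (tsv rum_tsvector_addon_ops, published)
    WITH (attach = 'published', to = 'tsv');

-- Return the matching rows closest in time to the given timestamp.
SELECT title,
       published,
       published <=> timestamp '2016-05-16 14:21:25' AS distance
FROM articles_demo
WHERE tsv @@ to_tsquery('english', 'beautiful')
ORDER BY published <=> timestamp '2016-05-16 14:21:25'
LIMIT 5;
```

Replacing `<=>` with `<=|` or `|=>` in the last query restricts the ordering to one side of the given timestamp, as described in the table above.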
For more information about the functions that are provided by the RUM extension, visit the official website.
References
You use the RUM extension in ApsaraDB RDS for PostgreSQL in the same way as you use the open source RUM extension. For more information, see the official documentation.