The pg_bigm extension that is provided by ApsaraDB RDS for PostgreSQL supports full-text search. You can use this extension to create 2-gram Generalized Inverted Index (GIN) indexes. These indexes help expedite full-text search queries.
Prerequisites
Your RDS instance runs PostgreSQL 10 or later.
Your RDS instance runs a minor engine version of 20230830 or later.
ImportantThis extension is supported in some minor engine versions earlier than 20230830. However, ApsaraDB RDS plans to optimize vulnerable extensions in minor engine version updates for standardized extension management and enhanced security. Therefore, you cannot create this extension for RDS instances that run a minor engine version earlier than 20230830. For more information, see [Product changes/Feature changes] Limits on extension creation for ApsaraDB RDS for PostgreSQL instances.
If you have already created this extension for your RDS instance that runs a minor engine version earlier than 20230830, you can continue using this extension.
If you are creating this extension for the first time or need to recreate the extension for your RDS instance, you must update the minor engine version of the RDS instance to the latest version. For more information, see Update the minor engine version.
pg_bigm is added to the value of the shared_preload_libraries parameter of your RDS instance.
You can add pg_bigm to the value of the shared_preload_libraries parameter in the ApsaraDB RDS console or by calling an API operation. For more information, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.
This extension is not supported by ApsaraDB RDS for PostgreSQL instances that run PostgreSQL 17.
Comparison between the pg_bigm extension and the pg_trgm extension
The pg_trgm extension is also provided by ApsaraDB RDS for PostgreSQL. The pg_trgm extension uses a 3-gram model to perform full-text search. The pg_bigm extension is developed based on the pg_trgm extension. The following table describes the differences between the two extensions.
Functionality | pg_trgm | pg_bigm |
Phrase matching model | 3-gram | 2-gram |
Index types | GIN and GiST | GIN |
Operators |
|
|
Non-alphabet full-text search | Not supported | Supported |
Full-text search by using keywords that contain 1 to 2 characters | Slow | Fast |
Similarity search | Supported | Supported |
Maximum length of an indexed column | 238,609,291 bytes, which are approximately 228 MB | 107,374,180 bytes, which are approximately 102 MB |
Usage notes
The length of the column on which you create a GIN index cannot exceed 107,374,180 bytes, which are approximately 102 MB.
If the data in your RDS instance is not encoded in the ASCII format, we recommend that you change the encoding format to UTF8.
NoteYou can run the
select pg_encoding_to_char(encoding) from pg_database where datname = current_database();
command to query the encoding format of your RDS instance.
Basic operations
Enable the pg_bigm extension.
postgres=> create extension pg_bigm; CREATE EXTENSION
Create a GIN index.
postgres=> CREATE TABLE pg_tools (tool text, description text); CREATE TABLE postgres=> INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL'); INSERT 0 1 postgres=> INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL'); INSERT 0 1 postgres=> INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL'); INSERT 0 1 postgres=> INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL'); INSERT 0 1 postgres=> CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops); CREATE INDEX postgres=> CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off); CREATE INDEX
Run a full-text search query.
postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%'; tool | description ---------+--------------------------------------------------------------------- pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL (2 rows)
Use the
=%
operator to run a similarity search query.postgres=> SET pg_bigm.similarity_limit TO 0.2; SET postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm'; tool --------- pg_bigm pg_trgm (2 rows)
Disable the pg_bigm extension.
postgres=> drop extension pg_bigm; DROP EXTENSION
Basic functions
likequery
Purpose: This function is used to generate a string that can be identified based on the LIKE keyword.
Request parameters: This function contains one request parameter. The data type for this parameter is STRING.
Return value: This function returns a string that can be identified based on the LIKE keyword.
Implementation:
Add a percent sign (
%
) preceding and following the keyword.Use a backward slash (
\
) to escape the percent sign (%
).
Example:
postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%'); likequery ------------------------------------------------------------------- %pg\_bigm has improved the full text search performance by 200\%% (1 row) postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search'); tool | description ---------+--------------------------------------------------------------------- pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL (2 rows)
show_bigm
Purpose: This function is used to obtain all 2-gram elements of a string.
Request parameters: This function contains one request parameter. The data type for this parameter is STRING.
Return value: This parameter returns an array that consists of all 2-gram elements of a string.
Implementation:
Add a space preceding and following the string.
Identify all 2-gram elements in the string.
Example:
postgres=> SELECT show_bigm('full text search'); show_bigm ------------------------------------------------------------------ {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt} (1 row)
bigm_similarity
Purpose: This function is used to obtain the similarity between two strings.
Request parameters: This function contains two request parameters. The data types for these parameters are STRING.
Return value: This function returns a floating-point number, which indicates the similarity between the two strings.
Implementation:
Identify the 2-gram elements that are included in both of the two strings.
The return value is within the range of 0 to 1. The value 0 indicates that the two strings are different. The value 1 indicates that the two strings are the same.
NoteThis function adds a space preceding and following each string. Therefore, the similarity between the
ABC
string and theB
string is 0, and the similarity between theABC
string and theA
string is 0.25.This function distinguishes between uppercase letters and lowercase letters. For example, this function determines that the similarity between the
ABC
string and theabc
string is 0.
Example:
postgres=> SELECT bigm_similarity('full text search', 'text similarity search'); bigm_similarity ----------------- 0.5714286 (1 row) postgres=> SELECT bigm_similarity('ABC', 'A'); bigm_similarity ----------------- 0.25 (1 row) postgres=> SELECT bigm_similarity('ABC', 'B'); bigm_similarity ----------------- 0 (1 row) postgres=> SELECT bigm_similarity('ABC', 'abc'); bigm_similarity ----------------- 0 (1 row)
pg_gin_pending_stats
Purpose: This function is used to obtain the number of pages and the number of tuples in the pending list of a GIN index.
Request parameters: This function contains one parameter. This parameter specifies the name or OID of the GIN index.
Return value: This function returns two values: the number of pages and the number of tuples in the pending list of the GIN index.
NoteIf you set the FASTUPDATE parameter to False for a GIN index, the GIN index does not have a pending list. In this case, this function returns two values, 0 and 0.
Example:
postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx'); pages | tuples -------+-------- 0 | 0 (1 row)
Behavior control
pg_bigm.last_update
This parameter indicates the date on which the most recent update was made to the pg_bigm extension. You can only view this parameter. You cannot change the value of this parameter.
Example:
SHOW pg_bigm.last_update;
pg_bigm.enable_recheck
This parameter specifies whether to perform a recheck.
NoteWe recommend that you retain the default value ON. This way, you can obtain accurate query results.
Example:
postgres=> CREATE TABLE tbl (doc text); CREATE TABLE postgres=> INSERT INTO tbl VALUES('He is awaiting trial'); INSERT 0 1 postgres=> INSERT INTO tbl VALUES('It was a trivial mistake'); INSERT 0 1 postgres=> CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops); CREATE INDEX postgres=> SET enable_seqscan TO off; SET postgres=> EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial'); QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on tbl (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1) Recheck Cond: (doc ~~ '%trial%'::text) Rows Removed by Index Recheck: 1 Heap Blocks: exact=1 -> Bitmap Index Scan on tbl_idx (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1) Index Cond: (doc ~~ '%trial%'::text) Planning Time: 0.117 ms Execution Time: 0.043 ms (8 rows) postgres=> postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial'); doc ---------------------- He is awaiting trial (1 row) postgres=> SET pg_bigm.enable_recheck = off; SET postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial'); doc -------------------------- He is awaiting trial It was a trivial mistake (2 rows)
pg_bigm.gin_key_limit
This parameter specifies the maximum number of 2-gram elements that can be used to perform a full-text search query. The default value is 0, which indicates that all 2-gram elements are used.
NoteIf query performance decreases due to the use of all 2-gram elements, you can decrease the value of this parameter.
pg_bigm.similarity_limit
This parameter specifies the threshold for similarity. The tuples between which the similarity exceeds the specified threshold are returned as similarity search results.