AnalyticDB for PostgreSQL allows you to use the pg_jieba extension to perform Chinese word segmentation and implement efficient Chinese full-text search.
Introduction
Jieba is a commonly used tool for Chinese word segmentation. The pg_jieba extension introduces the Chinese word segmentation capability of Jieba into PostgreSQL databases to help you implement efficient Chinese full-text search. AnalyticDB for PostgreSQL allows you to use the pg_jieba extension for distributed queries.
Prerequisites
Before you use the pg_jieba extension, make sure that the following requirements are met:
The AnalyticDB for PostgreSQL instance that you want to manage is in elastic storage mode.
The minor version of an AnalyticDB for PostgreSQL V6.0 instance is 6.6.2.1 or later. The minor version of an AnalyticDB for PostgreSQL V7.0 instance is 7.0.5 or later.
Note: For information about how to view the minor version of an AnalyticDB for PostgreSQL instance, see View the minor engine version.
Install the pg_jieba extension
Before you use Jieba, install the pg_jieba extension on the Extensions page of the AnalyticDB for PostgreSQL instance. For more information, see Install, update, and uninstall extensions.
Switch to the public schema of the specified database and execute the following statement to check whether the pg_jieba extension is installed:
SELECT * FROM pg_extension WHERE extname = 'pg_jieba';
If a row similar to the following is returned, the pg_jieba extension is installed. If no row is returned, the pg_jieba extension is not installed for the public schema of the specified database.
+--------+--------+--------+--------+
|oid     |extname |extowner|...     |
+--------+--------+--------+--------+
|17194   |pg_jieba|10      |...     |
+--------+--------+--------+--------+
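If you also want to know which version of the extension is installed and in which schema, a narrower query may be easier to read than SELECT *. The following is a minimal sketch that relies only on the standard PostgreSQL catalogs pg_extension and pg_namespace. The alias schema_name is illustrative.

```sql
-- Show the installed version of pg_jieba and the schema that contains it.
-- extversion and extnamespace are standard columns of pg_extension.
SELECT e.extname,
       e.extversion,
       n.nspname AS schema_name
FROM pg_extension e
JOIN pg_namespace n ON n.oid = e.extnamespace
WHERE e.extname = 'pg_jieba';
```

As with the statement above, an empty result indicates that the extension is not installed in the current database.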
Chinese word segmentation
After you install the pg_jieba extension, you can use the extension to perform Chinese word segmentation.
Example 1:
SELECT to_tsvector('jiebacfg', '有两种方法进行全文检索');
The following result is returned:
+---------------------------------------+
| to_tsvector |
+---------------------------------------+
|'两种':2 '全文检索':5 '方法':3 '进行':4 |
+---------------------------------------+
(1 row)
Example 2:
SELECT to_tsvector('jiebacfg', '有两种方法进行全文检索') @@ to_tsquery('jiebacfg', '全文检索');
The following result is returned:
+----------+
| ?column? |
+----------+
| t |
+----------+
(1 row)
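The two examples segment a string literal. In practice, the same functions are usually applied to a table column. The following sketch shows one way this could look. The table articles, its columns, and the index name are hypothetical. The sketch also assumes that your instance supports GIN expression indexes and the DISTRIBUTED BY clause.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE articles (
    id      bigint,
    content text
) DISTRIBUTED BY (id);

INSERT INTO articles VALUES (1, '有两种方法进行全文检索');

-- Expression index on the segmented form of the content column.
-- The index expression must match the expression used in the WHERE clause.
CREATE INDEX articles_content_fts_idx
    ON articles USING gin (to_tsvector('jiebacfg', content));

-- Return the rows whose segmented content contains the term 全文检索.
SELECT id, content
FROM articles
WHERE to_tsvector('jiebacfg', content) @@ to_tsquery('jiebacfg', '全文检索');
```

Whether the planner actually uses the index depends on the table size and statistics. You can check the plan with EXPLAIN.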
Custom dictionaries
The pg_jieba extension supports custom dictionaries in AnalyticDB for PostgreSQL. You can add data to or remove data from the custom dictionary table named jieba.jieba_custom_word to add or remove custom words.
You do not need to manually create a dictionary table. When the pg_jieba extension is installed, the system automatically creates a custom dictionary table named jieba.jieba_custom_word.
The jieba.jieba_custom_word table has the following data structure:
CREATE TABLE jieba.jieba_custom_word (
    word text primary key,        -- Custom word
    weight float8 default '1.0',  -- Weight
    type text default 'x'         -- Part of speech
);
Apply for permissions to use the custom dictionary table
Submit a ticket to apply for permissions to use the jieba.jieba_custom_word table. Then, you can add words to the jieba.jieba_custom_word table, remove words from the table, query the table, and use the table to perform Chinese word segmentation.
Add a word to the custom dictionary table
INSERT INTO jieba.jieba_custom_word VALUES ('两种方法');
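The statement above supplies only the word, so weight and type take their defaults. If you prefer to state every column explicitly, the insert could be written as in the following sketch. It uses the column names from the table definition and simply repeats the default values 1.0 and 'x'.

```sql
-- Equivalent insert with the column list and default values spelled out.
-- Run this instead of, not in addition to, the shorter statement.
INSERT INTO jieba.jieba_custom_word (word, weight, type)
VALUES ('两种方法', 1.0, 'x');
```

Because word is the primary key, inserting a word that already exists in the table fails with a duplicate key error.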
Remove a word from the custom dictionary table
DELETE FROM jieba.jieba_custom_word WHERE word='两种方法';
Query the custom dictionary table
SELECT * FROM jieba.jieba_custom_word;
Load the custom dictionary table
After you add words to or remove words from the jieba.jieba_custom_word table, you must reload the table for the changes to take effect. Execute the following statement to reload the jieba.jieba_custom_word table:
SELECT jieba.jieba_load_user_dict();
Check the Chinese word segmentation effect
Execute the following sample statement before and after you configure the jieba.jieba_custom_word table to check the Chinese word segmentation effect:
SELECT to_tsvector('jiebacfg', '有两种方法进行全文检索');
The following result is returned:
Scenario | Before configuring the jieba.jieba_custom_word table | After configuring the jieba.jieba_custom_word table
Chinese word segmentation effect | |
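To reproduce the comparison yourself, you can run the statements from the preceding sections in sequence. The following sketch uses only statements that appear in this topic. It assumes that you have been granted permissions on the jieba.jieba_custom_word table and that the word 两种方法 is not yet in the table.

```sql
-- 1. Segment the sample text before the custom word is added.
SELECT to_tsvector('jiebacfg', '有两种方法进行全文检索');

-- 2. Add the custom word, then reload the dictionary so that it takes effect.
INSERT INTO jieba.jieba_custom_word (word) VALUES ('两种方法');
SELECT jieba.jieba_load_user_dict();

-- 3. Run the same statement again and compare the result with step 1.
SELECT to_tsvector('jiebacfg', '有两种方法进行全文检索');

-- 4. Optional cleanup: remove the word and reload the dictionary again.
DELETE FROM jieba.jieba_custom_word WHERE word = '两种方法';
SELECT jieba.jieba_load_user_dict();
```

If steps 1 and 3 return the same result, check that the reload in step 2 ran after the insert.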
References
For information about full-text search, see Full Text Search.
For information about the functions and operators that can be used for full-text search, see Text Search Functions and Operators.