This topic describes how to enable the zhparser plug-in and customize a Chinese word segment dictionary in PolarDB for PostgreSQL.
Enable the zhparser plug-in
Execute the following statements to enable the zhparser plug-in, create a text search configuration that uses it, and run a simple test:
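-- Create the zhparser extension
CREATE EXTENSION zhparser;
-- Create a text search configuration that uses the zhparser parser
CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser);
-- Map token types n (noun), v (verb), a (adjective), i (idiom), e (exclamation), and l (temporary expression) to the simple dictionary
ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;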
-- Optional: enable short-word compounding for all roles (takes effect in new sessions)
ALTER ROLE ALL SET zhparser.multi_short = on;
-- Perform a simple test: view the tokens, the tsvector, and the tsquery that zhparser produces
SELECT * FROM ts_parse('zhparser', 'hello world! 2010年保障房建设在全国范围内获全面启动,从中央到地方纷纷加大 了 保 障 房 的 建 设 和 投 入 力 度 。2011年,保障房进入了更大规模的建设阶段。住房城乡建设部党组书记、部长姜伟新去年底在全国住房城乡建设工作会议上表示,要继续推进保障性安居工程建设。');
SELECT to_tsvector('testzhcfg','“今年保障房新开工数量虽然有所下调,但实际的年度在建规模以及竣工规模会超以往年份,相对应的对资金的需求也会创历史纪录。”陈国强说。在他看来,与2011年相比,2012年的保障房建设在资金配套上的压力将更为严峻。');
SELECT to_tsquery('testzhcfg', '保障房资金压力');
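To see which token types the zhparser parser emits, and therefore which letters can appear in the ADD MAPPING FOR list, you can inspect the parser and the configuration. The following minimal sketch uses the standard PostgreSQL function ts_token_type and the psql meta-command \dF+. The exact output depends on your zhparser version, so none is shown here.

```sql
-- List every token type (id, alias, description) that zhparser can emit
SELECT * FROM ts_token_type('zhparser');

-- psql only: show the token-type-to-dictionary mappings of testzhcfg
\dF+ testzhcfg

-- ALTER ROLE ... SET applies only to new sessions; reconnect, then check the value
\c
SHOW zhparser.multi_short;
```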
Execute the following statements to create and use a full-text index based on the testzhcfg configuration:
-- Create a GIN full-text index on the name column of table t1
CREATE INDEX idx_t1 ON t1 USING gin (to_tsvector('testzhcfg', upper(name)));
-- Query by using the full-text index (the WHERE clause must repeat the indexed expression)
SELECT * FROM t1 WHERE to_tsvector('testzhcfg', upper(t1.name)) @@ to_tsquery('testzhcfg', '(防火)');
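The statements above assume that table t1 already exists. The following self-contained sketch shows the same pattern. The table articles, its columns, and the sample rows are hypothetical. Because the index is an expression index, a query can use it only when its WHERE clause repeats the indexed expression exactly. Which rows match depends on how zhparser segments the sample text, so no query result is shown.

```sql
-- Hypothetical table and sample rows, for illustration only
CREATE TABLE articles (
    id    serial PRIMARY KEY,
    title text NOT NULL
);
INSERT INTO articles (title) VALUES
    ('2011年保障房进入了更大规模的建设阶段'),
    ('森林防火工作会议召开'),
    ('hello world');

-- Expression index on the same expression that the query below uses
CREATE INDEX idx_articles_title ON articles
    USING gin (to_tsvector('testzhcfg', upper(title)));

-- A tiny table is normally read with a sequential scan; disable that plan
-- type for this session only, to confirm that the index can be used
SET enable_seqscan = off;
EXPLAIN
SELECT id, title
FROM articles
WHERE to_tsvector('testzhcfg', upper(title)) @@ to_tsquery('testzhcfg', '防火');
RESET enable_seqscan;
```

With enable_seqscan turned off, the plan is expected to contain a Bitmap Index Scan on idx_articles_title. If it still shows a sequential scan, the expression in the WHERE clause probably differs from the indexed one.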
Customize a Chinese word segment dictionary
Execute the following statements to add a custom word segment to the Chinese word segment dictionary:
-- View the current segmentation result
SELECT to_tsquery('testzhcfg', '保障房资金压力');
-- Add a new word segment to the custom dictionary table
INSERT INTO pg_ts_custom_word VALUES ('保障房资');
-- Synchronize the dictionary so that the new word segment can take effect
SELECT zhprs_sync_dict_xdb();
-- Reconnect (psql meta-command) so that the session loads the updated dictionary
\c
-- Query again to view the new segmentation result
SELECT to_tsquery('testzhcfg', '保障房资金压力');
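Removing a custom word segment follows the same cycle: change the table, synchronize, and reconnect. The following minimal sketch undoes the insertion above. It assumes that the word is stored in a column named word. That name is hypothetical, so check the actual table definition with \d pg_ts_custom_word first.

```sql
-- psql only: inspect the columns of the custom word table
\d pg_ts_custom_word

-- List the custom word segments that are currently defined
SELECT * FROM pg_ts_custom_word;

-- Remove the word segment added above (the column name "word" is an assumption)
DELETE FROM pg_ts_custom_word WHERE word = '保障房资';

-- Synchronize and reconnect so that the removal takes effect
SELECT zhprs_sync_dict_xdb();
\c
SELECT to_tsquery('testzhcfg', '保障房资金压力');
```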
Usage notes for custom word segments:
You can add a maximum of 1 million custom word segments. Word segments beyond this limit are not processed, so keep the total number within the limit. The custom and default word segmentation dictionaries take effect at the same time.
Each word segment can be at most 128 bytes long. Any bytes after the 128th byte are truncated.
After adding, deleting, or changing word segments, execute the SELECT zhprs_sync_dict_xdb(); statement and re-establish a connection to make the operation take effect.
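Before adding word segments in bulk, you may want to check both limits. In the following sketch, count(*) reports how many custom word segments exist, and octet_length returns the size of a candidate word segment in bytes. The byte size depends on the database encoding. In a UTF8 database, most Chinese characters take 3 bytes each, so a word segment of 43 or more such characters exceeds 128 bytes.

```sql
-- Check the database encoding, which determines the byte length of each character
SHOW server_encoding;

-- Number of custom word segments currently defined (limit: 1 million)
SELECT count(*) FROM pg_ts_custom_word;

-- Byte length of a candidate word segment (limit: 128 bytes);
-- returns 12 in a UTF8 database, because each of the four characters takes 3 bytes
SELECT octet_length('保障房资');
```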