OpenSearch: General-purpose Chinese text analyzer

Last Updated: Aug 27, 2024

Overview

The general-purpose Chinese text analyzer (chn_standard) tokenizes text based on Chinese semantics. It is suitable for text from all industries across the web. A search unit is the smallest granularity at which text is analyzed. The analyzer splits text into search units and also supports extended analysis, which adds extended terms to the result. For example, if the value of a field in a document is "菊花茶", the analysis result is "菊花 茶 花茶", where "花茶" is an extended term of "茶".

Example:
Original content: 菊花茶
Analysis result: 菊花 茶 花茶
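
The relationship between search units and extended terms can be illustrated with a small Python sketch. This is a toy model, not the analyzer itself: chn_standard tokenizes based on semantics, whereas the function below simply takes the search units as input and looks up extended terms in a hypothetical vocabulary list. The names analyze, units, and extended_vocab are assumptions made for illustration.

```python
def analyze(text, units, extended_vocab):
    """Toy model: return the search units followed by any extended terms.

    A vocabulary term counts as an extended term of a unit when it
    contains that unit, differs from it, and occurs in the original text.
    The real chn_standard analyzer does not work from lookup lists.
    """
    result = list(units)
    for unit in units:
        for term in extended_vocab:
            if unit in term and term != unit and term in text and term not in result:
                result.append(term)
    return result


# Search units for "菊花茶" as given in the example above.
tokens = analyze("菊花茶", ["菊花", "茶"], ["花茶"])
print(" ".join(tokens))  # 菊花 茶 花茶
```

In this sketch, "花茶" is appended because it contains the search unit "茶" and appears in the original text, which mirrors the statement that "花茶" is an extended term of "茶".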

Intervene in text analysis

To intervene in the analysis result that the general-purpose Chinese text analyzer returns, modify the chn_standard.dict dictionary in the advanced settings, and then publish the modified advanced settings as a new version. An intervention entry is a medium-granularity entry. When OpenSearch Vector Search Edition performs a search, it converts the intervention entry into search units. For example, suppose that you add "搜索引擎" as an intervention entry to the dictionary. When a user searches for "搜索引擎", OpenSearch Vector Search Edition finds a match in the dictionary and then converts the intervention entry into the search units "搜索" and "引擎".
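
The two-step behavior described above (match an intervention entry, then convert it into search units) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the in-memory set stands in for chn_standard.dict (whose file format is not shown here), the SEARCH_UNITS table is a hypothetical stand-in for the analyzer's own unit splitting, and forward maximum matching is a technique chosen for this sketch, not a documented detail of the product.

```python
# Hypothetical stand-in for the entries in chn_standard.dict.
INTERVENTION_ENTRIES = {"搜索引擎"}

# Hypothetical stand-in for how the analyzer splits an entry into search units.
SEARCH_UNITS = {"搜索引擎": ["搜索", "引擎"]}


def segment(query):
    """Split the query by forward maximum matching against the entries.

    Characters that match no entry pass through one at a time; this
    fallback exists only to keep the toy example complete.
    """
    max_len = max(len(entry) for entry in INTERVENTION_ENTRIES)
    pieces = []
    i = 0
    while i < len(query):
        for size in range(min(max_len, len(query) - i), 0, -1):
            candidate = query[i:i + size]
            if candidate in INTERVENTION_ENTRIES:
                pieces.append(candidate)
                i += size
                break
        else:
            # No entry starts at position i.
            pieces.append(query[i])
            i += 1
    return pieces


def to_search_units(query):
    """Convert each matched intervention entry into its search units."""
    units = []
    for piece in segment(query):
        units.extend(SEARCH_UNITS.get(piece, [piece]))
    return units


print(to_search_units("搜索引擎"))  # ['搜索', '引擎']
```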

Usage notes

  • This analyzer applies only to fields of the TEXT data type. To use the analyzer, specify chn_standard as the analyzer of the field when you configure the schema.
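
As a rough illustration of the usage note, the Python sketch below builds a schema fragment in which a TEXT field uses chn_standard, and checks that no non-TEXT field is assigned the analyzer. The key names fields, field_name, field_type, and analyzer, and the field names id and title, are assumptions made for this example; confirm the exact schema format in the console or the schema configuration documentation before you use it.

```python
import json

ANALYZER = "chn_standard"


def check_schema(schema):
    """Return the names of fields that use chn_standard but are not TEXT."""
    return [
        field["field_name"]
        for field in schema["fields"]
        if field.get("analyzer") == ANALYZER and field["field_type"] != "TEXT"
    ]


# Hypothetical schema fragment: key names and field names are assumptions.
schema = {
    "fields": [
        {"field_name": "id", "field_type": "INT64"},
        {"field_name": "title", "field_type": "TEXT", "analyzer": ANALYZER},
    ]
}

# An empty list means every field that uses the analyzer is a TEXT field.
assert check_schema(schema) == []
print(json.dumps(schema, ensure_ascii=False, indent=2))
```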