OpenSearch: Text analyzers

N-gram analyzer

Description: This analyzer tokenizes text into sequences of N consecutive characters. It supports 2-grams and 3-grams and is suitable for non-semantic search scenarios.

Important

This analyzer is available only for exclusive applications and requires the field type to be SHORT_TEXT.

Examples:

  • 2-gram

    If the document field contains "Open Search", the tokenized result is 'op','pe','en','n ',' s','se','ea','ar','rc','ch'
  • 3-gram

    If the document field contains "Open Search", the tokenized result is 'ope','pen','en ','n s',' se','sea','ear','arc','rch'

Keyword analyzer

Description: This analyzer does not tokenize text. It is suitable for scenarios that require an exact match, such as for tags, keywords, or string and numeric content that should not be tokenized.

Note: This analyzer applies to fields of the LITERAL, INT, LITERAL_ARRAY, and INT_ARRAY types.

Example:

For example, if a document field contains "chrysanthemum tea", the document can be retrieved only if you search for "chrysanthemum tea".
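
A minimal query sketch for this case, assuming an index named tag_index is configured with the keyword analyzer (the index name is hypothetical):

query=tag_index:'chrysanthemum tea'
// tag_index is a hypothetical index configured with the keyword analyzer.
// Matches only documents whose field value is exactly "chrysanthemum tea". A query for
// "chrysanthemum" alone does not match, because the whole field value is indexed as a single token.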

General analyzer for Chinese

Description: This general-purpose analyzer tokenizes text into search units based on Chinese semantics and is suitable for most industries.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

Example:

For example, if a document field contains "菊花茶", the document can be retrieved if you search for "菊花茶", "菊花", "茶", or "花茶".

E-commerce analyzer for Chinese

Description: This analyzer is optimized for the E-commerce industry.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

Example:

For example, if a document field contains "Dabao SOD lotion", the document can be retrieved if you search for "Dabao", "sod", "sod lotion", "SOD lotion", or "lotion".

Single-character analyzer for Chinese

Description: This analyzer tokenizes text into single Chinese characters and words. It is suitable for non-semantic Chinese search scenarios, such as searches for author names or store names.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

Example:

For example, if a document field contains "菊花茶", the document can be retrieved if you search for "菊花茶", "菊花", "茶", "花茶", "菊", "花", or "菊茶".

Fuzzy analyzer

Description: This analyzer supports searches by pinyin, single characters, and letters. It also supports prefix and suffix matching for numbers, letters, and pinyin, but not for Chinese text. The field length is limited to 100 bytes. For more information, see Fuzzy searches.

Note: This analyzer applies only to fields of the SHORT_TEXT type.

Examples:

For example, if a document field contains "菊花茶" (chrysanthemum tea), the document can be retrieved if you search for "菊花茶", "菊花", "茶", "花茶", "菊", "花", "菊茶", "ju", "juhua", "juhuacha", "j", "jh", or "jhc".
For example, if a document field contains the phone number "138****5678", use "^138" to search for phone numbers that start with "138", and use "5678$" to search for phone numbers that end with "5678".
For example, if a document field contains "OpenSearch", the document can be retrieved by searching for a single letter or a combination of letters from the word.
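
A hedged query sketch for the prefix and suffix searches in the phone number example, assuming an index named phone_index is configured with the fuzzy analyzer (the index name is hypothetical):

query=phone_index:'^138'
// phone_index is a hypothetical index configured with the fuzzy analyzer.
// Retrieves documents whose phone number starts with 138.
query=phone_index:'5678$'
// Retrieves documents whose phone number ends with 5678.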

Word stemming analyzer for English

Description: This analyzer is suitable for English semantic search scenarios. By default, it stems each tokenized English word to its root form and handles pluralization.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types. This analyzer does not support query analysis configurations.

Example:

For example, if a document field contains "英文分词器 english analyzer", the document can be retrieved if you search for "英文分词器", "english", "analyz", "analyzer", "analyzers", "analyze", "analyzed", or "analyzing".
(Note: Consecutive Chinese characters are treated as a single token by English analyzers.)

Unstemmed word analyzer for English

Description: This analyzer tokenizes text based on spaces and punctuation marks. It is suitable for search scenarios that are not based on English semantics, such as searches for book titles or author names.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types. This analyzer does not support query analysis configurations.

Example:

For example, if a document field contains "英文分词器 english analyzer", the document can be retrieved if you search for "英文分词器", "english", or "analyzer".
(Note: Consecutive Chinese characters are treated as a single token by English analyzers.)

Fine-grained analyzer for English

Description: This analyzer tokenizes text into search units based on English semantics and is suitable for general industry applications.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

Example:

If a document field contains "dataprocess", the analysis result is "data process". In this case, the document can be retrieved if you search for "dataprocess", "data process", "data", or "process".

Full pinyin spelling analyzer

Description: This analyzer enables you to search for Chinese characters in short text using their full pinyin spelling or the first letter of their pinyin. It is suitable for searches that use full or abbreviated pinyin, such as searches for movie names or author names. To search for characters using full pinyin, you must enter the complete pinyin of the Chinese characters, not a partial spelling.

Note: This analyzer applies only to fields of the SHORT_TEXT type.

Examples:

For example, if the content of a document field is "Da Nei Mi Tan 007", the document can be retrieved if you search for "d", "dn", "dnm", "dnmt", "dnmt007", "da", "danei", "daneimi", or "daneimitan". The document cannot be retrieved if you search for "an" or "anei".

Abbreviated pinyin analyzer

Description: This analyzer lets you retrieve Chinese characters in short text using the first letter of their pinyin. It is suitable for scenarios that require searches by pinyin initials, such as for people's names or movie names.

Note: This analyzer applies to fields of the SHORT_TEXT type.

Examples:

For example, if a document field contains "Da Nei Mi Tan 007", a search for "d", "dn", "dnm", "dnmt", "dnmt0", "dnmt007", "m", "mt", "mt007", or "007" retrieves the document.

Simple analyzer

Description: This analyzer gives you full control over tokenization. It is suitable for special scenarios where the system's built-in analyzers cannot meet your requirements. When you push documents or perform searches, use the tab character ('\t') to separate tokens in field content and in search queries. Make sure that the field content and search queries are tokenized in the same way. Otherwise, documents cannot be retrieved.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types. This analyzer does not support query analysis configuration.

Examples:

For example, if the content of a field is "chrysanthemum\tflower tea\thao", the document can be retrieved only if you search for "chrysanthemum", "flower tea", "chrysanthemum\tflower tea", "flower tea\thao", "chrysanthemum\thao", or "chrysanthemum\tflower tea\thao".
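
A minimal sketch of how the pushed field content and a query line up under this analyzer, assuming an index named simple_index is built on the field (the index name is hypothetical):

// simple_index is a hypothetical index built on the field shown above.
// Field content pushed with the document, with tokens separated by '\t':
//   chrysanthemum\tflower tea\thao
query=simple_index:'chrysanthemum\tflower tea'
// Both query tokens match tokens in the field content, so the document is retrieved.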

Numeric analyzer

Description: This analyzer is suitable for searches based on time intervals or numerical ranges.

Note: This analyzer applies to fields of the INT and TIMESTAMP types.

Examples:

query=default:'OpenSearch' AND index:[number1,number2]
// In this example, index is the name of an index that is configured with the numeric analyzer.
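
A concrete instance of the preceding range query, assuming an INT field with a numeric-analyzer index named price_index (the field and index names are hypothetical):

query=default:'OpenSearch' AND price_index:[100,500]
// price_index is a hypothetical INT index configured with the numeric analyzer.
// Retrieves documents that match "OpenSearch" and whose price value falls within [100,500].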

Geo-location analyzer

Description: This analyzer is suitable for scenarios that require geographic location range queries.

Note: This analyzer applies only to fields of the GEO_POINT type.

Examples:

query=spatial_index:'circle(116.5806 39.99624, 1000)'
// Queries points within a circle to find nearby locations within a few kilometers.
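
A hedged sketch that combines the geo condition above with a keyword condition, assuming the spatial_index index from the example and a default text index; the query term "coffee shop" is hypothetical:

query=default:'coffee shop' AND spatial_index:'circle(116.5806 39.99624, 1000)'
// "coffee shop" is a hypothetical query term; spatial_index is the index from the example above.
// Retrieves documents that match the keyword and whose geo_point value falls within the circle.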

IT content analyzer

Description: This industry-specific analyzer is designed for content in the IT industry. It tokenizes IT-related terms differently than a general-purpose analyzer.

Note: This applies only to the TEXT and SHORT_TEXT field types.

Examples:

Example: Original content: C++ array usage notes
General analysis: C++ array usage notes
IT content analysis: C++ array usage notes

General E-commerce analyzer

Description: This industry-specific analyzer is designed for the E-commerce industry. It uses the natural language processing (NLP) technology of Alibaba DAMO Academy and years of industry experience to provide query analysis capabilities that resolve common pain points in E-commerce.

Note:

This analyzer applies to fields of the TEXT type.

This analyzer is available only for exclusive applications of the E-commerce Industry Enhanced specification.

Examples:

Example: Original text: Small Gold Tube Concealer Cream
General analysis: "Small Gold Tube" "Concealer" "Cream"
E-commerce analysis: "Small Gold Tube" "Concealer" "Cream"

General analyzer for Thai

Description: This general-purpose analyzer tokenizes Thai text into search units and is suitable for general industry applications.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

Examples:

If the content of a document field is "แหล่งดึงดูดนักท่องเที่ยว" and it is tokenized as "แหล่ง ดึง ดูด นักท่องเที่ยว", the document can be retrieved when you search for "นักท่องเที่ยว" or "แหล่งดึงดูดนักท่องเที่ยว".

E-commerce analyzer for Thai

Description: This analyzer is designed for Thai-language E-commerce scenarios.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

Examples:

If the value of a field in a document is "หน้าจอโทรศัพท์" and the tokenization result is "หน้าจอ โทรศัพท์", the document can be retrieved by searching for "หน้าจอโทรศัพท์", "หน้าจอ", or "โทรศัพท์".

General analyzer for Vietnamese

Description: This analyzer is suitable for Vietnamese text analysis in general industries.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

General analyzer for the gaming industry

Description: This analyzer is designed for the gaming industry.

Note: This applies only to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications with the Game Industry Enhanced specification.

Examples:

If a document field contains "Genshin equipment" and is tokenized into "Genshin" and "equipment", a search for "Genshin equipment", "Genshin", or "equipment" retrieves the document.

General analyzer for English E-commerce

Description: This analyzer is suitable for English text analysis in the E-commerce industry.

Note: This applies only to the TEXT field type.

This analyzer is available only for exclusive applications of the Industry-specific Enhanced Edition for E-commerce.

Character analyzer for Chinese

Description: This analyzer tokenizes text into single Chinese characters, numbers, English letters, and punctuation marks. It is suitable for non-semantic search scenarios.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

Examples:

For example, if the document field content is "开放搜索OpenSearch123.", the document can be retrieved by searching for "开", "放", "搜", "索", "O", "p", "e", "n", "S", "a", "r", "c", "h", "1", "2", "3", or ".".

General analyzer for Korean

Description: This analyzer is suitable for Korean text analysis in general industries.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

Examples:

If the content of a document field is "인제군의교육" and the tokenization result is "인제군 의 교육", the document can be retrieved by searching for "인제군의교육", "의", or "교육".

E-commerce analyzer for Korean

Description: This analyzer is designed for Korean text analysis in the E-commerce industry.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

Examples:

If a document field contains "스포츠캐주얼신발" and is tokenized into "스포츠 캐주얼 신발", the document can be retrieved by a search for "스포츠", "캐주얼", or "신발".

General analyzer for Japanese

Description: This analyzer is suitable for Japanese text analysis in general industries.

Note: This applies only to the TEXT and SHORT_TEXT field types.

This analyzer is available only for exclusive applications.

Examples:

If the document field content is "メキシコアグーチ" and the tokenization result is "メキシコ アグーチ", the document can be retrieved by searching for "メキシコ" or "アグーチ".

E-commerce analyzer for Japanese

Description: This analyzer is designed for Japanese text in the E-commerce industry.

Note: This analyzer applies to fields of the TEXT and SHORT_TEXT types.

This analyzer is available only for exclusive applications.

Examples:

If a document field's content is "ラウンドネックスーツ" and the tokenization result is "ラウンド ネック スーツ", the document can be retrieved by a search for "ラウンド", "ネック", or "スーツ".

Custom text analyzer

Description: This analyzer combines an industry-specific analyzer, such as a general analyzer, an E-commerce analyzer, or a person name analyzer, with custom intervention entries. For more information, see Custom text analyzers.

Note: This applies only to TEXT and SHORT_TEXT field types.

Analysis tests

You can test the analysis results of industry-specific and custom analyzers. In the OpenSearch console, navigate to Search Algorithm Center > Retrieval Configuration > Analyzer Management and click the Analysis Test tab.


Scenarios

  • For semantic searches in Chinese, use a Chinese semantic analyzer.

  • For Chinese searches of short text or in non-semantic scenarios where precise sorting is not required, use a Chinese single-character analyzer to improve retrieval recall.

  • For Pinyin searches, use the fuzzy analyzer.

  • For searches in English, use an English stemming analyzer.

  • In some scenarios, you can use a Chinese semantic analyzer and a single-character analyzer together to achieve better search results. For example, combine the query query=title_index:'菊花茶' OR sws_title_index:'菊花茶' with the fine sort expression text_relevance(title)*5+field_proximity(sws_title). This combination retrieves documents that contain the individual characters of "菊花茶" even if they are separated, and ranks documents that contain the exact phrase "菊花茶" higher, as illustrated in the sketch below.
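
A sketch of the combined retrieval described in the preceding item, using the title_index (Chinese semantic analyzer) and sws_title_index (single-character analyzer) indexes from that example:

query=title_index:'菊花茶' OR sws_title_index:'菊花茶'
// Fine sort expression configured for the application:
//   text_relevance(title)*5 + field_proximity(sws_title)
// Documents that contain the exact phrase "菊花茶" rank higher, while documents that contain
// only the separated characters are still recalled through sws_title_index.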

Usage notes

  • Supported field types for index fields: INT, INT_ARRAY, TEXT, SHORT_TEXT, LITERAL, LITERAL_ARRAY, TIMESTAMP, and GEO_POINT

  • Unsupported field types for index fields: FLOAT, FLOAT_ARRAY, DOUBLE, and DOUBLE_ARRAY

  • If a search result summary is configured for a TEXT field, some phrases in extended search units, such as "花茶" in the preceding example, are not highlighted.

  • The single-character Chinese analyzer treats numbers and English words as single tokens. For example, for the text "hello world", a search for "hello" retrieves the document, but a search for "he" does not. To retrieve documents based on partial word matches, use the fuzzy analyzer.

  • By default, the primary key of the primary table in the application schema is set as an index field named "id". This configuration cannot be modified.