OpenSearch: Query analysis

Last Updated: Dec 18, 2024

Whether search results are relevant depends on how well the intent behind the keywords in a search query is judged. In OpenSearch, the query analysis feature is used to understand this search intent. The feature performs a series of intelligent analyses on a search query, rewrites the query, and then submits the rewritten query to the search engine. The search engine then retrieves and sorts data based on the rewritten query. This topic describes the basic features that you can use for query analysis.

Stop word filtering

Meaningless words in search queries are filtered out. Meaningless words are words that appear at a high frequency but do not affect the search results, such as punctuation marks and modal particles.
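The filtering step can be illustrated with a minimal Python sketch. The stop word list and the function name filter_stop_words are hypothetical; the list that OpenSearch actually uses is not shown in this topic.

```python
# Hypothetical stop word list, used only for illustration.
STOP_WORDS = {"the", "a", "of", "oh", ",", ".", "!", "?"}


def filter_stop_words(terms):
    """Drop terms that appear in the stop word list (case-insensitive)."""
    return [term for term in terms if term.lower() not in STOP_WORDS]


print(filter_stop_words(["the", "price", "of", "iPhone", "?"]))
# ['price', 'iPhone']
```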

Spelling correction

If a search query contains spelling errors, the search results may not meet expectations, or no results may be returned at all. To resolve this issue, the spelling correction feature checks the spelling of the search query, corrects the errors that it finds, and produces a corrected search query. OpenSearch then determines whether to use the corrected query to conduct the search based on the credibility of the correction.
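The idea of accepting a correction only when it is credible enough can be sketched in Python with the standard difflib module. This is an illustration, not the algorithm that OpenSearch uses: the vocabulary, the function name correct_query, and the 0.75 similarity cutoff are all assumptions.

```python
import difflib

# Hypothetical vocabulary of known terms, used only for illustration.
VOCABULARY = ["barbie", "brown", "powder"]


def correct_query(query, cutoff=0.75):
    """Replace each term with its closest vocabulary term if it is similar enough."""
    corrected = []
    for term in query.lower().split():
        # get_close_matches returns at most n candidates whose similarity
        # ratio is at least `cutoff`; an empty list means "keep the term".
        matches = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else term)
    return " ".join(corrected)


print(correct_query("barbie brwon powder"))
```

Terms that have no sufficiently similar vocabulary entry are left unchanged, which mirrors the behavior of keeping the original query when the correction is not credible.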

Term weight analysis

The term weight analysis feature evaluates the importance of each term in a search query and quantifies that importance as a weight. If a search query contains low-importance terms and these terms take part in document retrieval, only a small number of documents may be retrieved. OpenSearch may therefore exclude low-importance terms when it retrieves documents, which increases the number of documents that are retrieved.
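A minimal Python sketch of this behavior follows. The weights, the 0.3 threshold, and the function name select_retrieval_terms are hypothetical; OpenSearch computes its own weights and applies its own cutoff.

```python
def select_retrieval_terms(weighted_terms, min_weight=0.3):
    """Keep only the terms whose weight reaches min_weight.

    weighted_terms is a list of (term, weight) pairs.
    """
    kept = [term for term, weight in weighted_terms if weight >= min_weight]
    # Never drop every term: fall back to the single heaviest one.
    if not kept and weighted_terms:
        kept = [max(weighted_terms, key=lambda pair: pair[1])[0]]
    return kept


# Hypothetical weights for the query "Nike cheap shoes".
print(select_retrieval_terms([("Nike", 0.9), ("cheap", 0.1), ("shoes", 0.8)]))
# ['Nike', 'shoes']
```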

Synonym configuration

A search query may have synonyms that carry the same meaning. For example, when you search for "Apple phone", content that is related to iPhone can also be retrieved and displayed in the search results. The synonym configuration feature of OpenSearch retrieves documents based on both the search query and its synonyms, which increases the number of documents that are retrieved.
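Synonym expansion can be sketched in Python as follows. The synonym dictionary, the function name expand_query, and the OR-joined output are assumptions made for illustration; the query that OpenSearch actually generates may differ.

```python
# Hypothetical synonym dictionary, keyed by lowercased query text.
SYNONYMS = {"apple phone": ["iPhone"]}


def expand_query(query, index="default"):
    """Build a clause that matches the query or any of its configured synonyms."""
    variants = [query] + SYNONYMS.get(query.lower(), [])
    clauses = [f"{index}:'{variant}'" for variant in variants]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


print(expand_query("Apple phone"))
# (default:'Apple phone' OR default:'iPhone')
```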

NER

The named entity recognition (NER) feature recognizes the semantic entities in a search query after the query is analyzed and assigns each entity to a category. Entity categories with lower priorities may be ignored in the search process. For example, for the search query "Nike Slim Dress", NER recognizes "Nike" as a brand name with medium priority, "Slim" as a style element with low priority, and "Dress" as a category name with high priority.
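The effect of entity priorities on retrieval can be sketched in Python using the entities from the preceding example. The numeric priority scale and the function name required_terms are hypothetical.

```python
# Hypothetical numeric scale for the three priority levels.
PRIORITY = {"high": 3, "medium": 2, "low": 1}

# (term, entity category, priority) for the query "Nike Slim Dress".
ENTITIES = [
    ("Nike", "brand", "medium"),
    ("Slim", "style", "low"),
    ("Dress", "category", "high"),
]


def required_terms(entities, min_priority="medium"):
    """Return the terms whose entity priority is at least min_priority."""
    floor = PRIORITY[min_priority]
    return [term for term, _category, priority in entities if PRIORITY[priority] >= floor]


print(required_terms(ENTITIES))
# ['Nike', 'Dress']
```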

Category prediction

The category prediction feature is used to predict the relevance between the intent of a search query and business categories. Then, you can use sort expressions to adjust the sorting of documents in the search results. For example, the search results for the search query "mobile phone" include mobile phones and mobile phone cases. Based on category prediction, the relevance of the search query to the digital product category is higher than that to the digital accessory category. In this case, you can use sort expressions to rank the results of the digital product category higher than those of the digital accessory category.
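The ranking effect can be sketched in Python. The scores, field names, and function name rank_by_category are hypothetical, and in OpenSearch the adjustment is made through sort expressions rather than client-side code.

```python
# Hypothetical predicted relevance of the query "mobile phone" to each category.
CATEGORY_SCORES = {"digital product": 0.9, "digital accessory": 0.4}


def rank_by_category(docs, category_scores):
    """Sort documents so that more relevant categories come first."""
    return sorted(
        docs,
        key=lambda doc: category_scores.get(doc["category"], 0.0),
        reverse=True,
    )


docs = [
    {"title": "mobile phone case", "category": "digital accessory"},
    {"title": "mobile phone", "category": "digital product"},
]
print([doc["title"] for doc in rank_by_category(docs, CATEGORY_SCORES)])
# ['mobile phone', 'mobile phone case']
```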

Word embedding

The word embedding feature uses a vector model to convert the text in a search query into a vector and uses the multimodal search feature to return the text search results. For example, the search query "OpenSearch" is converted into [0.1,0.3,0.5] based on the vector model in the selected index analyzer. Then, the search results are obtained based on vector indexes. You can configure search policies when you configure the multimodal search feature.
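Vector-based retrieval can be sketched in Python with cosine similarity. The document vectors and function names are hypothetical, and a real vector index would use approximate nearest-neighbor search rather than the full scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vector, doc_vectors, k=1):
    """Return the IDs of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _vector in ranked[:k]]


# The query vector from the example above and two hypothetical document vectors.
query_vector = [0.1, 0.3, 0.5]
doc_vectors = {"doc_a": [0.2, 0.6, 1.0], "doc_b": [1.0, 0.0, 0.0]}
print(nearest(query_vector, doc_vectors))
# ['doc_a']
```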

Index range

Important
  • You can specify an index range for the following types of analyzers: the general analyzer for Chinese text, the analyzer for Chinese text from the E-commerce industry, the analyzer for IT-related text, the analyzer for text from the gaming industry, and the analyzer for text from the education industry.

  • You can change analysis methods for indexes in the application schema when you configure an application or modify an offline application.

Search query rewriting

The search query rewriting feature uses the AND or OR logical operator to specify how documents are retrieved based on the terms of a search query.

Note: The term weight analysis and NER features can affect which terms are used to retrieve documents.

For example, the search query "Nike sports shoes" is divided into the following terms: Nike, sports, and shoes.

  • If the AND logical operator is used, the search query is rewritten as the following string:

    (default:'Nike' AND default:'sports' AND default:'shoes')

  • If the OR logical operator is used, the search query is rewritten as the following string:

    (default:'Nike' OR default:'sports' OR default:'shoes')
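The two rewritten strings above can be reproduced with a short Python sketch. The function name rewrite_query is hypothetical; the rewriting itself is performed by OpenSearch, not by client code.

```python
def rewrite_query(terms, operator="AND", index="default"):
    """Join per-term clauses with the given logical operator."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses = [f"{index}:'{term}'" for term in terms]
    return "(" + f" {operator} ".join(clauses) + ")"


print(rewrite_query(["Nike", "sports", "shoes"], "AND"))
# (default:'Nike' AND default:'sports' AND default:'shoes')
print(rewrite_query(["Nike", "sports", "shoes"], "OR"))
# (default:'Nike' OR default:'sports' OR default:'shoes')
```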

Enable the search query rewriting feature

When you create or modify a query analysis rule, select AND or OR from the Search Query Rewriting drop-down list.

Note

If a search query is complex, it can be rewritten twice based on the following rules:

  • If data is retrieved based on the first rewrite, the search query is not rewritten a second time.

  • If no data is retrieved based on the first rewrite, the search query is rewritten a second time to retrieve more data.
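The two rules can be sketched in Python as a simple fallback. The function name, the stand-in index, and the choice of an OR rewrite as the second attempt are assumptions; this topic does not state what the second rewrite looks like.

```python
def search_with_fallback(search, first_rewrite, second_rewrite):
    """Run the first rewrite; use the second only if the first returns nothing."""
    hits = search(first_rewrite)
    if hits:
        return hits, 1
    return search(second_rewrite), 2


# Hypothetical stand-in for the search engine: maps a rewritten query to hits.
FAKE_INDEX = {
    "(default:'Nike' OR default:'sports' OR default:'shoes')": ["doc_7", "doc_9"],
}

hits, rewrite_round = search_with_fallback(
    lambda query: FAKE_INDEX.get(query, []),
    "(default:'Nike' AND default:'sports' AND default:'shoes')",
    "(default:'Nike' OR default:'sports' OR default:'shoes')",
)
print(hits, rewrite_round)
# ['doc_7', 'doc_9'] 2
```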

Re-search policy

After you configure query analysis, if no data is retrieved based on a search query, the search query is rewritten to trigger a re-search. When you initiate a search request, you can use the disable parameter to specify whether to enable the re-search feature and use the re_search parameter to configure a re-search policy. For more information, see Initiate search requests.

Example:

disable=re_search # The re-search feature is disabled.

re_search=strategy:threshold,params:total_hits#6    # If the total number of hits is less than six, a re-search is performed. 

After you configure the re-search feature, you can add fetch=qp:profile to your search request to determine whether the results are returned after a re-search.

For example, if the value of re_search_times is 0 in the qp section of the response, no re-search was triggered. Sample response:

"qp": [
  {
   "app_name": "130180448",
   "query_correction_info": [
    {
     "index": "index",
     "original_query": "Barbie Brown Powder",
     "corrected_query": "Barbie Brown Powder",
     "correction_level": 1,
     "processor_name": "spell_check"
    }
   ],
   "re_search_times": 0
  }
 ],
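A client could check this field with a small Python helper such as the following sketch. The function name re_search_triggered is hypothetical, and the sketch assumes that the qp list has already been extracted from the parsed JSON response.

```python
def re_search_triggered(qp_entries):
    """Return True if any query analysis entry reports at least one re-search."""
    return any(entry.get("re_search_times", 0) > 0 for entry in qp_entries)


# The qp list from the sample response above.
sample_qp = [
    {
        "app_name": "130180448",
        "query_correction_info": [
            {
                "index": "index",
                "original_query": "Barbie Brown Powder",
                "corrected_query": "Barbie Brown Powder",
                "correction_level": 1,
                "processor_name": "spell_check",
            }
        ],
        "re_search_times": 0,
    }
]
print(re_search_triggered(sample_qp))
# False
```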

The following sample shows a request string in which the re_search and fetch parameters are configured:

query=query=index:'search test'&&config=start:0,hit:10,format:fulljson&fetch_fields=title;subtitle&fetch=qp:profile&re_search=strategy:threshold,params:total_hits#6
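A request string of this kind can be assembled with a Python sketch like the following, which uses the re_search=strategy:threshold,params:total_hits#N format from the parameter example earlier in this section. The function name and its arguments are hypothetical, and the sketch leaves the string unencoded to match the sample; a real HTTP request would typically need the parameter values to be URL-encoded.

```python
def build_search_params(index, keyword, fetch_fields, re_search_min_hits=None, profile=False):
    """Assemble an unencoded request string in the style of the sample above."""
    params = [
        f"query=query={index}:'{keyword}'&&config=start:0,hit:10,format:fulljson",
        "fetch_fields=" + ";".join(fetch_fields),
    ]
    if profile:
        # Ask for query analysis details so that re_search_times is returned.
        params.append("fetch=qp:profile")
    if re_search_min_hits is not None:
        # Re-search when the total number of hits is below this threshold.
        params.append(f"re_search=strategy:threshold,params:total_hits#{re_search_min_hits}")
    return "&".join(params)


print(build_search_params("index", "search test", ["title", "subtitle"], 6, profile=True))
```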

Query analysis based on multiple rules

In a search request, you can specify multiple query analysis rules by setting the qp parameter in the format of qp=qpName1,qpName2. The following cases use two indexes, index_1 and index_2, and two query analysis rules, qp_1 and qp_2, to show the limits that apply:

  • Case 1: The index_1 index is associated with the qp_1 rule. The index_2 index is associated with the qp_2 rule.

    # Query clauses
    index_1:'xxx' AND index_2:'xxx'  & qp=qp_1,qp_2   # The query clause is valid.
    index_1:'xxx'  & qp=qp_1,qp_2   # The query clause is valid.
    index_2:'xxx'  & qp=qp_1,qp_2   # The query clause is valid.

  • Case 2: The index_1 index is associated with the qp_1 rule. The index_1 index is also associated with the qp_2 rule.

    # Query clauses
    index_1:'xxx'  & qp=qp_1,qp_2    # An error occurs. The error message "6601:Rewrite index used in multi qp chains" is returned.
    index_1:'xxx'  & qp=qp_1         # The query clause is valid.
    index_2:'xxx'  & qp=qp_2         # An error occurs. The error message "6606:No index need to process by QP" is returned.
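The behavior in the two cases can be modeled with a short Python sketch. The check logic is inferred from the examples above and is an assumption, not the documented validation algorithm; the function name check_qp and the rule_indexes mapping are hypothetical.

```python
def check_qp(query_indexes, qp_names, rule_indexes):
    """Return an error message for an invalid combination, or None if it is valid.

    rule_indexes maps each rule name to the set of indexes that the rule processes.
    """
    handled = False
    for index in query_indexes:
        rules = [qp for qp in qp_names if index in rule_indexes.get(qp, set())]
        if len(rules) > 1:
            # The same index would be rewritten by more than one specified rule.
            return "6601:Rewrite index used in multi qp chains"
        handled = handled or bool(rules)
    if not handled:
        # None of the queried indexes is associated with a specified rule.
        return "6606:No index need to process by QP"
    return None


case_1 = {"qp_1": {"index_1"}, "qp_2": {"index_2"}}
case_2 = {"qp_1": {"index_1"}, "qp_2": {"index_1"}}

print(check_qp(["index_1", "index_2"], ["qp_1", "qp_2"], case_1))  # None
print(check_qp(["index_1"], ["qp_1", "qp_2"], case_2))  # 6601:Rewrite index used in multi qp chains
print(check_qp(["index_2"], ["qp_2"], case_2))  # 6606:No index need to process by QP
```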

Features of query analysis supported in different industries

Feature

General-purpose

E-commerce

Enhanced for E-commerce

IT

Education

Intervention dictionary

Stop word filtering

Spelling correction

Term weight analysis

Synonym configuration

Category prediction

NER

×

×

×

Word embedding

×

×

×

×

×

Note: A check sign (✓) indicates that the feature is supported, and a cross sign (×) indicates that the feature is not supported.