OpenSearch: cache clause

Last Updated: Feb 28, 2024

Overview

The searcher cache feature caches the final query results that are produced after the system performs fine sorting or disperses the retrieved documents. When the feature is enabled, the system does not need to repeat rough sorting and fine sorting on the same data, which reduces the number of sorting operations and improves the query performance of each Searcher worker. You can define cache policies based on your business requirements, for example, whether to cache the results of a query and how long cached items remain valid. You can enable or disable these policies in the Searcher worker configuration or in query statements. OpenSearch Retrieval Engine Edition also provides plug-ins that you can use to create cache policies tailored to your business requirements.

Syntax

{
  "cache" : {
  }
}

Parameters:

  • enabled: specifies whether to enable the searcher cache feature. Valid values: true and false.

  • cache_key: the cache key of the request. You can generate a cache key yourself and specify it in the query statement. Because the cache key affects the cache hit ratio, choose it based on your usage scenarios and business requirements. If you do not specify a cache key, the system calculates a hash value from the query statement and uses that hash value as the cache key.

  • expire_time: the cache duration of cached query results. This parameter is used to ensure the timeliness of cached data. Unit: seconds. If your application does not have high requirements on data timeliness, or if no incremental updates are performed on the queried data, you can omit this parameter. By default, cached items do not expire, and the system deletes them based on the least recently used (LRU) algorithm. The value can be any expression that OpenSearch Retrieval Engine Edition supports: an attribute expression, a function expression, a virtual attribute expression, or an arithmetic expression that combines them. The expression must return a value of the UINT32 type. Otherwise, the system reports an error. Before the Searcher Cache component caches the results of a query, it evaluates the expression for each document in the results and uses the minimum value as the cache duration of all cached items.

  • current_time: the time that the Searcher Cache component compares with the value of the expire_time parameter to determine whether a cached item has expired. If you do not specify the current_time parameter in the cache clause, the Searcher Cache component compares the value of the expire_time parameter with the current system time instead.

  • cache_filter: the conditions used to filter the cached items. The filtering syntax is the same as the standard filtering syntax.

  • cache_doc_num_limit: the maximum number of queried documents that can be cached. To ensure a high cache hit ratio, specify this parameter in each query statement. Default value: [200, 3000]. The value is a list of tiers and can contain more than two entries, such as [100, 200, 300]. For example, if you set the cache_doc_num_limit parameter to [280, 780], the system applies the following rules: If the value of required_topK is smaller than or equal to 280, the system caches 280 documents. If the value of required_topK is greater than 280 and smaller than or equal to 780, the system caches 780 documents. If the value of required_topK is greater than 780, the system caches required_topK documents.

  • refresh_attributes: the attribute fields whose cached values the system updates when the cache is hit. You can specify only attribute fields that are included in the schema. Attribute fields that are specified in a virtual_attribute clause are not supported. If the cache duration is long and the values of specific attribute fields are updated, the cached values of those fields may become outdated. Specify the refresh_attributes parameter to keep the cached values of those fields up to date.
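
The tier rule described for cache_doc_num_limit can be sketched in Python as follows. This is an illustration only, not the engine's implementation: the function name docs_to_cache is hypothetical, and sorting the tiers before the lookup is an assumption that the source does not state.

```python
from bisect import bisect_left
from typing import Sequence


def docs_to_cache(required_topk: int, limits: Sequence[int] = (200, 3000)) -> int:
    """Return the number of documents cached for a given required_topK.

    Hypothetical helper: picks the smallest tier that is >= required_topk,
    and falls back to required_topk itself when it exceeds every tier.
    """
    tiers = sorted(limits)  # assumption: tiers are treated in ascending order
    idx = bisect_left(tiers, required_topk)  # index of first tier >= required_topk
    if idx < len(tiers):
        return tiers[idx]
    return required_topk


print(docs_to_cache(100, [280, 780]))   # 280
print(docs_to_cache(500, [280, 780]))   # 780
print(docs_to_cache(1000, [280, 780]))  # 1000
```

Because queries whose required_topK values fall in the same tier cache the same number of documents, a small set of tiers lets many different page sizes reuse one cached result.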

Example:

{
  "cache" : {
    "enabled" : true,
    "cache_key" : 1234567890,
    "expire_time" : "now()+300",
    "current_time" : 1235,
    "cache_filter" : "a > 10",
    "cache_doc_num_limit" : [200, 3000],
    "refresh_attributes": ["price", "sell_count"]
  }
}
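
The expire_time and current_time rules described under Parameters can be sketched as follows. This is a rough model under stated assumptions, not the Searcher Cache component's code: the function names are hypothetical, the expression results are treated as timestamps in seconds (consistent with the "now()+300" value in the example), and treating an item as expired when current_time is greater than or equal to expire_time is an assumption about the boundary case.

```python
import time
from typing import Iterable, Optional

UINT32_MAX = 2**32 - 1


def cached_expire_time(per_doc_values: Iterable[int]) -> int:
    """Combine per-document expire_time results into one value.

    The minimum across all documents is used for the whole cached item.
    Values outside the UINT32 range are rejected, mirroring the type rule.
    """
    values = list(per_doc_values)
    if not values:
        raise ValueError("no documents to cache")
    for value in values:
        if not 0 <= value <= UINT32_MAX:
            raise ValueError(f"{value} does not fit in UINT32")
    return min(values)


def is_expired(expire_time: int, current_time: Optional[int] = None) -> bool:
    """Compare expire_time with current_time, or with the system clock if omitted."""
    now = int(time.time()) if current_time is None else current_time
    return now >= expire_time  # assumption: equality counts as expired


expire_at = cached_expire_time([1500, 1300, 1800])
print(expire_at)                                 # 1300
print(is_expired(expire_at, current_time=1235))  # False
```

Passing current_time explicitly makes the expiry decision reproducible, which can be useful when you want identical requests to see the same cache state regardless of clock differences between workers.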

Usage notes

  • A cache clause is optional.

  • The number of documents that the system caches depends on the value of the required_topK parameter of the Searcher worker. By default, the value of required_topK is calculated as start + hit. If your application uses multiple Searcher workers, the Query Result Searcher (QRS) worker limits the number of documents that each Searcher worker can return based on the value of the searcher_return_hits parameter. In this case, the value of required_topK is calculated as min(start + hit, searcher_return_hits).

  • The Searcher Cache component caches the results after fine sorting is performed. To ensure that sufficient results are cached, make sure that fine sorting returns enough results. To do so, specify the rank_size parameter in the rerank scorer or the rerank_size parameter in a config clause.
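
The required_topK calculation from the usage notes can be written as a short Python sketch. The function name and the use of None to mean "no searcher_return_hits limit applies" are assumptions made for illustration.

```python
from typing import Optional


def required_topk(start: int, hit: int,
                  searcher_return_hits: Optional[int] = None) -> int:
    """Compute required_topK for one Searcher worker.

    Without a limit from the QRS worker the value is start + hit.
    With a limit it is min(start + hit, searcher_return_hits).
    """
    total = start + hit
    if searcher_return_hits is None:  # assumption: None means no limit applies
        return total
    return min(total, searcher_return_hits)


print(required_topk(start=20, hit=10))                            # 30
print(required_topk(start=990, hit=20, searcher_return_hits=500))  # 500
```

The resulting value is what the cache_doc_num_limit tiers are compared against, so deep pagination (a large start) can push a query into a higher tier.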