OpenSearch:layer clause

Last Updated:Feb 28, 2024

Overview

You can include a layer clause in a query statement to perform a hierarchical query, which is an extension of the standard query feature. A layer clause lets you control how data is retrieved, so that you can speed up retrieval for the scenarios in which your application is used and improve overall system performance. The layer clause provides the following extended features:

  • You can specify the ranges from which you want to obtain data.

  • You can specify the priorities of the specified ranges.

  • You can use different query clauses to obtain data from different ranges.

Syntax

Terms

| Term | Description |
| --- | --- |
| seek | The operation that searches for a document during a query process. |
| docid | The ID of a document. OpenSearch Retrieval Engine Edition generates an ID for each document. When OpenSearch Retrieval Engine Edition performs a query, it scans documents by the document IDs in ascending order. |
| range | A range of document IDs. When you write a query statement, you can specify a range of documents that you want OpenSearch Retrieval Engine Edition to scan. |
| layer | A layer can contain one or more ranges. The layer that contains a range determines the query priority of the range. You can specify a layer in the following formats: [query] layer, layer [query] range, and range docid seek name. |

Syntax

{
  "layer" : [
  ]
}

Specify ranges

You can use a layer clause to specify the ranges from which you want to obtain data.

A layer clause can contain multiple single layers. A single layer contains two main elements:

  • quota: the maximum number of documents that can be retrieved from the current layer. Take note of the following items:

    • Relationship between quota and rank_size: The total quota of all layers is capped at the value of the rank_size parameter. For example, if you include rank_size=10,quota:5;quota:7 in a layer clause and the first layer fills its quota of five, OpenSearch Retrieval Engine Edition retrieves only five documents from the second layer.

    • If the number of documents that are retrieved from a layer does not reach the quota that is specified for the layer, OpenSearch Retrieval Engine Edition automatically adds the remaining quota to the quota of the next layer.

    • Quota check methods: Documents that are retrieved from later ranges in a layer may match the search query better than documents that are retrieved from earlier ranges. OpenSearch Retrieval Engine Edition supports two quota check methods. In the first method, the engine checks the remaining quota of the current layer each time it retrieves a document. In the second method, the engine does not check the remaining quota while it retrieves documents. After it scans all documents in the layer, it checks the remaining quota of the layer and adds it to the quota of the next layer. In both methods, the number of retrieved documents cannot exceed the value of the rank_size parameter.

    • The default quota for each layer is 0, and the maximum quota is the maximum value of the uint32_t type (4294967295).

  • range: the document ranges that OpenSearch Retrieval Engine Edition scans in the current layer. If you do not specify a range, all documents are scanned, which corresponds to the default range [0,docCount). The engine uses the attribute fields that you specify to determine which ranges to scan. Take note of the following items:

    • You must use attribute fields. Do not use calculation expressions to specify ranges.

    • The attribute fields that you use to define a range must follow the same sort order that OpenSearch Retrieval Engine Edition uses to sort the documents that you query. Otherwise, the specified range is invalid and the engine scans the full range of documents.

    • You must specify attribute fields in a continuous manner. You cannot specify extension keywords such as %sorted and %docid between attribute fields.

    • OpenSearch Retrieval Engine Edition also supports the following extension keywords:

      • %sorted: the sorted full data and incremental data in the current layer.

      • %unsorted: the unsorted data, including the unsorted data in the current layer and real-time data.

      • %other: the range of documents that are not in the specified layers.

      • %docid: the range of documents that you want to scan.

      • %segmentid: the range of segments that you want to scan.

      • %percent: the percentage of documents that you want to scan in a range.

    • If you do not specify the %sorted keyword or %unsorted keyword in a layer clause, OpenSearch Retrieval Engine Edition automatically includes these keywords in a layer clause. In default mode, OpenSearch Retrieval Engine Edition sorts documents in each layer. If the number of documents that are retrieved does not meet the quota that you specify for the current layer, OpenSearch Retrieval Engine Edition scans the real-time data of the next layer.
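The quota rules above can be sketched as a small model. This is an illustration only, not the engine's implementation: the function name allocate_quotas and its inputs are hypothetical, and the sketch assumes the first check method, in which the remaining quota is checked each time a document is retrieved.

```python
def allocate_quotas(quotas, available, rank_size):
    """Return how many documents each layer contributes.

    quotas    -- quota configured for each layer, in layer order
    available -- matching documents in each layer (hypothetical input)
    rank_size -- global cap on the number of retrieved documents
    """
    retrieved = []
    carry = 0              # unused quota handed down from the previous layer
    remaining = rank_size  # slots still available under rank_size
    for quota, matches in zip(quotas, available):
        # A layer may use its own quota plus the carried-over quota,
        # but never more than what rank_size still allows.
        effective = min(quota + carry, remaining)
        taken = min(matches, effective)
        retrieved.append(taken)
        carry = effective - taken
        remaining -= taken
    return retrieved


# rank_size=10 with quotas 5 and 7: the second layer is capped at 5.
print(allocate_quotas([5, 7], [100, 100], 10))
# A second layer with quota 0 still receives the unused quota of the first.
print(allocate_quotas([5000, 0], [1200, 9000], 10000))
```

In the second call, the first layer finds only 1,200 of its 5,000 documents, so the remaining 3,800 are passed to the second layer even though its own quota is 0.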

Specify different query clauses to perform queries in different ranges

You can use different search queries or posting lists to query data in each layer. Separate the clauses with semicolons (;). If the number of layers is greater than the number of clauses, OpenSearch Retrieval Engine Edition applies the last clause to the remaining layers.
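The pairing of clauses and layers can be sketched as follows. The helper assign_clauses is a hypothetical name used for illustration. The text does not say what happens when there are more clauses than layers, so this sketch simply ignores the extra clauses.

```python
def assign_clauses(query, layer_count):
    """Pair each layer, in order, with the query clause it uses."""
    clauses = [part.strip() for part in query.split(";") if part.strip()]
    if not clauses:
        raise ValueError("query must contain at least one clause")
    last = len(clauses) - 1
    # Layers beyond the last clause reuse the last clause.
    return [clauses[min(index, last)] for index in range(layer_count)]


# Two clauses, three layers: the third layer reuses "A OR B".
print(assign_clauses("A AND B;A OR B", 3))
```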

Examples

This section describes how to combine the hierarchical query feature and other query features provided by OpenSearch Retrieval Engine Edition in specific scenarios.

Sort documents during index building and then retrieve data from the specified range

If the offline sorting feature is enabled, OpenSearch Retrieval Engine Edition sorts documents based on the sorting field that you specify. For example, if you specify the site_id field as the sorting field, the documents on the same site occupy a contiguous range in the index list. When you query documents on a site, you can specify the range to which the documents on the site belong to accelerate the scanning process. If you want to use iphone as a search query to query data from Site 1 and Site 7, you can use the following clause:

{
  "layer" : [
    {
      "range" : { 
        "fields" : [
          {
            "field" : "site_id",
            "values" : [1,7]
          }
        ]
      },
      "quota" : 5000
    }
  ]
}

If the number of results that are retrieved from Site 1 and Site 7 does not reach the specified quota and you want to query the iphone keyword from Site 5 and Site 10, you can use the following clause:

{
  "layer" : [
    {
      "range" : { 
        "fields" : [
          {
            "field" : "site_id",
            "values" : [1,7]
          }
        ]
      },
      "quota" : 5000
    },
    {
      "range" : { 
        "fields" : [
          {
            "field" : "site_id",
            "values" : [5,10]
          }
        ]
      },
      "quota" : 0
    }
  ]
}

The second layer is scanned only when the number of results that are retrieved from the first layer does not reach the specified quota. For the second layer, you can set the quota parameter to 0 or leave it unspecified. You can also specify a quota for both layers based on your business requirements, as shown in the following sample code:

{
  "layer" : [
    {
      "range" : { 
        "fields" : [
          {
            "field" : "site_id",
            "values" : [1,7]
          }
        ]
      },
      "quota" : 4000
    },
    {
      "range" : { 
        "fields" : [
          {
            "field" : "site_id",
            "values" : [5,10]
          }
        ]
      },
      "quota" : 1000
    }
  ]
}
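If you generate such clauses programmatically, a small builder keeps the layers consistent. The following is a minimal sketch that uses only the Python standard library. The helper site_layer is a hypothetical name, and the field names and quotas are copied from the sample above.

```python
import json


def site_layer(site_ids, quota=0):
    """Build one layer that scans the documents of the given site_id values."""
    return {
        "range": {"fields": [{"field": "site_id", "values": list(site_ids)}]},
        "quota": quota,
    }


# First layer: Site 1 and Site 7. Second layer: Site 5 and Site 10.
clause = {"layer": [site_layer([1, 7], 4000), site_layer([5, 10], 1000)]}
print(json.dumps(clause, indent=2))
```

Omitting the quota argument produces a layer with quota 0, which matches the fallback layer in the earlier sample.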

If documents are sorted based on multiple dimensions, you can specify ranges based on multiple dimensions. For example, suppose that documents are sorted based on the values of the site_id field and documents on the same site are sorted based on the static stability scores of web pages. If you want to retrieve the web pages whose static stability scores are greater than 100, you can use the following clause:

{
  "layer" : [
    {
      "range" : { 
        "fields" : [
          {
            "field" : "site_id",
            "values" : [1,7]
          },
          {
            "field" : "static_score",
            "values" : "[100,]"
          }
        ]
      },
      "quota" : 4000
    }
  ]
}

Note: Enclose the range of static stability scores in double quotation marks (" ").

Query data based on multiple query modes

In some scenarios, you want a query to return enough results, but not so many that query performance is compromised. For example, you want to query data based on multiple keywords. In the following table, A and B specify two search queries.

| Query mode | Number of retrieved results | Performance |
| --- | --- | --- |
| A AND B | Small number of results | High |
| A OR B | Large number of results | Low |
| A RANK B or B RANK A | Medium number of results | Medium |

As the preceding table shows, different query modes return different numbers of results and provide different levels of query performance. In most cases, you want to obtain sufficient results without compromising query performance, and no single fixed query mode can guarantee both. To resolve this issue, you can specify multiple query modes in one query statement.

{
  "query": "A OR B;A RANK B;A AND B",
  "layer" : [
    {
      "quota" : 1000
    },
    {
      "quota" : 1000
    },
    {
      "quota" : 1000
    }
  ]
}

This method helps you obtain a sufficient number of results without compromising query performance. It can also improve the degree of matching between the results and the search queries. When a query matches a large number of documents, the A AND B query mode retrieves results that match both A and B. When a query matches a small number of documents, the A OR B query mode retrieves sufficient results.
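The relationship between the query modes in the preceding table can be illustrated with a small set model. The posting lists below are made-up data. The sketch also assumes that A RANK B retrieves the documents that match A while B only influences scoring, which this section does not state.

```python
# Hypothetical posting lists: the docids that contain each term.
docs_a = {1, 2, 3, 4, 5, 6}
docs_b = {4, 5, 6, 7, 8, 9, 10}

matches = {
    "A AND B": docs_a & docs_b,   # both terms are required
    "A RANK B": set(docs_a),      # assumption: only A is required
    "A OR B": docs_a | docs_b,    # either term is enough
}

for mode, docids in matches.items():
    print(f"{mode}: {len(docids)} results")
```

Under that assumption, the result sets are nested: every A AND B match is also an A RANK B match, and every A RANK B match is also an A OR B match.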

Query data with better timeliness

In specific scenarios, you need to obtain data with better timeliness. In this case, OpenSearch Retrieval Engine Edition queries the real-time unsorted data first, and then queries the sorted full data and incremental data. You can also use the %percent keyword to instruct the search engine to query the sorted documents that are assigned low ranks before the search engine scans the sorted documents that are assigned high ranks.

For example, you can use iphone as a search query to query real-time data first. If insufficient results are returned, you can instruct OpenSearch Retrieval Engine Edition to query the last 50% of sorted documents before it queries the first 50% of sorted documents. You can use the following clause:

{
  "layer" : [
    {
      "range" : { 
        "index_type" : "%unsorted"
      },
      "quota" : 5000
    },
    {
      "range" : { 
        "index_type" : "%sorted",
        "fields" : [
          {
            "field" : "service_id",
            "values" : [1,3]
          }
        ],
        "percent" : "[50,100)"
      },
      "quota" : 0
    },
    {
      "range" : { 
        "index_type" : "%sorted",
        "fields" : [
          {
            "field" : "service_id",
            "values" : [1,3]
          }
        ],
        "percent" : "[0,50)"
      },
      "quota" : 0
    }
  ]
}

Note: You can use the %percent keyword to specify multiple ranges in the [Value 1,Value 2) format.
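To make the [Value 1,Value 2) notation concrete, the following sketch maps a percent range onto a half-open docid interval. The function name, the integer rounding, and the idea that the percentage is applied to one contiguous docid range are illustrative assumptions, not documented engine behavior.

```python
def percent_to_docids(begin, end, percent):
    """Map a percent range such as "[50,100)" onto the docid range [begin, end)."""
    text = percent.strip()
    if not (text.startswith("[") and text.endswith(")")):
        raise ValueError("expected the form [low,high)")
    low, high = (float(part) for part in text[1:-1].split(","))
    if not 0 <= low <= high <= 100:
        raise ValueError("bounds must satisfy 0 <= low <= high <= 100")
    size = end - begin
    # The result is half-open, like the percent range itself.
    return begin + int(size * low / 100), begin + int(size * high / 100)


# With 1000 sorted documents, the last 50% is scanned before the first 50%.
print(percent_to_docids(0, 1000, "[50,100)"))
print(percent_to_docids(0, 1000, "[0,50)"))
```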

Usage notes

  • A layer clause is optional.