OpenSearch: aggs clause

Last Updated: Feb 28, 2024

Overview

A single search query can retrieve tens of thousands of documents. If you need only statistics about the retrieved documents rather than the documents themselves, you can use an aggs clause to obtain the statistics.

Syntax

{
  "aggs" : [
    {
      "group_key": "field",
      "agg_fun" : ["func1", "func2"],
      "agg_filter" : "filter_expression",
      "agg_range" : [number1, number2],
      "max_group" : number,
      "order_by":"count"
    } 
  ]
}
  • group_key: required. The field by which the statistics are grouped. The field that you specify must be an attribute field of the INTEGER or STRING type.

  • agg_fun: required. The following built-in functions are supported: count(), sum(id), max(id), min(id), and distinct_count(id). You can use count() to calculate the number of documents, use sum(id) to obtain the sum of the values of the id field, use max(id) to obtain the maximum value of the id field, use min(id) to obtain the minimum value of the id field, and use distinct_count(id) to calculate the number of distinct values of the id field. You can specify multiple functions in an aggs clause to obtain statistics.

  • agg_filter: optional. The conditions based on which documents are filtered. You can specify logical expressions as filter conditions. For more information, see the filter clause.

  • agg_range: optional. The range of values over which statistics are collected. Specify this parameter if you want to obtain information about data distribution. You can specify only one range in an aggs clause, for example, a range of values between number1 and number2, or a range of values greater than number2. The range bounds cannot be values of the STRING type.

  • max_group: optional. The maximum number of groups to return. Default value: 1000.

  • order_by: optional. The order in which the statistical results are sorted. Set the value to count. If you do not specify this parameter, the statistical results are sorted in lexicographic order of the values of the field specified by the group_key parameter.
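
The parameter rules above can be sketched as a small client-side helper that assembles an aggs clause before it is sent. This is a minimal sketch, not part of any OpenSearch SDK: the function name build_agg and its validation messages are hypothetical, and the checks cover only the constraints stated in the parameter list.

```python
import json
from numbers import Real


def build_agg(group_key, agg_fun, agg_filter=None, agg_range=None,
              max_group=None, order_by=None):
    """Assemble one entry of the "aggs" array (hypothetical helper)."""
    if not group_key:
        raise ValueError("group_key is required")
    if not agg_fun:
        raise ValueError("agg_fun requires at least one function")
    agg = {"group_key": group_key, "agg_fun": list(agg_fun)}
    if agg_filter is not None:
        agg["agg_filter"] = agg_filter
    if agg_range is not None:
        low, high = agg_range  # exactly one range: [number1, number2]
        for bound in (low, high):
            # Range bounds must be numbers; STRING values are not allowed.
            if isinstance(bound, bool) or not isinstance(bound, Real):
                raise ValueError("agg_range values must be numbers")
        agg["agg_range"] = [low, high]
    if max_group is not None:
        agg["max_group"] = int(max_group)
    if order_by is not None:
        if order_by != "count":
            raise ValueError('order_by must be "count"')
        agg["order_by"] = order_by
    return agg


clause = {"aggs": [build_agg("group_id", ["sum(price)", "count()"],
                             agg_filter="price > 100", max_group=100)]}
print(json.dumps(clause, indent=2))
```

Optional parameters that are not passed are omitted from the clause, so the defaults described above (for example, max_group of 1000) still apply.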

Examples:

  • Simple statistics

    {
      "aggs" : [
        {
          "group_key": "group_id",
          "agg_fun" : ["sum(price)"]
        }
      ]
    }
    
    Sample statistical results:
    {
      result: {
        facet: [
          {
            key: "group_id",
            items: [
              {
                value: 43,
                sum: 81
              },
              {
                value: 63,
                sum: 91
              }
            ]
          }
        ]
      }
    }
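
    The sample results above can be flattened into a lookup table on the client side. The following is a minimal sketch that assumes the response has already been parsed into a Python dict with the same structure as the sample. The function name facet_to_dict is hypothetical.

```python
def facet_to_dict(response):
    """Map each group_key to {group value: {function name: result}}."""
    stats = {}
    for facet in response["result"]["facet"]:
        groups = {}
        for item in facet["items"]:
            # "value" is the group value; the other fields are function results.
            groups[item["value"]] = {
                name: result for name, result in item.items() if name != "value"
            }
        stats[facet["key"]] = groups
    return stats


# Same data as the sample statistical results.
sample = {
    "result": {
        "facet": [
            {
                "key": "group_id",
                "items": [
                    {"value": 43, "sum": 81},
                    {"value": 63, "sum": 91},
                ],
            }
        ]
    }
}

stats = facet_to_dict(sample)
print(stats["group_id"][43]["sum"])  # prints 81
```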

  • Statistics from multiple dimensions

    {
      "aggs" : [
        {
          "group_key": "company_id",
          "agg_fun" : ["sum(id)", "max(id)", "min(id)"]
        }
      ]
    }

  • Statistics based on multiple fields

    {
      "aggs" : [
        {
          "group_key": "group_id",
          "agg_fun" : ["sum(price)"]
        },
        {
          "group_key": "company_id",
          "agg_fun" : ["count()"]
        }
      ]
    }

  • Statistics based on conditions

    # Collect statistics only on the documents whose price field value is greater than 100.
    {
      "aggs" : [
        {
          "group_key": "group_id",
          "agg_fun" : ["sum(price)"],
          "agg_filter" : "price > 100"
        }
      ]
    }
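
    To make the effect of agg_filter concrete, the following sketch emulates the clause above in plain Python over a handful of hypothetical documents. It only illustrates the semantics (filter, then group, then sum) and says nothing about how the engine computes the result. The function name sum_by_group and the sample data are assumptions.

```python
from collections import defaultdict

# Hypothetical documents; only the fields used by the clause are shown.
docs = [
    {"group_id": 43, "price": 150},
    {"group_id": 43, "price": 80},
    {"group_id": 63, "price": 120},
    {"group_id": 63, "price": 300},
]


def sum_by_group(docs, group_key, sum_field, keep):
    """Sum sum_field per group_key value over the documents that pass keep."""
    totals = defaultdict(int)
    for doc in docs:
        if keep(doc):  # plays the role of agg_filter
            totals[doc[group_key]] += doc[sum_field]
    return dict(totals)


# Equivalent of group_key=group_id, agg_fun=sum(price), agg_filter="price > 100".
result = sum_by_group(docs, "group_id", "price", lambda d: d["price"] > 100)
print(result)  # prints {43: 150, 63: 420}
```

    The document with a price of 80 is excluded before grouping, so it does not contribute to the sum for group 43.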

  • Semi-exact statistics

    {
      "aggs" : [
        {
          "group_key": "company_id",
          "agg_fun" : ["distinct_count(brand)"]
        }
      ]
    }
    If you include the distinct_count function in an aggs clause, the semi-exact statistics feature is enabled. This feature uses the HyperLogLog (HLL) algorithm to estimate the number of distinct values. In most cases, the accuracy of the results that are obtained by using the semi-exact statistics feature is higher than 99%.
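
    To show why HLL results are approximate, here is a textbook HyperLogLog sketch in Python. It is not the engine's implementation: the register count (p = 12, that is, 4,096 registers), the SHA-1 based hash, and the function name hll_distinct_count are assumptions chosen for illustration.

```python
import hashlib
import math


def hll_distinct_count(values, p=12):
    """Estimate the number of distinct values with 2**p registers (p >= 7)."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash
        index = h >> (64 - p)                       # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)            # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1     # position of the leftmost 1 bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)                # bias correction, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)              # small-range (linear counting) correction
    return raw


# 10,000 distinct values, each repeated three times.
brands = [f"brand-{i}" for i in range(10_000)]
estimate = hll_distinct_count(brands * 3)
print(round(estimate))
```

    Repeated values never change a register, so duplicates do not affect the estimate. The estimate is close to, but usually not exactly, the true distinct count, and memory use is fixed by the number of registers rather than by the number of values.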

Usage notes

    • The fields that you specify in an aggs clause must be the attribute fields that you specify in the schema.json file.

    • The results of an aggs clause are returned to the facet node on the Searcher worker. The results include the values of the functions that you specified in the agg_fun parameter, such as sum() and count().

    • To obtain the data on the facet node, specify fulljson as the value of the format parameter in a config clause.

    • The system can return accurate statistics on up to 100,000 documents. If the number of documents that match the specified conditions in a partition exceeds 100,000, the statistics that are returned may be inaccurate due to the limits on engine performance. You can modify the limit on the maximum number of documents in the cluster configurations.

    • If the max_group parameter is set to a value greater than 10000, a large amount of memory resources may be consumed on the Query Result Searcher (QRS) worker, and an out of memory (OOM) error may occur.
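
The two limits in the usage notes can be checked on the client side before a query is sent. The following is a minimal sketch under stated assumptions: the function name aggs_warnings and the matched_per_partition argument are hypothetical, and the thresholds are the values quoted above (the document limit can be changed in the cluster configurations, so adjust the constant accordingly).

```python
ACCURATE_DOC_LIMIT = 100_000   # documents per partition, from the usage notes
MAX_GROUP_WARN = 10_000        # max_group threshold, from the usage notes
DEFAULT_MAX_GROUP = 1000       # documented default of max_group


def aggs_warnings(clause, matched_per_partition=None):
    """Return human-readable warnings for an aggs clause (hypothetical helper)."""
    warnings = []
    for agg in clause.get("aggs", []):
        max_group = agg.get("max_group", DEFAULT_MAX_GROUP)
        if max_group > MAX_GROUP_WARN:
            warnings.append(
                f"{agg['group_key']}: max_group={max_group} may cause an OOM "
                "error on the QRS worker"
            )
    if matched_per_partition is not None and matched_per_partition > ACCURATE_DOC_LIMIT:
        warnings.append(
            "statistics may be inaccurate: more than "
            f"{ACCURATE_DOC_LIMIT} matching documents in a partition"
        )
    return warnings


clause = {"aggs": [{"group_key": "company_id", "agg_fun": ["count()"],
                    "max_group": 20000}]}
for message in aggs_warnings(clause, matched_per_partition=250_000):
    print(message)
```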