This topic explains how to use Tablestore's aggregation feature for data analysis. The aggregation feature allows you to calculate minimum, maximum, sum, average, and count values, along with distinct counts and percentile statistics. It also enables grouping by field value, range, geographical location, and filter conditions, along with querying by histogram and date histogram. Additionally, you can query rows resulting from each group's aggregation operation and perform nested queries. Multiple aggregation features can be combined to fulfill complex query requirements.
Process
The complete aggregation execution process is illustrated below.
Aggregation occurs after server-side "query" completion. The server aggregates all documents hit during the "query" phase according to the query request, making aggregation requests more complex than non-aggregation ones.
Feature list
Aggregation supports functions akin to SQL, including MIN()
, MAX()
, SUM()
, AVG()
, COUNT()
, COUNT(DISTINCT)
, ANY_VALUE()
, and GROUP BY
. It also supports percentile statistics, field value grouping, multi-field grouping, range grouping, geographical location grouping, filter condition grouping, histogram queries, date histogram queries, and nested queries. For a detailed feature list, refer to the table below.
Feature | Description |
Minimum value | Returns the minimum value of a field, similar to SQL's MIN function. |
Maximum value | Returns the maximum value of a field, similar to SQL's MAX function. |
Sum | Returns the sum of a numeric field, similar to SQL's SUM function. |
Average value | Returns the average of all values for a numeric field, similar to SQL's AVG function. |
Count | Returns the total number of values for a specific field or the total number of rows in a search index, similar to SQL's count. |
Distinct count | Returns the number of distinct values for a specific field, similar to SQL's count(distinct). |
Percentile statistics | Percentile statistics are used to analyze the distribution of a dataset across various percentiles, such as p25, p50, p90, and p99, which is particularly useful during routine system operations and maintenance to assess the response time distribution of each request. |
Group by field value | Groups query results based on the values of a field. Identical field values are grouped together. The value of each group and the number of members in each group are returned. Note If the group contains an excessively large number of values, the calculated number may be different from the actual number. |
Group by multiple fields | Groups query results based on multiple fields. You can use tokens to perform paging. |
Group by range | Groups query results based on the value ranges of a field, with field values within a specified range being grouped together and the number of values in each range being returned. |
Group by geographical location | Groups query results based on distances from geographical locations to a central point, with query results within a specified distance range being grouped together and the number of items in each range being returned. |
Group by filter | Filters query results and groups them based on each filter to obtain the number of matching results, with results being returned in the order the filters are specified. |
Query by histogram | Groups query results based on specific data intervals, with field values within the same range being grouped together and the value range of each group along with the number of values in each group being returned. |
Query by date histogram | Groups query results based on specific date intervals. Field values that are within the same range are grouped together. The value range of each group and the number of values in each group are returned. |
Query the rows that are obtained from the results of an aggregation operation in each group | After grouping query results, you can query the rows in each group, similar to using ANY_VALUE(field) in MySQL. |
Nesting | GroupBy supports nesting, allowing for sub-aggregation operations within a group. |
Multiple aggregations | You can perform multiple aggregation operations. Note If you perform multiple complex aggregation operations at the same time, a long period of time may be required. |
Related API operations
The API operation for aggregation is Search.
Prerequisites
You can use the Tablestore console, CLI, or SDKs for aggregation operations. Ensure the following preparations are complete before proceeding:
If you use the Tablestore console or CLI, the available aggregation features may vary.
You have an Alibaba Cloud account or a RAM user that has permissions to perform operations on Tablestore. For information about how to grant Tablestore operation permissions to a RAM user, see Use a RAM policy to grant permissions to a RAM user.
If you want to use Tablestore SDKs or the Tablestore CLI to perform a query, an AccessKey pair is created for your Alibaba Cloud account or RAM user. For more information, see Create an AccessKey pair.
A data table is created. For more information, see Operations on tables.
A search index is created for the data table. For more information, see Create a search index.
If you want to use Tablestore SDKs to perform a query, an OTSClient instance is initialized. For more information, see Initialize a Tablestore client.
If you want to use the Tablestore CLI to perform a query, the Tablestore CLI is downloaded and started, and information about the instance that you want to access and the data table are configured. For more information, see Download the Tablestore CLI, Start the Tablestore CLI and configure access information, and Operations on data tables.
Use the console
Go to the Indexes tab.
Log on to the Tablestore console.
In the top navigation bar, select a resource group and a region.
On the Overview page, click the name of the instance that you want to manage or click Manage Instance in the Actions column of the instance.
On the Tables tab of the Instance Details tab, click the name of the data table or click Indexes in the Actions column of the data table.
On the Indexes tab, find the search index that you want to use to query data and click Manage Data in the Actions column.
-
In the Query Data dialog box, you can query data.
By default, the system returns all attribute columns. To return specific attribute columns, turn off All Columns and specify the attribute columns that you want to return. Separate multiple attribute columns with commas (,).
NoteBy default, the system returns all primary key columns of the data table.
Select the And, Or, or Not logical operator based on your business requirements.
If you select the And logical operator, data that meets the query conditions is returned. If you select the Or operator and specify a single query condition, data that meets the query condition is returned. If you select the Or logical operator and specify multiple query conditions, data that meets one of the query conditions is returned. If you select the Not logical operator, data that does not meet the query conditions is returned.
-
Select an index field, then click Add , and specify the query type and value for that field.
Repeat this step to add multiple index field query conditions.
By default, the sorting feature is disabled. If you want to sort the query results based on specific fields, turn on Sort and specify the fields based on which you want to sort the query results and the sorting order.
-
By default, the statistics feature is turned off. To collect statistics for a particular field, enable the Statistics option and define the field, type of statistics, name, and default value.
You can add multiple fields for statistics at once. For Statistics Type, you can choose Minimum, Maximum, Sum, Average, Count, or Distinct count. The Default parameter applies when the index field is absent in any row.
-
Click OK .
The data and statistics that match the query conditions appear on the Index Management tab.
Use the CLI
When using the CLI to run the search
command for data queries, configure Aggregations for aggregation. Supported types include min, max, sum, avg, and count. For more information, see Search index.
-
Use the
search
command to query and analyze data in the search_index and return all indexed columns.search -n search_index --return_all_indexed
-
Enter the query conditions as prompted. Here is a sample code for reference:
Enter the query conditions as prompted. For example, to query rows where the gid column value is less than 10 or exactly 77, and to average the gid column values, use the following sample code:
{ "Offset": -1, "Limit": 10, "Collapse": null, "Sort": null, "GetTotalCount": true, "Token": null, "Query": { "Name": "BoolQuery", "Query": { "MinimumShouldMatch": null, "MustQueries": null, "MustNotQueries": null, "FilterQueries": null, "ShouldQueries": [{ "Name": "RangeQuery", "Query": { "FieldName": "gid", "From": null, "To": 10, "IncludeLower": false, "IncludeUpper": false } }, { "Name": "TermQuery", "Query": { "FieldName": "gid", "Term": 77 } }] } }, "Aggregations": [{ "Name": "avg", "Aggregation": { "AggName": "agg1", "Field": "gid", "MissingValue": null } }] }
Use the SDKs
You can use the Java SDK, Go SDK, Python SDK, Node.js SDK, .NET SDK, and PHP SDK to perform aggregation operations. This section describes how to use the Java SDK to perform aggregation operations.
The field types supported by the aggregation feature are the field types supported by search indexes. For more information about the field types of search indexes and the mapping between the field types of search indexes and the field types of data tables, see Data types.
Minimum value
Returns the minimum value of a field, similar to SQL's MIN function.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long, Double, and Date types are supported.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
Examples
/** * The price of each product is listed in the product table. Query the minimum price of the products that are produced in Zhejiang. * SQL statement: SELECT min(column_price) FROM product where place_of_production="Zhejiang"; */ public void min(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.term("place_of_production", "Zhejiang")) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addAggregation(AggregationBuilders.min("min_agg_1", "column_price").missing(100)) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsMinAggregationResult("min_agg_1").getValue()); } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); TermQuery query = new TermQuery(); query.setTerm(ColumnValue.fromString("Zhejiang")); query.setFieldName("place_of_production"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.term("place_of_production", "Zhejiang").build(); searchQuery.setQuery(query); searchQuery.setLimit(0); MinAggregation aggregation = new MinAggregation(); aggregation.setAggName("min_agg_1"); aggregation.setFieldName("column_price"); aggregation.setMissing(ColumnValue.fromLong(100)); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // MinAggregation aggregation2 = AggregationBuilders.min("min_agg_1", "column_price").missing(100).build(); List<Aggregation> aggregationList = new ArrayList<Aggregation>(); aggregationList.add(aggregation); searchQuery.setAggregationList(aggregationList); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsMinAggregationResult("min_agg_1").getValue()); } }
Maximum value
Returns the maximum value of a field, similar to SQL's MAX function.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long, Double, and Date types are supported.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
Examples
/** * The price of each product is listed in the product table. Query the maximum price of the products that are produced in Zhejiang. * SQL statement: SELECT max(column_price) FROM product where place_of_production = "Zhejiang". */ public void max(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.term("place_of_production", "Zhejiang")) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addAggregation(AggregationBuilders.max("max_agg_1", "column_price").missing(0)) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsMaxAggregationResult("max_agg_1").getValue()); } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); TermQuery query = new TermQuery(); query.setTerm(ColumnValue.fromString("Zhejiang")); query.setFieldName("place_of_production"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.term("place_of_production", "Zhejiang").build(); searchQuery.setQuery(query); searchQuery.setLimit(0); MaxAggregation aggregation = new MaxAggregation(); aggregation.setAggName("max_agg_1"); aggregation.setFieldName("column_price"); aggregation.setMissing(ColumnValue.fromLong(100)); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // MaxAggregation aggregation2 = AggregationBuilders.max("max_agg_1", "column_price").missing(100).build(); List<Aggregation> aggregationList = new ArrayList<Aggregation>(); aggregationList.add(aggregation); searchQuery.setAggregationList(aggregationList); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsMaxAggregationResult("max_agg_1").getValue()); } }
Sum
Returns the sum of a numeric field, similar to SQL's SUM function.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long and Double types are supported.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
Examples
/** * The price of each product is listed in the product table. Query the sum of the price of the products that are produced in Zhejiang. * SQL statement: SELECT sum(column_price) FROM product where place_of_production = "Zhejiang". */ public void sum(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.term("place_of_production", "Zhejiang")) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addAggregation(AggregationBuilders.sum("sum_agg_1", "column_number").missing(10)) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsSumAggregationResult("sum_agg_1").getValue()); } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); TermQuery query = new TermQuery(); query.setTerm(ColumnValue.fromString("Zhejiang")); query.setFieldName("place_of_production"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.term("place_of_production", "Zhejiang").build(); searchQuery.setQuery(query); searchQuery.setLimit(0); SumAggregation aggregation = new SumAggregation(); aggregation.setAggName("sum_agg_1"); aggregation.setFieldName("column_number"); aggregation.setMissing(ColumnValue.fromLong(100)); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // SumAggregation aggregation2 = AggregationBuilders.sum("sum_agg_1", "column_number").missing(10).build(); List<Aggregation> aggregationList = new ArrayList<Aggregation>(); aggregationList.add(aggregation); searchQuery.setAggregationList(aggregationList); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsSumAggregationResult("sum_agg_1").getValue()); } }
Average value
Returns the average of all values for a numeric field, similar to SQL's AVG function.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long, Double, and Date types are supported.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
Examples
/** * The sales volume of each product is listed in the product table. Query the average price of the products that are produced in Zhejiang. * SQL statement: SELECT avg(column_price) FROM product where place_of_production = "Zhejiang". */ public void avg(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.term("place_of_production", "Zhejiang")) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addAggregation(AggregationBuilders.avg("avg_agg_1", "column_price")) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsAvgAggregationResult("avg_agg_1").getValue()); } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); TermQuery query = new TermQuery(); query.setTerm(ColumnValue.fromString("Zhejiang")); query.setFieldName("place_of_production"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.term("place_of_production", "Zhejiang").build(); searchQuery.setQuery(query); searchQuery.setLimit(0); AvgAggregation aggregation = new AvgAggregation(); aggregation.setAggName("avg_agg_1"); aggregation.setFieldName("column_price"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // AvgAggregation aggregation2 = AggregationBuilders.avg("avg_agg_1", "column_price").build(); List<Aggregation> aggregationList = new ArrayList<Aggregation>(); aggregationList.add(aggregation); searchQuery.setAggregationList(aggregationList); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsAvgAggregationResult("avg_agg_1").getValue()); } }
Count
Returns the total number of values for a specific field or the total number of rows in a search index, similar to SQL's COUNT function.
You can use the following methods to query the total number of rows in a search index or the total number of rows that meet the query conditions:
Use the count feature of aggregation. Set the count parameter to * in the request.
Use the query feature to obtain the number of rows that meet the query conditions. Set the setGetTotalCount parameter to true in the query. Use MatchAllQuery to obtain the total number of rows in a search index.
You can use the name of a column as the value of the count expression to query the number of rows that contain the column in a search index. This method is suitable for scenarios that involve sparse columns.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the following types are supported: Long, Double, Boolean, Keyword, Geo_point, and Date.
Examples
/** * Penalty records of merchants are recorded in the merchant table. You can query the number of merchants who are located in Zhejiang and for whom penalty records exist. If no penalty records exist for a merchant, the field that corresponds to penalty records also does not exist for the merchant. * SQL statement: SELECT count(column_history) FROM product where place_of_production = "Zhejiang". */ public void count(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.term("place_of_production", "Zhejiang")) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addAggregation(AggregationBuilders.count("count_agg_1", "column_history")) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsCountAggregationResult("count_agg_1").getValue()); } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); TermQuery query = new TermQuery(); query.setTerm(ColumnValue.fromString("Zhejiang")); query.setFieldName("place_of_production"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.term("place_of_production", "Zhejiang").build(); searchQuery.setQuery(query); searchQuery.setLimit(0); CountAggregation aggregation = new CountAggregation(); aggregation.setAggName("count_agg_1"); aggregation.setFieldName("column_history"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // CountAggregation aggregation2 = AggregationBuilders.count("count_agg_1", "column_history").build(); List<Aggregation> aggregationList = new ArrayList<Aggregation>(); aggregationList.add(aggregation); searchQuery.setAggregationList(aggregationList); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsCountAggregationResult("count_agg_1").getValue()); } }
Distinct count
Returns the count of unique values for a specific field, akin to the SQL count(distinct)
function.
The number of distinct values is an approximate number.
If the total number of rows before the distinct count feature is used is less than 10,000, the calculated result is close to the exact value.
If the total number of rows before the distinct count feature is used is greater than or equal to 100 million, the error rate is approximately 2%.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the following types are supported: Long, Double, Boolean, Keyword, Geo_point, and Date.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
Examples
/** * Query the number of distinct provinces from which the products are produced. * SQL statement: SELECT count(distinct column_place) FROM product. */ public void distinctCount(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addAggregation(AggregationBuilders.distinctCount("dis_count_agg_1", "column_place")) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsDistinctCountAggregationResult("dis_count_agg_1").getValue()); } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); MatchAllQuery query = new MatchAllQuery(); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.matchAll().build(); searchQuery.setQuery(query); searchQuery.setLimit(0); DistinctCountAggregation aggregation = new DistinctCountAggregation(); aggregation.setAggName("dis_count_agg_1"); aggregation.setFieldName("column_place"); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // DistinctCountAggregation aggregation2 = AggregationBuilders.distinctCount("dis_count_agg_1", "column_place").build(); List<Aggregation> aggregationList = new ArrayList<Aggregation>(); aggregationList.add(aggregation); searchQuery.setAggregationList(aggregationList); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. System.out.println(resp.getAggregationResults().getAsDistinctCountAggregationResult("dis_count_agg_1").getValue()); } }
Percentile statistics
Percentile statistics are used to analyze the distribution of a dataset across various percentiles, such as p25, p50, p90, and p99, which is particularly useful during routine system operations and maintenance to assess the response time distribution of each request.
To improve the accuracy of the results, we recommend that you specify extreme percentile values such as p1 and p99. If you use extreme percentile values instead of other values such as p50, the returned results are more accurate.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long, Double, and Date types are supported.
percentiles
The percentiles such as p50, p90, and p99. You can specify one or more percentiles.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
Examples
/** * Analyze the distribution of the response time of each request that is sent to the system by using percentiles. */ public void percentilesAgg(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("indexName") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addAggregation(AggregationBuilders.percentiles("percentilesAgg", "latency") .percentiles(Arrays.asList(25.0d, 50.0d, 99.0d)) .missing(1.0)) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the results. PercentilesAggregationResult percentilesAggregationResult = resp.getAggregationResults().getAsPercentilesAggregationResult("percentilesAgg"); for (PercentilesAggregationItem item : percentilesAggregationResult.getPercentilesAggregationItems()) { System.out.println("key: " + item.getKey() + " value:" + item.getValue().asDouble()); } } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); MatchAllQuery query = new MatchAllQuery(); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.matchAll().build(); searchQuery.setQuery(query); searchQuery.setLimit(0); PercentilesAggregation aggregation = new PercentilesAggregation(); aggregation.setAggName("percentilesAgg"); aggregation.setFieldName("latency"); aggregation.setPercentiles(Arrays.asList(25.0d, 50.0d, 99.0d)); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // AggregationBuilders.percentiles("percentilesAgg", "latency").percentiles(Arrays.asList(25.0d, 50.0d, 99.0d)).missing(1.0).build(); List<Aggregation> aggregationList = new ArrayList<Aggregation>(); aggregationList.add(aggregation); searchQuery.setAggregationList(aggregationList); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the results. PercentilesAggregationResult percentilesAggregationResult = resp.getAggregationResults().getAsPercentilesAggregationResult("percentilesAgg"); for (PercentilesAggregationItem item : percentilesAggregationResult.getPercentilesAggregationItems()) { System.out.println("key: " + item.getKey() + " value:" + item.getValue().asDouble()); } } }
Group by field value
Groups query results based on the values of a field, combining identical field values into a single group and returning the value of each group along with the number of members in each group.
If the group contains an excessively large number of values, the calculated number may differ from the actual number.
To group query results based on multiple fields, you can use the groupBy parameter in nested mode or the GroupByComposite parameter. For more information on the differences between these two methods, see Appendix: Comparison of different methods for grouping by multiple fields.
Parameters
Parameter
Description
groupByName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long, Double, Boolean, Keyword, and Date types are supported.
groupBySorter
The sorting rules for groups. By default, groups are sorted based on the number of items in the groups in descending order. If you configure multiple sorting rules, the groups are sorted based on the order in which the rules are configured. The following sorting rules are supported:
Sort by value in alphabetical order.
Sort by value in reverse alphabetical order
Sort by row count in ascending order
Sort by row count in descending order
Sort by the values that are obtained from sub-aggregation results in ascending order
Sort the values that are obtained from sub-aggregation results in descending order
size
The number of groups that you want to return. Default value: 10. Maximum value: 2000. If the number of groups exceeds 2,000, only the first 2,000 groups are returned.
subAggregation and subGroupBy
The sub-aggregation operation. You can perform the sub-aggregation operation based on the grouping results.
Scenario
Query the number of products in each category, and the maximum and minimum product prices in each category.
Method
Group query results by product category to obtain the number of products in each category. Then, perform two sub-aggregation operations to obtain the maximum and minimum product prices in each category.
Examples:
Fruits: 5. The maximum price is CNY 15. The minimum price is CNY 3.
Toiletries: 10. The maximum price is CNY 98. The minimum price is CNY 1.
Electronic devices: 3. The maximum price is CNY 8,699. The minimum price is CNY 2,300.
Other products: 15. The maximum price is CNY 1,000. The minimum price is CNY 80.
Examples
Group by single field
/** * Query the number of products, and the maximum and minimum product prices in each category. * Example of returned results: Fruits: 5. The maximum price is CNY 15, and the minimum price is CNY 3. Toiletries: 10. The maximum price is CNY 98, and the minimum price is CNY 1. Electronic devices: 3. The maximum price is CNY 8,699, and the minimum price is CNY 2,300. * Other products: 15. The maximum price is CNY 1,000, and the minimum price is CNY 80. */ public void groupByField(SyncClient client) { // Use builder to create a query statement. { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) // If you want to obtain only the aggregation results instead of specific data, you can set limit to 0 to improve query performance. .addGroupBy(GroupByBuilders .groupByField("name1", "column_type") .addSubAggregation(AggregationBuilders.min("subName1", "column_price")) .addSubAggregation(AggregationBuilders.max("subName2", "column_price")) ) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. for (GroupByFieldResultItem item : resp.getGroupByResults().getAsGroupByFieldResult("name1").getGroupByFieldResultItems()) { // Display values. System.out.println(item.getKey()); // Display the number of rows. System.out.println(item.getRowCount()); // Display the minimum prices. System.out.println(item.getSubAggregationResults().getAsMinAggregationResult("subName1").getValue()); // Display the maximum prices. System.out.println(item.getSubAggregationResults().getAsMaxAggregationResult("subName2").getValue()); } } // Create a query statement without using builder. { SearchRequest searchRequest = new SearchRequest(); searchRequest.setTableName("<TABLE_NAME>"); searchRequest.setIndexName("<SEARCH_INDEX_NAME>"); SearchQuery searchQuery = new SearchQuery(); MatchAllQuery query = new MatchAllQuery(); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses TermQuery to create a query statement. // Query query2 = QueryBuilders.matchAll().build(); searchQuery.setQuery(query); searchQuery.setLimit(0); GroupByField groupByField = new GroupByField(); groupByField.setGroupByName("name1"); groupByField.setFieldName("column_type"); // Configure sub-aggregation operations. MinAggregation minAggregation = AggregationBuilders.min("subName1", "column_price").build(); MaxAggregation maxAggregation = AggregationBuilders.max("subName2", "column_price").build(); groupByField.setSubAggregations(Arrays.asList(minAggregation, maxAggregation)); // In the following comment, builder is used to create a query statement. The method that uses builder to create a query statement has the same effect as the method that uses aggregation to create a query statement. // GroupByBuilders.groupByField("name1", "column_type") // .addSubAggregation(AggregationBuilders.min("subName1", "column_price")) // .addSubAggregation(AggregationBuilders.max("subName2", "column_price").build()); List<GroupBy> groupByList = new ArrayList<GroupBy>(); groupByList.add(groupByField); searchQuery.setGroupByList(groupByList); searchRequest.setSearchQuery(searchQuery); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. for (GroupByFieldResultItem item : resp.getGroupByResults().getAsGroupByFieldResult("name1").getGroupByFieldResultItems()) { // Display values. System.out.println(item.getKey()); // Display the number of rows. System.out.println(item.getRowCount()); // Display the minimum prices. System.out.println(item.getSubAggregationResults().getAsMinAggregationResult("subName1").getValue()); // Display the maximum prices. System.out.println(item.getSubAggregationResults().getAsMaxAggregationResult("subName2").getValue()); } } }
Group by multiple fields in nested mode
/** * Example of grouping query results by multiple fields in nested mode. * In a search index, you can use two groupBy fields in nested mode to achieve the same effect as using multiple groupBy fields in an SQL statement. * SQL statement: select a,d, sum(b),sum(c) from user group by a,d. */ public void GroupByMultiField() { SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .returnAllColumns(true) // You can set returnAllColumns to false and specify a value for addColumesToGet to obtain better query performance. //.addColumnsToGet("col_1","col_2") .searchQuery(SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) // Specify query conditions. Query conditions can be used in the same manner as the WHERE clause in SQL. You can use QueryBuilders.bool() to perform nested queries. .addGroupBy( GroupByBuilders .groupByField("unique name_1", "field_a") .size(20) .addSubGroupBy( GroupByBuilders .groupByField("unique name_2", "field_d") .size(20) .addSubAggregation(AggregationBuilders.sum("unique name_3", "field_b")) .addSubAggregation(AggregationBuilders.sum("unique name_4", "field_c")) ) ) .build()) .build(); SearchResponse response = client.search(searchRequest); // Query rows that meet the specified conditions. List<Row> rows = response.getRows(); // Obtain the aggregation results. GroupByFieldResult groupByFieldResult1 = response.getGroupByResults().getAsGroupByFieldResult("unique name_1"); for (GroupByFieldResultItem resultItem : groupByFieldResult1.getGroupByFieldResultItems()) { System.out.println("field_a key:" + resultItem.getKey() + " Count:" + resultItem.getRowCount()); // Obtain the sub-aggregation results. GroupByFieldResult subGroupByResult = resultItem.getSubGroupByResults().getAsGroupByFieldResult("unique name_2"); for (GroupByFieldResultItem item : subGroupByResult.getGroupByFieldResultItems()) { System.out.println("field_a " + resultItem.getKey() + " field_d key:" + item.getKey() + " Count: " + item.getRowCount()); double sumOf_field_b = item.getSubAggregationResults().getAsSumAggregationResult("unique name_3").getValue(); double sumOf_field_c = item.getSubAggregationResults().getAsSumAggregationResult("unique name_4").getValue(); System.out.println("sumOf_field_b:" + sumOf_field_b); System.out.println("sumOf_field_c:" + sumOf_field_c); } } }
Sort groups for aggregation
/** * Example of configuring sorting rules for groups obtained from aggregation results. * Method: Configure sorting rules by specifying GroupBySorter. If you configure multiple sorting rules, the groups are sorted based on the order in which the rules are configured. GroupBySorter supports sorting in ascending or descending order. * By default, the groups are sorted by row count in descending order (GroupBySorter.rowCountSortInDesc()). */ public void groupByFieldWithSort(SyncClient client) { // Create a query statement. SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) .addGroupBy(GroupByBuilders .groupByField("name1", "column_type") //.addGroupBySorter(GroupBySorter.subAggSortInAsc("subName1")) // Sort the groups in ascending order based on the values that are obtained from sub-aggregation results. .addGroupBySorter(GroupBySorter.groupKeySortInAsc()) // Sort the groups in ascending order based on the values that are obtained from aggregation results. //.addGroupBySorter(GroupBySorter.rowCountSortInDesc()) // Sort the groups in descending order based on the number of rows that are obtained from the aggregation results in each group. .size(20) .addSubAggregation(AggregationBuilders.min("subName1", "column_price")) .addSubAggregation(AggregationBuilders.max("subName2", "column_price")) ) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); }
Group by multiple fields
Groups query results based on multiple fields, allowing for pagination using tokens.
Only Tablestore SDKs for Java and Go support this feature.
To group query results by multiple fields, you can use the groupBy parameter in nested mode or the GroupByComposite parameter. For more information on the differences between these two methods, see Appendix: Comparison of different methods for grouping by multiple fields.
Parameters
Parameter
Description
groupByName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
sources
The fields by which you want to group query results. You can group query results by up to 32 fields and perform aggregation operations on the resulting groups. The following group types are supported:
ImportantFor group types specified in the sources parameter, you can set the groupBySorter parameter only to groupKeySort.
By default, the groups are sorted in descending order.
If a field value does not exist, the system returns NULL.
For the GroupByField type, you can configure only the following parameters: groupByName, fieldName, and groupBySorter.
For the GroupByHistogram type, you can configure only the following parameters: groupByName, fieldName, interval, and groupBySorter.
For the GroupByDateHistogram type, you can configure only the following parameters: groupByName, fieldName, interval, timeZone, and groupBySorter.
nextToken
The pagination token that is used in the next request to retrieve a new page of groups. You can obtain the value of the nextToken parameter from the output of the GroupByCompositeResult operation. The nextToken parameter allows you to obtain all resulting groups.
size
The number of groups per page. If the number of groups that meet the query conditions exceeds the value of the size parameter, you can use the nextToken parameter to obtain the groups on the next page.
ImportantIf you want to limit the number of groups to return, you cannot configure the Size and suggestedSize parameters at the same time. In most cases, we recommend that you configure the size parameter.
In scenarios in which you want to interconnect Tablestore with a high-throughput computing engine such as Apache Spark or PrestoSQL, we recommend that you configure the suggestedSize parameter.
suggestedSize
The expected number of groups per page. You can specify a value greater than the maximum number of groups allowed at the server side or -1. The server side returns the number of groups based on its capacity. This parameter is suitable for scenarios in which you interconnect Tablestore with a high-throughput computing engine, such as Apache Spark or PrestoSQL.
If you set this parameter to a value that is greater than the maximum number of groups allowed at the server side, the system adjusts the value to the maximum number of groups allowed at the server side. The actual number of groups that are returned equals
min(suggestedSize, maximum number of groups allowed at the server side, total number of groups)
.subAggregation and subGroupBy
The sub-aggregation operation. You can perform the sub-aggregation operation based on the grouping results.
ImportantThe GroupByComposite parameter is not supported in subGroupBy.
Examples
/** * Group and aggregate query results: Group the query results and perform the aggregation operation on the resulting groups based on the parameters such as groupbyField, groupByHistogram, and groupByDataHistogram that are passed to the SourceGroupBy parameter. * Return the aggregated results of multiple fields in a flat structure. */ public static void groupByComposite(SyncClient client) { GroupByComposite.Builder compositeBuilder = GroupByBuilders .groupByComposite("groupByComposite") .size(2000) .addSources(GroupByBuilders.groupByField("groupByField", "Col_Keyword") .addGroupBySorter(GroupBySorter.groupKeySortInAsc()).build()) .addSources(GroupByBuilders.groupByHistogram("groupByHistogram", "Col_Long") .addGroupBySorter(GroupBySorter.groupKeySortInAsc()) .interval(5) .build()) .addSources(GroupByBuilders.groupByDateHistogram("groupByDateHistogram", "Col_Date") .addGroupBySorter(GroupBySorter.groupKeySortInAsc()) .interval(5, DateTimeUnit.DAY) .timeZone("+05:30").build()); SearchRequest searchRequest = SearchRequest.newBuilder() .indexName("<SEARCH_INDEX_NAME>") .tableName("<TABLE_NAME>") .returnAllColumnsFromIndex(true) .searchQuery(SearchQuery.newBuilder() .addGroupBy(compositeBuilder.build()) .build()) .build(); SearchResponse resp = client.search(searchRequest); while (true) { if (resp.getGroupByResults() == null || resp.getGroupByResults().getResultAsMap().size() == 0) { System.out.println("groupByComposite Result is null or empty"); return; } GroupByCompositeResult result = resp.getGroupByResults().getAsGroupByCompositeResult("groupByComposite"); if(!result.getSourceNames().isEmpty()) { for (String sourceGroupByNames: result.getSourceNames()) { System.out.printf("%s\t", sourceGroupByNames); } System.out.print("rowCount\t\n"); } for (GroupByCompositeResultItem item : result.getGroupByCompositeResultItems()) { for (String value : item.getKeys()) { String val = value == null ? "NULL" : value; System.out.printf("%s\t", val); } System.out.printf("%d\t\n", item.getRowCount()); } // Use the token to page through groups. if (result.getNextToken() != null) { searchRequest.setSearchQuery( SearchQuery.newBuilder() .addGroupBy(compositeBuilder.nextToken(result.getNextToken()).build()) .build() ); resp = client.search(searchRequest); } else { break; } } }
Group by range
Groups query results based on the value ranges of a field, with field values within a specified range being grouped together and the number of values in each range being returned.
Parameters
Parameter
Description
groupByName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long and Double types are supported.
range[double_from, double_to)
The value ranges for grouping.
You can set double_from to Double.MIN_VALUE to specify the minimum value and set double_to to Double.MAX_VALUE to specify the maximum value.
subAggregation and subGroupBy
The sub-aggregation operation. You can perform the sub-aggregation operation based on the grouping results.
For example, after you group query results by sales volume and by province, you can obtain the province that has the largest proportion of sales volume in a specified range. You need to specify GroupByField in GroupByRange to perform this query.
Examples
/** * Group sales volumes based on ranges [0, 1000), [1000, 5000), and [5000, Double.MAX_VALUE) to obtain the sales volume in each range. */ public void groupByRange(SyncClient client) { // Create a query statement. SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) .addGroupBy(GroupByBuilders .groupByRange("name1", "column_number") .addRange(0, 1000) .addRange(1000, 5000) .addRange(5000, Double.MAX_VALUE) ) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. for (GroupByRangeResultItem item : resp.getGroupByResults().getAsGroupByRangeResult("name1").getGroupByRangeResultItems()) { // Display the number of rows. System.out.println(item.getRowCount()); } }
Group by geographical location
Groups query results based on distances from geographical locations to a central point, with query results within a specified distance range being grouped together and the number of items in each range being returned.
Parameters
Parameter
Description
groupByName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Geo_point type is supported.
origin(double lat, double lon)
The longitude and latitude of the central point.
double lat specifies the latitude of the central point. double lon specifies the longitude of the central point.
range[double_from, double_to)
The distance ranges that are used for grouping. Unit: meters.
You can set double_from to Double.MIN_VALUE to specify the minimum value and set double_to to Double.MAX_VALUE to specify the maximum value.
subAggregation and subGroupBy
The sub-aggregation operation. You can perform the sub-aggregation operation based on the grouping results.
Examples
/** * Group users based on geographical locations to a Wanda Plaza to obtain the number of users in each distance range. The distance ranges are [0, 1000), [1000, 5000), and [5000, Double.MAX_VALUE). Unit: meters. */ public void groupByGeoDistance(SyncClient client) { // Create a query statement. SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) .addGroupBy(GroupByBuilders .groupByGeoDistance("name1", "column_geo_point") .origin(3.1, 6.5) .addRange(0, 1000) .addRange(1000, 5000) .addRange(5000, Double.MAX_VALUE) ) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results. for (GroupByGeoDistanceResultItem item : resp.getGroupByResults().getAsGroupByGeoDistanceResult("name1").getGroupByGeoDistanceResultItems()) { // Display the number of rows. System.out.println(item.getRowCount()); } }
Group by filter
Filters query results and groups them based on each filter to obtain the number of matching results, with results being returned in the order the filters are specified.
Parameters
Parameter
Description
groupByName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
filter
The filters that can be used for the query. Results are returned in the order in which the filters are specified.
subAggregation and subGroupBy
The sub-aggregation operation. You can perform the sub-aggregation operation based on the grouping results.
Examples
/** * Specify the following filters to obtain the number of items that match each filter: The sales volume exceeds 100, the place of origin is Zhejiang, and the description contains Hangzhou. */ public void groupByFilter(SyncClient client) { // Create a query statement. SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) .addGroupBy(GroupByBuilders .groupByFilter("name1") .addFilter(QueryBuilders.range("number").greaterThanOrEqual(100)) .addFilter(QueryBuilders.term("place","Zhejiang")) .addFilter(QueryBuilders.match("text","Hangzhou")) ) .build()) .build(); // Execute the query statement. SearchResponse resp = client.search(searchRequest); // Obtain the aggregation results based on the order of filters. for (GroupByFilterResultItem item : resp.getGroupByResults().getAsGroupByFilterResult("name1").getGroupByFilterResultItems()) { // Display the number of rows. System.out.println(item.getRowCount()); } }
Query by histogram
Groups query results based on specific data intervals, with field values within the same range being grouped together and the value range of each group along with the number of values in each group being returned.
Parameters
Parameter
Description
groupByName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Long and Double types are supported.
interval
The data interval that is used to obtain aggregation results.
fieldRange[min,max]
The range that is used together with the interval parameter to limit the number of groups. The number of groups that is determined by using the
(fieldRange.max-fieldRange.min)/interval
formula cannot exceed 2,000.minDocCount
The minimum number of rows. If the number of rows in a group is less than the minimum number of rows, the aggregation results for the group are not returned.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
Example
/** * Collect statistics on the distribution of users by age group. */ public static void groupByHistogram(SyncClient client) { // Create a query statement. SearchRequest searchRequest = SearchRequest.newBuilder() .tableName("<TABLE_NAME>") .indexName("<SEARCH_INDEX_NAME>") .searchQuery( SearchQuery.newBuilder() .addGroupBy(GroupByBuilders .groupByHistogram("groupByHistogram", "age") .interval(10) .minDocCount(0L) .addFieldRange(0, 99)) .build()) .build(); // Execute the query statement. SearchResponse resp = ots.search(searchRequest); // Obtain the results that are returned when the aggregation operation is performed. GroupByHistogramResult results = resp.getGroupByResults().getAsGroupByHistogramResult("groupByHistogram"); for (GroupByHistogramItem item : results.getGroupByHistogramItems()) { System.out.println("key:" + item.getKey().asLong() + " value:" + item.getValue()); } }
Query by date histogram
Groups query results based on specific date intervals, with field values within the same range being grouped together and the value range of each group along with the number of values in each group being returned.
Tablestore SDK for Java V5.16.1 or later supports this feature. For more information about the historical iterations of the Java SDK, see Java SDK historical iterations.
Parameters
Parameter
Description
groupByName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
fieldName
The name of the field that is used to perform the aggregation operation. Only the Date type is supported.
ImportantThe Date type of search indexes is supported by Tablestore SDK for Java V5.13.9 and later.
interval
The data interval that is used to obtain aggregation results.
fieldRange[min,max]
The range that is used together with the interval parameter to limit the number of groups. The number of groups that is determined by using the
(fieldRange.max-fieldRange.min)/interval
formula cannot exceed 2,000.minDocCount
The minimum number of rows. If the number of rows in a group is less than the minimum number of rows, the aggregation results for the group are not returned.
missing
The default value for the field on which the aggregation operation is performed, which is applied to rows in which the field value is empty.
If you do not specify a value for the missing parameter, the row is ignored.
If you specify a value for the missing parameter, the value of this parameter is used as the field value of the row.
timeZone
The time zone in the
+hh:mm
or-hh:mm
format, such as+08:00
or-09:00
. This parameter is required only when the field is of the Date type.If this parameter is not specified for the field of the Date type, the offset of N hours may occur in the aggregation results. You can specify the timeZone parameter to prevent this issue.
Examples
/** * Collect statistics on the daily distribution of data in the col_date field from 10:00:00 on May 1, 2017 to 13:00:00 May 21, 2017. */ public static void groupByDateHistogram(SyncClient client) { // Create a query statement. SearchRequest searchRequest = SearchRequest.newBuilder() .returnAllColumns(false) .indexName("<SEARCH_INDEX_NAME>") .tableName("<TABLE_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) .getTotalCount(false) .addGroupBy(GroupByBuilders .groupByDateHistogram("groupByDateHistogram", "col_date") .interval(1, DateTimeUnit.DAY) .minDocCount(1) .missing("2017-05-01 13:01:00") .fieldRange("2017-05-01 10:00", "2017-05-21 13:00:00")) .build()) .build(); // Execute the query statement. SearchResponse resp = ots.search(searchRequest); // Obtain the results that are returned when the aggregation operation is performed. List<GroupByDateHistogramItem> items = resp.getGroupByResults().getAsGroupByDateHistogramResult("groupByDateHistogram").getGroupByDateHistogramItems(); for (GroupByDateHistogramItem item : items) { System.out.printf("millisecondTimestamp:%d, count:%d \n", item.getTimestamp(), item.getRowCount()); } }
Query the rows that are obtained from the results of an aggregation operation in each group
After grouping query results, you can query the rows in each group, similar to using ANY_VALUE(field) in MySQL.
When querying the rows obtained from the results of an aggregation operation in each group, the returned results only contain the primary key information if the search index includes NESTED, GEOPOINT, or ARRAY fields. To retrieve the required field, you must query the data table.
Parameters
Parameter
Description
aggregationName
The unique name of the aggregation operation. You can query the results of a specific aggregation operation based on this name.
limit
The maximum number of rows that can be returned for each group. By default, only one row of data is returned.
sort
The sorting method that is used to sort data in groups.
columnsToGet
The fields that you want to return. Only fields in search indexes are supported. Array, Date, Geopoint, and Nested fields are not supported.
The value of this parameter is the same as the value of the columnsToGet parameter in SearchRequest. You need to only specify a value for the columnsToGet parameter in SearchRequest.
Examples
/** * An activity application form of a school contains fields in which information such as the names of students, classes, head teachers, and class presidents can be specified. You can group students by class to view the application statistics and the property information of each class. * SQL statement: select className, teacher, monitor, COUNT(*) as number from table GROUP BY className. */ public void testTopRows(SyncClient client) { SearchRequest searchRequest = SearchRequest.newBuilder() .indexName("<SEARCH_INDEX_NAME>") .tableName("<TABLE_NAME>") .searchQuery( SearchQuery.newBuilder() .query(QueryBuilders.matchAll()) .limit(0) .addGroupBy(GroupByBuilders.groupByField("groupName", "className") .size(5) // Specify the number of groups that you want to return. For information about the maximum value that you can specify for the number of groups to return, see the description of the number of groups returned by GroupByField in the "Search index limits" topic. .addSubAggregation(AggregationBuilders.topRows("topRowsName") .limit(1) .sort(new Sort(Arrays.asList(new FieldSort("teacher", SortOrder.DESC)))) // Sort rows by teacher in descending order. ) ) .build()) .addColumnsToGet(Arrays.asList("teacher", "monitor")) .build(); SearchResponse resp = client.search(searchRequest); List<GroupByFieldResultItem> items = resp.getGroupByResults().getAsGroupByFieldResult("groupName").getGroupByFieldResultItems(); for (GroupByFieldResultItem item : items) { String className = item.getKey(); long number = item.getRowCount(); List<Row> topRows = item.getSubAggregationResults().getAsTopRowsAggregationResult("topRowsName").getRows(); Row row = topRows.get(0); String teacher = row.getLatestColumn("teacher").getValue().asString(); String monitor = row.getLatestColumn("monitor").getValue().asString(); } }
Nesting
GroupBy supports nesting, allowing for sub-aggregation operations within a group.
Nesting enables sub-aggregation operations within a group. The following examples illustrate two levels of nesting in aggregation operations:
GroupBy + SubGroupBy: Rows are grouped by province and then by city to obtain data for each city within each province.
GroupBy + SubAggregation: Rows are grouped by province and aggregation operations are performed to obtain the maximum value of a metric for each province.
To maintain high performance for complex GroupBy operations, limit the number of nesting levels. For more information, see Search index limits.
public void subGroupBy(SyncClient client) {
// Create a query statement.
SearchRequest searchRequest = SearchRequest.newBuilder()
.indexName("index_name")
.tableName("table_name")
.returnAllColumns(true)
.searchQuery(
SearchQuery.newBuilder()
.query(QueryBuilders.match("textField", "hello"))
.limit(10)
.addAggregation(AggregationBuilders.min("name1", "fieldName1"))
.addAggregation(AggregationBuilders.max("name2", "fieldName2"))
.addGroupBy(GroupByBuilders
.groupByField("name3", "fieldName3")
.addSubAggregation(AggregationBuilders.max("subName1", "fieldName4"))
.addSubAggregation(AggregationBuilders.sum("subName2", "fieldName5"))
.addSubGroupBy(GroupByBuilders
.groupByRange("subName3", "fieldName6")
.addRange(12, 90)
.addRange(100, 900)
))
.build())
.build();
// Execute the query.
SearchResponse resp = client.search(searchRequest);
// Obtain the maximum and minimum values for the first level.
AggregationResults aggResults = resp.getAggregationResults();
System.out.println(aggResults.getAsMinAggregationResult("name1").getValue());
System.out.println(aggResults.getAsMaxAggregationResult("name2").getValue());
// Obtain the GroupByField results of the first level and the results of the aggregations that are nested in GroupByField.
GroupByFieldResult results = resp.getGroupByResults().getAsGroupByFieldResult("name3");
for (GroupByFieldResultItem item : results.getGroupByFieldResultItems()) {
System.out.println("count:" + item.getRowCount());
System.out.println("key:" + item.getKey());
// Obtain the sub-aggregation results.
// Display the maximum value that is obtained from the results of the sub-aggregation operation.
System.out.println(item.getSubAggregationResults().getAsMaxAggregationResult("subName1"));
// Display the sum that is obtained from the results of the sub-aggregation operation.
System.out.println(item.getSubAggregationResults().getAsSumAggregationResult("subName2"));
// Display the GroupByRange values that are obtained from the results of the sub-aggregation operation.
GroupByRangeResult subResults = item.getSubGroupByResults().getAsGroupByRangeResult("subName3");
for (GroupByRangeResultItem subItem : subResults.getGroupByRangeResultItems()) {
System.out.println(String.format("from:%s, to:%s, count:%s", subItem.getFrom(), subItem.getTo(), subItem.getRowCount()));
}
}
}
Multiple aggregations
You can perform multiple aggregation operations concurrently.
Performing multiple complex aggregation operations simultaneously may require an extended period of time.
Combine multiple aggregations
public void multipleAggregation(SyncClient client) {
// Create a query statement.
SearchRequest searchRequest = SearchRequest.newBuilder()
.tableName("<TABLE_NAME>")
.indexName("<SEARCH_INDEX_NAME>")
.searchQuery(
SearchQuery.newBuilder()
.query(QueryBuilders.matchAll())
.limit(0)
.addAggregation(AggregationBuilders.min("name1", "long"))
.addAggregation(AggregationBuilders.sum("name2", "long"))
.addAggregation(AggregationBuilders.distinctCount("name3", "long"))
.build())
.build();
// Execute the query statement.
SearchResponse resp = client.search(searchRequest);
// Obtain the minimum value from the results of the aggregation operation.
System.out.println(resp.getAggregationResults().getAsMinAggregationResult("name1").getValue());
// Obtain the sum from the results of the aggregation operation.
System.out.println(resp.getAggregationResults().getAsSumAggregationResult("name2").getValue());
// Obtain the number of distinct values from the results of the aggregation operation.
System.out.println(resp.getAggregationResults().getAsDistinctCountAggregationResult("name3").getValue());
}
Combine Aggregation and GroupBy
public void multipleGroupBy(SyncClient client) {
// Create a query statement.
SearchRequest searchRequest = SearchRequest.newBuilder()
.tableName("<TABLE_NAME>")
.indexName("<SEARCH_INDEX_NAME>")
.searchQuery(
SearchQuery.newBuilder()
.query(QueryBuilders.matchAll())
.limit(0)
.addAggregation(AggregationBuilders.min("name1", "long"))
.addAggregation(AggregationBuilders.sum("name2", "long"))
.addAggregation(AggregationBuilders.distinctCount("name3", "long"))
.addGroupBy(GroupByBuilders.groupByField("name4", "type"))
.addGroupBy(GroupByBuilders.groupByRange("name5", "long").addRange(1, 15))
.build())
.build();
// Execute the query statement.
SearchResponse resp = client.search(searchRequest);
// Obtain the minimum value from the results of the aggregation operation.
System.out.println(resp.getAggregationResults().getAsMinAggregationResult("name1").getValue());
// Obtain the sum from the results of the aggregation operation.
System.out.println(resp.getAggregationResults().getAsSumAggregationResult("name2").getValue());
// Obtain the number of distinct values from the results of the aggregation operation.
System.out.println(resp.getAggregationResults().getAsDistinctCountAggregationResult("name3").getValue());
// Obtain the values of GroupByField from the results of the aggregation operation.
for (GroupByFieldResultItem item : resp.getGroupByResults().getAsGroupByFieldResult("name4").getGroupByFieldResultItems()) {
// Display the keys.
System.out.println(item.getKey());
// Display the number of rows.
System.out.println(item.getRowCount());
}
// Obtain the values of GroupByRange from the results of the aggregation operation.
for (GroupByRangeResultItem item : resp.getGroupByResults().getAsGroupByRangeResult("name5").getGroupByRangeResultItems()) {
// Display the number of rows.
System.out.println(item.getRowCount());
}
}
Appendix: comparison of different methods for grouping by multiple fields
If you want to group query results by multiple fields, you can use the groupBy parameter in nested mode or use the GroupByComposite parameter. The following table describes the difference between the groupBy parameter in nested mode and the groupByComposite parameter.
Feature | groupBy (nested) | groupByComposite |
size | 2000 | 2000 |
Limits on fields | Up to five levels are supported. | Up to 32 levels are supported. |
Pagination | Not supported | Supported by using the nextToken parameter |
Sorting rules for rows in groups |
| In alphabetical order or reverse alphabetical order |
Supports aggregation | Yes | Yes |
Compatibility | For fields of the Date type, the query results are returned in the specified format. | For fields of the DATE type, the query results are returned as timestamp strings. |