Tablestore: Collapse (distinct)

Last Updated:Nov 29, 2024

A query can return many rows that share the same value in a column. In that case, you can use the collapse (distinct) feature to collapse the result set based on that column. Each value of the specified column then appears only once in the query results, which keeps the results diverse.

In most scenarios, you can use the collapse (distinct) feature to obtain the distinct values of the collapsed column. The feature supports only columns of the INTEGER, FLOATING-POINT, or KEYWORD type, and it returns only the top 50,000 results.
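
The following Java sketch shows these semantics on an in-memory list of rows. It is a local illustration under stated assumptions and does not describe how Tablestore implements collapsing. It assumes that the first row encountered for each distinct value of the collapse column is kept. The class name CollapseModel, the Map-based row representation, and the sample columns order_id and product_name are hypothetical. Java 9 or later is required for Map.of and List.of.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CollapseModel {
    // Keeps the first row for each distinct value of fieldName and drops the rest,
    // preserving the input order. In this sketch, rows that lack the column all share
    // the key null; how Tablestore treats such rows is not modeled.
    static List<Map<String, Object>> collapse(List<Map<String, Object>> rows, String fieldName) {
        Set<Object> seen = new HashSet<>();
        List<Map<String, Object>> collapsed = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // Set.add returns false when the value has already been seen.
            if (seen.add(row.get(fieldName))) {
                collapsed.add(row);
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        // Four matching rows, three of which share product_name = "phone".
        List<Map<String, Object>> rows = List.of(
                Map.<String, Object>of("order_id", "o1", "product_name", "phone"),
                Map.<String, Object>of("order_id", "o2", "product_name", "phone"),
                Map.<String, Object>of("order_id", "o3", "product_name", "laptop"),
                Map.<String, Object>of("order_id", "o4", "product_name", "phone"));
        for (Map<String, Object> row : collapse(rows, "product_name")) {
            System.out.println(row.get("order_id") + " " + row.get("product_name"));
        }
        // Prints:
        // o1 phone
        // o3 laptop
    }
}
```

The sketch uses a HashSet so that each membership check takes constant time. Which row Tablestore actually keeps for each value may depend on the sort order of the query, and the sketch does not model that.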

Usage notes

  • If you use the collapse (distinct) feature, you can perform pagination only by specifying the offset and limit parameters.

  • If you aggregate and collapse a result set at the same time, the result set is aggregated before it is collapsed.

  • When you collapse query results, the number of results that can be returned is determined by the sum of the offset and limit values. A maximum of 50,000 results can be returned.

  • The total row count in the response is the number of rows that meet the query conditions before the result set is collapsed. The number of distinct values after collapsing cannot be queried.
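
The following Java sketch shows how these notes interact. It is a local model and does not call Tablestore. It rests on several assumptions:

  • The 50,000 cap applies to offset + limit, which is one reading of the notes above.
  • The reported total count is the number of matching rows before collapsing.
  • Collapsed values keep their first-seen order.

CollapsePagingModel, Page, and search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class CollapsePagingModel {
    // Cap from the usage notes, read here as a limit on offset + limit.
    static final int MAX_COLLAPSE_RESULTS = 50_000;

    static final class Page {
        final long totalCount;     // rows that matched before collapsing
        final List<String> values; // distinct values inside the offset/limit window

        Page(long totalCount, List<String> values) {
            this.totalCount = totalCount;
            this.values = values;
        }
    }

    // matchedValues holds the collapse-column value of every matching row, in result order.
    // The argument check mimics the documented cap; the real service error is not modeled.
    static Page search(List<String> matchedValues, int offset, int limit) {
        if (offset < 0 || limit < 0 || (long) offset + limit > MAX_COLLAPSE_RESULTS) {
            throw new IllegalArgumentException(
                    "offset and limit must be non-negative and offset + limit must not exceed "
                            + MAX_COLLAPSE_RESULTS);
        }
        // Collapse: keep each value once, in first-seen order.
        List<String> distinct = new ArrayList<>(new LinkedHashSet<>(matchedValues));
        int from = Math.min(offset, distinct.size());
        int to = Math.min(offset + limit, distinct.size());
        // The total count ignores collapsing; the number of distinct values is not exposed.
        return new Page(matchedValues.size(), distinct.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> matched = List.of("phone", "phone", "laptop", "tablet", "phone");
        Page first = search(matched, 0, 2);
        System.out.println(first.totalCount + " " + first.values);         // 5 [phone, laptop]
        Page second = search(matched, 2, 2);
        System.out.println(second.totalCount + " " + second.values);       // 5 [tablet]
        Page countOnly = search(matched, 0, 0);
        System.out.println(countOnly.totalCount + " " + countOnly.values); // 5 []
    }
}
```

In this model the total count stays at 5 on every page, even though only three distinct values exist. This mirrors the note that the number of distinct values cannot be queried. A request such as offset 49,990 with limit 20 is rejected because the sum exceeds the cap.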

API operation

You can configure the collapse parameter in the Search operation to implement the collapse (distinct) feature.

Parameters

  • query: the query type. You can use any query type.

  • collapse: collapses the result set based on the column specified by the fieldName field.

    fieldName: the name of the column based on which the result set is collapsed. Only columns of the INTEGER, FLOATING-POINT, or KEYWORD type are supported.

  • offset: the position from which the current query starts.

  • limit: the maximum number of rows that the current query returns.

    To query only the number of rows that meet the query conditions without returning row data, set limit to 0.

  • getTotalCount: specifies whether to return the total number of rows that meet the query conditions. Default value: false, which means that the total number is not returned.

    Setting this parameter to true degrades query performance.

  • tableName: the name of the data table.

  • indexName: the name of the search index.

  • columnsToGet: specifies which columns of each row that meets the query conditions are returned. You can specify the returnAll and columns fields for this parameter.

    The default value of returnAll is false, which means that not all columns are returned. In this case, use the columns field to specify the columns to return. If you do not specify any columns, only the primary key columns are returned.

    If you set returnAll to true, all columns are returned.

Methods

You can collapse the results of a data query by using the Tablestore CLI or Tablestore SDKs. Before you use the collapse (distinct) feature, make sure that the following prerequisites are met:

Use the Tablestore CLI

You can run the search command in the Tablestore CLI to query data by using search indexes and configure the Collapse parameter to implement the collapse (distinct) feature. For more information, see Search index.

  1. Run the search command to query data by using the search index named search_index and return all indexed columns of each row that meets the query conditions.

    search -n search_index --return_all_indexed
  2. When prompted, enter the query conditions. The following example matches rows whose user_id is "00002" and collapses the results based on the product_name column:

    {
        "Offset": -1,
        "Limit": 10,
        "Collapse": {
            "FieldName": "product_name"
        },
        "Sort": null,
        "GetTotalCount": true,
        "Token": null,
        "Query": {
            "Name": "MatchQuery",
            "Query": {
                "FieldName": "user_id",
                "Text": "00002",
                "MinimumShouldMatch": 1
            }
        }
    }

Use Tablestore SDKs

You can collapse the results of a data query by using Tablestore SDK for Java, Tablestore SDK for Go, Tablestore SDK for Python, Tablestore SDK for Node.js, Tablestore SDK for .NET, and Tablestore SDK for PHP. In the following example, Tablestore SDK for Java is used to describe how to implement the collapse (distinct) feature.

The following sample code queries the rows in which the value of the user_id column matches "00002" and collapses the result set based on the value of the product_name column:

import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.model.search.Collapse;
import com.alicloud.openservices.tablestore.model.search.SearchQuery;
import com.alicloud.openservices.tablestore.model.search.SearchRequest;
import com.alicloud.openservices.tablestore.model.search.SearchResponse;
import com.alicloud.openservices.tablestore.model.search.query.MatchQuery;
import java.util.Arrays; // Used by the commented-out columnsToGet example.

private static void useCollapse(SyncClient client) {
    // Specify the query conditions: match rows whose user_id column matches "00002".
    SearchQuery searchQuery = new SearchQuery();
    MatchQuery matchQuery = new MatchQuery();
    matchQuery.setFieldName("user_id");
    matchQuery.setText("00002");
    searchQuery.setQuery(matchQuery);

    // Collapse the result set based on the product_name column.
    Collapse collapse = new Collapse("product_name");
    searchQuery.setCollapse(collapse);

    //searchQuery.setOffset(1000); // The position from which the current query starts.
    searchQuery.setLimit(20); // The maximum number of rows to return.
    //searchQuery.setGetTotalCount(true); // Return the total number of rows that meet the query conditions.

    // Specify the name of the data table and the name of the search index.
    SearchRequest searchRequest = new SearchRequest("<TABLE_NAME>", "<SEARCH_INDEX_NAME>", searchQuery);
    // Use columnsToGet to return specific columns or all columns. If it is not set, only the primary key columns are returned.
    //SearchRequest.ColumnsToGet columnsToGet = new SearchRequest.ColumnsToGet();
    //columnsToGet.setReturnAll(true); // Return all columns.
    //columnsToGet.setColumns(Arrays.asList("ColName1", "ColName2")); // Return only the specified columns.
    //searchRequest.setColumnsToGet(columnsToGet);

    SearchResponse response = client.search(searchRequest);
    //System.out.println(response.getTotalCount()); // Rows that matched before collapsing (requires setGetTotalCount(true)).
    //System.out.println(response.getRows().size()); // Number of rows returned after collapsing.
    System.out.println(response.getRows()); // Each returned row has a distinct product_name value.
}

Billing rules

If you use search indexes to query data, read throughput is consumed. For more information, see Billable items of search indexes.

The collapse (distinct) feature does not affect the existing billing rules.

References

  • When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, fuzzy query, Boolean query, geo query, nested query, KNN vector query, and exists query. You can select query methods based on your business requirements to query data from multiple dimensions.

    You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Perform sorting and paging.

  • If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.

  • If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.