Use approximate match to query data

You can use match query to query data in a table based on approximate matches. Tablestore tokenizes the values in the TEXT field and the keyword you use to perform a match query based on the analyzer type that you specify. This way, Tablestore can perform a match query based on the tokens. We recommend that you use match phrase query for TEXT fields for which fuzzy tokenization is used to ensure high performance in fuzzy queries.

Scenarios

You can use match query to search for data that contains a specific phrase. You can use match phrase query together with tokenization to perform full-text search in specific scenarios, such as big data analysis, content search, knowledge management, social media analysis, log analysis, intelligent Q&A systems, and compliance review. For example, you can quickly query a list of products whose title, description, or tags contain the specified keyword on e-commerce platforms and quickly locate error messages or suspicious operations in logs.

Features

You can use match query to query data in a table based on approximate matches. For example, the value in the title column of the TEXT type is "Hangzhou West Lake Scenic Area" in a row and single-word tokenization is used. If you set the keyword to "Lake Scenic" for match query, the row meets the query conditions.

When you use match query, you must specify the name of the field that you want to query and the keyword. If at least one of the tokens in a row matches the tokens in the keyword, the row meets the query conditions.
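
The matching rule described above can be sketched in a few lines of Java. This is only an illustrative model, not the Tablestore implementation: the class name MatchQuerySketch and the tokenize helper are hypothetical, and the analyzer is approximated by lowercasing the input and splitting it on characters that are not letters or digits.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class MatchQuerySketch {
    // Hypothetical stand-in for an analyzer: lowercase the input and split it
    // on anything that is not a letter or digit. Real analyzers may differ.
    static Set<String> tokenize(String value) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A row matches when at least one keyword token also appears in the field value.
    static boolean matches(String fieldValue, String keyword) {
        Set<String> fieldTokens = tokenize(fieldValue);
        for (String token : tokenize(keyword)) {
            if (fieldTokens.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The title from the example above shares tokens with "Lake Scenic".
        System.out.println(matches("Hangzhou West Lake Scenic Area", "Lake Scenic"));
        // No token of "Beijing" appears in the title.
        System.out.println(matches("Hangzhou West Lake Scenic Area", "Beijing"));
    }
}
```

In this model, the keyword "Lake Scenic" matches the title because the tokens "lake" and "scenic" both appear in it, although a single shared token would already be enough.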

When you perform a match query, you can specify the minimum number of matched tokens contained in the value of the field, the weight that you want to assign to the field that you want to query to calculate the BM25-based keyword relevance score, the columns that you want to return, whether to return the total number of rows that meet the query conditions, and the method that is used to sort the returned rows.

API operation

You can call the Search or ParallelScan operation and set the query type to MatchQuery to perform a match query.

Parameters

Parameter	Description
fieldName	The name of the field that you want to match. Match query applies to TEXT fields.
text	The keyword that is used to match the value of the field when you perform a match query. If the field that you want to match is a TEXT field, the keyword is tokenized into multiple tokens based on the analyzer type that you specify when you create the search index. If you do not specify the analyzer type when you create the search index, single-word tokenization is performed. For example, if the field that you want to match is a TEXT field, you set the analyzer type to single-word tokenization, and you use "this is" as a search keyword, you can obtain query results such as "..., this is tablestore", "is this tablestore", "tablestore is cool", "this", and "is".
query	The type of the query. Set the query parameter to matchQuery.
offset	The position from which the current query starts.
limit	The maximum number of rows that you want the current query to return. To query only the number of rows that meet the query conditions without specific data, set the limit parameter to 0.
minimumShouldMatch	The minimum number of matched tokens contained in the value of the field. A row is returned only if the value of the field specified by the fieldName parameter in the row contains at least the minimum number of matched tokens. Note: You must use the minimumShouldMatch parameter together with the OR logical operator.
operator	The logical operator. By default, OR is used as the logical operator, which specifies that a row meets the query conditions when the column value contains at least the minimum number of matched tokens. If you set the operator parameter to AND, the row meets the query conditions only if the column value contains all tokens of the keyword.
getTotalCount	Specifies whether to return the total number of rows that meet the query conditions. The default value of this parameter is false, which specifies that the total number of rows that meet the query conditions is not returned. If you set this parameter to true, the query performance is compromised.
weight	The weight that is assigned to the queried field when the BM25-based keyword relevance score is calculated. This parameter is used in full-text search scenarios. A higher weight results in a higher BM25-based keyword relevance score for the field. The value of this parameter is a positive floating point number. This parameter affects only the relevance scores of the query results, not the number of rows that are returned.
tableName	The name of the data table.
indexName	The name of the search index.
columnsToGet	Specifies whether to return all columns of each row that meets the query conditions. You can specify the returnAll and columns fields for the columnsToGet parameter. The default value of the returnAll field is false, which specifies that not all columns are returned. In this case, you can use the columns field to specify the columns that you want to return. If you do not specify the columns that you want to return, only the primary key columns are returned. If you set the returnAll field to true, all columns are returned.
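
The interaction between the operator and minimumShouldMatch parameters can be modeled as follows. This is a minimal sketch under stated assumptions, not the service's implementation: the class MatchOperatorSketch, its Operator enum, and the whitespace-based tokenize helper are hypothetical names introduced for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class MatchOperatorSketch {
    enum Operator { OR, AND }

    // Hypothetical analyzer stand-in: lowercase and split on whitespace.
    static Set<String> tokenize(String value) {
        Set<String> tokens = new HashSet<>(
                Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        tokens.remove(""); // An empty input yields one empty token; drop it.
        return tokens;
    }

    static boolean matches(String fieldValue, String text,
                           Operator operator, int minimumShouldMatch) {
        Set<String> fieldTokens = tokenize(fieldValue);
        Set<String> keywordTokens = tokenize(text);
        if (keywordTokens.isEmpty()) {
            return false; // Assumption: an empty keyword matches nothing.
        }
        int matched = 0;
        for (String token : keywordTokens) {
            if (fieldTokens.contains(token)) {
                matched++;
            }
        }
        if (operator == Operator.AND) {
            // AND: every keyword token must appear in the field value.
            return matched == keywordTokens.size();
        }
        // OR: at least minimumShouldMatch keyword tokens must appear.
        return matched >= minimumShouldMatch;
    }

    public static void main(String[] args) {
        System.out.println(matches("tablestore is cool", "this is", Operator.OR, 1));
        System.out.println(matches("tablestore is cool", "this is", Operator.OR, 2));
        System.out.println(matches("is this tablestore", "this is", Operator.AND, 1));
    }
}
```

In this model, the keyword "this is" matches "tablestore is cool" with OR and a minimum of 1, but not with a minimum of 2 or with AND, because only the token "is" is shared.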

Usage notes

You can use search indexes to sort query results based on the BM25-based keyword relevance score. You cannot specify custom relevance scores to sort query results.
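
To make the effect of a field weight on relevance concrete, the following sketch computes a textbook Okapi BM25 score and multiplies it by the weight. It is an assumption-laden illustration: this document does not specify the exact BM25 variant or constants that Tablestore uses, so the class Bm25WeightSketch, the constants K1 and B, and the whitespace tokenizer are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

class Bm25WeightSketch {
    static final double K1 = 1.2;  // Assumed term-frequency saturation constant.
    static final double B = 0.75;  // Assumed length-normalization constant.

    // Hypothetical analyzer stand-in: lowercase and split on whitespace.
    static List<String> tokenize(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Textbook BM25 score of one field value for a keyword, scaled by the weight.
    static double score(String doc, String query, List<String> corpus, double weight) {
        List<String> docTokens = tokenize(doc);
        double avgLen = corpus.stream()
                .mapToInt(d -> tokenize(d).size()).average().orElse(1.0);
        double total = 0.0;
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            long tf = docTokens.stream().filter(term::equals).count();
            if (tf == 0) {
                continue; // The term does not occur in this field value.
            }
            long df = corpus.stream().filter(d -> tokenize(d).contains(term)).count();
            double idf = Math.log(1.0 + (corpus.size() - df + 0.5) / (df + 0.5));
            double norm = tf + K1 * (1.0 - B + B * docTokens.size() / avgLen);
            total += idf * tf * (K1 + 1.0) / norm;
        }
        return weight * total;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "this is tablestore", "tablestore is cool", "hello world");
        for (String doc : corpus) {
            System.out.println(doc + " -> " + score(doc, "this is", corpus, 1.0));
        }
    }
}
```

Under this model, doubling the weight doubles every score for the field but leaves the set of matching rows unchanged, which is consistent with the description of the weight parameter.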

Methods

You can use the Tablestore console, Tablestore CLI, or Tablestore SDKs to perform a match query. Before you perform a match query, make sure that the following preparations are made:

You have an Alibaba Cloud account or a RAM user that has permissions to perform operations on Tablestore. For information about how to grant Tablestore operation permissions to a RAM user, see Use a RAM policy to grant permissions to a RAM user.
If you want to use Tablestore SDKs or the Tablestore CLI to perform a query, an AccessKey pair is created for your Alibaba Cloud account or RAM user. For more information, see Create an AccessKey pair.
A data table is created. For more information, see Operations on tables.
A search index is created for the data table. For more information, see Create a search index.
If you want to use Tablestore SDKs to perform a query, an OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
If you want to use the Tablestore CLI to perform a query, the Tablestore CLI is downloaded and started, and information about the instance that you want to access and the data table are configured. For more information, see Download the Tablestore CLI, Start the Tablestore CLI and configure access information, and Operations on data tables.

Use the Tablestore console

1. Go to the Indexes tab.
   1. Log on to the Tablestore console.
   2. In the top navigation bar, select a resource group and a region.
   3. On the Overview page, click the name of the instance that you want to manage or click Manage Instance in the Actions column of the instance.
   4. On the Tables tab of the Instance Details tab, click the name of the data table or click Indexes in the Actions column of the data table.
2. On the Indexes tab, find the search index that you want to use to query data and click Manage Data in the Actions column.
3. In the Search dialog box, specify query conditions.
   1. By default, the system returns all attribute columns. To return specific attribute columns, turn off All Columns and specify the attribute columns that you want to return. Separate multiple attribute columns with commas (,).
      Note
      By default, the system returns all primary key columns of the data table.
   2. Select the And, Or, or Not logical operator based on your business requirements.
      If you select the And logical operator, data that meets all of the query conditions is returned. If you select the Or logical operator and specify a single query condition, data that meets the query condition is returned. If you select the Or logical operator and specify multiple query conditions, data that meets at least one of the query conditions is returned. If you select the Not logical operator, data that does not meet the query conditions is returned.
   3. Select a field of the TEXT type and click Add.
   4. Set the Query Type parameter to MatchQuery(MatchQuery) and enter the value that you want to query.
   5. By default, the sorting feature is disabled. If you want to sort the query results based on specific fields, turn on Sort and specify the fields based on which you want to sort the query results and the sorting order.
   6. By default, the aggregation feature is disabled. If you want to collect statistics on a specific field, turn on Collect Statistics, specify the field based on which you want to collect statistics, and then configure the information that is required to collect statistics.
4. Click OK.
   Data that meets the query conditions is displayed in the specified order on the Indexes tab.
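
The way the And, Or, and Not logical operators combine query conditions in the preceding steps can be sketched with plain Java predicates. This is only an illustrative model of the behavior described for the console, not console or SDK code: the class LogicalOperatorSketch and its Logic enum are hypothetical, and Not is assumed to mean that a row meets none of the specified conditions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class LogicalOperatorSketch {
    enum Logic { AND, OR, NOT }

    // Combine several query conditions into a single row filter.
    static <T> Predicate<T> combine(Logic logic, List<Predicate<T>> conditions) {
        Predicate<T> all = row -> conditions.stream().allMatch(c -> c.test(row));
        Predicate<T> any = row -> conditions.stream().anyMatch(c -> c.test(row));
        switch (logic) {
            case AND:
                return all;          // The row meets every condition.
            case OR:
                return any;          // The row meets at least one condition.
            default:
                return any.negate(); // NOT (assumption): the row meets no condition.
        }
    }

    public static void main(String[] args) {
        List<Predicate<String>> conditions = Arrays.asList(
                title -> title.contains("lake"),
                title -> title.contains("hangzhou"));
        String row = "west lake";
        System.out.println(combine(Logic.AND, conditions).test(row));
        System.out.println(combine(Logic.OR, conditions).test(row));
        System.out.println(combine(Logic.NOT, conditions).test(row));
    }
}
```

With a single condition, And and Or behave identically in this model, which mirrors the description of the Or operator in the steps above.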

Use the Tablestore CLI

You can use the Tablestore CLI to run the search command to query data by using search indexes. For more information, see Search index.

Run the search command to use the search_index search index to query data and return all indexed columns of each row that meets the query conditions.
```
search -n search_index --return_all_indexed
```

Enter the query conditions as prompted:

{
    "Offset": -1,
    "Limit": 10,
    "Collapse": null,
    "Sort": null,
    "GetTotalCount": true,
    "Token": null,
    "Query": {
        "Name": "MatchQuery",
        "Query": {
            "FieldName": "col_text",
            "Text": "this is",
            "MinimumShouldMatch": 1
        }
    }
}

Use Tablestore SDKs

You can perform a match query by using the following Tablestore SDKs: Tablestore SDK for Java, Tablestore SDK for Go, Tablestore SDK for Python, Tablestore SDK for Node.js, Tablestore SDK for .NET, and Tablestore SDK for PHP. In this example, Tablestore SDK for Java is used.

The following sample code provides an example on how to query the rows in which the value of the Col_Keyword column matches "hangzhou" in a table:

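import com.alicloud.openservices.tablestore.SyncClient;
import com.alicloud.openservices.tablestore.model.search.SearchQuery;
import com.alicloud.openservices.tablestore.model.search.SearchRequest;
import com.alicloud.openservices.tablestore.model.search.SearchResponse;
import com.alicloud.openservices.tablestore.model.search.query.MatchQuery;
import java.util.Arrays; // Needed only if you uncomment the setColumns line below.

/**
 * Query the rows in which the value of the Col_Keyword column matches "hangzhou" in a table.
 * Tablestore returns the data of up to 20 matching rows. To also return the total number of
 * rows that meet the query conditions, uncomment the setGetTotalCount and getTotalCount lines.
 * @param client An initialized SyncClient instance.
 */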
private static void matchQuery(SyncClient client) {
    SearchQuery searchQuery = new SearchQuery();
    MatchQuery matchQuery = new MatchQuery(); // Set the query type to MatchQuery. 
    matchQuery.setFieldName("Col_Keyword"); // Specify the name of the column that you want to query. 
    matchQuery.setText("hangzhou"); // Specify the keyword that you want to match. 
    searchQuery.setQuery(matchQuery);
    searchQuery.setOffset(0); // Set offset to 0. 
    searchQuery.setLimit(20); // Set limit to 20 to return up to 20 rows. 
    //searchQuery.setGetTotalCount(true); // Specify that the total number of matched rows is returned. 

    SearchRequest searchRequest = new SearchRequest("<TABLE_NAME>", "<SEARCH_INDEX_NAME>", searchQuery);
    // You can configure the columnsToGet parameter to specify the columns to return or specify that all columns are returned. If you do not configure this parameter, only the primary key columns are returned. 
    //SearchRequest.ColumnsToGet columnsToGet = new SearchRequest.ColumnsToGet();
    //columnsToGet.setReturnAll(true); // Specify that all columns are returned. 
    //columnsToGet.setColumns(Arrays.asList("ColName1","ColName2")); // Specify the columns that you want to return. 
    //searchRequest.setColumnsToGet(columnsToGet);

    SearchResponse resp = client.search(searchRequest);
    //System.out.println("TotalCount: " + resp.getTotalCount()); // Print the total number of matched rows, not the number of returned rows. 
    System.out.println("Row: " + resp.getRows());
}

Billing rules

When you use a search index to query data, you are charged for the read throughput that is consumed. For more information, see Billable items of search indexes.

References

When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, fuzzy query, Boolean query, geo query, nested query, KNN vector query, and exists query. You can select query methods based on your business requirements to query data from multiple dimensions.
You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Perform sorting and paging.
You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.