Perform match queries by using Tablestore SDK for Python

A match query retrieves rows from a table based on approximate matches. Tablestore tokenizes both the values in TEXT columns and the keywords that you specify in the query, based on the analyzer type that you configure, and then matches rows token by token. For columns that use fuzzy tokenization, we recommend that you use match phrase query to ensure high performance in fuzzy queries.

Prerequisites

An OTSClient instance is initialized. For more information, see Initialize an OTSClient instance.
A data table is created and data is written to the data table. For more information, see Create a data table and Write data.
A search index is created for the data table. For more information, see Create a search index.

Parameters

Parameter	Description
field_name	The name of the column that you want to query. Match query applies to TEXT columns.
text	The keyword that is used to match the value of the column when you perform a match query. If the column that you want to query is a TEXT column, the keyword is tokenized into multiple tokens based on the analyzer type that you specify when you create the search index. If you do not specify the analyzer type when you create the search index, single-word tokenization is used by default. For example, if the column is a TEXT column that uses single-word tokenization and you use "this is" as the search keyword with the default OR operator, the query results can include values such as "..., this is tablestore", "is this tablestore", "tablestore is cool", "this", and "is".
query	The type of the query. Set this parameter to MatchQuery.
table_name	The name of the data table.
index_name	The name of the search index.
limit	The maximum number of rows that you want the current query to return. To query only the number of rows that meet the query conditions without querying specific data of the rows, set the limit parameter to 0.
operator	The logical operator. By default, OR is used as the logical operator. This operator specifies that a row meets the query conditions when the column value contains at least the minimum number of tokens. If you set the operator parameter to AND, the row meets the query conditions only when the value of the column contains all tokens.
minimum_should_match	The minimum number of keyword tokens that a column value must contain. A row is returned only when the value of the queried column in the row contains at least this number of keyword tokens. Note: The minimum_should_match parameter must be used with the OR logical operator.
get_total_count	Specifies whether to return the total number of rows that meet the query conditions. The default value is False, which specifies that the total number of rows is not returned. If you set this parameter to True, query performance decreases.
columns_to_get	Specifies which columns to return for each row that meets the query conditions. If you set the return_type field to ColumnReturnType.SPECIFIED, use the column_names field to specify the columns that you want to return. If you set the return_type field to ColumnReturnType.ALL, all columns are returned. If you set the return_type field to ColumnReturnType.NONE, only the primary key columns are returned.
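
The interaction between the text, operator, and minimum_should_match parameters can be illustrated with a small local simulation. The following is only a sketch of the matching rule described in the table, not the Tablestore implementation: the tokenize and matches functions are hypothetical helpers, and splitting on non-alphanumeric characters is an assumed stand-in for single-word tokenization.

```python
import re


def tokenize(text):
    """Rough stand-in for single-word tokenization (assumption):
    lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r'[^0-9a-z]+', text.lower()) if t]


def matches(value, keyword, operator='OR', minimum_should_match=1):
    """Return True if `value` would satisfy a match query for `keyword`
    under the rule described in the parameter table (hypothetical helper)."""
    value_tokens = set(tokenize(value))
    keyword_tokens = set(tokenize(keyword))
    hits = len(keyword_tokens & value_tokens)
    if operator == 'AND':
        # AND: the value must contain every keyword token.
        return hits == len(keyword_tokens)
    # OR: the value must contain at least minimum_should_match keyword tokens.
    return hits >= minimum_should_match


values = ['..., this is tablestore', 'is this tablestore',
          'tablestore is cool', 'this', 'is']
for value in values:
    print(value, '|',
          'OR:', matches(value, 'this is'),
          '| AND:', matches(value, 'this is', operator='AND'))
```

In this sketch, every sample value from the text parameter description matches under OR, while only the values that contain both "this" and "is" match under AND or under OR with minimum_should_match set to 2.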

Examples

The following examples show how to query the rows in which the value of the Col_Keyword column approximately matches 'this is'.

Perform a match query by using Tablestore SDK for Python V5.2.1 or later

If you use Tablestore SDK for Python V5.2.1 or later to perform a match query, a SearchResponse object is returned by default. The following code shows a sample request:

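# Assumes that client is an initialized OTSClient instance.
from tablestore import MatchQuery, SearchQuery, ColumnsToGet, ColumnReturnType

query = MatchQuery('Col_Keyword', 'this is')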
search_response = client.search(
    '<TABLE_NAME>', '<SEARCH_INDEX_NAME>', 
    SearchQuery(query, limit=100, get_total_count=True), 
    ColumnsToGet(return_type=ColumnReturnType.ALL)
)
print('request_id : %s' % search_response.request_id)
print('is_all_succeed : %s' % search_response.is_all_succeed)
print('total_count : %s' % search_response.total_count)
print('rows : %s' % search_response.rows)

# # If deep paging is required, we recommend that you use the next_token parameter because this method has no limits on the paging depth. 
# all_rows = []
# next_token = None
# # first round
# search_response = client.search(
#     '<TABLE_NAME>', '<SEARCH_INDEX_NAME>',
#     SearchQuery(query, next_token=next_token, limit=100, get_total_count=True),
#     columns_to_get=ColumnsToGet(return_type=ColumnReturnType.ALL))
# all_rows.extend(search_response.rows)
# 
# # loop
# while search_response.next_token:
#     search_response = client.search(
#         '<TABLE_NAME>', '<SEARCH_INDEX_NAME>',
#         SearchQuery(query, next_token=search_response.next_token, limit=100, get_total_count=True),
#         columns_to_get=ColumnsToGet(return_type=ColumnReturnType.ALL))
#     all_rows.extend(search_response.rows)
# print('Total rows:%s' % len(all_rows))
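
The token-based paging loop above can also be factored into a reusable helper. The following is a minimal sketch under stated assumptions: fetch_all_rows and make_fake_search are hypothetical names that are not part of the SDK, and search_page is assumed to be a function that accepts a next_token and returns a (rows, next_token) pair. The in-memory fake stands in for a real table so that the loop can be run without a Tablestore instance.

```python
def fetch_all_rows(search_page, max_pages=None):
    """Collect rows from every page by following next_token.

    search_page(next_token) must return (rows, next_token); a falsy
    next_token means that no more pages exist. max_pages optionally caps
    the number of requests (hypothetical safety limit).
    """
    all_rows = []
    next_token = None
    pages = 0
    while True:
        rows, next_token = search_page(next_token)
        all_rows.extend(rows)
        pages += 1
        if not next_token or (max_pages is not None and pages >= max_pages):
            break
    return all_rows


def make_fake_search(pages):
    """Build an in-memory search_page function that serves a non-empty
    list of pages in order (for demonstration only)."""
    def search_page(next_token):
        index = 0 if next_token is None else int(next_token)
        following = index + 1
        token = str(following) if following < len(pages) else None
        return pages[index], token
    return search_page


rows = fetch_all_rows(make_fake_search([['r1', 'r2'], ['r3'], ['r4', 'r5']]))
print('Total rows:%s' % len(rows))
```

To use the helper against a real index, search_page could wrap the client.search call shown above and return (search_response.rows, search_response.next_token).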

To obtain the results as a tuple instead, call the v1_response() method on the returned SearchResponse object. The following code shows a sample request:

query = MatchQuery('Col_Keyword', 'this is')
rows, next_token, total_count, is_all_succeed, agg_results, group_by_results = client.search(
    '<TABLE_NAME>', '<SEARCH_INDEX_NAME>', 
    SearchQuery(query, limit=100, get_total_count=True), 
    ColumnsToGet(return_type=ColumnReturnType.ALL)
).v1_response()

Perform a match query by using Tablestore SDK for Python earlier than V5.2.1

If you use a version of Tablestore SDK for Python that is earlier than V5.2.1 to perform a match query, the results are returned as a tuple by default. The following code shows a sample request:
```
query = MatchQuery('Col_Keyword', 'this is')
rows, next_token, total_count, is_all_succeed = client.search(
    '<TABLE_NAME>', '<SEARCH_INDEX_NAME>', 
    SearchQuery(query, limit=100, get_total_count=True), 
    ColumnsToGet(return_type=ColumnReturnType.ALL)
)
```

References

When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, geo query, Boolean query, KNN vector query, nested query, and exists query. You can use the query methods provided by the search index to query data from multiple dimensions based on your business requirements.
You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Sorting and paging.
You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.