Use Tablestore SDK for Python to perform a KNN vector query for an approximate nearest neighbor search

You can use the k-nearest neighbor (KNN) vector query feature to perform an approximate nearest neighbor search based on vectors. This way, you can find the data items in a large-scale dataset that are most similar to the vector that you want to query.

Prerequisites

A Tablestore client is initialized.
A search index is created for the data table and a vector field is specified.

Usage notes

Tablestore SDK for Python V5.4.4 or later supports the KNN vector query feature. We recommend that you use the latest version of Tablestore SDK for Python.
Note
For information about the version history of Tablestore SDK for Python, see Version history of Tablestore SDK for Python.
Limits are imposed on the number of Vector fields and the number of dimensions for a Vector field. For more information, see Search index limits.
The search index server has multiple partitions. Each partition of the search index server returns the top K neighbors nearest to the vector that you want to query. The top K nearest neighbors returned by the partitions are aggregated on the client node. If you want to use tokens to query all data by page, the total number of rows in the response is related to the number of partitions of the search index server.
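
The per-partition behavior described above can be pictured with a minimal sketch. It is not the Tablestore implementation: the function name merge_top_k, the (score, row_id) tuples, and the two sample partitions are hypothetical, and the sketch only assumes that each partition contributes up to K scored candidates that are then merged into a single ranking.

```python
import heapq

def merge_top_k(partition_results, k):
    """Merge per-partition (score, row_id) lists into one global top-k list.

    Hypothetical helper for illustration only; a higher score means a
    closer match to the query vector.
    """
    all_hits = (hit for hits in partition_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])

# Each partition returns its own top K (here K = 3) candidates.
partition_a = [(0.98, "row-1"), (0.91, "row-7"), (0.75, "row-3")]
partition_b = [(0.95, "row-12"), (0.80, "row-15"), (0.60, "row-19")]

k = 3
merged = merge_top_k([partition_a, partition_b], k)

# Total candidates available before merging: up to K per partition.
candidates = len(partition_a) + len(partition_b)

print(merged)
print(candidates)
```

In this sketch the candidate pool grows with the number of partitions (up to K per partition), which is consistent with the note that the total number of rows retrievable through token-based paging depends on the partition count.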

Parameters

Parameter	Required	Description
field_name	Yes	The name of the vector field.
top_k	Yes	The number of nearest neighbors to return, that is, the k query results that are most similar to the vector that you want to query. For information about the maximum value of the top_k parameter, see Search index limits. Important A greater value of k results in a higher recall rate but also higher query latency and costs.
float32_query_vector	Yes	The query vector. Stored vectors are ranked by their similarity to this vector.
filter	No	The filter. You can use any combination of query conditions other than KNN vector query conditions.
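
To make the roles of top_k, the query vector, and the filter concrete, the following is a minimal brute-force sketch in plain Python. It is an exact search over a tiny in-memory list, not the approximate algorithm that Tablestore runs. The function names, the sample rows, and the choice of Euclidean distance as the similarity measure are assumptions made for illustration. The sketch only mirrors the result semantics: every returned row satisfies the filter, and at most top_k rows are returned, closest first.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_knn(rows, query_vector, top_k, row_filter=None):
    """Return the top_k rows closest to query_vector that pass row_filter.

    Hypothetical reference implementation: exact, O(n) per query.
    """
    candidates = [r for r in rows if row_filter is None or row_filter(r)]
    candidates.sort(key=lambda r: euclidean_distance(r["col_vector"], query_vector))
    return candidates[:top_k]

# Illustrative sample data; column names follow the example in this topic.
rows = [
    {"id": 1, "col_keyword": "0", "col_long": 10, "col_vector": [1.0, 1.0, 1.0, -1.0]},
    {"id": 2, "col_keyword": "0", "col_long": 99, "col_vector": [1.0, 1.1, 1.2, -1.3]},
    {"id": 3, "col_keyword": "1", "col_long": 20, "col_vector": [1.0, 1.1, 1.2, -1.2]},
    {"id": 4, "col_keyword": "0", "col_long": 30, "col_vector": [0.0, 0.0, 0.0, 0.0]},
]

query = [1.0, 1.1, 1.2, -1.3]
nearest = brute_force_knn(
    rows,
    query,
    top_k=2,
    # Range bounds are chosen for illustration only.
    row_filter=lambda r: r["col_keyword"] == "0" and 0 < r["col_long"] < 50,
)
nearest_ids = [r["id"] for r in nearest]
print(nearest_ids)
```

Row 2 is identical to the query vector but is excluded because it fails the filter, which shows that the filter restricts the candidate set rather than reordering it.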

Examples

The following sample code shows how to query the 10 vectors in a table that are most similar to the specified vector. In this example, the returned rows must also meet the following query conditions: the value of the col_keyword column is 0 and the value of the col_long column is within the range from 0 to 50.

from tablestore import (
    BoolQuery, ColumnReturnType, ColumnsToGet, KnnVectorQuery, RangeQuery,
    ScoreSort, SearchQuery, Sort, SortOrder, TermQuery,
)

def knn_vector_query(client):
    # Non-KNN conditions that every returned row must meet.
    filter_query = BoolQuery(
        must_queries=[
            TermQuery(field_name='col_keyword', column_value="0"),
            RangeQuery(field_name='col_long', range_from=0, range_to=50),
        ]
    )
    # Query the 10 nearest neighbors of the specified vector among the rows that pass the filter.
    query = KnnVectorQuery(field_name='col_vector', top_k=10, float32_query_vector=[1.0, 1.1, 1.2, -1.3], filter=filter_query)
    # Sort the query results based on scores.
    sort = Sort(sorters=[ScoreSort(sort_order=SortOrder.DESC)])
    search_query = SearchQuery(query, limit=10, get_total_count=False, sort=sort)
    search_response = client.search(
        table_name='<TABLE_NAME>',
        index_name='<SEARCH_INDEX_NAME>',
        search_query=search_query,
        columns_to_get=ColumnsToGet(column_names=["col_keyword", "col_long"], return_type=ColumnReturnType.SPECIFIED)
    )
    print("requestId:", search_response.request_id)
    # If the Tablestore SDK for Python that you use cannot obtain search_hits, use Tablestore SDK for Python V6.1.0 or later.
    for hit in search_response.search_hits:
        # Obtain the row data.
        row = hit.row
        print(row)
        # Obtain the score.
        score = hit.score
        print(score)

FAQ

How do I optimize the performance of Tablestore KNN vector query?

References

When you use a search index to query data, you can use the following query methods: term query, terms query, match all query, match query, match phrase query, prefix query, range query, wildcard query, geo query, Boolean query, KNN vector query, nested query, and exists query. You can use the query methods provided by the search index to query data from multiple dimensions based on your business requirements.
You can sort or paginate rows that meet the query conditions by using the sorting and paging features. For more information, see Sorting and paging.
You can use the collapse (distinct) feature to collapse the result set based on a specific column. This way, data of the specified type appears only once in the query results. For more information, see Collapse (distinct).
If you want to analyze data in a data table, you can use the aggregation feature of the Search operation or execute SQL statements. For example, you can obtain the minimum and maximum values, sum, and total number of rows. For more information, see Aggregation and SQL query.
If you want to obtain all rows that meet the query conditions without the need to sort the rows, you can call the ParallelScan and ComputeSplits operations to use the parallel scan feature. For more information, see Parallel scan.