Perform similarity searches on documents in a collection by using the SDK for Python - Vector Retrieval Service

This topic describes how to perform similarity searches on documents in a collection by using the SDK for Python.

Prerequisites

A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.
The latest version of the SDK is installed. For more information, see Install DashVector SDK.

API definition

Python

Collection.query(
    vector: Optional[Union[List[Union[int, float]], np.ndarray]] = None,
    id: Optional[str] = None,
    topk: int = 10,
    filter: Optional[str] = None,
    include_vector: bool = False,
    partition: Optional[str] = None,
    output_fields: Optional[List[str]] = None,
    sparse_vector: Optional[Dict[int, float]] = None,
    async_req: bool = False
) -> DashVectorResponse

Example

Note

Before you run the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster.
Create a collection named quickstart in advance. For more information, see the "Example" section of the Create a collection topic. Also insert some documents in advance. For more information, see Insert documents.

Python

import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')

Perform a similarity search by using a vector

Python

ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4]
)
# Check whether the query method is successfully called.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)

ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    topk=100,
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)

Perform a similarity search by using the vector associated with the primary key

Python

ret = collection.query(
    id='1'
)
# Check whether the query method is successfully called.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)

ret = collection.query(
    id='1',
    topk=100,
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)

Perform a similarity search by using the vector or primary key and a conditional filter

Python

# Perform a similarity search by using the vector or primary key and a conditional filter.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],   # Specify a vector for search. Alternatively, you can specify a primary key for search.
    topk=100,
    filter='age > 18',             # Match only documents whose age field is greater than 18.
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)

Perform a similarity search by using both dense and sparse vectors

Note

A sparse vector can represent keyword weights, which enables keyword-aware semantic vector search.

Python

# Perform a similarity search by using both dense and sparse vectors.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],   # Specify a vector for search.
    sparse_vector={1: 0.3, 20: 0.7}
)
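
The sparse_vector parameter takes a Dict[int, float]. As a minimal sketch, the following code builds such a dictionary from a list of keyword weights by keeping only the non-zero entries. The to_sparse_vector helper is hypothetical and not part of the SDK, and the sketch assumes that the integer keys are the positions (for example, keyword or token indexes) of the weights.

```python
from typing import Dict, Sequence


def to_sparse_vector(weights: Sequence[float]) -> Dict[int, float]:
    """Hypothetical helper (not part of the SDK): keep non-zero weights as {index: weight}."""
    # Assumption: each key is the position of the weight in the input sequence.
    return {i: float(w) for i, w in enumerate(weights) if w != 0}


keyword_weights = [0.0, 0.3, 0.0, 0.0, 0.7]
sparse_vector = to_sparse_vector(keyword_weights)
print(sparse_vector)  # {1: 0.3, 4: 0.7}
```

The resulting dictionary has the same shape as the value that is passed to sparse_vector in the preceding sample.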

Perform a match query by using a conditional filter

Python

# Perform a match query only by using a conditional filter without specifying a vector or primary key.
ret = collection.query(
    topk=100,
    filter='age > 18',             # Match only documents whose age field is greater than 18.
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)

Request parameters

Note

To perform a similarity search, specify either the vector or the id parameter. If neither is specified, the system performs a match query by using only the conditional filter.

Parameter	Type	Default value	Description
vector	Optional[Union[List[Union[int, float]], np.ndarray]]	None	Optional. The query vector.
id	Optional[str]	None	Optional. The primary key. The similarity search is performed based on the vector associated with the primary key.
topk	int	10	Optional. The maximum number of documents to return that are most similar to the provided query vector.
filter	Optional[str]	None	Optional. The conditional filter, which must comply with the syntax of the SQL WHERE clause. For more information, see Conditional filtering.
include_vector	bool	False	Optional. Specifies whether to return vector data.
partition	Optional[str]	None	Optional. The name of the partition.
output_fields	Optional[List[str]]	None	Optional. The fields to be returned. By default, all fields are returned.
sparse_vector	Optional[Dict[int, float]]	None	Optional. The sparse vector.
async_req	bool	False	Optional. Specifies whether to enable the asynchronous mode.
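
The rule in the note at the start of this section can be sketched as a small helper that names the kind of query a given combination of vector and id requests. The query_mode function is hypothetical and not part of the SDK. It also assumes that specifying both vector and id is a caller error, which this topic does not state explicitly.

```python
from typing import Optional, Sequence, Union


def query_mode(
    vector: Optional[Sequence[Union[int, float]]] = None,
    id: Optional[str] = None,
) -> str:
    """Hypothetical helper (not part of the SDK): describe the requested query type."""
    if vector is not None and id is not None:
        # Assumption: the note says "either", so this sketch rejects both at once.
        raise ValueError("specify either vector or id, not both")
    if vector is not None:
        return "similarity search by vector"
    if id is not None:
        return "similarity search by primary key"
    # Neither is given: only the conditional filter is used.
    return "match query by conditional filter"


print(query_mode(vector=[0.1, 0.2, 0.3, 0.4]))
print(query_mode(id="1"))
print(query_mode())
```

A check like this could run before Collection.query is called, so that an ambiguous combination of arguments is caught on the client side.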

Response parameters

Note

A DashVectorResponse object is returned, which contains the operation result, as described in the following table.

Parameter	Type	Description	Example
code	int	The returned status code. For more information, see Status codes.	0
message	str	The returned message.	success
request_id	str	The unique ID of the request.	19215409-ea66-4db9-8764-26ce2eb5bb99
output	List[Doc]	The similarity search results returned.
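
As a minimal sketch of how the fields in the preceding table might be used together, the following code turns a response into a one-line summary. The summarize_response helper is hypothetical, and the objects passed to it are stand-ins that only mimic the four documented fields, so the code runs without a cluster. The sketch assumes that code 0 indicates success, as the Example column suggests. The non-zero code, messages, and request IDs in the stand-ins are made up for illustration.

```python
from types import SimpleNamespace


def summarize_response(ret) -> str:
    """Hypothetical helper (not part of the SDK): summarize a query response in one line."""
    # Assumption: code 0 indicates success, as in the Example column of the table.
    if ret.code != 0:
        return f"failed: code={ret.code}, message={ret.message}, request_id={ret.request_id}"
    return f"ok: {len(ret.output)} document(s), request_id={ret.request_id}"


# Stand-in objects with the documented fields; these are not real DashVectorResponse objects.
ok = SimpleNamespace(code=0, message="success", request_id="req-1", output=["doc-a", "doc-b"])
failed = SimpleNamespace(code=-1, message="example failure", request_id="req-2", output=[])

print(summarize_response(ok))
print(summarize_response(failed))
```

Including request_id in the summary makes it easier to correlate a failed call with server-side logs.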