This topic describes how to perform similarity searches on documents in a collection by using the SDK for Python.
Prerequisites
A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.
The latest version of the SDK is installed. For more information, see Install DashVector SDK.
API definition
Collection.query(
vector: Optional[Union[List[Union[int, float]], np.ndarray]] = None,
id: Optional[str] = None,
topk: int = 10,
filter: Optional[str] = None,
include_vector: bool = False,
partition: Optional[str] = None,
output_fields: Optional[List[str]] = None,
sparse_vector: Optional[Dict[int, float]] = None,
async_req: bool = False
) -> DashVectorResponse
Example
Before you run the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster.
You must also create a collection named quickstart in advance. For more information, see the "Example" section of the Create a collection topic. In addition, insert some documents into the collection in advance. For more information, see Insert documents.
import dashvector
import numpy as np
client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')
Perform a similarity search by using a vector
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4]
)
# Check whether the query method is successfully called.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)
# Return up to 100 documents with their vectors, but only the name and age fields.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    topk=100,
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
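Conceptually, a query returns the topk documents closest to the query vector, keeps only the fields listed in output_fields, and attaches each document's vector only when include_vector=True. The following is a minimal in-memory sketch of that behavior, not the DashVector implementation: it assumes cosine similarity (the real metric is whatever the collection was created with), and the sample documents and the local_query helper are hypothetical.

```python
import heapq
import math
from typing import List, Optional

# Hypothetical documents standing in for the contents of the quickstart collection.
DOCS: List[dict] = [
    {"id": "1", "vector": [0.1, 0.2, 0.3, 0.4], "fields": {"name": "alice", "age": 20, "city": "Hangzhou"}},
    {"id": "2", "vector": [0.4, 0.3, 0.2, 0.1], "fields": {"name": "bob", "age": 16, "city": "Beijing"}},
    {"id": "3", "vector": [0.2, 0.2, 0.3, 0.5], "fields": {"name": "carol", "age": 30, "city": "Shanghai"}},
]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: higher means more similar (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def local_query(docs: List[dict], vector: List[float], topk: int = 10,
                output_fields: Optional[List[str]] = None,
                include_vector: bool = False) -> List[dict]:
    """Brute-force top-k search over docs (hypothetical helper, not the SDK)."""
    scored = [(cosine(vector, d["vector"]), d) for d in docs]
    best = heapq.nlargest(topk, scored, key=lambda pair: pair[0])
    results = []
    for score, d in best:
        # Keep all fields unless output_fields restricts them.
        if output_fields is None:
            fields = dict(d["fields"])
        else:
            fields = {k: v for k, v in d["fields"].items() if k in output_fields}
        results.append({
            "id": d["id"],
            "score": score,
            "fields": fields,
            # The vector is attached only on request, mirroring include_vector.
            "vector": d["vector"] if include_vector else None,
        })
    return results


hits = local_query(DOCS, [0.1, 0.2, 0.3, 0.4], topk=2, output_fields=["name", "age"])
print([h["id"] for h in hits])  # ['1', '3']
```

Brute force is used here only because it is easy to read; it compares the query against every document, which a vector database avoids at scale.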
Perform a similarity search by using the vector associated with the primary key
ret = collection.query(
    id='1'
)
# Check whether the query method is successfully called.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)
# Return up to 100 documents with their vectors, but only the name and age fields.
ret = collection.query(
    id='1',
    topk=100,
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
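A query by id uses the vector already stored under that primary key as the query vector. The sketch below is an in-memory illustration rather than the service's implementation: it looks up the stored vector and ranks all stored vectors against it. The stored vectors, the query_by_id helper, the cosine metric, and raising KeyError for an unknown key are all assumptions. In this sketch the source document itself is included in the results; this topic does not say whether the service does the same.

```python
import heapq
import math
from typing import Dict, List

# Hypothetical stored vectors keyed by primary key.
VECTORS: Dict[str, List[float]] = {
    "1": [0.1, 0.2, 0.3, 0.4],
    "2": [0.4, 0.3, 0.2, 0.1],
    "3": [0.2, 0.2, 0.3, 0.5],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query_by_id(vectors: Dict[str, List[float]], doc_id: str, topk: int = 10) -> List[str]:
    """Search with the vector stored under doc_id (hypothetical helper)."""
    if doc_id not in vectors:
        # Assumption: an unknown primary key is treated as an error in this sketch.
        raise KeyError(f"no document with id {doc_id!r}")
    query_vector = vectors[doc_id]
    # Rank every stored key by similarity to the looked-up vector.
    return heapq.nlargest(topk, vectors, key=lambda k: cosine(query_vector, vectors[k]))


print(query_by_id(VECTORS, "2", topk=3))  # ['2', '3', '1']
```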
Perform a similarity search by using the vector or primary key and a conditional filter
# Perform a similarity search by using the vector or primary key and a conditional filter.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],  # Query vector. Alternatively, pass id='1' to search by primary key.
    topk=100,
    filter='age > 18',  # Consider only documents whose age field is greater than 18.
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
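With a filter, only documents that satisfy the condition are candidates, and the candidates are ranked by similarity. The following sketch models this as filter-then-rank over in-memory documents. It is an illustration under stated assumptions: the real filter is an SQL WHERE-style string evaluated by the service, whereas here a Python predicate stands in for 'age > 18'; the documents, the dot-product metric, the filter-before-ranking order, and the filtered_query helper are all hypothetical.

```python
from typing import Callable, List

# Hypothetical documents as (id, vector, fields) tuples.
DOCS = [
    ("1", [0.1, 0.2, 0.3, 0.4], {"name": "alice", "age": 20}),
    ("2", [0.4, 0.3, 0.2, 0.1], {"name": "bob", "age": 16}),
    ("3", [0.2, 0.2, 0.3, 0.5], {"name": "carol", "age": 30}),
]


def dot(a: List[float], b: List[float]) -> float:
    """Dot-product similarity (an assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def filtered_query(vector: List[float], topk: int,
                   predicate: Callable[[dict], bool]) -> List[str]:
    """Keep documents whose fields match predicate, then rank them by dot product."""
    candidates = [(doc_id, vec) for doc_id, vec, fields in DOCS if predicate(fields)]
    candidates.sort(key=lambda c: dot(vector, c[1]), reverse=True)
    return [doc_id for doc_id, _ in candidates[:topk]]


# Python stand-in for the filter string 'age > 18'.
print(filtered_query([0.1, 0.2, 0.3, 0.4], topk=100, predicate=lambda f: f["age"] > 18))  # ['3', '1']
```

Document 2 is never scored because its age is 16, so a filter can return fewer than topk documents.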
Perform a similarity search by using both dense and sparse vectors
A sparse vector can represent keyword weights, which lets you add keyword awareness to a semantic vector search.
# Perform a similarity search by using both dense and sparse vectors.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],  # Dense query vector.
    sparse_vector={1: 0.3, 20: 0.7}  # Sparse vector: weight 0.3 in dimension 1 and 0.7 in dimension 20.
)
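A sparse vector is passed as a dict that maps a dimension index to a weight, so {1: 0.3, 20: 0.7} has non-zero weights only in dimensions 1 and 20. The sketch below shows one way keyword weights could be turned into such a dict and how a dense score and a sparse score might be combined. The vocabulary, the helper names, and the simple sum of the two dot products are assumptions for illustration; this topic does not specify how DashVector combines the two scores.

```python
from typing import Dict, List

# Hypothetical vocabulary that maps keywords to sparse dimension indices.
VOCAB: Dict[str, int] = {"vector": 1, "database": 20, "search": 42}


def to_sparse(keyword_weights: Dict[str, float]) -> Dict[int, float]:
    """Convert {keyword: weight} to {dimension index: weight}, skipping unknown words."""
    return {VOCAB[word]: w for word, w in keyword_weights.items() if word in VOCAB}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product over the dimensions both sparse vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b[i] for i, w in a.items() if i in b)


def dense_dot(a: List[float], b: List[float]) -> float:
    """Dot product of two dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def hybrid_score(q_dense: List[float], q_sparse: Dict[int, float],
                 d_dense: List[float], d_sparse: Dict[int, float]) -> float:
    """Assumed fusion: dense similarity plus keyword (sparse) similarity."""
    return dense_dot(q_dense, d_dense) + sparse_dot(q_sparse, d_sparse)


query_sparse = to_sparse({"vector": 0.3, "database": 0.7})
print(query_sparse)  # {1: 0.3, 20: 0.7}
print(hybrid_score([0.1, 0.2, 0.3, 0.4], query_sparse, [1.0, 0.0, 0.0, 0.0], {20: 1.0, 42: 0.5}))
```

Storing only non-zero dimensions keeps the vector small even when the vocabulary is large, which is why a dict is a natural representation.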
Perform a match query by using a conditional filter
# Perform a match query only by using a conditional filter without specifying a vector or primary key.
ret = collection.query(
    topk=100,
    filter='age > 18',  # Match only documents whose age field is greater than 18.
    output_fields=['name', 'age'],  # Return only the name and age fields.
    include_vector=True
)
Request parameters
To perform a similarity search, specify either the vector parameter or the id parameter. If neither is specified, the system performs a match query by using only the conditional filter.
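The rule above can be restated as a small decision function. This is a hypothetical helper, not part of the SDK; it reuses the SDK's parameter names (which shadow Python's built-in id and filter), and rejecting a call that sets both vector and id is an assumption based on the either/or wording.

```python
from typing import Optional, Sequence


def query_mode(vector: Optional[Sequence[float]] = None,
               id: Optional[str] = None,
               filter: Optional[str] = None) -> str:
    """Describe which kind of query a set of arguments requests (hypothetical helper)."""
    if vector is not None and id is not None:
        # Assumption: vector and id are mutually exclusive.
        raise ValueError("specify either vector or id, not both")
    if vector is not None:
        mode = "similarity search by vector"
    elif id is not None:
        mode = "similarity search by primary key"
    else:
        # Neither vector nor id: only the conditional filter is applied.
        return "match query by filter"
    return (mode + " with filter") if filter else mode


print(query_mode(vector=[0.1, 0.2, 0.3, 0.4]))  # similarity search by vector
print(query_mode(id="1", filter="age > 18"))    # similarity search by primary key with filter
print(query_mode(filter="age > 18"))            # match query by filter
```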
Parameter | Type | Default value | Description |
vector | Optional[Union[List[Union[int, float]], np.ndarray]] | None | Optional. The query vector. |
id | Optional[str] | None | Optional. The primary key. The similarity search is performed based on the vector associated with the primary key. |
topk | int | 10 | Optional. The maximum number of documents to return. In a similarity search, the most similar documents are returned. |
filter | Optional[str] | None | Optional. The conditional filter, which must comply with the syntax of the SQL WHERE clause. For more information, see Conditional filtering. |
include_vector | bool | False | Optional. Specifies whether to return vector data. |
partition | Optional[str] | None | Optional. The name of the partition. |
output_fields | Optional[List[str]] | None | Optional. The fields to be returned. By default, all fields are returned. |
sparse_vector | Optional[Dict[int, float]] | None | Optional. The sparse vector, given as a mapping from dimension index to weight. |
async_req | bool | False | Optional. Specifies whether to enable the asynchronous mode. |
Response parameters
A DashVectorResponse object is returned, which contains the operation result, as described in the following table.
Parameter | Type | Description | Example |
code | int | The returned status code. For more information, see Status codes. | 0 |
message | str | The returned message. | success |
request_id | str | The unique ID of the request. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
output | List[Doc] | The documents returned by the query. | |
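Because the response carries code, message, and request_id alongside output, a caller can check the status before using the results. The sketch below assumes that code 0 means success (as in the example column above) and uses types.SimpleNamespace as a stand-in object with the same attribute names; the docs_or_raise helper is hypothetical and is not part of the SDK.

```python
from types import SimpleNamespace
from typing import Any, List


def docs_or_raise(ret: Any) -> List[Any]:
    """Return the documents in ret.output, or raise if the status code is non-zero."""
    if ret.code != 0:
        # Include request_id so that the failed request can be traced.
        raise RuntimeError(
            f"query failed: code={ret.code}, message={ret.message}, request_id={ret.request_id}"
        )
    return list(ret.output or [])


# Stand-in with the documented attributes; not a real DashVectorResponse object.
ok = SimpleNamespace(code=0, message="success",
                     request_id="19215409-ea66-4db9-8764-26ce2eb5bb99",
                     output=["doc-a", "doc-b"])
print(docs_or_raise(ok))  # ['doc-a', 'doc-b']
```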