Vector Retrieval Service: Search for documents

Last Updated: Sep 05, 2024

This topic describes how to perform similarity searches on documents in a collection by using the SDK for Python.

Prerequisites

API definition

Collection.query(
    vector: Optional[Union[List[Union[int, float]], np.ndarray]] = None,
    id: Optional[str] = None,
    topk: int = 10,
    filter: Optional[str] = None,
    include_vector: bool = False,
    partition: Optional[str] = None,
    output_fields: Optional[List[str]] = None,
    sparse_vector: Optional[Dict[int, float]] = None,
    async_req: bool = False
) -> DashVectorResponse

Example

Note
  1. For the sample code to run properly, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster.

  2. Create a collection named quickstart and insert some documents into it in advance. For more information, see the "Example" section of the Create a collection topic and the Insert documents topic.

import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')

Perform a similarity search by using a vector

ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4]
)
# Check whether the query method is successfully called.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)

ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    topk=100,
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)

Perform a similarity search by using the vector associated with the primary key

ret = collection.query(
    id='1'
)
# Check whether the query method is successfully called.
if ret:
    print('query success')
    print(len(ret))
    for doc in ret:
        print(doc)
        print(doc.id)
        print(doc.vector)
        print(doc.fields)

ret = collection.query(
    id='1',
    topk=100,
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)

Perform a similarity search by using the vector or primary key and a conditional filter

# Perform a similarity search by using the vector or primary key and a conditional filter.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],   # The query vector. Alternatively, specify id to search by primary key.
    topk=100,
    filter='age > 18',             # Match only documents whose age field is greater than 18.
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)
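
When filter conditions come from user input or configuration, it can help to assemble the filter string in one place. The following is a minimal sketch, not part of the SDK for Python: build_filter and its list of operators are hypothetical, it handles numeric comparisons only, and it assumes that conditions can be joined with and, as in a SQL WHERE clause.

```python
from typing import List, Tuple, Union

Number = Union[int, float]

# Comparison operators accepted by this sketch (an assumption, not an SDK list).
_ALLOWED_OPS = {'>', '>=', '<', '<=', '=', '!='}


def build_filter(conditions: List[Tuple[str, str, Number]]) -> str:
    """Join (field, operator, number) triples into one filter string."""
    parts = []
    for field, op, value in conditions:
        if op not in _ALLOWED_OPS:
            raise ValueError(f'unsupported operator: {op}')
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError('this sketch only handles numeric values')
        parts.append(f'{field} {op} {value}')
    return ' and '.join(parts)


print(build_filter([('age', '>', 18)]))                    # age > 18
print(build_filter([('age', '>', 18), ('age', '<', 65)]))  # age > 18 and age < 65
```

The resulting string can be passed as the filter argument of collection.query.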

Perform a similarity search by using both dense and sparse vectors

Note

You can use a sparse vector to represent the keyword weight to implement a keyword-aware semantic vector search.

# Perform a similarity search by using both dense and sparse vectors.
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],   # Specify a vector for search.
    sparse_vector={1: 0.3, 20: 0.7}
)
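
The sparse vector maps integer indices to weights. As a rough illustration of how keyword weights might be turned into that shape, the sketch below uses a hypothetical vocabulary that assigns each keyword an integer index. The vocabulary, the keywords, and the keywords_to_sparse_vector helper are assumptions made for illustration and are not part of the SDK for Python.

```python
from typing import Dict


def keywords_to_sparse_vector(
    keyword_weights: Dict[str, float],
    vocabulary: Dict[str, int],
) -> Dict[int, float]:
    """Convert {keyword: weight} into {index: weight}, skipping unknown keywords."""
    sparse: Dict[int, float] = {}
    for keyword, weight in keyword_weights.items():
        index = vocabulary.get(keyword)
        if index is None or weight == 0:
            continue  # Unknown keyword or zero weight: leave it out of the sparse vector.
        sparse[index] = float(weight)
    return sparse


# Hypothetical vocabulary and weights, chosen to reproduce the sparse vector used above.
vocabulary = {'vector': 1, 'search': 20}
weights = {'vector': 0.3, 'search': 0.7, 'unknown': 0.5}
print(keywords_to_sparse_vector(weights, vocabulary))  # {1: 0.3, 20: 0.7}
```

The returned dictionary has the Dict[int, float] shape that the sparse_vector parameter expects.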

Perform a match query by using a conditional filter

# Perform a match query only by using a conditional filter without specifying a vector or primary key.
ret = collection.query(
    topk=100,
    filter='age > 18',             # Match only documents whose age field is greater than 18.
    output_fields=['name', 'age'], # Return only the name and age fields.
    include_vector=True
)

Request parameters

Note

To perform a similarity search, specify either the vector or the id parameter. If neither is specified, the system performs a match query that uses only the conditional filter.
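
The rule in this note can be sketched as a small pre-check that runs before the query is sent. The query_mode helper below is hypothetical and is not part of the SDK for Python. Rejecting a call that passes both vector and id is an assumption of this sketch, based on the "either ... or" wording of the note.

```python
from typing import Optional, Sequence


def query_mode(
    vector: Optional[Sequence[float]] = None,
    id: Optional[str] = None,
) -> str:
    """Describe which kind of query the given arguments would trigger."""
    if vector is not None and id is not None:
        # Assumption of this sketch: vector and id are treated as mutually exclusive.
        raise ValueError('specify either vector or id, not both')
    if vector is not None:
        return 'similarity search by vector'
    if id is not None:
        return 'similarity search by primary key'
    # Neither was given: only the conditional filter is used.
    return 'match query by conditional filter'


print(query_mode(vector=[0.1, 0.2, 0.3, 0.4]))  # similarity search by vector
print(query_mode(id='1'))                       # similarity search by primary key
print(query_mode())                             # match query by conditional filter
```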

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| vector | Optional[Union[List[Union[int, float]], np.ndarray]] | None | Optional. The vector. |
| id | Optional[str] | None | Optional. The primary key. The similarity search is performed based on the vector associated with the primary key. |
| topk | int | 10 | Optional. The maximum number of documents to return that are most similar to the provided query vector. |
| filter | Optional[str] | None | Optional. The conditional filter, which must comply with the syntax of the SQL WHERE clause. For more information, see Conditional filtering. |
| include_vector | bool | False | Optional. Specifies whether to return vector data. |
| partition | Optional[str] | None | Optional. The name of the partition. |
| output_fields | Optional[List[str]] | None | Optional. The fields to be returned. By default, all fields are returned. |
| sparse_vector | Optional[Dict[int, float]] | None | Optional. The sparse vector. |
| async_req | bool | False | Optional. Specifies whether to enable the asynchronous mode. |

Response parameters

Note

A DashVectorResponse object is returned, which contains the operation result, as described in the following table.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| code | int | The returned status code. For more information, see Status codes. | 0 |
| message | str | The returned message. | success |
| request_id | str | The unique ID of the request. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
| output | List[Doc] | The similarity search results returned. | |
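
The response parameters can be checked before the results are used. The sketch below does not call the service: FakeDoc and FakeResponse are hypothetical stand-ins that carry only the parameters listed in this topic, and doc_ids is a hypothetical helper. It assumes that status code 0 indicates success, as the example values suggest; the non-zero code in the usage lines is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeDoc:
    """Stand-in for a returned document (hypothetical, for illustration only)."""
    id: str
    fields: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Stand-in that carries the four response parameters described in this topic."""
    code: int
    message: str
    request_id: str
    output: List[FakeDoc] = field(default_factory=list)


def doc_ids(resp) -> List[str]:
    """Return the IDs of the returned documents, or raise if the call failed."""
    if resp.code != 0:  # Assumption: 0 means success.
        raise RuntimeError(
            f'query failed: code={resp.code}, message={resp.message}, '
            f'request_id={resp.request_id}'
        )
    return [doc.id for doc in resp.output]


ok = FakeResponse(0, 'success', '19215409-ea66-4db9-8764-26ce2eb5bb99',
                  [FakeDoc('1', {'name': 'a', 'age': 20})])
print(doc_ids(ok))  # ['1']

failed = FakeResponse(-1, 'hypothetical failure', 'hypothetical-request-id')
try:
    doc_ids(failed)
except RuntimeError as err:
    print(err)
```

Including request_id in the error message keeps the identifier at hand when a failed request needs to be traced.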