Background
In most business scenarios, vector similarity search alone is not enough: results must first be filtered by specific conditions or tags.
DashVector combines conditional filtering with vector similarity search, so that the search is restricted by the specified filter, which improves search efficiency.
Conditional filtering example
You need to replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with your cluster endpoint for the sample code to run properly.
You need to create a collection named quickstart in advance. For more information, see the "Example" section of the Create a collection topic.
Insert data with fields
import dashvector
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')

ret = collection.insert([
    ('1', np.random.rand(4), {'name': 'zhangsan', 'age': 10, 'male': True, 'weight': 35.0}),
    ('2', np.random.rand(4), {'name': 'lisi', 'age': 20, 'male': False, 'weight': 45.0}),
    ('3', np.random.rand(4), {'name': 'wangwu', 'age': 30, 'male': True, 'weight': 75.0}),
    ('4', np.random.rand(4), {'name': 'zhaoliu', 'age': 5, 'male': False, 'weight': 18.0}),
    ('5', np.random.rand(4), {'name': 'sunqi', 'age': 40, 'male': True, 'weight': 70.0})
])
assert ret
In the "Example" section of the Create a collection topic, a collection named quickstart is created with three predefined fields: {'name': str, 'weight': float, 'age': int}. Because DashVector is schema-free, you can also specify fields that were not predefined during collection creation when you insert a document, such as the male field in the preceding sample code.
Perform a search by filter
import dashvector

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')

# Search for men (male = true) whose age is greater than 18 and whose weight is greater than 65.0.
docs = collection.query(
    [0.1, 0.1, 0.1, 0.1],
    topk=10,
    filter='age > 18 and weight > 65.0 and male = true'
)
print(docs)
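To see which documents the filter above is expected to keep, the same condition can be evaluated locally in plain Python. This is only an illustrative sketch: it does not call DashVector, the docs dictionary is a local stand-in for the five documents inserted earlier (vectors omitted), and matches is a hypothetical helper name.

```python
# Local stand-in for the fields of the five inserted documents (vectors omitted).
docs = {
    '1': {'name': 'zhangsan', 'age': 10, 'male': True, 'weight': 35.0},
    '2': {'name': 'lisi', 'age': 20, 'male': False, 'weight': 45.0},
    '3': {'name': 'wangwu', 'age': 30, 'male': True, 'weight': 75.0},
    '4': {'name': 'zhaoliu', 'age': 5, 'male': False, 'weight': 18.0},
    '5': {'name': 'sunqi', 'age': 40, 'male': True, 'weight': 70.0},
}


def matches(fields):
    """Python equivalent of: age > 18 and weight > 65.0 and male = true"""
    return fields['age'] > 18 and fields['weight'] > 65.0 and fields['male'] is True


matching_ids = [doc_id for doc_id, fields in docs.items() if matches(fields)]
print(matching_ids)  # ['3', '5']
```

Only documents 3 and 5 satisfy all three conditions, so those are the only candidates that the similarity ranking should consider.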
Supported data types
For fields, DashVector supports the following basic data types in Python:
str
float
int
bool
In Python, the int type can represent an integer of an unlimited size. However, DashVector only supports 32-bit signed integers from -2,147,483,648 to 2,147,483,647. Therefore, you need to ensure that data does not overflow.
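One way to guard against overflow is to check integer fields before inserting a document. The following is a minimal client-side sketch; find_int32_overflows is a hypothetical helper, not part of the DashVector SDK.

```python
INT32_MIN = -2**31      # -2,147,483,648
INT32_MAX = 2**31 - 1   # 2,147,483,647


def find_int32_overflows(fields):
    """Return the names of int fields whose values fall outside the 32-bit signed range."""
    return [
        name
        for name, value in fields.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int)
        and not isinstance(value, bool)
        and not (INT32_MIN <= value <= INT32_MAX)
    ]


print(find_int32_overflows({'age': 10, 'male': True, 'views': 2**31}))  # ['views']
```

An empty list means every integer field fits; otherwise the returned names identify the fields to fix before calling insert.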
Comparison operator
A comparison operator is used to form a comparison expression in the pattern of {Field} {Comparison operator} {Constant}. The following table describes the comparison operators and examples of comparison expressions.
Operator | Operator description | Supported data types | Expression example | Example description |
< | Less than | | | |
<= | Less than or equal to | | | |
= | Equal to | | | |
!= | Not equal to | | | |
>= | Greater than or equal to | | | |
> | Greater than | | | |
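Comparison expressions can be assembled from Python values rather than written by hand. The sketch below uses two hypothetical helpers, format_constant and compare, which are not part of the DashVector SDK. The numeric and boolean forms follow the query example earlier in this topic (age > 18, weight > 65.0, male = true). The single-quoted form for string constants is an assumption that this page does not confirm, and no escaping is attempted.

```python
COMPARISON_OPERATORS = {'<', '<=', '=', '!=', '>=', '>'}


def format_constant(value):
    """Render a Python value as the constant part of a filter expression."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return 'true' if value else 'false'
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: string constants are wrapped in single quotes.
        return "'" + value + "'"
    raise TypeError(f'unsupported constant type: {type(value).__name__}')


def compare(field, operator, value):
    """Build a '{Field} {Comparison operator} {Constant}' expression."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f'unknown comparison operator: {operator}')
    return f'{field} {operator} {format_constant(value)}'


print(compare('age', '>', 18))        # age > 18
print(compare('weight', '>', 65.0))   # weight > 65.0
print(compare('male', '=', True))     # male = true
```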
String operator
A string operator is used to form a matching expression in the pattern of {Field} {String operator} {Constant}. The following table describes the string operator and an expression example.
Operator | Operator description | Supported data types | Expression example | Example description |
like | Prefix match | | | |
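The exact pattern syntax accepted by like is not shown on this page, so the following sketch only illustrates what a prefix match means, using Python's str.startswith on the sample names as a local analogy. It does not call DashVector, and the prefix value is an arbitrary example.

```python
names = ['zhangsan', 'lisi', 'wangwu', 'zhaoliu', 'sunqi']
prefix = 'zha'  # arbitrary example prefix

# A prefix match keeps only the values that begin with the given prefix.
matched = [name for name in names if name.startswith(prefix)]
print(matched)  # ['zhangsan', 'zhaoliu']
```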
Logical operator
A logical operator is used to combine expressions. The following table describes the logical operators and examples of expression combinations.
Operator | Operator description | Example | Example description |
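and | And | expr1 and expr2 | The compound expression returns true only when both expr1 and expr2 return true. |
or | Or | expr1 or expr2 | The compound expression returns true when at least one of expr1 and expr2 returns true. |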
You can group expressions with parentheses (()) to enforce precedence within a compound expression. For example, in the compound expression expr1 and (expr2 or expr3), the grouped part (expr2 or expr3) is evaluated first.
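Grouping changes the result of a compound expression, which is why explicit parentheses are worth writing. The sketch below uses Python booleans as a stand-in for the three sub-expressions and lists the truth assignments for which the two possible groupings disagree. It illustrates Boolean logic only and makes no claim about DashVector's default precedence when parentheses are omitted.

```python
from itertools import product

differing = []
for expr1, expr2, expr3 in product([False, True], repeat=3):
    grouped_or = expr1 and (expr2 or expr3)    # expr1 and (expr2 or expr3)
    grouped_and = (expr1 and expr2) or expr3   # (expr1 and expr2) or expr3
    if grouped_or != grouped_and:
        differing.append((expr1, expr2, expr3))

print(differing)  # [(False, False, True), (False, True, True)]
```

Whenever expr1 is false and expr3 is true, the first grouping returns false and the second returns true, so the placement of parentheses decides which documents pass the filter.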