DashVector is designed to be schema-free. When you call the API to insert, update, or upsert a document, you can pass arbitrary key-value pairs in fields, as long as each value is of a supported data type. An example is as follows:
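import numpy as np
from dashvector import Doc

# collection is assumed to be an existing DashVector Collection object.
collection.insert(
    Doc(
        id='1',
        vector=np.random.rand(4),
        fields={
            'name': 'zhangsan',
            'weight': 70.0,
            'age': 30,
            'anykey1': 'anyvalue',
            'anykey2': 1,
            'anykey3': True,
            'anykey4': 3.1415926
            # ... ...
        }
    )
)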
Each additional field consumes more resources, such as memory and disk space.
Supported data types
For fields, DashVector supports the following basic data types in Python:
str
float
int
bool
In Python, the int type can represent integers of arbitrary size. However, DashVector supports only 32-bit signed integers, from -2,147,483,648 to 2,147,483,647. Make sure that the integer values you pass fall within this range.
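The type and range constraints above can be checked on the client before a document is sent. The following is a minimal sketch, not part of the DashVector SDK: the helper name check_fields and its message format are hypothetical, and it only covers the four basic types and the 32-bit integer range described in this section.

```python
# Hypothetical client-side helper; it is NOT part of the DashVector SDK.
INT32_MIN = -2**31       # -2,147,483,648
INT32_MAX = 2**31 - 1    #  2,147,483,647


def check_fields(fields):
    """Return a list of problems found in a fields dict (empty if none)."""
    problems = []
    for key, value in fields.items():
        # bool is a subclass of int in Python, so handle it before int.
        if isinstance(value, bool):
            continue
        if isinstance(value, int):
            if not INT32_MIN <= value <= INT32_MAX:
                problems.append(f"{key}: int {value} is outside the 32-bit signed range")
        elif not isinstance(value, (str, float)):
            problems.append(f"{key}: unsupported type {type(value).__name__}")
    return problems


ok = check_fields({'name': 'zhangsan', 'weight': 70.0, 'age': 30, 'anykey3': True})
bad = check_fields({'big': 2**31, 'tags': ['a', 'b']})
print(ok)
print(bad)
```

Running such a check before calling insert, update, or upsert surfaces overflowing integers and unsupported types early, instead of relying on the service to reject them.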
Field-based search
You can filter a search by the key-value pairs that were passed in fields when a document was inserted, updated, or upserted. An example is as follows:
ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    filter='(age > 18 and anykey2 = 1) or (name like "zhang%" and anykey3 = false)'
)
During a search, the more fields a filter involves, the more resources such as CPU it consumes, and the more complex the filter expression, the longer it takes to obtain results.
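Because every extra condition adds cost, it can help to assemble filter strings from small, visible parts so that their size stays easy to review. The sketch below uses plain string composition only. The helpers all_of and any_of are hypothetical names, not SDK functions, and the result is simply the filter string from the example above.

```python
# Hypothetical helpers for composing a filter string; not part of the SDK.
def all_of(*conditions):
    """Join conditions with 'and' and wrap the group in parentheses."""
    return '(' + ' and '.join(conditions) + ')'


def any_of(*groups):
    """Join already-grouped conditions with 'or'."""
    return ' or '.join(groups)


filter_expr = any_of(
    all_of('age > 18', 'anykey2 = 1'),
    all_of('name like "zhang%"', 'anykey3 = false'),
)
print(filter_expr)
```

The resulting string can be passed as the filter argument of collection.query. Dropping a condition from the list is then a one-line change when a query needs to be made cheaper.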
Benefits of predefining a field schema
You can predefine a field schema when creating a collection. An example is as follows:
ret = client.create(
    name='complex',
    dimension=4,
    fields_schema={'name': str, 'weight': float, 'age': int}
)
Predefining a field schema brings three main benefits:
Faster search: Conditional filtering on predefined fields reduces both search time and CPU overhead.
Lower memory and disk usage: If fields are predefined, only values need to be stored; otherwise, both keys and values need to be stored, which consumes more memory and disk resources.
Filter pre-verification: If fields are predefined, the syntax of the filter will be verified during conditional filtering, and DashVector immediately returns a failure when detecting input values of unexpected data types. If fields are not predefined, the expected data types are unknown, and pre-verification cannot be performed.
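The storage benefit can be illustrated with a toy comparison. This sketch does not reflect DashVector's actual storage format, which is not described here. It uses JSON purely as a stand-in to show why repeating keys in every document costs more than storing values against a shared schema. The second sample document is made up for illustration.

```python
import json

# Sample documents; the second one is invented for illustration.
docs = [
    {'name': 'zhangsan', 'weight': 70.0, 'age': 30},
    {'name': 'lisi', 'weight': 65.5, 'age': 25},
]
schema = ['name', 'weight', 'age']

# Schema-free layout: each document carries its keys next to its values.
schema_free_bytes = sum(len(json.dumps(d).encode('utf-8')) for d in docs)

# Predefined layout: keys are kept once in the schema, documents hold values only.
predefined_bytes = sum(
    len(json.dumps([d[k] for k in schema]).encode('utf-8')) for d in docs
)

print(schema_free_bytes, predefined_bytes)
```

The gap grows with the number of documents, because the key names are paid for once per document in the schema-free layout and only once overall in the predefined layout.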
We recommend that you predefine, when you create a collection, the fields that are well defined and appear in most documents, and that you pass fields that are uncertain or specific to only a few documents when you insert those documents.
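One way to apply this recommendation is to look at a sample of your documents and count how often each field appears. The following sketch is an assumption-laden illustration: split_fields is a hypothetical helper, the 0.8 threshold is an arbitrary choice, and the sample documents are made up.

```python
from collections import Counter


def split_fields(sample_docs, threshold=0.8):
    """Split field names into (common, rare) by the share of documents containing them.

    threshold is an arbitrary cut-off chosen for illustration.
    """
    counts = Counter(key for doc in sample_docs for key in doc)
    total = len(sample_docs)
    common = sorted(k for k, c in counts.items() if c / total >= threshold)
    rare = sorted(k for k, c in counts.items() if c / total < threshold)
    return common, rare


# Invented sample of field dicts, in the shape passed to Doc(fields=...).
sample = [
    {'name': 'zhangsan', 'weight': 70.0, 'age': 30, 'anykey1': 'anyvalue'},
    {'name': 'lisi', 'weight': 65.5, 'age': 25},
    {'name': 'wangwu', 'weight': 80.2, 'age': 41, 'anykey2': 1},
]
common, rare = split_fields(sample)
print(common)
print(rare)
```

Fields in common are candidates for fields_schema when the collection is created, while fields in rare can stay schema-free and be passed only with the documents that need them.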