Schema-free DashVector - Vector Retrieval Service - Alibaba Cloud Documentation Center

DashVector is designed to be schema-free. When you call the API to insert, update, or upsert a document, you can pass arbitrary key-value pairs in fields, provided that each value is of a supported data type. An example is as follows:

Python

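import numpy as np
from dashvector import Doc

# `collection` is an existing DashVector Collection object.
collection.insert(
    Doc(
        id='1',
        vector=np.random.rand(4),
        fields={
            'name': 'zhangsan',
            'weight': 70.0,
            'age': 30,
            'anykey1': 'anyvalue',
            'anykey2': 1,
            'anykey3': True,
            'anykey4': 3.1415926,
            # ... ...
        }
    )
)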

Note

More fields consume more resources, such as memory and disk resources.

Supported data types

For fields, DashVector supports the following basic data types in Python:

str
float
int
bool
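
Because only these four types are accepted, it can help to check a fields dictionary on the client side before sending it. The following is a minimal sketch and not part of the DashVector SDK; the helper name unsupported_fields is hypothetical.

```python
# Hypothetical client-side check; not part of the DashVector SDK.
SUPPORTED_FIELD_TYPES = (str, float, int, bool)

def unsupported_fields(fields):
    """Return the keys whose values are not str, float, int, or bool."""
    return [
        key for key, value in fields.items()
        if not isinstance(value, SUPPORTED_FIELD_TYPES)
    ]

fields = {'name': 'zhangsan', 'weight': 70.0, 'age': 30, 'tags': ['a', 'b']}
print(unsupported_fields(fields))  # ['tags']
```

An empty result means every value has one of the supported basic types.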

Important

In Python, the int type can represent an integer of unlimited size. However, DashVector supports only 32-bit signed integers, from -2,147,483,648 to 2,147,483,647. Therefore, make sure that every int value you pass stays within this range so that it does not overflow.
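
One way to guard against overflow is to check int values against the 32-bit signed range before passing them to DashVector. The following is a minimal sketch; the helper name out_of_range_ints and the sample field names are hypothetical.

```python
# Hypothetical client-side range check; not part of the DashVector SDK.
INT32_MIN = -2**31      # -2,147,483,648
INT32_MAX = 2**31 - 1   #  2,147,483,647

def out_of_range_ints(fields):
    """Return the keys whose int values do not fit in a 32-bit signed integer."""
    return [
        key for key, value in fields.items()
        # bool is a subclass of int in Python, so skip it explicitly.
        if isinstance(value, int) and not isinstance(value, bool)
        and not (INT32_MIN <= value <= INT32_MAX)
    ]

fields = {'age': 30, 'timestamp_ms': 1700000000000, 'active': True}
print(out_of_range_ints(fields))  # ['timestamp_ms']
```

A millisecond timestamp is a typical value that exceeds the range. How to store such a value (for example, in coarser units) depends on the application.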

Field-based search

You can filter a search by the key-value pairs that were passed in fields when the documents were inserted, updated, or upserted. An example is as follows:

Python

ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    filter='(age > 18 and anykey2 = 1) or (name like "zhang%" and anykey3 = false)'
)

Note

During a search, filtering on more fields consumes more resources, such as CPU resources, and more complex filter expressions take longer to return results.
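
When filter values come from variables, the filter string can be assembled programmatically. The following is a minimal sketch that produces only the forms that appear in the query example above (numeric equality, lowercase boolean literals, and `and`). The helper names eq_condition and all_of are hypothetical. Using a double-quoted literal for string equality is an assumption carried over from the `like` example, and escaping of quotes inside strings is deliberately not handled.

```python
# Hypothetical helpers; not part of the DashVector SDK.
def eq_condition(key, value):
    """Format one `key = value` condition in the style of the filter shown above."""
    if isinstance(value, bool):
        # Check bool before int, because bool is a subclass of int.
        literal = 'true' if value else 'false'
    elif isinstance(value, (int, float)):
        literal = str(value)
    elif isinstance(value, str):
        if '"' in value:
            raise ValueError('escaping of double quotes is not covered by this sketch')
        # Assumption: string equality uses the same quoting as the `like` example.
        literal = '"' + value + '"'
    else:
        raise TypeError('unsupported value type: ' + type(value).__name__)
    return key + ' = ' + literal

def all_of(conditions):
    """Join conditions with `and`."""
    return ' and '.join(conditions)

filter_expr = all_of([eq_condition('anykey2', 1), eq_condition('anykey3', False)])
print(filter_expr)  # anykey2 = 1 and anykey3 = false
```

The resulting string can then be passed as the filter argument of collection.query.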

Benefits of predefining a field schema

You can predefine a field schema when creating a collection. An example is as follows:

Python

ret = client.create(
    name='complex', 
    dimension=4, 
    fields_schema={'name': str, 'weight': float, 'age': int}
)

Predefining a field schema brings three main benefits:

Faster search: Conditional filtering based on predefined fields reduces time overheads and CPU overheads.
Lower memory and disk usage: If fields are predefined, only values need to be stored; otherwise, both keys and values need to be stored, which consumes more memory and disk resources.
Filter pre-verification: If fields are predefined, DashVector verifies the filter syntax during conditional filtering and immediately returns a failure when it detects an input value of an unexpected data type. If fields are not predefined, the expected data types are unknown, and pre-verification cannot be performed.

When you create a collection, we recommend that you predefine the fields that you already know most documents will contain. Fields that are uncertain or specific to only a few documents can be set when you insert those documents.
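
To apply this recommendation in client code, one option is to keep the predefined schema in one place and check each document against it before insertion, while letting any other key pass through as a schema-free field. The following is a minimal sketch; the helper name schema_mismatches is hypothetical, and the strict type comparison is an assumption of this sketch, because how DashVector treats, for example, an int passed for a float field is not described here.

```python
# Same schema as in the create example above.
FIELDS_SCHEMA = {'name': str, 'weight': float, 'age': int}

def schema_mismatches(fields, schema):
    """Return (key, expected, actual) for predefined keys whose value has another type."""
    mismatches = []
    for key, value in fields.items():
        expected = schema.get(key)
        if expected is None:
            continue  # Not predefined: passed as a schema-free field.
        # Use type() rather than isinstance() so that True does not pass as int.
        if type(value) is not expected:
            mismatches.append((key, expected.__name__, type(value).__name__))
    return mismatches

doc_fields = {'name': 'zhangsan', 'weight': 70, 'age': 30, 'anykey1': 'anyvalue'}
print(schema_mismatches(doc_fields, FIELDS_SCHEMA))  # [('weight', 'float', 'int')]
```

Here 'anykey1' is not in the schema and is therefore not checked, which matches the split between predefined fields and fields set at insertion time.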