Vector Retrieval Service: Upsert documents

Last Updated: Apr 11, 2024

This topic describes how to upsert documents in a collection by using the SDK for Python.

Note

If a document with the specified ID already exists in the collection, the document is updated. Otherwise, a new document is inserted.
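The update-or-insert rule in the note can be illustrated with a minimal in-memory sketch. This is not the DashVector implementation: the `upsert` function and the `store` dictionary below are hypothetical stand-ins, and the sketch assumes that an update replaces the stored vector, which this topic does not spell out.

```python
from typing import Dict, List


def upsert(store: Dict[str, List[float]], doc_id: str, vector: List[float]) -> str:
    """Hypothetical stand-in: insert when doc_id is new, otherwise overwrite."""
    action = "update" if doc_id in store else "insert"
    store[doc_id] = vector  # assumption: an update replaces the stored vector
    return action


store: Dict[str, List[float]] = {}
first = upsert(store, "1", [0.1, 0.2, 0.3, 0.4])   # ID "1" is new
second = upsert(store, "1", [0.4, 0.3, 0.2, 0.1])  # ID "1" already exists
print(first, second, len(store))
```

Calling the function twice with the same ID leaves a single entry in the store, which is the behavior the note describes for a collection.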

Prerequisites

API definition

Collection.upsert(
    docs: Union[Doc, List[Doc], Tuple, List[Tuple]],
    partition: Optional[str] = None,
    async_req: bool = False
) -> DashVectorResponse

Example

Note
  1. Before you run the sample code, replace YOUR_API_KEY with your API key and YOUR_CLUSTER_ENDPOINT with the endpoint of your cluster.

  2. You need to create a collection named quickstart in advance. For more information, see the "Example" section of the Create a collection topic.

import dashvector
from dashvector import Doc
import numpy as np

client = dashvector.Client(
    api_key='YOUR_API_KEY',
    endpoint='YOUR_CLUSTER_ENDPOINT'
)
collection = client.get(name='quickstart')

Upsert a document

# Use a Doc object to upsert a document.
ret = collection.upsert(
    Doc(
        id='1',
        vector=[0.1, 0.2, 0.3, 0.4]
    )
)
# Check whether the upsert operation is successful.
assert ret

# Simplified version: use a tuple to upsert a document.
ret = collection.upsert(
    ('2', [0.1, 0.1, 0.1, 0.1])               # (id, vector)
)

Upsert a document with fields

# Use a Doc object to upsert a document and specify the values of related fields.
ret = collection.upsert(
    Doc(
        id='3',
        vector=np.random.rand(4),
        fields={
            # Specify the values of fields that are predefined during collection creation.
            'name': 'zhangsan', 'weight': 70.0, 'age': 30,
            # Specify schema-free fields and values.
            'anykey1': 'str-value', 'anykey2': 1,
            'anykey3': True, 'anykey4': 3.1415926
        }
    )
)

# Use a tuple to upsert a document and specify the values of related fields.
ret = collection.upsert(
    ('4', np.random.rand(4), {'foo': 'bar'})  # (id, vector, fields)
)

Upsert multiple documents at a time

# Use Doc objects to upsert 10 documents at a time.
ret = collection.upsert(
    [
        Doc(id=str(i+5), vector=np.random.rand(4)) for i in range(10)
    ]
)

# Simplified version: use tuples to upsert three documents at a time.
ret = collection.upsert(
    [
        ('15', [0.2, 0.7, 0.8, 1.3], {'age': 20}),
        ('16', [0.3, 0.6, 0.9, 1.2], {'age': 30}),
        ('17', [0.4, 0.5, 1.0, 1.1], {'age': 40})
    ]                                         # List[(id, vector, fields)]
)

# Check whether the batch upsert operation is successful.
assert ret

Asynchronously upsert documents

# Asynchronously upsert 10 documents at a time.
ret_future = collection.upsert(
    [
        Doc(id=str(i+18), vector=np.random.rand(4), fields={'name': 'foo' + str(i)})
        for i in range(10)
    ],
    async_req=True
)
# Wait for the result of the asynchronous upsert operation.
ret = ret_future.get()

Upsert a document containing a sparse vector

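# Use a Doc object to upsert a document that contains a dense vector and a sparse vector.
ret = collection.upsert(
    Doc(
        id='28',
        vector=[0.1, 0.2, 0.3, 0.4],
        sparse_vector={1: 0.4, 10000: 0.6, 222222: 0.8}  # {index: value}
    )
)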
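In the example above, sparse_vector is a dictionary that maps an integer index to a float value. If your data starts out as a mostly-zero list, a conversion along the following lines may help. The `to_sparse` helper is hypothetical and is not part of the SDK.

```python
from typing import Dict, Sequence


def to_sparse(dense: Sequence[float]) -> Dict[int, float]:
    """Hypothetical helper: keep only the non-zero entries as {index: value}."""
    return {i: float(v) for i, v in enumerate(dense) if v != 0}


dense = [0.0, 0.4, 0.0, 0.0, 0.6]
sparse = to_sparse(dense)
print(sparse)  # {1: 0.4, 4: 0.6}
```

The resulting dictionary has the same shape as the value passed to sparse_vector in the sample code.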

Request parameters

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| docs | Union[Doc, List[Doc], Tuple, List[Tuple]] | - | The one or more documents to be upserted. |
| partition | Optional[str] | None | Optional. The name of the partition. |
| async_req | bool | False | Optional. Specifies whether to enable the asynchronous mode. |

Note
  1. If the docs parameter is a tuple or a list of tuples, the elements in each tuple must be in the order of (id, vector) or (id, vector, fields). In this case, each tuple is equivalent to a Doc object.

  2. Each field in a Doc object can be set to a user-defined key-value pair. In a key-value pair, the key must be of the str type, and the value can be of the str, int, bool, or float type.

    1. If a key is predefined during collection creation, the value must be of the predefined type.

    2. If the key is not predefined during collection creation, the value can be of the str, int, bool, or float type.

  3. For more information about predefining fields, see Schema-free.
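The tuple order and field type rules above can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: `normalize`, `check_fields`, and the `schema` mapping are hypothetical names introduced for illustration, and the SDK may perform its own validation differently.

```python
from typing import Any, Dict, Optional, Tuple

ALLOWED_TYPES = (str, int, bool, float)


def normalize(doc: Tuple) -> Dict[str, Any]:
    """Expand (id, vector) or (id, vector, fields) into named parts."""
    if len(doc) == 2:
        doc_id, vector = doc
        fields: Dict[str, Any] = {}
    elif len(doc) == 3:
        doc_id, vector, fields = doc
    else:
        raise ValueError("expected (id, vector) or (id, vector, fields)")
    return {"id": doc_id, "vector": vector, "fields": fields}


def check_fields(fields: Dict[str, Any], schema: Optional[Dict[str, type]] = None) -> None:
    """Raise TypeError if a key or value breaks the rules in the note."""
    schema = schema or {}  # hypothetical: predefined field name -> type
    for key, value in fields.items():
        if not isinstance(key, str):
            raise TypeError(f"field key {key!r} must be str")
        expected = schema.get(key)
        if expected is not None:
            # bool is a subclass of int in Python, so compare exact types.
            if type(value) is not expected:
                raise TypeError(f"field {key!r} must be {expected.__name__}")
        elif type(value) not in ALLOWED_TYPES:
            raise TypeError(f"field {key!r} has unsupported type {type(value).__name__}")


doc = normalize(('4', [0.1, 0.2, 0.3, 0.4], {'foo': 'bar'}))
check_fields(doc['fields'])
print(doc['id'], doc['fields'])
```

Exact type comparison is used because `isinstance(True, int)` is true in Python, which would otherwise let a bool value pass for a field predefined as int.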

Response parameters

Note

A DashVectorResponse object is returned, which contains the operation result, as described in the following table.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| code | int | The returned status code. For more information, see Status codes. | 0 |
| message | str | The returned message. | success |
| request_id | str | The unique ID of the request. | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
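The response fields above can be turned into a short log line, as in the following sketch. It assumes that code, message, and request_id are readable as attributes and that a code of 0 indicates success, as the example values suggest. The `summarize` helper is hypothetical, and a `SimpleNamespace` object stands in for a real DashVectorResponse so that the snippet runs without a cluster.

```python
from types import SimpleNamespace


def summarize(resp) -> str:
    """Hypothetical helper: format the response fields for logging."""
    if resp.code == 0:  # assumption: 0 means the operation succeeded
        return f"ok (request_id={resp.request_id})"
    return f"failed: code={resp.code}, message={resp.message}, request_id={resp.request_id}"


# Stand-in object that uses the example values from the table.
fake = SimpleNamespace(
    code=0,
    message="success",
    request_id="19215409-ea66-4db9-8764-26ce2eb5bb99",
)
print(summarize(fake))
```

Including request_id in the failure message can make it easier to trace a specific request when you troubleshoot an error.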