DashVector: Vectorize text data by using the vectorization model of Baichuan AI

Last Updated: Apr 11, 2024

This topic describes how to vectorize text data by using the vectorization model of Baichuan AI and import the vector data into DashVector for vector search.

Prerequisites

Vectorization model of Baichuan AI

Overview

Model name: Baichuan-Text-Embedding

Vector dimensions: 1,024

Distance metric: Cosine

Vector data type: Float32

Remarks:

  • Maximum length of a single text: 512 tokens. If a text exceeds 512 tokens, the excess is automatically truncated.

  • Maximum number of texts that can be specified in a single request: 16
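The per-request limit of 16 in the remarks above means that a longer list of inputs has to be split before it is sent to the model. The following is a minimal sketch of that splitting step. `batched` and `MAX_BATCH_SIZE` are hypothetical names introduced for illustration and are not part of the DashVector or Baichuan AI SDKs.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 16  # per-request limit taken from the remarks above


def batched(texts: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts` that hold at most `size` items each."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


# 40 inputs are split into batches of 16, 16, and 8.
batches = list(batched([f'text {i}' for i in range(40)]))
print([len(batch) for batch in batches])
```

Each batch can then be passed to the embedding request separately, and the returned vectors concatenated in the same order.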

Note

For more information about the vectorization model of Baichuan AI, see Baichuan AI vectorization model.
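The Cosine metric listed in the overview compares two vectors by the angle between them rather than by their length. The sketch below shows the standard cosine similarity formula in plain Python for illustration only. How DashVector computes or reports scores internally is not covered in this topic, and `cosine_similarity` is a hypothetical helper, not an SDK function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same dimension')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# Vectors that point the same way score 1.0 regardless of length;
# orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))
```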

Example

Note

You must perform the following operations for the code to run properly:

  1. Replace {your-dashvector-api-key} in the sample code with your DashVector API key.

  2. Replace {your-dashvector-cluster-endpoint} in the sample code with the endpoint of your DashVector cluster.

  3. Replace {your-baichuan-api-key} in the sample code with your Baichuan AI API key.
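As an alternative to pasting the three values into the source file, they can be read from environment variables so that keys stay out of version control. The following is a minimal sketch under that assumption. The variable names in `REQUIRED_VARS` and the `load_settings` helper are hypothetical and are not defined by DashVector or Baichuan AI.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names chosen for this sketch.
REQUIRED_VARS = ('DASHVECTOR_API_KEY', 'DASHVECTOR_ENDPOINT', 'BAICHUAN_API_KEY')


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the three required values, or raise if any is missing or blank."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, '').strip()]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return {name: env[name].strip() for name in REQUIRED_VARS}


# Demonstration with an in-memory mapping instead of the real environment.
demo = load_settings({
    'DASHVECTOR_API_KEY': 'demo-dashvector-key',
    'DASHVECTOR_ENDPOINT': 'demo-endpoint',
    'BAICHUAN_API_KEY': 'demo-baichuan-key',
})
print(sorted(demo))
```

In a real script, calling `load_settings()` with no argument reads from the process environment. The returned values can then be passed where the sample code uses the placeholders.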

from dashvector import Client
import requests
from typing import List


# Use the vectorization model of Baichuan AI to embed text data into vector data.
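def generate_embeddings(texts: List[str]) -> List[List[float]]:
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer {your-baichuan-api-key}'
    }
    data = {'input': texts, 'model': 'Baichuan-Text-Embedding'}
    # Use HTTPS so that the API key is not sent in plaintext.
    response = requests.post(
        'https://api.baichuan-ai.com/v1/embeddings', headers=headers, json=data, timeout=30
    )
    # Fail early with an HTTP error instead of a KeyError on the response body.
    response.raise_for_status()
    return [record["embedding"] for record in response.json()["data"]]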


# Create a DashVector client.
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a DashVector collection.
rsp = client.create('baichuan-text-embedding', 1024)
assert rsp
collection = client.get('baichuan-text-embedding')
assert collection

# Convert text into a vector and store it in DashVector.
collection.insert(
    ('ID1', generate_embeddings(['Alibaba Cloud DashVector is one of the best vector databases in performance and cost-effectiveness.'])[0])
)

# Perform a vector search.
docs = collection.query(
    generate_embeddings(['The best vector database'])[0]
)
print(docs)
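The sample above reads the embeddings straight out of the response body. A slightly more defensive variant checks each vector before it is inserted. The sketch below assumes the response shape that the sample relies on: a `data` list whose records each carry an `embedding`. The optional `index` field used for ordering is an assumption borrowed from OpenAI-style embedding responses, and `extract_embeddings` is a hypothetical helper name.

```python
from typing import Any, Dict, List

EXPECTED_DIMENSION = 1024  # vector dimensions listed in the overview


def extract_embeddings(payload: Dict[str, Any],
                       dimension: int = EXPECTED_DIMENSION) -> List[List[float]]:
    """Return the embeddings in input order after checking their dimension."""
    records = payload.get('data')
    if not isinstance(records, list) or not records:
        raise ValueError('response has no "data" records')
    # Sort by "index" when present (assumed field); otherwise keep response order.
    ordered = sorted(enumerate(records),
                     key=lambda pair: pair[1].get('index', pair[0]))
    vectors = []
    for position, record in ordered:
        vector = record['embedding']
        if len(vector) != dimension:
            raise ValueError(
                f'record {position} has {len(vector)} dimensions, expected {dimension}'
            )
        vectors.append([float(x) for x in vector])
    return vectors


# Demonstration with a hand-written payload and a small dimension.
fake_payload = {'data': [
    {'index': 1, 'embedding': [0, 0, 1]},
    {'index': 0, 'embedding': [1, 0, 0]},
]}
print(extract_embeddings(fake_payload, dimension=3))
```

With a real response, the call would be `extract_embeddings(response.json())`, which keeps the default dimension of 1,024 that the collection was created with.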