This topic describes how to vectorize text data by using the vectorization model of Baichuan AI and import the vector data into DashVector for vector search.
Prerequisites
DashVector:
A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.
The latest version of the DashVector SDK is installed. For more information, see Install DashVector SDK.
Baichuan AI:
An API key is obtained. For more information, see API introduction.
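You can check the SDK prerequisite from Python before you run the example. The following is a minimal sketch. It assumes the SDK is published on PyPI under the distribution name `dashvector`. The helper `installed_dashvector_version` is hypothetical. It only reports which version is installed and does not compare that version with the latest release.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_dashvector_version(dist_name: str = "dashvector") -> Optional[str]:
    """Return the installed version of the SDK, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_dashvector_version()
    if found is None:
        print("DashVector SDK not found; try: pip install -U dashvector")
    else:
        print(f"DashVector SDK {found} is installed")
```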
Vectorization model of Baichuan AI
Overview
Model name | Vector dimensions | Distance metric | Vector data type | Remarks
Baichuan-Text-Embedding | 1,024 | Cosine | Float32 | -
For more information about the vectorization model of Baichuan AI, see Baichuan AI vectorization model.
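The table has two practical consequences. First, the collection must be created with the same dimension as the model output (1,024). Second, the cosine metric compares vectors by the angle between them. The following plain-Python sketch illustrates both checks. It is for illustration only: DashVector computes the metric on the server side, so the example does not need this code. `check_dimension` and `cosine_similarity` are hypothetical helper names.

```python
import math
from typing import Sequence

# Dimension of Baichuan-Text-Embedding vectors, taken from the table above.
EXPECTED_DIMENSION = 1024


def check_dimension(vector: Sequence[float], expected: int = EXPECTED_DIMENSION) -> None:
    """Raise ValueError if a vector does not match the collection dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), a value in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two vectors 45 degrees apart: similarity is about 0.7071.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```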
Example
Before you run the sample code, make the following replacements:
Replace {your-dashvector-api-key} with your DashVector API key.
Replace {your-dashvector-cluster-endpoint} with the endpoint of your DashVector cluster.
Replace {your-baichuan-api-key} with your Baichuan AI API key.
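You can also avoid hard-coding credentials in the source file by reading the three values from environment variables. The following is a sketch under that assumption. The variable names `DASHVECTOR_API_KEY`, `DASHVECTOR_ENDPOINT`, and `BAICHUAN_API_KEY`, the `Credentials` class, and the `load_credentials` helper are all hypothetical and are not part of either SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical environment variable names; any names work as long as they match.
ENV_VARS = {
    "dashvector_api_key": "DASHVECTOR_API_KEY",
    "dashvector_endpoint": "DASHVECTOR_ENDPOINT",
    "baichuan_api_key": "BAICHUAN_API_KEY",
}


@dataclass(frozen=True)
class Credentials:
    dashvector_api_key: str
    dashvector_endpoint: str
    baichuan_api_key: str


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Credentials:
    """Read the credentials from the environment (or from a given mapping)."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Credentials(**{field: env[name] for field, name in ENV_VARS.items()})
```

After you call `creds = load_credentials()`, you could pass `creds.dashvector_api_key` and `creds.dashvector_endpoint` to `Client(...)` and build the Baichuan AI header as `f'Bearer {creds.baichuan_api_key}'`.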
```python
from typing import List

import requests
from dashvector import Client


# Use the vectorization model of Baichuan AI to convert text into vectors.
def generate_embeddings(texts: List[str]) -> List[List[float]]:
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer {your-baichuan-api-key}'
    }
    data = {'input': texts, 'model': 'Baichuan-Text-Embedding'}
    # Use HTTPS so that the API key is not sent in plain text.
    response = requests.post(
        'https://api.baichuan-ai.com/v1/embeddings',
        headers=headers,
        json=data
    )
    # Fail fast on HTTP errors (for example, an invalid API key) instead of
    # raising a confusing KeyError when the response has no "data" field.
    response.raise_for_status()
    return [record['embedding'] for record in response.json()['data']]


# Create a DashVector client.
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a DashVector collection. The dimension (1024) must match the
# vectors produced by Baichuan-Text-Embedding.
rsp = client.create('baichuan-text-embedding', 1024)
assert rsp
collection = client.get('baichuan-text-embedding')
assert collection

# Convert text into a vector and store it in DashVector as an (id, vector) pair.
collection.insert(
    ('ID1', generate_embeddings(['Alibaba Cloud DashVector is one of the best vector databases in performance and cost-effectiveness.'])[0])
)

# Perform a vector search with the embedding of the query text.
docs = collection.query(
    generate_embeddings(['The best vector database'])[0]
)
print(docs)
```
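The sample inserts a single document. To load many texts, you can embed them in batches and pair each vector with an ID. The following is a minimal sketch. `batched`, `build_docs`, and `BATCH_SIZE` are hypothetical names. The batch size of 16 is an assumption; check the Baichuan AI documentation for the actual limit on inputs per request. The embedding function is passed in as a parameter, so the helper can be tested without network access.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical batch size; confirm the real per-request limit for the
# Baichuan AI embeddings API before relying on it.
BATCH_SIZE = 16


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_docs(
    ids: Sequence[str],
    texts: Sequence[str],
    embed: Callable[[List[str]], List[List[float]]],
    batch_size: int = BATCH_SIZE,
) -> List[Tuple[str, List[float]]]:
    """Embed texts batch by batch and pair each vector with its document ID."""
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    vectors: List[List[float]] = []
    for chunk in batched(texts, batch_size):
        chunk_vectors = embed(list(chunk))
        # Guard against a response that does not contain one vector per input.
        if len(chunk_vectors) != len(chunk):
            raise ValueError("embedding service returned an unexpected number of vectors")
        vectors.extend(chunk_vectors)
    return list(zip(ids, vectors))
```

With the sample above, you could call `collection.insert(build_docs(ids, texts, generate_embeddings))`. This call assumes that `insert` accepts a list of `(id, vector)` tuples in addition to the single tuple used in the example. Confirm this behavior in the DashVector SDK reference.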