This topic describes the basic concepts of a vector, including the number of dimensions, distance metric, and data type, to help you use DashVector more effectively.
Basic concepts
In the AI field, a vector is an abstract representation of the features of an object. Take the DashScope text-embedding-v1 model as an example: when it receives a piece of input text, the model converts the text into a vector. This conversion process is called embedding.
Request example
Input text: The quality of the clothes is simply outstanding, and they look fabulous. The wait was completely worth it, and I am absolutely delighted with my purchase. I will definitely be a repeat customer here.
import dashscope
from dashscope import TextEmbedding

# Replace the placeholder with your own DashScope API key.
dashscope.api_key = 'YOUR_API_KEY'


def embed_with_str():
    # Convert a single input string into a vector with text-embedding-v1.
    resp = TextEmbedding.call(
        model=TextEmbedding.Models.text_embedding_v1,
        input='The quality of the clothes is simply outstanding, and they look fabulous. The wait was completely worth it, and I am absolutely delighted with my purchase. I will definitely be a repeat customer here.')
    print(resp)


if __name__ == '__main__':
    embed_with_str()
Response example
{
    "status_code": 200,
    "request_id": "617b3670-6f9e-9f47-ad57-997ed8aeba6a",
    "code": "",
    "message": "",
    "output": {
        "embeddings": [
            {
                "embedding": [
                    0.09393704682588577,
                    2.4155092239379883,
                    -1.8923076391220093,
                    .,
                    .,
                    .
                ],
                "text_index": 0
            }
        ]
    },
    "usage": {
        "total_tokens": 23
    }
}
The value of the embedding field in the response is a vector:
[
    0.09393704682588577,
    2.4155092239379883,
    -1.8923076391220093,
    .,
    .,
    .
]
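As a minimal sketch of how the vector can be pulled out of a response with this shape, the following snippet works on a plain dictionary. The `response` dictionary is a hand-written stand-in for the response example above, shortened to three elements, and `extract_vectors` is a hypothetical helper, not part of the DashScope SDK.

```python
# Hand-written stand-in for the response example; the vector is shortened
# to three elements for illustration only.
response = {
    "status_code": 200,
    "output": {
        "embeddings": [
            {
                "embedding": [
                    0.09393704682588577,
                    2.4155092239379883,
                    -1.8923076391220093,
                ],
                "text_index": 0,
            }
        ]
    },
}


def extract_vectors(resp):
    """Return one vector per input text, ordered by text_index."""
    if resp.get("status_code") != 200:
        raise ValueError(f"embedding request failed: {resp.get('message')}")
    items = sorted(resp["output"]["embeddings"], key=lambda e: e["text_index"])
    return [item["embedding"] for item in items]


vectors = extract_vectors(response)
print(len(vectors), len(vectors[0]))
```

Sorting by `text_index` keeps each vector aligned with its input text when a request carries more than one line.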
Dimensions and data type
As the preceding response example indicates, a vector is a list of numbers, that is, an array that represents the features of the text. The number of dimensions is the number of elements in this array: a vector with 100 elements has 100 dimensions. For example, a vector returned by the DashScope text-embedding-v1 model contains 1,536 elements, so it has 1,536 dimensions. The number of dimensions is the same for every output of this model. The following table describes basic information about the DashScope text-embedding-v1 model. For more information, see the Quick start topic of this model.
Model name | Model version | Vector dimensions | Maximum number of lines per request | Maximum number of characters per line | Supported languages |
DashScope Text Embedding | text-embedding-v1 | 1,536 | 25 | 2,048 | Chinese, English, Spanish, French, Portuguese, and Indonesian |
The data type of a vector refers to the data type of its elements. For example, all elements in a vector returned by the DashScope text-embedding-v1 model are of the float type, so the data type of the vector is float. Similarly, the vector [1,2,3,4] consists of integers only, so it is a 4-dimensional vector of the int type.
The dimension and data type of a vector vary based on the embedding model.
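The two properties can be checked with a few lines of plain Python. The sketch below assumes the vector is an ordinary Python list; `describe_vector` is a hypothetical helper introduced only for illustration.

```python
def describe_vector(vector):
    """Return (number of dimensions, element type name) for a list of numbers."""
    if not vector:
        raise ValueError("vector must not be empty")
    element_types = {type(x) for x in vector}
    if len(element_types) != 1:
        raise TypeError(f"mixed element types: {element_types}")
    return len(vector), element_types.pop().__name__


print(describe_vector([1, 2, 3, 4]))         # (4, 'int')
print(describe_vector([0.09, 2.41, -1.89]))  # (3, 'float')
```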
Distance metric
Because a vector is an array of numbers, the similarity between two vectors is usually measured by the distance between them. DashVector supports three typical distance metrics.
Cosine distance
Cosine similarity is the cosine of the angle between two vectors and is calculated as follows:

$$
\text{cosine similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
$$

Here, A and B are two different vectors, n indicates the vector dimension, · indicates the dot product of vectors, and ||A|| and ||B|| indicate the magnitudes of the two vectors.
DashVector uses the cosine distance to indicate similarity: cosine distance = 1 - cosine similarity. The valid range of the cosine distance is [0, 2], where a smaller value indicates a greater similarity.
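The cosine distance described above can be sketched in plain Python as follows. This is an illustrative implementation of the definition, not DashVector's internal code, and `cosine_distance` is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1, 0], [1, 0]))   # same direction: 0.0
print(cosine_distance([1, 0], [0, 1]))   # orthogonal: 1.0
print(cosine_distance([1, 0], [-1, 0]))  # opposite direction: 2.0
```

The three calls cover both ends and the midpoint of the [0, 2] range.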
Euclidean distance
The Euclidean distance is the straight-line distance between two vectors. A shorter Euclidean distance indicates a greater similarity. The Euclidean distance is calculated as follows:

$$
d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}
$$

Here, A and B are two different vectors, and n indicates the vector dimension.
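A minimal plain-Python sketch of the Euclidean distance follows. `euclidean_distance` is a hypothetical name used for illustration. On Python 3.8 and later, the standard library function `math.dist` computes the same value.

```python
import math


def euclidean_distance(a, b):
    """Return the straight-line distance between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```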
Dot product
Dot product similarity refers to the dot product, also called the scalar product, of two vectors. A larger dot product indicates a greater similarity between the two vectors. The dot product is calculated as follows:

$$
A \cdot B = \sum_{i=1}^{n} A_i B_i
$$

Here, A and B are two different vectors, and n indicates the vector dimension.
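The dot product can be sketched the same way. `dot_product` is a hypothetical helper name.

```python
def dot_product(a, b):
    """Return the sum of the element-wise products of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

Unlike the cosine distance, the dot product depends on the magnitudes of the vectors as well as on the angle between them, so scaling a vector changes the result.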
Common models and vector parameters
Model type | Vector dimensions | Data type | Recommended distance metric |
DashScope text-embedding-v1 | 1,536 | Float(32) | Cosine |
DashScope multimodal-embedding-one-peace-v1 | 1,536 | Float(32) | Cosine |
OpenAI Embedding | 1,536 | Float(32) | Cosine |
When creating a collection, you can select parameters based on the model you use.
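One way to keep these choices in code is a small lookup table. The sketch below encodes the rows of the preceding table. The dictionary keys `dimension`, `dtype`, and `metric` and the function `collection_params` are illustrative names chosen for this example; check the DashVector SDK reference for the exact parameter names accepted when creating a collection.

```python
# Values copied from the "Common models and vector parameters" table.
MODEL_PARAMS = {
    "DashScope text-embedding-v1": {
        "dimension": 1536, "dtype": float, "metric": "cosine",
    },
    "DashScope multimodal-embedding-one-peace-v1": {
        "dimension": 1536, "dtype": float, "metric": "cosine",
    },
    "OpenAI Embedding": {
        "dimension": 1536, "dtype": float, "metric": "cosine",
    },
}


def collection_params(model_name):
    """Return a copy of the vector parameters recorded for a model."""
    try:
        return dict(MODEL_PARAMS[model_name])
    except KeyError:
        raise ValueError(f"no parameters recorded for model: {model_name}")


params = collection_params("DashScope text-embedding-v1")
print(params["dimension"], params["metric"])
```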