Vector Retrieval Service: Vector introduction

Last Updated: Aug 20, 2024

This topic describes the basic concepts of a vector, including the number of dimensions, distance metric, and data type, helping you better use DashVector.

Basic concepts

In the AI field, a vector is an abstract representation of the features of an object. Take the DashScope text-embedding-v1 model as an example. When the model receives a piece of input text, it converts the text into a vector. This conversion process is called embedding.

Request example

Input text: The quality of the clothes is simply outstanding, and they look fabulous. The wait was completely worth it, and I am absolutely delighted with my purchase. I will definitely be a repeat customer here.

import dashscope
from dashscope import TextEmbedding

dashscope.api_key = 'YOUR_API_KEY'  # replace with your DashScope API key

def embed_with_str():
    resp = TextEmbedding.call(
        model=TextEmbedding.Models.text_embedding_v1,
        input='The quality of the clothes is simply outstanding, and they look fabulous. The wait was completely worth it, and I am absolutely delighted with my purchase. I will definitely be a repeat customer here.')
    print(resp)


if __name__ == '__main__':
    embed_with_str()

Response example

{
  "status_code": 200,
  "request_id": "617b3670-6f9e-9f47-ad57-997ed8aeba6a",
  "code": "",
  "message": "",
  "output": {
    "embeddings": [
      {
        "embedding": [
          0.09393704682588577,
          2.4155092239379883,
          -1.8923076391220093,
          .,
          .,
          .

        ],
        "text_index": 0
      }
    ]
  },
  "usage": {
    "total_tokens": 23
  }
}

The value of the embedding field in the response is a vector.

 [
    0.09393704682588577,
    2.4155092239379883,
    -1.8923076391220093,
    .,
    .,
    .
]
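
As a rough illustration of reading that field, the sketch below parses a response shaped like the preceding example with Python's standard json module. The payload is shortened to the three values shown in this topic (the elided elements are not filled in), and the helper name extract_embedding is hypothetical, not part of the DashScope SDK.

```python
import json

# Shortened copy of the response example: only the three values shown in this
# topic are included; a real text-embedding-v1 vector has many more elements.
SAMPLE_RESPONSE = """
{
  "status_code": 200,
  "output": {
    "embeddings": [
      {
        "embedding": [0.09393704682588577, 2.4155092239379883, -1.8923076391220093],
        "text_index": 0
      }
    ]
  },
  "usage": {"total_tokens": 23}
}
"""


def extract_embedding(response: dict, index: int = 0) -> list:
    """Return the vector whose text_index equals `index` (hypothetical helper)."""
    if response.get("status_code") != 200:
        raise ValueError(f"request failed: {response.get('message')}")
    for item in response["output"]["embeddings"]:
        if item["text_index"] == index:
            return item["embedding"]
    raise KeyError(f"no embedding with text_index {index}")


vector = extract_embedding(json.loads(SAMPLE_RESPONSE))
print(len(vector), vector[0])
```

Matching on text_index rather than list position is a design choice for requests that carry several input lines.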

Dimensions and data type

As the preceding response example shows, a vector is a list of numbers, that is, an array that represents features of the text. The number of dimensions is the number of elements in this array: a vector with 100 elements has 100 dimensions. For example, a vector returned by the DashScope text-embedding-v1 model contains 1,536 elements, so it has 1,536 dimensions, and this number is the same for every output of the model. The following table describes basic information about the DashScope text-embedding-v1 model. For more information, see the Quick start topic of this model.

| Model name | Model version | Vector dimensions | Maximum number of lines per request | Maximum number of characters per line | Supported languages |
| --- | --- | --- | --- | --- | --- |
| DashScope Text Embedding | text-embedding-v1 | 1,536 | 25 | 2,048 | Chinese, English, Spanish, French, Portuguese, and Indonesian |

The data type of a vector is the data type of its elements. For example, all elements in a vector returned by the DashScope text-embedding-v1 model are of the float type, so the data type of the vector is float. As a simpler example, the vector [1,2,3,4] consists of integers only, so it is a 4-dimensional vector of the int type.

Note

The dimension and data type of a vector vary based on the embedding model.
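
The two properties can be checked on any list of numbers. The following is a minimal sketch in plain Python; the function name describe_vector is hypothetical and is used only for illustration.

```python
def describe_vector(vector):
    """Return (number of dimensions, element type name) for a list of numbers."""
    if not vector:
        raise ValueError("an empty list has no element type")
    element_types = {type(x) for x in vector}
    if len(element_types) != 1:
        names = sorted(t.__name__ for t in element_types)
        raise TypeError(f"mixed element types: {names}")
    return len(vector), element_types.pop().__name__


print(describe_vector([1, 2, 3, 4]))         # (4, 'int')
print(describe_vector([0.09, 2.41, -1.89]))  # (3, 'float')
```

Rejecting mixed element types is an assumption of this sketch; it mirrors the idea that a vector has a single data type.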

Distance metric

As a vector is an array, the similarity between vectors is usually measured by their distance. DashVector supports three typical distance metrics.

Cosine distance

Cosine similarity is measured by the cosine of the angle between two vectors as follows.

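$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$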

Here, A and B are two different vectors, n indicates the vector dimension, · indicates the dot product of vectors, and ||A|| and ||B|| indicate magnitudes of the two vectors respectively.


DashVector uses the cosine distance to indicate the similarity, specifically, cosine distance = 1 - cosine similarity. The valid range of the cosine distance is [0, 2], where a smaller value indicates a greater similarity. The following formula shows how the cosine distance is calculated.

$$\text{cosine distance}(A, B) = 1 - \frac{A \cdot B}{\|A\| \, \|B\|}$$
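
To make the [0, 2] range concrete, here is a minimal pure-Python sketch of cosine distance as defined in this section (1 - cosine similarity). The function name cosine_distance and the zero-vector check are choices made for this illustration, not DashVector's implementation.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity, for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # same direction -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))   # orthogonal -> 1.0
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction -> 2.0
```

The first call shows that vector length does not matter for this metric: only the angle between the vectors does.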

Euclidean distance

The Euclidean distance is the straight-line distance between two vectors. A shorter Euclidean distance indicates a greater similarity. The Euclidean distance is calculated as follows.

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$$

Here, A and B are two different vectors, and n indicates the vector dimension.

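
The Euclidean distance described in this section can be sketched in a few lines of plain Python. The function name euclidean_distance is hypothetical; the example uses a 3-4-5 right triangle so the result is easy to verify by hand.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(euclidean_distance([1.0, 2.0], [1.0, 2.0]))  # 0.0 (identical vectors)
```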

Dot product

Dot product similarity refers to the dot product, also called scalar product, of two vectors. A larger dot product indicates a greater similarity between the target vectors. The dot product is calculated as follows:

$$A \cdot B = \sum_{i=1}^{n} A_i B_i$$

Here, A and B are two different vectors, and n indicates the vector dimension.

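
Unlike the two distances, a larger dot product means a closer match, so results are ranked in descending order. The sketch below illustrates this with made-up three-dimensional vectors; the names dot_product, query, and candidates are hypothetical.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 2.0, 3.0]
candidates = {
    "doc_a": [1.0, 0.0, 0.0],     # dot product with query: 1.0
    "doc_b": [0.0, 1.0, 1.0],     # dot product with query: 5.0
    "doc_c": [-1.0, -2.0, -3.0],  # dot product with query: -14.0
}

# Larger dot product = greater similarity, so sort in descending order.
ranked = sorted(candidates, key=lambda name: dot_product(query, candidates[name]), reverse=True)
print(ranked)  # ['doc_b', 'doc_a', 'doc_c']
```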

Common models and vector parameters

| Model type | Vector dimensions | Data type | Recommended distance metric |
| --- | --- | --- | --- |
| DashScope text-embedding-v1 | 1,536 | Float(32) | Cosine |
| DashScope multimodal-embedding-one-peace-v1 | 1,536 | Float(32) | Cosine |
| OpenAI Embedding | 1,536 | Float(32) | Cosine |

When creating a collection, you can select parameters based on the model you use.
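
One way to keep these choices consistent is to record them in a lookup keyed by model, as in the sketch below. The values are copied from the table of common models; the dictionary layout, the key names (dimension, dtype, metric), and the function name collection_params are assumptions made for illustration and are not taken from the DashVector SDK.

```python
# Values copied from the table of common models in this topic. The key names
# are illustrative assumptions, not a documented SDK schema.
MODEL_VECTOR_PARAMS = {
    "DashScope text-embedding-v1": {
        "dimension": 1536, "dtype": "float", "metric": "cosine",
    },
    "DashScope multimodal-embedding-one-peace-v1": {
        "dimension": 1536, "dtype": "float", "metric": "cosine",
    },
    "OpenAI Embedding": {
        "dimension": 1536, "dtype": "float", "metric": "cosine",
    },
}


def collection_params(model_name):
    """Return a copy of the recorded parameters for `model_name`."""
    try:
        return dict(MODEL_VECTOR_PARAMS[model_name])
    except KeyError:
        raise ValueError(f"no parameters recorded for {model_name!r}") from None


params = collection_params("DashScope text-embedding-v1")
print(params)
```

Returning a copy keeps callers from accidentally modifying the shared table.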