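Using the general text embedding model removes the need to deploy an embedding model and a vector database locally. Billing is per token, which helps keep costs low in the early stages of a project.
Introduction
General text embedding is a unified multilingual text embedding model built by Tongyi Lab on an LLM base. It covers many of the world's major languages and helps developers quickly convert text data into high-quality vectors.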
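Model name | Model identifier | Vector dimensions | Max texts (rows) per request | Max input tokens per row | Supported languages |
General text embedding | text-embedding-v3 | 1024/768/512 | 6 | 8192 | Chinese, English, Spanish, French, Portuguese, Indonesian, Japanese, Korean, German, Russian, and others (50+ languages) |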
Model | MTEB | MTEB (Retrieval task) | CMTEB | CMTEB (Retrieval task) |
text-embedding-v3 | 63.39 | 55.41 | 68.92 | 73.23 |
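The specification table above lists a limit of 6 texts per request, so a longer list has to be split before it is sent. The following is a minimal sketch of that splitting step only. The names `batch_texts` and `MAX_TEXTS_PER_REQUEST` are hypothetical helpers introduced for illustration and are not part of the DashScope SDK. The sketch does not check the 8192-token limit per row.

```python
from typing import Iterator, List, Sequence

# Assumption: taken from the "Max texts (rows) per request" column above.
MAX_TEXTS_PER_REQUEST = 6


def batch_texts(texts: Sequence[str],
                batch_size: int = MAX_TEXTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


if __name__ == "__main__":
    sample = [f"text {i}" for i in range(14)]
    for batch in batch_texts(sample):
        # Each batch could be passed as the input of one embedding request.
        print(len(batch), batch[0])
```

Each yielded batch can then be passed as the input of one request, and the results concatenated in order.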
Quick start
Prerequisites
Code examples
Set the API key
export DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
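Before sending a request, it can help to confirm that the variable exported above is actually visible to the Python process. The following is a minimal sketch using only the standard library. `load_api_key` is a hypothetical helper name, not an SDK function.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before calling the API")
    return key


if __name__ == "__main__":
    try:
        load_api_key()
        print("API key found")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the service.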
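Synchronous call example
A synchronous call sends the input text in a single request and returns the embedding result directly in the response. The Python example passes one string; the Java example passes a list of four strings.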
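import dashscope
from http import HTTPStatus

# Send requests to the international endpoint.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


def embed_with_str():
    # Embed a single string with text-embedding-v3.
    resp = dashscope.TextEmbedding.call(
        model=dashscope.TextEmbedding.Models.text_embedding_v3,
        input='衣服的质量杠杠的,很漂亮,不枉我等了这么久啊,喜欢,以后还来这里买')
    if resp.status_code == HTTPStatus.OK:
        print(resp)
    else:
        # On failure, report the error code and message from the response.
        print(f'Request failed: {resp.code} {resp.message}')


if __name__ == '__main__':
    embed_with_str()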
import java.util.Arrays;

import com.alibaba.dashscope.embeddings.TextEmbedding;
import com.alibaba.dashscope.embeddings.TextEmbeddingParam;
import com.alibaba.dashscope.embeddings.TextEmbeddingResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.Constants;

public final class Main {
    static {
        // Send requests to the international endpoint.
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    // Embed four lines of text in one synchronous request.
    public static void basicCall() throws ApiException, NoApiKeyException {
        TextEmbeddingParam param = TextEmbeddingParam
            .builder()
            .model("text-embedding-v3")
            .texts(Arrays.asList("风急天高猿啸哀", "渚清沙白鸟飞回", "无边落木萧萧下", "不尽长江滚滚来"))
            .build();
        TextEmbedding textEmbedding = new TextEmbedding();
        TextEmbeddingResult result = textEmbedding.call(param);
        System.out.println(result);
    }

    public static void main(String[] args) {
        try {
            basicCall();
        } catch (ApiException | NoApiKeyException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Synchronous call output
{
    "status_code": 200,
    "request_id": "617b3670-6f9e-9f47-ad57-997ed8aeba6a",
    "code": "",
    "message": "",
    "output": {
        "embeddings": [
            {
                "embedding": [
                    0.09393704682588577,
                    2.4155092239379883,
                    -1.8923076391220093,
                    ...
                ],
                "text_index": 0
            }
        ]
    },
    "usage": {
        "total_tokens": 23
    }
}
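The vectors in this output are nested under `output.embeddings`, with `text_index` identifying which input text each one belongs to. The following is a minimal sketch of pulling them out and comparing two of them. It assumes the response is available as a plain `dict` shaped like the JSON above. `extract_embeddings` and `cosine_similarity` are hypothetical helper names, not SDK functions.

```python
import math
from typing import Any, Dict, List, Sequence


def extract_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Return the vectors from a response dict, ordered by text_index."""
    if response.get("status_code") != 200:
        raise RuntimeError(
            f"request failed: {response.get('code')} {response.get('message')}")
    items = sorted(response["output"]["embeddings"],
                   key=lambda item: item["text_index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


if __name__ == "__main__":
    # Toy response with 2-dimensional vectors; real vectors are much longer.
    toy = {"status_code": 200, "code": "", "message": "",
           "output": {"embeddings": [
               {"embedding": [0.0, 1.0], "text_index": 1},
               {"embedding": [1.0, 0.0], "text_index": 0}]}}
    vectors = extract_embeddings(toy)
    print(cosine_similarity(vectors[0], vectors[1]))
```

Sorting by `text_index` keeps the vectors aligned with the order of the input texts even if the service returns them in a different order.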
Python example output
{
    "status_code": 200,
    "request_id": "e941b97b-3b82-9de5-9ef0-512815b9c0ca",
    "code": null,
    "message": "",
    "output": {
        "task_id": "0b8ccc25-dec8-4d5f-bb84-cfe2ea580f9d",
        "task_status": "SUCCEEDED",
        "url": "The embedding result file url.",
        "submit_time": "2023-09-07 10:22:52.459",
        "scheduled_time": "2023-09-07 10:22:52.481",
        "end_time": "2023-09-07 10:22:53.419"
    },
    "usage": {
        "total_tokens": 384
    }
}
Learn more
For more synchronous call code examples, see the synchronous API details.