Purchase an instance
To purchase an instance, see Purchase an OpenSearch Vector Search Edition instance.
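Configure the instance
On the details page of a newly purchased instance, the instance status is "Pending Configuration". The system automatically deploys an empty instance whose query nodes and data nodes match the purchased quantity and specifications. You then configure the instance in this order: table information > data synchronization > field configuration > index schema. Once index rebuilding completes, the instance can serve search requests.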
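1. Basic table information
In Table Management, click "Add Table". Enter a table name, set the number of data shards and the number of data update resources, select Vector (Image Search) as the scenario template, select Existing Vector Data for data processing, and then click Next.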
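Configuration notes:
Table name: user-defined.
Number of data shards: enter a positive integer no greater than 256. Sharding speeds up full builds and improves single-query performance. (On some existing instances, all index tables must still use the same shard count, or at least one index table must have a shard count of 1 while the remaining index tables share the same shard count.)
Number of data update resources: the number of resources used for data updates. By default, each index gets two free update resources of 4 cores and 8 GB each; resources beyond the free quota are billed. For details, see the billing documentation for Vector Search Edition on the international site.
Scenario template: Vector Search Edition has three built-in templates: General, Vector - Image Search, and Vector - Text Semantics.
To convert raw data into vector data, see the end-to-end image search solution.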
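The shard-count rules above can be checked locally before you fill in the console form. The following is a minimal sketch, not part of the OpenSearch product: the function names and the table-to-shard-count mapping are hypothetical, and the second function encodes one reading of the constraint that applies to some existing instances.

```python
from typing import Dict

MAX_SHARDS = 256  # upper bound stated in the configuration notes


def is_valid_shard_count(n: int) -> bool:
    """Return True if n is a positive integer no greater than 256."""
    return isinstance(n, int) and not isinstance(n, bool) and 1 <= n <= MAX_SHARDS


def satisfies_legacy_rule(shards_by_table: Dict[str, int]) -> bool:
    """Check the extra constraint described for some existing instances.

    Either every index table has the same shard count, or at least one
    table has exactly 1 shard and all remaining tables share one count.
    """
    counts = list(shards_by_table.values())
    if not all(is_valid_shard_count(c) for c in counts):
        return False
    if len(set(counts)) <= 1:
        return True  # all tables use the same shard count
    others = {c for c in counts if c != 1}
    return 1 in counts and len(others) == 1


if __name__ == "__main__":
    # Hypothetical table names, used only for illustration.
    print(satisfies_legacy_rule({"images": 4, "products": 4}))
    print(satisfies_legacy_rule({"images": 1, "products": 4, "users": 4}))
    print(satisfies_legacy_rule({"images": 2, "products": 4}))
```

Whether a given instance is subject to the legacy constraint is not something this sketch can detect; it only tests a proposed set of shard counts against both rules.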
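2. Data synchronization
Configure a data source. The currently supported data sources are MaxCompute and API push. This example uses MaxCompute: set the data source type to MaxCompute, then set Project, AccesskeyID, AccesskeySecret, Table, the partition key, and the timestamp. You can optionally enable "Automatic Index Rebuild". After completing the settings, you can run the validation; once it passes, click Next:
MaxCompute data source documentation
API data source documentation
OSS data source documentation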
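3. Field configuration
OpenSearch presets fields based on the scenario template you selected and automatically imports any fields found in the full data source into the field list:
Field settings: the "Vector - Image Search" template must contain at least four preset fields: id (primary key), vector (vector field), cate_id (category field), and vector_source_image (field that stores the image vector). The platform generates these by default: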
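Field configuration notes:
Required fields: the primary key field and the vector field. The primary key field must be of type int or string with the Primary Key option selected; the vector field must be of type float with the Vector Field option selected.
The vector field is a multi-value float field by default. Values are split on the default multi-value separator, the HA3 separator ^] (UTF encoding \x1D); you can also enter a custom multi-value separator.
If you do not need the system to generate vectors, delete the vector_source_image field or clear the "Requires Embedding" option for it.
Vector search imposes an ordering requirement on field definitions: create the fields in the order primary key field, namespace field (optional), vector field. (As shown in the figure above.)
When a field is missing from the data or is empty, the system fills in a default value automatically: 0 for numeric types and an empty string for the STRING type. Custom default values are supported.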
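To make the separator and default-value notes above concrete, here is a minimal sketch of how a record might be prepared on the client side. It is an illustration only: the helper names, the schema mapping, and the type labels ("INT", "FLOAT", "STRING") are assumptions rather than an OpenSearch API, and `fill_defaults` merely mirrors the documented server-side behavior so you can predict what will be indexed.

```python
from typing import Any, Dict, Iterable

# Default HA3 multi-value separator (displayed as ^] in the console).
HA3_SEPARATOR = "\x1d"


def serialize_vector(values: Iterable[float], sep: str = HA3_SEPARATOR) -> str:
    """Join vector components into one multi-value string."""
    return sep.join(repr(float(v)) for v in values)


def fill_defaults(doc: Dict[str, Any], schema: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of doc with missing or empty fields set to defaults.

    schema maps field name -> type label (hypothetical labels).
    Numeric types default to 0 and STRING defaults to "".
    """
    filled = dict(doc)
    for name, type_label in schema.items():
        if filled.get(name) is None:
            filled[name] = "" if type_label == "STRING" else 0
    return filled


if __name__ == "__main__":
    # Field order follows the notes: primary key first, vector field last.
    schema = {"id": "INT", "cate_id": "INT", "title": "STRING"}
    doc = fill_defaults({"id": 1}, schema)
    doc["vector"] = serialize_vector([0.5, 1, 0.25])
    print(doc)
```

If you configure a custom multi-value separator in the console, pass the same string as `sep` so that the data and the field definition agree.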
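4. Index schema
4.1. Vector index
OpenSearch automatically builds indexes on the primary key and vector fields, with index names identical to the field names. You only need to configure the vector index in the console:
Advanced configuration: the vector index requires its own parameters. For details, see the general vector index configuration.
The primary key field and the vector field are required; the namespace field is optional and can be left empty.
Only the three fixed fields can be selected; additional fields cannot be added.
The system fills in the vector index parameters automatically. If you have no special requirements, click "OK" to finish the configuration quickly.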
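Namespace field: if the instance engine version is vector service 1.0.2 or earlier, the namespace tag field does not support the string type; if the engine version is later than vector service 1.0.2, this restriction does not apply.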
5. Confirm creation
After the index configuration is complete, click Confirm to create the table.
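6. Change history
Under Instance Management > Change History > Data Source Changes, you can see all FSMs for table creation, index addition, and index rebuilding. When all of them have completed, the engine is set up and you can start running test queries: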
7. Query test
Sample query:
{
"vector": [0.0019676427,0.005902928,0.021644069,0.21644068,0.12199384,0.043288138,0.007870571,0.0,0.08460863,0.041320495,0.043288138,0.035417568,0.011805856,0.055093993,0.12592913,0.017708784,0.021644069,0.0019676427,0.0,0.0,0.0019676427,0.078705706,0.1987319,0.041320495,0.039352853,0.0039352854,0.007870571,0.0039352854,0.0039352854,0.017708784,0.035417568,0.06886749,0.0019676427,0.0019676427,0.013773498,0.049191065,0.2125054,0.22824654,0.123961486,0.0039352854,0.0,0.0,0.021644069,0.14560555,0.078705706,0.1987319,0.22824654,0.005902928,0.064932205,0.0019676427,0.0019676427,0.021644069,0.027546996,0.035417568,0.22824654,0.22824654,0.1337997,0.023611711,0.009838213,0.007870571,0.0039352854,0.0039352854,0.017708784,0.20069954,0.033449925,0.005902928,0.019676426,0.035417568,0.015741142,0.029514639,0.13183205,0.123961486,0.029514639,0.0,0.027546996,0.22824654,0.15741141,0.0,0.0039352854,0.043288138,0.18889369,0.072802775,0.055093993,0.17315255,0.08460863,0.0019676427,0.007870571,0.035417568,0.22824654,0.10034977,0.009838213,0.021644069,0.062964566,0.027546996,0.015741142,0.04525578,0.086576276,0.033449925,0.023611711,0.017708784,0.0,0.0,0.03738521,0.072802775,0.16724962,0.035417568,0.031482283,0.20463483,0.043288138,0.011805856,0.0039352854,0.051158708,0.023611711,0.11412327,0.13183205,0.16134669,0.049191065,0.023611711,0.0039352854,0.0039352854,0.049191065,0.035417568,0.015741142,0.0039352854,0.03738521,0.08264099,0.094446845,0.021644069],
"topK": 10,
"includeVector": true
}
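vector: the vector to query
topK: the number of top results to return
includeVector: whether to return the vector information stored in the documents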
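The request body shown above can also be assembled programmatically. The sketch below builds the same three-key JSON structure; the function name, the default values, and the input checks are assumptions made for illustration, and the sketch deliberately stops short of sending the request because the endpoint and authentication details are not covered in this section.

```python
import json
from typing import List


def build_query(vector: List[float], top_k: int = 10,
                include_vector: bool = False) -> str:
    """Serialize a vector query body with the keys used in the sample.

    The defaults here are illustrative choices, not documented defaults.
    """
    if not vector:
        raise ValueError("vector must contain at least one component")
    if top_k <= 0:
        raise ValueError("topK must be a positive integer")
    body = {
        "vector": [float(v) for v in vector],  # the vector to query
        "topK": top_k,                         # number of results to return
        "includeVector": include_vector,       # return stored vectors or not
    }
    return json.dumps(body)


if __name__ == "__main__":
    # A short vector for readability; use your index's real dimension.
    print(build_query([0.0019676427, 0.005902928, 0.021644069],
                      top_k=10, include_vector=True))
```

The query vector should have the same number of components as the vectors in the index; the sample above uses 128.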
For the detailed query syntax, see the syntax description below.