AI algorithms can abstract unstructured data generated in the physical world by people, things, and scenes into multi-dimensional vectors. Such data includes speech, images, video, text, and behavior. The vectors act like coordinates in a mathematical space and identify entities and the relationships between them. The process of converting unstructured data into vectors is generally called embedding, and unstructured search is the process of searching the generated vectors to find the corresponding entities.
Unstructured search is, in essence, vector search. The technology is mainly applied in fields such as facial recognition, recommendation systems, image search, video fingerprinting, voice processing, natural language processing (NLP), and file search. With the wide adoption of AI and the continuous growth of data volumes, vector search has become an indispensable part of the AI technology chain and a supplement to traditional search technology. Vector search also supports multi-modal search.
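To make the idea concrete, the following is a minimal sketch of vector search by brute-force cosine similarity in plain Python. The three-dimensional vectors and entity names are made up for illustration, and this is not how OpenSearch implements retrieval internally; production systems typically rely on approximate nearest neighbor indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def search(query, corpus, top_k=2):
    """Score every stored vector against the query and keep the best top_k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: each entity is identified by its coordinates.
corpus = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "invoice_scan": [0.0, 0.2, 0.9],
}

results = search([1.0, 0.0, 0.0], corpus)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Entities whose vectors point in a similar direction to the query rank first, which is the behavior that embedding-based search relies on.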
To meet the requirements of increasingly diverse and complex multi-modal search scenarios, OpenSearch provides a vector search feature that lets you build a high-performance vector search system in one stop.
Product Type: You can set this parameter to Pay-as-you-go when you purchase the instance for testing only.
Region and Zone: Set this parameter to China (Hangzhou) (customizable).
Application Name: Set this parameter to test_vector_opensearch (customizable).
Industry Type: Set this parameter to General Industry.
Specifications: Set this parameter to Exclusive Computing, with 30 GB selected for Storage Capacity and 1000 LCU selected for Computing Resource. Then, click Buy Now.
Step 3: On the displayed Confirm Order page, select I have read and agree to Open Search (Subscription) on International Site Agreement of Service. Then, click Activate Now to confirm the order.
The OpenSearch instance is created.
3. Configure a vector retrieval service instance
On the Configure Application page, set the parameters in the Feature Selection, Application Schema, Index Schema, Data Source, and Complete steps in sequence.
3.1. Application Schema
Find the corresponding application on the Applications page of OpenSearch console and click Configure in the Actions column.
Step 1: Create an application schema.
You can manually create an application schema, or create an application schema by connecting to a data source, by uploading templates, or by uploading files. For example, to select MaxCompute as the data source, select Use Data Source for Application Schema Creation Method and then select MaxCompute for Use Data Source. Then, click Create Database.
Enter the database connection information.
Step 2: Select a table and click OK.
Step 3: Select a primary table and a primary key. If you need to join multiple tables, see Join multiple tables.
Note: The vector field must be set to the DOUBLE_ARRAY type.
3.2. Index Schema
Step 1: Index fields
After the application schema is configured, the system automatically generates the index fields, analyzers, index tags, and contained fields.
Note: You need to configure the vector index for the vector field (vector_field). You can select dimensions based on your needs. By default, OpenSearch supports 64-, 128-, 256-, 512-, and 1536-dimensional vectors.
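Because the vector field must use one of the supported dimensions, it can help to validate vectors on the client side before importing them. The following is a minimal Python sketch; the function name validate_vector is hypothetical and not part of any OpenSearch SDK, and the set of dimensions is copied from the note above.

```python
# Dimensions listed in the note above as supported by default.
SUPPORTED_DIMENSIONS = {64, 128, 256, 512, 1536}


def validate_vector(vector):
    """Check the dimension and return the values as floats (hypothetical helper)."""
    dimension = len(vector)
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"unsupported dimension {dimension}; "
            f"expected one of {sorted(SUPPORTED_DIMENSIONS)}"
        )
    # A DOUBLE_ARRAY field holds floating-point values, so normalize ints too.
    return [float(value) for value in vector]


checked = validate_vector([0] * 64)
print(len(checked), type(checked[0]).__name__)
```

Rejecting a vector early is usually cheaper than diagnosing a failed import afterward.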
Step 2: Attribute Field List
3.3. Data Source
If you select a MaxCompute data source when you configure an application schema, the corresponding project table is automatically mapped. You only need to specify the corresponding import conditions as required. By default, all partition data of the table is imported.
If a field name in the data source table differs from the corresponding field name in the application schema you configured, click Modify to adjust the field mapping manually.
Confirm and click Finish.
3.4. Configuration completed
4. Online query
For more information about the vector query syntax, click Vector search.
# A 1536-dimensional vector is used as an example.
vector_index:'-0.01786,0.03692,0.03710,0.01668,0.03655,-0.03515,0.02017,-0.00653,-0.01419,-0.01708,-0.00091,-0.03528,0.02821,-0.02194,-0.01609,-0.02045,0.02209,0.06413,0.06233,0.03064,-0.00863,-0.06810,0.00729,0.07912,-0.03948,0.06932,0.02051,-0.00688,-0.01138,0.03207,0.03040,-0.00050,0.06220,-0.03895,0.04575,-0.00259,0.04358,0.02027,0.03342,-0.02916,0.04793,-0.02954,0.04327,0.06156,-0.00230,0.00653,0.01515,-0.00287,0.03546,-0.01551,-0.03049,0.07542,-0.01563,0.00680,0.00598,-0.00396,0.00330,0.00359,-0.03395,-0.00825,-0.02175,0.04479,0.04008,0.03558,-0.03011,-0.00015,0.03086,-0.00941,0.03113,0.00758,-0.04333,0.04607,-0.02520,-0.01260,-0.04726,0.00564,-0.02423,-0.00439,-0.02739,-0.01674,0.06426,-0.05995,0.01762,0.04370,0.02211,-0.03174,0.04465,0.00475,-0.03577,0.01111,-0.00963,0.03510,-0.02533,-0.00444,0.00161,0.00561,0.00066,-0.04074,0.00682,0.03293,-0.01630,-0.02575,0.02834,0.02679,-0.04558,0.02395,0.00531,0.01240,0.04064,0.03599,0.00172,0.00413,-0.06839...&sf=0.8'
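A clause like the one above can be generated from an embedding instead of being typed by hand. The following Python sketch assumes only the format visible in the example: the index name, a colon, and a single-quoted, comma-separated vector followed by &sf= and a value. The helper name build_vector_clause is hypothetical, and the exact meaning of sf and any other options should be checked against the Vector search syntax documentation.

```python
def build_vector_clause(index_name, vector, sf=None, precision=5):
    """Format a vector as index_name:'v1,v2,...&sf=value' (hypothetical helper)."""
    values = ",".join(f"{value:.{precision}f}" for value in vector)
    # Append the sf option only when the caller supplies one.
    suffix = f"&sf={sf}" if sf is not None else ""
    return f"{index_name}:'{values}{suffix}'"


# A three-value vector keeps the example short; a real query would pass
# as many values as the dimension configured for the vector index.
clause = build_vector_clause("vector_index", [-0.01786, 0.03692, 0.0371], sf=0.8)
print(clause)
```

Fixing the number of decimal places keeps the generated clause compact and consistent with the five-decimal values shown in the example.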