Platform For AI: Use EAS and OpenSearch to deploy a RAG-based LLM chatbot

Last Updated: Nov 12, 2024

You can integrate OpenSearch Vector Search Edition with your Retrieval-Augmented Generation (RAG)-based large language model (LLM) chatbot to improve the accuracy and information richness of generated responses. OpenSearch Vector Search Edition supports a variety of vector retrieval algorithms, delivers high performance in typical scenarios, and provides a graphical interface for viewing index information and performing simple data management, which improves the retrieval efficiency and user experience of your RAG-based chatbot. This topic describes how to associate OpenSearch with a RAG-based LLM chatbot during deployment, the basic features of the chatbot, and the special features provided by OpenSearch Vector Search Edition.

Background information

Introduction to EAS

Elastic Algorithm Service (EAS) is an online model service platform of PAI that allows you to deploy models as online inference services or AI-powered web applications. EAS provides features such as auto scaling and blue-green deployment. These features reduce the costs of developing stable online model services that can handle a large number of concurrent requests. In addition, EAS provides features such as resource group management and model versioning and capabilities such as comprehensive O&M and monitoring. For more information, see EAS overview.

Introduction to RAG

With the rapid development of AI technology, generative AI has made remarkable achievements in fields such as text generation and image generation. However, the following inherent limitations have emerged as LLMs see wide use:

  • Domain knowledge limitations: LLMs are typically trained on large-scale general-purpose datasets, so they struggle to provide in-depth, targeted answers for specialized vertical fields.

  • Information update delay: The static nature of the training datasets prevents LLMs from accessing and incorporating real-time information and knowledge updates.

  • Misleading outputs: LLMs are prone to hallucinations, producing outputs that appear plausible but are factually incorrect. This is attributed to factors such as data bias and inherent model limits.

To address these challenges and enhance the capabilities and accuracy of LLMs, RAG was developed. RAG integrates external knowledge bases to significantly mitigate LLM hallucinations and to improve the ability of LLMs to access and apply up-to-date knowledge. This enables the customization of LLMs for greater personalization and accuracy.
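To make the retrieval and generation stages concrete, the following minimal Python sketch outlines the RAG loop. This is a conceptual sketch, not part of any SDK: the embed, search, and generate callables are placeholders for the embedding model, the vector database, and the LLM that you deploy.

    from typing import Callable, List, Sequence

    def rag_answer(
        query: str,
        embed: Callable[[str], Sequence[float]],
        search: Callable[[Sequence[float], int], List[str]],
        generate: Callable[[str], str],
        top_k: int = 5,
    ) -> str:
        """Minimal RAG loop with placeholder callables for the embedding
        model (embed), the vector database (search), and the LLM (generate)."""
        query_vector = embed(query)           # 1. Embed the user query.
        chunks = search(query_vector, top_k)  # 2. Retrieve the top-k similar chunks.
        context = "\n\n".join(chunks)         # 3. Assemble the retrieved context.
        prompt = (                            # 4. Fill a prompt template.
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        return generate(prompt)               # 5. Generate the grounded answer.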

Introduction to OpenSearch

OpenSearch Vector Search Edition is a large-scale distributed search engine that is developed by Alibaba Cloud. OpenSearch Vector Search Edition supports a variety of vector retrieval algorithms and provides high performance in high-precision scenarios. OpenSearch Vector Search Edition provides the following features: cost-effective vector index building and similarity retrieval for large amounts of data, horizontal index expansion and merging, streaming index building, and real-time update of data.

OpenSearch Vector Search Edition can provide high performance in the following typical scenarios: RAG, multimodal retrieval, and personalized recommendations. For more information, see What is OpenSearch Vector Search Edition?

Procedure

EAS provides a self-developed, end-to-end RAG solution with flexible parameter configuration. You can access RAG services through a web user interface (UI) or by calling API operations to configure a custom RAG-based LLM chatbot. The technical architecture of RAG centers on retrieval and generation.

  • Retrieval: EAS integrates a range of vector databases, including open source Faiss and Alibaba Cloud services such as Milvus, Elasticsearch, Hologres, OpenSearch, and AnalyticDB for PostgreSQL.

  • Generation: EAS supports various open source models such as Qwen, Meta Llama, Mistral, and Baichuan, and also integrates ChatGPT.

This example shows how to use EAS and OpenSearch Vector Search Edition to deploy a RAG-based LLM chatbot. The overall procedure takes about 20 minutes.

  1. Prepare an OpenSearch Vector Search Edition instance

    Create an OpenSearch Vector Search Edition instance and prepare the configuration items that the RAG-based LLM chatbot requires to connect to the instance.

  2. Deploy a RAG-based LLM chatbot and associate it with the OpenSearch Vector Search Edition instance

    Deploy a RAG-based LLM chatbot and associate it with the OpenSearch Vector Search Edition instance on EAS.

  3. Use the RAG-based LLM chatbot

    You can connect to the OpenSearch Vector Search Edition instance in the RAG-based LLM chatbot, upload business data files, and then perform knowledge Q&A.

Prerequisites

A virtual private cloud (VPC), a vSwitch, and a security group are created. For more information, see Create a VPC with an IPv4 CIDR block and Create a security group.

Precautions

Due to the token limit of the LLM service, this practice is designed to help you understand the basic retrieval feature of a RAG-based LLM chatbot.

  • The chatbot is limited by the server resources of the LLM service and the default token limit. The length of the conversations that the chatbot supports is also limited.

  • If you do not need multiple rounds of conversation, we recommend that you disable the with chat history feature of the chatbot on the WebUI page. This effectively reduces the possibility of reaching the token limit. For more information, see How do I disable the with chat history feature of the RAG-based chatbot?

Prepare an OpenSearch Vector Search Edition instance

Step 1: Create an OpenSearch Vector Search Edition instance

  1. Log on to the OpenSearch console. In the upper-left corner, switch to OpenSearch Vector Search Edition.

  2. On the Instance Management page, click Create Instance. The following list describes the key parameters. For information about other parameters, see Purchase an OpenSearch Vector Search Edition instance.

    • Service Edition: Select Vector Search Edition.

    • VPC and VSwitch: Select a virtual private cloud (VPC) and a vSwitch.

    • Username: The username of the OpenSearch Vector Search Edition instance.

    • Password: The password of the OpenSearch Vector Search Edition instance.

Step 2: Prepare configuration items

1. Prepare the ID of the instance

On the Instances page, find the desired instance, view the ID of the OpenSearch Vector Search Edition instance, and save the ID to your on-premises machine.


2. Prepare an index table

After you create an instance, the instance status changes to Pending Configuration. You must complete the Basic Table Information > Data Synchronization > Field Configuration > Index Schema configuration for the instance and then wait until the index is built. Perform the following steps to add a table:

  1. Click Configure in the Actions column of the instance.

  2. Configure the Basic Table Information parameters and click Next.

    Take note of the following key parameters. For information about other parameters, see Getting started for common scenarios.

    • Table Name: Enter a table name.

    • Data Shards: If you have purchased query nodes, you can set the value to a positive integer that is less than or equal to 256. This accelerates index building and improves query performance. If no query nodes are purchased, the value must be 1.

    • Number of Resources for Data Updates: the number of resources used for data updates. By default, OpenSearch provides a free quota of two resources for data updates for each data source in an OpenSearch Vector Search Edition instance. Each resource consists of 4 CPU cores and 8 GB of memory. You are charged for resources that exceed the free quota. For more information, see Billing overview of OpenSearch Vector Search Edition.

    • Scenario Template: Select Common Template.

  3. Configure the Data Synchronization parameters and click Next.

    Select a Full Data Source based on your business requirements:

    • MaxCompute + API: Use a MaxCompute data source as the full data source and use APIs to push incremental data. For information about other parameters, see Create a table for a MaxCompute data source.

    • OSS + API: Use OSS as the full data source and use APIs to push incremental data. For information about other parameters, see Create a table for an OSS data source.

    • API: Use APIs as the full and incremental data source.

  4. Configure the Field Configuration parameters and click Next.

    Save the following sample code as a JSON file. Then, click Import Field Index Schema in the upper-right corner and import the JSON file. The fields and index schema are configured based on the file. For a sketch of a data record that matches this schema, see the example after this procedure.

    Field Index Schema

    {
    	"schema": {
    		"summarys": {
    			"parameter": {
    				"file_compressor": "zstd"
    			},
    			"summary_fields": [
    				"id",
    				"embedding",
    				"file_path",
    				"file_name",
    				"file_type",
    				"node_content",
    				"node_type",
    				"doc_id",
    				"text",
    				"source_type"
    			]
    		},
    		"file_compress": [
    			{
    				"name": "file_compressor",
    				"type": "zstd"
    			},
    			{
    				"name": "no_compressor",
    				"type": ""
    			}
    		],
    		"indexs": [
    			{
    				"index_fields": [
    					{
    						"boost": 1,
    						"field_name": "id"
    					},
    					{
    						"boost": 1,
    						"field_name": "embedding"
    					}
    				],
    				"indexer": "aitheta2_indexer",
    				"index_name": "embedding",
    				"parameters": {
    					"enable_rt_build": "true",
    					"min_scan_doc_cnt": "20000",
    					"vector_index_type": "Qc",
    					"major_order": "col",
    					"builder_name": "QcBuilder",
    					"distance_type": "SquaredEuclidean",
    					"embedding_delimiter": ",",
    					"enable_recall_report": "true",
    					"ignore_invalid_doc": "true",
    					"is_embedding_saved": "false",
    					"linear_build_threshold": "5000",
    					"dimension": "1536",
    					"rt_index_params": "{\"proxima.oswg.streamer.segment_size\":2048}",
    					"search_index_params": "{\"proxima.qc.searcher.scan_ratio\":0.01}",
    					"searcher_name": "QcSearcher",
    					"build_index_params": "{\"proxima.qc.builder.quantizer_class\":\"Int8QuantizerConverter\",\"proxima.qc.builder.quantize_by_centroid\":true,\"proxima.qc.builder.optimizer_class\":\"BruteForceBuilder\",\"proxima.qc.builder.thread_count\":10,\"proxima.qc.builder.optimizer_params\":{\"proxima.linear.builder.column_major_order\":true},\"proxima.qc.builder.store_original_features\":false,\"proxima.qc.builder.train_sample_count\":3000000,\"proxima.qc.builder.train_sample_ratio\":0.5}"
    				},
    				"index_type": "CUSTOMIZED"
    			},
    			{
    				"has_primary_key_attribute": true,
    				"index_fields": "id",
    				"is_primary_key_sorted": false,
    				"index_name": "id",
    				"index_type": "PRIMARYKEY64"
    			},
    			{
    				"index_fields": "file_path",
    				"index_name": "file_path",
    				"index_type": "STRING"
    			},
    			{
    				"index_fields": "file_name",
    				"index_name": "file_name",
    				"index_type": "STRING"
    			},
    			{
    				"index_fields": "file_type",
    				"index_name": "file_type",
    				"index_type": "STRING"
    			},
    			{
    				"index_fields": "node_content",
    				"index_name": "node_content",
    				"index_type": "STRING"
    			},
    			{
    				"index_fields": "node_type",
    				"index_name": "node_type",
    				"index_type": "STRING"
    			},
    			{
    				"index_fields": "doc_id",
    				"index_name": "doc_id",
    				"index_type": "STRING"
    			},
    			{
    				"index_fields": "text",
    				"index_name": "text",
    				"index_type": "STRING"
    			},
    			{
    				"index_fields": "source_type",
    				"index_name": "source_type",
    				"index_type": "STRING"
    			}
    		],
    		"attributes": [
    			{
    				"file_compress": "no_compressor",
    				"field_name": "id"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "embedding"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "file_path"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "file_name"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "file_type"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "node_content"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "node_type"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "doc_id"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "text"
    			},
    			{
    				"file_compress": "no_compressor",
    				"field_name": "source_type"
    			}
    		],
    		"fields": [
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "id"
    			},
    			{
    				"user_defined_param": {
    					"multi_value_sep": ","
    				},
    				"multi_value": true,
    				"compress_type": "uniq",
    				"field_type": "FLOAT",
    				"field_name": "embedding"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "file_path"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "file_name"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "file_type"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "node_content"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "node_type"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "doc_id"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "text"
    			},
    			{
    				"compress_type": "uniq",
    				"field_type": "STRING",
    				"field_name": "source_type"
    			}
    		],
    		"table_name": "abc"
    	},
    	"extend": {
    		"description": [],
    		"vector": [
    			"embedding"
    		],
    		"embeding": []
    	}
    }
  5. Configure the Index Schema parameters and click Next.

    Take note of the following key parameters. For information about other parameters, see Common configurations of vector indexes.

    • Vector Dimension: Set the value to the output dimension of the embedding model that your RAG service uses. The sample schema in the previous step sets the dimension to 1536.

    • Distance Type: We recommend that you select InnerProduct. Note that the sample schema in the previous step uses SquaredEuclidean; keep this value consistent with your index schema.

  6. In the Confirm step, click Confirm.

    The Table Management page appears. When the Status parameter changes to In Use, the table is created.
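For reference, the following Python sketch builds a record that matches the sample field schema imported in step 4. All field values are illustrative placeholders. Note that the embedding field is serialized as a comma-separated string (matching the embedding_delimiter in the schema) and that the number of values must equal the configured vector dimension (1536 in the sample schema).

    import json

    # Illustrative record that matches the sample field schema; all values are
    # placeholders. The embedding is serialized as a comma-separated string.
    embedding = [0.0] * 1536  # placeholder; use your embedding model's output
    record = {
        "id": "doc-0001-chunk-0",                        # primary key
        "embedding": ",".join(str(v) for v in embedding),
        "file_path": "upload/rag_chatbot_test_doc.txt",  # illustrative value
        "file_name": "rag_chatbot_test_doc.txt",
        "file_type": "txt",
        "node_content": "{}",                            # illustrative chunk metadata
        "node_type": "text",
        "doc_id": "doc-0001",
        "text": "The chunk text that is retrieved at query time.",
        "source_type": "upload",                         # illustrative value
    }
    print(json.dumps(record)[:120])  # preview the serialized record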

3. Configure Internet access for the OpenSearch Vector Search Edition instance

EAS can access OpenSearch only over the Internet. Therefore, you must attach an Internet NAT gateway and an elastic IP address (EIP) to the VPC that is associated with EAS, enable Internet access for the OpenSearch Vector Search Edition instance, and add the EIP to the instance whitelist. Perform the following steps to configure Internet access between the OpenSearch Vector Search Edition instance and the VPC associated with EAS. The VPC associated with EAS does not need to be the VPC in which the OpenSearch Vector Search Edition instance resides.

  1. Configure Internet access for the VPC associated with EAS. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.

  2. View the associated EIP.

    1. Log on to the VPC console. Find the desired VPC and click the ID in the Instance ID/Name column of the VPC. On the Basic Information tab of the VPC details page, click the Resource Management tab.

    2. In the Access to Internet section of the tab that appears, click the associated Internet NAT gateway.

    3. On the Internet NAT Gateway page, find the desired Internet NAT gateway and click the ID in the Instance ID/Name column of the Internet NAT gateway.

    4. On the Basic Information tab of the details page for the Internet NAT gateway, click the Associated EIP tab to view the associated EIP and save it to your on-premises machine.

  3. On the Instances page of OpenSearch Vector Search Edition, find the desired instance and click the name in the Instance Name/ID column of the instance. The Instance Information page appears.

  4. In the Network Information section, turn on Public Access. In the Modify Public Access Whitelist panel, add the EIP that you obtained in the previous step to the whitelist as prompted.

  5. In the Network Information section, save the Public Endpoint to your on-premises machine.

4. View the username and password of the OpenSearch Vector Search Edition instance

In the API Endpoint section, view the username and password that you configured when you created the OpenSearch Vector Search Edition instance.

Deploy a RAG-based LLM chatbot and associate it with the OpenSearch Vector Search Edition instance

  1. Log on to the PAI console. Select a region and a workspace. Then, click Enter Elastic Algorithm Service (EAS).

  2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click RAG-based Smart Dialogue Deployment.

  3. On the RAG-based LLM Chatbot Deployment page, configure the following key parameters. For information about other parameters, see Step 1: Deploy the RAG service.

    Basic Information

    • Model Source: Select Open Source Model.

    • Model Type: Select a model type. In this example, Qwen1.5-1.8b is used.

    Resource Configuration

    • Resource Configuration: The system recommends appropriate resource specifications based on the selected model type. If you use other resource specifications, the model service may fail to start.

    Vector Database Settings

    • Vector Database Type: Select OpenSearch.

    • Endpoint: Enter the public endpoint that you obtained in Step 2, without the http:// or https:// prefix. Example: ha-cn-****.public.ha.aliyuncs.com. To check that the endpoint is reachable, see the connectivity sketch after this procedure.

    • Instance ID: Enter the ID of the OpenSearch Vector Search Edition instance that you obtained in Step 2.

    • Username: Enter the username of the OpenSearch Vector Search Edition instance.

    • Password: Enter the password of the OpenSearch Vector Search Edition instance.

    • Table Name: Enter the name of the index table that you created in Step 2.

    VPC Configuration (Optional)

    • VPC, vSwitch, and Security Group Name: Select a VPC, a vSwitch, and a security group.

  4. After you configure the parameters, click Deploy.
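After you configure Internet access, you can optionally verify that the public endpoint is reachable before you test the chatbot. The following sketch performs a rough TCP-level check. The endpoint and port are assumptions to replace with your own values, and the check is meaningful only when run from an environment whose egress IP is in the instance's public access whitelist; a successful connection does not verify credentials.

    import socket

    # Hypothetical endpoint: replace with the public endpoint of your instance
    # (no http:// or https:// prefix). Port 80 is an assumption.
    ENDPOINT = "ha-cn-xxxxxxxxx.public.ha.aliyuncs.com"
    PORT = 80

    def endpoint_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
        """Return True if DNS resolves and a TCP connection can be opened."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print("reachable" if endpoint_reachable(ENDPOINT, PORT) else "blocked or unreachable")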

Use the RAG-based LLM chatbot

The following section describes how to use a RAG-based LLM chatbot. For more information, see RAG-based LLM chatbot.

1. Configure the RAG-based LLM chatbot

  1. After you deploy the RAG-based chatbot, click View Web App in the Service Type column to enter the web UI.

  2. Check whether the OpenSearch Vector Search Edition instance is connected.

    The system recognizes and applies the vector database settings that you configured when you deployed the chatbot. Click Connect OpenSearch to check whether the OpenSearch Vector Search Edition instance is connected. If the connection fails, check whether the vector database is correctly configured based on Step 2: Prepare configuration items. Then, click Connect OpenSearch to reconnect to the instance.

2. Upload business data files

Upload your knowledge base files. The system automatically stores the knowledge base in the vector database in the PAI-RAG format for retrieval. You can also use existing knowledge bases in the database, but they must meet the PAI-RAG format requirements. Otherwise, errors may occur during retrieval.


  1. On the Upload tab, configure the chunk parameters.

    The following parameters control the granularity of document chunking and whether to enable Q&A extraction. For an illustration of how the chunk parameters interact, see the sketch after this procedure.

    • Chunk Size: The size of each chunk. Unit: bytes. Default value: 500.

    • Chunk Overlap: The overlap between adjacent chunks. Default value: 10.

    • Process with QA Extraction Model: Specifies whether to extract Q&A information. If you select Yes, the system automatically extracts question-answer pairs after the knowledge files are uploaded. This way, more accurate answers are returned for data queries.

  2. On the Files tab or Directory tab, upload one or more business data files, or upload a directory that contains the business data files. Supported file types: .txt, .pdf, Excel (.xlsx or .xls), .csv, Word (.docx or .doc), Markdown, and .html. Example: rag_chatbot_test_doc.txt.

  3. Click Upload. Before the system uploads the business data files, it performs data cleansing on the files, including text extraction and hyperlink replacement, and then chunks them based on semantics.
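The following simplified Python sketch illustrates how Chunk Size and Chunk Overlap interact. The RAG service chunks documents based on semantics, so real chunk boundaries will differ; this fixed-size chunker only shows the effect of the two parameters.

    def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 10) -> list:
        """Split text into chunks of at most chunk_size characters, where each
        chunk shares its first `overlap` characters with the previous chunk."""
        if overlap >= chunk_size:
            raise ValueError("overlap must be smaller than chunk_size")
        step = chunk_size - overlap
        return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

    sample = "x" * 1200
    print([len(c) for c in fixed_size_chunks(sample)])  # [500, 500, 220]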

3. Perform knowledge Q&A

The RAG-based LLM chatbot inserts the results retrieved from the vector database, together with your query, into the selected prompt template and sends the prompt to the LLM to generate an answer.
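Besides the web UI, you can call the chatbot over HTTP like any EAS service by passing the service token in the Authorization header. The endpoint, token, request path, and payload below are illustrative assumptions, not the authoritative API; see the RAG-based LLM chatbot topic in the References section for the actual API operations.

    import json
    from urllib import request

    # Hypothetical values: copy the real endpoint and token from the service
    # details page in the EAS console. The /service/query path and the payload
    # shape are assumptions made for illustration.
    SERVICE_URL = "http://<service-name>.<region>.pai-eas.aliyuncs.com"
    TOKEN = "<eas-service-token>"

    def ask(question: str) -> str:
        body = json.dumps({"question": question}).encode("utf-8")
        req = request.Request(
            SERVICE_URL + "/service/query",  # assumed path; check the API reference
            data=body,
            headers={"Authorization": TOKEN, "Content-Type": "application/json"},
            method="POST",
        )
        with request.urlopen(req) as resp:   # EAS authenticates the token header
            return resp.read().decode("utf-8")

    print(ask("What does the uploaded document describe?"))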


Special features provided by OpenSearch

Alibaba Cloud OpenSearch provides a convenient graphical interface to efficiently manage tables and indexes. The following section describes how to use the OpenSearch console to view index information and implement simple data management.

Index table management

  1. Go to the Instance Details page of the OpenSearch Vector Search Edition instance.

    1. Log on to the OpenSearch Vector Search Edition console.

    2. Click the ID of the created instance to go to the Instance Details page.

  2. Go to the Table Management page and manage index tables.

    1. In the left-side navigation pane, click Table Management.

      All tables created in the current instance are displayed on the page.

    2. You can perform the following operations on the page: view fields, view the index schema, edit an index table, rebuild indexes, and delete an index table. For more information, see Table management.

Data management

  1. Go to the Instance Details page of the OpenSearch Vector Search Edition instance.

    1. Log on to the OpenSearch Vector Search Edition console.

    2. Click the ID of the created instance to go to the Instance Details page.

  2. Insert data.

    1. In the left-side navigation pane, choose Vector Management > Insert Data.

    2. Select Form Mode or Developer Mode from the drop-down list on the right.

    3. Select the table to which you want to insert data.

    4. In Form Mode, enter data field by field and click Add. In Developer Mode, enter a data write statement and click Create. For a sample write statement, see the sketch after this procedure. For more information, see Insert data.

      If "message": "success" is returned, the data is inserted.

  3. View table metrics.

    1. In the left-side navigation pane, choose Metric Monitoring > Table Metrics.

    2. Select the table that you want to view. You can view metrics such as DocCount and QPS. For more information, see Table metrics.

  4. Delete data.

    1. In the left-side navigation pane, choose Vector Management > Delete Data.

    2. Select Form Mode or Developer Mode from the drop-down list on the right.

    3. Select Table Name, enter the Primary Key, and then click Delete. For more information, see Delete data.

      If "message": "success" is returned, the data is deleted.

References

  • EAS provides simplified deployment methods for typical cutting-edge scenarios of AI-Generated Content (AIGC) and LLM. You can easily deploy model services by using deployment methods such as ComfyUI, Stable Diffusion WebUI, ModelScope, Hugging Face, Triton Inference Server, and TensorFlow Serving. For more information, see Scenario-based deployment.

  • You can configure various inference parameters on the web UI of a RAG-based LLM chatbot to meet diverse requirements. You can also use the RAG-based LLM chatbot by calling API operations. For more information about implementation details and parameter settings, see RAG-based LLM chatbot.

  • A RAG-based LLM chatbot can also be associated with other types of vector databases, such as Elasticsearch and ApsaraDB RDS for PostgreSQL. For more information, see Use EAS and Elasticsearch to deploy a RAG-based LLM chatbot or Use EAS and ApsaraDB RDS for PostgreSQL to deploy a RAG-based LLM chatbot.