Use EAS and Elasticsearch to Deploy a RAG-Based LLM Chatbot

Platform for AI (PAI) allows you to integrate Elasticsearch with a Retrieval-Augmented Generation (RAG)-based large language model (LLM) chatbot to improve the accuracy and diversity of the answers that the model generates. Elasticsearch provides efficient data retrieval and additional features such as dictionary configuration and index management, which help the chatbot better identify user requirements and return more relevant and useful answers. This article describes how to associate Elasticsearch with a RAG-based LLM chatbot when you deploy the chatbot. It also introduces the basic features of the chatbot and the special features that Elasticsearch provides.

Background Information

Introduction to EAS

Elastic Algorithm Service (EAS) is an online model service platform of PAI that allows you to deploy models as online inference services or AI-powered web applications. EAS provides features such as auto scaling and blue-green deployment. These features reduce the costs of developing stable online model services that can handle a large number of concurrent requests. In addition, EAS provides features such as resource group management and model versioning and capabilities such as comprehensive O&M and monitoring. For more information, see EAS overview.

Introduction to RAG

With the rapid development of AI technology, generative AI has made remarkable achievements in various fields such as text generation and image generation. However, as LLMs are widely adopted, the following inherent limitations have become apparent:

Domain knowledge limitations: In most cases, LLMs are trained on large-scale general-purpose datasets, so they struggle to provide in-depth, targeted answers in specialized vertical fields.
Delayed information updates: Because training datasets are static, LLMs cannot access or incorporate real-time information and knowledge updates.
Misleading outputs: LLMs are prone to hallucinations, producing outputs that appear plausible but are factually incorrect. Causes include data bias and inherent model limitations.

RAG was developed to address these challenges and to enhance the capabilities and accuracy of LLMs. By integrating external knowledge bases, RAG significantly mitigates LLM hallucinations and enables LLMs to access and apply up-to-date knowledge. This makes it possible to customize LLMs for greater personalization and accuracy.

Introduction to Elasticsearch

Alibaba Cloud Elasticsearch is a fully managed cloud service that is developed based on open source Elasticsearch and is compatible with all of its features. The service is available out of the box and supports the pay-as-you-go billing method. In addition to Elastic Stack components such as Elasticsearch, Logstash, Kibana, and Beats, Alibaba Cloud Elasticsearch works with Elastic to provide the X-Pack commercial plug-in free of charge. X-Pack provides the advanced features of the Elasticsearch Platinum edition, including security, SQL, machine learning, alerting, and monitoring. Alibaba Cloud Elasticsearch is widely used in scenarios such as real-time log analysis and processing, information retrieval, multidimensional data queries, and statistical data analytics. For more information, see What is Alibaba Cloud Elasticsearch?

Procedure

EAS provides a self-developed, end-to-end RAG solution with flexible parameter configurations. You can configure a custom RAG-based LLM chatbot and access it by using a web user interface (UI) or by calling API operations. The RAG architecture consists of two parts: retrieval and generation.

Retrieval: EAS integrates a range of vector databases, including open source Faiss and Alibaba Cloud services such as Milvus, Elasticsearch, Hologres, OpenSearch, and AnalyticDB for PostgreSQL.
Generation: EAS supports various open source models such as Qwen, Meta Llama, Mistral, and Baichuan, and also integrates ChatGPT.

This example uses an Elasticsearch cluster to show how to deploy a RAG-based LLM chatbot by using EAS and Elasticsearch. The procedure consists of the following steps:

1. Prepare a vector database by using Elasticsearch

Create an Elasticsearch cluster and prepare the configuration items that the RAG-based LLM chatbot needs to connect to the cluster.

2. Deploy a RAG-based LLM chatbot and associate it with the Elasticsearch cluster

Deploy a RAG-based LLM chatbot on EAS and associate it with the Elasticsearch cluster.

3. Use the RAG-based LLM chatbot

Connect the RAG-based LLM chatbot to the Elasticsearch cluster, upload business data files, and then perform knowledge Q&A.

Prerequisites

A virtual private cloud (VPC), a vSwitch, and a security group are created. For more information, see Create a VPC with an IPv4 CIDR block and Create a security group.

Precautions

This practice is limited by the maximum number of tokens that the LLM service supports, and is designed to help you understand the basic retrieval feature of a RAG-based LLM chatbot.

The chatbot is limited by the server resources of the LLM service and the default number of tokens, which also limits the conversation length that the chatbot supports.
If you do not need multi-round conversations, we recommend that you disable the with chat history feature of the chatbot on the web UI. This reduces the likelihood of reaching the limit. For more information, see How do I disable the with chat history feature of the RAG-based chatbot?

Prepare a Vector Database by Using Elasticsearch

Step 1: Create an Alibaba Cloud Elasticsearch Cluster

Log on to the Alibaba Cloud Elasticsearch console. In the left-side navigation pane, click Elasticsearch Clusters. On the page that appears, click Create. The following table describes the key parameters. For information about other parameters, see Create an Alibaba Cloud Elasticsearch cluster.

Parameter	Description
Region and Zone	The region and zone in which the cluster resides. Select the region in which EAS is deployed.
Instance Type	The type of the cluster. Select Standard Edition.
Password	The password that is used to access the cluster. Configure a password and save the password to your on-premises machine.

Step 2: Prepare Configuration Items

1. Prepare the URL of the Elasticsearch cluster.

In the top navigation bar of the Elasticsearch Clusters page, select the region in which the Elasticsearch cluster resides. In the Elasticsearch cluster list, find the Elasticsearch cluster and click its ID.
In the Basic Information section of the cluster details page, obtain the internal endpoint and port number, and use them to construct the URL of the Elasticsearch cluster.

Format: http://<Internal endpoint>:<Port number>.

Note: If you use an internal endpoint, make sure that the Elasticsearch cluster and RAG-based LLM chatbot reside in the same VPC. Otherwise, the connection fails.

2. Prepare the index name.

In the left-side navigation pane, choose Configuration and Management > Cluster Configuration. On the page that appears, click Modify Configuration in the upper-right corner of the YML File Configuration section. In the YML File Configuration panel, select Enable for the Auto Indexing parameter. For more information, see Configure the YML file.

After the settings are complete, you can configure a custom index name when you deploy a RAG-based LLM chatbot. For example, you can set the index name to es-test.

3. Prepare the username and password of the Elasticsearch cluster.

The default username of an Elasticsearch cluster is elastic. The password is the one you specify when you create the Elasticsearch cluster. If you forget your password, you can reset the password. For more information, see Reset the access password for an Elasticsearch cluster.
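Before you enter these values on the deployment page, you may want to sanity-check them. The following Python sketch, which uses only the standard library, builds the URL in the http://<Internal endpoint>:<Port number> format, checks a custom index name against the index naming rules of open source Elasticsearch (lowercase only; no \, /, *, ?, ", <, >, |, spaces, commas, #, or colons; no leading -, _, or +; at most 255 bytes), and sends an authenticated GET / request with the elastic username and password. All function names are hypothetical helpers, not part of PAI or Alibaba Cloud Elasticsearch. Because the URL uses the internal endpoint, check_connection is assumed to run on a machine in the same VPC as the cluster.

```python
import base64
import json
import urllib.request

# Characters that open source Elasticsearch does not allow in index names.
FORBIDDEN_INDEX_CHARS = set('\\/*?"<>| ,#:')


def build_es_url(internal_endpoint: str, port: int) -> str:
    """Return the URL in the http://<Internal endpoint>:<Port number> format."""
    host = internal_endpoint.strip()
    # Expect a bare host name, not a value that already contains a scheme, path, or port.
    if not host or any(sep in host for sep in ("/", ":")):
        raise ValueError(f"expected a bare host name, got {internal_endpoint!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"http://{host}:{port}"


def is_valid_index_name(name: str) -> bool:
    """Check a custom index name against Elasticsearch index naming rules."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower() or name[0] in "-_+":
        return False
    if any(ch in FORBIDDEN_INDEX_CHARS for ch in name):
        return False
    # Index names are limited to 255 bytes.
    return len(name.encode("utf-8")) <= 255


def build_root_request(es_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET / request that uses HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        es_url.rstrip("/") + "/",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def check_connection(es_url: str, username: str, password: str, timeout: float = 5.0) -> dict:
    """Send GET / and return the cluster information that Elasticsearch reports.

    Raises urllib.error.URLError (HTTPError for wrong credentials) if the call fails.
    """
    request = build_root_request(es_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, build_es_url("es-cn-example.elasticsearch.aliyuncs.com", 9200), with a hypothetical endpoint and port, returns http://es-cn-example.elasticsearch.aliyuncs.com:9200, and is_valid_index_name("es-test") returns True. If check_connection succeeds, the returned dictionary contains the cluster version under ["version"]["number"].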

Deploy a RAG-Based LLM Chatbot and Associate It with the Elasticsearch Cluster

1. Log on to the PAI console. In the upper part of the page, select the region in which you want to create a workspace. In the left-side navigation pane, choose Model Training > Elastic Algorithm Service (EAS). On the page that appears, select the desired workspace and click Enter Elastic Algorithm Service (EAS).

2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click RAG-based Smart Dialogue Deployment.

3. On the RAG-based LLM Chatbot Deployment page, configure the key parameters described in the following table. For information about other parameters, see Step 1: Deploy the RAG-based chatbot.

Section	Parameter	Description
Basic Information	Model Source	Select Open Source Model.
Basic Information	Model Type	The model type. In this example, Qwen1.5-1.8b is used.
Resource Configuration	Resource Configuration	The system recommends appropriate resource specifications based on the selected model type. If you use other resource specifications, the model service may fail to start.
Vector Database Settings	Vector Database Type	Select Elasticsearch.
Vector Database Settings	Private Endpoint and Port	Enter the Elasticsearch cluster URL that you obtained in Step 2, in the http://<Internal endpoint>:<Port number> format.
Vector Database Settings	Index Name	The name of the index. You can enter a new or an existing index name. If you use an existing index, its schema must meet the requirements of the RAG-based chatbot. For example, you can enter the name of an index that was automatically created when a RAG-based chatbot was deployed by using EAS.
Vector Database Settings	Account	Enter elastic.
Vector Database Settings	Password	The password of the elastic account that you prepared in Step 2.
VPC Configuration (Optional)	VPC	The VPC in which the Elasticsearch cluster resides.
VPC Configuration (Optional)	vSwitch	A vSwitch in the VPC in which the Elasticsearch cluster resides.
VPC Configuration (Optional)	Security Group Name	A security group in the VPC in which the Elasticsearch cluster resides.

4. After you configure the parameters, click Deploy.

Use the RAG-based LLM Chatbot

The following section describes how to use a RAG-based LLM chatbot. For more information, see RAG-based LLM chatbot.

1. Connect to a Vector Database

After you deploy the RAG-based chatbot, click View Web App in the Service Type column to enter the web UI.
Check whether the chatbot is connected to the vector database in the Elasticsearch cluster.

The system recognizes and applies the vector database settings that you configured when you deployed the chatbot. Click Connect ElasticSearch to check the connection. If the connection fails, verify the vector database settings against Step 2: Prepare configuration items. If a setting is incorrect, correct it and click Connect ElasticSearch again to reconnect to the Elasticsearch cluster.

2. Upload Business Data Files

Upload your knowledge base files. The system automatically stores the knowledge base in the vector database in the PAI-RAG format for retrieval. You can also use an existing knowledge base in the database, but it must meet the PAI-RAG format requirements. Otherwise, errors may occur during retrieval.

1. On the Upload tab, configure the chunk parameters.

The following parameters control the granularity of document chunking and whether to enable Q&A extraction.

Parameter	Description
Chunk Size	The size of each chunk. Unit: bytes. Default value: 500.
Chunk Overlap	The overlap between adjacent chunks. Default value: 10.
Process with QA Extraction Model	Specifies whether to extract Q&A information. If you select Yes, the system automatically extracts questions and corresponding answers in pairs after knowledge files are uploaded. This way, more accurate answers are returned in data queries.

2. On the Files tab or Directory tab, upload one or more business data files, or a directory that contains them. Supported file types: .txt, .pdf, Excel (.xlsx or .xls), .csv, Word (.docx or .doc), Markdown, and .html. Example: rag_chatbot_test_doc.txt.

3. Click Upload. The system performs data cleansing and semantic-based chunking on the business data files before uploading the business data files. Data cleansing includes text extraction and hyperlink replacement.
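The following Python sketch illustrates two upload settings on the client side: is_supported checks a file name against the documented file types (Markdown is assumed to use the .md extension), and chunk_text shows how Chunk Size and Chunk Overlap interact in a plain fixed-size splitter. The sketch is an assumption for illustration only. It measures size in characters rather than bytes and does not perform the semantic-based chunking or data cleansing that the chatbot applies. Both function names are hypothetical.

```python
from pathlib import PurePath

# Documented file types; Markdown is assumed to use the .md extension.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".xlsx", ".xls", ".csv", ".docx", ".doc", ".md", ".html"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension (case-insensitive) is a supported type."""
    return PurePath(filename).suffix.lower() in SUPPORTED_SUFFIXES


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks; adjacent chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each chunk starts chunk_size - chunk_overlap characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1) returns ["abcd", "defg", "ghij"], so each pair of adjacent chunks shares one character. A larger overlap keeps more context across chunk boundaries at the cost of storing more redundant text.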

3. Perform Knowledge Q&A

The RAG-based LLM chatbot inserts the results retrieved from the vector database and your query into the selected prompt template, and then sends the resulting prompt to the LLM to generate an answer.
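The following Python sketch illustrates how retrieved chunks and a query can be combined into a prompt. The template text, the build_prompt name, and the max_chunks limit are assumptions for illustration; the prompt templates that the chatbot actually uses are the ones you select on the web UI.

```python
# Hypothetical prompt template; the chatbot's real templates are selected on the web UI.
DEFAULT_TEMPLATE = (
    "Answer the question based on the following context. "
    "If the context does not contain the answer, say that you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, retrieved_chunks: list[str],
                 template: str = DEFAULT_TEMPLATE, max_chunks: int = 3) -> str:
    """Fill the template with the top retrieved chunks (numbered) and the user query."""
    context = "\n".join(
        f"[{i}] {chunk.strip()}"
        for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return template.format(context=context, question=question.strip())
```

Limiting the number of chunks keeps the prompt within the token limit of the LLM service, which matters for small models such as the one used in this example.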

Special Features Provided by Elasticsearch

Configure Custom Main and Stopword Dictionaries

Alibaba Cloud Elasticsearch comes with a built-in IK analysis plug-in named analysis-ik, which tokenizes text by using dictionaries that contain common words of various categories. The plug-in includes a built-in main dictionary and a stopword dictionary. The main dictionary is used to segment complex text into tokens, and the stopword dictionary is used to remove meaningless high-frequency words. Together, they improve retrieval efficiency and accuracy. Although the built-in dictionaries are comprehensive, they may not include terminology from specialized fields such as law and medicine, or product names, company names, and brand names. To improve retrieval accuracy, you can create custom dictionaries based on your business requirements. For more information, see Use the analysis-ik plug-in.
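To show why a custom main dictionary changes tokenization, the following Python sketch implements a toy forward maximum matching tokenizer over space-separated words: at each position it takes the longest phrase found in the main dictionary, and it drops tokens found in the stopword dictionary. This is a simplified assumption for illustration, not the actual algorithm of the analysis-ik plug-in, which is primarily designed for Chinese text.

```python
def tokenize(text: str, main_dict: set[str], stopwords: set[str], max_words: int = 3) -> list[str]:
    """Toy forward maximum matching over space-separated words (not IK's algorithm)."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first; a single word always matches.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in main_dict:
                break
        # Drop meaningless high-frequency words listed in the stopword dictionary.
        if candidate not in stopwords:
            tokens.append(candidate)
        i += n
    return tokens
```

With the stopwords {"the", "is"}, tokenize("the cloud server is fast", {"cloud server"}, {"the", "is"}) returns ["cloud server", "fast"], while an empty main dictionary yields ["cloud", "server", "fast"]. A keyword query for "cloud server" can then match the phrase as one term instead of two unrelated words.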

1. Prepare a Custom Main or Stopword Dictionary

Prepare a custom main or stopword dictionary on your on-premises machine. The dictionary file must meet the following requirements:

Format: The file name extension must be .dic. The file name must be 1 to 30 characters in length and can contain letters, digits, and underscores (_). For example, you can prepare a dictionary file named new_word.dic.
Content: Each line in the dictionary file can contain only one token or stopword. For example, with the built-in main dictionary, the phrase "cloud server" is tokenized into two separate words: "cloud" and "server". If your application requires "cloud server" to be treated as a single term, add "cloud server" to a custom main dictionary. The following example shows the content of a custom dictionary named new_word.dic:

cloud server
custom token
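The following Python sketch prepares such a file. It checks the file name against the rules above and formats the content with one term per line, skipping blank and duplicate entries. Two points are assumptions: the 1 to 30 character limit is applied to the part of the name before the .dic extension, and the file is written in UTF-8. The helper names are hypothetical.

```python
import re

# Assumption: the 1-30 character limit applies to the name before the .dic extension.
_DICT_NAME = re.compile(r"[A-Za-z0-9_]{1,30}\.dic")


def is_valid_dictionary_name(filename: str) -> bool:
    """Return True if the name uses only letters, digits, and underscores plus .dic."""
    return _DICT_NAME.fullmatch(filename) is not None


def format_dictionary(terms) -> str:
    """Return dictionary content with one token or stopword per line."""
    seen = set()
    lines = []
    for term in terms:
        # Collapse internal whitespace so that a term never spans several lines.
        normalized = " ".join(term.split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            lines.append(normalized)
    return "\n".join(lines) + "\n" if lines else ""
```

For example, with pathlib.Path imported, Path("new_word.dic").write_text(format_dictionary(["cloud server", "custom token"]), encoding="utf-8") writes the two-line dictionary shown above.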

2. Upload the Dictionary File

After you prepare the dictionary file, upload it to the Elasticsearch cluster. This example uses the standard update method, which performs a rolling update of the cluster. For more information, see Use the analysis-ik plug-in.

1. Go to the details page of an Elasticsearch cluster.

Log on to the Alibaba Cloud Elasticsearch console.
In the left-side navigation pane, click Elasticsearch Clusters.
Navigate to the desired cluster.
- In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
- On the Elasticsearch Clusters page, find the cluster and click its ID.

2. In the left-side navigation pane of the page that appears, choose Configuration and Management > Plug-ins.

3. On the Built-in Plug-ins tab, find the analysis-ik plug-in and click Rolling Update in the Actions column.

4. In the Configure IK Dictionaries - Rolling Update panel, click Edit on the right side of the dictionary that you want to update, upload a dictionary file, and then click Save.
You can use one of the following methods to update a dictionary file:

Upload On-premises File: Click the upload area and select the file that you want to upload from your on-premises machine. Alternatively, drag the file that you want to upload from your on-premises machine to the upload area.
Upload OSS File: Configure the Bucket Name and File Name parameters, and click Add.
- Make sure that the bucket that you specify resides in the same region as your Elasticsearch cluster.
- The dictionary file that you specify cannot be automatically updated. If the content of the dictionary file that is stored in OSS changes, you must perform a rolling update to make the changes take effect.

Note

The file name extension must be .dic. The file name must be 1 to 30 characters in length and can contain letters, digits, and underscores (_).
If you want to modify an uploaded dictionary file, click the download icon next to the file to download it, modify it, delete the original file, and then upload the modified file.
You can upload multiple dictionary files. The cluster needs to be restarted only when the content of dictionary files changes. If the dictionary file names and the number of dictionary files remain unchanged, the system does not restart the cluster. To ensure that your business is not affected, we recommend that you perform the update during off-peak hours. After the restart is complete, the new dictionary file takes effect.

5. Click OK. After the dictionary file is updated, reconnect the RAG-based LLM chatbot to the Elasticsearch cluster on the web UI. For more information, see Connect to a vector database.

After the RAG-based LLM chatbot is reconnected to the Elasticsearch cluster, perform knowledge Q&A on the web UI. If you select Keyword Only or Hybrid for the Retrieval Mode parameter, you can perform full-text queries by using the updated dictionary file of the Elasticsearch cluster.
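This article does not describe how the Hybrid retrieval mode combines keyword and vector results. One widely used technique is reciprocal rank fusion (RRF), which scores each document as the sum of 1/(k + rank) over the result lists in which it appears. The following Python sketch shows RRF under that assumption only to illustrate how a keyword search that benefits from an updated dictionary can change the final ranking. It is not necessarily how the RAG-based chatbot merges results, and k = 60 is a conventional default rather than a documented setting.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; a higher fused score ranks first."""
    scores: dict[str, float] = {}
    for results in result_lists:
        # Ranks start at 1 for the top result of each list.
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, reciprocal_rank_fusion([["a", "b"], ["b", "c"]]) returns ["b", "a", "c"]: document b appears in both the keyword and the vector results, so it outranks documents that appear in only one list.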

Index Management

Elasticsearch provides the index management feature. Effective index management allows a RAG-based LLM chatbot to retrieve valuable information from large datasets efficiently and accurately, and to generate high-quality answers. To manage indexes, perform the following steps:

1. Go to the details page of an Elasticsearch cluster.

Log on to the Alibaba Cloud Elasticsearch console.
In the left-side navigation pane, click Elasticsearch Clusters.
Navigate to the desired cluster.
- In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.
- On the Elasticsearch Clusters page, find the cluster and click its ID.

2. In the left-side navigation pane of the page that appears, choose Data Visualization.

3. In the Kibana section of the page that appears, click Modify Configuration. On the Kibana Configuration page, configure a private or public IP address whitelist for Kibana.
For more information, see Configure a public or private IP address whitelist for Kibana.

4. Log on to the Kibana console.

Click the Back icon in the upper-left corner of the page to return to the Data Visualization page.
In the Kibana section, click Access over Internet or Access over Internal Network.

Note
The Access over Internet or Access over Internal Network entry is displayed only after the Public Network Access or Private Network Access switch is turned on for Kibana.

On the Kibana logon page, enter the username and password.
- Username: The default username of an Elasticsearch cluster is elastic.
  You can also create a custom user. For more information, see Use the RBAC mechanism provided by Elasticsearch X-Pack to implement access control.
- Password: the password that corresponds to the elastic username. The password of the elastic account is specified when you create the Elasticsearch cluster. If you forget the password, you can reset it.
  For more information about the procedure and precautions for resetting the password, see Reset the access password of an Elasticsearch cluster.
Click Log In. The page shown in the following figure appears.

5. View and manage indexes.

In the top navigation bar, click the menu icon and choose Management > Stack Management.
In the left-side navigation pane, choose Data > Index Management.
On the Index tab of the page that appears, find the index that you want to manage and perform management operations such as closing the index, refreshing the index, clearing the index cache, or deleting the index. The following figure shows an example of how to manage an index named es_test.
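Besides the Kibana UI, the same index operations are exposed by the Elasticsearch REST API: POST /<index>/_close closes (disables) an index, POST /<index>/_refresh refreshes it, POST /<index>/_cache/clear clears its cache, and DELETE /<index> deletes it. The following Python sketch only builds these requests with the standard library. build_index_request is a hypothetical helper, auth_header is assumed to be an HTTP Basic Authorization header for the elastic account, and you would send the request with urllib.request.urlopen from a machine that can reach the cluster.

```python
import urllib.parse
import urllib.request

# Index action -> (HTTP method, path suffix) of the matching Elasticsearch REST API.
INDEX_ACTIONS = {
    "close": ("POST", "_close"),
    "refresh": ("POST", "_refresh"),
    "clear_cache": ("POST", "_cache/clear"),
    "delete": ("DELETE", ""),
}


def build_index_request(es_url: str, index: str, action: str, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) the REST request for one index management action."""
    if action not in INDEX_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    method, suffix = INDEX_ACTIONS[action]
    # Percent-encode the index name so that it is safe as a single path segment.
    path = "/" + urllib.parse.quote(index, safe="")
    if suffix:
        path += "/" + suffix
    return urllib.request.Request(
        es_url.rstrip("/") + path,
        headers={"Authorization": auth_header},
        method=method,
    )
```

Deleting an index permanently removes the knowledge base data stored in it, so the chatbot can no longer retrieve that data until you upload the files again.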

References

EAS provides simplified deployment methods for typical cutting-edge AI-Generated Content (AIGC) and LLM scenarios. You can easily deploy model services by using deployment methods such as ComfyUI, Stable Diffusion WebUI, ModelScope, Hugging Face, Triton Inference Server, and TensorFlow Serving. For more information, see Scenario-based deployment.
You can configure various inference parameters on the web UI of a RAG-based LLM chatbot to meet diverse requirements. You can also use the RAG-based LLM chatbot by calling API operations. For more information about implementation details and parameter settings, see RAG-based LLM chatbot.
The RAG-based LLM chatbot can also be associated with other types of vector databases, such as OpenSearch and ApsaraDB RDS for PostgreSQL. For more information, see Use EAS and OpenSearch to deploy a RAG-based chatbot or Use EAS and ApsaraDB RDS for PostgreSQL to deploy a RAG-based LLM chatbot.
