
Platform for AI: Use EAS and Elasticsearch to deploy a RAG-based LLM chatbot

Last Updated: Aug 28, 2024

Platform for AI (PAI) allows you to integrate Elasticsearch information retrieval components into a Retrieval-Augmented Generation (RAG)-based large language model (LLM) chatbot. This enhances the accuracy and diversity of the answers that the model generates. In addition to efficient data retrieval, Elasticsearch provides special features such as dictionary configuration and index management, which help the RAG-based LLM chatbot better identify user requirements and provide more relevant and valuable responses. This topic describes how to associate Elasticsearch with a RAG-based LLM chatbot during deployment. This topic also describes the basic features provided by the RAG-based LLM chatbot and the special features provided by Elasticsearch.

Background information

Introduction to EAS

Elastic Algorithm Service (EAS) is an online model service platform of PAI that allows you to deploy models as online inference services or AI-powered web applications. EAS provides features such as automatic scaling and blue-green deployment. These features reduce the costs of developing stable online model services that can handle a large number of concurrent requests. Furthermore, EAS provides features such as resource group management and model versioning and capabilities such as comprehensive O&M and monitoring. For more information, see EAS overview.

Introduction to RAG

With the rapid development of AI technology, generative AI has made remarkable achievements in various fields such as text generation and image generation. However, the following inherent limits gradually emerge while LLMs are widely used:

  • Field knowledge limits: Generally, LLMs are trained by using large-scale general datasets. Consequently, LLMs struggle to provide in-depth and targeted processing for specialized vertical fields.

  • Information update delay: The static nature of the training datasets prevents LLMs from accessing and incorporating real-time information and knowledge updates.

  • Misleading outputs: LLMs are prone to hallucinations, producing outputs that appear plausible but are factually incorrect. This is attributed to factors such as data bias and inherent model limitations.

To address these challenges and enhance the capabilities and accuracy of LLMs, RAG is developed. RAG integrates external knowledge bases to significantly mitigate the issue of LLM hallucinations and enhance the capabilities of LLMs to access and apply up-to-date knowledge. This enables the customization of LLMs for greater personalization and accuracy.
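
The retrieve-then-generate loop that RAG follows can be sketched in a few lines. This is an illustrative toy only, not the PAI implementation: the corpus, the word-overlap scoring, and the generate() stub are all hypothetical stand-ins for a vector database and a deployed LLM service.

```python
# Toy RAG loop: retrieve relevant documents, ground the prompt, generate.
# The corpus, scoring, and generate() stub are hypothetical placeholders.

def retrieve(query, corpus, top_k=2):
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    q_words = set(query.lower().split())
    return sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:top_k]

def build_prompt(query, contexts):
    """Insert the retrieved context and the query into a simple prompt template."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\nContext:\n{context_block}\nQuestion: {query}"

def generate(prompt):
    """Stub for the LLM call; a real system would invoke the deployed EAS service."""
    return f"[LLM answer grounded in {prompt.count('- ')} context snippet(s)]"

corpus = [
    "EAS deploys models as online inference services.",
    "Elasticsearch stores vectors for retrieval.",
    "Kibana visualizes Elasticsearch data.",
]
contexts = retrieve("How does EAS deploy models", corpus)
answer = generate(build_prompt("How does EAS deploy models", contexts))
```

Because the prompt is grounded in retrieved text, the LLM can answer from knowledge that was never in its training data, which is what mitigates the limits listed above.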

Introduction to Elasticsearch

Alibaba Cloud Elasticsearch is a fully managed cloud service that is developed based on open source Elasticsearch. This service is fully compatible with open source Elasticsearch. You can use this out-of-the-box service by using the pay-as-you-go billing method. In addition to out-of-the-box Elastic Stack components such as Elasticsearch, Logstash, Kibana, and Beats, Alibaba Cloud, in partnership with Elastic, provides the X-Pack plug-in free of charge. X-Pack provides multiple advanced features such as security, alerting, monitoring, and machine learning and SQL capabilities. Alibaba Cloud Elasticsearch is widely used in scenarios such as real-time log analysis and processing, information retrieval, multidimensional data queries, and statistical data analytics. For more information, see What is Alibaba Cloud Elasticsearch?

Procedure

EAS provides a self-managed RAG solution with flexible, customizable parameters. You can access RAG services through a web user interface (UI) or by calling API operations to build a custom RAG-based LLM chatbot. The RAG technical architecture centers on two key components, retrieval and generation:

  • For retrieval, EAS integrates with a range of vector database services, including open source Faiss and the following Alibaba Cloud services: Milvus, Elasticsearch, Hologres, and AnalyticDB for PostgreSQL.

  • For generation, EAS supports a diverse array of open source models such as Qwen, Meta Llama, Mistral, and Baichuan, while also integrating with ChatGPT.

In this example, an Elasticsearch cluster is used to show how to use EAS and Elasticsearch to deploy a RAG-based LLM chatbot by performing the following steps:

  1. Prepare a vector database by using Elasticsearch

    Create an Elasticsearch cluster and prepare the configuration items that the RAG-based LLM chatbot requires to connect to the cluster.

  2. Deploy the RAG-based LLM chatbot and associate it with the Elasticsearch cluster

    Deploy the RAG-based LLM chatbot on EAS and associate it with the Elasticsearch cluster.

  3. Use the RAG-based LLM chatbot

    You can connect Elasticsearch to the RAG-based LLM chatbot, upload business data files, and then perform knowledge Q&A. Elasticsearch also provides special features such as dictionary configuration and index management to improve search quality.

Prerequisites

A virtual private cloud (VPC), vSwitch, and security group are created. For more information, see Create a VPC with an IPv4 CIDR block and Create a security group.

Limits

The vector database and EAS must be deployed in the same region.

Usage notes

This practice is subject to the maximum number of tokens that the LLM service supports and is designed to help you understand the basic retrieval feature of a RAG-based LLM chatbot.

  • The conversation length that the chatbot supports is limited by the server resources of the LLM service and the default maximum number of tokens.

  • If you do not need to perform multiple rounds of conversations, we recommend that you disable the with chat history feature of the chatbot. This effectively reduces the possibility of reaching the limit. For more information, see the How do I disable the with chat history feature of the RAG-based chatbot? section of this topic.

Prepare a vector database by using Elasticsearch

Step 1: Create an Alibaba Cloud Elasticsearch cluster

  1. Log on to the Alibaba Cloud Elasticsearch console.
  2. In the left-side navigation pane, click Elasticsearch Clusters.
  3. In the upper-left corner of the Elasticsearch Clusters page, click Create. On the Elasticsearch cluster buy page, configure the key parameters described in the following table. For more information about other parameters, see Create an Alibaba Cloud Elasticsearch cluster.

    • Regions and Zones: The region and zone in which the cluster resides. Select the region in which EAS is deployed.

    • Instance Type: The type of the cluster. Select Standard Edition.

    • Initial Configuration Scenario: The scenario template of the cluster. Select General.

    • Password: The password that is used to log on to the cluster. Configure a password and save it to your local device.

  4. Click Buy Now. The Confirm Order page appears. Confirm the order, read the terms of service, select the check box for Terms of Service, and then click Activate Now.

Step 2: Prepare configuration items

  1. Prepare the Elasticsearch cluster URL.

    1. In the top navigation bar of the Elasticsearch Clusters page, select the region in which the created Elasticsearch cluster resides. Find the Elasticsearch cluster and click its ID.

    2. In the Basic Information section of the cluster details page, obtain the private endpoint and corresponding port number. Construct the Elasticsearch cluster URL by using the private endpoint and port number.

      Format: http://<Private endpoint>:<Port number>.

      Important

      If you use a private endpoint, make sure that the Elasticsearch cluster and RAG-based LLM chatbot reside in the same VPC. Otherwise, the connection fails.

  2. Prepare the index name.

    1. In the top navigation bar of the Elasticsearch Clusters page, select the region in which the created Elasticsearch cluster resides. Find the Elasticsearch cluster and click its ID.

    2. In the left-side pane of the cluster details page, choose Configuration and Management > Cluster Configuration.

    3. On the page that appears, click Modify Configuration in the upper-right corner of the YML File Configuration section.

    4. In the YML File Configuration panel, select Enable for the Auto Indexing parameter, select the check box in the Other Configurations section, and then click OK.

    After the settings are complete, you can customize the index name when you deploy the RAG-based LLM chatbot. For example, you can set the index name to es-test.

  3. Prepare the username and password of the Elasticsearch cluster.

    The default username of an Elasticsearch cluster is elastic. The password used to log on to an Elasticsearch cluster is the password that you configure when you create the Elasticsearch cluster. If you forget the logon password, perform the following steps to reset the password:

    1. In the top navigation bar of the Elasticsearch Clusters page, select the region in which the created Elasticsearch cluster resides. Find the Elasticsearch cluster and click its ID.

    2. In the left-side pane of the cluster details page, choose Configuration and Management > Security.

    3. In the Access Settings section, click Reset Password next to the Elasticsearch Cluster Password parameter.

    4. In the Reset Password panel, enter a new password and click OK.
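
Before you enter the configuration items in the console, you can sanity-check the URL that you constructed in the http://<Private endpoint>:<Port number> format. The endpoint below is a hypothetical placeholder; substitute the private endpoint from your cluster's Basic Information section.

```python
from urllib.parse import urlsplit

def check_es_url(url):
    """Validate that a cluster URL has an http(s) scheme, a host, and a port."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname or not parts.port:
        raise ValueError("URL must include both a host and a port")
    return parts.hostname, parts.port

# Hypothetical private endpoint; use the value from your own cluster details page.
host, port = check_es_url("http://es-cn-example.elasticsearch.aliyuncs.com:9200")
```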

Deploy the RAG-based LLM chatbot and associate it with the Elasticsearch cluster

  1. Go to the Elastic Algorithm Service (EAS) page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS).

  2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click RAG-based Smart Dialogue Deployment.

  3. On the RAG-based LLM Chatbot Deployment page, configure the key parameters described in the following table. For more information about other parameters, see the "Step 2: Deploy the RAG-based chatbot" section of the RAG-based LLM chatbot topic.

    Vector Database Settings

    • Vector Database Type: The service that you want to use to build the vector database. Select Elasticsearch.

    • Private Endpoint and Port: The Elasticsearch cluster URL that you obtained in Step 2.

    • Index Name: The name of the index. You can enter a new index name or an existing index name. If you use an existing index name, the index schema must meet the requirements of the RAG-based chatbot. For example, you can enter the name of the index that is automatically created when you deploy the RAG-based chatbot by using EAS.

    • Account: The username of the Elasticsearch cluster. Enter elastic.

    • Password: The password used to log on to the Elasticsearch cluster, which is configured in Step 2.

    VPC Configuration

    • VPC: The VPC in which the Elasticsearch cluster resides.

    • vSwitch and Security Group Name: The vSwitch and security group that reside in the same VPC.

  4. After you configure the parameters, click Deploy.
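
If you plan to reuse an existing index, its schema must be compatible with the chatbot. The exact schema that the RAG service expects is not documented in this topic; the following mapping is only a hypothetical sketch of a typical vector index, where the field names and the dimension value are assumptions that must match your embedding settings.

```python
# Hypothetical mapping for a vector index; field names and "dims" are assumptions
# and must match the embedding dimension configured for the chatbot.
index_mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},  # chunk text, used for keyword retrieval
            "embedding": {                # vector field, used for semantic retrieval
                "type": "dense_vector",
                "dims": 1024,
            },
        }
    }
}
```

A client such as elasticsearch-py could then create the index with a call along the lines of `es.indices.create(index="es-test", body=index_mapping)`.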

Use the RAG-based LLM chatbot

Basic features provided by the RAG-based LLM chatbot

The following section describes the basic features provided by the RAG-based LLM chatbot. For more information, see RAG-based LLM chatbot.

Configure the RAG-based LLM chatbot

  1. After the RAG-based chatbot is deployed, click View Web App in the Service Type column to enter the web UI.

  2. Configure a machine learning model.

    • Embedding Model Name: Four models are available to convert texts into embedding vectors. You can select an embedding model based on your business requirements.

    • Embedding Dimension: After you configure the Embedding Model Name parameter, the system automatically configures this parameter.

  3. Check whether the vector database in the Elasticsearch cluster is connected.

    The system automatically recognizes and applies the vector database settings that you configured when you deployed the chatbot. Click Connect ElasticSearch to check whether the vector database in the Elasticsearch cluster is connected. If the connection fails, see the Step 2: Prepare configuration items section of this topic to check whether the vector database settings are correct. If the settings are incorrect, correct the configuration items and click Connect ElasticSearch to reconnect to the Elasticsearch cluster.
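
The embedding model maps each text to a vector, and vector retrieval compares those vectors by similarity. The following sketch shows cosine similarity on toy 3-dimensional vectors; real embedding models produce vectors with hundreds or thousands of dimensions, as reflected by the Embedding Dimension parameter.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.1, 0.9, 0.2]  # toy embedding of a user query
doc_vec = [0.1, 0.8, 0.3]    # toy embedding of a stored document chunk
score = cosine_similarity(query_vec, doc_vec)  # close to 1.0: a strong match
```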

Upload business data files

On the Upload tab, upload the specified business data files. You can upload files in the following formats: TXT, PDF, XLSX, XLS, CSV, DOCX, DOC, Markdown, and HTML.


  1. Configure semantic-based chunking parameters.

    Sample file: rag_chatbot_test_doc.txt

    Configure the following parameters to control the granularity of document chunking and enable automatic Q&A information extraction:

    • Chunk Size: the size of each chunk. Unit: bytes. Default value: 500.

    • Chunk Overlap: the number of overlapping bytes between adjacent chunks. Default value: 10.

    • Process with QA Extraction Model: specifies whether to extract Q&A information. If you select Yes, the system automatically extracts questions and corresponding answers in pairs from uploaded business data files. This enhances retrieval accuracy and provides more relevant responses to data queries.

  2. On the Files tab, upload one or more business data files. You can also upload a directory that contains the business data files on the Directory tab.

  3. Click Upload. The system performs data cleansing and semantic-based chunking on the business data files before they are uploaded. Data cleansing includes text extraction and hyperlink replacement.
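
The effect of the Chunk Size and Chunk Overlap parameters can be understood with a simple sliding-window chunker. This sketch splits on raw characters for clarity; the actual service applies semantic-based chunking, so the boundaries it produces will differ.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=10):
    """Split text into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Small values so the behavior is visible; the defaults above mirror the UI defaults.
chunks = chunk_text("abcdefghij" * 5, chunk_size=20, chunk_overlap=5)
```

The overlap repeats the tail of each chunk at the head of the next one, so a sentence that straddles a boundary is still fully contained in at least one chunk.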

Configure Q&A policies

On the Chat tab, configure Q&A policies. The following retrieval methods are available. You can select a retrieval method based on your business requirements.

  • Embedding Only: Vector database-based retrieval is used.

  • Keyword Only: Keyword-based retrieval is used.

  • Hybrid: Multimodal retrieval that combines vector database-based retrieval and keyword-based retrieval is used.
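
How the service fuses the two score sources in Hybrid mode is not detailed in this topic. A common approach, shown here as a hypothetical sketch, is a weighted sum of the normalized vector-similarity and keyword (BM25-style) scores.

```python
def hybrid_score(vector_score, keyword_score, alpha=0.5):
    """Weighted fusion of a vector-similarity score and a keyword score.

    Both scores are assumed to be normalized to [0, 1]; alpha weights
    the vector side (alpha=1.0 ignores keywords entirely).
    """
    return alpha * vector_score + (1 - alpha) * keyword_score

# A document that matches keywords strongly but is a weaker semantic match:
score = hybrid_score(vector_score=0.4, keyword_score=0.9, alpha=0.5)
```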


Perform knowledge Q&A

The chatbot fills the selected prompt template with the query and the results returned from the vector database, and then sends the prompt to the LLM to generate an answer.
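
This step — filling the selected template with the query and the retrieved results — can be sketched as follows. The template wording is hypothetical; the deployed chatbot ships with its own selectable templates.

```python
# Hypothetical prompt template; the deployed chatbot uses its own templates.
TEMPLATE = (
    "Use the following retrieved context to answer the question.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
    "Answer:"
)

def fill_prompt(question, retrieved_chunks):
    """Number the retrieved chunks and substitute them into the template."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return TEMPLATE.format(context=context, question=question)

prompt = fill_prompt("What is EAS?", ["EAS is an online model service platform of PAI."])
```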


Special features provided by Elasticsearch

Customize main and stopword dictionaries

Alibaba Cloud Elasticsearch comes with a built-in IK analysis plug-in named analysis-ik. The plug-in provides a built-in main dictionary, which contains common words of various categories and is used to analyze complex texts, and a built-in stopword dictionary, which is used to remove meaningless high-frequency words. This enhances retrieval efficiency and accuracy. Although the built-in dictionaries of analysis-ik are quite comprehensive, they may not include specific terminology in specialized fields such as law and medicine, or product names, company names, and brand names found in your knowledge base. To enhance retrieval accuracy, you can create custom dictionaries based on your business requirements. For more information, see Use the analysis-ik plug-in.

1. Prepare a custom main or stopword dictionary

Prepare a dictionary file for a custom main or stopword dictionary on your local device. The dictionary file must meet the following requirements:

  • Format: The dictionary file must be in the DIC format. For example, you can upload a dictionary file named new_word.dic.

  • Content: Each row in the dictionary file can contain only one token or stopword. For example, if you use the built-in main dictionary, the phrase "cloud server" is tokenized into two separate words: "cloud" and "server". If your application requires "cloud server" to be treated as a single term, you can add "cloud server" to a custom dictionary. The following example shows the content of a custom dictionary named new_word.dic:

    Cloud server
    Custom token
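
The effect of adding "cloud server" to a custom dictionary can be illustrated with a toy greedy longest-match tokenizer. This is only a sketch of the idea; the segmentation algorithm that analysis-ik actually uses is more sophisticated.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy longest-match segmentation against a token dictionary (toy sketch)."""
    words = text.split()
    tokens, i = [], 0
    while i < len(words):
        # Take the longest phrase in the dictionary that starts at position i;
        # fall back to the single word if no phrase matches.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in dictionary or j == i + 1:
                tokens.append(phrase)
                i = j
                break
    return tokens

built_in = {"cloud", "server"}
custom = built_in | {"cloud server"}  # the custom dictionary adds the phrase
before = longest_match_tokenize("buy a cloud server now", built_in)
after = longest_match_tokenize("buy a cloud server now", custom)
```

With the built-in dictionary the phrase is split into "cloud" and "server"; with the custom entry it stays a single token "cloud server", so a keyword query for the product name matches exactly.
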
2. Upload the dictionary file

After you prepare the dictionary file, you need to upload the dictionary file to the specified location by performing the following steps:

  1. Go to the Elasticsearch Clusters page.

    1. Log on to the Alibaba Cloud Elasticsearch console.
    2. In the left-side navigation pane, click Elasticsearch Clusters.
    3. In the top navigation bar of the Elasticsearch Clusters page, select the region in which the created Elasticsearch cluster resides.

    4. On the page that appears, find the cluster and click its ID.

  2. In the left-side pane of the cluster details page, choose Configuration and Management > Plug-ins.

  3. Update the IK dictionaries of the cluster.

    The analysis-ik plug-in supports the following update methods for IK dictionaries: standard update and rolling update.

    Note
    • You cannot perform standard updates on IK dictionaries for Elasticsearch clusters of V7.16 or later and Elasticsearch clusters deployed based on the cloud-native control architecture in some regions.

    • You cannot use the rolling update method to modify the built-in main dictionary. If you want to modify the built-in main dictionary, use the standard update method.

    • How rolling updates take effect:

      The first time you upload a dictionary file, the dictionaries on all nodes in the Elasticsearch cluster are updated. The update takes effect only after the cluster is restarted. If the name of the dictionary file that you upload is the same as that of the existing dictionary file, the cluster is not restarted, and the dictionaries are automatically loaded while the cluster is running.

    1. On the Built-in Plug-ins tab, enter analysis-ik in the search box and press the ENTER key.

    2. Find analysis-ik and click Rolling Update in the Actions column. In the Configure IK Dictionaries - Rolling Update panel, click Edit.

    3. Select a dictionary update method and upload the dictionary file.

      • Upload a DIC file: Select Upload On-premises File. Upload the dictionary file that you prepare as prompted.

      • Upload an Object Storage Service (OSS) file: Select Upload OSS File. Configure the Bucket Name and File Name parameters and click Add.

      Note
      • Make sure that the specified OSS bucket resides in the same region as your Elasticsearch cluster and the file that you want to upload is in the DIC format.

      • If the content of the dictionary stored in the OSS bucket changes, you must upload the related dictionary file again. Automatic data synchronization is not supported.

      • If you encounter an error message indicating that you cannot access the specified OSS bucket due to the authorization policy of the bucket, click Authorize and complete authorization as prompted.

    4. After you upload the dictionary file, select This operation will restart the cluster. Continue? and click Save.

      After the dictionary file is saved, the Elasticsearch cluster performs a rolling restart. After the rolling restart is complete, the new dictionary takes effect.

  4. Optional. Update the dictionary file.

    If you want to add tokens to or remove tokens from the dictionary, update the uploaded dictionary file by performing the following steps:

    1. In the Configure IK Dictionaries - Rolling Update panel, delete the existing dictionary file and upload a new dictionary file by performing the preceding operations. The name of the new dictionary file must be the same as that of the existing dictionary file.

    2. Click Save.

      Note

      The Elasticsearch cluster does not perform a rolling restart because the dictionary file name remains unchanged.

  5. After the dictionary file is updated, reconnect the RAG-based LLM chatbot to the Elasticsearch cluster on the web UI. For more information, see the Configure the RAG-based LLM chatbot section of this topic.

    After the RAG-based LLM chatbot is reconnected to the Elasticsearch cluster, perform knowledge Q&A on the web UI. If you select Keyword Only or Hybrid for the Keyword Model parameter, you can perform full-text queries by using the updated dictionary file of the Elasticsearch cluster.

Index management

Elasticsearch provides the index management feature, which is crucial for deploying a RAG-based LLM chatbot. By effectively managing indexes, the RAG-based LLM chatbot can efficiently and accurately retrieve valuable information from vast datasets and generate high-quality answers. To manage indexes, perform the following steps:

  1. Go to the Elasticsearch Clusters page.

    1. Log on to the Alibaba Cloud Elasticsearch console.
    2. In the left-side navigation pane, click Elasticsearch Clusters.
    3. In the top navigation bar of the Elasticsearch Clusters page, select the region in which the created Elasticsearch cluster resides.

    4. On the page that appears, find the cluster and click its ID.

  2. In the left-side navigation pane of the page that appears, choose Configuration and Management > Data Visualization.

  3. In the Kibana section, click Modify Configuration. On the page that appears, configure an IP address whitelist for Kibana. In this example, a public IP address whitelist is configured. For more information, see Configure a public or private IP address whitelist for Kibana.

  4. Log on to the Kibana console.

    1. Close the Modify Public IP Address Whitelist panel to return to the Data Visualization page.

    2. In the Kibana section, click Access over Internet.

    3. On the Kibana logon page, enter the username and password.

    4. Click Log In. The Kibana console page appears.

  5. View and manage indexes.

    1. In the top navigation bar, click the menu icon and select Management > Stack Management.

    2. In the left-side navigation pane, choose Data > Index Management.

    3. On the Index tab of the page that appears, find the index that you want to manage, such as es_test, and perform management operations such as disabling, refreshing, clearing, or deleting the index.

References

  • EAS provides simplified deployment methods for typical cutting-edge scenarios of AI-Generated Content (AIGC) and LLM. You can easily deploy model services by using deployment methods such as ComfyUI, Stable Diffusion WebUI, ModelScope, Hugging Face, Triton Inference Server, and TensorFlow Serving. For more information, see Scenario-based deployment.

  • You can configure various inference parameters on the web UI of the RAG-based LLM chatbot to meet diverse requirements. You can also call the RAG-based LLM chatbot by calling API operations. For more information about implementation details and parameter settings, see RAG-based LLM chatbot.