
Platform for AI: RAG-based LLM chatbot

Last Updated: Nov 12, 2024

Elastic Algorithm Service (EAS) provides simplified deployment methods for different scenarios. You can deploy a Retrieval-Augmented Generation (RAG)-based large language model (LLM) chatbot by configuring a few parameters, which significantly shortens the deployment time. During model inference, the chatbot retrieves relevant information from your knowledge base and combines it with the output of the LLM to provide accurate and informative answers, which significantly improves the quality of Q&A and the overall performance. The chatbot is suitable for Q&A, summarization, and other natural language processing (NLP) tasks that rely on specific knowledge bases. This topic describes how to deploy a RAG-based LLM chatbot and how to perform model inference.

Background information

Standalone LLM applications have limits in generating accurate and up-to-date responses. Therefore, they are not suitable for scenarios that require precise information, such as customer service or Q&A. To resolve these issues, the RAG technique is used to enhance the performance of LLM applications, which significantly improves the quality of Q&A, summarization, and other NLP tasks that rely on specific knowledge bases.

RAG makes answers more accurate and informative by combining an LLM application, such as Qwen, with an information retrieval component. When a query is initiated, RAG uses the information retrieval component to find documents or information fragments in the knowledge base that are related to the query, and passes the retrieved content together with the original query to the LLM application. The LLM application then uses its summarization and generation capabilities to produce factual answers based on the latest information. You do not need to retrain the LLM application.

The chatbot that is deployed in EAS integrates LLM applications with RAG to overcome the limits of LLM applications in terms of accuracy and timeliness. This chatbot provides accurate and informative answers in various Q&A scenarios and helps improve the overall performance and user experience of NLP tasks.

Prerequisites

  • A virtual private cloud (VPC), a vSwitch, and a security group are created. For more information, see Create and manage a VPC and Create a security group.

    Note

    If you use Facebook AI Similarity Search (Faiss) to build a vector database, VPC, vSwitch, and security group are not required.

  • An Object Storage Service (OSS) bucket or File Storage NAS (NAS) file system is created to store fine-tuned model files. This prerequisite must be met if you use a fine-tuned model to deploy the chatbot. For more information, see Get started by using the OSS console or Create a file system.

    Note

    If you use Faiss to build a vector database, you must prepare an OSS bucket.

Usage notes

This practice is subject to the maximum number of tokens allowed by the LLM service and is designed to help you understand the basic retrieval feature of a RAG-based LLM chatbot.

  • The chatbot is limited by the server resources of the LLM service and the default maximum number of tokens. As a result, the conversation length supported by the chatbot is also limited.

  • If you do not need to perform multiple rounds of conversations, we recommend that you disable the With Chat History feature of the chatbot on the WebUI page. This effectively reduces the possibility of reaching the limit. For more information, see How do I disable the With Chat History feature of the RAG-based chatbot?

Step 1: Deploy the RAG service

To deploy a RAG-based LLM chatbot and connect it to a vector database, perform the following steps:

  1. Log on to the PAI console. Select a region and a workspace. Then, click Enter Elastic Algorithm Service (EAS).

  2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click RAG-based Smart Dialogue Deployment.

  3. On the RAG-based LLM Chatbot Deployment page, configure the following key parameters.

    • Basic Information

      • Model Source: The source of the model. Valid values:

        • Open Source Model: PAI provides a variety of preset open source models, including Qwen, Llama, ChatGLM, Baichuan, Falcon, Yi, Mistral, Gemma, and DeepSeek. You can select and deploy a model with the appropriate parameter size.

        • Custom Fine-tuned Model: PAI supports models that you fine-tuned for specific scenarios.

      • Model Type:

        • If you use an Open Source Model, select a model with the appropriate parameter size.

        • If you use a Custom Fine-tuned Model, specify the model type, parameter size, and precision.

      • Model Settings: If you use a Custom Fine-tuned Model, specify the path from which the system reads the model configuration file when it deploys the model. Valid values:

        • OSS: Select the OSS path in which the fine-tuned model files are stored.

        • NAS: Select the NAS file system in which the fine-tuned model files are stored, the source path, and the mount path.

        Note

        We recommend that you first run the fine-tuned model in Hugging Face Transformers to confirm that the output meets your expectations before you deploy the model as an EAS service.

    • Resource Configuration

      • Resource Configuration: After you select a model, the system recommends appropriate resource configurations. If you switch to another specification, the model service may fail to start.

      • Inference Acceleration: Inference acceleration can be enabled for Qwen, Llama2, ChatGLM, and Baichuan2 models that are deployed on A10 or GU30 instances. Valid values:

        • BladeLLM Inference Acceleration: The BladeLLM inference acceleration engine ensures high concurrency and low latency. You can use BladeLLM to accelerate LLM inference in a cost-effective manner.

        • Open-source vLLM Inference Acceleration

    • Vector Database Settings

      You can use one of the following types of vector database: Faiss, Elasticsearch, Hologres, OpenSearch, or RDS PostgreSQL. Select a type based on your business requirements.

      FAISS

      You can use Faiss to quickly build a local vector database in an EAS instance without the need to purchase or activate online vector databases.

      • Vector Database Type: Select FAISS.

      • OSS Path: The OSS path of the vector database. Select an OSS path in the current region. You can create an OSS path if no OSS path is available. For more information, see Get started by using the OSS console.

        Note

        If you use a Custom Fine-tuned Model, make sure that the OSS paths of the vector database and the model are different.

      ElasticSearch

      Specify the connection information of an Elasticsearch cluster. For information about how to create and prepare an Elasticsearch cluster, see Prepare a vector database by using Elasticsearch.

      • Vector Database Type: Select Elasticsearch.

      • Private Endpoint and Port: The private endpoint and port number of the Elasticsearch cluster, in the format http://<private endpoint>:<port number>. For information about how to obtain the private endpoint and port number of the Elasticsearch cluster, see View the basic information of a cluster.

      • Index Name: The name of the index. You can enter a new index name or an existing index name. If you use an existing index name, the index schema must meet the requirements of the RAG-based chatbot. For example, you can enter the name of the index that is automatically created when you deploy the RAG-based chatbot by using EAS.

      • Account: The logon name that you specified when you created the Elasticsearch cluster. Default logon name: elastic.

      • Password: The password that you specified when you created the Elasticsearch cluster. If you forget the password, see Reset the access password for an Elasticsearch cluster.

      Hologres

      Specify the connection information of a Hologres instance. To purchase a Hologres instance, see Purchase a Hologres instance.

      • Vector Database Type: Select Hologres.

      • Invocation Information: The host of the VPC endpoint. Go to the Instance Details page in the Hologres console. In the Network Information section, click Copy next to Select VPC, and use the part of the domain name before :80 as the host.

      • Database Name: The name of the database in the Hologres instance. For more information about how to create a database, see Create a database.

      • Account: The custom account that you created. For more information, see Create a custom account. In the Select Member Role section, select Super Administrator (SuperUser).

      • Password: The password of the custom account that you created.

      • Table Name: The name of the table. You can enter a new table name or an existing table name. If you use an existing table name, the table schema must meet the requirements of the RAG-based chatbot. For example, you can enter the name of the Hologres table that is automatically created when you deploy the RAG-based chatbot by using EAS.

      OpenSearch

      Specify the connection information of an OpenSearch instance of Vector Search Edition. For information about how to create and prepare an OpenSearch instance, see Prepare an OpenSearch Vector Search Edition instance.

      • Vector Database Type: Select OpenSearch.

      • Endpoint: The public endpoint of the OpenSearch instance. You must first configure Internet access for the OpenSearch instance. For more information, see Prepare an OpenSearch Vector Search Edition instance.

      • Instance ID: The ID of the OpenSearch instance, which you can obtain from the OpenSearch instance list.

      • Username and Password: The username and password of the OpenSearch instance.

      • Table Name: The name of the index table of the OpenSearch instance. For information about how to prepare the index table, see Prepare an OpenSearch Vector Search Edition instance.

      RDS PostgreSQL

      Specify the connection information of the ApsaraDB RDS for PostgreSQL instance. For information about how to create and prepare an ApsaraDB RDS for PostgreSQL instance, see Prepare a vector database by using ApsaraDB RDS for PostgreSQL.

      • Vector Database Type: Select RDS PostgreSQL.

      • Host Address: The internal endpoint of the ApsaraDB RDS for PostgreSQL instance. You can log on to the ApsaraDB RDS for PostgreSQL console and view the endpoint on the Database Connection page of the instance.

      • Port: The port number. Default value: 5432.

      • Database: The name of the database. For information about how to create a database and an account, see Create a database and an account.

        • When you create an account, select Privileged Account for Account Type.

        • When you create a database, select the created privileged account from the Authorized By drop-down list.

      • Table Name: The name of the database table.

      • Account and Password: The privileged account and password that you created. For more information, see Create a database and an account.

    • VPC Configuration

      • VPC, vSwitch, and Security Group Name:

        • If you use Hologres, Elasticsearch, OpenSearch, or RDS PostgreSQL to build the vector database, select the VPC, vSwitch, and security group in which the vector database is deployed.

          Note

          If you use OpenSearch to build the vector database, you can select a VPC that is different from the VPC in which the RAG application resides. However, make sure that the VPC can access the Internet and that the associated elastic IP address (EIP) is added to the public IP address whitelist of the OpenSearch instance. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet and Configure the public access whitelist.

        • If you use Faiss to build the vector database, you do not need to configure these parameters.

  4. Click Deploy.

    When the Service Status changes to Running, the RAG-based chatbot is deployed.

Step 2: Test the chatbot through WebUI

Perform the following steps to upload your knowledge base files on the web UI page and test the Q&A chatbot.

1. Connect to the vector database

  1. After you deploy the RAG-based chatbot, click View Web App in the Service Type column to enter the web UI.

  2. Configure the embedding model. The system uses the embedding model to convert text chunks into vectors. A rough illustration of this step follows this list.

    • Embedding Model Name: Four models are available. By default, the optimal model is selected.

    • Embedding Dimension: This parameter has a direct impact on the performance of the model. After you select an embedding model, the system automatically configures this parameter.

  3. Check whether the vector database is connected.

    The system automatically recognizes and applies the vector database settings that you configured when you deployed the chatbot. If you use Hologres to build the vector database, click Connect Hologres to check whether the vector database in Hologres is connected. If the connection fails, check whether the vector database is correctly configured based on Step 1. Then, reconnect to the database.
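
The following minimal sketch illustrates what the embedding step does conceptually: each text chunk is turned into a fixed-length vector whose length is the embedding dimension. It assumes the open source sentence-transformers library and a generic multilingual model; the actual models and dimensions offered on the web UI may differ.

    # Rough illustration of converting text chunks into vectors.
    # Assumptions: pip install sentence-transformers; the model name is a stand-in,
    # not one of the embedding models that PAI provides.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    chunks = [
        "PAI is an AI platform provided by Alibaba Cloud.",
        "EAS supports scenario-based model deployment.",
    ]
    vectors = model.encode(chunks)  # one vector per chunk
    print(vectors.shape)            # (2, 384); the second value is the embedding dimension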

2. Upload knowledge base files

Upload your knowledge base files. The system automatically converts the files into the PAI-RAG format and stores them in the vector database for retrieval. You can also use existing knowledge bases in the vector database, but they must meet the PAI-RAG format requirements. Otherwise, errors may occur during retrieval.


  1. On the Upload tab, configure the chunk parameters.

    The following parameters control the granularity of document chunking and whether to enable Q&A extraction. A minimal illustration of chunking follows these steps.

    • Chunk Size: The size of each chunk. Unit: bytes. Default value: 500.

    • Chunk Overlap: The overlap between adjacent chunks. Default value: 10.

    • Process with QA Extraction Model: Specifies whether to extract Q&A information. If you select Yes, the system automatically extracts questions and corresponding answers in pairs after the knowledge files are uploaded. This way, more accurate answers are returned in data queries.

  2. On the Files tab or Directory tab, upload one or more business data files. You can also upload a directory that contains the business data files. Supported file types: .txt, .pdf, Excel (.xlsx or .xls), .csv, Word (.docx or .doc), Markdown, and .html. For example: rag_chatbot_test_doc.txt.

  3. Click Upload. The system performs data cleansing and semantic-based chunking on the business data files before it uploads them. Data cleansing includes text extraction and hyperlink replacement.
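
To make the Chunk Size and Chunk Overlap parameters concrete, the following minimal sketch shows character-based chunking with overlap. It illustrates the concept only and is not the splitter that PAI uses internally; the parameter names mirror the web UI fields.

    # Rough illustration of chunking with overlap (not PAI's internal implementation).
    def split_into_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 10):
        """Split text into chunks of at most chunk_size characters;
        adjacent chunks share chunk_overlap characters."""
        step = chunk_size - chunk_overlap
        chunks = []
        for start in range(0, len(text), step):
            chunk = text[start:start + chunk_size]
            if chunk:
                chunks.append(chunk)
        return chunks

    sample_text = "PAI is an AI platform provided by Alibaba Cloud. " * 40
    for i, chunk in enumerate(split_into_chunks(sample_text)):
        print(i, len(chunk))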

3. Configure model inference parameters

On the Chat tab, configure Q&A policies.

Retrieval policies


• Streaming Output: Specifies whether to return results in streaming mode. If you select Streaming Output, the results are returned in streaming mode.

• Retrieval Mode: The retrieval method. Valid values:

  • Embedding Only: Vector database-based retrieval.

  • Keyword Only: Keyword-based retrieval.

  • Hybrid: Combines vector database-based retrieval and keyword-based retrieval.

  Note

  In most complex scenarios, vector database-based retrieval delivers good performance. However, in vertical fields that lack sufficient information or in scenarios that require exact matching, vector database-based retrieval may not perform as well as traditional sparse, keyword-based retrieval, which is simpler and more efficient because it calculates the keyword overlap between user queries and knowledge files.

  PAI provides keyword-based retrieval algorithms, such as BM25, for this type of sparse retrieval. Vector database-based retrieval and keyword-based retrieval have their own advantages and disadvantages. Combining the results of the two retrieval methods can improve the overall accuracy and efficiency.

  The reciprocal rank fusion (RRF) algorithm calculates a total score for each file as a weighted sum of the ranks at which the file appears in the different retrieval methods. If you select Hybrid, PAI uses the RRF algorithm by default to combine the results of vector database-based retrieval and keyword-based retrieval. A minimal sketch of this fusion follows this list.

• Reranker Type: Most vector databases compromise some accuracy to provide high computing efficiency. As a result, the top K results that are returned from the vector database may not be the most relevant. You can use one of the following rerank models to perform a higher-precision rerank of the top K results and obtain more relevant and accurate knowledge files:

  • simple-weighted-reranker: The top K returned results are sorted by weight.

  • model-based-reranker: Select the open source rerank model BAAI/bge-reranker-base or BAAI/bge-reranker-large to sort the top K results that are recalled from the vector database.

    Note

    If you use a model for the first time, you may need to wait for a period of time before the model is loaded.

• Top K: The number of the most relevant results that are returned from the vector database.
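
The following minimal sketch illustrates the reciprocal rank fusion idea described above: each document's fused score is the sum of reciprocal ranks across the retrieval methods. The constant k, the equal weighting, and the document IDs are illustrative assumptions; PAI's actual implementation and weights may differ.

    # Rough illustration of reciprocal rank fusion (RRF), not PAI's exact implementation.
    def rrf_fuse(ranked_lists, k=60):
        """ranked_lists: result lists, each ordered from most to least relevant.
        Returns document IDs sorted by their fused RRF score."""
        scores = {}
        for results in ranked_lists:
            for rank, doc_id in enumerate(results, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    embedding_results = ["doc_3", "doc_1", "doc_7"]  # from vector database-based retrieval
    keyword_results = ["doc_1", "doc_5", "doc_3"]    # from BM25 keyword-based retrieval
    print(rrf_fuse([embedding_results, keyword_results]))  # ['doc_1', 'doc_3', 'doc_5', 'doc_7']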

RAG (Retrieval + LLM) policies


  • PAI provides various prompt policies. You can select a predefined prompt template or specify a custom prompt template for better inference results. The RAG system fills the retrieved results and the user query into the prompt template, and then submits the prompt to the LLM. A hypothetical template is sketched after this list.

  • You can also configure the following parameters in RAG (Retrieval + LLM) mode: Streaming Output, Retrieval Mode, and Reranker Type. For more information, see the Retrieval policies tab of this section.
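
As a rough illustration of the prompt-filling step, the following hypothetical template shows how the retrieved chunks and the user query might be combined before the prompt is submitted to the LLM. The wording is an example only and is not one of PAI's predefined templates.

    # Hypothetical prompt template for illustration; not a predefined PAI template.
    PROMPT_TEMPLATE = """Answer the question based only on the following reference content.
    If the references do not contain the answer, say that you do not know.

    References:
    {context}

    Question: {question}
    Answer:"""

    retrieved_docs = [
        "PAI is an AI platform provided by Alibaba Cloud.",
        "EAS supports scenario-based model deployment.",
    ]
    prompt = PROMPT_TEMPLATE.format(context="\n".join(retrieved_docs),
                                    question="What is PAI?")
    print(prompt)  # this text is what would be submitted to the LLM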

4. Perform model inference

Retrieval

The chatbot returns the top K relevant results from the vector database.


LLM

The chatbot uses only the LLM to generate an answer.


RAG (Retrieval + LLM)

The chatbot fills the returned results from the database and user query into a prompt template, and then submits the prompt to the LLM to generate an answer.


After you test the Q&A performance of the RAG-based chatbot on the web UI, you can call API operations provided by Platform for AI (PAI) to apply the RAG-based chatbot to your business system. For more information, see Step 3: Call API operations to perform model inference in this topic.

Step 3: Call API operations to perform model inference

  1. Obtain the invocation information of the RAG-based chatbot.

    1. Click the name of the RAG-based chatbot to go to the Service Details page.

    2. In the Basic Information section, click View Endpoint Information.

    3. On the Public Endpoint tab of the Invocation Method dialog box, obtain the service endpoint and token.

  2. Connect to the vector database through the WebUI and upload knowledge base files.

    You can also write the knowledge base to the vector database directly based on the structure of the generated table, which conforms to the PAI-RAG format.

  3. Call the service through APIs.

    PAI allows you to call the RAG-based chatbot by using the following API operations in different query modes: service/query/retrieval in retrieval mode, service/query/llm in LLM mode, and service/query in RAG mode. Sample code:

    cURL command

    • Initiate a single-round conversation request

      • Method 1: Call the service/query/retrieval operation.

        curl -X 'POST'  '<service_url>service/query/retrieval' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json'  -d '{"question": "What is PAI?"}'
        # Replace <service_url> and <service_token> with the service endpoint and service token that you obtained in Step 1.

      • Method 2: Call the service/query/llm operation.

        curl -X 'POST'  '<service_url>service/query/llm' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: application/json'  -d '{"question": "What is PAI?"}'
        # Replace <service_url> and <service_token> with the service endpoint and service token that you obtained in Step 1.

        You can add other adjustable inference parameters such as {"question":"What is PAI?", "temperature": 0.9}.

      • Method 3: Call the service/query operation.

        curl -X 'POST'  '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: application/json'  -d '{"question": "What is PAI?"}'
        # Replace <service_url> and <service_token> with the service endpoint and service token that you obtained in Step 1.

        You can add other adjustable inference parameters such as {"question":"What is PAI?", "temperature": 0.9}.

    • Initiate a multi-round conversational search request

      You can initiate a multi-round conversational search request only in RAG and LLM query modes. The following sample code shows an example on how to initiate a multi-round conversational search request in RAG query mode:

      # Send the request.  
      curl -X 'POST'  '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: application/json'  -d '{"question": "What is PAI?"}'
      
      # Provide the session ID returned for the request. This ID uniquely identifies a conversation in the conversation history. After the session ID is provided, the corresponding conversation is stored and is automatically included in subsequent requests to call an LLM. 
      curl -X 'POST'  '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: application/json'  -d '{"question": "What are the benefits of PAI?","session_id": "ed7a80e2e20442eab****"}'
      
      # Provide the chat_history parameter, which contains the conversation history between you and the chatbot. The parameter value is a list in which each element indicates a single round of conversation in the {"user":"Inputs","bot":"Outputs"} format. Multiple conversations are sorted in chronological order. 
      curl -X 'POST'  '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: application/json'  -d '{"question":"What are the features of PAI?", "chat_history": [{"user":"What is PAI", "bot":"PAI is an AI platform provided by Alibaba Cloud..."}]}'
      
      # If you provide both the session_id and chat_history parameters, the conversation history is appended to the conversation that corresponds to the specified session ID.  
      curl -X 'POST'  '<service_url>service/query' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: application/json'  -d '{"question":"What are the features of PAI?", "chat_history": [{"user":"What is PAI", "bot":"PAI is an AI platform provided by Alibaba Cloud..."}], "session_id": "1702ffxxad3xxx6fxxx97daf7c"}'

    Python

    • The following sample code shows an example on how to initiate a single-round conversational search request:

      import requests
      
      EAS_URL = 'http://xxxx.****.cn-beijing.pai-eas.aliyuncs.com'
      headers = {
          'accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': 'MDA5NmJkNzkyMGM1Zj****YzM4M2YwMDUzZTdiZmI5YzljYjZmNA==',
      }
      
      
      def test_post_api_query_llm():
          url = EAS_URL + '/service/query/llm'
          data = {
             "question":"What is PAI?"
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = dict(response.json())
          print(f"======= Question =======\n {data['question']}")
          print(f"======= Answer =======\n {ans['answer']} \n\n")
      
      
      def test_post_api_query_retrieval():
          url = EAS_URL + '/service/query/retrieval'
          data = {
             "question":"What is PAI?"
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = dict(response.json())
          print(f"======= Question =======\n {data['question']}")
          print(f"======= Answer =======\n {ans['docs']}\n\n")
      
      
      def test_post_api_query_rag():
          url = EAS_URL + '/service/query'
          data = {
             "question":"What is PAI?"
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = dict(response.json())
          print(f"======= Question =======\n {data['question']}")
          print(f"======= Answer =======\n {ans['answer']}")
          print(f"======= Retrieved Docs =======\n {ans['docs']}\n\n")
      # LLM
      test_post_api_query_llm()
      # Retrieval
      test_post_api_query_retrieval()
      # RAG (Retrieval + LLM)
      test_post_api_query_rag()
      

      Set the EAS_URL parameter to the endpoint of the RAG-based chatbot. Make sure to remove the forward slash (/) at the end of the endpoint. Set the Authorization parameter to the token of the RAG-based chatbot.

    • Initiate a multi-round conversational search request

      You can initiate a multi-round conversational search request only in RAG (Retrieval + LLM) and LLM query modes. Sample code:

      import requests
      
      EAS_URL = 'http://xxxx.****.cn-beijing.pai-eas.aliyuncs.com'
      headers = {
          'accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': 'MDA5NmJkN****jNlMDgzYzM4M2YwMDUzZTdiZmI5YzljYjZmNA==',
      }
      
      
      def test_post_api_query_llm_with_chat_history():
          url = EAS_URL + '/service/query/llm'
          # Round 1 query
          data = {
             "question":"What is PAI?"
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = dict(response.json())
          print(f"=======Round 1: Question =======\n {data['question']}")
          print(f"=======Round 1: Answer =======\n {ans['answer']} session_id: {ans['session_id']} \n")
         
          # Round 2 query
          data_2 = {
             "question": "What are the benefits of PAI?",
             "session_id": ans['session_id']
          }
          response_2 = requests.post(url, headers=headers, json=data_2)
      
          if response_2.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response_2.status_code}')
          ans_2 = dict(response_2.json())
          print(f"=======Round 2: Question =======\n {data_2['question']}")
          print(f"=======Round 2: Answer =======\n {ans_2['answer']} session_id: {ans_2['session_id']} \n\n")
      
      
      def test_post_api_query_rag_with_chat_history():
          url = EAS_URL + '/service/query'
         
          # Round 1 query
          data = {
             "question":"What is PAI?"
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = dict(response.json())
      
          print(f"=======Round 1: Question =======\n {data['question']}")
          print(f"=======Round 1: Answer =======\n {ans['answer']} session_id: {ans['session_id']}")
          print(f"=======Round 1: Retrieved Docs =======\n {ans['docs']}\n")
      
          # Round 2 query
          data = {
             "question":"What are the features of PAI?",
             "session_id": ans['session_id']
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = dict(response.json())
      
          print(f"=======Round 2: Question =======\n {data['question']}")
          print(f"=======Round 2: Answer =======\n {ans['answer']} session_id: {ans['session_id']}")
          print(f"=======Round 2: Retrieved Docs =======\n {ans['docs']}")
      # LLM
      test_post_api_query_llm_with_chat_history()
      # RAG (Retrieval + LLM)
      test_post_api_query_rag_with_chat_history()
      

      Set the EAS_URL parameter to the endpoint of the RAG-based chatbot. Make sure to remove the forward slash (/) at the end of the endpoint. Set the Authorization parameter to the token of the RAG-based chatbot.
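
    The cURL examples above mention that you can pass additional inference parameters, such as temperature. Assuming that the service accepts the same JSON fields from any client, a minimal Python sketch looks like the following; the endpoint and token are placeholders.

      import requests

      # Placeholders: replace with the endpoint and token of your RAG-based chatbot.
      EAS_URL = 'http://xxxx.****.cn-beijing.pai-eas.aliyuncs.com'
      headers = {
          'accept': 'application/json',
          'Content-Type': 'application/json',
          'Authorization': '<service_token>',
      }

      # "temperature" is shown in the cURL examples; other parameter names are service-dependent.
      data = {"question": "What is PAI?", "temperature": 0.9}
      response = requests.post(EAS_URL + '/service/query', headers=headers, json=data)
      response.raise_for_status()
      print(response.json().get('answer'))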

References

You can also use EAS to deploy the following items:

  • You can deploy an LLM application that can be called by using the web UI or API operations. After the LLM application is deployed, use the LangChain framework to integrate enterprise knowledge bases into the LLM application to implement intelligent Q&A and automation features. For more information, see Quickly deploy open source LLMs in EAS.

  • You can deploy an AI video generation model service by using ComfyUI and Stable Video Diffusion models. This helps you complete tasks such as short video generation and animation on social media platforms. For more information, see Use ComfyUI to deploy an AI video generation model service.

  • You can deploy a model service based on Stable Diffusion WebUI by configuring a few parameters. For more information, see Use Stable Diffusion web UI to deploy an AI painting service.

FAQ

How do I disable the With Chat History feature of the RAG-based chatbot?

On the web UI page of the RAG-based chatbot, uncheck With Chat History.