A chatbot is an intelligent communication system that uses natural language processing (NLP) to provide human-like responses to prompts. You can use chatbots to build an enterprise knowledge base for intelligent queries. This topic describes how to use Compute Nest and AnalyticDB for PostgreSQL to create a dedicated chatbot.
Billing rules
When you create a dedicated enterprise chatbot based on large language models (LLMs) and vector databases, the system creates an Elastic Compute Service (ECS) instance and an AnalyticDB for PostgreSQL instance in elastic storage mode. You are charged for the instances. For information about the billing rules of the instances, see the following topics:
Benefits
Supports multiple LLMs, such as Tongyi Qianwen-7b, ChatGLM-6b, Llama2-7b, and Llama2-13b. You can switch between the LLMs.
Supports GPU cluster management. During the testing stage, you can use a low-specification GPU instance to keep resource usage low. As your business grows, you can elastically scale the GPU cluster to reduce GPU costs.
Supports fine-grained permission design based on the database capabilities of AnalyticDB for PostgreSQL. You can view the permission logic in the open source code and flexibly call API operations to manage the AnalyticDB for PostgreSQL knowledge base.
Allows you to call API operations or use the web UI to integrate with applications and implement Artificial Intelligence Generated Content (AIGC) capabilities.
Ensures the security of core enterprise data, such as business data, algorithms, and GPU resources.
Grant permissions to RAM users
Before you use a Resource Access Management (RAM) user to perform the following operations, you must grant permissions to the RAM user.
Create a service instance
In this example, GenAI-LLM-RAG is used.
Log on to the Compute Nest console, go to the Service Marketplace page, and then click GenAI-LLM-RAG. On the page that appears, click Launch Now.
On the Create Service Instance page, configure the following parameters. The parameters are grouped by the categories that appear on the page.
Service Instance Name: The name of the service instance. The system generates a random name. We recommend that you specify a descriptive name that is easy to identify.
Region: The region in which the service instance, ECS instance, and AnalyticDB for PostgreSQL instance are deployed.
PayType Configuration
  ECS Instance Charge Type: The billing method of the ECS instance. Valid values: Pay-as-you-go and Subscription. In this example, Pay-as-you-go is selected.
ECS Configuration
  Instance Type: The specifications of the ECS instance.
  Instance Password: The password that is used to log on to the ECS instance.
  IngressIP: The IP address whitelist of the ECS instance. We recommend that you add the IP address of the server that needs to access the specified LLM to the whitelist.
PAI-EAS Configuration
  ModelType: The LLM that you want to use. In this example, llama2-7b is used.
  pai instance type: The GPU specifications of Platform for AI (PAI).
AnalyticDB PostgreSQL
  DBInstanceSpec: The compute node specifications of the AnalyticDB for PostgreSQL instance.
  SegmentStorageSize: The storage capacity of each compute node of the AnalyticDB for PostgreSQL instance. Unit: GB.
  DB Username: The name of the privileged account of the AnalyticDB for PostgreSQL instance.
  Instance Password: The password of the privileged account of the AnalyticDB for PostgreSQL instance.
Choose model repo
  User Name: The logon name of the LLM software.
  Software Login Password: The logon password of the LLM software.
Zone Configuration
  vSwitch Availability Zone: The zone in which the service instance is deployed.
Choose existing Infrastructure Configuration
  WhetherCreateVpc: Specifies whether to create a virtual private cloud (VPC) or use an existing VPC. In this example, a VPC is created.
  VPC ID: The ID of the VPC.
  VSwitch ID: The ID of the vSwitch.
Tags and Resource Groups
  Tag: The tag that you want to add to the service instance.
  Resource Group: The resource group to which the service instance belongs. For more information, see What is Resource Management?
Click Next: Confirm Order.
Check information in the Dependency Check, Service Instance Information, and Price Preview sections.
Note: If specific role permissions are not granted, click Authorize in the Dependency Check section. After you grant the required role permissions, click the refresh button in the section.
Select I have read and agreed to Computing Nest Service Agreement and click Create Now.
After the creation request is submitted, click View Service.
The service instance requires approximately 10 minutes to create. When the status of the service instance changes from Deploying to Deployed on the Service Instances page, the service instance is created.
Use a chatbot
Before you use a chatbot, you must upload a file that contains knowledge questions and answers to the knowledge base. This section describes how to upload files and use the chatbot.
On the Service Instances page of the Compute Nest console, click the ID of the service instance that you want to manage to go to the Service Instance Details page.
In the Instance Information section of the Service Instance Details page, click the endpoint in the Endpoint field.
Upload files to the knowledge base.
To upload files, you can click Upload File, Upload File and URL, or Upload Folder.
You can upload files in the PDF, Markdown, TXT, or Word format.
To remove files, click Delete File.
After you upload files, enter your question and click Submit.
Resource management
View the resources associated with a service instance
On the Service Instances page of the Compute Nest console, click the ID of the service instance that you want to manage to go to the Service Instance Details page.
Click the Resources tab.
AnalyticDB for PostgreSQL resource management
On the Resources tab, find the resource whose service is AnalyticDB for PostgreSQL and click the resource ID to go to the AnalyticDB for PostgreSQL instance management page.
For information about AnalyticDB for PostgreSQL vector analysis, see the following topics:
For information about additional storage and computing resources, see the following topics:
View knowledge base data on the AnalyticDB for PostgreSQL instance
On the AnalyticDB for PostgreSQL instance management page, click Log On to Database in the upper-right corner. For more information, see Use DMS to connect to an AnalyticDB for PostgreSQL instance.
Note: When you use Data Management (DMS) to connect to the AnalyticDB for PostgreSQL instance, use the database account name and database account password that you specified when you created the service instance.
After logon, click Instances Connected in the left-side navigation pane, find the AnalyticDB for PostgreSQL instance that you want to manage, and then double-click the public schema in the chatglmuser database.
The knowledge base data is stored in the langchain_collections table.
After you upload a knowledge base or a document, the corresponding chunks are stored in a table that is named after the knowledge base or the document. The table contains information such as embedding data, chunks, file metadata, and original file names.
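If you want to inspect the data without DMS, the following is a minimal sketch that uses psql. The <endpoint> and <db-username> placeholders and the port are assumptions; replace them with the public endpoint of your AnalyticDB for PostgreSQL instance and the privileged account that you specified when you created the service instance.
# Connect to the chatglmuser database.
psql -h <endpoint> -p 5432 -U <db-username> -d chatglmuser
# In the psql session, list the tables in the public schema, which include
# langchain_collections and the tables that are named after your knowledge bases or documents.
\dt public.*
# Preview a few rows of the knowledge base metadata.
SELECT * FROM langchain_collections LIMIT 5;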
For more information about DMS, see What is DMS?
EAS resource management
Enable elastic scaling
Elastic Algorithm Service (EAS) provides various Serverless scaling capabilities, such as auto scaling, scheduled scaling, and elastic resource pools. If your workloads frequently fluctuate, you can enable the horizontal auto scaling feature to prevent resource waste. After you enable this feature, EAS automatically adjusts the number of instances to manage the computing resources of online services. This ensures the stability of your business and improves resource utilization.
On the Resources tab of the Service Instance Details page, find the resource whose service is PAI and click the resource ID to go to the Service Details page of the PAI console.
Click the Auto Scaling tab.
On the Auto Scaling tab, click Enable Auto Scaling.
In the Auto Scaling Settings dialog box, configure the Minimum Number of Instances, Maximum Number of Instances, and General Scaling Metrics parameters.
If your business involves only a small amount of data, we recommend that you set the Minimum Number of Instances parameter to 0, the Maximum Number of Instances parameter to 1, the General Scaling Metrics parameter to QPS Threshold of Individual Instance, and the QPS Threshold of Individual Instance parameter to 1. This way, an instance starts when a request arrives and stops when no requests occur.
If your business involves a large amount of data and your workloads frequently fluctuate, configure auto scaling based on your business requirements. For example, you can set the Minimum Number of Instances parameter to 5, the Maximum Number of Instances parameter to 50, the General Scaling Metrics parameter to QPS Threshold of Individual Instance, and the QPS Threshold of Individual Instance parameter to 2. This way, the service scales between 5 and 50 instances based on your workloads.
Click Enable.
Change the open source LLM
On the Resources tab of the Service Instance Details page, find the resource whose service is PAI and click the resource ID to go to the Service Details page of the PAI console.
Click Update Service.
On the Deploy Service page, modify the content in the Command to Run field, reselect a GPU instance type, and retain the default settings for other parameters.
The following table describes the running commands and recommended GPU instance types of different LLMs.
LLM: llama2-13b
  Command to run: python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16
  Recommended instance type: V100 (gn6e)
LLM: llama2-7b
  Command to run: python api/api_server.py --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf
  Recommended instance type: GU30 and A10
LLM: chatglm2-6b
  Command to run: python api/api_server.py --port=8000 --model-path=THUDM/chatglm2-6b
  Recommended instance type: GU30 and A10
LLM: Qwen-7b
  Command to run: python api/api_server.py --port=8000 --model-path=Qwen/Qwen-7B-Chat
  Recommended instance type: GU30 and A10
Click Deploy.
In the Deploy Service message, click OK.
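After the service is redeployed, you may want to confirm that the new model server is reachable. The following is a generic sketch that only checks that an HTTP server is listening on port 8000 (the port comes from the commands above; whether the server exposes a dedicated health endpoint is not documented here, so this is an assumption-light connectivity check):
# Run on the serving instance. Prints the HTTP status code returned on port 8000.
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000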
FAQ
Q: How do I call the API operations for vector search?
A: See the following topic: Java.
Q: How do I check the deployment progress of a service instance?
A: After you submit a request to create a service instance, the service instance requires approximately 10 minutes to create. This includes the time that is required to initialize the ECS instance and the AnalyticDB for PostgreSQL instance. At the same time, the LLM is downloaded in an asynchronous manner, which requires approximately 30 to 60 minutes to complete. To check the download progress, connect to the ECS instance and view the download logs. After the LLM is downloaded, you can log on to the web UI and use the chatbot.
Q: After I create a Compute Nest service instance, how do I connect to the ECS instance?
A: On the Resources tab of the service instance details page, find the resource whose service is Elastic Compute Service (ECS) and click the resource ID. On the Instance Details tab of the ECS instance, click Connect in the Basic Information section. For more information, see Connect to an instance.
Q: How do I restart the LangChain service?
A: Connect to the ECS instance and run the following command:
systemctl restart langchain-chatglm
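To confirm that the restart succeeded, you can also check the status of the systemd unit:
# Show whether the langchain-chatglm unit is active, together with its most recent log lines.
systemctl status langchain-chatglm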
Q: How do I query LangChain logs?
A: Connect to the ECS instance and run the following command:
journalctl -ef -u langchain-chatglm
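If you only want to scan the historical logs for problems instead of following them in real time, you can filter the journal output. For example:
# Search all logs of the langchain-chatglm unit for error messages.
journalctl -u langchain-chatglm | grep -i error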
Q: What do I do if the LLM fails to be loaded for a Compute Nest service instance?
A: After you create a Compute Nest service instance, the system downloads the LLM from Hugging Face to the ECS instance. In regions in China, the download requires 30 to 60 minutes to complete. After the download is complete, refresh the page and reload the LLM.
Q: How do I view the information about deployment code?
A: See langchain-ChatGLM.
Q: How do I make a request for support from the service team?
A: Apply for the one-stop dedicated enterprise chatbot O&M service.
Q: Where is the LangChain service deployed on ECS?
A: The LangChain service is deployed in the /home/admin/langchain-ChatGLM path.
Q: How do I enable the LangChain API?
A: Connect to the ECS instance and run the following commands:
# Create the systemd file of langchain-chatglm-api.
cp /lib/systemd/system/langchain-chatglm.service /lib/systemd/system/langchain-chatglm-api.service
# Modify the ExecStart parameter in /lib/systemd/system/langchain-chatglm-api.service.
# The ExecStart parameter for EAS:
ExecStart=/usr/bin/python3.9 /home/langchain/langchain-ChatGLM/api.py
# The ExecStart parameter for a single GPU host:
ExecStart=/usr/bin/python3.9 /home/admin/langchain-ChatGLM/api.py
# Reload the systemd files.
systemctl daemon-reload
# Enable the API.
systemctl restart langchain-chatglm-api
# The following information indicates that the API is enabled:
INFO:     Uvicorn running on http://0.0.0.0:7861 (Press CTRL+C to quit)
# List all API operations.
curl http://0.0.0.0:7861/openapi.json
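The openapi.json response is raw JSON. If you only want the list of available API paths, the following sketch extracts them (it assumes the same Python 3.9 interpreter that is referenced in the commands above):
# Print one API path per line from the OpenAPI schema.
curl -s http://0.0.0.0:7861/openapi.json | python3.9 -c 'import json, sys; [print(p) for p in json.load(sys.stdin)["paths"]]'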