This solution uses the intent recognition capabilities of Large Language Models (LLMs). By learning complex language and user behavior patterns from massive amounts of data, an LLM can recognize user intents more accurately and provide a smoother, more natural interaction experience. This topic describes the complete development process of an LLM-based intent recognition solution built on the Qwen1.5 model.
Background information
What is intent recognition?
Intent recognition enables AI agents to interpret user requirements that are described in natural language and to perform the appropriate operations or provide related information. AI agents have become an essential component of intelligent interaction systems, and LLM-based intent recognition technology has gained significant attention in the industry and is widely applied.
Typical scenarios of intent recognition technology
In intelligent voice assistant scenarios, users interact with voice assistants by using simple voice commands. For example, when a user says "I want to listen to music" to a voice assistant, the system must accurately recognize that the user requirement is to play music, and then perform the related operation.
In intelligent customer service scenarios, the challenge lies in how to handle various customer service requests and quickly and accurately classify them into different processes, such as returns, exchanges, and complaints. For example, a user may say "I received a defective item and I want to return it" to the customer service system of an e-commerce platform. In this case, the LLM-based intent recognition system must quickly capture the information that the intent of the user is to "return an item" and trigger the return process to further guide the user to complete subsequent operations.
Working process
The following figure shows the working process of the LLM-based intent recognition solution.

Prepare training data
Prepare a training dataset for your business scenario based on the data format requirements and data preparation strategies. Alternatively, prepare raw business data based on the data preparation strategies and label it in iTAG. Then, export the labeling results and convert them into the format supported by QuickStart of Platform for AI (PAI) for subsequent model training.
Train and perform an offline evaluation on a model
In QuickStart, you can train the Qwen1.5-1.8B-Chat model. After you train the model, you need to perform an offline evaluation on the model.
Deploy and call a model service
If the model evaluation results meet your expectations, you can use QuickStart to deploy the trained model to Elastic Algorithm Service (EAS) as an online service.
Prerequisites
Before you perform the operations that are described in this topic, make sure that you have completed the following preparations:
Deep Learning Containers (DLC) and EAS of PAI are activated and a default workspace is created. For more information, see Activate PAI and create a default workspace.
An Object Storage Service (OSS) bucket is created to store training data and the model file obtained from model training. For information about how to create a bucket, see Create buckets.
Prepare training data
You can prepare training data by using one of the following methods:
Data preparation strategies
To improve the effectiveness and stability of training, you can prepare data based on the following strategies:
In single-intent recognition scenarios, label at least 50 to 100 data records for each type of intent. If the model performance after fine-tuning does not meet your expectations, increase the number of labeled data records. Also make sure that the number of labeled data records is balanced across intents. A simple balance check is sketched after this list.
In multi-intent recognition or multi-round chat scenarios, we recommend that the number of labeled data records be more than 20% of the number labeled for single-intent recognition scenarios, and that every intent involved also appears in the single-intent data.
Intent descriptions must cover as many phrasings and scenarios as possible.
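To verify that a prepared dataset is balanced, you can count the labeled records for each intent. The following minimal sketch assumes the JSON training format described in the next subsection (instruction and output fields) and treats the text before the first parenthesis in the output field as the intent name. The file name train.json is a placeholder.
import json
from collections import Counter

# Hypothetical file name; replace it with the path of your training data file.
with open('train.json', 'r', encoding='utf-8') as f:
    records = json.load(f)

# The intent name is the part of the output field before the first parenthesis.
intent_counts = Counter(r['output'].split('(')[0].strip() for r in records)
for intent, n in intent_counts.most_common():
    flag = '' if n >= 50 else '  <-- fewer than 50 labeled records'
    print(f"{intent}: {n}{flag}")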
Data format requirements
The training data must be saved in a JSON file, which contains the instruction and output fields. The output field corresponds to the intent predicted by a model and related parameters. The following sample code provides examples of training data in different intent recognition scenarios.
In single-intent recognition scenarios, you need to prepare business data for a specific business scenario to fine-tune an LLM. The following sample code provides an example of single-round chats for the smart home scenario.
[
    {
        "instruction": "I want to listen to music",
        "output": "play_music()"
    },
    {
        "instruction": "Too loud, turn the sound down",
        "output": "volume_down()"
    },
    {
        "instruction": "I do not want to listen to this, turn it off",
        "output": "music_exit()"
    },
    {
        "instruction": "I want to visit Hangzhou. Check the weather forecast for me",
        "output": "weather_search(Hangzhou)"
    }
]
In multi-intent recognition or multi-round chat scenarios, the intents of a user may be expressed across multiple rounds in a chat. In this case, you can prepare multiple rounds of chat data and label the multi-round inputs. The following sample code provides an example of multi-round chats for a voice assistant:
User: I want to listen to music.
Assistant: What kind of music?
User: Play ***.
Assistant: play_music(***)
The training data for multi-round chats is in the following format:
[
    {
        "instruction": "I want to listen to music. Play ***",
        "output": "play_music(***)"
    }
]
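If your raw data is stored as separate chat turns, you can merge the user turns into a single instruction before training. The following minimal sketch reproduces the conversion shown in the preceding example; the turn structure (role and content keys) is an assumption about how your raw chat logs are stored.
def merge_rounds(turns):
    # turns is a list of {"role": "user" | "assistant", "content": ...} dictionaries.
    # Concatenate the user utterances and use the final assistant output as the label.
    user_text = '. '.join(t['content'] for t in turns if t['role'] == 'user')
    label = [t['content'] for t in turns if t['role'] == 'assistant'][-1]
    return {'instruction': user_text, 'output': label}

turns = [
    {'role': 'user', 'content': 'I want to listen to music'},
    {'role': 'assistant', 'content': 'What kind of music?'},
    {'role': 'user', 'content': 'Play ***'},
    {'role': 'assistant', 'content': 'play_music(***)'},
]
print(merge_rounds(turns))
# {'instruction': 'I want to listen to music. Play ***', 'output': 'play_music(***)'}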
In multi-round chat scenarios, the sequence length for model training is significantly longer, and only a limited number of intent recognition scenarios require multi-round chats. We recommend that you use the multi-round chat mode for model training only if the single-round chat mode cannot meet your business requirements. The following sections use the single-round chat mode to illustrate the complete process.
Use iTAG to label data
You can label data in iTAG of PAI to generate a training dataset that meets specific requirements by performing the following steps:
Register the data used for iTAG labeling to a PAI dataset.
Prepare a data file in the manifest format. For more information, see Data preparation strategies. Example:
{"data":{"instruction": "I want to listen to music"}}
{"data":{"instruction": "Too loud, turn it down"}}
{"data":{"instruction": "I do not want to listen to this, turn it off"}}
{"data":{"instruction": "I want to visit Hangzhou. Check the weather forecast for me"}}
Create a dataset. The following table describes the key parameters. For information about other parameters, see Create and manage datasets.
Parameter | Description |
Storage type | Select OSS. |
Import Format | Select File. |
OSS Path | Select the created OSS directory and upload the manifest file that you prepared. To do so, perform the following steps: In the Select OSS file dialog box, click Upload File. Then, click View local files and select the manifest file from your on-premises machine, or directly drag the file to the blank area. |
In the upper-right corner of the iTAG page, click Go to Management Page. On the page that appears, click the Templates tab. On the Template Management tab of the Asset Management page, click Create Template. On the page that appears, choose and click Edit. On the Basic Template tab, configure the parameters described in the following table. For information about other parameters, see Manage templates.
Section | Description |
Basic Template Canvas | Select Text and click Generate Content Card. Click the text area. In the Import Data dialog box, select an existing dataset. In the Configuration for Basic Template section, add the instruction field for the Dataset Field parameter. |
Basic Template Answers | Select Input Field and click Generate Title Card. In the Configuration for Basic Template section, set Title to output. |
Create a labeling job. On the Create Labeling Job page, configure the parameters described in the following table. For information about other parameters, see Create a labeling job.
Parameter | Description |
Input Dataset | Select the dataset that you created in Step 1. Note: The data must match the template. |
Template Type | Select Custom Template and select an existing template from the Select Template drop-down list. |
After you create the labeling job, label the data. For more information, see Process labeling jobs.
After you label the data, export the labeling results to an OSS directory. For more information, see Export labeling results.
The following sample code shows an example of the exported manifest file. For information about the data format, see Overview.
{"data":{"instruction":"I want to listen to music","_itag_index":""},"label-1787402095227383808":{"results":[{"questionId":"2","data":"play_music()","markTitle":"output","type":"survey/value"}]},"abandonFlag":0,"abandonRemark":null}
{"data":{"instruction":"Too loud, turn the sound down","_itag_index":""},"label-1787402095227383808":{"results":[{"questionId":"2","data":"volume_down()","markTitle":"output","type":"survey/value"}]},"abandonFlag":0,"abandonRemark":null}
{"data":{"instruction":"I do not want to listen to this. Turn it off","_itag_index":""},"label-1787402095227383808":{"results":[{"questionId":"2","data":"music_exit()","markTitle":"output","type":"survey/value"}]},"abandonFlag":0,"abandonRemark":null}
{"data":{"instruction":"I want to visit Hangzhou. Check the weather forecast for me","_itag_index":""},"label-1787402095227383808":{"results":[{"questionId":"2","data":"weather_search(Hangzhou)","markTitle":"output","type":"survey/value"}]},"abandonFlag":0,"abandonRemark":null}
In the terminal, execute the following Python script to convert the manifest-formatted labeling result file into a training data format suitable for QuickStart.
import json

input_file_path = 'test_json.manifest'
output_file_path = 'train.json'

converted_data = []
with open(input_file_path, 'r', encoding='utf-8') as file:
    for line in file:
        # Each line of the exported manifest file is a separate JSON object.
        data = json.loads(line)
        instruction = data['data']['instruction']
        # The labeling result is stored under the key that starts with "label-".
        for key in data.keys():
            if key.startswith('label-'):
                output = data[key]['results'][0]['data']
                converted_data.append({'instruction': instruction, 'output': output})
                break

with open(output_file_path, 'w', encoding='utf-8') as outfile:
    json.dump(converted_data, outfile, ensure_ascii=False, indent=4)
The output is a JSON file in the training data format described in Data format requirements.
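For the sample manifest file shown above, the converted train.json file contains entries similar to the following:
[
    {
        "instruction": "I want to listen to music",
        "output": "play_music()"
    },
    {
        "instruction": "Too loud, turn the sound down",
        "output": "volume_down()"
    }
]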
Train and perform an offline evaluation on a model
Train a model
QuickStart is integrated with high-quality pre-trained models from open source Artificial Intelligence (AI) communities. QuickStart allows you to implement the complete process from model training and deployment to inference without the need to write code. This greatly simplifies the model development process.
In this example, the Qwen1.5-1.8B-Chat model is used to illustrate how to use the prepared training data to train a model in QuickStart. To train a model, perform the following steps:
Go to the Model Gallery page.
Log on to the PAI console.
In the upper-left corner, select a region based on your business requirements.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to use.
In the left-side navigation pane, click Model Gallery.
In the model list of the Model Gallery page, find and click the Qwen1.5-1.8B-Chat model.
In the upper-right corner of the model details page, click Train. In the Train panel, configure the key parameters described in the following table. Use the default settings of other parameters.
Parameter | Description |
Training Mode | Full-Parameter Fine-Tuning: This mode requires more resources and a longer training time, but delivers better training results. Note: Models with a small number of parameters support full-parameter fine-tuning. Select Full-Parameter Fine-Tuning based on your business requirements. QLoRA: A lightweight fine-tuning mode. Compared with full-parameter fine-tuning, Quantized Low-Rank Adaptation (QLoRA) requires fewer resources and has a shorter training time, but its training results are not as good. LoRA: This mode is similar to QLoRA. |
Dataset Configuration | Training Dataset: To select a prepared training dataset, perform the following steps: Select OSS file or directory from the drop-down list. Click the icon to select an OSS directory. In the Select OSS file dialog box, click Upload File, drag the prepared training dataset file to the blank area, and then click OK. |
Output Configuration | Model Output Path: Select an OSS directory to store the output configuration file and the model file. Tensorboard Output Path: Select an OSS directory to store the TensorBoard output. |
Hyper-parameters | For more information about the hyperparameters, see Table 1. Full hyperparameters. For recommended values, see Table 2. Recommended hyperparameter configurations. Configure the hyperparameters based on the training mode and the following strategies: Global batch size = Number of GPUs x per_device_train_batch_size x gradient_accumulation_steps. To maximize training performance, increase the number of GPUs and the value of per_device_train_batch_size first. In most cases, the global batch size ranges from 64 to 256. If only a small amount of training data is available, you can reduce the global batch size. Configure the seq_length parameter based on your business requirements. For example, if the maximum length of a text sequence in your dataset is 50, you can set this parameter to 64 (a power of 2). If the training loss decreases too slowly or does not converge, increase the learning rate specified by the learning_rate parameter, and verify the quality of the training data. |
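As a quick sanity check, you can compute the global batch size for a planned configuration before you start training. The following is a minimal sketch of the formula above; the example values are assumptions.
def global_batch_size(num_gpus, per_device_train_batch_size, gradient_accumulation_steps):
    # Global batch size = Number of GPUs x per_device_train_batch_size x gradient_accumulation_steps
    return num_gpus * per_device_train_batch_size * gradient_accumulation_steps

# Example: 2 GPUs with per_device_train_batch_size=16 and gradient_accumulation_steps=8
# result in a global batch size of 256, which is within the recommended 64-256 range.
print(global_batch_size(2, 16, 8))  # 256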
Table 1. Full hyperparameters
Hyperparameter | Type | Description | Default value |
learning_rate | FLOAT | The learning rate of model training. | 5e-5 |
num_train_epochs | INT | The number of epochs. | 1 |
per_device_train_batch_size | INT | The amount of data processed by each GPU in one training iteration. | 1 |
seq_length | INT | The length of the text sequence. | 128 |
lora_dim | INT | The inner dimensions of the low-rank matrices that are used in Low-Rank Adaptation (LoRA) or QLoRA training. Set this parameter to a value greater than 0. | 32 |
lora_alpha | INT | The scaling factor of the LoRA or QLoRA weights. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0. | 32 |
load_in_4bit | BOOL | Specifies whether to load the model in 4-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0 and the load_in_8bit parameter to false. | true |
load_in_8bit | BOOL | Specifies whether to load the model in 8-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0 and the load_in_4bit parameter to false. | false |
gradient_accumulation_steps | INT | The number of gradient accumulation steps. | 8 |
apply_chat_template | BOOL | Specifies whether the algorithm combines the training data with the default chat template of the model. For the format that Qwen1.5 models use, see the example in the description of the system_prompt parameter. | true |
system_prompt | STRING | The default system prompt for model training. This parameter takes effect only if you set the apply_chat_template parameter to true. You can configure a custom system prompt during the training of a Qwen1.5 model to allow the LLM to assume a specific role. The algorithm automatically expands the training data. You do not need to pay attention to the execution details. For example, you set system_prompt to "You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters." In this case, the following training sample is provided:
[
    {
        "instruction": "I want to listen to music",
        "output": "play_music()"
    }
]
The training data is in the following format:
<|im_start|>system\nYou are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters<|im_end|>\n<|im_start|>user\nI want to listen to music<|im_end|>\n<|im_start|>assistant\nplay_music()<|im_end|>\n
| You are a helpful assistant |
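To see how a labeled sample is expanded, you can reproduce the chat template format with a few lines of Python. This is only an illustrative sketch based on the format shown above; the training algorithm performs this expansion automatically, and the to_chat_format helper is hypothetical.
def to_chat_format(instruction, output, system_prompt):
    # Reproduces the Qwen1.5 chat template shown in the system_prompt example.
    return ('<|im_start|>system\n' + system_prompt + '<|im_end|>\n'
            '<|im_start|>user\n' + instruction + '<|im_end|>\n'
            '<|im_start|>assistant\n' + output + '<|im_end|>\n')

print(to_chat_format(
    'I want to listen to music',
    'play_music()',
    'You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters'))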
Table 2. Recommended hyperparameter configurations
Parameter | Full-parameter fine-tuning | LoRA/QLoRA |
learning_rate | 5e-6 to 5e-5 | 3e-4 |
Global batch size | 256 | 256 |
seq_length | 256 | 256 |
num_train_epochs | 3 | 5 |
lora_dim | 0 | 64 |
lora_alpha | 0 | 16 |
load_in_4bit | False | True/False |
load_in_8bit | False | True/False |
Click Train. In the Billing Notification message, click OK.
The system automatically navigates to the training job details page. After the training job runs, you can view the status and training logs of the training job.
Evaluate a model offline
After you train a model, you can use a Python script to evaluate the model in the terminal.
Prepare the evaluation data file testdata.json. Sample content:
[
    {
        "instruction": "Who sings the song Ten Years?",
        "output": "music_query_player(Ten Years)"
    },
    {
        "instruction": "What is the weather like in Hangzhou today?",
        "output": "weather_search(Hangzhou)"
    }
]
In the terminal, use the following Python script to evaluate the model offline.
from transformers import AutoModelForCausalLM, AutoTokenizer
import json
from tqdm import tqdm

device = "cuda"
# Path of the trained model files that are stored in the model output directory.
model_name = '/mnt/workspace/model/qwen14b-lora-3e4-256-train/'
print(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

count = 0   # Number of samples whose intent is correctly recognized.
ecount = 0  # Number of samples whose intent and parameters are both correctly recognized.
test_data = json.load(open('/mnt/workspace/data/testdata.json'))
system_prompt = 'You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters. '

for i in tqdm(test_data[:]):
    # Build the prompt by using the same chat template that was used for training.
    prompt = '<|im_start|>system\n' + system_prompt + '<|im_end|>\n<|im_start|>user\n' + i['instruction'] + '<|im_end|>\n<|im_start|>assistant\n'
    gold = i['output']
    gold = gold.split(';')[0] if ';' in gold else gold
    model_inputs = tokenizer([prompt], return_tensors="pt").to(device)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=64,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        do_sample=False
    )
    # Remove the prompt tokens and keep only the newly generated tokens.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    pred = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    # The intent is the function name before the parenthesis, for example, play_music.
    if gold.split('(')[0] == pred.split('(')[0]:
        count += 1
        # Compare the parameters inside the parentheses as unordered sets.
        gold_list = set(gold.strip()[:-1].split('(')[1].split(','))
        pred_list = set(pred.strip()[:-1].split('(')[1].split(','))
        if gold_list == pred_list:
            ecount += 1

print("Intent recognition accuracy:", count/len(test_data))
print("Parameter recognition accuracy:", ecount/len(test_data))
Deploy and call a model service
Deploy a model service
After you train a model, you can deploy the model as an online service in EAS by performing the following steps:
On the training job details page, click Deploy. In the Deploy panel, the parameters in the Model Service Information and Resource Deployment Information sections are automatically configured. You can modify the parameters based on your business requirements. After you configure the parameters, click Deploy.
In the Billing Notification message, click OK.
The system automatically navigates to the service details page. When the service status changes to Running, the model service is deployed.
In intent recognition scenarios of voice assistants, lower latency is required to ensure a good user interaction experience. Therefore, we recommend that you use the BladeLLM inference engine provided by PAI to deploy the LLM service. For more information, see How do I improve concurrency and reduce latency for the inference service?
Call a model service
After you deploy a model service, you can click View Web App in the upper-right corner of the service details page to perform real-time interaction by using ChatLLM-WebUI. You can also call API operations to perform model inference. For more information, see Quickly deploy LLMs in EAS.
The following example shows how to initiate a service call request by using the client:
Obtain the endpoint and token of the model service.
In the Basic Information section of the Service details tab of the model service details page, click View Call Information.
In the Call Information dialog box, view and save the endpoint and token of the model service to your on-premises machine.
In the terminal, run the following code to call the model service:
import argparse
import json
from typing import List

import requests


def post_http_request(prompt: str,
                      system_prompt: str,
                      history: list,
                      host: str,
                      authorization: str,
                      max_new_tokens: int = 2048,
                      temperature: float = 0.95,
                      top_k: int = 1,
                      top_p: float = 0.8,
                      langchain: bool = False,
                      use_stream_chat: bool = False) -> requests.Response:
    # Send an inference request to the EAS service. The service token is passed
    # in the Authorization header.
    headers = {
        "User-Agent": "Test Client",
        "Authorization": f"{authorization}"
    }
    pload = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "use_stream_chat": use_stream_chat,
        "history": history
    }
    response = requests.post(host, headers=headers,
                             json=pload, stream=use_stream_chat)
    return response


def get_response(response: requests.Response) -> List[str]:
    # Parse the model output and the updated chat history from the response.
    data = json.loads(response.content)
    output = data["response"]
    history = data["history"]
    return output, history


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--top-k", type=int, default=4)
    parser.add_argument("--top-p", type=float, default=0.8)
    parser.add_argument("--max-new-tokens", type=int, default=2048)
    parser.add_argument("--temperature", type=float, default=0.95)
    parser.add_argument("--prompt", type=str, default="How can I get there?")
    parser.add_argument("--langchain", action="store_true")
    args = parser.parse_args()

    prompt = args.prompt
    top_k = args.top_k
    top_p = args.top_p
    use_stream_chat = False
    temperature = args.temperature
    langchain = args.langchain
    max_new_tokens = args.max_new_tokens

    # Replace the following placeholders with the endpoint and token that you
    # obtained from the Call Information dialog box.
    host = "<Public endpoint of the EAS service>"
    authorization = "<Public token of the EAS service>"

    print(f"Prompt: {prompt!r}\n", flush=True)
    system_prompt = "You are an intent recognition expert. You can recognize an intent based on user questions and return the corresponding intent and parameters."
    history = []

    response = post_http_request(
        prompt, system_prompt, history,
        host, authorization,
        max_new_tokens, temperature, top_k, top_p,
        langchain=langchain, use_stream_chat=use_stream_chat)
    output, history = get_response(response)
    print(f" --- output: {output} \n --- history: {history}", flush=True)
Take note of the following parameters: Set host to the public endpoint of the EAS service and set authorization to the token of the EAS service that you obtained in the previous step.
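After the service returns a result such as play_music() or weather_search(Hangzhou), your application typically needs to route it to business logic. The following minimal sketch parses the returned string into an intent name and a parameter list; the parse_intent helper is hypothetical and not part of the EAS service.
import re

def parse_intent(output):
    # Split a response such as "weather_search(Hangzhou)" into the intent name
    # and its parameters.
    match = re.match(r'\s*(\w+)\s*\((.*)\)\s*$', output)
    if not match:
        return output.strip(), []
    intent, params = match.group(1), match.group(2)
    return intent, [p.strip() for p in params.split(',') if p.strip()]

print(parse_intent('play_music()'))              # ('play_music', [])
print(parse_intent('weather_search(Hangzhou)'))  # ('weather_search', ['Hangzhou'])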
References
For more information about how to use iTAG and the format requirements for data labeling, see Overview.
For more information about EAS, see EAS overview.
You can use QuickStart of PAI to train and deploy models in different scenarios, including Llama-3, Qwen1.5, and Stable Diffusion V1.5 models. For more information, see Scenario-specific practices.