Performs a text-based conversational search based on a knowledge base.
Prerequisites
An API key for identity authentication is obtained. When you call the API operations of OpenSearch LLM-Based Conversational Search Edition, you must be authenticated. For more information, see Manage API keys. LLM is short for large language model.
An endpoint is obtained. When you call the API operations of OpenSearch LLM-Based Conversational Search Edition, you must specify an endpoint. For more information, see Obtain endpoints.
Operation information
Request method
POST
Request protocol
HTTP
Request URL
{host}/v3/openapi/apps/{app_group_identity}/actions/knowledge-search
{host}
: the endpoint that is used to call the API operation. You can call the API operation over the Internet or a virtual private cloud (VPC). For more information about how to obtain an endpoint, see Obtain endpoints.
{app_group_identity}
: the application that you want to access. You can specify an application name to access an application that is in service. You can log on to the OpenSearch LLM-Based Conversational Search Edition console and view the application name of the corresponding instance in the instance list.
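As a sketch, the request URL can be assembled from these two values. The host and application name below are placeholders, not real values:

```python
# Assemble the request URL for the knowledge-search operation.
host = "http://example-host"       # placeholder: replace with your endpoint
app_group_identity = "my_app"      # placeholder: replace with your application name

url = f"{host}/v3/openapi/apps/{app_group_identity}/actions/knowledge-search"
print(url)
```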
Request data format
JSON
Request parameters
Header parameters
Parameter | Type | Required | Description | Example |
Content-Type | string | Yes | The data format of the request. Only the JSON format is supported. Set the value to application/json. | application/json |
Authorization | string | Yes | The API key used for request authentication. The value must start with Bearer. | Bearer OS-d1**2a |
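The two header parameters above can be sketched as a Python dict. The API key below is a placeholder:

```python
api_key = "OS-d1**2a"  # placeholder API key

headers = {
    "Content-Type": "application/json",    # only JSON is supported
    "Authorization": f"Bearer {api_key}",  # value must start with "Bearer"
}
```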
Body parameters
Parameter | Type | Required | Description | Example |
question | map | Yes | The input question. | { "text": "user question", "type": "TEXT", "session": "" } |
question.text | string | Yes | The text content of the input question. | user question |
question.session | string | No | The session ID of the multi-round conversation. The ID is used to identify the context of the multi-round conversation. | 1725530408586 |
question.type | string | No | The format of the input question. In this example, TEXT is used. | TEXT |
options | map | No | The additional configurations, such as retrieval, model, and prompt configurations. | |
options.chat | map | No | The configuration of LLM access. | |
options.chat.disable | boolean | No | Specifies whether to disable LLM access. Default value: false. | false |
options.chat.stream | boolean | No | Specifies whether to enable HTTP chunked transfer encoding. Default value: false. | true |
options.chat.model | string | No | The LLM to be used, such as opensearch-llama2-13b. | opensearch-llama2-13b |
options.chat.model_generation | integer | No | The version of the custom model to be used. By default, the earliest version is used. | 20 |
options.chat.prompt_template | string | No | The name of the custom prompt template. By default, this parameter is left empty. In this case, the built-in prompt template is used. | user_defined_prompt_name |
options.chat.prompt_config | object | No | The configuration of the custom prompt template. Specify key-value pairs in the following format: {"key": "value"}. | {"key": "value"} |
options.chat.prompt_config.attitude | string | No | The tone of the conversation. This parameter is included in the built-in prompt template. Default value: normal. | normal |
options.chat.prompt_config.rule | string | No | The detail level of the conversation. Default value: detailed. | detailed |
options.chat.prompt_config.noanswer | string | No | The information returned if the system fails to find an answer to the question. Default value: sorry. | sorry |
options.chat.prompt_config.language | string | No | The language of the answer. Default value: Chinese. | Chinese |
options.chat.prompt_config.role | boolean | No | Specifies whether to enable a custom role to answer the question. | false |
options.chat.prompt_config.role_name | string | No | The name of the custom role. Example: AI assistant. | AI Assistant |
options.chat.prompt_config.out_format | string | No | The format of the answer. Default value: text. | text |
options.chat.generate_config.repetition_penalty | float | No | The repetition level of the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies no penalty. No value range is enforced for this parameter. | 1.01 |
options.chat.generate_config.top_k | integer | No | The size of the candidate set from which tokens are sampled. For example, if this parameter is set to 50, the top 50 tokens with the highest probability are used as the candidate set. The greater the value, the higher the randomness of the generated content. Conversely, the smaller the value, the more deterministic the generated content. Default value: 0, which indicates that the top_k parameter is disabled. In this case, only the top_p parameter takes effect. | 50 |
options.chat.generate_config.top_p | float | No | The probability threshold in the nucleus sampling method used during the generation process. For example, if this parameter is set to 0.8, only the smallest subset of the most probable tokens that sum to a cumulative probability of at least 0.8 is kept as the candidate set. Valid values: (0,1.0). The greater the value, the higher the randomness of the generated content. Conversely, the smaller the value, the more deterministic the generated content. | 0.5 |
options.chat.generate_config.temperature | float | No | The level of randomness and diversity of the content generated by the model. To be specific, the temperature value determines how much the probability distribution for each candidate word is smoothed during text generation. Greater temperature values decrease the peaks of the probability distribution, allowing for the selection of more low-probability words and resulting in more diverse content. Conversely, smaller temperature values increase the peaks of the probability distribution, making high-probability words more likely to be chosen and resulting in more deterministic content. Valid values: [0,2). We recommend that you do not set this parameter to 0 because it is meaningless. This parameter requires SDK for Python 1.10.1 or later, or SDK for Java 2.5.1 or later. | 0.7 |
options.chat.history_max | integer | No | The maximum number of rounds of conversations based on which the system returns results. You can specify up to 20 rounds. Default value: 1. | 20 |
options.chat.link | boolean | No | Specifies whether to return the URL of the reference source. To be specific, it specifies whether the reference source is included in the content generated by the model. | false |
options.chat.agent | map | No | Specifies whether to enable the Retrieval-Augmented Generation (RAG) tool feature. If the feature is enabled, the model determines whether to use a RAG tool based on the existing content. Only specific LLMs support this feature. | |
options.chat.agent.tools | list of string | No | The name of the RAG tool to be used. The following tool is available: knowledge_search. | ["knowledge_search"] |
options.retrieve | map | No | The retrieval configuration. | |
options.retrieve.doc | map | No | The configuration of document retrieval. | |
options.retrieve.doc.disable | boolean | No | Specifies whether to disable document retrieval. Default value: false. | false |
options.retrieve.doc.filter | string | No | The filter that is used to filter documents in the knowledge base based on a specific field during document retrieval. By default, this parameter is left empty. For more information, see the "filter" section of the Extended parameters topic. Example: category=\"value1\". | category=\"value1\" |
options.retrieve.doc.sf | float | No | The threshold for determining the vector relevance for document retrieval. Default value: 1.3. The greater the value, the less relevant the retrieved documents. | 1.3 |
options.retrieve.doc.top_n | integer | No | The number of documents to be retrieved. Valid values: (0,50]. Default value: 5. | 5 |
options.retrieve.doc.formula | string | No | The formula based on which the retrieved documents are sorted. Note For more information about the syntax, see Fine sort functions. Algorithm relevance and geographical location relevance are not supported. | -timestamp: Sort the retrieved documents in descending order by document timestamp. |
options.retrieve.doc.rerank_size | integer | No | The number of documents to be reranked if the reranking feature is enabled. Valid values: (0,100]. Default value: 30. | 30 |
options.retrieve.doc.operator | string | No | The operator between terms obtained after text segmentation during document retrieval. This parameter takes effect only if the sparse vector model is disabled. Default value: AND. | AND |
options.retrieve.doc.dense_weight | float | No | The weight of the dense vector during document retrieval if the sparse vector model is enabled. Valid values: (0.0,1.0). Default value: 0.7. | 0.7 |
options.retrieve.entry | map | No | The configuration of intervention data retrieval. | |
options.retrieve.entry.disable | boolean | No | Specifies whether to disable intervention data retrieval. Default value: false. | false |
options.retrieve.entry.sf | float | No | The threshold for determining the vector relevance for intervention data retrieval. Valid values: [0,2.0]. Default value: 0.3. The smaller the value, the more relevant but fewer the retrieved results. Conversely, less relevant results may be retrieved. | 0.3 |
options.retrieve.image | map | No | The configuration of image retrieval. | |
options.retrieve.image.disable | boolean | No | Specifies whether to disable image retrieval. Default value: false. | false |
options.retrieve.image.sf | float | No | The threshold for determining the vector relevance for image retrieval. Default value: 1.0. | 1.0 |
options.retrieve.image.dense_weight | float | No | The weight of the dense vector during image retrieval if the sparse vector model is enabled. Valid values: (0.0,1.0). Default value: 0.7. | 0.7 |
options.retrieve.qp | map | No | The configuration of query rewriting. | |
options.retrieve.qp.query_extend | boolean | No | Specifies whether to extend queries. The extended queries are used to retrieve document segments in OpenSearch. Default value: false. | false |
options.retrieve.qp.query_extend_num | integer | No | The maximum number of queries to be extended if the query extension feature is enabled. Default value: 5. | 5 |
options.retrieve.rerank | map | No | The reranking configuration for document retrieval. | |
options.retrieve.rerank.enable | boolean | No | Specifies whether to use the model to rerank the retrieved results based on the relevance. Default value: true. | true |
options.retrieve.rerank.model | string | No | The name of the LLM for reranking, such as ops-bge-reranker-larger. | ops-bge-reranker-larger |
options.retrieve.return_hits | boolean | No | Specifies whether to return document retrieval results. If you set this parameter to true, the search_hits parameter is returned in the response. | false |
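Of the body parameters above, only the question object with question.text is strictly required; everything under options falls back to defaults. A minimal request body can be sketched as:

```python
import json

# Minimal request body: only the required question.text is set,
# plus the question type; all options use their defaults.
body = {
    "question": {
        "text": "user question",
        "type": "TEXT",
    }
}
payload = json.dumps(body)
```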
Sample request body
{
"question" : {
"text" : "user question",
"session" : "The session of the conversation. You can specify this parameter to enable the multi-round conversation feature.",
"type" : "TEXT"
},
"options": {
"chat": {
"disable" : false, # Specifies whether to disable LLM access. Default value: false.
"stream" : false, # Specifies whether to enable HTTP chunked transfer encoding. Default value: false.
"model" : "Qwen", # The LLM to be used.
"prompt_template" : "user_defined_prompt_name", # The name of the custom prompt template.
"prompt_config" : { # Optional. The configuration of the custom prompt template.
"key" : "value" # Specify a key-value pair.
},
"generate_config" : {
"repetition_penalty": 1.01,
"top_k": 50,
"top_p": 0.5,
"temperature": 0.7
},
"history_max": 20, # The maximum number of rounds of conversations based on which the system returns results.
"link": false, # Specifies whether to return the URL of the reference source.
"agent":{
"tools":["knowledge_search"]
}
},
"retrieve": {
"doc": {
"disable": false, # Specifies whether to disable document retrieval. Default value: false.
"filter": "category=\"type\"", # The filter that is used to filter documents based on the category field during document retrieval. By default, this parameter is left empty.
"sf": 1.3, # The threshold for determining the vector relevance for document retrieval. Default value: 1.3. The greater the value, the less relevant the retrieved documents.
"top_n": 5, # The number of documents to be retrieved. Valid values: (0,50]. Default value: 5.
"formula" : "", # The formula based on which the retrieved documents are sorted. By default, the retrieved documents are sorted based on vector similarity.
"rerank_size" : 5, # The number of documents to be reranked. By default, you do not need to specify this parameter. The system automatically determines the number of documents to be reranked.
"operator":"OR" # The operator between terms obtained after text segmentation during document retrieval. Default value: AND.
},
"entry": {
"disable": false, # Specifies whether to disable intervention data retrieval. Default value: false.
"sf": 0.3 # The threshold for determining the vector relevance for intervention data retrieval. Default value: 0.3.
},
"image": {
"disable": false, # Specifies whether to disable image retrieval. Default value: false.
"sf": 1.0 # The threshold for determining the vector relevance for image retrieval. Default value: 1.0.
},
"qp": {
"query_extend": false, # Specifies whether to extend queries.
"query_extend_num": 5 # The maximum number of queries to be extended. Default value: 5.
},
"rerank" : {
"enable": true, # Specifies whether to use the LLM to rerank the retrieved results. Default value: true.
"model":"model_name" # The name of the LLM.
},
"return_hits": false # Specifies whether to return document retrieval results. If you set this parameter to true, the search_hits parameter is returned in the response.
}
}
}
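Putting the pieces together, a request can be constructed with Python's standard library. The endpoint, application name, and API key below are placeholders, and the request is only built here, not sent:

```python
import json
import urllib.request

host = "http://example-host"   # placeholder endpoint
app = "my_app"                 # placeholder application name
api_key = "OS-d1**2a"          # placeholder API key

body = {"question": {"text": "user question", "type": "TEXT"}}

req = urllib.request.Request(
    url=f"{host}/v3/openapi/apps/{app}/actions/knowledge-search",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    method="POST",
)
# To actually send the request:
# response = urllib.request.urlopen(req)
# print(response.read().decode("utf-8"))
```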
Response parameters
Parameter | Type | Description |
request_id | string | The request ID. |
status | string | Indicates whether the request was successful. |
latency | float | The amount of time consumed by the server to process a successful request. Unit: milliseconds. |
id | integer | The ID of the primary key. |
title | string | The title of the document. |
category | string | The name of the category. |
url | string | The URL of the document. |
answer | string | The returned result. |
type | string | The format of the returned result. |
scores | array | The relevance-based score of the document. |
code | string | The error code returned. |
message | string | The error message returned. |
Sample response body
{
"request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
"status": "OK",
"latency": 6684.410397,
"result" : {
"data" : [
{
"answer" : "answer text",
"type" : "TEXT",
"reference" : [
{"url" : "http://....","title":"doc title"}
]
},
{
"reference": [
{"id": "16","title": "Test title","category": "Test category","url": "Test URL"}
],
"answer": "https://ecmb.bdimg.com/tam-ogel/-xxxx.jpg",
"type": "IMAGE"
}
],
"search_hits" : [ // This parameter is returned only if the options.retrieve.return_hits parameter in the request is set to true.
{
"fields" : {
"content" : "....",
"key1" : "value1"
},
"scores" : ["10000.1234"],
"type" : "doc"
},
{
"fields" : {
"answer" : "...",
"key1" : "value1"
},
"scores" : ["10000.1234"],
"type" : "entry"
}
]
},
"errors" : [
{
"code" : "The error code. This parameter is returned only if an error occurs.",
"message" : "The error message. This parameter is returned only if an error occurs."
}
]
}
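A sketch of handling a response body like the one above: check status, then read the answers and reference sources from result.data, falling back to the errors array on failure. The sample below is trimmed to the fields it uses:

```python
import json

# Response trimmed to the fields read below.
raw = '''{
  "request_id": "6859E98D-D885-4AEF-B61C-9683A0184744",
  "status": "OK",
  "result": {
    "data": [
      {"answer": "answer text", "type": "TEXT",
       "reference": [{"url": "http://example.com", "title": "doc title"}]}
    ]
  }
}'''

resp = json.loads(raw)
if resp.get("status") == "OK":
    # Each item in result.data carries one answer and its reference sources.
    for item in resp["result"]["data"]:
        print(item["type"], item["answer"])
        for ref in item.get("reference", []):
            print("  source:", ref.get("title"))
else:
    # On failure, error details are in the errors array.
    for err in resp.get("errors", []):
        print(err.get("code"), err.get("message"))
```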