API details

The OpenSearch search development workbench enables API calls to large model services, including the RAG-specific large model built on Alibaba's proprietary model foundation. This service is ideal for RAG scenarios, enhancing answer accuracy and reducing hallucination rates when used alongside document processing and retrieval services.

Service name	Service ID (service_id)	Service description	QPS limit for API calls (Alibaba Cloud account and RAM users)
OpenSearch-Qwen-Turbo	ops-qwen-turbo	Built on the qwen-turbo large language model through supervised fine-tuning, which strengthens retrieval capabilities and reduces harmful output.	3 (Note: To apply for a higher QPS limit, submit a ticket.)
Qwen-Turbo	qwen-turbo	A large-scale language model, Qwen-Turbo supports inputs in multiple languages, including Chinese and English.
Qwen-Plus	qwen-plus	An enhanced version of the Qwen large-scale language model, Qwen-Plus supports inputs in various languages, including Chinese and English.
Qwen-Max	qwen-max	Also referred to as Qwen 2.0, this trillion-scale large language model supports inputs in multiple languages, such as Chinese and English.
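
Because the table lists a limit of 3 QPS, a client may want to pace its own calls. The following is a minimal sketch, not part of any OpenSearch SDK: the class name MinIntervalThrottle and the injectable clock and sleep functions are illustrative assumptions, and it assumes that spacing calls evenly is enough to stay under the limit.

```python
import time


class MinIntervalThrottle:
    """Spaces calls at least 1/qps seconds apart (client-side only)."""

    def __init__(self, qps=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return waited
```

Calling wait() immediately before each API request keeps a single process at or below the configured rate; it does not coordinate across processes or RAM users.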

Prerequisites

Get authentication information
When invoking the OpenSearch Search Development Console service through APIs, you need to authenticate the caller's identity.
Get service access address
The service can be called over the public network or through a VPC. For more information, see Get service registration address.

Request description

General description

The request body must not exceed 8 MB.
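
A client can check the size limit before sending. This is a minimal sketch; the helper name encode_body is hypothetical, and it assumes that "8 MB" means 8 × 1024 × 1024 bytes measured on the UTF-8 encoded JSON body.

```python
import json

MAX_BODY_BYTES = 8 * 1024 * 1024  # assumption: "8 MB" is interpreted as 8 MiB


def encode_body(payload):
    """Serialize payload to UTF-8 JSON and reject bodies over the limit."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(
            f"request body is {len(body)} bytes; limit is {MAX_BODY_BYTES}"
        )
    return body
```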

HTTP request method

POST

URL

{host}/v3/openapi/workspaces/{workspace_name}/text-generation/{service_id}

Parameter description:

host: The address for calling the service, accessible via both public network and VPC environments. For more information, see the referenced document.
workspace_name: The name of the workspace, such as 'default'.
service_id: The system's built-in service ID, for example, 'ops-qwen-turbo'.
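
The URL template above can be filled in as follows. This is a sketch only: the function name build_url is hypothetical, and percent-encoding the path segments is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote


def build_url(host, workspace_name="default", service_id="ops-qwen-turbo"):
    """Fill {host}/v3/openapi/workspaces/{workspace_name}/text-generation/{service_id}."""
    return (
        f"{host.rstrip('/')}/v3/openapi/workspaces/"
        f"{quote(workspace_name, safe='')}/text-generation/{quote(service_id, safe='')}"
    )
```

The host value must include the scheme, for example the address obtained from the service registration page.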

Request parameters

Header parameters

API-KEY authentication

Parameter	Type	Required	Description	Example value
Content-Type	String	Yes	The MIME type of the request, specified as 'application/json'.	application/json
Authorization	String	Yes	The API key for authentication.	Bearer OS-d1**2a

Body parameters

Parameter	Type	Required	Description	Example value
messages	List	Yes	The conversation history between the user and the model. Each list element is a JSON object with 'role' and 'content' keys. The 'role' can be 'system', 'user', or 'assistant'. 'system': Indicates a system-level message, which can only be the first message in the conversation history (messages[0]). Its use is optional, but if present, it must be the first in the list. 'user' and 'assistant': Represent the dialogue between the user and the model. They should alternate in the conversation, with the last message being from the 'user'.
stream	Boolean	No	Indicates whether to return results in streaming mode. By default, this is set to false. When this parameter is set to true, each output is the entire sequence generated up to that point, with the last output being the final complete result.	false
parameters	Map	No	A set of adjustable parameters for the large model request.
parameters.seed	Integer	No	The random seed used during content generation. This parameter controls the randomness of the content generated by the model. Valid values: 64-bit unsigned integers. If you specify the random seed, the model tries to generate the same or similar content for the output of each model call. However, the model cannot ensure that the output is exactly the same for each model call.	"parameters":{"seed":666}
parameters.max_tokens	Integer	No	The maximum number of tokens that can be generated by the model. If you use the qwen-turbo model, the maximum value and default value are 1500. If you use the qwen-max, qwen-max-1201, or qwen-plus model, the maximum value and default value are 2000.	"parameters":{"max_tokens":1500}
parameters.top_p	Float	No	The probability threshold for nucleus sampling during generation. For example, if this parameter is set to 0.8, only the smallest set of most probable tokens whose cumulative probability is at least 0.8 is kept as the candidate set. Valid values: (0,1.0). A larger value makes the generated content more random; a smaller value makes it less random.	"parameters":{"top_p":0.7}
parameters.top_k	Integer	No	The size of the candidate set from which tokens are sampled during generation. For example, if this parameter is set to 50, only the 50 highest-scoring tokens at each step are used as the candidate set for random sampling. A larger value makes the generated content more random; a smaller value makes it more deterministic. If this parameter is left empty or set to a value greater than 100, the top_k policy is disabled and only the top_p policy takes effect.	"parameters":{"top_k":50}
parameters.repetition_penalty	Float	No	Controls how strongly repetition in the generated content is penalized. A larger value reduces repetition. The value 1.0 applies no penalty. No valid range is specified for this parameter; we recommend a value greater than 0.	"parameters":{"repetition_penalty":1.0}
parameters.presence_penalty	Float	No	The repetition of words in generated content. A larger value indicates lower repetition. Valid values: [-2.0, 2.0].	"parameters":{"presence_penalty":1.0}
parameters.temperature	Float	No	The level of randomness and diversity of the content generated by the model. To be specific, the value of this parameter determines the smoothness of the probability distribution of each candidate word for text generation. A larger value indicates a smaller peak value of the probability distribution. In this case, more low-probability words are selected and the generated content is more diversified. A smaller value indicates a larger peak value of the probability distribution. In this case, more high-probability words are selected and the generated content is more accurate. Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless.	"parameters":{"temperature":0.85}
parameters.stop	String/Array	No	Gives precise control over where generation stops. The model stops generating when its output is about to contain a specified string or token ID. The value can be a string or an array. String: the model stops when the output is about to contain the specified stop word. For example, if you set this parameter to Hello, generation stops when the output is about to contain Hello. Array: the elements can be token IDs, strings, or arrays whose elements are token IDs. The model stops when the token to be generated, or its ID, is in the stop array. For example, the ID of the token Hello is 108386 and the ID of the token Weather is 104307. If stop is set to `["Hello","Weather"]` or `[108386,104307]`, generation stops when the output is about to contain Hello or Weather. Likewise, the ID of the token Hello is 108386, the ID of the token There is 103924, the ID of the token I is 35946, and the ID of the token Am Fine is 101243. If stop is set to `[[108386, 103924],[35946, 101243]]`, generation stops when the output is about to contain Hello There or I Am Fine. Note: An array cannot contain both token IDs and strings. For example, you cannot set stop to `["Hello",104307]`.	"parameters":{"stop":["Hello","Weather"]}
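
The value ranges stated in the table can be checked on the client before a request is sent. The sketch below is illustrative only: check_parameters is a hypothetical helper, it covers only the constraints the table states explicitly, and the service may apply further validation that is not described here.

```python
def check_parameters(params):
    """Return a list of problems found in a 'parameters' map (empty if none)."""
    problems = []
    if "seed" in params and not (0 <= params["seed"] < 2**64):
        problems.append("seed must be a 64-bit unsigned integer")
    if "top_p" in params and not (0 < params["top_p"] < 1.0):
        problems.append("top_p must be in (0, 1.0)")
    if "temperature" in params and not (0 <= params["temperature"] < 2):
        problems.append("temperature must be in [0, 2)")
    if "presence_penalty" in params and not (-2.0 <= params["presence_penalty"] <= 2.0):
        problems.append("presence_penalty must be in [-2.0, 2.0]")
    stop = params.get("stop")
    if isinstance(stop, list):
        # Strings count as stop words; anything else (ints or lists of ints)
        # is treated as token IDs. The two kinds must not be mixed.
        has_strings = any(isinstance(s, str) for s in stop)
        has_token_ids = any(not isinstance(s, str) for s in stop)
        if has_strings and has_token_ids:
            problems.append("stop array cannot mix strings and token IDs")
    return problems
```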

Note: The maximum token limit for ops-qwen-turbo is 4000.
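
The ordering rules for the messages parameter (an optional 'system' message only at index 0, alternating 'user' and 'assistant' turns, and a final 'user' message) can be sketched as a client-side check. The function name check_messages and the wording of the returned problems are assumptions made for illustration; the service's own error messages are not documented here.

```python
def check_messages(messages):
    """Return a list of rule violations for a 'messages' list (empty if valid)."""
    if not messages:
        return ["messages must not be empty"]
    problems = []
    for i, m in enumerate(messages):
        role = m.get("role")
        if role not in ("system", "user", "assistant"):
            problems.append(f"messages[{i}]: unknown role {role!r}")
        elif role == "system" and i != 0:
            problems.append(f"messages[{i}]: 'system' is only allowed at index 0")
    # Skip an optional leading system message, then require alternating roles.
    start = 1 if messages[0].get("role") == "system" else 0
    for i in range(start, len(messages) - 1):
        if messages[i].get("role") == messages[i + 1].get("role"):
            problems.append(f"messages[{i}] and messages[{i + 1}] repeat a role")
    if messages[-1].get("role") != "user":
        problems.append("the last message must be from 'user'")
    return problems
```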

Response parameters

Parameter	Type	Description	Example value
result.text	String	The text generated by the model during the current interaction.	Zhengzhou is a...
usage.output_tokens	Integer	The number of tokens in the content generated by the model.	100
usage.input_tokens	Integer	The number of tokens in the user's input content.	100
usage.total_tokens	Integer	The combined token count of the user's input and the model's generated content.	200
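
When stream is true, each output is described as carrying the entire sequence generated so far, so a client that wants to display only new text has to compare successive result.text values. The sketch below shows one way to do that. The function name incremental_chunks is hypothetical, and the transport framing of the stream is not described in this document, so the input is assumed to be an already-decoded sequence of result.text strings.

```python
def incremental_chunks(cumulative_texts):
    """Yield only the newly generated suffix of each cumulative result.text."""
    previous = ""
    for text in cumulative_texts:
        if text.startswith(previous):
            yield text[len(previous):]
        else:
            # Unexpected: this output does not extend the previous one,
            # so emit it whole rather than guess at the overlap.
            yield text
        previous = text
```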

Curl request example

curl -XPOST -H "Content-Type: application/json" \
"http://****-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/text-generation/ops-qwen-turbo" \
-H "Authorization: Bearer Your-API-KEY" \
-d "{
      \"messages\":[
      {
          \"role\":\"system\",
          \"content\":\"You are a robot assistant\"
      },
      {
          \"role\":\"user\",
          \"content\":\"What is the capital of Henan\"
      },
      {
          \"role\":\"assistant\",
          \"content\":\"Zhengzhou\"
      },
      {
          \"role\":\"user\",
          \"content\":\"What are some fun places there\"
      }
      ],
      \"stream\":false
}"
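
The same request can be assembled with the Python standard library. This is a hedged sketch, not an official client: build_request is a hypothetical helper, the host and API key are placeholders supplied by the caller, and the function only constructs the request so that it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_request(host, api_key, messages, workspace="default",
                  service_id="ops-qwen-turbo", stream=False):
    """Build (but do not send) the POST request shown in the curl example."""
    url = f"{host}/v3/openapi/workspaces/{workspace}/text-generation/{service_id}"
    body = json.dumps({"messages": messages, "stream": stream}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# To send it: urllib.request.urlopen(build_request(...), timeout=30).read()
```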

Response example

Normal response example

{
  "request_id": "450fcb80-f796-****-8d69-e1e86d29aa9f",
  "latency": 564.903929,
  "result": {
    "text": "Zhengzhou is a city with a long history and modernity, offering many fun places to visit. Here are some recommended tourist attractions:\nSongshan Shaolin Temple: As the birthplace of Shaolin martial arts, Songshan Shaolin Temple has always been a place that tourists yearn for. Here, you can enjoy wonderful martial arts performances and appreciate the charm of Shaolin Kung Fu.\nYellow River Scenic Area: The Yellow River is the mother river of the Chinese nation. In Zhengzhou, you can take a boat tour to enjoy the various sceneries of the Yellow River and feel the beauty of the great river.\nZhengzhou Zoo: This is a family-friendly attraction with various rare animals such as giant pandas and golden monkeys, allowing children to get close to animals and broaden their knowledge.\nZhengzhou Museum: If you are interested in history and culture, Zhengzhou Museum is a good choice. It houses a large number of precious cultural relics, showcasing the historical changes and cultural heritage of the Zhengzhou area.\nZhengzhou Century Park: This is a large urban park with beautiful lakes, gardens, and recreational facilities. Here, you can take a walk, jog, and enjoy the tranquility and harmony of nature.\nThese are just some of the many fun places in Zhengzhou. In fact, there are many other attractions worth visiting in Zhengzhou. Hope you have a pleasant trip in Zhengzhou!"
  },
  "usage": {
    "output_tokens": 6320,
    "input_tokens": 35,
    "total_tokens": 6355
  }
}
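
A response of this shape can be reduced to the fields listed under Response parameters. The sketch below is illustrative: summarize_response is a hypothetical helper, and it assumes the body is valid JSON with result.text present.

```python
import json


def summarize_response(raw):
    """Extract result.text and the usage counters from a response body."""
    data = json.loads(raw)
    usage = data.get("usage", {})
    return {
        "text": data["result"]["text"],
        "input_tokens": usage.get("input_tokens"),
        "output_tokens": usage.get("output_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```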

Abnormal response example

If a request fails, the response reports the cause in the code and message fields.

{
    "request_id": "45C8C9E5-6BCB-****-80D3-E298F788512B",
    "latency": 0,
    "code": "InvalidParameter",
    "message": "JSON parse error: Unexpected character ..."
}
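
A client can turn such a body into an exception. This is a minimal sketch under one assumption drawn from the two examples in this document: an error body has a top-level code field and no result field. The names TextGenerationError and raise_for_error are hypothetical.

```python
import json


class TextGenerationError(Exception):
    """Raised when a response body carries 'code' and 'message' fields."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id


def raise_for_error(raw):
    """Return the parsed body, or raise if it looks like an error response."""
    data = json.loads(raw)
    if "code" in data and "result" not in data:
        raise TextGenerationError(
            data["code"], data.get("message", ""), data.get("request_id")
        )
    return data
```

Keeping request_id on the exception makes it easier to quote when submitting a ticket.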

Status code description

For detailed information on status codes, see the referenced document.