Models
Supported fields or tasks: Artificial Intelligence Generated Content (AIGC)
Qwen1.5
Qwen1.5 is the next version of the open source Qwen series. Compared with earlier versions, Qwen1.5 significantly improves the alignment between chat models and human preferences, provides improved multilingual capabilities, and offers a stronger ability to connect to external systems. The chat versions of the new Qwen models provide API services in DashScope and show great improvement in chat capabilities. The Qwen1.5-Chat series achieves excellent performance on benchmarks such as MT-Bench.
The Qwen1.5-7B, Qwen1.5-14B, Qwen1.5-32B, Qwen1.5-72B, and Qwen1.5-110B models available in Alibaba Cloud Model Studio are specifically optimized for inference performance based on the corresponding open source Qwen1.5 versions. These models provide developers with convenient API services. For more information about the corresponding open source versions, visit ModelScope Qwen1.5. To switch ModelScope to English, click the icon in the top navigation bar.
The inputs are user-entered text prompts and the history of a varying number of conversation rounds, while the outputs are the replies generated by models. During the content generation process, the text is converted into a sequence of tokens that language models can understand. A token is the fundamental unit that models use to represent natural language text and is analogous to a character or a word. For Chinese text, one token usually corresponds to one Chinese character. For English text, one token usually represents three to four letters or a whole word. For example, the Chinese text "你好,我是通义千问" is tokenized into the sequence ['你', '好', ',', '我', '是', '通', '义', '千', '问'], while the English text "Nice to meet you." is tokenized into the sequence ['Nice', ' to', ' meet', ' you', '.'].
The computational load of model calling is correlated with the length of the token sequence. The more input or output tokens, the longer the computation time required by models. Charges for using models are based on the number of input and output tokens. You can obtain the number of tokens consumed during each call from the usage parameter of the API response.
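For example, the following minimal Python sketch reads the token consumption from the usage parameter of a response. It assumes the SDK and API key are configured as described in the "Use the SDK" section below; the model name is only an example.
from http import HTTPStatus
import dashscope

response = dashscope.Generation.call(
    'qwen1.5-7b-chat',
    messages=[{'role': 'user', 'content': 'Hello'}],
    result_format='message',
)
if response.status_code == HTTPStatus.OK:
    # The usage field reports the token counts that the call is billed on.
    print(response.usage)  # for example: {"input_tokens": 10, "output_tokens": 25}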
Overview
Name | Description | Input and output limits |
qwen1.5-72b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 72 billion parameters and is trained to align with human instructions. | The model supports a context of up to 32,000 tokens, with a maximum of 30,000 tokens for input and 2,000 tokens for output. |
qwen1.5-32b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 32 billion parameters and is trained to align with human instructions. | The model supports a context of up to 32,000 tokens, with a maximum of 30,000 tokens for input and 2,000 tokens for output. |
qwen1.5-14b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 14 billion parameters and is trained to align with human instructions. | The model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000. |
qwen1.5-7b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 7 billion parameters and is trained to align with human instructions. | The model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000. |
Use the SDK
Prerequisites
Alibaba Cloud Model Studio is activated and an API key is obtained. For more information, see Activate Alibaba Cloud Model Studio and Obtain an API key.
The SDK of the latest version is installed. For more information, see Install Alibaba Cloud Model Studio SDK.
Specify the API key
export DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
Specify the base URL
export DASHSCOPE_HTTP_BASE_URL='https://dashscope-intl.aliyuncs.com/api/v1'
Single-round conversation
The following sample code shows how to call the Qwen1.5 72B model to respond to user input. To call another model, such as the Qwen1.5 7B, 14B, or 32B model, replace the model name in the code.
Replace YOUR_DASHSCOPE_API_KEY with your API key.
import random
from http import HTTPStatus

import dashscope

# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


def call_with_messages():
    messages = [
        {'role': 'user', 'content': 'Who are you'}]
    response = dashscope.Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        # set the random seed, optional, default to 1234 if not set
        seed=random.randint(1, 10000),
        result_format='message',  # set the result to be "message" format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    call_with_messages()
// Copyright (c) Alibaba, Inc. and its affiliates.
import java.util.Arrays;
import java.util.concurrent.Semaphore;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.aigc.generation.models.QwenParam;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;

public class Main {
    public static void callWithMessage()
            throws NoApiKeyException, ApiException, InputRequiredException {
        Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
        Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
        QwenParam param =
                QwenParam.builder().model("qwen1.5-72b-chat")
                        .messages(Arrays.asList(userMsg))
                        .resultFormat(QwenParam.ResultFormat.MESSAGE)
                        .topP(0.8)
                        .build();
        GenerationResult result = gen.call(param);
        System.out.println(result);
    }

    public static void callWithMessageCallback()
            throws NoApiKeyException, ApiException, InputRequiredException, InterruptedException {
        Generation gen = new Generation();
        Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
        QwenParam param =
                QwenParam.builder().model("qwen1.5-14b-chat")
                        .messages(Arrays.asList(userMsg))
                        .resultFormat(QwenParam.ResultFormat.MESSAGE)
                        .topP(0.8)
                        .build();
        Semaphore semaphore = new Semaphore(0);
        gen.call(param, new ResultCallback<GenerationResult>() {
            @Override
            public void onEvent(GenerationResult message) {
                System.out.println(message);
            }

            @Override
            public void onError(Exception ex) {
                System.out.println(ex.getMessage());
                semaphore.release();
            }

            @Override
            public void onComplete() {
                System.out.println("onComplete");
                semaphore.release();
            }
        });
        semaphore.acquire();
    }

    public static void main(String[] args) {
        try {
            callWithMessage();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        try {
            callWithMessageCallback();
        } catch (ApiException | NoApiKeyException | InputRequiredException | InterruptedException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Multi-round conversation
You can use the messages parameter to pass in the conversation history to enable multiple rounds of interaction with the model.
import random
from http import HTTPStatus

import dashscope
from dashscope import Generation
from dashscope.api_entities.dashscope_response import Role

# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


def multi_round_conversation():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': 'Who are you'}]
    response = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        # set the random seed, optional, default to 1234 if not set
        seed=random.randint(1, 10000),
        result_format='message',  # set the result to be "message" format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
        messages.append({'role': response.output.choices[0]['message']['role'],
                         'content': response.output.choices[0]['message']['content']})
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
    messages.append({'role': Role.USER, 'content': 'Nice to meet you'})
    response = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        result_format='message',  # set the result to be "message" format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    multi_round_conversation()
// Copyright (c) Alibaba, Inc. and its affiliates.
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.aigc.generation.models.QwenParam;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.MessageManager;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void callWithMessage()
            throws NoApiKeyException, ApiException, InputRequiredException {
        Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
        MessageManager msgManager = new MessageManager(10);
        Message systemMsg =
                Message.builder().role(Role.SYSTEM.getValue()).content("You are a helpful assistant.").build();
        Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
        msgManager.add(systemMsg);
        msgManager.add(userMsg);
        QwenParam param =
                QwenParam.builder().model("qwen1.5-72b-chat").messages(msgManager.get())
                        .resultFormat(QwenParam.ResultFormat.MESSAGE)
                        .topP(0.8)
                        /* set the random seed, optional, default to 1234 if not set */
                        .seed(100)
                        .build();
        GenerationResult result = gen.call(param);
        System.out.println(result);
        // Add the model reply to the conversation history, then send the next round.
        msgManager.add(result);
        System.out.println(JsonUtils.toJson(result));
        param.setPrompt("Nice to meet you");
        param.setMessages(msgManager.get());
        result = gen.call(param);
        System.out.println(result);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            callWithMessage();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Streaming output
import random
from http import HTTPStatus

import dashscope
from dashscope import Generation

# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


def call_stream_with_messages():
    messages = [
        {'role': 'user', 'content': 'Who are you'}]
    responses = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        seed=random.randint(1, 10000),  # set the random seed, optional, default to 1234 if not set
        result_format='message',  # set the result to be "message" format.
        stream=True,
        output_in_full=True  # get streaming output incrementally
    )
    full_content = ''
    for response in responses:
        if response.status_code == HTTPStatus.OK:
            full_content += response.output.choices[0]['message']['content']
            print(response)
        else:
            print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code,
                response.code, response.message
            ))
    print('Full content: \n' + full_content)


if __name__ == '__main__':
    call_stream_with_messages()
// Copyright (c) Alibaba, Inc. and its affiliates.
import java.util.Arrays;
import java.util.concurrent.Semaphore;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.aigc.generation.models.QwenParam;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;

public class Main {
    public static void streamCallWithMessage()
            throws NoApiKeyException, ApiException, InputRequiredException {
        Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
        Message userMsg = Message.builder().role(Role.USER.getValue())
                .content("Give me a recipe that uses radish, potato, and eggplant").build();
        QwenParam param =
                QwenParam.builder().model("qwen1.5-72b-chat")
                        .messages(Arrays.asList(userMsg))
                        .resultFormat(QwenParam.ResultFormat.MESSAGE)
                        .topP(0.8)
                        .incrementalOutput(true) // get streaming output incrementally
                        .build();
        Flowable<GenerationResult> result = gen.streamCall(param);
        StringBuilder fullContent = new StringBuilder();
        result.blockingForEach(item -> {
            fullContent.append(item.getOutput().getChoices().get(0).getMessage().getContent());
            System.out.println(item);
        });
        System.out.println("Full content: \n" + fullContent);
    }

    public static void streamCallWithMessageCallback()
            throws NoApiKeyException, ApiException, InputRequiredException, InterruptedException {
        Generation gen = new Generation();
        Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
        QwenParam param =
                QwenParam.builder().model("qwen1.5-14b-chat")
                        .messages(Arrays.asList(userMsg))
                        .resultFormat(QwenParam.ResultFormat.MESSAGE)
                        .topP(0.8)
                        .incrementalOutput(true) // get streaming output incrementally
                        .build();
        Semaphore semaphore = new Semaphore(0);
        StringBuilder fullContent = new StringBuilder();
        gen.streamCall(param, new ResultCallback<GenerationResult>() {
            @Override
            public void onEvent(GenerationResult message) {
                fullContent.append(message.getOutput().getChoices().get(0).getMessage().getContent());
                System.out.println(message);
            }

            @Override
            public void onError(Exception ex) {
                System.out.println(ex.getMessage());
                semaphore.release();
            }

            @Override
            public void onComplete() {
                System.out.println("onComplete");
                semaphore.release();
            }
        });
        semaphore.acquire();
        System.out.println("Full content: \n" + fullContent);
    }

    public static void main(String[] args) {
        try {
            streamCallWithMessage();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(e.getMessage());
        }
        try {
            streamCallWithMessageCallback();
        } catch (ApiException | NoApiKeyException | InputRequiredException | InterruptedException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Request parameters
Parameter | Type | Description |
model | string | The name of the Qwen model to be used for interaction. For more information, see the Overview section of this topic. |
messages | array | Optional. The conversation history between the user and the model. Each element in the list is in the format of {"role": role, "content": content}. Valid values of role: system, user, and assistant. |
prompt | string | Optional. The prompt that you want the model to execute. You can enter a prompt in Chinese or English. |
history | list[dict] | This parameter will be discontinued. We recommend that you use the messages parameter. Optional. The conversation history between you and the model. Each element in the list is a round of conversation in the format of {"user": "user input", "bot": "model output"}. The multiple rounds of conversations are sorted in ascending chronological order. Default value: []. |
seed | int | Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model. Valid values: 64-bit unsigned integers. Default value: 1234. If you specify seed, the model tries to generate the same or similar content for the output of each model call. However, the model cannot ensure that the output is exactly the same for each model call. |
max_tokens | int | Optional. The maximum number of tokens that can be generated by the model. |
top_p | float | Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content. Valid values: (0,1.0). Default value: 0.8. |
top_k | int | Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content. Default value: 0, indicating that the top_k policy is disabled. In this case, only the top_p policy takes effect. |
repetition_penalty | float | Optional. The repetition penalty for the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies no repetition penalty. Default value: 1.1. |
temperature | float | Optional. The randomness and diversity of the generated content. Specifically, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable. Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless. Default value: 0.85. This parameter is valid if you use the SDK for Python version 1.10.1 or later, or the SDK for Java version 2.5.1 or later. |
stop | str/list[str] for specifying strings; list[int]/list[list[int]] for specifying token IDs | Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. For example, if you set this parameter to "Hello", the model stops when it is about to generate the string "Hello". The stop parameter also accepts a list of strings or a list of token ID arrays to support scenarios that require multiple stop conditions. A list cannot contain both token IDs and strings. |
stream | bool | Optional. Specifies whether to enable the streaming output mode. In streaming output mode, the model returns a generator. You need to use an iterative loop to fetch the results from the generator and incrementally display the text. In Python, the output mode can be changed to non-incremental by setting the output_in_full parameter in the SDK to False. In Java, a similar change can be made by setting the incrementalOutput request parameter to False. Default value: False. |
result_format | string | Optional. The format of the output results. Valid values: text and message. Default value: text. |
incremental_output | bool | Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, each returned chunk excludes the previously returned content. If you set this parameter to False, each returned chunk includes all previously returned content. For example, for the sentence "I like apple", incremental mode streams "I", "like", "apple", whereas non-incremental mode streams "I", "I like", "I like apple". This parameter takes effect only if the stream parameter is set to True. Default value: False. |
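The following minimal Python sketch combines several of the optional parameters above in a single call. The parameter values are illustrative examples, not recommendations.
from http import HTTPStatus
import dashscope

response = dashscope.Generation.call(
    'qwen1.5-72b-chat',
    messages=[{'role': 'user', 'content': 'Write a one-line greeting.'}],
    result_format='message',  # return choices instead of plain text
    seed=1234,                # fixed seed for more reproducible output
    max_tokens=100,           # cap the length of the generated output
    top_p=0.8,                # nucleus sampling threshold
    temperature=0.85,         # sampling randomness
    stop=['Goodbye'],         # stop before this string would be generated
)
if response.status_code == HTTPStatus.OK:
    print(response.output.choices[0]['message']['content'])
else:
    print(response.code, response.message)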
Sample response
Sample response in the message format
{
    "status_code": 200,
    "request_id": "b3d8bb75-05a2-9044-8e9e-ec8c87689a5e",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?"
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 31,
        "output_tokens": 267
    }
}
Sample response in the text format
{
    "status_code": 200,
    "request_id": "446877aa-dbb8-99ca-98eb-d78a5e90fe61",
    "code": "",
    "message": "",
    "output": {
        "text": "I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?",
        "finish_reason": "stop",
        "choices": null
    },
    "usage": {
        "input_tokens": 31,
        "output_tokens": 267
    }
}
Response parameters
Parameter | Type | Description |
status_code | int | The response code. The status code 200 indicates that the request is successful. Other status codes indicate that the request failed. If the request failed, the corresponding error code and error message are returned by using the code and message parameters. |
request_id | string | The request ID. |
code | string | The error code. This parameter is valid only if the request failed. |
message | string | The error message. This parameter is valid only if the request failed. |
output | dict | The information about the call results. For Qwen models, the information includes the generated output in the text parameter. |
usage | dict | The metering information, which indicates the usage metrics for the request. |
output.text | string | The output text generated by the model. |
output.finish_reason | string | The reason why the generation process stops. Valid values: null: the generation is in progress; stop: the generation stops because the stop condition is met; length: the generation stops because the output reaches the maximum length. |
usage.input_tokens | int | The length of tokens converted from the input text. |
usage.output_tokens | int | The length of tokens converted from the output text. |
choices | List | The choices that are returned if the result_format parameter is set to message. |
choices[i].finish_reason | String | The reason why the generation process stops. Valid values: null: the generation is in progress; stop: the generation stops because the stop condition is met; length: the generation stops because the output reaches the maximum length. This parameter is returned only if the result_format parameter is set to message. |
choices[i].message | dict | The message generated by the model. This parameter is returned only if the result_format parameter is set to message. |
message.role | String | The role of the model. The value is set to assistant. This parameter is returned only if the result_format parameter is set to message. |
message.content | String | The text generated by the model. This parameter is returned only if the result_format parameter is set to message. |
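To make the two result formats concrete, the following minimal Python sketch shows where the generated text is located in each case. The model name is only an example; any chat model from the Overview table works.
from http import HTTPStatus
import dashscope

messages = [{'role': 'user', 'content': 'Who are you'}]

# result_format='text': the reply is returned in output.text.
response = dashscope.Generation.call('qwen1.5-7b-chat', messages=messages,
                                     result_format='text')
if response.status_code == HTTPStatus.OK:
    print(response.output.text)

# result_format='message': the reply is returned in output.choices.
response = dashscope.Generation.call('qwen1.5-7b-chat', messages=messages,
                                     result_format='message')
if response.status_code == HTTPStatus.OK:
    print(response.output.choices[0]['message']['content'])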
Use HTTP
Overview
Open source Qwen models support interaction with users by using the standard HTTP or HTTP Server-Sent Events (SSE) protocol. You can select a protocol based on your business requirements.
Prerequisites
Alibaba Cloud Model Studio is activated and an API key is created. For more information, see Obtain an API key.
Request syntax
POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
Request parameters
Section | Parameter | Type | Description | Example |
Header | Content-Type | String | The request type. Set this parameter to application/json for standard requests or text/event-stream to enable SSE. | application/json |
Accept | String | Optional. The media types that the client is willing to receive from the server. If you set this parameter to text/event-stream, SSE is enabled. Default value: */*, indicating that the client accepts any media type. | text/event-stream | |
Authorization | String | The API key. | Bearer d1**2a | |
X-DashScope-WorkSpace | String | Optional. The workspace to be used for this call. This parameter is required if the API key of a Resource Access Management (RAM) user is used. In addition, the specified workspace must contain the RAM user. This parameter is optional if the API key of an Alibaba Cloud account is used. If you specify a workspace, the corresponding identity in the workspace is used. If you leave this parameter empty, the identity of the Alibaba Cloud account is used. | ws_QTggmeAxxxxx | |
X-DashScope-SSE | String | Optional. Specifies whether to enable SSE. To enable SSE, you can either set this parameter to enable or set Accept to text/event-stream. | enable | |
Body | model | String | The model to be used. Valid values: qwen1.5-72b-chat, qwen1.5-32b-chat, qwen1.5-14b-chat, qwen1.5-7b-chat, qwen-72b-chat, qwen-14b-chat, and qwen-7b-chat. | qwen1.5-72b-chat |
input.prompt | String | The prompt that you want the model to execute. You can enter a prompt in Chinese or English. | Which park is closest to me? | |
input.history | List | This parameter will be discontinued. We recommend that you use the messages parameter. Optional. The conversation history between the user and the model. Each element in the list is a round of conversation in the format of {"user": "user input", "bot": "model output"}. The multiple rounds of conversations are sorted in ascending chronological order. | "history": [ { "user":"How is the weather today?", "bot":"It's a nice day. Do you want to go out?" }, { "user":"What places do you recommend?", "bot":"I suggest that you go to the park. Spring is coming and the flowers are blooming. The park is beautiful." } ] | |
input.messages | List | The conversation history between the user and the model. Each element in the list is in the format of {"role": role, "content": content}. Valid values of role: system, user, and assistant. input.messages is optional. input.messages.role and input.messages.content are required if input.messages is specified. | "input":{ "messages":[ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello, are there any museums nearby?" }] } |
input.messages.role | String | The role of the message. Valid values: system, user, and assistant. | |
input.messages.content | String | The content of the message. | |
parameters.result_format | String | Optional. The format of the results. Valid values: text and message. text is used in earlier versions. The message format is compatible with OpenAI. | "text" | |
parameters.seed | Integer | Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model. Valid values: 64-bit unsigned integers. Default value: 1234. If you specify seed, the model tries to generate the same or similar content for the output of each model call. However, the model cannot ensure that the output is exactly the same for each model call. | 65535 | |
parameters.max_tokens | Integer | Optional. The maximum number of tokens that can be generated by the model. | 1500 |
parameters.top_p | Float | Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content. Valid values: (0,1.0). Default value: 0.8. | 0.8 | |
parameters.top_k | Integer | Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content. By default, the top_k parameter is left empty. If the top_k parameter is left empty or set to a value greater than 100, the top_k policy is disabled. In this case, only the top_p policy takes effect. | 50 | |
parameters.repetition_penalty | Float | Optional. The repetition of the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies no repetition penalty. Default value: 1.1. | 1.1 | |
parameters.temperature | Float | Optional. The randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable. Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless. Default value: 0.85. | 0.85 | |
parameters.stop | str/list[str] for specifying strings; list[int]/list[list[int]] for specifying token IDs | Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. For example, if you set this parameter to "Hello", the model stops when it is about to generate the string "Hello". In addition, the stop parameter accepts a list of strings or a list of token ID arrays to support scenarios that require multiple stop conditions. Note that a list cannot contain both token IDs and strings. | [[37763, 367]] | |
parameters.incremental_output | Bool | Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, each returned chunk excludes the previously returned content. If you set this parameter to False, each returned chunk includes all previously returned content. For example, for the sentence "I like apple", incremental mode streams "I", "like", "apple", whereas non-incremental mode streams "I", "I like", "I like apple". This parameter takes effect only if SSE is enabled. Default value: False. |
Response parameters
Parameter | Type | Description | Example |
output.text | String | The output text. | I suggest that you go to the Summer Palace. |
output.finish_reason | String | The reason why the generation process stops. Valid values: null: the generation is in progress; stop: the generation stops because the stop condition is met; length: the generation stops because the output reaches the maximum length. | stop |
output.choices | List | The output choices. This parameter is returned only if the result_format parameter is set to message. | |
output.choices[x].finish_reason | String | The reason why the generation process stops. Valid values: null: the generation is in progress; stop: the generation stops because the stop condition is met; length: the generation stops because the output reaches the maximum length. | |
output.choices[x].message | Dict | The message generated by the model. Each message is in the format of {"role": role, "content": content}. Valid values of role: system, user, and assistant. More roles will be supported in the future. The content contains the output for the request. | |
output.choices[x].message.role | String | The role of the message. The value is set to assistant. | |
output.choices[x].message.content | String | The content generated by the model. | |
usage.output_tokens | Integer | The number of tokens in the output for the request. | 380 |
usage.input_tokens | Integer | The number of tokens in the input for the request. If search is enabled, additional tokens for search-related content are included. This increases the total token count beyond the initial input for the request. | 633 |
request_id | String | The request ID. | 7574ee8f-38a3-4b1e-9280-11c33ab46e51 |
Sample request (SSE disabled)
The following sample code shows how to call the Qwen1.5 14B model by using a cURL command. In this example, SSE is disabled. If you want to call another model, such as the Qwen1.5 7B or 72B model, change the value of the model parameter.
Replace <YOUR-DASHSCOPE-API-KEY> with your API key.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Authorization: Bearer <YOUR-DASHSCOPE-API-KEY>' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen1.5-14b-chat",
    "input": {
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who are you"
            }
        ]
    },
    "parameters": {
    }
}'
Sample response (SSE disabled)
{
    "output": {
        "text": "I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?",
        "finish_reason": "stop"
    },
    "usage": {
        "output_tokens": 51,
        "input_tokens": 85
    },
    "request_id": "d89c06fb-46a1-47b6-acb9-bfb17f814969"
}
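The same request can be sent from any HTTP client. The following minimal Python sketch uses the third-party requests library (an assumption; it is not required by Model Studio) to issue the request and parse the response parameters described above.
import os

import requests  # third-party library: pip install requests

url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {
    'Authorization': 'Bearer ' + os.environ['DASHSCOPE_API_KEY'],
    'Content-Type': 'application/json',
}
body = {
    'model': 'qwen1.5-14b-chat',
    'input': {
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'Who are you'},
        ]
    },
    'parameters': {},
}
result = requests.post(url, headers=headers, json=body).json()
if result.get('code'):
    # Failed requests carry the code and message parameters.
    print('Error %s: %s' % (result['code'], result['message']))
else:
    print(result['output']['text'])  # the default result_format is text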
Sample request (SSE enabled)
The following sample code shows how to call the Qwen1.5 72B model by using a cURL command. In this example, SSE is enabled. If you want to call another model, change the value of the model parameter.
Replace <YOUR-DASHSCOPE-API-KEY> with your API key.
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Authorization: Bearer <YOUR-DASHSCOPE-API-KEY>' \
--header 'Content-Type: application/json' \
--header 'X-DashScope-SSE: enable' \
--data '{
    "model": "qwen1.5-72b-chat",
    "input": {
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who are you"
            }
        ]
    },
    "parameters": {
    }
}'
Sample response (SSE enabled)
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Hello","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":28,"input_tokens":27,"output_tokens":1},"request_id":"xxx"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":29,"input_tokens":27,"output_tokens":2},"request_id":"xxx"}
... ... ... ...
... ... ... ...
id:12
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":91,"input_tokens":27,"output_tokens":64},"request_id":"xxx"}
Sample error response
If the request failed, the corresponding error code and error message are returned by using the code and message parameters.
{
    "code": "InvalidApiKey",
    "message": "Invalid API-key provided.",
    "request_id": "fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1"
}