
Alibaba Cloud Model Studio: API reference

Last Updated: Sep 02, 2024

Models

Note

Supported fields or tasks: Artificial Intelligence Generated Content (AIGC)

Qwen1.5

Qwen1.5 is the next version of the open source Qwen series. Compared with earlier versions, Qwen1.5 significantly improves the consistency between chat models and human preferences, provides improved multilingual capabilities, and gains strong capabilities for linking to external systems. The chat versions of the new Qwen models provide API services in DashScope and show great improvement in chat capabilities. The Qwen1.5-Chat series achieves excellent performance even on MT-Bench.

The Qwen1.5-7B, Qwen1.5-14B, Qwen1.5-32B, Qwen1.5-72B, and Qwen1.5-110B models available in Alibaba Cloud Model Studio are specifically optimized for inference performance based on the corresponding open source Qwen1.5 versions. These models provide developers with convenient API services. For more information about the corresponding open source versions, visit ModelScope Qwen1.5. To switch the ModelScope page to English, click the icon in the top navigation bar.

The inputs are user-entered text prompts and the history of a varying number of conversation rounds, while the outputs are the replies generated by models. During the content generation process, the text is converted into a sequence of tokens that language models can understand. A token is the fundamental unit that models use to represent natural language text and is analogous to a character or a word. For Chinese text, one token usually corresponds to one Chinese character. For English text, one token usually represents three to four letters or a whole word. For example, the Chinese text "你好,我是通义千问" is tokenized into the sequence ['你', '好', ',', '我', '是', '通', '义', '千', '问'], while the English text "Nice to meet you." is tokenized into the sequence ['Nice', ' to', ' meet', ' you', '.'].

The computational load of model calling is correlated with the length of the token sequence. The more input or output tokens, the longer the computation time required by models. Charges for using models are based on the number of input and output tokens. You can obtain the number of tokens consumed during each call from the usage parameter of the API response.
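
For example, the following minimal sketch (assuming the DashScope SDK for Python is installed and your API key is configured) reads the token counts from the usage parameter:

from http import HTTPStatus
import dashscope

response = dashscope.Generation.call(
    'qwen1.5-72b-chat',
    messages=[{'role': 'user', 'content': 'Who are you'}],
    result_format='message',
)
if response.status_code == HTTPStatus.OK:
    # The usage parameter reports the tokens consumed by this call.
    print('input tokens:', response.usage.input_tokens)
    print('output tokens:', response.usage.output_tokens)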

Overview

| Name | Description | Input and output limits |
| --- | --- | --- |
| qwen1.5-72b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 72 billion parameters and is trained to align with human instructions. | The model supports a context of up to 32,000 tokens, with a maximum of 30,000 tokens for input and 2,000 tokens for output. |
| qwen1.5-32b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 32 billion parameters and is trained to align with human instructions. | The model supports a context of up to 32,000 tokens, with a maximum of 30,000 tokens for input and 2,000 tokens for output. |
| qwen1.5-14b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 14 billion parameters and is trained to align with human instructions. | The model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000. |
| qwen1.5-7b-chat | An open source chat model from the Qwen1.5 series. It has a scale of 7 billion parameters and is trained to align with human instructions. | The model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000. |

Use the SDK

Prerequisites

Alibaba Cloud Model Studio is activated and an API key is created. For more information, see Obtain an API key.

Single-round conversation

The following sample code shows how to call the Qwen1.5 72B model to respond to user input. To call the Qwen1.5 7B, 14B, or 32B models, replace the model name in the code.

Note

Replace YOUR_DASHSCOPE_API_KEY with your API key.

import random
from http import HTTPStatus
import dashscope
# If the API key environment variable is not set, add the following line of code:
# dashscope.api_key = 'YOUR_DASHSCOPE_API_KEY'
# If the endpoint environment variable is not set, add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def call_with_messages():
    messages = [
        {'role': 'user', 'content': 'Who are you'}]
    response = dashscope.Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        # set the random seed, optional, default to 1234 if not set
        seed=random.randint(1, 10000),
        result_format='message',  # set the result to be "message" format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    call_with_messages()
// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.Arrays;
import java.util.concurrent.Semaphore;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.aigc.generation.models.QwenParam;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;



public class Main {
  public static void callWithMessage()
      throws NoApiKeyException, ApiException, InputRequiredException {
    Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
    Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
    QwenParam param =
        QwenParam.builder().model("qwen-72b-chat")
            .messages(Arrays.asList(userMsg))
            .resultFormat(QwenParam.ResultFormat.MESSAGE)
            .topP(0.8)
            .build();
    GenerationResult result = gen.call(param);
    System.out.println(result);
  }

  public static void callWithMessageCallback()
      throws NoApiKeyException, ApiException, InputRequiredException, InterruptedException {
    Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
    Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
    QwenParam param =
        QwenParam.builder().model("qwen-14b-chat")
            .messages(Arrays.asList(userMsg))
            .resultFormat(QwenParam.ResultFormat.MESSAGE)
            .topP(0.8)
            .build();
    Semaphore semaphore = new Semaphore(0);
    gen.call(param, new ResultCallback<GenerationResult>() {

      @Override
      public void onEvent(GenerationResult message) {
        System.out.println(message);
      }
      @Override
      public void onError(Exception ex){
        System.out.println(ex.getMessage());
        semaphore.release();
      }
      @Override
      public void onComplete(){
        System.out.println("onComplete");
        semaphore.release();
      }
      
    });
    semaphore.acquire();
  }

  public static void main(String[] args){
        try {
          callWithMessage();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
          System.out.println(e.getMessage());
        }
        try {
          callWithMessageCallback();
        } catch (ApiException | NoApiKeyException | InputRequiredException | InterruptedException e) {
          System.out.println(e.getMessage());
        }
        System.exit(0);
  }
}

Multi-round conversation

You can use the messages parameter to pass in the conversation history to enable multiple rounds of interaction with the model.

import random
from http import HTTPStatus
from dashscope import Generation
from dashscope.api_entities.dashscope_response import Role
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def multi_round_conversation():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': 'Who are you'}]
    response = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        # set the random seed, optional, default to 1234 if not set
        seed=random.randint(1, 10000),
        result_format='message',  # set the result to be "message"  format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
        messages.append({'role': response.output.choices[0]['message']['role'],
                         'content': response.output.choices[0]['message']['content']})
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
    messages.append({'role': Role.USER, 'content': 'Nice to meet you'})
    response = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        result_format='message',  # set the result to be "message"  format.
    )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    multi_round_conversation()
// Copyright (c) Alibaba, Inc. and its affiliates.

import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.aigc.generation.models.QwenParam;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.MessageManager;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;



public class Main {
  public static void callWithMessage()
      throws NoApiKeyException, ApiException, InputRequiredException {
    Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
    MessageManager msgManager = new MessageManager(10);
    Message systemMsg =
        Message.builder().role(Role.SYSTEM.getValue()).content("You are a helpful assistant.").build();
    Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
    msgManager.add(systemMsg);
    msgManager.add(userMsg);
    QwenParam param =
        QwenParam.builder().model("qwen-72b-chat").messages(msgManager.get())
            .resultFormat(QwenParam.ResultFormat.MESSAGE)
            .topP(0.8)
            /* set the random seed, optional, default to 1234 if not set */
            .seed(100)
            .build();
    GenerationResult result = gen.call(param);
    System.out.println(result);
    msgManager.add(result);
    System.out.println(JsonUtils.toJson(result));
    msgManager.add(Message.builder().role(Role.USER.getValue()).content("Nice to meet you").build());
    param.setMessages(msgManager.get());
    result = gen.call(param);
    System.out.println(result);
    System.out.println(JsonUtils.toJson(result));
  }


  public static void main(String[] args){
        try {
          callWithMessage();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
          System.out.println(e.getMessage());
        }
        System.exit(0);
  }
}

Streaming output

import random
from http import HTTPStatus
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def call_stream_with_messages():
    messages = [
        {'role': 'user', 'content': 'Who are you'}]
    responses = Generation.call(
        'qwen1.5-72b-chat',
        messages=messages,
        seed=random.randint(1, 10000),  # set the random seed, optional, default to 1234 if not set
        result_format='message',  # set the result to be "message"  format.
        stream=True,
        incremental_output=True  # get streaming output incrementally
    )
    full_content = ''
    for response in responses:
        if response.status_code == HTTPStatus.OK:
            full_content += response.output.choices[0]['message']['content']
            print(response)
        else:
            print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code,
                response.code, response.message
            ))
    print('Full content: \n' + full_content)


if __name__ == '__main__':
    call_stream_with_messages()
// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.Arrays;
import java.util.concurrent.Semaphore;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.aigc.generation.models.QwenParam;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;


public class Main {
  public static void streamCallWithMessage()
      throws NoApiKeyException, ApiException, InputRequiredException {
    Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
    Message userMsg = Message.builder().role(Role.USER.getValue()).content("Give me a recipe that uses radish, potato, and eggplant").build();
    QwenParam param =
        QwenParam.builder().model("qwen1.5-72b-chat")
            .messages(Arrays.asList(userMsg))
            .resultFormat(QwenParam.ResultFormat.MESSAGE)
            .topP(0.8)
            .incrementalOutput(true) // get streaming output incrementally
            .build();
    Flowable<GenerationResult> result = gen.streamCall(param);
    StringBuilder fullContent = new StringBuilder();
    result.blockingForEach(item->{
      fullContent.append(item.getOutput().getChoices().get(0).getMessage().getContent());
      System.out.println(item);
    });
    System.out.println("Full content: \n" + fullContent);
  }

  public static void streamCallWithMessageCallback()
      throws NoApiKeyException, ApiException, InputRequiredException, InterruptedException {
    Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
    Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();
    QwenParam param =
        QwenParam.builder().model("qwen-14b-chat")
            .messages(Arrays.asList(userMsg))
            .resultFormat(QwenParam.ResultFormat.MESSAGE)
            .topP(0.8)
            .incrementalOutput(true) // get streaming output incrementally
            .build();
    Semaphore semaphore = new Semaphore(0);
    StringBuilder fullContent = new StringBuilder();
    gen.streamCall(param, new ResultCallback<GenerationResult>() {

      @Override
      public void onEvent(GenerationResult message) {
        fullContent.append(message.getOutput().getChoices().get(0).getMessage().getContent());
        System.out.println(message);
      }
      @Override
      public void onError(Exception ex){
        System.out.println(ex.getMessage());
        semaphore.release();
      }
      @Override
      public void onComplete(){
        System.out.println("onComplete");
        semaphore.release();
      }
      
    });
    semaphore.acquire();
    System.out.println("Full content: \n" + fullContent);
  }

  public static void main(String[] args){
        try {
          streamCallWithMessage();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
          System.out.println(e.getMessage());
        }
        try {
          streamCallWithMessageCallback();
        } catch (ApiException | NoApiKeyException | InputRequiredException | InterruptedException e) {
          System.out.println(e.getMessage());
        }
        System.exit(0);
  }
}

Request parameters

Parameter

Type

Description

model

string

The name of the Qwen model to be used for interaction. For more information, see the Overview section of this topic.

messages

array

  • The messages parameter specifies the conversation history between you and the model. Each element in the list is in the format of {"role": role, "content": content}. Valid values of role are system, user, and assistant. The system role is allowed only in messages[0]. The user and assistant roles must appear in an alternating sequence.

  • The prompt parameter specifies the prompt that you want the model to execute.

  • You must specify either the messages or prompt parameter. We recommend that you specify the messages parameter for chat scenarios.

prompt

string

history

list[dict]

This parameter will be discontinued. We recommend that you use the messages parameter. Optional. The conversation history between you and the model. Each element in the list is a round of conversation in the format of {"user": "user input", "bot": "model output"}. The multiple rounds of conversations are sorted in ascending chronological order.

Default value: [].

seed

int

Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model.

Valid values: 64-bit unsigned integers.

Default value: 1234.

If you specify seed, the model tries to generate the same or similar content for the output of each model call. However, the model cannot ensure that the output is exactly the same for each model call.

max_tokens

int

Optional. The maximum number of tokens that can be generated by the model.

  • If you use the qwen1.5-14b-chat, qwen1.5-7b-chat, qwen-14b-chat, and qwen-7b-chat models, the maximum value and default value are 1500.

  • If you use the qwen1.5-72b-chat or qwen-72b-chat model, the maximum value and default value are 2000.

top_p

float

Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content.

Valid values: (0,1.0).

Default value: 0.8.

top_k

int

Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content.

Default value: 0, indicating that the top_k policy is disabled. In this case, only the top_p policy takes effect.

repetition_penalty

float

Optional. Controls repetition in the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies no repetition penalty.

Default value: 1.1.

temperature

float

Optional. The randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable.

Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless.

Default value: 0.85.

This parameter is valid if you use the SDK for Python version 1.10.1 or later, or the SDK for Java version 2.5.1 or later.

stop

str/list[str] for specifying strings; list[int]/list[list[int]] for specifying token IDs

Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. For example, if you set this parameter to "Hello", the model stops when it is about to generate the string "Hello". In addition, the stop parameter accepts a list of strings or a list of token ID arrays to support scenarios that require multiple stop conditions. Note that a list cannot contain both token IDs and strings.

stream

bool

Optional. Specifies whether to enable streaming output mode. In streaming output mode, the model returns a generator. You need to use an iterative loop to fetch the results from the generator and incrementally display the text. In Python, the output mode can be changed to non-incremental by setting the incremental_output parameter in the SDK to False. In Java, a similar change can be made by setting the incrementalOutput request parameter to false.

Default value: False.

result_format

String

Optional. The format of the output results.

Valid values: text and message.

Default value: text.

incremental_output

bool

Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, the incremental streaming output mode is enabled and the subsequent returned content excludes the historical returned content. If you set this parameter to False, the incremental streaming output mode is disabled and the subsequent returned content includes the historical returned content.

Examples:

  • False:

    I

    I like

    I like apple

  • True:

    I

    like

    apple

This parameter takes effect only if the stream parameter is set to True.

Default value: False.
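
To make the parameters above concrete, the following is a hedged sketch of the stop parameter; the stop strings here are illustrative assumptions, not values from this reference:

from dashscope import Generation

response = Generation.call(
    'qwen1.5-72b-chat',
    messages=[{'role': 'user', 'content': 'Count from one to ten in words.'}],
    result_format='message',
    # Generation stops before either of these illustrative strings is produced.
    stop=['five', 'Five'],
)
print(response.output.choices[0]['message']['content'])

Similarly, a sketch of streaming with incremental_output set to False; each chunk then repeats all previously returned text, so only the final chunk is needed:

from dashscope import Generation

responses = Generation.call(
    'qwen1.5-72b-chat',
    messages=[{'role': 'user', 'content': 'Who are you'}],
    result_format='message',
    stream=True,
    incremental_output=False,  # each chunk contains the full text generated so far
)
last_content = ''
for response in responses:
    last_content = response.output.choices[0]['message']['content']
print(last_content)  # the final chunk already holds the complete reply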

Sample response

  • Sample response in the message format

{
    "status_code": 200,
    "request_id": "b3d8bb75-05a2-9044-8e9e-ec8c87689a5e",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?"
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 31,
        "output_tokens": 267
    }
}
  • Sample response in the text format

{
    "status_code": 200,
    "request_id": "446877aa-dbb8-99ca-98eb-d78a5e90fe61",
    "code": "",
    "message": "",
    "output": {
        "text": "I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?",
        "finish_reason": "stop",
        "choices": null
    },
    "usage": {
        "input_tokens": 31,
        "output_tokens": 267
    }
}

  • Response parameters

Parameter

Type

Description

status_code

int

The response code. The status code 200 indicates that the request is successful. Other status codes indicate that the request failed. If the request failed, the corresponding error code and error message are returned by using the code and message parameters.

request_id

string

The request ID.

code

string

The error code. This parameter is valid only if the request failed.

message

string

The error message. This parameter is valid only if the request failed.

output

dict

The information about the call results. For Qwen models, the information includes the generated output in the text parameter.

usage

dict

The metering information, which indicates the usage metrics for the request.

output.text

string

The output text generated by the model.

output.finish_reason

string

The reason why the generation process stops.

Valid values:

  • null: The model is generating content.

  • stop: The model encounters a stop token.

  • length: The generated content reaches the maximum allowed length.

usage.input_tokens

int

The length of tokens converted from the input text.

usage.output_tokens

int

The length of tokens converted from the output text.

choices

List

The choices that are returned if the result_format parameter is set to message.

choices[i].finish_reason

String

The reason why the generation process stops.

Valid values:

  • null: The model is generating content.

  • stop: The model encounters a stop token.

  • length: The generated content reaches the maximum allowed length.

This parameter is returned only if the result_format parameter is set to message.

choices[i].message

dict

The message generated by the model.

This parameter is returned only if the result_format parameter is set to message.

message.role

String

The role of the model. The value is set to assistant.

This parameter is returned only if the result_format parameter is set to message.

message.content

String

The text generated by the model.

This parameter is returned only if the result_format parameter is set to message.
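
Putting the response parameters above together, the following is a minimal sketch for extracting the generated text from a successful SDK response in either format:

from http import HTTPStatus

def extract_text(response):
    if response.status_code != HTTPStatus.OK:
        raise RuntimeError('%s: %s' % (response.code, response.message))
    # message format: the text sits in the first element of output.choices.
    if response.output.choices is not None:
        return response.output.choices[0]['message']['content']
    # text format: the text is returned directly in output.text.
    return response.output.text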

Use HTTP

Overview

Open source Qwen models support interaction with users by using the standard HTTP or HTTP Server-Sent Events (SSE) protocol. You can select a protocol based on your business requirements.

Prerequisites

Alibaba Cloud Model Studio is activated and an API key is created. For more information, see Obtain an API key.

Request syntax

POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
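
Any HTTP client can issue this request. The following is a hedged sketch that uses the third-party requests library for Python; the library and the DASHSCOPE_API_KEY environment variable are assumptions, not part of this reference:

import os
import requests

url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {
    'Authorization': 'Bearer ' + os.environ['DASHSCOPE_API_KEY'],
    'Content-Type': 'application/json',
}
body = {
    'model': 'qwen1.5-72b-chat',
    'input': {'messages': [{'role': 'user', 'content': 'Who are you'}]},
    'parameters': {'result_format': 'message'},
}
response = requests.post(url, headers=headers, json=body)
print(response.json())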

Request parameters

Section

Parameter

Type

Description

Example

Header

Content-Type

String

The request type. Set this parameter to application/json for standard requests or text/event-stream to enable SSE.

application/json

Accept

String

Optional. The media types that the client is willing to receive from the server. If you set this parameter to text/event-stream, SSE is enabled.

Default value: */*, indicating that the client accepts any media type.

text/event-stream

Authorization

String

The API key.

Bearer d1**2a

X-DashScope-WorkSpace

String

Optional. The workspace to be used for this call. This parameter is required if the API key of a Resource Access Management (RAM) user is used. In addition, the specified workspace must contain the RAM user. This parameter is optional if the API key of an Alibaba Cloud account is used. If you specify a workspace, the corresponding identity in the workspace is used. If you leave this parameter empty, the identity of the Alibaba Cloud account is used.

ws_QTggmeAxxxxx

X-DashScope-SSE

String

Optional. Specifies whether to enable SSE. To enable SSE, you can either set this parameter to enable or set Accept to text/event-stream.

enable

Body

model

String

The model to be used.

Valid values: qwen1.5-72b-chat, qwen1.5-32b-chat, qwen1.5-14b-chat, qwen1.5-7b-chat, qwen-72b-chat, qwen-14b-chat, and qwen-7b-chat.

qwen1.5-72b-chat

input.prompt

String

The prompt that you want the model to execute. You can enter a prompt in Chinese or English.

Which park is closest to me?

input.history

List

This parameter will be discontinued. We recommend that you use the messages parameter. Optional. The conversation history between the user and the model. Each element in the list is a round of conversation in the format of {"user": "user input", "bot": "model output"}. The multiple rounds of conversations are sorted in ascending chronological order.

"history": [

{

"user":"How is the weather today?",

"bot":"It's a nice day. Do you want to go out?"

},

{

"user":"What places do you recommend?",

"bot":"I suggest that you go to the park. Spring is coming and the flowers are blooming. The park is beautiful."

}

]

input.messages

List

The conversation history between the user and the model. Each element in the list is in the format of {"role": role, "content": content}.

Valid values of role: system, user, and assistant.

input.messages is optional.

input.messages.role and input.messages.content are required if input.messages is specified.

"input":{

"messages":[

{

"role": "system",

"content": "You are a helpful assistant."

},

{

"role": "user",

"content": "Hello, are there any museums nearby?"

}]

}

input.messages.role

String

input.messages.content

String

parameters.result_format

String

Optional. The format of the results.

Valid values: text and message. text is used in earlier versions.

The message format is compatible with OpenAI.

"text"

parameters.seed

Integer

Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model.

Valid values: 64-bit unsigned integers.

Default value: 1234.

If you specify seed, the model tries to generate the same or similar content for the output of each model call. However, the model cannot ensure that the output is exactly the same for each model call.

65535

parameters.max_tokens

Integer

Optional. The maximum number of tokens that can be generated by the model.

  • If you use the qwen1.5-14b-chat, qwen1.5-7b-chat, qwen-14b-chat, and qwen-7b-chat models, the maximum value and default value are 1500.

  • If you use the qwen1.5-72b-chat or qwen-72b-chat model, the maximum value and default value are 2000.

1500

parameters.top_p

Float

Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content.

Valid values: (0,1.0).

Default value: 0.8.

0.8

parameters.top_k

Integer

Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content.

By default, the top_k parameter is left empty.

If the top_k parameter is left empty or set to a value greater than 100, the top_k policy is disabled. In this case, only the top_p policy takes effect.

50

parameters.repetition_penalty

Float

Optional. Controls repetition in the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies no repetition penalty.

Default value: 1.1.

1.1

parameters.temperature

Float

Optional. The randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable.

Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless.

Default value: 0.85.

0.85

parameters.stop

str/list[str] for specifying strings; list[int]/list[list[int]] for specifying token IDs

Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. For example, if you set this parameter to "Hello", the model stops when it is about to generate the string "Hello". In addition, the stop parameter accepts a list of strings or a list of token ID arrays to support scenarios that require multiple stop conditions. Note that a list cannot contain both token IDs and strings.

[[37763, 367]]

parameters.incremental_output

Bool

Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, the incremental streaming output mode is enabled and the subsequent returned content excludes the historical returned content. If you set this parameter to False, the incremental streaming output mode is disabled and the subsequent returned content includes the historical returned content.

Examples:

  • False:

    I

    I like

    I like apple

  • True:

    I

    like

    apple

This parameter takes effect only if the stream parameter is set to True.

Default value: False.

Response parameters

Parameter

Type

Description

Example

output.text

String

The output text.

I suggest that you go to the Summer Palace.

output.finish_reason

String

The reason why the generation process stops.

Valid values:

  • null: The model is generating content.

  • stop: The model encounters a stop token.

  • length: The generated content reaches the maximum allowed length.

stop

output.choices

List

The choices that are returned if the result_format parameter is set to message.

output.choices[x].finish_reason

String

The reason why the generation process stops.

Valid values:

  • null: The model is generating content.

  • stop: The model encounters a stop token.

  • length: The generated content reaches the maximum allowed length.

output.choices[x].message

Object

The message generated by the model, in the format of {"role": role, "content": content}. Valid values of role are system, user, and assistant. More roles will be supported in the future. The content field contains the output for the request.

output.choices[x].message.role

String

output.choices[x].message.content

String

usage.output_tokens

Integer

The number of tokens in the output for the request.

380

usage.input_tokens

Integer

The number of tokens in the input for the request. If search is enabled, additional tokens for search-related content are included. This increases the total token count beyond the initial input for the request.

633

request_id

String

The request ID.

7574ee8f-38a3-4b1e-9280-11c33ab46e51

Sample request (SSE disabled)

The following sample code shows how to call a Qwen 14B model by using a cURL command. In this example, SSE is disabled. If you want to call a Qwen 7B or 72B model, specify the model in the model parameter.

Note

Replace <YOUR-DASHSCOPE-API-KEY> with your API key.

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Authorization: Bearer <YOUR-DASHSCOPE-API-KEY>' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-14b-chat",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who are you"
            }
        ]
    },
    "parameters": {
    }
}'

Sample response (SSE disabled)

{
    "output":{
        "text":"I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?",
        "finish_reason":"stop"    
    },
    "usage":{
        "output_tokens":51,
        "input_tokens":85
    },
    "request_id":"d89c06fb-46a1-47b6-acb9-bfb17f814969"
}

Sample request (SSE enabled)

The following sample code shows how to call the Qwen1.5 72B model by using a cURL command. In this example, SSE is enabled. If you want to call another model, specify it in the model parameter.

Note

Replace <YOUR-DASHSCOPE-API-KEY> with your API key.

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Authorization: Bearer <YOUR-DASHSCOPE-API-KEY>' \
--header 'Content-Type: application/json' \
--header 'X-DashScope-SSE: enable' \
--data '{
    "model": "qwen1.5-72b-chat",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Who are you"
            }
        ]
    },
    "parameters": {
    }
}'

Sample response (SSE enabled)

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Hello","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":28,"input_tokens":27,"output_tokens":1},"request_id":"xxx"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":29,"input_tokens":27,"output_tokens":2},"request_id":"xxx"}

... ... ... ...
... ... ... ...

id:12
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":91,"input_tokens":27,"output_tokens":64},"request_id":"xxx"}

Sample error response

If the request failed, the corresponding error code and error message are returned by using the code and message parameters.

{
    "code":"InvalidApiKey",
    "message":"Invalid API-key provided.",
    "request_id":"fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1"
}