
Alibaba Cloud Model Studio: Qwen API reference

Last Updated: Oct 31, 2024

Qwen models provide powerful natural language processing capabilities. You can integrate Qwen models into your business by using the SDK or by calling API operations over HTTP.

Model overview

The following list describes the Qwen models that you can use by calling API operations, together with their input and output limits.

qwen-turbo

An ultra-large language model that supports multiple input languages, such as Chinese and English. The model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000.

qwen-plus

An enhanced ultra-large language model that supports multiple input languages, such as Chinese and English. The model supports a context of up to 32,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 30,000.

qwen-max

A 100-billion-parameter-level ultra-large language model that supports multiple input languages, such as Chinese and English. The qwen-max model is updated on a rolling basis. If you want to use a stable version, use a historical snapshot version. The latest qwen-max model is equivalent to the qwen-max-0428 snapshot, which is the API model for Qwen2.5. The model supports a context of up to 8,000 tokens. To ensure normal model use and output, the maximum number of input tokens is limited to 6,000.

Important

The limits on query frequency and number of tokens vary based on the model. Before you call a model, we recommend that you check the throttling thresholds of the model. For more information, see the Throttling thresholds section of the Billing topic.
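
For reference, the following is a minimal retry sketch with the SDK for Python, which is described in the next section. It assumes that a throttled request surfaces as HTTP status code 429 in the response; check the actual throttling thresholds and error codes for your model before you rely on this pattern.

import time
from dashscope import Generation

def call_with_retry(messages, max_retries=3):
    # Retry with exponential backoff when a request is throttled (assumed HTTP status code 429).
    response = None
    for attempt in range(max_retries):
        response = Generation.call(model="qwen-turbo",
                                   messages=messages,
                                   result_format='message')
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)
    return response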

Use SDK

You can use the SDK to access multiple features, such as single-round conversations, multi-round conversations, streaming output, and function calling.

Prerequisites

  • If you use Python, the SDK for Python Version 1.17.0 or later is installed.

  • If you use Java, the SDK for Java Version 2.12.0 or later is installed.

  • For more information, see Install Alibaba Cloud Model Studio SDK.
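
For example, you can typically install or upgrade the SDK for Python by running pip install -U dashscope, and add the dashscope-sdk-java dependency to your Java project. The exact package names may vary; see the installation topic linked above for the authoritative instructions.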

Single-round conversation

You can use Qwen in various scenarios, such as content creation, translation, and text summarization. You can run the following sample code to use the single-round conversation capability of Qwen models:

import random
from http import HTTPStatus
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def call_with_messages():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': 'Who are you'}]
    response = Generation.call(model="qwen-turbo",
                               messages=messages,
                               # Specify the random seed. If you leave this parameter empty, the random seed is set to 1234 by default.
                               seed=random.randint(1, 10000),
                               # Set the output format to message.
                               result_format='message')
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    call_with_messages()
// Copyright (c) Alibaba, Inc. and its affiliates.

// We recommend that you use DashScope SDK for Java V2.12.0 or later.

import java.util.Arrays;  
import com.alibaba.dashscope.aigc.generation.Generation;  
import com.alibaba.dashscope.aigc.generation.GenerationParam;  
import com.alibaba.dashscope.aigc.generation.GenerationResult;  
import com.alibaba.dashscope.common.Message;  
import com.alibaba.dashscope.common.Role;  
import com.alibaba.dashscope.exception.ApiException;  
import com.alibaba.dashscope.exception.InputRequiredException;  
import com.alibaba.dashscope.exception.NoApiKeyException;  
import com.alibaba.dashscope.utils.Constants;
  
public class Main {  
    public static GenerationResult callWithMessage() throws ApiException, NoApiKeyException, InputRequiredException {  
        Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
          
        Message systemMsg = Message.builder()  
                .role(Role.SYSTEM.getValue())  
                .content("You are a helpful assistant.")  
                .build();  
          
        Message userMsg = Message.builder()  
                .role(Role.USER.getValue())  
                .content("Who are you")  
                .build();  
          
        GenerationParam param = GenerationParam.builder()  
                .model("qwen-turbo")  
                .messages(Arrays.asList(systemMsg, userMsg))  
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)  
                .topP(0.8)  
                .build();  
          
        return gen.call(param);  
    }  
  
    public static void main(String[] args) {  
        try {  
            GenerationResult result = callWithMessage();  
            System.out.println(result);  
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {  
            // Record the error information by using a logging framework.  
            // Logger.error("An error occurred while calling the generation service", e);  
            System.err.println("An error occurred while calling the generation service: " + e.getMessage());  
        }  
         System.exit(0);
    }  
}

Sample response:

{
  "status_code": 200,
  "request_id": "dbb7fab4-6a82-92f5-896c-22ec1532c0a5",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "I am Qwen, a large language model created by Alibaba Cloud. I'm here to assist you with your questions and provide information on various topics. How can I help you today?"
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 22,
    "output_tokens": 37,
    "total_tokens": 59
  }
}

Multi-round conversation

Compared with the single-round conversation capability, the multi-round conversation capability allows the model to refer to conversation history, which makes interactions more similar to daily communication. However, more tokens are consumed because the model also processes the conversation history. You can run the following sample code to use the multi-round conversation capability of Qwen models:

from http import HTTPStatus
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def multi_round():
    messages = [{'role': 'system', 'content': 'You are a helpful assistant.'},
                {'role': 'user', 'content': 'Who are you'}]
    response = Generation.call(model="qwen-turbo",
                               messages=messages,
                               # Set the output format to message.
                               result_format='message')
    if response.status_code == HTTPStatus.OK:
        print(response)
        # Add the message returned by the model to the message list.
        messages.append({'role': response.output.choices[0]['message']['role'],
                         'content': response.output.choices[0]['message']['content']})
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))
        # If the response fails, delete the last user message from the message list. This ensures that the user messages and the messages returned by the model alternately appear.
        messages = messages[:-1]
    # Add the second user question to the message list.
    messages.append({'role': 'user', 'content': 'Nice to meet you'})
    # Respond to the second user question.
    response = dashscope.Generation.call(model="qwen-turbo",
                               messages=messages,
                               result_format='message',  # Set the output format to message.
                               )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
            response.request_id, response.status_code,
            response.code, response.message
        ))


if __name__ == '__main__':
    multi_round()
// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.ArrayList;  
import java.util.List;    
import com.alibaba.dashscope.aigc.generation.Generation;  
import com.alibaba.dashscope.aigc.generation.GenerationParam;  
import com.alibaba.dashscope.aigc.generation.GenerationResult;  
import com.alibaba.dashscope.common.Message;  
import com.alibaba.dashscope.common.Role;  
import com.alibaba.dashscope.exception.ApiException;  
import com.alibaba.dashscope.exception.InputRequiredException;  
import com.alibaba.dashscope.exception.NoApiKeyException;  
import com.alibaba.dashscope.utils.JsonUtils;  
  
public class Main {  
  
    public static GenerationParam createGenerationParam(List<Message> messages) {  
        return GenerationParam.builder()  
                .model("qwen-turbo")  
                .messages(messages)  
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)  
                .topP(0.8)  
                .build();  
    }  
  
    public static GenerationResult callGenerationWithMessages(GenerationParam param) throws ApiException, NoApiKeyException, InputRequiredException {  
        Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
        return gen.call(param);  
    }  
  
    public static void main(String[] args) {  
        try {  
            List<Message> messages = new ArrayList<>();  
            messages.add(createMessage(Role.SYSTEM, "You are a helpful assistant."));  
            messages.add(createMessage(Role.USER, "Who are you"));  
  
            GenerationParam param = createGenerationParam(messages);  
            GenerationResult result = callGenerationWithMessages(param);  
            printResult(result);  
  
            // Add the message returned by the model to the message list.  
            messages.add(result.getOutput().getChoices().get(0).getMessage());  
  
            // Add the second user question.  
            messages.add(createMessage(Role.USER, "Nice to meet you"));  
  
            result = callGenerationWithMessages(param);  
            printResult(result);  
            printResultAsJson(result);  
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {  
            e.printStackTrace(); 
        } 
           System.exit(0); 
    }  
  
    private static Message createMessage(Role role, String content) {  
        return Message.builder().role(role.getValue()).content(content).build();  
    }  
  
    private static void printResult(GenerationResult result) {  
        System.out.println(result);  
    }  
  
    private static void printResultAsJson(GenerationResult result) {  
        System.out.println(JsonUtils.toJson(result));  
    }  
}

Sample response:

{
  "status_code": 200,
  "request_id": "8cf046e4-4b3b-92be-ab03-d2d3152198ee",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "I am Qwen, a large language model created by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, responses, or creative content, while upholding the principles of providing accurate and helpful information. How can I assist you today?"
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 22,
    "output_tokens": 56,
    "total_tokens": 78
  }
}
{
  "status_code": 200,
  "request_id": "75824057-7214-9b15-a701-004d88337def",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": "Nice to meet you too! If you have any questions or need assistance, feel free to ask, and I'll do my best to help you."
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 92,
    "output_tokens": 30,
    "total_tokens": 122
  }
}

You can also run the following sample code to use the real-time interaction feature:

from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def get_response(messages):
    response = Generation.call(model="qwen-turbo",
                               messages=messages,
                               # Set the output format to message.
                               result_format='message')
    return response

messages = [{'role': 'system', 'content': 'You are a helpful assistant.'}]

# Customize the number of conversation rounds. In this example, the number of conversation rounds is set to 3.
for i in range(3):
    user_input = input("Input:")
    messages.append({'role': 'user', 'content': user_input})
    assistant_output = get_response(messages).output.choices[0]['message']['content']
    messages.append({'role': 'assistant', 'content': assistant_output})
    print(f'Input: {user_input}')
    print(f'Output: {assistant_output}')
    print('\n')
import java.util.ArrayList;
import java.util.List;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import java.util.Scanner;

public class Main {

    public static GenerationParam createGenerationParam(List<Message> messages) {
        return GenerationParam.builder()
                .model("qwen-turbo")
                .messages(messages)
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .topP(0.8)
                .build();
    }

    public static GenerationResult callGenerationWithMessages(GenerationParam param) throws ApiException, NoApiKeyException, InputRequiredException {
        Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
        return gen.call(param);
    }

    public static void main(String[] args) {
        try {
            List<Message> messages = new ArrayList<>();

            messages.add(createMessage(Role.SYSTEM, "You are a helpful assistant."));
            for (int i = 0; i < 3;i++) {
                Scanner scanner = new Scanner(System.in);
                System.out.print("Input:");
                String userInput = scanner.nextLine();
                if ("exit".equalsIgnoreCase(userInput)) {
                    break;
                }
                messages.add(createMessage(Role.USER, userInput));
                GenerationParam param = createGenerationParam(messages);
                GenerationResult result = callGenerationWithMessages(param);
                System.out.println("Output: "+result.getOutput().getChoices().get(0).getMessage().getContent());
                messages.add(result.getOutput().getChoices().get(0).getMessage());
            }
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            e.printStackTrace();
        }
        System.exit(0);
    }

    private static Message createMessage(Role role, String content) {
        return Message.builder().role(role.getValue()).content(content).build();
    }
}

Streaming output

A large language model (LLM) generates its answer gradually instead of returning the final answer all at once. In non-streaming output mode, the model generates and concatenates all intermediate answers and returns the final answer in a single response. In streaming output mode, the model returns intermediate answers in real time as they are generated, so you can read the output immediately. This reduces the time that you wait for a response from the model. To enable streaming output mode, set the stream parameter to True if you use the SDK for Python, or call the streamCall operation if you use the SDK for Java.

from http import HTTPStatus
from dashscope import Generation
import dashscope
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

def call_with_stream():
    messages = [
        {'role': 'user', 'content': 'Who are you'}]
    responses = Generation.call(model="qwen-turbo",
                                messages=messages,
                                result_format='message',  # Set the output format to message.
                                stream=True,  # Enable the streaming output mode.
                                incremental_output=True  # Enable the incremental streaming output mode.
                                )
    for response in responses:
        if response.status_code == HTTPStatus.OK:
            print(response.output.choices[0]['message']['content'], end='')
        else:
            print('Request id: %s, Status code: %s, error code: %s, error message: %s' % (
                response.request_id, response.status_code,
                response.code, response.message
            ))


if __name__ == '__main__':
    call_with_stream()
// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.Arrays;
import java.util.concurrent.Semaphore;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.utils.JsonUtils;
import io.reactivex.Flowable;

public class Main {

    private static final Logger logger = LoggerFactory.getLogger(Main.class);

    private static void handleGenerationResult(GenerationResult message, StringBuilder fullContent) {
        fullContent.append(message.getOutput().getChoices().get(0).getMessage().getContent());
        logger.info("Received message: {}", JsonUtils.toJson(message));
    }

    public static void streamCallWithMessage(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException {
        GenerationParam param = buildGenerationParam(userMsg);
        Flowable<GenerationResult> result = gen.streamCall(param);
        StringBuilder fullContent = new StringBuilder();

        result.blockingForEach(message -> handleGenerationResult(message, fullContent));

        logger.info("Full content: \n{}", fullContent.toString());
    }

    public static void streamCallWithCallback(Generation gen, Message userMsg)
            throws NoApiKeyException, ApiException, InputRequiredException, InterruptedException {
        GenerationParam param = buildGenerationParam(userMsg);
        Semaphore semaphore = new Semaphore(0);
        StringBuilder fullContent = new StringBuilder();

        gen.streamCall(param, new ResultCallback<GenerationResult>() {
            @Override
            public void onEvent(GenerationResult message) {
                handleGenerationResult(message, fullContent);
            }

            @Override
            public void onError(Exception err) {
                logger.error("Exception occurred: {}", err.getMessage());
                semaphore.release();
            }

            @Override
            public void onComplete() {
                logger.info("Completed");
                semaphore.release();
            }
        });

        semaphore.acquire();
        logger.info("Full content: \n{}", fullContent.toString());
    }

    private static GenerationParam buildGenerationParam(Message userMsg) {
        return GenerationParam.builder()
                .model("qwen-turbo")
                .messages(Arrays.asList(userMsg))
                .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                .topP(0.8)
                .incrementalOutput(true)
                .build();
    }

    public static void main(String[] args) {
        try {
            Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
            Message userMsg = Message.builder().role(Role.USER.getValue()).content("Who are you").build();

            streamCallWithMessage(gen, userMsg);
            streamCallWithCallback(gen, userMsg);
        } catch (ApiException | NoApiKeyException | InputRequiredException | InterruptedException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
    }
}

Function calling

LLMs may not provide expected answers to questions about time-sensitive topics, private-domain knowledge, or mathematical calculations. You can use the function calling feature to improve the generated output. When you call a model, you can use the tools parameter to specify the name, description, and request parameters of each tool. After the model receives the prompt and the tool information, the model determines whether a tool is required:

  • If the model does not need a tool, it does not return the tool_calls parameter and directly returns the generated response.

  • If the model needs a tool, it returns a message that contains the tool_calls parameter. Your application parses the function name and request parameters of the tool from the tool_calls parameter, passes the request parameters to the tool, and obtains the results from the tool. The application then specifies the tool output in the following format:

    {
        "name": "$Tool name",
        "role": "tool",
        "content": "$Output generated by the tool"
    }

    Add the tool information to conversation history, enter a question to ask the model, and then obtain the final answer.

The following figure shows the flowchart of a function call.

Note

The information generated by function calls cannot be returned in incremental streaming output mode. For more information about the incremental streaming output mode, see the incremental_output parameter in the Request parameters section of this topic.

The application needs to parse the parameters of a tool during the function call process. Therefore, you must use a model that provides high-quality responses. We recommend that you use the qwen-max model. Sample code:

from dashscope import Generation
import dashscope
from datetime import datetime
import random
import json
# If the environment variable is not set, please add the following line of code:
# dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'


# Define a tool list. The model selects a tool based on the name and description of the tool.
tools = [
    # Use Tool 1 to query the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "This tool can help you query the current time.",
            "parameters": {}  # You can query the current time without the need to specify request parameters. Therefore, the parameters parameter is left empty.
        }
    },  
    # Use Tool 2 to query the weather of a city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "This tool can help you query the weather of a city.",
            "parameters": {  # The location parameter specifies the location whose weather you want to query. Therefore, the location parameter is specified in the parameters parameter.
                        "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city, county, or district, such as Beijing, Hangzhou, or Yuhang."
                    }
                }
            },
            "required": [
                "location"
            ]
        }
    }
]

# Simulate the weather query tool. Sample response: "It is sunny today in Beijing."
def get_current_weather(location):
    return f"It is sunny today in {location}.  "

# Simulate the tool that is used to query the current time. Sample response: "Current time: 2024-04-15 17:15:18. "
def get_current_time():
    # Query the current date and time.
    current_datetime = datetime.now()
    # Format the current date and time.
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    # Return the formatted current date and time.
    return f"Current time: {formatted_time}."

# Encapsulate the response function of the model.
def get_response(messages):
    response = Generation.call(
        model='qwen-max',
        messages=messages,
        tools=tools,
        seed=random.randint(1, 10000),  # Specify the random seed. If you leave this parameter empty, the random seed is set to 1234 by default.
        result_format='message'  # Set the output format to message.
    )
    return response

def call_with_messages():
    print('\n')
    messages = [
            {
                "content": input('Input:'),  # Sample questions: "What time is it now?" "What is the time in an hour?" "What is the weather like in Beijing?"
                "role": "user"
            }
    ]
    
    # Call the model in the first round.
    first_response = get_response(messages)
    assistant_output = first_response.output.choices[0].message
    print(f"\nResponse returned by the model in the first round: {first_response}\n")
    messages.append(assistant_output)
    if 'tool_calls' not in assistant_output:  # If the model determines that no tool is required, display the answer generated by the model without the need to call the model in the second round.
        print(f"Final answer: {assistant_output.content}") # Return the final answer generated by the model. You can specify the content of the final answer to be returned when no tool is called based on your business requirements.
        return
    # The following sample code provides an example if the get_current_weather tool is called:
    elif assistant_output.tool_calls[0]['function']['name'] == 'get_current_weather':
        tool_info = {"name": "get_current_weather", "role":"tool"}
        location = json.loads(assistant_output.tool_calls[0]['function']['arguments'])['properties']['location']
        tool_info['content'] = get_current_weather(location)
    # The following sample code provides an example if the get_current_time tool is called:
    elif assistant_output.tool_calls[0]['function']['name'] == 'get_current_time':
        tool_info = {"name": "get_current_time", "role":"tool"}
        tool_info['content'] = get_current_time()
    print(f"Output generated by the tool: {tool_info['content']}\n")
    messages.append(tool_info)

    # Call the model in the second round to summarize the output generated by the tool.
    second_response = get_response(messages)
    print(f"Response returned by the model in the second round: {second_response}\n")
    print(f"Final answer: {second_response.output.choices[0].message['content']}")

if __name__ == '__main__':
    call_with_messages()
    
// Copyright (c) Alibaba, Inc. and its affiliates.
// We recommend that you use DashScope SDK for Java V2.12.0 or later.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import com.alibaba.dashscope.aigc.conversation.ConversationParam.ResultFormat;
import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationOutput.Choice;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.tools.FunctionDefinition;
import com.alibaba.dashscope.tools.ToolCallBase;
import com.alibaba.dashscope.tools.ToolCallFunction;
import com.alibaba.dashscope.tools.ToolFunction;
import com.alibaba.dashscope.utils.JsonUtils;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.github.victools.jsonschema.generator.Option;
import com.github.victools.jsonschema.generator.OptionPreset;
import com.github.victools.jsonschema.generator.SchemaGenerator;
import com.github.victools.jsonschema.generator.SchemaGeneratorConfig;
import com.github.victools.jsonschema.generator.SchemaGeneratorConfigBuilder;
import com.github.victools.jsonschema.generator.SchemaVersion;

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Scanner;

public class Main {

    public static class GetWeatherTool {
        private String location;

        public GetWeatherTool(String location) {
            this.location = location;
        }

        public String call() {
            return "It is sunny today in " + location + ".";
        }
    }

    public static class GetTimeTool {

        public GetTimeTool() {
        }

        public String call() {
            LocalDateTime now = LocalDateTime.now();
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
            String currentTime = "Current time:" + now.format(formatter) + ".";
            return currentTime;
        }
    }

    public static void SelectTool()
            throws NoApiKeyException, ApiException, InputRequiredException {

        SchemaGeneratorConfigBuilder configBuilder =
                new SchemaGeneratorConfigBuilder(SchemaVersion.DRAFT_2020_12, OptionPreset.PLAIN_JSON);
        SchemaGeneratorConfig config = configBuilder.with(Option.EXTRA_OPEN_API_FORMAT_VALUES)
                .without(Option.FLATTENED_ENUMS_FROM_TOSTRING).build();
        SchemaGenerator generator = new SchemaGenerator(config);


        ObjectNode jsonSchema_weather = generator.generateSchema(GetWeatherTool.class);
        ObjectNode jsonSchema_time = generator.generateSchema(GetTimeTool.class);


        FunctionDefinition fd_weather = FunctionDefinition.builder().name("get_current_weather").description("Queries the weather of a location.")
                .parameters(JsonUtils.parseString(jsonSchema_weather.toString()).getAsJsonObject()).build();

        FunctionDefinition fd_time = FunctionDefinition.builder().name("get_current_time").description("Queries the current time.")
                .parameters(JsonUtils.parseString(jsonSchema_time.toString()).getAsJsonObject()).build();

        Message systemMsg = Message.builder().role(Role.SYSTEM.getValue())
                .content("You are a helpful assistant. When asked a question, use tools wherever possible.")
                .build();

        Scanner scanner = new Scanner(System.in);
        System.out.print("\nInput:");
        String userInput = scanner.nextLine();
        Message userMsg =
                Message.builder().role(Role.USER.getValue()).content(userInput).build();

        List<Message> messages = new ArrayList<>();
        messages.addAll(Arrays.asList(systemMsg, userMsg));

        GenerationParam param = GenerationParam.builder().model(Generation.Models.QWEN_MAX)
                .messages(messages).resultFormat(ResultFormat.MESSAGE)
                .tools(Arrays.asList(ToolFunction.builder().function(fd_weather).build(), ToolFunction.builder().function(fd_time).build())).build();
        // Call the model in the first round.
        Generation gen = new Generation("http", "https://dashscope-intl.aliyuncs.com/api/v1");
        GenerationResult result = gen.call(param);

        System.out.println("\nResponse returned by the model in the first round:"+JsonUtils.toJson(result));

        for (Choice choice : result.getOutput().getChoices()) {
            messages.add(choice.getMessage());
            // Call a tool.
            if (result.getOutput().getChoices().get(0).getMessage().getToolCalls() != null) {
                for (ToolCallBase toolCall : result.getOutput().getChoices().get(0).getMessage()
                        .getToolCalls()) {
                    if (toolCall.getType().equals("function")) {
                        // Parse the function name and request parameters of the tool.
                        String functionName = ((ToolCallFunction) toolCall).getFunction().getName();
                        String functionArgument = ((ToolCallFunction) toolCall).getFunction().getArguments();
                        // The model determines whether to call the get_current_weather tool.
                        if (functionName.equals("get_current_weather")) {
                            GetWeatherTool getWeatherFunction =
                                    JsonUtils.fromJson(functionArgument, GetWeatherTool.class);
                            String weather = getWeatherFunction.call();
                            Message toolResultMessage = Message.builder().role("tool")
                                    .content(String.valueOf(weather)).toolCallId(toolCall.getId()).build();
                            messages.add(toolResultMessage);
                            System.out.println("\nOutput generated by the tool:" + weather);
                        }
                        // The model determines whether to call the get_current_time tool.
                        else if (functionName.equals("get_current_time")) {
                            GetTimeTool getTimeFunction =
                                    JsonUtils.fromJson(functionArgument, GetTimeTool.class);
                            String time = getTimeFunction.call();
                            Message toolResultMessage = Message.builder().role("tool")
                                    .content(String.valueOf(time)).toolCallId(toolCall.getId()).build();
                            messages.add(toolResultMessage);
                            System.out.println("\nOutput generated by the tool:"+time);
                        }
                    }
                }
            }
            // Return the final answer generated by the model if no tool is required.
            else {
                // Return the final answer generated by the model. You can specify the content of the final answer to be returned when no tool is called based on your business requirements.
                System.out.println("\nFinal answer:"+result.getOutput().getChoices().get(0).getMessage().getContent());
                return;
            }
        }
        // Call the model in the second round to generate the answer that contains the output generated by the tool.
        param.setMessages(messages);
        result = gen.call(param);
        System.out.println("\nResponse returned by the model in the second round:"+JsonUtils.toJson(result));
        System.out.println(("\nFinal answer:"+result.getOutput().getChoices().get(0).getMessage().getContent()));
    }


    public static void main(String[] args) {
        try {
            SelectTool();
        } catch (ApiException | NoApiKeyException | InputRequiredException e) {
            System.out.println(String.format("Exception %s", e.getMessage()));
        }
        System.exit(0);
    }
}

The following examples show the response that is returned by the model in the first round when the function call process is initiated. If you enter "Weather in Hangzhou", the model returns the tool_calls parameter. If you enter "Hello", the model determines that no tool is required and does not return the tool_calls parameter.

Enter "Weather in Hangzhou"

{
    "status_code": 200,
    "request_id": "bd803417-56a7-9597-9d3f-a998a35b0477",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "tool_calls",
                "message": {
                    "role": "assistant",
                    "content": "",
                    "tool_calls": [
                        {
                            "function": {
                                "name": "get_current_weather",
                                "arguments": "{\"properties\": {\"location\": \"Hangzhou\"}, \"type\": \"object\"}"
                            },
                            "id": "",
                            "type": "function"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 222,
        "output_tokens": 27,
        "total_tokens": 249
    }
}

Enter "Hello"

{
    "status_code": 200,
    "request_id": "28e9d70c-c4d7-9bfb-bd07-8cf4228dda91",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "Hello! What can I do for you? You can ask questions about the weather, time, or other things."
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 221,
        "output_tokens": 21,
        "total_tokens": 242
    }
}

You can refer to the definitions of tools in the sample code to add more tools based on your business requirements.
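
For example, the following sketch appends a hypothetical get_exchange_rate tool to the tools list from the Python sample above. The name, description, and parameters are illustrative only.

tools.append({
    "type": "function",
    "function": {
        "name": "get_exchange_rate",  # Hypothetical tool name for illustration.
        "description": "This tool can help you query the exchange rate between two currencies.",
        "parameters": {
            "type": "object",
            "properties": {
                "base_currency": {"type": "string", "description": "The currency to convert from, such as USD."},
                "quote_currency": {"type": "string", "description": "The currency to convert to, such as CNY."}
            },
            "required": ["base_currency", "quote_currency"]
        }
    }
})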

Request parameters

The responses generated by a model are determined by request parameters, such as prompt, model, stream, and temperature. The following list describes the request parameters that you can specify when you call a model.

Each parameter is listed with one of the following data types:

  • string

  • array: the List type in Python or the ArrayList type in Java.

  • integer

  • float

  • boolean

  • object: a hash table of key-value pairs.

model (string)

Required. The name of the Qwen model to be called for conversations.

Valid values: qwen-turbo, qwen-plus, qwen-max.

messages (array)

  • The conversation history between the user and the Qwen model. Specify each element in the array in the following format: {"role": Role, "content": Content}.

  • Valid values of the role parameter: system, user, assistant, and tool.

    • system: the system. A system message prompts the Qwen model to respond based on preset specifications, a role, or context. The system role is optional. If you specify the system role, the message must appear at the beginning of the message list.

    • user and assistant: the user and the Qwen model. The messages of the user and the Qwen model alternately appear to simulate real conversations.

    • tool: the tool to be called. If you use the function call feature, you must specify the output generated by the tool in the following format: {"content":"Output generated by the tool", "name":"Function name of the tool", "role":"tool"}.

      • name specifies the function name of the tool, which must be the same as the value of the tool_calls[i]['function']['name'] parameter that is returned in the previous response.

      • content specifies the output generated by the tool. For more information, see the sample code in the Function calling section of this topic.

Note

You must specify either the messages or the prompt parameter. The history parameter, which can be used together with the prompt parameter, will be discontinued. If you use only the prompt parameter, support for conversation history is limited.

The messages parameter allows the Qwen model to refer to conversation history. This way, the Qwen model can parse the intention of the user more accurately and maintain the context and continuity of conversations. Therefore, we recommend that you use the messages parameter in multi-round conversation scenarios.

prompt (string)

The prompt that is entered by the user. The Qwen model generates an answer based on the prompt.
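
The following is a minimal sketch of a prompt-based call with the SDK for Python. With the default result_format of text, the answer is returned in output.text.

from dashscope import Generation

response = Generation.call(model="qwen-turbo", prompt="Hello")
# With the default result_format of text, the answer is returned in output.text.
print(response.output.text)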

history (array)

This parameter will be discontinued. We recommend that you use the messages parameter. The conversation history between the user and the Qwen model. Each element in the array specifies a round of conversation. Specify each element in the following format: {"user":"User question","bot":"Answer generated by the model"}. Specify multiple rounds of conversations in chronological order.

Default value: [].

seed (integer)

Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model.

Valid values: 64-bit unsigned integers.

Default value: 1234.

max_tokens (integer)

Optional. The maximum number of tokens that can be generated by the model.

  • If you use the qwen-turbo model, the maximum value and default value are 1500.

  • If you use the qwen-max model, the maximum value and default value are 2000.

top_p (float)

Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content.

Valid values: (0,1.0).

Default value: 0.8.

top_k (integer)

Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content.

By default, the top_k parameter is left empty.

If the top_k parameter is left empty or set to a value greater than 100, the top_k policy is disabled. In this case, only the top_p policy takes effect.

repetition_penalty (float)

Optional. Controls the repetition of content generated by the model. A greater value indicates lower repetition. A value of 1.0 indicates no repetition penalty.

This parameter does not have a restricted range of valid values.

Default value: 1.1.

temperature (float)

Optional. The randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable.

Valid values: [0,2). We recommend that you do not set this parameter to 0, which is meaningless.

Default value: 0.85.

stop (string or array)

Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. The value of the stop parameter can be a string or an array.

  • String

    The model stops generating content when the content generated by the model is about to contain the specified stop word.

    For example, if you set the stop parameter to "Hello", the model stops generating content when the content generated by the model is about to contain "Hello".

  • Array

    The elements in the array can be token IDs, strings, or arrays whose elements are token IDs. The model stops generating content when a token or token ID that is contained in the array is about to be generated. In the following examples, the stop parameter is set to an array. The token IDs are produced by the tokenizer that is used by the qwen-turbo model.

    1. The elements in the array are token IDs.

    The ID of the token "hello" is 14990. The ID of the token "weather" is 9104. If the stop parameter is set to [14990,9104], the model stops generating content when "hello" or "weather" is about to be generated.

    2. The elements in the array are strings.

    If the stop parameter is set to ["hello","weather"], the model stops generating content when "hello" or "weather" is about to be generated.

    3. The elements in the array are arrays.

    The ID of the token "hello" is 14990. The ID of the token "there" is 1052. The ID of the token "thank" is 9702. The ID of the token "you" is 498. If the stop parameter is set to [[108386, 103924],[35946, 101243]], the model stops generating content when "hello there" or "thank you" is about to be generated.

    Note

    If the stop parameter is an array, the array cannot contain both token IDs and strings. For example, you cannot set the stop parameter to ["hello",9104].
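
For example, the following sketch passes stop words as strings with the SDK for Python. Generation stops before "hello" or "weather" would appear in the output.

from dashscope import Generation

response = Generation.call(
    model="qwen-turbo",
    messages=[{'role': 'user', 'content': 'Say hello and describe the weather.'}],
    result_format='message',
    stop=["hello", "weather"])  # Generation stops before either word appears.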

stream (boolean)

Optional. Specifies whether to enable streaming output mode. In streaming output mode, the model returns a generator. You need to iterate over the generator to fetch and display the results. By default, each returned result contains the full text that has been generated so far. To receive only the newly generated text in each result, set the incremental_output parameter to True.

Default value: False.

enable_search (boolean)

Optional. Specifies whether to enable the Internet search feature for reference during content generation. Valid values:

  • True: enables the Internet search feature. The model references the results queried over the Internet during content generation. However, the model determines whether to use the results queried over the Internet based on its internal logic.

  • False: disables the Internet search feature. This is the default value.

result_format (string)

Optional. The output format of the response.

Valid values: text and message. For more information about the message format, see the Sample responses section of this topic. We recommend that you set the output format to message.

Default value: text.

incremental_output (boolean)

Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, the incremental streaming output mode is enabled and the subsequent returned content excludes the historical returned content. If you set this parameter to False, the incremental streaming output mode is disabled and the subsequent returned content includes the historical returned content. For more information, see the sample code in the Streaming output section of this topic.

Examples:

  • False:

    I

    I like

    I like apple

  • True:

    I

    like

    apple

This parameter takes effect only if the stream parameter is set to True.

Default value: False.

Note

The incremental_output parameter cannot be used together with the tools parameter.
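
The following sketch, which assumes a messages list as in the earlier samples, shows how a stream is typically consumed with the SDK for Python.

from dashscope import Generation

responses = Generation.call(model="qwen-turbo",
                            messages=messages,
                            result_format='message',
                            stream=True,
                            incremental_output=True)
full_content = ''
for response in responses:
    # With incremental_output=True, each result contains only new text, so concatenate.
    # With incremental_output=False, each result would contain the full text so far,
    # and you would keep only the latest result instead.
    full_content += response.output.choices[0]['message']['content']
print(full_content)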

tools (array)

A list of tools that can be called by the model. The model calls a tool from the tool list during each function call process. A tool in the tool list contains the following parameters:

  • type: the type of the tool. The value of this parameter is a string. Set the value to function.

  • function: the function of the tool, including name, description, and parameters. The value of the function parameter is an object.

    • name: the name of the tool function. The value of this parameter is a string and can contain letters, digits, underscores (_), and hyphens (-). It can be up to 64 characters in length.

    • description: the description of the tool function, which tells the model when and how to call the function. The value of this parameter is a string.

    • parameters: the request parameters of the tool function. The request parameters must be specified in a valid JSON schema. The value of this parameter is an object. For more information about JSON schemas, see Understanding JSON Schema. For an example, see the sample code in the Function calling section of this topic. If the parameters parameter is left empty, the function does not contain request parameters.

To use the tools parameter, you must set the result_format parameter to message. During a function call process, you must specify the tools parameter regardless of whether you initiate a round of function call or submit the results of a tool function to the model. Supported models include qwen-turbo, qwen-plus, and qwen-max.

Note

The tools parameter cannot be used together with the incremental_output parameter.

Sample responses

  • The sample response when the result_format parameter is set to message:

    {
      "status_code": 200,
      "request_id": "75824057-7214-9b15-a701-004d88337def",
      "code": "",
      "message": "",
      "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
          {
            "finish_reason": "stop",
            "message": {
              "role": "assistant",
              "content": "Nice to meet you too! If you have any questions or need assistance, feel free to ask, and I'll do my best to help you."
            }
          }
        ]
      },
      "usage": {
        "input_tokens": 92,
        "output_tokens": 30,
        "total_tokens": 122
      }
    }
  • The sample response when a function call is initiated:

    {
        "status_code": 200,
        "request_id": "a2b49cd7-ce21-98ff-87ac-b00cc590dc5e",
        "code": "",
        "message": "",
        "output": {
            "text": null,
            "finish_reason": null,
            "choices": [
                {
                    "finish_reason": "tool_calls",
                    "message": {
                        "role": "assistant",
                        "content": "",
                        "tool_calls":[
                            {
                                'function': {
                                    'name': 'get_current_weather',
                                    'arguments': '{"properties": {"location": "Beijing"}}'
                                    },
                                'id': '',
                                'type': 'function'}]
                    }
                }
            ]
        },
        "usage": {
            "input_tokens": 12,
            "output_tokens": 98,
            "total_tokens": 110
        }
    }
  • Response parameters

    status_code (integer)

    The response code. The status code 200 indicates that the request is successful. Other status codes indicate that the request failed. If the request failed, the corresponding error code and error message are returned for the code and message parameters.

    Note

    This parameter is returned only in Python. If a request failed in Java, an error is reported and the error code and error message are returned for the code and message parameters.

    request_id (string)

    The request ID.

    code (string)

    The error code that is returned if the request failed. If the request was successful, no value is returned for this parameter. This parameter is returned only in Python.

    message (string)

    The error message that is returned if the request failed. If the request was successful, no value is returned for this parameter. This parameter is returned only in Python.

    output (object)

    The returned results.

    output.text (string)

    The answer that is generated by the model.

    A value is returned if the prompt parameter is specified.

    output.finish_reason (string)

    The reason why the model stops generating the answer. Valid values:

    • null: The model is generating the answer.

    • stop: The content generated by the model triggers the stop conditions.

    • length: The content generated by the model is excessively long.

    • tool_calls: A tool is called during content generation.

    output.choices (array)

    The choices that are returned if the result_format parameter is set to message.

    output.choices[i].finish_reason (string)

    The reason why the model stops generating the answer. Valid values:

    • null: The model is generating the answer.

    • stop: The content generated by the model triggers the stop conditions.

    • length: The content generated by the model is excessively long.

    • tool_calls: A tool is called during content generation.

    output.choices[i].message (object)

    The message returned by the model.

    output.choices[i].message.role (string)

    The role of the model. Only assistant can be returned.

    output.choices[i].message.content (string)

    The content generated by the model.

    output.choices[i].message.tool_calls (object)

    The tool_calls parameter that is returned if the model needs to call a tool.

    A tool contains the type, function, and id parameters. For more information, see the Sample responses section of this topic. The following list describes the type and function parameters:

    • type: the type of the tool. The value of this parameter is a string. Only function may be returned.

    • function: the function of the tool, including the name and arguments parameters. The value of this parameter is an object.

      • name: the name of the tool to be called. In function call scenarios, the value indicates the name of the tool function to be called.

      • arguments: the request parameters of the tool to be passed during content generation. You can parse the value of the arguments parameter to a dictionary by using the json.loads method in Python.

    usage (object)

    The number of tokens that are consumed during the request.

    usage.input_tokens (integer)

    The number of tokens that are converted from the input text.

    For more information about how to calculate tokens, see the Convert strings into tokens and convert tokens back into strings section of the Billing topic.

    usage.output_tokens (integer)

    The number of tokens that are converted from the answer generated by the model.

    usage.total_tokens (integer)

    The total number of tokens that are converted from the input text and tokens that are converted from the answer generated by the model.
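
    For reference, the following sketch shows how the fields described above are typically read with the SDK for Python. It assumes that response is the object returned by Generation.call with the result_format parameter set to message.

    from http import HTTPStatus

    if response.status_code == HTTPStatus.OK:
        print(response.output.choices[0]['message']['content'])  # The generated answer.
        print(response.usage.total_tokens)                       # Total tokens consumed.
    else:
        print(response.code, response.message)                   # Error code and message.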

Use HTTP

Overview

You can call API operations over HTTP to use Qwen models. This eliminates the need to install the SDK. The HTTP and HTTP Server-Sent Events (SSE) protocols are supported. You can send requests over one of the protocols based on your business requirements.

Prerequisites

Alibaba Cloud Model Studio is activated and an API key is created. For more information, see Obtain an API key.

Request syntax

POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
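
The following is a minimal sketch of calling this endpoint with the Python requests package. It assumes that your API key is stored in an environment variable named DASHSCOPE_API_KEY.

import os
import requests

url = "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",  # Assumed variable name.
}
body = {
    "model": "qwen-turbo",
    "input": {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello, where is the museum nearby?"}
        ]
    },
    "parameters": {"result_format": "message"}
}
response = requests.post(url, headers=headers, json=body)
print(response.json())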

Request parameters

The following list describes the request parameters. Each parameter is listed with one of the following data types:

  • string

  • array

  • integer

  • float

  • boolean

  • object: a hash table of key-value pairs.

Header parameters

Content-Type (string)

The request type. Set the value to application/json.

"Content-Type":"application/json"

Accept (string)

Optional. Specifies whether to enable SSE. If you set this parameter to text/event-stream, SSE is enabled. By default, this parameter is left empty.

"Accept":"text/event-stream"

Authorization (string)

The API key.

"Authorization":"Bearer d1**2a"

X-DashScope-WorkSpace (string)

Optional. The name of the workspace to be used for this call. This parameter is required if the API key of a Resource Access Management (RAM) user is used. In addition, the specified workspace must contain the RAM user. This parameter is optional if the API key of an Alibaba Cloud account is used. If you specify a workspace, the corresponding identity in the workspace is used. If you leave this parameter empty, the identity of the Alibaba Cloud account is used.

ws_QTggmeAxxxxx

X-DashScope-SSE

string

Optional. Specifies whether to enable SSE. To enable SSE, set this parameter to enable or set the Accept parameter to text/event-stream.

"X-DashScope-SSE":"enable"

Body

model

string

The name of the Qwen model to be called for conversations.

Valid values: qwen-turbo, qwen-plus, qwen-max.

"model":"qwen-turbo"

input

object

The information that you enter for the model.

input.prompt

string

Optional. The prompt that you want the model to process. You can enter a prompt in Chinese or English. Specify either the input.prompt parameter or the input.messages parameter.

Note

A period (.) in the parameter name indicates that the information after the period is the attribute of the information before the period. In the API testing tool, you cannot set the key to input.prompt. You can specify this parameter in the following format: "input":{"prompt":"xxx"}.

"input":{"prompt":"Hello"}

input.history

array

This parameter will be discontinued. We recommend that you use the input.messages parameter. Optional. The conversation history between the user and the model. Each element in the array specifies a round of conversation. Specify each element in the following format: {"user":"User question","bot":"Answer generated by the model"}. Specify multiple rounds of conversations in chronological order.

"input":{"history":[{"user":"How is the weather today?",

"bot":"It is a nice day. Do you want to go out?"},

{"user":"What do you recommend?",

"bot":"I suggest that you go to the park. Spring is coming and the flowers are blooming. It is very beautiful."}]}

input.messages

array

Optional. The conversation history between the user and the model. Specify each element in the array in the following format: {"role": Role, "content": Content}. For example, if the role parameter is set to tool, specify each element in the array in the following format:

{"role":"tool","content":Content,"name":Function name of the tool}

Valid values of the role parameter: system, user, assistant, and tool.

The role and content parameters are required for each element if the input.messages parameter is specified.

"input":{

"messages":[

{

"role": "system",

"content": "You are a helpful assistant."

},

{

"role": "user",

"content": "Hello, where is the museum nearby?"

}]

}

input.messages.role

string

The role of the message author. Valid values: system, user, assistant, and tool.

input.messages.content

string

The content of the message.

input.messages.name

string

Optional. If the role parameter is set to tool, the message represents the result of a function call. In this case, name specifies the function name of the tool, which must be the same as the value of the tool_calls[i].function.name parameter that is returned in the previous response, and content specifies the output generated by the tool. For more information, see the sample code in the Function calling section of this topic.

This parameter is required if the input.messages.role parameter is set to tool.

parameters

object

Optional. The parameters used to control the content generated by the model.

parameters.result_format

string

Optional. The output format of the response. Default value: text. You can also set this parameter to message. For more information about the message format, see the Sample responses section of this topic. We recommend that you set the output format to message.

"parameters":{"result_format":"message"}

parameters.seed

integer

Optional. The random seed used during content generation. This parameter controls the randomness of the content generated by the model.

Valid values: 64-bit unsigned integers.

Default value: 1234.

If you specify the seed parameter, the model tries to generate the same or similar content each time it is called with the same input. However, the model cannot guarantee that the output is exactly the same for each call.

"parameters":{"seed":666}

parameters.max_tokens

integer

Optional. The maximum number of tokens that can be generated by the model.

  • If you use the qwen-turbo model, the maximum value and default value are 1500.

  • If you use the qwen-max model, the maximum value and default value are 2000.

"parameters":{"max_tokens":1500}

parameters.top_p

float

Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content.

Valid values: (0,1.0).

Default value: 0.8.

"parameters":{"top_p":0.7}

parameters.top_k

integer

Optional. The size of the candidate set for sampling. For example, if this parameter is set to 50, only the 50 tokens with the highest scores generated at a time are used as the candidate set for random sampling. A greater value introduces more randomness to the generated content.

By default, the top_k parameter is left empty.

If the top_k parameter is left empty or set to a value greater than 100, the top_k policy is disabled. In this case, only the top_p policy takes effect.

"parameters":{"top_k":50}

parameters.repetition_penalty

float

Optional. The penalty for repetition in the content generated by the model. A greater value indicates lower repetition. A value of 1.0 specifies that no repetition penalty is applied.

The value range of this parameter is not restricted.

Default value: 1.1.

"parameters":{"repetition_penalty":1.0}

parameters.temperature

float

Optional. The randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each word. A greater value indicates that more low-probability words are selected and the generated content is more diversified. A smaller value indicates that more high-probability words are selected and the generated content is more predictable.

Valid values: [0,2). We recommend that you do not set this parameter to 0 because a value of 0 is not meaningful for sampling.

Default value: 0.85.

"parameters":{"temperature":0.85}

parameters.stop

string/array

Optional. If you specify a string or token ID for this parameter, the model stops generating content when the string or token is about to be generated. The value of the stop parameter can be a string or an array.

  • String

    The model stops generating content when the content generated by the model is about to contain the specified stop word.

    For example, if you set the stop parameter to "Hello", the model stops generating content when the content generated by the model is about to contain "Hello".

  • Array

The elements in the array can be token IDs, strings, or arrays whose elements are token IDs. The model stops generating content when a token, string, or token sequence in the array is about to be generated. The tokenizer used in the following examples is from the qwen-turbo model.

    1. The elements in the array are token IDs.

    The ID of the token "hello" is 14990. The ID of the token "weather" is 9104. If the stop parameter is set to [14990,9104], the model stops generating content when "hello" or "weather" is about to be generated.

    2. The elements in the array are strings.

    If the stop parameter is set to ["hello","weather"], the model stops generating content when "hello" or "weather" is about to be generated.

    3. The elements in the array are arrays.

    The ID of the token "hello" is 14990. The ID of the token "there" is 1052. The ID of the token "thank" is 9702. The ID of the token "you" is 498. If the stop parameter is set to [[108386, 103924],[35946, 101243]], the model stops generating content when "hello there" or "thank you" is about to be generated.

Note

If the stop parameter is an array, the array cannot contain both token IDs and strings. For example, you cannot set the stop parameter to ["hello",9104].

"parameters":{"stop":["Hello","Weather"]}

parameters.enable_search

boolean

Optional. Specifies whether to enable the Internet search feature for reference during content generation. Valid values:

  • True: enables the Internet search feature. The model references the results queried over the Internet during content generation. However, the model determines whether to use the results queried over the Internet based on its internal logic.

  • False: disables the Internet search feature. This is the default value.

"parameters":{"enable_search":false}

parameters.incremental_output

boolean

Optional. Specifies whether to enable the incremental streaming output mode. If you set this parameter to True, the incremental streaming output mode is enabled and the subsequent returned content excludes the historical returned content. If you set this parameter to False, the incremental streaming output mode is disabled and the subsequent returned content includes the historical returned content.

Examples:

  • False:

    I

    I like

    I like apple

  • True:

    I

    like

    apple

This parameter takes effect only if SSE is enabled.

Default value: False.

Note

The incremental_output parameter cannot be used together with the tools parameter.

"parameters":{"incremental_output":false}

parameters.tools

array

Optional. A list of tools that can be called by the model. The model calls a tool from the tool list during each function call process. A tool in the tool list contains the following parameters:

  • type: the type of the tool. The value of this parameter is a string. Set the value to function.

  • function: the function of the tool, including name, description, and parameters. The value of the function parameter is an object.

    • name: the name of the tool function. The value of this parameter is a string and can contain letters, digits, underscores (_), and hyphens (-). It can be up to 64 characters in length.

    • description: the description of the tool function, which tells the model when and how to call the tool function. The value of this parameter is a string.

    • parameters: the request parameters of the tool function. The request parameters must be specified in a valid JSON schema. The value of this parameter is an object. For more information about JSON schemas, see Understanding JSON Schema. For more information about the request parameters, see the sample code in the Streaming output section of this topic. If the parameters parameter is left empty, the function does not contain request parameters.

To use the tools parameter, you must set the result_format parameter to message. During a function call process, you must specify the tools parameter regardless of whether you initiate a round of function call or submit the results of a tool function to the model. Supported models include qwen-turbo, qwen-plus, qwen-max, and qwen-max-longcontext. For an end-to-end example of this flow, see the sketch after the following sample value.

Note

The tools parameter cannot be used together with the incremental_output parameter.

"parameters":{"tools":[
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": [
                            "celsius",
                            "fahrenheit"
                        ]
                    }
                },
                "required": [
                    "location"
                ]
            }
        }
    }
]}
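
To make the function calling flow concrete, the following is a minimal sketch of one round trip over HTTP: the first request offers the tool, the model replies with tool_calls, and the second request submits the tool result with role tool. The get_current_weather implementation and its output are hypothetical; only the endpoint, headers, and message formats come from this topic. Replace $your-dashscope-api-key with your API key.

import json
import requests

url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer $your-dashscope-api-key'}
tools = [{
    'type': 'function',
    'function': {
        'name': 'get_current_weather',
        'description': 'Get the current weather in a given location',
        'parameters': {
            'type': 'object',
            'properties': {'location': {'type': 'string'}},
            'required': ['location']
        }
    }
}]
messages = [{'role': 'user', 'content': 'What is the weather like in Boston?'}]

def call_model():
    body = {'model': 'qwen-turbo',
            'input': {'messages': messages},
            # result_format must be message when the tools parameter is used.
            'parameters': {'result_format': 'message', 'tools': tools}}
    return requests.post(url, headers=headers, json=body).json()

# Round 1: the model decides whether to call the tool.
resp = call_model()
message = resp['output']['choices'][0]['message']
messages.append(message)
if message.get('tool_calls'):
    tool_call = message['tool_calls'][0]
    # arguments is a JSON string; its exact shape depends on the model output.
    arguments = json.loads(tool_call['function']['arguments'])
    # Hypothetical local implementation of get_current_weather.
    tool_output = '{"temperature": "15", "unit": "celsius"}'
    # Round 2: submit the tool output; name must match the returned function name.
    messages.append({'role': 'tool',
                     'name': tool_call['function']['name'],
                     'content': tool_output})
    resp = call_model()
    print(resp['output']['choices'][0]['message']['content'])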

Response parameters

Parameter

Type

Description

Example

output.text

string

The output content returned by the model. A value is returned for this parameter if the result_format parameter is set to text.

I suggest that you go to the Summer Palace.

output.finish_reason

string

The reason why the model stops generating the answer.

Valid values:

  • null: The model is generating the answer.

  • stop: The content generated by the model is about to contain a stop word.

  • length: The content generated by the model is excessively long.

A value is returned for this parameter if the result_format parameter is set to text.

stop

output.choices

array

The choices that are returned if the result_format parameter is set to message.

  • Example if no function call is initiated

{
    "choices": [
        {
            "finish_reason": "null",
            "message": {
                "role": "assistant",
                "content": "The cafes nearby are..."
            }
        }
    ]
}
  • Example if a function call is initiated

{
    "choices": [
        {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "function": {
                            "name": "get_current_weather",
                            "arguments": "{\"location\": \"Boston\", \"unit\": \"fahrenheit\"}"
                        },
                        "type": "function"
                    }
                ]
            }
        }
    ]
}

output.choices[x].finish_reason

string

The reason why the model stops generating the answer. Valid values:

  • null: The model is generating the answer.

  • stop: The content generated by the model is about to contain a stop word.

  • length: The content generated by the model is excessively long.

output.choices[x].message

object

Each message is displayed in the following format: {"role": Role, "content": Content}. Valid values of the role parameter: system, user, and assistant.

content indicates the output content returned by the model.

output.choices[x].message.role

string

The role of the model. Only assistant can be returned.

output.choices[x].message.content

string

The content generated by the model.

output.choices[x].message.tool_calls

object

The tool_calls parameter is returned if the model needs to call a tool. A tool contains the type, function, and id parameters. For more information, see the sample code in the Function calling section of this topic. The following list describes the type and function parameters:

  • type: the type of the tool. The value of this parameter is a string. Only function may be returned.

  • function: the function of the tool, including the name and arguments parameters. The value of this parameter is a dictionary.

    • name: the name of the tool to be called. In function call scenarios, the value indicates the name of the tool function to be called.

    • arguments: the request parameters of the tool to be passed during content generation. You can parse the value of the arguments parameter to a dictionary by using the json.loads method in Python.

usage

object

The number of tokens that are consumed during the model call.

usage.output_tokens

integer

The number of tokens that are converted from the answer generated by the model.

380

usage.input_tokens

integer

The number of tokens that are converted from the input content. If you set the enable_search parameter to true, the value of this parameter is greater than the number of tokens that are converted from the actual input content. This is because more tokens are converted from the information queried by the Internet search feature.

633

usage.total_tokens

integer

The total number of tokens that are converted from the input text and tokens that are converted from the answer generated by the model.

1013

request_id

string

The request ID.

7574ee8f-38a3-4b1e-9280-11c33ab46e51

Sample requests with SSE disabled

The following sample code provides examples on how to run the cURL command or a Python script to call a Qwen model when SSE is disabled:

Note

You must replace $your-dashscope-api-key in the sample code with your API key. In the cURL example, $DASHSCOPE_API_KEY is an environment variable that stores your API key; export it before you run the command.

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-turbo",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello, which park is closest to me?"
            }
        ]
    },
    "parameters": {
        "result_format": "message"
    }
}'
import requests

url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer $your-dashscope-api-key'}
body = {
    'model': 'qwen-turbo',
    'input': {
        'messages': [
            {
                'role': 'system',
                'content': 'You are a helpful assistant.'
            },
            {
                'role': 'user',
                'content': 'Hello, which park is closest to me?'
            }
        ]
    },
    'parameters': {
        'result_format': 'message'
    }
}

response = requests.post(url, headers=headers, json=body)
print(response.text)
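
Because the request sets the result_format parameter to message, the generated answer can be read from output.choices in the parsed response. A minimal sketch that continues the preceding script:

resp = response.json()
choice = resp['output']['choices'][0]
print(choice['message']['content'])   # The content generated by the model.
print(choice['finish_reason'])        # stop when generation completed normally.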

Sample responses with SSE disabled

Response when the result_format parameter is set to text

{
    "output":{
        "text":"If you are in China, I suggest that you go to the Summer Palace in Beijing... for walking and enjoying the scenery.",
        "finish_reason":"stop"    
    },
    "usage":{
        "output_tokens":380,
        "input_tokens":633
    },
    "request_id":"d89c06fb-46a1-47b6-acb9-bfb17f814969"
}

Response when the result_format parameter is set to message

{
    "output":{
        "choices":[
            {
                "finish_reason":"stop",
                "message":{
                    "role":"assistant",
                    "content":"If you are in China, I suggest that you go to the Summer Palace in Beijing... for walking and enjoying the scenery."
                }
            }
        ]
    },
    "usage":{
        "output_tokens":380,
        "input_tokens":633
    },
    "request_id":"d89c06fb-46a1-47b6-acb9-bfb17f814969"
}

Response when a tool is called

{
    "status_code": 200,
    "request_id": "a2b49cd7-ce21-98ff-87ac-b00cc590dc5e",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "tool_calls",
                "message": {
                    "role": "assistant",
                    "content": "",
                    "tool_calls":[
                        {
                            'function': {
                                'name': 'get_current_weather',
                                'arguments': '{"properties": {"location": "Beijing"}}'
                                },
                            'id': '',
                            'type': 'function'}]
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 12,
        "output_tokens": 98,
        "total_tokens": 110
    }
}

Sample requests with SSE enabled

The following sample code provides examples on how to run the cURL command or a Python script to call a Qwen model when SSE is enabled. Output content is returned incrementally as it is generated, in the same way as the streaming output mode.

Note

You must replace $your-dashscope-api-key in the sample code with your API key.

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header 'Authorization: Bearer $your-dashscope-api-key' \
--header 'Content-Type: application/json' \
--header 'X-DashScope-SSE: enable' \
--data '{
    "model": "qwen-turbo",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello, which park is closest to me?"
            }
        ]
    },
    "parameters": {
        "result_format": "message",
        "incremental_output":true
    }
}'
import requests

url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer $your-dashscope-api-key',
           'X-DashScope-SSE': 'enable'}
body = {
    'model': 'qwen-turbo',
    'input': {
        'messages': [
            {
                'role': 'system',
                'content': 'You are a helpful assistant.'
            },
            {
                'role': 'user',
                'content': 'Hello'
            }
        ]
    },
    'parameters': {
        'result_format': 'message',
        'incremental_output': True
    }
}

response = requests.post(url, headers=headers, json=body)
print(response.text)
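
The preceding script prints the raw SSE text only after the entire response has arrived. To process each event as it is produced, you can stream the response. The following is a minimal sketch, not part of the official samples; it assumes the requests library and the event format shown in the next section:

import json
import requests

url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer $your-dashscope-api-key',
           'X-DashScope-SSE': 'enable'}
body = {'model': 'qwen-turbo',
        'input': {'messages': [{'role': 'user', 'content': 'Hello'}]},
        'parameters': {'result_format': 'message', 'incremental_output': True}}

# stream=True makes requests yield the response as it arrives instead of
# buffering it. iter_lines() then produces one SSE line at a time.
with requests.post(url, headers=headers, json=body, stream=True) as response:
    for line in response.iter_lines(decode_unicode=True):
        if line and line.startswith('data:'):
            event = json.loads(line[len('data:'):])
            choice = event['output']['choices'][0]
            # With incremental_output enabled, each event carries only new text.
            print(choice['message']['content'], end='', flush=True)
            if choice['finish_reason'] == 'stop':
                break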

Sample responses with SSE enabled

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"Hello","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":28,"input_tokens":27,"output_tokens":1},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":",","role":"assistant"},"finish_reason":"null"}]},"usage":{"total_tokens":29,"input_tokens":27,"output_tokens":2},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}

... ... ... ...
... ... ... ...

id:12
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":"","role":"assistant"},"finish_reason":"stop"}]},"usage":{"total_tokens":91,"input_tokens":27,"output_tokens":64},"request_id":"c13ac6fc-9281-9ac4-9f1d-003a38c48e02"}

Sample error responses

If an error occurs during a request, the error code and error message are returned for the code and message parameters.

{
    "code":"InvalidApiKey",
    "message":"Invalid API-key provided.",
    "request_id":"fb53c4ec-1c12-4fc4-a580-cdb7c3261fc1"
}
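
When you integrate over HTTP, you can detect errors by checking the HTTP status code and the code and message parameters in the response body. A minimal sketch, assuming the requests library and the error format above:

import requests

url = 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Bearer $your-dashscope-api-key'}
body = {'model': 'qwen-turbo', 'input': {'prompt': 'Hello'}}

response = requests.post(url, headers=headers, json=body)
resp = response.json()
if response.status_code == 200:
    print(resp['output'])
else:
    # The code, message, and request_id parameters describe the error.
    print('code: %s, message: %s, request_id: %s'
          % (resp.get('code'), resp.get('message'), resp.get('request_id')))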