Alibaba Cloud Model Studio provides OpenAI-compatible interfaces for the Qwen models. If you have used the OpenAI SDK, other OpenAI-compatible SDKs such as langchain_openai, or HTTP requests to call OpenAI services, you only need to adjust a few parameters in your framework to use the models from Model Studio.
Before you begin
Before you use the interfaces, you must obtain the values of several required parameters. The following sections describe how to obtain the base_url, api_key, and model parameters.
base_url
The base_url parameter specifies the endpoint of the model services in Model Studio.
When you use the OpenAI SDK or other OpenAI-compatible SDKs, set base_url to:
https://dashscope-intl.aliyuncs.com/compatible-mode/v1
When you use HTTP requests, set the endpoint to:
POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
api_key
Activate Alibaba Cloud Model Studio and obtain an API key. For more information, see Obtain an API key.
We recommend that you set the API key as an environment variable to reduce the threat of API key leakage. For more information, see Set API key as an environment variable.
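For example, after you run export DASHSCOPE_API_KEY="sk-..." in a bash shell (the key value here is a placeholder), the following minimal Python sketch verifies that the variable is visible to your process:
import os

# Read the API key from the environment instead of hard-coding it
api_key = os.getenv("DASHSCOPE_API_KEY")
if api_key is None:
    raise RuntimeError("DASHSCOPE_API_KEY is not set. Export it in your shell first.")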
model
The model parameter specifies the name of the model that you want to access. The following table lists Qwen models that are supported by OpenAI-compatible interfaces. Set the model parameter to the name of the model you use.
Type | Name |
Qwen | qwen-turbo, qwen-plus, qwen-max |
Open source Qwen | qwen2-57b-a14b-instruct, qwen2-72b-instruct, qwen2-7b-instruct, qwen1.5-110b-chat, qwen1.5-72b-chat, qwen1.5-32b-chat, qwen1.5-14b-chat, qwen1.5-7b-chat |
Use OpenAI SDK
Prerequisites
Python is installed.
The latest version of the OpenAI SDK is installed.
# If the following command returns an error, replace pip with pip3. pip3 is the package installer for Python 3
pip install -U openai
Usage
The following examples show how to use the OpenAI SDK to access Qwen models in Model Studio.
Non-streaming output
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The base_url of the OpenAI-compatible mode in Model Studio
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}]
    )
    print(completion.model_dump_json())

if __name__ == '__main__':
    get_response()
Sample response:
{
    "id": "chatcmpl-xxx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I am a large language model created by Alibaba Cloud. I am called Qwen.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1716430652,
    "model": "qwen-plus",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 18,
        "prompt_tokens": 22,
        "total_tokens": 40
    }
}
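The sample code prints the full JSON of the response. In practice, you typically read individual fields from the returned object instead. A minimal sketch, assuming the completion object from the preceding example:
# Read individual fields instead of dumping the full JSON
print(completion.choices[0].message.content)  # the generated text only
print(completion.usage.total_tokens)          # the total token consumption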
Streaming output
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The base_url of the OpenAI-compatible mode in Model Studio
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}],
        stream=True,
        # Add the following setting to display token usage in the last line of the streaming output
        stream_options={"include_usage": True}
    )
    for chunk in completion:
        print(chunk.model_dump_json())

if __name__ == '__main__':
    get_response()
Sample response:
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"a large","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"language model","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"created by Alibaba Cloud","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":". I am called Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}
Function calling
In the following example, the OpenAI-compatible interface is used to implement function calling. The example calls a weather query tool and a time query tool over multiple rounds.
from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The base_url of the OpenAI-compatible mode in Model Studio
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Define a tool list. The model selects a tool based on the name and description of the tool
tools = [
    # Tool 1: obtain the current time
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "This tool can help you query the current time.",
            # No input parameters are required to obtain the current time. Therefore, parameters is set to an empty dictionary
            "parameters": {}
        }
    },
    # Tool 2: obtain the weather of a specific city
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "This tool can help you query the weather of a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    # A location is provided when you query the weather. Therefore, the parameter is named location
                    "location": {
                        "type": "string",
                        "description": "A city, county, or district, such as Beijing, Hangzhou, or Yuhang."
                    }
                },
                "required": [
                    "location"
                ]
            }
        }
    }
]

# Simulate the weather query tool. Sample response: "Beijing is rainy today."
def get_current_weather(location):
    return f"{location} is rainy today. "

# Simulate the time query tool. Sample response: "Current time: 2024-04-15 17:15:18."
def get_current_time():
    # Obtain the current date and time
    current_datetime = datetime.now()
    # Format the current date and time
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    # Return the formatted current date and time
    return f"Current time: {formatted_time}."

# Encapsulate the response function of the model
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        tools=tools
    )
    return completion.model_dump()

def call_with_messages():
    print('\n')
    messages = [
        {
            "content": input('Input: '),  # Sample questions: "What time is it now?" "What time will it be in an hour?" "What is the weather like in Beijing?"
            "role": "user"
        }
    ]
    print("-" * 60)
    # Call the model in the first round
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response['choices'][0]['message']
    print(f"\nOutput of the model in round {i}: {first_response}\n")
    if assistant_output['content'] is None:
        assistant_output['content'] = ""
    messages.append(assistant_output)
    # If the model does not need to call a tool, a response is returned directly
    if assistant_output['tool_calls'] is None:  # If the model determines that no tool is needed, the response is printed directly and no further round is called
        print(f"Without the need to call a tool, I can answer directly: {assistant_output['content']}")
        return
    # If the model needs to call tools, the model is called for multiple rounds until it determines that no tool is needed
    while assistant_output['tool_calls'] is not None:
        # If the model determines that the weather query tool is needed, run the weather query tool
        if assistant_output['tool_calls'][0]['function']['name'] == 'get_current_weather':
            tool_info = {"name": "get_current_weather", "role": "tool"}
            # Extract the location from the arguments generated by the model
            location = json.loads(assistant_output['tool_calls'][0]['function']['arguments'])['location']
            tool_info['content'] = get_current_weather(location)
        # If the model determines that the time query tool is needed, run the time query tool
        elif assistant_output['tool_calls'][0]['function']['name'] == 'get_current_time':
            tool_info = {"name": "get_current_time", "role": "tool"}
            tool_info['content'] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-" * 60)
        messages.append(tool_info)
        assistant_output = get_response(messages)['choices'][0]['message']
        if assistant_output['content'] is None:
            assistant_output['content'] = ""
        messages.append(assistant_output)
        i += 1
        print(f"Output of the model in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == '__main__':
    call_with_messages()
Enter "What's the weather like in Singapore?". The program calls the weather query tool and then returns the final answer of the model.
Request parameters
The request parameters are aligned with those of the OpenAI interface. The following table describes the parameters.
Parameter | Type | Description |
model | string | Specifies the model name. For a list of supported models, see model. |
messages | array | The conversation history between the user and the model. Each element in the array is in the {"role": <role>, "content": <content>} format. Valid values for role: system, user, and assistant. The system role is supported only in the first element of the array (messages[0]). |
top_p | float | Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content. Valid values: (0,1.0). |
temperature | float | Optional. Controls the randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each token. Valid values: [0, 2). We recommend that you do not set this parameter to 0 because a value of 0 is meaningless. |
presence_penalty | float | Optional. Controls the repetition of words in the generated content. A greater value reduces repetition. Valid values: [-2.0, 2.0]. Note: This parameter is supported only by Qwen commercial models and open source Qwen models of version 1.5 or later. |
max_tokens | integer | Optional. The maximum number of tokens that can be generated by the model. Different models have different upper limits. |
seed | integer | Optional. The random seed used during content generation. This parameter controls the randomness of the generated content; passing the same seed makes the output more reproducible across requests. Valid values: 64-bit unsigned integers. |
stream | boolean | Optional. Specifies whether to enable streaming output mode. In streaming output mode, the model returns a generator. You need to use an iterative loop to fetch the results from the generator and incrementally display the text. Default value: False. |
stop | string or array | Optional. If you specify a string or a token ID for this parameter, the model stops generating content when the specified string or token is about to be generated. The value of the stop parameter can be a string or an array. |
tools | array | Optional. The list of tools that the model can call. During each function call process, the model selects one tool from the list. Each tool in the list contains the following fields: type, which must be set to function, and function, an object that specifies the name, description, and parameters of the tool, as shown in the preceding function calling example. During a function call process, you must specify the tools parameter both when you initiate a round of function calling and when you return the result of a tool function to the model. qwen-turbo, qwen-plus, and qwen-max support this parameter. Note: The tools parameter cannot be used when the stream parameter is set to True. |
stream_options | object | Optional. Specifies whether to display the number of tokens used in streaming output mode. This parameter takes effect only when the stream parameter is set to True. To count the number of tokens used in streaming output mode, set this parameter to {"include_usage": True}. |
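The following sketch combines several of the optional parameters described above in one request. The parameter values are illustrative assumptions, and client is the OpenAI client configured earlier in this topic:
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Write a two-line poem about the sea."}],
    temperature=0.7,   # moderate randomness
    top_p=0.8,         # nucleus sampling threshold
    max_tokens=256,    # cap the length of the generated content
    seed=1234,         # improve reproducibility across requests
    stop=["\n\n"],     # stop before a blank line is generated
)
print(completion.choices[0].message.content)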
Response parameters
Parameter | Type | Description |
id | string | The request ID. |
model | string | The name of the model that is called. |
system_fingerprint | string | The configuration version of the model that is called. This parameter is not supported, and an empty string ("") is returned. |
choices | array | Details of the generated content. |
choices[i].finish_reason | string | The reason why the model stops generating content. Valid values: stop, which indicates that the generation ends naturally or a stop condition is met; length, which indicates that the generation stops because the maximum output length is reached; and tool_calls, which indicates that the model calls a tool. In streaming output mode, null is returned until the generation is finished. |
choices[i].message | object | The message returned by the model. |
choices[i].message.role | string | The role of the model. Only assistant may be returned. |
choices[i].message.content | string | The content generated by the model. |
choices[i].index | integer | The sequence number of the content. Default value: 0. |
created | integer | The timestamp when the content was generated. Unit: seconds. |
usage | object | The number of tokens that are consumed during the request. |
usage.prompt_tokens | integer | The number of tokens that are converted from the input text. |
usage.completion_tokens | integer | The number of tokens that are converted from the response generated by the model. |
usage.total_tokens | integer | The sum of usage.prompt_tokens and usage.completion_tokens. |
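As an illustration of how these fields are typically consumed, the following sketch branches on finish_reason. It assumes a completion object returned by the OpenAI SDK as in the earlier examples:
choice = completion.choices[0]
if choice.finish_reason == "length":
    # The output was truncated because the maximum output length was reached
    print("Truncated output:", choice.message.content)
elif choice.finish_reason == "tool_calls":
    # The model requested a tool call; see the preceding function calling example
    print("Tool requested:", choice.message.tool_calls[0].function.name)
else:
    print(choice.message.content)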
Use langchain_openai SDK
Prerequisites
Python is installed.
The langchain_openai SDK is installed. To install the langchain_openai SDK, run the following command:
# If the following command returns an error, replace pip with pip3. pip3 is the package installer for Python 3
pip install -U langchain_openai
Usage
The following examples show how to use the langchain_openai SDK to access Qwen models in Model Studio.
Non-streaming output
The following example uses the invoke method to implement non-streaming output:
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The base_url of the OpenAI-compatible mode in Model Studio
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        model="qwen-plus"
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ]
    response = llm.invoke(messages)
    print(response.json(ensure_ascii=False))

if __name__ == "__main__":
    get_response()
Sample response:
{
    "content": "I am a large language model created by Alibaba Cloud. I am called Qwen.",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 16,
            "prompt_tokens": 22,
            "total_tokens": 38
        },
        "model_name": "qwen-plus",
        "system_fingerprint": "",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-xxx",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": []
}
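Instead of serializing the whole message, you can read its fields directly. A minimal sketch, assuming the response object (an AIMessage) from the preceding example:
print(response.content)                               # the generated text
print(response.response_metadata.get("token_usage"))  # token consumption, if present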
Streaming output
The following example uses the stream method to implement streaming output. You do not need to configure the stream parameter.
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        model="qwen-plus",
        # Add the following setting to display token usage in the last line of the streaming output
        stream_options={"include_usage": True}
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.json(ensure_ascii=False))

if __name__ == "__main__":
    get_response()
Sample response:
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "a large", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "created", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "by Alibaba Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ". I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "called Qwen.", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}
For more information about the request parameters, see Request parameters.
Use HTTP
You can use HTTP interfaces that have the same structure as those of OpenAI to obtain responses from Model Studio.
Specify endpoint
POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Sample request
The following example calls the API by using a cURL command.
If you have not configured the environment variable, replace $DASHSCOPE_API_KEY with your API key.
Non-streaming output
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ]
}'
Sample response:
{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I am a large language model created by Alibaba Cloud. I am called Qwen."
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 16,
        "total_tokens": 27
    },
    "created": 1715252778,
    "system_fingerprint": "",
    "model": "qwen-plus",
    "id": "chatcmpl-xxx"
}
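If you prefer not to use cURL, the same HTTP call can be made from Python. The following sketch uses the third-party requests library; the payload mirrors the preceding cURL example:
import os
import requests

url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
    "Content-Type": "application/json",
}
payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
}
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])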
Streaming output
If you want to use the streaming output mode, set the stream parameter to true in the request body.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream": true
}'
Sample response:
data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"created by Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":". I am called Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: [DONE]
For more information about the request parameters, see Request parameters.
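The streaming response is delivered as server-sent events, as shown above: each line starts with data: and the stream ends with data: [DONE]. The following sketch parses this format in Python by using the third-party requests library; the request payload is an illustrative assumption that mirrors the preceding cURL example:
import json
import os
import requests

url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
    "Content-Type": "application/json",
}
payload = {
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "stream": True,
}
with requests.post(url, headers=headers, json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # the server signals the end of the stream
        chunk = json.loads(data)
        if chunk["choices"]:
            # Print the incremental text without a trailing newline
            print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
print()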
Sample error response
If a request fails, an error code and an error message are returned in the following format.
{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}
Status codes
Status code | Description |
400 - Invalid Request Error | Request error. The error message shows the details. |
401 - Incorrect API key provided | The API key is incorrect. |
429 - Rate limit reached for requests | The number of queries per second or minute exceeds the limit. |
429 - You exceeded your current quota, please check your plan and billing details | You have exceeded the quota or your payment is overdue. |
500 - The server had an error while processing your request | An error occurred on the server. |
503 - The engine is currently overloaded, please try again later | The server is overloaded. You can try again later. |
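When you call the service through the OpenAI SDK, these HTTP status codes surface as typed exceptions of the OpenAI Python SDK. A minimal sketch of handling them, mirroring the client configuration used earlier in this topic:
import os
import openai

client = openai.OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
try:
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=[{"role": "user", "content": "Who are you?"}],
    )
    print(completion.choices[0].message.content)
except openai.AuthenticationError:
    # 401: the API key is missing or incorrect
    print("Check your DASHSCOPE_API_KEY.")
except openai.RateLimitError:
    # 429: the rate limit or quota is exceeded; retry later or check billing
    print("Rate limited. Try again later.")
except openai.APIStatusError as e:
    # Other non-2xx responses, such as 400, 500, or 503
    print(f"Request failed with status {e.status_code}.")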