Alibaba Cloud Model Studio's Qwen models support OpenAI-compatible interfaces. You can use your existing OpenAI code with Model Studio by changing only the API key, base_url, and model name.
Required information
base_url
The base_url is the service endpoint for the model service. When using the OpenAI-compatible interface to access Alibaba Cloud Model Studio, you must configure the base_url.
- When using the OpenAI SDK or another OpenAI-compatible SDK, set the base_url as follows:
  - Singapore: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  - US (Virginia): https://dashscope-us.aliyuncs.com/compatible-mode/v1
  - China (Beijing): https://dashscope.aliyuncs.com/compatible-mode/v1
- When making HTTP requests directly, use the full endpoint:
  - Singapore: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
  - US (Virginia): POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
  - China (Beijing): POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
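As a quick reference, the region endpoints above can be captured in a small helper. This is an illustrative sketch; the region labels are the ones used in this document, not values recognized by the API.

```python
# Map the region labels used in this document to their OpenAI-compatible base_url.
# The URLs come from the list above; the helper itself is illustrative.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}

def base_url_for(region: str) -> str:
    """Return the base_url for a region label, for example 'singapore'."""
    return BASE_URLS[region.lower()]
```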
Supported models
The following table lists the Qwen series models currently supported by the OpenAI-compatible interface.
Global
- Commercial
  - Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot models
  - Qwen-Plus series: qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshot models
  - Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshot models
- Open source
  - qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b
International
- Commercial
  - Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot models; qwen-max, qwen-max-latest, qwen-max-2025-01-25 and later snapshot models
  - Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshot models; qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshot models
  - Qwen-Flash series: qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshot models; qwen-flash, qwen-flash-2025-07-28
  - Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01 and later snapshot models
  - Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshot models; qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshot models
  - QwQ series: qwq-plus
- Open source
  - qwen3.5-397b-a17b, qwen3.5-120b-a10b, qwen3.5-27b, qwen3.5-35b-a3b
  - qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b
  - qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct
US
- Commercial
  - Qwen-Plus series: qwen-plus-us, qwen-plus-2025-12-01-us and later snapshot models
  - Qwen-Flash series: qwen-flash-us, qwen-flash-2025-07-28-us
Chinese Mainland
- Commercial
  - Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot models; qwen-max, qwen-max-latest, qwen-max-2024-09-19 and later snapshot models
  - Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshot models; qwen-plus, qwen-plus-latest, qwen-plus-2024-12-20 and later snapshot models
  - Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshot models
  - Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshot models
  - Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshot models; qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshot models; qwen-coder-plus, qwen-coder-plus-latest, qwen-coder-plus-2024-11-06; qwen-coder-turbo, qwen-coder-turbo-latest, qwen-coder-turbo-2024-09-19
  - QwQ series: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05
  - Qwen-Math models: qwen-math-plus, qwen-math-plus-latest, qwen-math-plus-2024-08-16 and later snapshot models; qwen-math-turbo, qwen-math-turbo-latest, qwen-math-turbo-2024-09-19
- Open source
  - qwen3.5-397b-a17b, qwen3.5-120b-a10b, qwen3.5-27b, qwen3.5-35b-a3b
  - qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b
  - qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, qwen2.5-0.5b-instruct
Use OpenAI SDK
Prerequisites
- A Python environment must be installed on your computer.
- Install the latest version of the OpenAI SDK:

  # If the following command fails, replace pip with pip3
  pip install -U openai

- Activate Alibaba Cloud Model Studio and get an API key. For more information, see Get an API key.
- We recommend configuring the API key as an environment variable to reduce the risk of leaks. For more information, see Configure an API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leaks.
- Select a model to use. For more information, see Supported models.
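On Linux or macOS, the environment variable can be set for the current shell session as follows (the key shown is a placeholder; add the line to your shell profile to make it persistent):

```shell
# Set the API key for the current shell session (placeholder value shown).
export DASHSCOPE_API_KEY="sk-xxx"
```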
Usage
The following examples show how to access Qwen models in Model Studio using the OpenAI SDK.
Non-streaming call example
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # If you have not configured the environment variable, replace the next line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ],
    )
    print(completion.model_dump_json())

if __name__ == "__main__":
    get_response()
Run the code to get the following result:
{
    "id": "chatcmpl-xxx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I am a large-scale pre-trained model from Alibaba Cloud. My name is Qwen.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1716430652,
    "model": "qwen-plus",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 18,
        "prompt_tokens": 22,
        "total_tokens": 40
    }
}
Streaming call example
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # If you have not configured the environment variable, replace the next line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
        ],
        stream=True,
        # The following setting reports token usage in the last chunk of the streaming output.
        stream_options={"include_usage": True},
    )
    for chunk in completion:
        print(chunk.model_dump_json())

if __name__ == "__main__":
    get_response()
Run the code to get the following result:
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" a large","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" language","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" model from Alibaba Cloud","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":", and my name is Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}
Function call example
This example shows how to perform function calls using weather and time query tools through the OpenAI-compatible interface. The sample code supports multiple rounds of tool calls.
from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # If you have not configured the environment variable, replace the next line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base_url for the Singapore region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Define the list of tools. The model refers to the name and description of each tool when deciding which one to use.
tools = [
    # Tool 1: Get the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Useful when you want to know the current time.",
            # Getting the current time requires no input parameters, so parameters is an empty dictionary.
            "parameters": {},
        },
    },
    # Tool 2: Get the weather for a specified city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Useful when you want to query the weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    # A location must be provided to query the weather, so the parameter is named location.
                    "location": {
                        "type": "string",
                        "description": "A city or district, such as Beijing, Hangzhou, or Yuhang.",
                    }
                },
                # required belongs inside the parameters schema.
                "required": ["location"],
            },
        },
    },
]

# Simulated weather query tool. Example result: "It is rainy in Beijing today."
def get_current_weather(location):
    return f"It is rainy in {location} today."

# Tool that queries the current time. Example result: "Current time: 2024-04-15 17:15:18."
def get_current_time():
    # Get and format the current date and time.
    formatted_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"Current time: {formatted_time}."

# Wrap the model call.
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=messages,
        tools=tools,
    )
    return completion.model_dump()

def call_with_messages():
    print("\n")
    messages = [
        {
            # Example questions: "What time is it now?" "What time will it be in an hour?" "What is the weather like in Beijing?"
            "content": input("Please enter: "),
            "role": "user",
        }
    ]
    print("-" * 60)
    # First round of model calls.
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response["choices"][0]["message"]
    print(f"\nLLM output in round {i}: {first_response}\n")
    if assistant_output["content"] is None:
        assistant_output["content"] = ""
    messages.append(assistant_output)
    # If the model determines that no tool call is needed, print the assistant's reply directly without a second model call.
    if assistant_output["tool_calls"] is None:
        print(f"No tool call is needed. I can reply directly: {assistant_output['content']}")
        return
    # If a tool call is needed, keep calling the model until it determines that no further tool call is needed.
    while assistant_output["tool_calls"] is not None:
        # If the model chose the weather query tool, run it.
        if assistant_output["tool_calls"][0]["function"]["name"] == "get_current_weather":
            tool_info = {"name": "get_current_weather", "role": "tool"}
            # Extract the location argument.
            location = json.loads(assistant_output["tool_calls"][0]["function"]["arguments"])["location"]
            tool_info["content"] = get_current_weather(location)
        # If the model chose the time query tool, run it.
        elif assistant_output["tool_calls"][0]["function"]["name"] == "get_current_time":
            tool_info = {"name": "get_current_time", "role": "tool"}
            tool_info["content"] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-" * 60)
        messages.append(tool_info)
        assistant_output = get_response(messages)["choices"][0]["message"]
        if assistant_output["content"] is None:
            assistant_output["content"] = ""
        messages.append(assistant_output)
        i += 1
        print(f"LLM output in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == "__main__":
    call_with_messages()
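The if/elif routing in the loop above can also be written as a dispatch table, which scales better as tools are added. A minimal offline sketch reusing the same two tool names (the local implementations here are stand-ins for illustration):

```python
import json

# Stand-in local implementations of the two tools used in this topic.
def get_current_weather(location):
    return f"It is rainy in {location} today."

def get_current_time():
    return "Current time: 2024-04-15 17:15:18."

# Map tool names (as the model emits them) to local callables taking parsed arguments.
TOOL_REGISTRY = {
    "get_current_weather": lambda args: get_current_weather(args["location"]),
    "get_current_time": lambda args: get_current_time(),
}

def run_tool_call(tool_call):
    """Execute one tool_call dict and return a tool message for the conversation."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"] or "{}")
    return {"role": "tool", "name": name, "content": TOOL_REGISTRY[name](args)}

print(run_tool_call({"function": {"name": "get_current_weather",
                                  "arguments": '{"location": "Beijing"}'}})["content"])
# It is rainy in Beijing today.
```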
When you enter What is the weather in Hangzhou and Beijing? What time is it now?, the program runs the weather and time tools over multiple rounds and then prints the final answer.
Input parameters
Input parameters are aligned with those of the OpenAI interface. Currently supported parameters include:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | - | Specifies the model to use. For a list of available models, see Supported models. |
| messages | array | - | The conversation history between the user and the model. Each element is a message object in the format {"role": "user", "content": "..."}. |
| top_p (optional) | float | - | The probability threshold for nucleus sampling during generation. For example, a value of 0.8 means that only the smallest set of most likely tokens with a cumulative probability of 0.8 or higher is considered. The value must be in the range (0, 1.0). A higher value increases randomness, and a lower value increases determinism. |
| temperature (optional) | float | - | Controls the randomness and diversity of the model's responses by smoothing the probability distribution over candidate tokens during generation. A higher value flattens the distribution, allowing more low-probability tokens to be selected and making the output more diverse. A lower value sharpens the distribution, making high-probability tokens more likely and the output more deterministic. The value must be in the range [0, 2). We do not recommend setting this parameter to 0. |
| presence_penalty (optional) | float | - | Controls the repetition of tokens in the entire generated sequence. A higher value reduces repetition. The value must be in the range [-2.0, 2.0]. Note: this parameter is supported only for commercial Qwen models and open source models from qwen1.5 onwards. |
| n (optional) | integer | 1 | The number of responses to generate. A larger value of n does not increase input token consumption but does increase output token consumption. Currently, this is supported only for the qwen-plus model, and n is fixed to 1 when the tools parameter is passed. |
| max_tokens (optional) | integer | - | The maximum number of tokens the model can generate. For example, if the model's maximum output length is 2,000 tokens, you can set this to 1,000 to prevent excessively long outputs. Different models have different output limits. For more information, see the model list. |
| seed (optional) | integer | - | The random seed used for generation, which controls the randomness of the model's output. The seed must be a 64-bit unsigned integer. |
| stream (optional) | boolean | False | Controls whether to use streaming output. When stream is enabled, the API returns a generator that you must iterate through to get the results. Each item is an incremental piece of the generated text. |
| stop (optional) | string or array | None | Stops generation automatically when the model is about to emit the specified string or token ID, giving precise control over where output ends. The value can be a string or an array. |
| tools (optional) | array | None | A library of tools the model can call; the model selects one tool from the library per function call. Each element of the array is an object with type and function fields. In a function call flow, you must set tools both for the round that initiates the function call and for the round that submits the tool's execution result to the model. This parameter is currently supported by the qwen-turbo, qwen-plus, and qwen-max models. Note: tools cannot be used together with stream=True. |
| stream_options (optional) | object | None | Configures whether to report the number of tokens used during streaming output. It is effective only when stream is set to True. To count tokens in streaming mode, set this parameter to {"include_usage": true}. |
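Putting several of these parameters together, a chat completion request body has the following JSON shape. The values here are illustrative choices, not recommendations:

```python
import json

# Illustrative request body combining parameters from the table above.
payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
    "top_p": 0.8,          # nucleus sampling threshold, range (0, 1.0)
    "temperature": 0.7,    # randomness, range [0, 2)
    "max_tokens": 512,     # cap on generated tokens
    "stream": True,
    "stream_options": {"include_usage": True},  # token usage in the last chunk
}
print(json.dumps(payload, indent=2))
```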
Output parameters
| Response parameter | Data type | Description | Notes |
| --- | --- | --- | --- |
| id | string | The system-generated ID for the call. | None |
| model | string | The name of the model used for the call. | None |
| system_fingerprint | string | The configuration version used by the model at runtime. This is not currently supported and returns an empty string "". | None |
| choices | array | Details of the content generated by the model. | None |
| choices[i].finish_reason | string | One of three cases: null while generation is in progress, stop when generation ends naturally or because of a stop condition, or length when generation stops because the output length limit is reached. | None |
| choices[i].message | object | The message output by the model. | |
| choices[i].message.role | string | The role of the model. This is fixed to assistant. | |
| choices[i].message.content | string | The text generated by the model. | |
| choices[i].index | integer | The sequence number of the generated result. The default is 0. | |
| created | integer | The UNIX timestamp (in seconds) when the result was generated. | None |
| usage | object | Billing information, which indicates the token consumption for the request. | None |
| usage.prompt_tokens | integer | The number of tokens in the input text. | None |
| usage.completion_tokens | integer | The number of tokens in the generated response. | None |
| usage.total_tokens | integer | The sum of usage.prompt_tokens and usage.completion_tokens. | None |
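Reading the fields in the table from a non-streaming response is straightforward; a sketch using a dict shaped like the sample response earlier in this topic:

```python
# A dict shaped like the non-streaming sample response shown earlier in this topic.
response = {
    "id": "chatcmpl-xxx",
    "model": "qwen-plus",
    "choices": [
        {"index": 0, "finish_reason": "stop",
         "message": {"role": "assistant", "content": "I am Qwen."}}
    ],
    "usage": {"prompt_tokens": 22, "completion_tokens": 18, "total_tokens": 40},
}

# Extract the generated text and the billing counters.
text = response["choices"][0]["message"]["content"]
usage = response["usage"]
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(text)  # I am Qwen.
```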
Use langchain_openai SDK
Prerequisites
- A Python environment must be installed on your computer.
- Run the following command to install the langchain_openai SDK:

  # If the following command fails, replace pip with pip3
  pip install -U langchain_openai

- Activate Alibaba Cloud Model Studio and get an API key. For more information, see Get an API key.
- We recommend configuring the API key as an environment variable to reduce the risk of leaks. For more information, see Configure an API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leaks.
- Select a model to use. For more information, see Supported models.
Usage
The following examples show how to access Qwen models in Alibaba Cloud Model Studio using the langchain_openai SDK.
Non-streaming output
Use the invoke method for non-streaming output. Refer to the following sample code:
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # If you have not configured the environment variable, replace the next line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # This is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        model="qwen-plus",
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
    response = llm.invoke(messages)
    print(response.json())

if __name__ == "__main__":
    get_response()
Run the code to get the following result:
{
    "content": "I am a large language model from Alibaba Cloud. My name is Qwen.",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 16,
            "prompt_tokens": 22,
            "total_tokens": 38
        },
        "model_name": "qwen-plus",
        "system_fingerprint": "",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-xxx",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": []
}
Streaming output
Use the stream method for streaming output. You do not need to set the stream parameter.
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # If you have not configured the environment variable, replace the next line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # This is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        model="qwen-plus",
        # Report token usage in the final streamed chunk.
        stream_usage=True,
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.model_dump_json())

if __name__ == "__main__":
    get_response()
Run the code to get the following result:
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " from", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Alibaba", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "'s large language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ", and my name is Qwen", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ".", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}
For information about input parameter configuration, see Input parameters. The relevant parameters are defined in the ChatOpenAI object.
Use HTTP interface
You can call Alibaba Cloud Model Studio through the HTTP interface to get responses with the same structure as those from the OpenAI service.
Prerequisites
-
Activate Alibaba Cloud Model Studio and get an API key. For more information, see Get an API key.
-
We recommend configuring the API key as an environment variable to reduce the risk of leaks. For more information, see Configure an API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leaks.
Submit an API call
Singapore: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
US (Virginia): POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
China (Beijing): POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
Request example
The following example shows a script that uses the cURL command to call the API.
If you have not configured the API key as an environment variable, replace $DASHSCOPE_API_KEY with your actual API key.
Non-streaming output
# This is the base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ]
}'
Run the command to get the following result:
{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I am a large language model from Alibaba Cloud. My name is Qwen."
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 16,
        "total_tokens": 27
    },
    "created": 1715252778,
    "system_fingerprint": "",
    "model": "qwen-plus",
    "id": "chatcmpl-xxx"
}
Streaming output
To enable streaming output, set the stream parameter to true in the request body.
# This is the base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ],
    "stream": true
}'
Run the command to get the following result:
data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":" a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":" language"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":" model from Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":", and my name is Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: [DONE]
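Each streamed line is a server-sent event: the JSON payload follows a data: prefix, and the literal data: [DONE] marks the end of the stream. A minimal offline parser for lines in that shape:

```python
import json

def parse_sse_lines(lines):
    """Join delta.content from 'data: {...}' lines, stopping at 'data: [DONE]'."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and other non-data lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk["choices"]:
            parts.append(chunk["choices"][0]["delta"].get("content") or "")
    return "".join(parts)

# Lines shaped like the streaming output above.
sample = [
    'data: {"choices":[{"delta":{"content":"I am"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" Qwen."},"index":0}]}',
    "data: [DONE]",
]
print(parse_sse_lines(sample))  # I am Qwen.
```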
For more information about input parameters, see Input parameters.
Error response example
If an error occurs during a request, the response includes a code and a message that indicate the cause.
{
    "error": {
        "message": "Incorrect API key provided.",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}
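A client can branch on the code field of this error object. A minimal sketch that matches the error shape shown above:

```python
# Inspect a response body for the error shape shown above.
def describe_error(body):
    """Return (code, message) if the body is an error response, else None."""
    err = body.get("error")
    if err is None:
        return None
    return err.get("code"), err.get("message")

error_body = {
    "error": {
        "message": "Incorrect API key provided.",
        "type": "invalid_request_error",
        "param": None,
        "code": "invalid_api_key",
    }
}
print(describe_error(error_body))  # ('invalid_api_key', 'Incorrect API key provided.')
```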
Status codes
| Error code | Description |
| --- | --- |
| 400 - Invalid request error | The request is invalid. For more information, see the error message. |
| 401 - Incorrect API key provided | The API key is incorrect. |
| 429 - Rate limit reached for requests | The rate limit, such as queries per second (QPS) or queries per minute (QPM), is exceeded. |
| 429 - You exceeded your current quota, please check your plan and billing details | Your quota is exceeded or you have an overdue payment. |
| 500 - The server had an error while processing your request | A server-side error occurred. |
| 503 - The engine is currently overloaded, please try again later | The server is overloaded. You can retry the request. |
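Transient 429 and 503 responses are commonly handled on the client side with retries and exponential backoff. A minimal sketch of the delay schedule (the base delay, cap, and attempt limit are illustrative choices, not service recommendations):

```python
# Exponential backoff for retryable status codes (429, 503): the delay doubles
# with each attempt up to a cap, and retries stop after a fixed attempt limit.
RETRYABLE = {429, 503}

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Delay in seconds before retry number `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))

def should_retry(status_code, attempt, max_attempts=5):
    return status_code in RETRYABLE and attempt < max_attempts

print([backoff_delay(a) for a in range(6)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```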