Alibaba Cloud Model Studio provides OpenAI-compatible interfaces for the Qwen models. If you have used the OpenAI SDK, other OpenAI-compatible SDKs such as langchain_openai, or HTTP requests to call OpenAI services, you only need to adjust a few parameters in your framework to use the models from Model Studio.
Before you begin
Before you use the interfaces, you must obtain the values of several required parameters. The following sections describe how to obtain the base_url, api_key, and model parameters.
base_url
The base_url parameter specifies the endpoint of the model services in Model Studio.
When you use the OpenAI SDK or other OpenAI-compatible SDKs, set base_url to:
https://dashscope-intl.aliyuncs.com/compatible-mode/v1
When you use HTTP requests, set the endpoint to:
POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
api_key
Activate Alibaba Cloud Model Studio and obtain an API key. For more information, see Obtain an API key.
We recommend that you set the API key as an environment variable to reduce the threat of API key leakage. For more information, see Set API key as an environment variable.
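For example, after you run export DASHSCOPE_API_KEY="sk-..." in a bash shell (the key value here is a placeholder), the following minimal Python sketch verifies that the variable is visible to your process:
import os

# Read the API key from the environment instead of hard-coding it
api_key = os.getenv("DASHSCOPE_API_KEY")
if api_key is None:
    raise RuntimeError("DASHSCOPE_API_KEY is not set. Export it in your shell first.")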
model
The model parameter specifies the name of the model that you want to access. The following table lists Qwen models that are supported by OpenAI-compatible interfaces. Set the model parameter to the name of the model you use.
Type | Name |
Qwen | qwen-turbo, qwen-plus, qwen-max |
Open source Qwen | qwen2-57b-a14b-instruct, qwen2-72b-instruct, qwen2-7b-instruct, qwen1.5-110b-chat, qwen1.5-72b-chat, qwen1.5-32b-chat, qwen1.5-14b-chat, qwen1.5-7b-chat |
Use OpenAI SDK
Prerequisites
Python is installed.
The latest version of the OpenAI SDK is installed.
# If the following command returns an error, replace pip with pip3. pip3 is the package installer for Python 3
pip install -U openai
Usage
The following examples show how to use the OpenAI SDK to access Qwen models in Model Studio.
Non-streaming output
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The base_url of the OpenAI-compatible mode in Model Studio
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}]
    )
    print(completion.model_dump_json())

if __name__ == '__main__':
    get_response()
Sample response:
{
    "id": "chatcmpl-xxx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I am a large language model created by Alibaba Cloud. I am called Qwen.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1716430652,
    "model": "qwen-plus",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 18,
        "prompt_tokens": 22,
        "total_tokens": 40
    }
}
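The sample code prints the full JSON of the response. In practice, you typically read individual fields from the returned object instead. A minimal sketch, assuming the completion object from the preceding example:
# Read individual fields instead of dumping the full JSON
print(completion.choices[0].message.content)  # the generated text only
print(completion.usage.total_tokens)          # the total token consumption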
Streaming output
from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The base_url of the OpenAI-compatible mode in Model Studio
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}],
        stream=True,
        # Add the following setting to display token usage in the last line of the streaming output
        stream_options={"include_usage": True}
    )
    for chunk in completion:
        print(chunk.model_dump_json())

if __name__ == '__main__':
    get_response()
Sample response:
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"a large","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"language model","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"created by Alibaba Cloud","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":". I am called Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}
Function calling
In the following example, the OpenAI-compatible interface is used to implement function calling. The example calls a weather query tool and a time query tool over multiple rounds.
from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The base_url of the OpenAI-compatible mode in Model Studio
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Define a tool list. The model selects a tool based on the name and description of the tool
tools = [
    # Tool 1: obtain the current time
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "This tool can help you query the current time.",
            # No input parameters are required to obtain the current time. Therefore, parameters is set to an empty dictionary
            "parameters": {}
        }
    },
    # Tool 2: obtain the weather of a specific city
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "This tool can help you query the weather of a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    # A location is provided when you query the weather. Therefore, the parameter is named location
                    "location": {
                        "type": "string",
                        "description": "A city, county, or district, such as Beijing, Hangzhou, or Yuhang."
                    }
                },
                "required": [
                    "location"
                ]
            }
        }
    }
]

# Simulate the weather query tool. Sample response: "Beijing is rainy today."
def get_current_weather(location):
    return f"{location} is rainy today. "

# Simulate the time query tool. Sample response: "Current time: 2024-04-15 17:15:18."
def get_current_time():
    # Obtain the current date and time
    current_datetime = datetime.now()
    # Format the current date and time
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    # Return the formatted current date and time
    return f"Current time: {formatted_time}."

# Encapsulate the response function of the model
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=messages,
        tools=tools
    )
    return completion.model_dump()

def call_with_messages():
    print('\n')
    messages = [
        {
            "content": input('Input: '),  # Sample questions: "What time is it now?" "What time will it be in an hour?" "What is the weather like in Beijing?"
            "role": "user"
        }
    ]
    print("-" * 60)
    # Call the model in the first round
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response['choices'][0]['message']
    print(f"\nOutput of the model in round {i}: {first_response}\n")
    if assistant_output['content'] is None:
        assistant_output['content'] = ""
    messages.append(assistant_output)
    # If the model does not need to call a tool, a response is returned directly
    if assistant_output['tool_calls'] is None:  # If the model determines that no tool is needed, the response is printed directly and no further round is called
        print(f"Without the need to call a tool, I can answer directly: {assistant_output['content']}")
        return
    # If the model needs to call tools, the model is called for multiple rounds until it determines that no tool is needed
    while assistant_output['tool_calls'] is not None:
        # If the model determines that the weather query tool is needed, run the weather query tool
        if assistant_output['tool_calls'][0]['function']['name'] == 'get_current_weather':
            tool_info = {"name": "get_current_weather", "role": "tool"}
            # Extract the location from the arguments generated by the model
            location = json.loads(assistant_output['tool_calls'][0]['function']['arguments'])['location']
            tool_info['content'] = get_current_weather(location)
        # If the model determines that the time query tool is needed, run the time query tool
        elif assistant_output['tool_calls'][0]['function']['name'] == 'get_current_time':
            tool_info = {"name": "get_current_time", "role": "tool"}
            tool_info['content'] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-" * 60)
        messages.append(tool_info)
        assistant_output = get_response(messages)['choices'][0]['message']
        if assistant_output['content'] is None:
            assistant_output['content'] = ""
        messages.append(assistant_output)
        i += 1
        print(f"Output of the model in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == '__main__':
    call_with_messages()
Enter "What's the weather like in Singapore?". The program calls the weather query tool and then returns the final answer of the model.
Request parameters
The request parameters are aligned with those of the OpenAI interface. The following table describes the parameters.
Parameter | Type | Description |
model | string | Specifies the model name. For a list of supported models, see model. |
messages | array | The conversation history between the user and the model. Each element in the array is in the {"role": <role>, "content": <content>} format. Valid values for role: system, user, and assistant. The system role is supported only in the first element of the array (messages[0]). |
top_p | float | Optional. The probability threshold of nucleus sampling. For example, if this parameter is set to 0.8, the model selects the smallest set of tokens whose cumulative probability is greater than or equal to 0.8. A greater value introduces more randomness to the generated content. Valid values: (0,1.0). |
temperature | float | Optional. Controls the randomness and diversity of the generated content. To be specific, the value of this parameter controls the probability distribution from which the model samples each token. Valid values: [0, 2). We recommend that you do not set this parameter to 0 because a value of 0 is meaningless. |
presence_penalty | float | Optional. Controls the repetition of words in the generated content. A greater value reduces repetition. Valid values: [-2.0, 2.0]. Note: This parameter is supported only by Qwen commercial models and open source Qwen models of version 1.5 or later. |
max_tokens | integer | Optional. The maximum number of tokens that can be generated by the model. Different models have different upper limits. |
seed | integer | Optional. The random seed used during content generation. This parameter controls the randomness of the generated content; passing the same seed makes the output more reproducible across requests. Valid values: 64-bit unsigned integers. |
stream | boolean | Optional. Specifies whether to enable streaming output mode. In streaming output mode, the model returns a generator. You need to use an iterative loop to fetch the results from the generator and incrementally display the text. Default value: False. |
stop | string or array | Optional. If you specify a string or a token ID for this parameter, the model stops generating content when the specified string or token is about to be generated. The value of the stop parameter can be a string or an array. |
tools | array | Optional. The list of tools that the model can call. During each function call process, the model selects one tool from the list. Each tool in the list contains the following fields: type, which must be set to function, and function, an object that specifies the name, description, and parameters of the tool, as shown in the preceding function calling example. During a function call process, you must specify the tools parameter both when you initiate a round of function calling and when you return the result of a tool function to the model. qwen-turbo, qwen-plus, and qwen-max support this parameter. Note: The tools parameter cannot be used when the stream parameter is set to True. |
stream_options | object | Optional. Specifies whether to display the number of tokens used in streaming output mode. This parameter takes effect only when the stream parameter is set to True. To count the number of tokens used in streaming output mode, set this parameter to {"include_usage": True}. |
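The following sketch combines several of the optional parameters described above in one request. The parameter values are illustrative assumptions, and client is the OpenAI client configured earlier in this topic:
completion = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Write a two-line poem about the sea."}],
    temperature=0.7,   # moderate randomness
    top_p=0.8,         # nucleus sampling threshold
    max_tokens=256,    # cap the length of the generated content
    seed=1234,         # improve reproducibility across requests
    stop=["\n\n"],     # stop before a blank line is generated
)
print(completion.choices[0].message.content)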
Response parameters
Parameter | Type | Description |
id | string | The request ID. |
model | string | The name of the model that is called. |
system_fingerprint | string | The configuration version of the model that is called. This parameter is not supported, and an empty string ("") is returned. |
choices | array | Details of the generated content. |
choices[i].finish_reason | string | The reason why the model stops generating content. Valid values: stop, which indicates that the generation ends naturally or a stop condition is met; length, which indicates that the generation stops because the maximum output length is reached; and tool_calls, which indicates that the model calls a tool. In streaming output mode, null is returned until the generation is finished. |
choices[i].message | object | The message returned by the model. |
choices[i].message.role | string | The role of the model. Only assistant may be returned. |
choices[i].message.content | string | The content generated by the model. |
choices[i].index | integer | The sequence number of the content. Default value: 0. |
created | integer | The timestamp when the content was generated. Unit: seconds. |
usage | object | The number of tokens that are consumed during the request. |
usage.prompt_tokens | integer | The number of tokens that are converted from the input text. |
usage.completion_tokens | integer | The number of tokens that are converted from the response generated by the model. |
usage.total_tokens | integer | The sum of usage.prompt_tokens and usage.completion_tokens. |
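As an illustration of how these fields are typically consumed, the following sketch branches on finish_reason. It assumes a completion object returned by the OpenAI SDK as in the earlier examples:
choice = completion.choices[0]
if choice.finish_reason == "length":
    # The output was truncated because the maximum output length was reached
    print("Truncated output:", choice.message.content)
elif choice.finish_reason == "tool_calls":
    # The model requested a tool call; see the preceding function calling example
    print("Tool requested:", choice.message.tool_calls[0].function.name)
else:
    print(choice.message.content)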
Use langchain_openai SDK
Prerequisites
Python is installed.
The langchain_openai SDK is installed. To install the langchain_openai SDK, run the following command:
# If the following command returns an error, replace pip with pip3. pip3 is the package installer for Python 3
pip install -U langchain_openai
Usage
The following examples show how to use the langchain_openai SDK to access Qwen models in Model Studio.
Non-streaming output
The following example uses the invoke method to implement non-streaming output:
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The base_url of the OpenAI-compatible mode in Model Studio
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        model="qwen-plus"
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"}
    ]
    response = llm.invoke(messages)
    print(response.json(ensure_ascii=False))

if __name__ == "__main__":
    get_response()
Sample response:
{
    "content": "I am a large language model created by Alibaba Cloud. I am called Qwen.",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 16,
            "prompt_tokens": 22,
            "total_tokens": 38
        },
        "model_name": "qwen-plus",
        "system_fingerprint": "",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-xxx",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": []
}
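Instead of serializing the whole message, you can read its fields directly. A minimal sketch, assuming the response object (an AIMessage) from the preceding example:
print(response.content)                               # the generated text
print(response.response_metadata.get("token_usage"))  # token consumption, if present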
Streaming output
The following example uses the stream method to implement streaming output. You do not need to configure the stream parameter.
from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        # If you have not configured the environment variable, replace DASHSCOPE_API_KEY with your API key
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        model="qwen-plus",
        # Add the following setting to display token usage in the last line of the streaming output
        stream_options={"include_usage": True}
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.json(ensure_ascii=False))

if __name__ == "__main__":
    get_response()
Sample response:
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "a large", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "created", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "by Alibaba Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ". I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "called Qwen.", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}
For more information about the request parameters, see Request parameters.
Use HTTP
You can use HTTP interfaces that have the same structure as those of OpenAI to obtain responses from Model Studio.
Specify endpoint
POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Sample request
The following example calls the API by using a cURL command.
If you have not configured the environment variable, replace $DASHSCOPE_API_KEY with your API key.
Non-streaming output
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ]
}'
Sample response:
{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I am a large language model created by Alibaba Cloud. I am called Qwen."
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 16,
        "total_tokens": 27
    },
    "created": 1715252778,
    "system_fingerprint": "",
    "model": "qwen-plus",
    "id": "chatcmpl-xxx"
}
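If you prefer not to use cURL, the same HTTP call can be made from Python. The following sketch uses the third-party requests library; the payload mirrors the preceding cURL example:
import os
import requests

url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
    "Content-Type": "application/json",
}
payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
}
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])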
Streaming output
If you want to use the streaming output mode, set the stream parameter to true in the request body.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream": true
}'
Sample response:
data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"created by Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":". I am called Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: [DONE]
For more information about the request parameters, see Request parameters.
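The streaming response is delivered as server-sent events, as shown above: each line starts with data: and the stream ends with data: [DONE]. The following sketch parses this format in Python by using the third-party requests library; the request payload is an illustrative assumption that mirrors the preceding cURL example:
import json
import os
import requests

url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
    "Content-Type": "application/json",
}
payload = {
    "model": "qwen-plus",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "stream": True,
}
with requests.post(url, headers=headers, json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # the server signals the end of the stream
        chunk = json.loads(data)
        if chunk["choices"]:
            # Print the incremental text without a trailing newline
            print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
print()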
Sample error response
If a request fails, an error code and an error message are returned in the following format.
{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}
Status codes
Status code | Description |
400 - Invalid Request Error | Request error. The error message shows the details. |
401 - Incorrect API key provided | The API key is incorrect. |
429 - Rate limit reached for requests | The number of queries per second or minute exceeds the limit. |
429 - You exceeded your current quota, please check your plan and billing details | You have exceeded the quota or your payment is overdue. |
500 - The server had an error while processing your request | An error occurred on the server. |
503 - The engine is currently overloaded, please try again later | The server is overloaded. You can try again later. |
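When you call the service through the OpenAI SDK, these HTTP status codes surface as typed exceptions of the OpenAI Python SDK. A minimal sketch of handling them, mirroring the client configuration used earlier in this topic:
import os
import openai

client = openai.OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
try:
    completion = client.chat.completions.create(
        model="qwen-plus",
        messages=[{"role": "user", "content": "Who are you?"}],
    )
    print(completion.choices[0].message.content)
except openai.AuthenticationError:
    # 401: the API key is missing or incorrect
    print("Check your DASHSCOPE_API_KEY.")
except openai.RateLimitError:
    # 429: the rate limit or quota is exceeded; retry later or check billing
    print("Rate limited. Try again later.")
except openai.APIStatusError as e:
    # Other non-2xx responses, such as 400, 500, or 503
    print(f"Request failed with status {e.status_code}.")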