
Alibaba Cloud Model Studio: OpenAI compatible - Chat

Last Updated: Feb 25, 2026

Alibaba Cloud Model Studio's Qwen models support OpenAI-compatible interfaces. You can use your existing OpenAI code with Model Studio by changing only the API key, base_url, and model name.

Required information

base_url

The base_url is the endpoint of the model service. You must configure it when you access Alibaba Cloud Model Studio through the OpenAI-compatible interface.

  • When using the OpenAI SDK or other OpenAI-compatible SDKs, set the base_url as follows:

    Singapore: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    US (Virginia): https://dashscope-us.aliyuncs.com/compatible-mode/v1
    China (Beijing): https://dashscope.aliyuncs.com/compatible-mode/v1
  • When making HTTP requests, use the full access endpoint as follows:

    Singapore: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
    US (Virginia): POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
    China (Beijing): POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
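
The relationship between the SDK base_url and the full HTTP endpoint is mechanical: the endpoint is the base_url with /chat/completions appended. A minimal sketch (the region keys are illustrative; the URLs are the ones listed above):

```python
# Region -> OpenAI-compatible base_url, as listed above.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "china-beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}

def chat_completions_endpoint(region: str) -> str:
    """Full endpoint for raw HTTP POST requests: base_url + /chat/completions."""
    return BASE_URLS[region] + "/chat/completions"
```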

Supported models

The following lists the Qwen series models currently supported by the OpenAI-compatible interface, grouped by region.

Global

  • Commercial

    • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot models

    • Qwen-Plus series: qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshot models

    • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshot models

  • Open source

    • qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b

International

  • Commercial

    • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot models; qwen-max, qwen-max-latest, qwen-max-2025-01-25 and later snapshot models

    • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshot models; qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25 and later snapshot models

    • Qwen-Flash series: qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshot models; qwen-flash, qwen-flash-2025-07-28

    • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01 and later snapshot models

    • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshot models; qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshot models

    • QwQ series: qwq-plus

  • Open source

    • qwen3.5-397b-a17b, qwen3.5-120b-a10b, qwen3.5-27b, qwen3.5-35b-a3b

    • qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

    • qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct

US

  • Commercial

    • Qwen-Plus series: qwen-plus-us, qwen-plus-2025-12-01-us and later snapshot models

    • Qwen-Flash series: qwen-flash-us, qwen-flash-2025-07-28-us

Chinese Mainland

  • Commercial

    • Qwen-Max series: qwen3-max, qwen3-max-preview, qwen3-max-2025-09-23 and later snapshot models; qwen-max, qwen-max-latest, qwen-max-2024-09-19 and later snapshot models

    • Qwen-Plus series: qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshot models; qwen-plus, qwen-plus-latest, qwen-plus-2024-12-20 and later snapshot models

    • Qwen-Flash series: qwen-flash, qwen-flash-2025-07-28 and later snapshot models

    • Qwen-Turbo series: qwen-turbo, qwen-turbo-latest, qwen-turbo-2025-04-28 and later snapshot models

    • Qwen-Coder series: qwen3-coder-plus, qwen3-coder-plus-2025-07-22 and later snapshot models; qwen3-coder-flash, qwen3-coder-flash-2025-07-28 and later snapshot models; qwen-coder-plus, qwen-coder-plus-latest, qwen-coder-plus-2024-11-06; qwen-coder-turbo, qwen-coder-turbo-latest, qwen-coder-turbo-2024-09-19

    • QwQ series: qwq-plus, qwq-plus-latest, qwq-plus-2025-03-05

    • Qwen-Math series: qwen-math-plus, qwen-math-plus-latest, qwen-math-plus-2024-08-16 and later snapshot models; qwen-math-turbo, qwen-math-turbo-latest, qwen-math-turbo-2024-09-19

  • Open source

    • qwen3.5-397b-a17b, qwen3.5-120b-a10b, qwen3.5-27b, qwen3.5-35b-a3b

    • qwen3-next-80b-a3b-thinking, qwen3-next-80b-a3b-instruct, qwen3-235b-a22b-thinking-2507, qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-thinking-2507, qwen3-30b-a3b-instruct-2507, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b

    • qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-instruct, qwen2.5-32b-instruct, qwen2.5-14b-instruct, qwen2.5-7b-instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, qwen2.5-0.5b-instruct

Use OpenAI SDK

Prerequisites

  • A Python environment must be installed on your computer.

  • Install the latest version of the OpenAI SDK.

    # If the following command fails, replace pip with pip3
    pip install -U openai
  • Activate Alibaba Cloud Model Studio and get an API key. For more information, see Get an API key.

  • We recommend configuring the API key as an environment variable to reduce the risk of leaks. For more information, see Configure an API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leaks.

  • Select a model to use. For more information, see Supported models.
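
To fail fast when the environment variable is missing, you can check for it before constructing a client. A small sketch (the helper name is ours, not part of any SDK):

```python
import os

def require_api_key(env_name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the Model Studio API key from the environment, or raise with a hint."""
    key = os.getenv(env_name)
    if not key:
        raise RuntimeError(
            f"{env_name} is not set. Export your API key or pass api_key=... directly."
        )
    return key
```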

Usage

The following examples show how to access Qwen models in Model Studio using the OpenAI SDK.

Non-streaming call example

from openai import OpenAI
import os

def get_response():
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured the environment variable, replace this line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        # The following is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}]
        )
    print(completion.model_dump_json())

if __name__ == '__main__':
    get_response()

Run the code to get the following result:

{
    "id": "chatcmpl-xxx",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I am a large-scale pre-trained model from Alibaba Cloud. My name is Qwen.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1716430652,
    "model": "qwen-plus",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 18,
        "prompt_tokens": 22,
        "total_tokens": 40
    }
}
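
Usually you only need the reply text and token usage rather than the full JSON. With the SDK response object you can read completion.choices[0].message.content directly; the same fields can be pulled out of the serialized JSON, as this sketch over an abridged copy of the response above shows:

```python
import json

# Abridged copy of the serialized response printed above.
raw = '''{
    "choices": [{"finish_reason": "stop", "index": 0,
                 "message": {"content": "I am Qwen.", "role": "assistant"}}],
    "model": "qwen-plus",
    "usage": {"completion_tokens": 18, "prompt_tokens": 22, "total_tokens": 40}
}'''

data = json.loads(raw)
reply = data["choices"][0]["message"]["content"]
total = data["usage"]["total_tokens"]
print(reply, total)  # -> I am Qwen. 40
```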

Streaming call example

from openai import OpenAI
import os


def get_response():
    client = OpenAI(
        # If you have not configured the environment variable, replace this line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the base_url for the Singapore region.
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
        
    )
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=[{'role': 'system', 'content': 'You are a helpful assistant.'},
                  {'role': 'user', 'content': 'Who are you?'}],
        stream=True,
        # The following setting displays token usage information in the last line of the streaming output.
        stream_options={"include_usage": True}
        )
    for chunk in completion:
        print(chunk.model_dump_json())


if __name__ == '__main__':
    get_response()

Run the code to get the following result:

{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"I am","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" a large","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" language","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":" model from Alibaba Cloud","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":", and my name is Qwen.","function_call":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[{"delta":{"content":"","function_call":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":null}
{"id":"chatcmpl-xxx","choices":[],"created":1719286190,"model":"qwen-plus","object":"chat.completion.chunk","system_fingerprint":null,"usage":{"completion_tokens":16,"prompt_tokens":22,"total_tokens":38}}
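
Each chunk carries only an incremental delta, so the full reply is the concatenation of the delta contents. A sketch over chunk payloads shaped like the output above (note that the final usage-only chunk has an empty choices list):

```python
# Chunk payloads shaped like the streaming output above (abridged).
chunks = [
    {"choices": [{"delta": {"content": "I am"}}], "usage": None},
    {"choices": [{"delta": {"content": " a large language model"}}], "usage": None},
    {"choices": [{"delta": {"content": "."}}], "usage": None},
    {"choices": [], "usage": {"total_tokens": 38}},  # usage-only final chunk
]

text = ""
for chunk in chunks:
    if chunk["choices"]:  # skip the trailing usage-only chunk
        text += chunk["choices"][0]["delta"].get("content") or ""
print(text)  # -> I am a large language model.
```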

Function call example

This example shows how to perform function calls using weather and time query tools through the OpenAI-compatible interface. The sample code supports multiple rounds of tool calls.

from openai import OpenAI
from datetime import datetime
import json
import os

client = OpenAI(
    # If you have not configured the environment variable, replace the following line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base_url for the Singapore region.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  
)

# Define the list of tools. The model refers to the name and description of the tools when selecting which one to use.
tools = [
    # Tool 1: Get the current time.
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Useful when you want to know the current time.",
            # Because getting the current time requires no input parameters, parameters is an empty dictionary.
            "parameters": {}
        }
    },  
    # Tool 2: Get the weather for a specified city.
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Useful when you want to query the weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    # A location must be provided to query the weather, so the parameter is set to location.
                    "location": {
                        "type": "string",
                        "description": "A city or district, such as Beijing, Hangzhou, or Yuhang."
                    }
                },
                # required belongs inside the parameters JSON Schema.
                "required": [
                    "location"
                ]
            }
        }
    }
]

# Simulate a weather query tool. Example result: "It is rainy in Beijing today."
def get_current_weather(location):
    return f"It is rainy in {location} today."

# Tool to query the current time. Example result: "Current time: 2024-04-15 17:15:18."
def get_current_time():
    # Get the current date and time.
    current_datetime = datetime.now()
    # Format the current date and time.
    formatted_time = current_datetime.strftime('%Y-%m-%d %H:%M:%S')
    # Return the formatted current time.
    return f"Current time: {formatted_time}."

# Encapsulate the model response function.
def get_response(messages):
    completion = client.chat.completions.create(
        model="qwen-plus",  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        messages=messages,
        tools=tools
        )
    return completion.model_dump()

def call_with_messages():
    print('\n')
    messages = [
            {
                "content": input('Please enter: '),  # Example questions: "What time is it now?" "What time will it be in an hour?" "What is the weather like in Beijing?"
                "role": "user"
            }
    ]
    print("-"*60)
    # First round of model call.
    i = 1
    first_response = get_response(messages)
    assistant_output = first_response['choices'][0]['message']
    print(f"\nLLM output in round {i}: {first_response}\n")
    if assistant_output['content'] is None:
        assistant_output['content'] = ""
    messages.append(assistant_output)
    # If no tool call is needed, return the final answer directly.
    if assistant_output['tool_calls'] is None:  # If the model determines that no tool call is needed, print the assistant's reply directly without a second model call.
        print(f"No tool call is needed. I can reply directly: {assistant_output['content']}")
        return
    # If a tool call is needed, perform multiple rounds of model calls until the model determines that no tool call is needed.
    while assistant_output['tool_calls'] is not None:
        # If the model determines that the weather query tool needs to be called, run the weather query tool.
        if assistant_output['tool_calls'][0]['function']['name'] == 'get_current_weather':
            tool_info = {"name": "get_current_weather", "role":"tool"}
            # Fetch the location parameter information.
            location = json.loads(assistant_output['tool_calls'][0]['function']['arguments'])['location']
            tool_info['content'] = get_current_weather(location)
        # If the model determines that the time query tool needs to be called, run the time query tool.
        elif assistant_output['tool_calls'][0]['function']['name'] == 'get_current_time':
            tool_info = {"name": "get_current_time", "role":"tool"}
            tool_info['content'] = get_current_time()
        print(f"Tool output: {tool_info['content']}\n")
        print("-"*60)
        messages.append(tool_info)
        assistant_output = get_response(messages)['choices'][0]['message']
        if assistant_output['content'] is None:
            assistant_output['content'] = ""
        messages.append(assistant_output)
        i += 1
        print(f"LLM output in round {i}: {assistant_output}\n")
    print(f"Final answer: {assistant_output['content']}")

if __name__ == '__main__':
    call_with_messages()

When you enter "What is the weather in Hangzhou and Beijing? What time is it now?", the program calls the weather and time tools over multiple rounds and then prints the final answer.
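
The if/elif chain in the example can also be written as a dispatch table that maps tool names to local functions, so the loop stays unchanged when tools are added. A sketch (run_tool and TOOL_FUNCTIONS are our names, not part of any SDK):

```python
import json

def get_current_weather(location):
    return f"It is rainy in {location} today."

def get_current_time():
    return "Current time: 2024-04-15 17:15:18."

# Tool name -> local implementation.
TOOL_FUNCTIONS = {
    "get_current_weather": get_current_weather,
    "get_current_time": get_current_time,
}

def run_tool(tool_call: dict) -> str:
    """Execute the function named in a tool_call entry, passing its JSON arguments."""
    function = tool_call["function"]
    arguments = json.loads(function["arguments"] or "{}")
    return TOOL_FUNCTIONS[function["name"]](**arguments)

print(run_tool({"function": {"name": "get_current_weather",
                             "arguments": '{"location": "Beijing"}'}}))
# -> It is rainy in Beijing today.
```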

Input parameters

Input parameters are aligned with those of the OpenAI interface. Currently supported parameters include:

model (string, required)

Specifies the model to use. For a list of available models, see Supported models.

messages (array, required)

The conversation history between the user and the model. Each element in the array is in the format {"role": role, "content": content}. The available roles are system, user, and assistant. The system role is supported only in messages[0]. Typically, user and assistant roles must alternate, and the role of the last element in messages must be user.

top_p (float, optional)

The probability threshold for nucleus sampling during generation. For example, a value of 0.8 means that only the smallest set of most likely tokens whose cumulative probability reaches 0.8 is considered. The value must be in the range (0, 1.0). A higher value increases randomness, and a lower value increases determinism.

temperature (float, optional)

Controls the randomness and diversity of the model's responses by adjusting the probability distribution over candidate tokens during generation. A higher temperature flattens the distribution, allowing more low-probability tokens to be selected and making the output more diverse. A lower temperature sharpens the distribution, making high-probability tokens more likely to be selected and the output more deterministic.

The value must be in the range [0, 2). We do not recommend setting this parameter to 0.

presence_penalty (float, optional)

Controls the repetition of tokens in the entire generated sequence. A higher value reduces repetition. The value must be in the range [-2.0, 2.0].

Note

This parameter is supported only for commercial Qwen models and open source models from qwen1.5 onwards.

n (integer, optional, default 1)

The number of responses to generate. The value must be in the range 1-4. For scenarios that require multiple responses, such as creative writing or ad copy, you can set a larger value for n.

Setting a larger value for n does not increase input token consumption but does increase output token consumption.
Currently, this is supported only for the qwen-plus model and is fixed to 1 when the tools parameter is passed.

max_tokens (integer, optional)

Specifies the maximum number of tokens that the model can generate. For example, if the model's maximum output length is 2,000 tokens, you can set this to 1,000 to prevent excessively long outputs.

Different models have different output limits. For more information, see the model list.

seed (integer, optional)

The random number seed used for generation. It controls the randomness of the model's output. The seed must be a 64-bit unsigned integer.

stream (boolean, optional, default False)

Controls whether to use streaming output. When stream is enabled, the API returns a generator. You must iterate through the generator to get the results. Each output is an incremental piece of the generated text.

stop (string or array, optional, default None)

The stop parameter provides precise control over generation by automatically stopping when the model is about to generate a specified string or token ID. The stop parameter can be a string or an array.

  • string type

    The model stops when it is about to generate the specified stop word.

    For example, if you set stop to "Hello", the model stops when it is about to generate "Hello".

  • array type

    The elements in the array can be token IDs, strings, or an array of token IDs. The model stops when the token it is about to generate or its corresponding token ID is in the stop array. The following are examples of the stop parameter as an array (the tokenizer corresponds to the qwen-turbo model):

    1. Elements are token IDs:

    The token IDs 108386 and 104307 correspond to the tokens "Hello" and "weather" respectively. If you set stop to [108386,104307], the model stops when it is about to generate "Hello" or "weather".

    2. Elements are strings:

    If you set stop to ["Hello","weather"], the model stops when it is about to generate "Hello" or "weather".

    3. Elements are arrays:

    The token IDs 108386 and 103924 correspond to the tokens "Hello" and "ah" respectively. The token IDs 35946 and 101243 correspond to the tokens "I" and "am fine" respectively. If you set stop to [[108386, 103924],[35946, 101243]], the model stops when it is about to generate "Hello ah" or "I am fine".

    Note

    When the stop parameter is an array, its elements must be of the same type: either all strings or all token IDs. For example, you cannot specify stop as ["Hello", 104307].

tools (array, optional, default None)

Specifies a library of tools that the model can call. The model selects one tool from the library for each function call. Each tool in the tools array has the following structure:

  • type: A string that indicates the type of tool. Currently, only function is supported.

  • function: An object that includes the name, description, and parameters keys:

    • name: A string that indicates the name of the tool function. It must contain only letters, digits, underscores, and hyphens, with a maximum length of 64 characters.

    • description: A string that describes the tool function. The model uses this description to decide when and how to call the function.

    • parameters: An object that describes the tool's parameters. It must be a valid JSON Schema. If the parameters object is empty, the function has no input parameters.

In a function call flow, you must set the tools parameter both for the round that initiates the function call and for the round that submits the execution result of the tool function to the model. This parameter is currently supported by the qwen-turbo, qwen-plus, and qwen-max models.

Note

The tools parameter cannot be used with stream=True at the same time.

stream_options (object, optional, default None)

Configures whether to report the number of tokens used during streaming output. It takes effect only when stream is set to True. To count tokens in streaming mode, set stream_options={"include_usage": True}.
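
Putting several of these parameters together, an HTTP request body might look like the following (the values are illustrative, not recommendations):

```json
{
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a tagline for a coffee shop."}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 256,
    "seed": 1234,
    "stop": ["\n\n"],
    "stream": true,
    "stream_options": {"include_usage": true}
}
```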

Output parameters

id (string)

The system-generated ID for the call.

model (string)

The name of the model used for the call.

system_fingerprint (string)

The configuration version used by the model at runtime. This is not currently supported and returns an empty string "".

choices (array)

Details of the content generated by the model.

choices[i].finish_reason (string)

Takes one of three values:

  • null: Generation is in progress.

  • stop: Generation stopped because a stop condition in the input parameters was triggered.

  • length: Generation stopped because the output reached the maximum length.

choices[i].message (object)

The message output by the model.

choices[i].message.role (string)

The role of the model. This is fixed to assistant.

choices[i].message.content (string)

The text generated by the model.

choices[i].index (integer)

The sequence number of the generated result. The default is 0.

created (integer)

The UNIX timestamp (in seconds) when the result was generated.

usage (object)

Billing information, which indicates the token consumption for the request.

usage.prompt_tokens (integer)

The number of tokens in the input text.

usage.completion_tokens (integer)

The number of tokens in the generated response.

usage.total_tokens (integer)

The sum of usage.prompt_tokens and usage.completion_tokens.

Use langchain_openai SDK

Prerequisites

  • A Python environment must be installed on your computer.

  • Run the following command to install the langchain_openai SDK.

    # If the following command fails, replace pip with pip3
    pip install -U langchain_openai
  • Activate Alibaba Cloud Model Studio and get an API key. For more information, see Get an API key.

  • We recommend configuring the API key as an environment variable to reduce the risk of leaks. For more information, see Configure an API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leaks.

  • Select a model to use. For more information, see Supported models.

Usage

The following examples show how to access Qwen models in Alibaba Cloud Model Studio using the langchain_openai SDK.

Non-streaming output

Use the invoke method for non-streaming output. Refer to the following sample code:

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured the environment variable, replace this line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1", # This is the base_url for the Singapore region.
        model="qwen-plus"  # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"Who are you?"}
    ]
    response = llm.invoke(messages)
    print(response.json())

if __name__ == "__main__":
    get_response()

Run the code to get the following result:

{
    "content": "I am a large language model from Alibaba Cloud. My name is Qwen.",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 16,
            "prompt_tokens": 22,
            "total_tokens": 38
        },
        "model_name": "qwen-plus",
        "system_fingerprint": "",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-xxx",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": []
}

Streaming output

Use the stream method for streaming output. You do not need to set the stream parameter.

from langchain_openai import ChatOpenAI
import os

def get_response():
    llm = ChatOpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not configured the environment variable, replace this line with your Alibaba Cloud Model Studio API key: api_key="sk-xxx"
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",   # This is the base_url for the Singapore region.
        model="qwen-plus",   # This example uses qwen-plus. You can change the model name as needed. For a list of models, see https://www.alibabacloud.com/help/en/model-studio/getting-started/models
        stream_usage=True
        )
    messages = [
        {"role":"system","content":"You are a helpful assistant."}, 
        {"role":"user","content":"Who are you?"},
    ]
    response = llm.stream(messages)
    for chunk in response:
        print(chunk.model_dump_json())

if __name__ == "__main__":
    get_response()

Run the code to get the following result:

{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "I am", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " from", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Alibaba", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": " Cloud", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "'s large language model", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ", and my name is Qwen", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": ".", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {"finish_reason": "stop"}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": null, "tool_call_chunks": []}
{"content": "", "additional_kwargs": {}, "response_metadata": {}, "type": "AIMessageChunk", "name": null, "id": "run-xxx", "example": false, "tool_calls": [], "invalid_tool_calls": [], "usage_metadata": {"input_tokens": 22, "output_tokens": 16, "total_tokens": 38}, "tool_call_chunks": []}
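
As with the OpenAI SDK, the streamed reply is reassembled by concatenating the chunk contents; in real code each piece comes from chunk.content while iterating llm.stream(messages). The sketch below applies the same join to contents copied from the sample output above:

```python
# Chunk contents copied from the sample streaming output above; in real code,
# each piece comes from chunk.content while iterating llm.stream(messages).
contents = ["", "I am", " from", " Alibaba", " Cloud",
            "'s large language model", ", and my name is Qwen", ".", "", ""]
reply = "".join(contents)
print(reply)  # -> I am from Alibaba Cloud's large language model, and my name is Qwen.
```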

For information about input parameter configuration, see Input parameters. The relevant parameters are defined in the ChatOpenAI object.

Use HTTP interface

You can call Alibaba Cloud Model Studio through the HTTP interface to get responses with the same structure as those from the OpenAI service.

Prerequisites

  • Activate Alibaba Cloud Model Studio and get an API key. For more information, see Get an API key.

  • We recommend configuring the API key as an environment variable to reduce the risk of leaks. For more information, see Configure an API key as an environment variable. You can also configure the API key in your code, but this increases the risk of leaks.

Submit an API call

Singapore: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
US (Virginia): POST https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions
China (Beijing): POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

Request example

The following examples use the curl command to call the API.

Note

If you have not configured the API key as an environment variable, replace $DASHSCOPE_API_KEY with your actual API key.

Non-streaming output

# This is the base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ]
}'

Run the command to get the following result:

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I am a large language model from Alibaba Cloud. My name is Qwen."
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 16,
        "total_tokens": 27
    },
    "created": 1715252778,
    "system_fingerprint": "",
    "model": "qwen-plus",
    "id": "chatcmpl-xxx"
}
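The assistant's reply and the token usage can be read directly from fields of the response body. A short sketch using the sample response above (abbreviated):

```python
import json

# Sample response body in the shape shown above (abbreviated).
raw = """{
    "choices": [
        {"message": {"role": "assistant",
                     "content": "I am a large language model from Alibaba Cloud. My name is Qwen."},
         "finish_reason": "stop", "index": 0, "logprobs": null}
    ],
    "object": "chat.completion",
    "usage": {"prompt_tokens": 11, "completion_tokens": 16, "total_tokens": 27},
    "model": "qwen-plus", "id": "chatcmpl-xxx"
}"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]
total = resp["usage"]["total_tokens"]
print(reply)  # I am a large language model from Alibaba Cloud. My name is Qwen.
print(total)  # 27
```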

Streaming output

To enable streaming output, set the stream parameter to true in the request body.

# This is the base_url for the Singapore region.
curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true
}'

Running the command returns a stream of chunks similar to the following:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" language"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" model from Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":", and my name is Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: [DONE]
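The chunks above arrive as server-sent events: each `data:` line carries a JSON chunk whose `delta.content` holds an incremental piece of the reply, and `data: [DONE]` marks the end of the stream. A minimal client-side sketch that reassembles the full reply (`join_stream` is an illustrative helper, shown here with shortened sample lines):

```python
import json

def join_stream(lines):
    """Concatenate the content deltas from SSE 'data:' lines into one string."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream marker
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Shortened sample lines in the shape shown above.
sse = [
    'data: {"choices":[{"delta":{"content":"I am","role":"assistant"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" Qwen."},"index":0}]}',
    "data: [DONE]",
]
print(join_stream(sse))  # I am Qwen.
```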

For more information about input parameters, see Input parameters.

Error response example

If an error occurs during a request, the response includes a code and a message that indicate the cause.

{
    "error": {
        "message": "Incorrect API key provided. ",
        "type": "invalid_request_error",
        "param": null,
        "code": "invalid_api_key"
    }
}
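A client can tell an error body apart from a normal completion by checking for the top-level `error` field. A minimal sketch (`extract_error` is an illustrative helper, not part of any SDK):

```python
import json

def extract_error(body: str):
    """Return (code, message) if the response body carries an error, else None."""
    data = json.loads(body)
    err = data.get("error")
    if err is None:
        return None
    return err.get("code"), err.get("message")

error_body = ('{"error": {"message": "Incorrect API key provided. ", '
              '"type": "invalid_request_error", "param": null, '
              '"code": "invalid_api_key"}}')
print(extract_error(error_body))  # ('invalid_api_key', 'Incorrect API key provided. ')
```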

Status codes

| Error code | Description |
| --- | --- |
| 400 — Invalid request error | The request is invalid. For more information, see the error message. |
| 401 — Incorrect API key provided | The API key is incorrect. |
| 429 — Rate limit reached for requests | The rate limit, such as queries per second (QPS) or queries per minute (QPM), is exceeded. |
| 429 — You exceeded your current quota, please check your plan and billing details | Your quota is exceeded or you have an overdue payment. |
| 500 — The server had an error while processing your request | A server-side error occurred. |
| 503 — The engine is currently overloaded, please try again later | The server is overloaded. You can retry the request. |
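Since 429 and 503 are transient, a client can retry them with exponential backoff while treating the other codes as final. A minimal sketch under that assumption (`call_with_retry` and the simulated transport are illustrative, not part of any SDK):

```python
import time

RETRYABLE = {429, 503}  # rate-limit and overload responses are worth retrying

def call_with_retry(send, max_attempts=3, base_delay=1.0):
    """Retry send() on 429/503 with exponential backoff.

    `send` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Simulated transport: fails twice with 429, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = call_with_retry(lambda: next(responses), base_delay=0.0)
print(status, body)  # 200 ok
```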