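Alibaba Cloud Model Studio: Qwen-VL

Last updated: Dec 17, 2024

Qwen-VL models answer questions about the images you pass in.

To try the image understanding capability online, visit the Vision Model page.

How to use

Before you begin, obtain an API key and set it as an environment variable. If you call the model through the OpenAI SDK or the DashScope SDK, also install the latest SDK version; the DashScope Python SDK must be version 1.20.7 or later.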
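The prerequisites above can be checked before the first request is sent. The following is a minimal sketch rather than part of any official SDK: the helper names `version_at_least` and `check_prerequisites` are hypothetical, and it assumes the key is stored in `DASHSCOPE_API_KEY` (as in the examples in this topic) and that the SDK is installed under the distribution name `dashscope`.

```python
import os
from importlib import metadata

MIN_DASHSCOPE_VERSION = "1.20.7"  # minimum version stated in this topic


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare dotted numeric versions such as '1.20.10' and '1.20.7'."""
    def as_tuple(version: str) -> tuple:
        # Keep only the leading numeric parts; stop at suffixes such as 'rc1'.
        parts = []
        for piece in version.split("."):
            if not piece.isdigit():
                break
            parts.append(int(piece))
        return tuple(parts)

    return as_tuple(installed) >= as_tuple(minimum)


def check_prerequisites() -> list:
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    if not os.getenv("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    try:
        installed = metadata.version("dashscope")
    except metadata.PackageNotFoundError:
        # Only relevant if you plan to call the model through the DashScope SDK.
        problems.append("the dashscope package is not installed")
    else:
        if not version_at_least(installed, MIN_DASHSCOPE_VERSION):
            problems.append(
                f"dashscope {installed} is older than {MIN_DASHSCOPE_VERSION}"
            )
    return problems


for problem in check_prerequisites():
    print("Problem:", problem)
```

The comparison is done on integer tuples rather than strings, because a plain string comparison would rank `1.20.10` below `1.20.7`.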

Simple example

OpenAI compatible

You can call Qwen-VL models through the OpenAI SDK or the OpenAI-compatible HTTP interface.

Python

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

completion = client.chat.completions.create(
    model="qwen-vl-max", 
    messages=[
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
            {"type": "text", "text": "這是什麼?"}
        ]}
    ]
)

print(completion.choices[0].message.content)

Sample response

這是一張在海灘上拍攝的照片。照片中,一個人和一隻狗坐在沙灘上,背景是大海和天空。人和狗似乎在互動,狗的前爪搭在人的手上。陽光從畫面的右側照射過來,給整個情境增添了一種溫暖的氛圍。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen-vl-max",
        messages: [{role: "user",content: [
            { type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
            { type: "text", text: "這是什麼?" }
        ]}]
    });
    console.log(response.choices[0].message.content);
}

main()

Sample response

這是一張在海灘上拍攝的照片。照片中,一位穿著格子襯衫的女性坐在沙灘上,與一隻戴著項圈的黃色拉布拉多犬互動。背景是大海和天空,陽光灑在她們身上,營造出溫暖的氛圍。

curl

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen-vl-max",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
      {"type": "text", "text": "這是什麼?"}
    ]
  }]
}'

Sample response

{
  "choices": [
    {
      "message": {
        "content": "這張圖片展示了一位女士和一隻狗在海灘上互動。女士坐在沙灘上,微笑著與狗握手。背景是大海和天空,陽光灑在她們身上,營造出溫暖的氛圍。狗戴著項圈,顯得很溫順。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 1270,
    "completion_tokens": 54,
    "total_tokens": 1324
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen-vl-max",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}

DashScope

You can call Qwen-VL models through the DashScope SDK or HTTP.

Python

import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"text": "這是什麼?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max',
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Sample response

是一張在海灘上拍攝的照片。照片中有一位女士和一隻狗。女士坐在沙灘上,微笑著與狗互動。狗戴著項圈,似乎在與女士握手。背景是大海和天空,陽光灑在她們身上,營造出溫馨的氛圍。

Java

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "這是什麼?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-max")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Sample response

這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一隻戴著項圈的狗。人和狗面對面坐著,似乎在互動。背景是大海和天空,陽光灑在他們身上,營造出溫暖的氛圍。

curl

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "這是什麼?"}
                ]
            }
        ]
    }
}'

Sample response

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一隻戴著項圈的狗。他們坐在沙灘上,背景是大海和天空。陽光從畫面的右側照射過來,給整個情境增添了一種溫暖的氛圍。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 55,
    "input_tokens": 1271,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
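The two interfaces return the answer in different places: the OpenAI-compatible response carries it as a plain string under `choices[0].message.content`, while the DashScope response nests it in a list of parts under `output.choices[0].message.content`. The sketch below reads both shapes from already-parsed JSON. The helper names `extract_text` and `token_usage` are hypothetical, and the sample dictionaries are abbreviated copies of the responses above with the answer text replaced by a placeholder.

```python
from typing import Tuple


def extract_text(response: dict) -> str:
    """Return the assistant's text from either response shape."""
    if "output" in response:
        # DashScope shape: content is a list of parts such as {"text": "..."}.
        parts = response["output"]["choices"][0]["message"]["content"]
        return "".join(part.get("text", "") for part in parts)
    # OpenAI-compatible shape: content is a plain string.
    return response["choices"][0]["message"]["content"]


def token_usage(response: dict) -> Tuple[int, int]:
    """Return (input tokens, output tokens) regardless of field naming."""
    usage = response["usage"]
    if "prompt_tokens" in usage:
        return usage["prompt_tokens"], usage["completion_tokens"]
    return usage["input_tokens"], usage["output_tokens"]


# Abbreviated copies of the two sample responses (answer text is a placeholder).
openai_style = {
    "choices": [{"message": {"role": "assistant", "content": "sample answer"}}],
    "usage": {"prompt_tokens": 1270, "completion_tokens": 54, "total_tokens": 1324},
}
dashscope_style = {
    "output": {
        "choices": [
            {"message": {"role": "assistant", "content": [{"text": "sample answer"}]}}
        ]
    },
    "usage": {"output_tokens": 55, "input_tokens": 1271, "image_tokens": 1247},
}

for sample in (openai_style, dashscope_style):
    print(extract_text(sample), token_usage(sample))
```

Keeping the shape detection in one place means the rest of an application can switch between the two interfaces without changing how it reads results.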

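Multiple image input

You can pass multiple images to a Qwen-VL model in a single request, as shown in the following code.

OpenAI compatible

You can call Qwen-VL models through the OpenAI SDK or the OpenAI-compatible HTTP interface.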

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
                    },
                },
                {"type": "text", "text": "這些是什麼?"},
            ],
        }
    ],
)

print(completion.choices[0].message.content)

Sample response

圖1中是一位女士和一隻拉布拉多犬在海灘上互動的情境。女士穿著格子襯衫,坐在沙灘上,與狗進行握手的動作,背景是海浪和天空,整個畫面充滿了溫馨和愉快的氛圍。

圖2中是一隻老虎在森林中行走的情境。老虎的毛色是橙色和黑色條紋相間,它正向前邁步,周圍是茂密的樹木和植被,地面上覆蓋著落葉,整個畫面給人一種野生自然的感覺。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen-vl-max",
        messages: [{role: "user",content: [
            { type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},
            { type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}},
            { type: "text", text: "這些是什麼?" },
        ]}]
    });
    console.log(response.choices[0].message.content);
}

main()

Sample response

第一張圖片中,一個人和一隻狗在海灘上互動。人穿著格子襯衫,狗戴著項圈,他們似乎在握手或擊掌。

第二張圖片中,一隻老虎在森林中行走。老虎的毛色是橙色和黑色條紋,背景是綠色的樹木和植被。

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
          }
        },
        {
          "type": "text",
          "text": "這些是什麼?"
        }
      ]
    }
  ]
}'

Sample response

{
  "choices": [
    {
      "message": {
        "content": "圖1中是一位女士和一隻拉布拉多犬在海灘上互動的情境。女士穿著格子襯衫,坐在沙灘上,與狗進行握手的動作,背景是海景和日落的天空,整個畫面顯得非常溫馨和諧。\n\n圖2中是一隻老虎在森林中行走的情境。老虎的毛色是橙色和黑色條紋相間,它正向前邁步,周圍是茂密的樹木和植被,地面上覆蓋著落葉,整個畫面充滿了自然的野性和生機。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 2497,
    "completion_tokens": 109,
    "total_tokens": 2606
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen-vl-max",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}

DashScope

You can call Qwen-VL models through the DashScope SDK or HTTP.

Python

import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"},
            {"text": "這些圖是什麼?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max',
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Sample response

這些圖片展示了一些動物和自然情境。第一張圖片中,一個人和一隻狗在海灘上互動。第二張圖片是一隻老虎在森林中行走。第三張圖片是一隻卡通風格的兔子在草地上跳躍。

Java

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"),
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"),
                        Collections.singletonMap("text", "這些圖是什麼?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-max")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Sample response

這些圖片展示了一些動物和自然情境。

1. 第一張圖片:一個女人和一隻狗在海灘上互動。女人穿著格子襯衫,坐在沙灘上,狗戴著項圈,伸出爪子與女人握手。
2. 第二張圖片:一隻老虎在森林中行走。老虎的毛色是橙色和黑色條紋,背景是樹木和樹葉。
3. 第三張圖片:一隻卡通風格的兔子在草地上跳躍。兔子是白色的,耳朵是粉紅色的,背景是藍天和黃色的花朵。

這些圖片展示了不同的動物和自然環境。

curl

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"},
                    {"text": "這些是什麼?"}
                ]
            }
        ]
    }
}'

Sample response

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "這張圖片顯示了一位女士和她的狗在海灘上。她們似乎正在享受彼此的陪伴,狗狗坐在沙灘上伸出爪子與女士握手或互動。背景是美麗的日落景色,海浪輕輕拍打著海岸線。\n\n請注意,我提供的描述基於映像中可見的內容,並不包括任何超出視覺資訊之外的資訊。如果您需要更多關於這個情境的具體細節,請告訴我!"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 81,
    "input_tokens": 1277,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
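In the multi-image requests above, the two interfaces differ only in how each content part is shaped: the OpenAI-compatible format wraps every image as `{"type": "image_url", "image_url": {"url": ...}}`, while the DashScope format uses `{"image": ...}`. The following sketch builds either list from the same inputs. The function names `openai_content` and `dashscope_content` are hypothetical helpers, not SDK functions; the image URLs are the ones used in this topic.

```python
from typing import List


def openai_content(image_urls: List[str], text: str) -> List[dict]:
    """Build the content list for the OpenAI-compatible interface."""
    parts = [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    parts.append({"type": "text", "text": text})
    return parts


def dashscope_content(image_urls: List[str], text: str) -> List[dict]:
    """Build the content list for the DashScope interface."""
    parts = [{"image": url} for url in image_urls]
    parts.append({"text": text})
    return parts


urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
    "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png",
]

# Either list goes into a user message: {"role": "user", "content": <list>}.
openai_message = {"role": "user", "content": openai_content(urls, "這些是什麼?")}
dashscope_message = {"role": "user", "content": dashscope_content(urls, "這些是什麼?")}
print(len(openai_message["content"]), len(dashscope_message["content"]))
```

Both helpers place the images first and the question last, matching the order used in the examples above.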

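Multi-round conversation (using the conversation history)

Qwen-VL models can use the conversation history when generating a reply. The sample code below shows how to implement multi-round conversations through the OpenAI-compatible or DashScope interface.

OpenAI compatible

You can call Qwen-VL models through the OpenAI-compatible HTTP interface to try multi-round conversations.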

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "這是什麼?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "這是一個女孩和一隻狗。"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "寫一首詩描述這個情境"
        }
      ]
    }
  ]
}'

Sample response

{
    "choices": [
        {
            "message": {
                "content": "海風輕拂笑顏開,  \n沙灘上與犬相陪。  \n夕陽斜照人影短,  \n歡樂時光心自醉。",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 1295,
        "completion_tokens": 32,
        "total_tokens": 1327
    },
    "created": 1726324976,
    "system_fingerprint": null,
    "model": "qwen-vl-max",
    "id": "chatcmpl-3c953977-6107-96c5-9a13-c01e328b24ca"
}

DashScope

You can call Qwen-VL models through the DashScope SDK or HTTP to try multi-round conversations.

Python

import os
from dashscope import MultiModalConversation
import dashscope
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {
                "image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
            },
            {"text": "這是什麼?"},
        ],
    }
]
response = MultiModalConversation.call(
    # If the environment variable is not configured, please replace the following line with: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max',
    messages=messages
    )
print(f"模型第一輪輸出 {response.output.choices[0].message.content[0]['text']}")
messages.append(response['output']['choices'][0]['message'])
user_msg = {"role": "user", "content": [{"text": "做一首詩描述這個情境"}]}
messages.append(user_msg)
response = MultiModalConversation.call(
    # If the environment variable is not configured, please replace the following line with: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max',
    messages=messages
    )
print(f"模型第二輪輸出 {response.output.choices[0].message.content[0]['text']}")

Sample response

模型第一輪輸出:這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一隻戴著項圈的狗。人和狗面對面坐著,似乎在互動。背景是大海和天空,陽光灑在他們身上,營造出溫暖的氛圍。
模型第二輪輸出:在陽光照耀的海灘上,人與狗共用歡樂時光。

Java

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    private static final String modelName = "qwen-vl-max";
    public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage systemMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "You are a helpful assistant."))).build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "這是什麼?"))).build();
        List<MultiModalMessage> messages = new ArrayList<>();
        messages.add(systemMessage);
        messages.add(userMessage);
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(modelName)
                .messages(messages)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println("第一輪輸出: "+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
        // Add the model's reply to the conversation history
        messages.add(result.getOutput().getChoices().get(0).getMessage());
        MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "做一首詩描述這個情境"))).build();
        messages.add(msg);
        param.setMessages((List)messages);
        result = conv.call(param);
        System.out.println("第二輪輸出: "+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            MultiRoundConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Sample response

第一輪輸出:這是一張在海灘上拍攝的照片。照片中有一個穿著格子襯衫的人和一隻戴著項圈的狗。人和狗面對面坐著,似乎在互動。背景是大海和天空,陽光灑在他們身上,營造出溫暖的氛圍。
第二輪輸出:在陽光灑滿的海灘上,人與狗共用歡樂時光。

curl

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "這是什麼?"}
                ]
            },
            {
                "role": "assistant",
                "content": [
                    {"text": "這是一隻狗和一隻女孩。"}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"text": "寫一首七言絕句描述這個情境"}
                ]
            }
        ]
    }
}'

Sample response

{
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "text": "海浪輕拍沙灘邊,女孩與狗同嬉戲。陽光灑落笑顏開,快樂時光永銘記。"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "output_tokens": 27,
        "input_tokens": 1298,
        "image_tokens": 1247
    },
    "request_id": "bdf5ef59-c92e-92a6-9d69-a738ecee1590"
}
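In every multi-round example above, the model keeps no state between calls: each request resends the full message list, with the previous assistant reply and the new user question appended. A minimal sketch of that bookkeeping in the DashScope message format follows. The helper name `add_turn` is hypothetical, and no request is sent here; the resulting list is what would be passed as `messages` in the next call.

```python
from typing import List


def add_turn(messages: List[dict], assistant_text: str, next_user_text: str) -> List[dict]:
    """Return a new history with the assistant reply and the next question appended."""
    return messages + [
        {"role": "assistant", "content": [{"text": assistant_text}]},
        {"role": "user", "content": [{"text": next_user_text}]},
    ]


# First round: one image plus a question, as in the examples above.
history = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"text": "這是什麼?"},
        ],
    }
]

# Second round: record the model's first answer, then ask a follow-up.
second_round = add_turn(history, "這是一個女孩和一隻狗。", "寫一首詩描述這個情境")
print([message["role"] for message in second_round])
```

The helper returns a new list instead of mutating its argument, so the earlier history stays available if a request needs to be retried.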

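Streaming output

A large model does not produce its final result all at once; it generates intermediate results step by step, and the final result is their concatenation. With non-streaming output, you must wait until generation finishes, after which the concatenated result is returned. With streaming output, intermediate results are returned in real time, so you can read the output as it is generated and spend less time waiting for the reply.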
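The concatenation step described above can be sketched without calling the service. The function below joins the `delta.content` pieces of OpenAI-compatible stream chunks that have already been parsed into dictionaries. The name `join_stream` is a hypothetical helper, and the sample chunks are abbreviated to the fields the function reads.

```python
from typing import Iterable


def join_stream(chunks: Iterable[dict]) -> str:
    """Concatenate the incremental text carried by streamed chunks."""
    pieces = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices
        content = choices[0].get("delta", {}).get("content")
        if content:  # the first chunk typically has an empty string
            pieces.append(content)
    return "".join(pieces)


# Abbreviated chunks in the OpenAI-compatible streaming shape.
sample_chunks = [
    {"choices": [{"delta": {"content": "", "role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "這"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "是一"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "張"}, "finish_reason": "stop"}]},
]

print(join_stream(sample_chunks))
```

With the OpenAI Python SDK, each streamed chunk object can be converted with `chunk.model_dump()` before it is passed to a function like this one.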

OpenAI compatible

You can call Qwen-VL models through the OpenAI SDK or the OpenAI-compatible HTTP interface to try streaming output.

Python

from openai import OpenAI
import os

client = OpenAI(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {"role": "user",
        "content": [{"type": "image_url",
                    "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "這是什麼?"}]}],
    stream=True
)
full_content = ""
print("流式輸出內容為:")
for chunk in completion:
    print(chunk.model_dump_json())

Sample response

流式輸出內容為:
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"","function_call":null,"refusal":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"這","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"是一","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"張","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"在","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"海灘上拍攝的照片","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"。照片中,","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"一位女士和一隻","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"狗坐在沙灘上","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":",女士正在與","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"狗互動,似乎","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"在握手或擊","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"掌。背景是","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"大海和天空,","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"陽光從畫面的","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"右側照射過來,","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"營造出溫暖的","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"氛圍。","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const completion = await openai.chat.completions.create({
    model: "qwen-vl-max",
    messages: [
        {"role": "user",
        "content": [{"type": "image_url",
                    "image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "這是什麼?"}]}],
    stream: true,
});

let fullContent = ""
console.log("流式輸出內容為:")
for await (const chunk of completion) {
    if (chunk.choices[0].delta.content != null) {
        fullContent += chunk.choices[0].delta.content;
        console.log(chunk.choices[0].delta.content);
    }
}
console.log(`Full output content: ${fullContent}`)

Sample response

{"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"finish_reason":null,"delta":{"content":"這"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"是一"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"張"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"在"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"海灘上拍攝的照片"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"。照片中,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"一位女士和一隻"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"狗坐在沙灘上"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":",女士正在與"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"狗互動,似乎"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"在握手或擊"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"掌。背景是"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"大海和天空,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"陽光從畫面的"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"右側照射過來,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"營造出溫暖的"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"氛圍。"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}

curl

curl --location 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "這是什麼?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true}
}'

Response

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"圖"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"中"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

......

data: {"choices":[{"delta":{"content":"分拍攝的照片。整體氛圍顯得非常"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":"和諧而溫馨。"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361},"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: [DONE]
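
A client that reads this stream without an SDK has to strip the `data:` prefix, stop at `[DONE]`, and join the `delta.content` fragments. The sketch below is a minimal illustration of that, not an official client. The function name `collect_sse_text` and the shortened sample lines are hypothetical, and it assumes every event arrives as one complete `data:` line, as in the response above.

```python
import json


def collect_sse_text(lines):
    """Join delta text from OpenAI-compatible SSE lines and capture the usage block."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        # Blank separator lines and non-data lines carry no payload.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The usage chunk has an empty "choices" list, so this loop is skipped for it.
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage


# Shortened, hypothetical lines modeled on the response above.
sample_lines = [
    'data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"finish_reason":null}],"usage":null}',
    "",
    'data: {"choices":[{"delta":{"content":"圖"},"index":0,"finish_reason":null}],"usage":null}',
    "",
    'data: {"choices":[{"delta":{"content":"中"},"index":0,"finish_reason":"stop"}],"usage":null}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361}}',
    "",
    "data: [DONE]",
]

full_text, usage = collect_sse_text(sample_lines)
print(full_text)
print(usage)
```

The usage block only appears when the request sets `"stream_options":{"include_usage":true}`, so the function returns `None` for it otherwise.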

DashScope

You can call the Qwen-VL model through the DashScope SDK or over HTTP to try streaming output.

Python

import os
from dashscope import MultiModalConversation
import dashscope
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"text": "這是什麼?"}
        ]
    }
]
responses = MultiModalConversation.call(
    # If the environment variable is not configured, replace the next line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen-vl-max',
    messages=messages,
    stream=True,
    incremental_output=True
)
print("Streaming output:")
for response in responses:
    # With incremental_output=True, each chunk carries only the newly generated text
    print(response["output"])

Response

{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "這"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "是一"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "張"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "在"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "海灘上拍攝的照片"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "。照片中有一位"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "女士和一隻狗"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "。女士坐在沙灘"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "上,微笑著"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "與狗互動。"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "狗戴著項圈"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": ",似乎在與"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "女士握手。背景"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "是大海和天空"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": ",陽光灑在"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "她們身上,營造"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "出溫馨的氛圍"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "。"}]}}]}
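
Because `incremental_output=True` makes each chunk carry only new text, the full reply is the concatenation of every chunk's `text` fields. The following is a minimal sketch of that step under stated assumptions: `join_incremental_text` is a hypothetical helper, and it treats each item as a plain dict shaped like the `output` objects printed above rather than as an SDK response object.

```python
def join_incremental_text(outputs):
    """Concatenate the text fragments of incremental streaming outputs."""
    parts = []
    for output in outputs:
        for choice in output.get("choices", []):
            # A chunk may carry an empty content list, which contributes nothing.
            for item in choice["message"].get("content", []):
                parts.append(item.get("text", ""))
    return "".join(parts)


# Hypothetical sample: three chunks from the response above plus an empty chunk.
sample_outputs = [
    {"choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "這"}]}}]},
    {"choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "是一"}]}}]},
    {"choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "張"}]}}]},
    {"choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": []}}]},
]

full_content = join_incremental_text(sample_outputs)
print(full_content)
```

In the streaming loop, the same idea amounts to collecting each `response["output"]` in a list and joining the text after the loop ends.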

Java

import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
    }
    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // must create mutable map.
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg");}},
                        new HashMap<String, Object>(){{put("text", "這是什麼");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // If the environment variable is not configured, replace the next line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-max")
                .message(userMessage)
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            try {
                System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
            } catch (Exception e){
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":1},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"這"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":2},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"是一"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":3},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"張"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":4},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"在"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":8},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"海灘上拍攝的照片"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":12},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"。照片中,"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":16},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"一位穿著格子"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":20},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"襯衫的女士坐在"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":24},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"沙灘上,與"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":28},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"一隻戴著項圈"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":32},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"的金毛犬"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":36},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"互動。背景是"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":40},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"大海和天空,"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":44},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"陽光灑在她們"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":48},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"身上,營造出"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":52},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"溫暖的氛圍。"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":53},"output":{"choices":[{"finish_reason":"stop","message":{"role":"assistant","content":[]}}]}}

curl

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen-vl-plus",
    "input":{
        "messages":[
            {
                "role": "system",
                "content": [
                    {"text": "You are a helpful assistant."}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "這是什麼?"}
                ]
            }
        ]
    },
    "parameters": {
        "incremental_output": true
    }
}'

Response

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"這張"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":1,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"圖片"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":2,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

......

id:17
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"的欣賞。這是一個溫馨的畫面,展示了"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":112,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:18
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"人與動物之間深厚的情感紐帶。"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":120,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:19
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"input_tokens":1276,"output_tokens":121,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}
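
Each event in this response is a block of `id:`, `event:`, and `data:` fields, plus a comment line that starts with a colon, and blocks are separated by a blank line. The sketch below shows one way to split such a stream into events and rebuild the reply. It is an illustration only: `parse_sse_events` and the shortened sample (including its third event) are hypothetical, and it assumes `\n` line endings and one `data:` line per event.

```python
import json


def parse_sse_events(raw):
    """Split a DashScope-style SSE body into events with decoded JSON data."""
    events = []
    for block in raw.strip().split("\n\n"):
        event = {}
        for line in block.splitlines():
            # Lines starting with a colon are SSE comments, e.g. ":HTTP_STATUS/200".
            if not line or line.startswith(":"):
                continue
            field, _, value = line.partition(":")
            event[field] = value.strip()
        if "data" in event:
            event["data"] = json.loads(event["data"])
            events.append(event)
    return events


# Shortened, hypothetical sample modeled on the response above.
SAMPLE = (
    'id:1\nevent:result\n:HTTP_STATUS/200\n'
    'data:{"output":{"choices":[{"message":{"content":[{"text":"這張"}],"role":"assistant"},"finish_reason":"null"}]}}\n'
    '\n'
    'id:2\nevent:result\n:HTTP_STATUS/200\n'
    'data:{"output":{"choices":[{"message":{"content":[{"text":"圖片"}],"role":"assistant"},"finish_reason":"null"}]}}\n'
    '\n'
    'id:3\nevent:result\n:HTTP_STATUS/200\n'
    'data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"stop"}]}}\n'
)

events = parse_sse_events(SAMPLE)
reply = "".join(
    item["text"]
    for event in events
    for choice in event["data"]["output"]["choices"]
    for item in choice["message"]["content"]
)
print(reply)
```

The last event has an empty `content` list and `finish_reason` set to `"stop"`, so a client can use it as the end-of-stream signal.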

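Using local files

The following sample code shows how to call the Qwen-VL model on a local file through the OpenAI-compatible interface or DashScope.

OpenAI compatible

You can pass a local image as Base64-encoded data and call the Qwen-VL model through the OpenAI-compatible interface.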

Python

from openai import OpenAI
import os
import base64


# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("test.png")
client = OpenAI(
    # If the environment variable is not configured, replace the next line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    # Build a data URL with an f-string; the MIME type (image/png) matches the test.png file.
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "這是什麼?"},
            ],
        }
    ],
)
print(completion.choices[0].message.content)

Response

這是一隻飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,擅長在高空翱翔和捕獵。圖片中的鷹展翅高飛,背景是藍天白雲,顯得非常壯觀。

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';


const openai = new OpenAI(
    {
        // If the environment variable is not configured, replace the next line with your Model Studio API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

// Read the image file and encode it as a Base64 string
const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image = encodeImage("test.png");
async function main() {
    const completion = await openai.chat.completions.create({
        model: "qwen-vl-max",
        messages: [
            {"role": "user",
            "content": [{"type": "image_url",
                        // The MIME type (image/png) matches the test.png file
                        "image_url": {"url": `data:image/png;base64,${base64Image}`},},
                        {"type": "text", "text": "這是什麼?"}]}]
    });
    console.log(completion.choices[0].message.content);
} 

main();

Response

這是一隻飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,能夠在高空翱翔並捕獵獵物。圖片中的鷹展翅高飛,背景是藍天白雲,顯得非常壯觀。

HTTP

import os
import base64
import requests

# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("test.png")
# If the environment variable is not configured, replace the next line with your Model Studio API key: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
payload = {
    "model": "qwen-vl-max",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    # The MIME type (image/png) matches the test.png file
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "這是什麼?"},
            ],
        }
    ],
}
response = requests.post(
    "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
    headers=headers,
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])

Response

這是一隻飛翔的鷹。鷹是一種猛禽,通常具有強壯的翅膀和銳利的爪子,能夠在高空翱翔並捕獵獵物。圖片中的鷹展翅高飛,背景是藍天白雲,顯得非常壯觀。
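
The examples in this section hard-code the MIME type of the data URL. When the file type varies, one option is to derive it from the file extension. The sketch below uses Python's standard `mimetypes` module for that. `to_data_url` is a hypothetical helper, not part of either SDK, and the demo writes placeholder bytes rather than a real image.

```python
import base64
import mimetypes
import os
import tempfile


def to_data_url(image_path):
    """Return a Base64 data URL whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demo with a throwaway file; the bytes are placeholders, not a real image.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "test.png")
    with open(demo_path, "wb") as demo_file:
        demo_file.write(b"placeholder bytes")
    demo_url = to_data_url(demo_path)

print(demo_url.split(",", 1)[0])
```

The returned string can be used as the `url` value of an `image_url` content item. Extensions that `mimetypes` does not recognize raise a `ValueError` instead of producing a mislabeled URL.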

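Application examples

Recognize objects, people, expressions, actions, and scenes in an image, and understand what the image means

Input sample

Image: measuring height

Prompt: What is in the picture? (圖片裡是什麼?)

Output sample

The picture shows a mother measuring her child's height. The mother is crouching on the floor and marking the child's height on the wall with a pen. The child is standing next to a yellow height chart and looks very happy. The background is a bright room with some furniture and decorations.

Sample code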

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01lMJMCv29E5O54B7y5_!!6000000008035-0-tps-1024-683.jpg"},
                    {"text": "圖片裡是什麼?"}
                ]
            }
        ]
    }
}'

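Input sample

image

Prompt: What is the picture trying to express? (圖片想表達什麼?)

Output sample

By showing the whole iceberg, both the part above the water and the part below it, this picture conveys the meaning of the idiom "the tip of the iceberg". The idiom usually describes the surface appearance of something, while its true nature or deeper problems are often hidden beneath the surface, just as most of an iceberg's volume is hidden underwater. The picture may be saying that when people face a problem or a situation, they should not look only at the surface but should dig into the truth and essence behind it.

Sample code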

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01iIByHc1z7zxtfOhuy_!!6000000006668-0-tps-2334-1558.jpg"},
                    {"text": "圖片想表達什麼?"}
                ]
            }
        ]
    }
}'

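Solve problems and translate text from images

Input sample

Image: 巴巴

Prompt: Translate the text in the picture into Chinese (把圖片裡的文字翻譯成中文)

Output sample

image

Sample code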

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i2/O1CN01tLcqfB1ghHmWEifx5_!!6000000004173-0-tps-1332-406.jpg"},
                    {"text": "把圖片裡的文字翻譯成中文?"}
                ]
            }
        ]
    }
}'

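Generate front-end code from an image

Input sample

image

Prompt: Convert this into an interface built with HTML, CSS, and JavaScript. (轉換成HTML、CSS和基於JavaScript的介面。)

Output sample

Model output

To convert this interface into HTML, CSS, and JavaScript, you can follow these steps: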

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>阿里雲百鍊</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <div class="container">
        <h1>你好,歡迎使用阿里雲百鍊</h1>
        <div class="search-bar">
            <div class="selected-model">
                已選擇
                <div class="model-selector">
                    <img src="model-icon.png" alt="Model Icon">
                    通義千問-Max
                    <span class="dropdown-icon">▼</span>
                </div>
            </div>
            <input type="text" placeholder="輸入你的問題開啟模型體驗吧" class="search-input">
            <button class="search-button">發送</button>
        </div>
        <div class="features">
            <div class="feature feature-1">
                <h2>模型調用</h2>
                <p>結合業務需求,選擇適合的大模型,支援API/SDK的調用方式</p>
                <img src="feature-1-icon.png" alt="Feature 1 Icon">
                <div class="feature-buttons">
                    <a href="#" class="button">立即調用</a>
                    <a href="#" class="button">查看教程</a>
                </div>
            </div>
            <div class="feature feature-2">
                <h2>應用建立</h2>
                <p>基於Assistant API,支援調用Prompt、外掛程式、RAG等多種能力,完成大模型應用的建立。</p>
                <img src="feature-2-icon.png" alt="Feature 2 Icon">
                <div class="feature-buttons">
                    <a href="#" class="button">立即建立</a>
                    <a href="#" class="button">查看教程</a>
                </div>
            </div>
        </div>
    </div>
    <script src="script.js"></script>
</body>
</html>

/* styles.css */
body {
    font-family: Arial, sans-serif;
    background: linear-gradient(to bottom, #f8f9fa, #e9ecef);
    color: #343a40;
}

.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 20px;
}

h1 {
    text-align: center;
    margin-bottom: 30px;
}

.search-bar {
    display: flex;
    align-items: center;
    margin-bottom: 30px;
}

.selected-model {
    display: flex;
    align-items: center;
    margin-right: 10px;
}

.model-selector {
    display: flex;
    align-items: center;
    border: 1px solid #ced4da;
    border-radius: 5px;
    padding: 5px 10px;
    cursor: pointer;
}

.model-selector img {
    width: 20px;
    height: 20px;
    margin-right: 5px;
}

.dropdown-icon {
    margin-left: 5px;
}

.search-input {
    flex: 1;
    padding: 10px;
    border: 1px solid #ced4da;
    border-radius: 5px;
    margin-right: 10px;
}

.search-button {
    padding: 10px 20px;
    background-color: #6c757d;
    color: #fff;
    border: none;
    border-radius: 5px;
    cursor: pointer;
}

.features {
    display: flex;
    justify-content: space-between;
}

.feature {
    flex: 1;
    padding: 20px;
    border-radius: 5px;
    margin: 10px;
}

.feature-1 {
    background-color: #5a5aff;
    color: #fff;
}

.feature-2 {
    background-color: #48dbfb;
    color: #fff;
}

.feature h2 {
    margin-bottom: 10px;
}

.feature p {
    margin-bottom: 20px;
}

.feature img {
    width: 100px;
    height: 100px;
    margin: 0 auto;
}

.feature-buttons {
    display: flex;
    justify-content: space-between;
}

.button {
    padding: 5px 10px;
    border: none;
    border-radius: 5px;
    color: #fff;
    text-decoration: none;
}

.button:hover {
    background-color: #495057;
}

// script.js
document.addEventListener('DOMContentLoaded', function() {
    const modelSelector = document.querySelector('.model-selector');
    const dropdownIcon = document.querySelector('.dropdown-icon');

    modelSelector.addEventListener('click', function() {
        dropdownIcon.classList.toggle('open');
    });
});

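This HTML, CSS, and JavaScript code creates an interface similar to the one shown in the picture. You can adjust the styles and functionality as needed.

The front-end interface rendered from the code provided by the model is shown below: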

image

Supported images

Image format    Content Type    File extension
BMP             image/bmp       .bmp
DIB             image/bmp       .dib
ICNS            image/icns      .icns
ICO             image/x-icon    .ico
JPEG            image/jpeg      .jfif, .jpe, .jpeg, .jpg
JPEG2000        image/jp2       .j2c, .j2k, .jp2, .jpc, .jpf, .jpx
PNG             image/png       .apng, .png
SGI             image/sgi       .bw, .rgb, .rgba, .sgi
TIFF            image/tiff      .tif, .tiff
WEBP            image/webp      .webp

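The following limits apply to input images:

  • The image file size must not exceed 10 MB.

  • For qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-0809, qwen-vl-plus-latest, qwen-vl-plus-0809, and qwen2-vl-7b-instruct, a single image may contain at most 12M pixels in total, which is enough for a standard 4K image. For qwen-vl-max-0201 and qwen-vl-plus, a single image may contain at most 1,048,576 pixels in total, which is the pixel count of an image 1024 pixels wide and 1024 pixels high.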
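
These limits can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not an official validator. `check_image_limits` is a hypothetical helper that takes the file size and dimensions as plain numbers, and it assumes "10MB" means 10 × 1024 × 1024 bytes and "12M" means 12,000,000 pixels, neither of which the limits above spell out.

```python
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
LARGE_PIXEL_LIMIT = 12_000_000     # assumption: "12M" read as 12,000,000 pixels
SMALL_PIXEL_LIMIT = 1024 * 1024    # 1048576 pixels, as stated in the limits

LARGE_LIMIT_MODELS = {
    "qwen-vl-max", "qwen-vl-max-latest", "qwen-vl-max-0809",
    "qwen-vl-plus-latest", "qwen-vl-plus-0809", "qwen2-vl-7b-instruct",
}
SMALL_LIMIT_MODELS = {"qwen-vl-max-0201", "qwen-vl-plus"}


def check_image_limits(model, file_bytes, width, height):
    """Return a list of limit violations; an empty list means the image fits."""
    if model in LARGE_LIMIT_MODELS:
        pixel_limit = LARGE_PIXEL_LIMIT
    elif model in SMALL_LIMIT_MODELS:
        pixel_limit = SMALL_PIXEL_LIMIT
    else:
        raise ValueError(f"No pixel limit is listed for model {model!r}")
    problems = []
    if file_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {file_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if width * height > pixel_limit:
        problems.append(f"image has {width * height} pixels, limit is {pixel_limit}")
    return problems


# A 3840 x 2160 (4K) image of about 2 MB, checked against two models.
print(check_image_limits("qwen-vl-max", 2_000_000, 3840, 2160))
print(check_image_limits("qwen-vl-plus", 2_000_000, 3840, 2160))
```

The function reports violations instead of raising, so a caller can decide whether to resize the image or switch to a model with the larger pixel limit.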

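FAQ

Can I delete an uploaded image?

A: After the model finishes generating text, the Model Studio server deletes the image automatically. No manual deletion is needed.

Does Qwen-VL support understanding video content?

A: Currently, Qwen-VL models cannot process video content.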

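API reference

For the input and output parameters of the Qwen-VL model, see Qwen.

Error codes

If a model call fails and returns an error message, see Error codes for troubleshooting.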