通過EAS一鍵部署MLLM多模態大語言模型應用 - Platform For AI

多模態大語言模型（Multimodal Large Language Model, MLLM）能夠同時處理多種模態的資料，將文本、映像、音頻等不同類型的資訊進行融合，從而更全面地理解複雜的情境和任務。適用於需要跨模態理解與產生的情境。通過EAS，您可以在5分鐘內一鍵部署MLLM推理服務應用，獲得大模型的推理能力。本文為您介紹如何通過EAS一鍵部署和調用MLLM推理服務。

背景資訊

近年來，各類大語言模型（LLM）在語言任務中達到了前所未有的效果，不僅擅長產生自然語言文本，還在情感分析、機器翻譯和文本摘要等多任務中展現了強大的能力。然而，這些模型局限在文本資料，難以處理其他形式的資訊如映像、音頻或視頻，只有擁有多模態理解，模型才能更加接近人類的超級大腦。

因此，多模態大語言模型（Multimodal Large Language Model, MLLM）引發了研究熱潮，隨著GPT-4o等大模型在業界的廣泛應用，MLLM成為當前熱門的應用之一。這種新型的大語言模型能夠同時處理多種模態的資料，將文本、映像、音頻等不同類型的資訊進行融合，從而更全面地理解複雜的情境和任務。

當您需要自動化部署MLLM時，EAS為您提供了一鍵式解決方案。通過EAS，您可以在5分鐘內一鍵部署流行的MLLM推理服務應用，獲得大模型的推理能力。

前提條件

已開通PAI並建立預設工作空間，詳情請參見開通PAI並建立預設工作空間。
如果使用RAM使用者來部署模型，需要為RAM使用者授予EAS的系統管理權限，詳情請參見雲產品依賴與授權：EAS。

部署EAS服務

登入PAI控制台，在頁面上方選擇目標地區，並在右側選擇目標工作空間，然後單擊進入EAS。
單擊部署服務，然後在自訂模型部署地區，單擊自訂部署。

在自訂部署頁面，配置以下關鍵參數，其他參數配置說明，請參見服務部署：控制台。

參數		描述
環境資訊	部署方式	選擇鏡像部署，並選中開啟Web應用。
	鏡像配置	在官方鏡像列表中選擇chat-mllm-webui>chat-mllm-webui:1.0。說明由於版本迭代迅速，部署時鏡像版本選擇最高版本即可。
	運行命令	選擇鏡像後，系統會自動設定運行命令。您可以通過修改model_type來支援部署不同的模型，支援的模型列表如下表所示。
資源部署	部署資源	選擇GPU類型的資源規格，推薦使用ml.gu7i.c16m60.1-gu30（性價比最高）。

模型列表

model_type	模型連結
qwen_vl_chat	qwen/Qwen-VL-Chat
qwen_vl_chat_int4	qwen/Qwen-VL-Chat-Int4
qwen_vl	qwen/Qwen-VL
glm4v_9b_chat	ZhipuAI/glm-4v-9b
llava1_5-7b-instruct	swift/llava-1___5-7b-hf
llava1_5-13b-instruct	swift/llava-1___5-13b-hf
internvl_chat_v1_5_int8	AI-ModelScope/InternVL-Chat-V1-5-int8
internvl-chat-v1_5	AI-ModelScope/InternVL-Chat-V1-5
mini-internvl-chat-2b-v1_5	OpenGVLab/Mini-InternVL-Chat-2B-V1-5
mini-internvl-chat-4b-v1_5	OpenGVLab/Mini-InternVL-Chat-4B-V1-5
internvl2-2b	OpenGVLab/InternVL2-2B
internvl2-4b	OpenGVLab/InternVL2-4B
internvl2-8b	OpenGVLab/InternVL2-8B
internvl2-26b	OpenGVLab/InternVL2-26B
internvl2-40b	OpenGVLab/InternVL2-40B

參數配置完成後，單擊部署。

調用服務

啟動WebUI進行模型推理

單擊目標服務的服務方式列下的查看Web應用。
在WebUI頁面，進行模型推理驗證。

使用API進行模型推理

擷取服務訪問地址和Token。
1. 進入模型線上服務（EAS）頁面，詳情請參見背景資訊。
2. 在該頁面中，單擊目標服務名稱進入服務詳情頁面。
3. 在基本資料地區單擊查看調用資訊，在公網地址調用頁簽擷取服務Token和訪問地址。

使用API進行模型推理。

PAI提供了以下三個API介面：

infer forward

獲得推理結果。範例程式碼如下，以Python為例：

import requests
import json
import base64


def post_get_history(url='http://127.0.0.1:7860', headers=None):
    r = requests.post(f'{url}/get_history', headers=headers, timeout=1500)
    data = r.content.decode('utf-8')
    return data


def post_infer(prompt, image=None, chat_history=[], temperature=0.2, top_p=0.7, max_output_tokens=512, use_stream = True, url='http://127.0.0.1:7860', headers={}):
    datas = {
        "prompt": prompt,
        "image": image,
        "chat_history": chat_history,
        "temperature": temperature,
        "top_p": top_p,
        "max_output_tokens": max_output_tokens,
        "use_stream": use_stream,
    }

    if use_stream:
        headers.update({'Accept': 'text/event-stream'})

        response = requests.post(f'{url}/infer_forward', json=datas, headers=headers, stream=True, timeout=1500)

        if response.status_code != 200:
            print(f"Request failed with status code {response.status_code}")
            return
        process_stream(response)

    else:
        r = requests.post(f'{url}/infer_forward', json=datas, headers=headers, timeout=1500)
        data = r.content.decode('utf-8')

        print(data)


def image_to_base64(image_path):
    """
    Convert an image file to a Base64 encoded string.

    :param image_path: The file path to the image.
    :return: A Base64 encoded string representation of the image.
    """
    with open(image_path, "rb") as image_file:
        # Read the binary data of the image
        image_data = image_file.read()
        # Encode the binary data to Base64
        base64_encoded_data = base64.b64encode(image_data)
        # Convert bytes to string and remove any trailing newline characters
        base64_string = base64_encoded_data.decode('utf-8').replace('\n', '')
    return base64_string


def process_stream(response, previous_text=""):
    MARK_RESPONSE_END = '##END'  # DONOT CHANGE
    buffer = previous_text
    current_response = ""

    for chunk in response.iter_content(chunk_size=100):
        if chunk:
            text = chunk.decode('utf-8')
            current_response += text

            parts = current_response.split(MARK_RESPONSE_END)
            for part in parts[:-1]:
                new_part = part[len(previous_text):]
                if new_part:
                    print(new_part, end='', flush=True)

                previous_text = part

            current_response = parts[-1]

    remaining_new_text = current_response[len(previous_text):]
    if remaining_new_text:
        print(remaining_new_text, end='', flush=True)


if __name__ == '__main__':
    hosts = 'xxx'
    head = {
        'Authorization': 'xxx'
    }

    # get chat history
    chat_history = json.loads(post_get_history(url=hosts, headers=head))['chat_history']

    prompt = 'Please describe the image'
    image_path = 'path_to_your_image'
    image_base_64 = image_to_base64(image_path)

    post_infer(prompt = prompt, image = image_base_64, chat_history = chat_history, use_stream=False, url=hosts, headers=head)

其中：

關鍵參數配置說明如下：

參數	描述
hosts	配置為步驟1中擷取的服務訪問地址。
authorization	配置為步驟1中擷取的服務Token。
prompt	提問內容，建議使用英文描述。
image_path	圖片所在的本地路徑。

輸入的請求參數列表如下：

參數	類型	說明	預設值
prompt	String	提問內容。	無，必須提供
image	Base64編碼格式	輸入圖片。	None
chat_history	List[List]	聊天歷史。	[]
temperature	Float	用於調節模型輸出結果的隨機性，值越大隨機性越強，0值為固定輸出。區間為0~1。	0.2
top_p	Float	從產生結果中按百分比選擇輸出結果。	0.7
max_output_tokens	Int	產生輸出Token的最大長度，單位為個。	512
use_stream	Bool	是否使用流式輸出： True False	True

輸出為問答的結果（字串）。

get chat history

擷取歷史聊天記錄。範例程式碼如下，以Python為例：

import requests
import json

def post_get_history(url='http://127.0.0.1:7860', headers=None):
    r = requests.post(f'{url}/get_history', headers=headers, timeout=1500)
    data = r.content.decode('utf-8')
    return data


if __name__ == '__main__':
    hosts = 'xxx'
    head = {
        'Authorization': 'xxx'
    }

    chat_history = json.loads(post_get_history(url=hosts, headers=head))['chat_history']
    print(chat_history)

其中：

關鍵參數配置說明如下：
參數
描述
hosts
配置為步驟1已擷取的服務訪問地址。
authorization
配置為步驟1已擷取的服務Token。
無需輸入參數。
輸出參數列表如下：
參數
類型
說明
chat_history
List[List]
對話歷史。

clear chat history

清空歷史聊天記錄。範例程式碼如下，以Python為例：

import requests
import json


def post_clear_history(url='http://127.0.0.1:7860', headers=None):
    r = requests.post(f'{url}/clear_history', headers=headers, timeout=1500)
    data = r.content.decode('utf-8')
    return data


if __name__ == '__main__':
    hosts = 'xxx'
    head = {
        'Authorization': 'xxx'
    }

    clear_info = post_clear_history(url=hosts, headers=head)
    print(clear_info)

其中：

關鍵參數配置說明如下：
參數
描述
hosts
配置為步驟1中擷取的服務訪問地址。
authorization
配置為步驟1中擷取的服務Token。
無需輸入參數。
返回結果為success字串。