Deploy and call multimodal large language models through EAS for image and text processing.
Overview
Multimodal Large Language Models (MLLMs) process text, images, and audio simultaneously, integrating different data types for complex contexts and tasks. EAS enables one-click MLLM deployment in 5 minutes.
Prerequisites
- Activate PAI and create a default workspace. See Activate PAI and create a default workspace.
- If you use a RAM user to deploy the model, grant the RAM user management permissions for EAS. See Cloud product dependencies and authorization: EAS.
Deploy a model service
- Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
- Click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.
- On the Custom Deployment page, configure the following parameters. For other parameters, see Parameters for custom deployment in the console.
Environment Information
- Deployment Method: Select Image-based Deployment and select Enable Web App.
- Image Configuration: Select Alibaba Cloud Image > chat-mllm-webui > chat-mllm-webui:1.0.
  Note: Select the latest image version.
- Command: After you select an image, this parameter is automatically configured. Modify the model_type parameter to deploy a different model. See the supported model types.
Resource Information
- Deployment Resources: Select a GPU instance type. The ml.gu7i.c16m60.1-gu30 instance type is the most cost-effective.
- Click Deploy.
Call a service
Use WebUI for model inference
- On the Elastic Algorithm Service (EAS) page, click the service name, click Web application in the upper-right corner, and follow the instructions to open the WebUI.
- On the WebUI page, perform model inference.

Use API for model inference
- Obtain the endpoint and token.
  - On the Elastic Algorithm Service (EAS) page, click the service name. In the Basic Information section, click View Endpoint Information.
  - In the Invocation Information pane, obtain the token and endpoint.
- Call APIs for model inference.
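Each of the APIs below is invoked as an HTTP POST to the service endpoint, with the token passed in the Authorization header. A minimal sketch of that shared request pattern (the placeholder values are assumptions, not real credentials):

```python
def build_request(api_path, service_url, token):
    """Build the target URL and Authorization header shared by the
    get_history, infer_forward, and clear_history APIs."""
    return f'{service_url}/{api_path}', {'Authorization': token}

# Usage with the requests package (placeholders; needs a deployed service):
# import requests
# url, headers = build_request('get_history', '<service_url>', '<token>')
# r = requests.post(url, headers=headers, timeout=1500)
# print(r.content.decode('utf-8'))
```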
Available APIs:
Get inference result
Obtain inference result.
Note: WebUI and API calls cannot be used simultaneously. If you have already used the WebUI, call clear_history before calling infer_forward.
Replace the following parameters in the sample code:
- hosts: The service endpoint obtained in Step 1.
- authorization: The service token obtained in Step 1.
- prompt: The question content. English is recommended.
- image_path: The local path of the image.
Python code example for model inference:
```python
import requests
import json
import base64


def post_get_history(url='http://127.0.0.1:7860', headers=None):
    r = requests.post(f'{url}/get_history', headers=headers, timeout=1500)
    data = r.content.decode('utf-8')
    return data


def post_infer(prompt, image=None, chat_history=[], temperature=0.2, top_p=0.7,
               max_output_tokens=512, use_stream=True,
               url='http://127.0.0.1:7860', headers={}):
    datas = {
        "prompt": prompt,
        "image": image,
        "chat_history": chat_history,
        "temperature": temperature,
        "top_p": top_p,
        "max_output_tokens": max_output_tokens,
        "use_stream": use_stream,
    }
    if use_stream:
        headers.update({'Accept': 'text/event-stream'})
        response = requests.post(f'{url}/infer_forward', json=datas,
                                 headers=headers, stream=True, timeout=1500)
        if response.status_code != 200:
            print(f"Request failed with status code {response.status_code}")
            return
        process_stream(response)
    else:
        r = requests.post(f'{url}/infer_forward', json=datas, headers=headers, timeout=1500)
        data = r.content.decode('utf-8')
        print(data)


def image_to_base64(image_path):
    """
    Convert an image file to a Base64 encoded string.

    :param image_path: The file path to the image.
    :return: A Base64 encoded string representation of the image.
    """
    with open(image_path, "rb") as image_file:
        # Read the binary data of the image
        image_data = image_file.read()
    # Encode the binary data to Base64
    base64_encoded_data = base64.b64encode(image_data)
    # Convert bytes to string and remove any trailing newline characters
    base64_string = base64_encoded_data.decode('utf-8').replace('\n', '')
    return base64_string


def process_stream(response, previous_text=""):
    MARK_RESPONSE_END = '##END'  # DO NOT CHANGE
    current_response = ""
    for chunk in response.iter_content(chunk_size=100):
        if chunk:
            text = chunk.decode('utf-8')
            current_response += text
            parts = current_response.split(MARK_RESPONSE_END)
            for part in parts[:-1]:
                new_part = part[len(previous_text):]
                if new_part:
                    print(new_part, end='', flush=True)
                previous_text = part
            current_response = parts[-1]
    remaining_new_text = current_response[len(previous_text):]
    if remaining_new_text:
        print(remaining_new_text, end='', flush=True)


if __name__ == '__main__':
    # Replace <service_url> with the service endpoint.
    hosts = '<service_url>'
    # Replace <token> with the service token.
    head = {
        'Authorization': '<token>'
    }
    # Get the chat history.
    chat_history = json.loads(post_get_history(url=hosts, headers=head))['chat_history']
    # The content of the question. A question in English is recommended.
    prompt = 'Please describe the image'
    # Replace path_to_your_image with the local path of the image.
    image_path = 'path_to_your_image'
    image_base_64 = image_to_base64(image_path)
    post_infer(prompt=prompt, image=image_base_64, chat_history=chat_history,
               use_stream=False, url=hosts, headers=head)
```
Get chat history
Obtain chat history.
- Replace the following parameters in the sample code:
  - hosts: The service endpoint obtained in Step 1.
  - authorization: The service token obtained in Step 1.
- No input parameters are required.
- Output parameters:
  - chat_history (List[List]): The conversation history.
Python code example for obtaining the chat history:
```python
import requests
import json


def post_get_history(url='http://127.0.0.1:7860', headers=None):
    r = requests.post(f'{url}/get_history', headers=headers, timeout=1500)
    data = r.content.decode('utf-8')
    return data


if __name__ == '__main__':
    # Replace <service_url> with the service endpoint.
    hosts = '<service_url>'
    # Replace <token> with the service token.
    head = {
        'Authorization': '<token>'
    }
    chat_history = json.loads(post_get_history(url=hosts, headers=head))['chat_history']
    print(chat_history)
```
Clear chat history
Clear chat history.
- Replace the following parameters in the sample code:
  - hosts: The service endpoint obtained in Step 1.
  - authorization: The service token obtained in Step 1.
- No input parameters are required.
- Returns: success.
Python code example for clearing the chat history:
```python
import requests


def post_clear_history(url='http://127.0.0.1:7860', headers=None):
    r = requests.post(f'{url}/clear_history', headers=headers, timeout=1500)
    data = r.content.decode('utf-8')
    return data


if __name__ == '__main__':
    # Replace <service_url> with the service endpoint.
    hosts = '<service_url>'
    # Replace <token> with the service token.
    head = {
        'Authorization': '<token>'
    }
    clear_info = post_clear_history(url=hosts, headers=head)
    print(clear_info)
```
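When use_stream is True, infer_forward returns incremental response snapshots separated by the ##END marker, which the process_stream function in the inference example reassembles, printing only the newly generated text of each snapshot. The same parsing idea can be exercised offline with a simulated chunk stream (the sample chunks below are illustrative, not real service output):

```python
def parse_stream(chunks, previous_text=""):
    """Reassemble ##END-delimited snapshots and collect only the new text,
    mirroring the logic of process_stream without a live HTTP response."""
    MARK_RESPONSE_END = '##END'
    current_response = ""
    output = []
    for chunk in chunks:
        current_response += chunk.decode('utf-8')
        parts = current_response.split(MARK_RESPONSE_END)
        for part in parts[:-1]:
            # Each snapshot repeats earlier text; keep only the new suffix.
            new_part = part[len(previous_text):]
            if new_part:
                output.append(new_part)
            previous_text = part
        current_response = parts[-1]
    remaining = current_response[len(previous_text):]
    if remaining:
        output.append(remaining)
    return ''.join(output)

# Simulated stream: each snapshot extends the previous one.
chunks = [b'The image##ENDThe image shows', b'##ENDThe image shows a cat.##END']
print(parse_stream(chunks))  # -> The image shows a cat.
```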