Real-time workflows allow you to access multimodal large language models (MLLMs) that comply with the OpenAI API specifications.
Access self-developed MLLMs that comply with the OpenAI API specifications
You can integrate self-developed MLLMs that comply with the OpenAI API specifications into your workflow. You can request such models only in streaming mode.
To integrate an MLLM into a workflow, set the Select Model parameter to Access Self-developed Model (Based on OpenAPI Specifications) and configure the following parameters in the configuration panel of the MLLM node. The sketch after the table shows how these parameters map to an OpenAI-style client request.
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| ModelId | String | Yes | The model name. This parameter corresponds to the model field in the OpenAI API specifications. | abc |
| API-KEY | String | Yes | The authentication information. This parameter corresponds to the api_key field in the OpenAI API specifications. | AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI |
| HTTPS URL of Destination Model | String | Yes | The service request URL. This parameter corresponds to the base_url field in the OpenAI API specifications. | http://www.abc.com |
| Maximum Number of Images per Call | Integer | Yes | The maximum number of images in a single request to the MLLM. Some MLLMs accept only a fixed number of image frames per request; set this parameter for those models. When the workflow requests the MLLM, frames are sampled from the video based on this value. | 15 |
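For reference, the following minimal sketch shows how these parameters map onto an OpenAI-style client request. The openai Python SDK is an assumption for illustration, and the values are the example values from the table; at run time, the workflow assembles the equivalent POST request itself.

```python
# A minimal sketch, assuming the openai Python SDK (>= 1.0) as the client.
# The values below are the example values from the table above.
from openai import OpenAI

client = OpenAI(
    api_key="AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI",  # API-KEY
    base_url="http://www.abc.com",  # HTTPS URL of Destination Model
)

# Only streaming mode is supported, so stream=True is required.
stream = client.chat.completions.create(
    model="abc",  # ModelId
    stream=True,
    messages=[
        {"role": "user", "content": "What is the weather like today?"},
    ],
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")
```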
When the real-time workflow runs, it assembles the request data based on the OpenAI API specifications into a POST request, sends the request to the HTTPS URL of the self-developed model that you configured, and obtains the result. The following table describes the input parameters. A minimal sketch of an endpoint that accepts this request follows the table.
| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| messages | Array | The context of historical conversations. A maximum of 20 context records can be retained; records at the top of the array are the earliest questions or answers. Only JPEG Base64-encoded data after frame extraction can be passed as image content, and image data in historical conversations is not delivered as context. | `[ { "role": "user", "content": "What is the weather like today?" }, { "role": "assistant", "content": "It is sunny today." }, { "role": "user", "content": "What will the weather be like tomorrow?" }, { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,xxxx" } }, { "type": "text", "text": "What is this?" } ] } ]` |
| model | String | The model name. | abc |
| stream | Boolean | Specifies whether to access the model in streaming mode. Only streaming mode is supported. | True |
| extendData | Object | The supplementary information. | `{'instanceId':'68e00b6640e*****3e943332fee7','channelId':'123','userData':'{"aaaa":"bbbb"}'}` |
| extendData.instanceId | String | The instance ID. | 68e00b6640e*****3e943332fee7 |
| extendData.channelId | String | The channel ID. | 123 |
| extendData.userData | String | The value of the UserData field that is passed when the instance is started. | {"aaaa":"bbbb"} |