
Intelligent Media Services: API references

Last Updated: Nov 07, 2025

This topic describes the APIs available in AICallKit SDK for Web.

Overview

Note

Earlier versions of the SDK contain deprecated parameters and methods. We recommend that you update the SDK to the latest version. For more information, see Web guide.

Class/Protocol

API

Description

ARTCAICallEngine

An engine instance

call

Starts a call.

handup

Ends a call.

Note

We recommend that you call this operation to end the call before exiting the page. Failing to do so may cause the agent to remain active for approximately 90 seconds before exiting, potentially exceeding the concurrency limit of the avatar agent.

setAgentView

Configures the rendering view for the agent.

setLocalView

Sets the local video view.

interruptSpeaking

Interrupts the agent's speech.

enableVoiceInterrupt

Enables or disables intelligent interruption.

muteLocalCamera

Enables or disables camera push.

switchCamera

Switches between the front and rear cameras.

switchVoiceId

Changes the voice.

mute

Mutes or unmutes the microphone.

muteAgentAudioPlaying

Mutes or unmutes the agent's audio output.

startPushToTalk

Starts speaking in push-to-talk mode.

finishPushToTalk

Finishes speaking in push-to-talk mode.

cancelPushToTalk

Cancels speaking in push-to-talk mode.

enablePushToTalk

Enables or disables push-to-talk mode.

getRTCInstance

Returns the underlying real-time communication (RTC) engine instance.

sendTextToAgent

Sends a text message to the agent.

sendCustomMessageToServer

Sends a custom message to the AppServer.

updateLlmSystemPrompt

Updates the system prompt for the large language model (LLM).

startVisionCustomCapture

Starts custom frame capture.

stopVisionCustomCapture

Ends custom frame capture.

destroy

Releases resources.

ARTCAICallEngine

Callback events of the engine instance

errorOccurred

An error occurred.

callBegin

A call has started.

callEnd

A call has ended.

agentStateChanged

The agent's status has changed.

speakingVolumeChanged

The volume has changed.

userSubtitleNotify

The agent recognizes the user's question.

agentSubtitleNotify

The agent returns an answer.

voiceIdChanged

The agent's voice has changed.

pushToTalkChanged

The push-to-talk mode has changed.

agentWillLeave

The agent is about to end the current call.

receivedAgentCustomMessage

A custom message has been received from the agent.

voiceInterruptChanged

The status of voice interruption has changed.

humanTakeoverWillStart

A human agent is stepping in to take over from the current agent.

humanTakeoverConnected

A human agent has taken over from the current agent.

agentDataChannelAvailable

The agent's data channel is ready for use.

Details

ARTCAICallEngine details

call

Starts a call.

async call(userId: string, agentInfo: AICallAgentInfo, config?: AICallEngineConfig): Promise<void>

Parameter

Type

Description

userId

string

The UID of the current user.

agentInfo

AICallAgentInfo

The agent information.

config

AICallEngineConfig

The initialization configuration. Example:

{
  // Specifies whether to mute the microphone.
  muteMicrophone?: boolean;
  // Specifies whether to disable the camera in visual understanding mode.
  muteCamera?: boolean;
  // Specifies whether to enable push-to-talk mode.
  enablePushToTalk?: boolean;
  // The video preview element in visual understanding mode.
  previewElement?: string | HTMLVideoElement;
  // The camera settings in visual understanding mode.
  cameraConfig?: AICallCameraConfig;
}
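As a minimal sketch of how the optional fields above can be combined, the following helper merges per-call overrides into explicit values. `CallConfigSketch`, `APP_DEFAULTS`, and `buildCallConfig` are hypothetical names used only for illustration, the values in `APP_DEFAULTS` are application choices rather than SDK defaults, and the camera-related fields are omitted so that the sketch runs without DOM types.

```typescript
// Local stand-in for part of the documented config shape (assumption: in a
// real project you would use the AICallEngineConfig type from the SDK).
interface CallConfigSketch {
  muteMicrophone?: boolean;
  muteCamera?: boolean;
  enablePushToTalk?: boolean;
}

// Application-chosen values, NOT the SDK's documented defaults.
const APP_DEFAULTS: Required<CallConfigSketch> = {
  muteMicrophone: false,
  muteCamera: false,
  enablePushToTalk: false,
};

// Hypothetical helper: returns a fully populated config for one call.
function buildCallConfig(
  overrides: CallConfigSketch = {},
): Required<CallConfigSketch> {
  return { ...APP_DEFAULTS, ...overrides };
}

// Example: start the call with push-to-talk mode enabled.
const pushToTalkConfig = buildCallConfig({ enablePushToTalk: true });
console.log(pushToTalkConfig);
// With a real engine instance:
//   await engine.call(userId, agentInfo, pushToTalkConfig);
```

Passing every field explicitly keeps the behavior of each call visible in your own code instead of relying on whatever the SDK does when a field is omitted.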

handup

Ends a call.

async handup(): Promise<void>
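Because the agent may stay active for about 90 seconds if the page exits without ending the call, one option is to end the call from a page-exit handler. The following is a sketch only: `hangUpOnPageExit`, `HangupCapable`, and `ListenerTarget` are hypothetical names, and the choice of the `pagehide` and `beforeunload` events is an assumption about your page. Browsers do not guarantee that asynchronous work finishes during unload, so treat this as a best-effort fallback and still end the call explicitly where you can.

```typescript
// Minimal shapes used by this sketch (assumption: the engine instance and
// the browser `window` object both satisfy them).
interface HangupCapable {
  handup(): Promise<void>;
}
interface ListenerTarget {
  addEventListener(type: string, listener: () => void): void;
}

// Registers a handler that ends the call once when the page is being left.
function hangUpOnPageExit(engine: HangupCapable, target: ListenerTarget): void {
  let done = false;
  const onExit = (): void => {
    if (done) return; // both events may fire for the same navigation
    done = true;
    // Fire and forget: the page may unload before the promise settles.
    void engine.handup().catch(() => undefined);
  };
  target.addEventListener("pagehide", onExit);
  target.addEventListener("beforeunload", onExit);
}

// In a browser: hangUpOnPageExit(engine, window);
```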

setAgentView

Configures the rendering view for the agent.

setAgentView(view: HTMLVideoElement | string): void

Parameter

Type

Description

view

HTMLVideoElement | string

The video element or its ID.

setLocalView

Sets the local video view.

setLocalView(view?: HTMLVideoElement | string): void

Parameter

Type

Description

view

HTMLVideoElement | string

The video element or its ID. If it is left empty, preview is disabled.

interruptSpeaking

Interrupts the agent's speech.

async interruptSpeaking(): Promise<void>

enableVoiceInterrupt

Enables or disables intelligent interruption.

async enableVoiceInterrupt(enable: boolean): Promise<void>

Parameter

Type

Description

enable

boolean

Specifies whether to enable intelligent interruption.

muteLocalCamera

Enables or disables camera push.

async muteLocalCamera(mute: boolean)

Parameter

Type

Description

mute

boolean

Specifies whether to mute (turn off) the camera.

switchCamera

Switches between the front and rear cameras.

async switchCamera(deviceId?: string)

Parameter

Type

Description

deviceId

string

The device ID. You can query the device ID by using ARTCAICallEngine.getCameraList(). If you do not specify this parameter, the SDK switches between the front and rear cameras on a mobile phone.

switchVoiceId

Changes the voice.

async switchVoiceId(voiceId: string): Promise<void>

Parameter

Type

Description

voiceId

string

The voice ID.

mute

Mutes or unmutes the microphone.

async mute(mute: boolean): Promise<void>

Parameter

Type

Description

mute

boolean

Specifies whether to mute the microphone.

muteAgentAudioPlaying

Mutes or unmutes the agent's audio output.

muteAgentAudioPlaying(mute: boolean)

Parameter

Type

Description

mute

boolean

Specifies whether to mute the agent's audio output.

startPushToTalk

Starts speaking in push-to-talk mode.

startPushToTalk(): boolean

finishPushToTalk

Finishes speaking in push-to-talk mode.

finishPushToTalk(): boolean

cancelPushToTalk

Cancels speaking in push-to-talk mode.

cancelPushToTalk(): boolean

enablePushToTalk

Enables or disables push-to-talk mode. In push-to-talk mode, the agent returns a result only after the finishPushToTalk operation is called.

enablePushToTalk(enable: boolean): boolean

Parameter

Type

Description

enable

boolean

Specifies whether to enable push-to-talk mode.
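The four push-to-talk operations are typically driven by a single press-and-hold control. The following sketch shows one way to wire them together. `PushToTalkButton` and `PushToTalkEngine` are hypothetical names, and the sketch assumes that a return value of `true` means the operation succeeded, which this topic does not state.

```typescript
// The subset of the engine used here, copied from the signatures above.
interface PushToTalkEngine {
  enablePushToTalk(enable: boolean): boolean;
  startPushToTalk(): boolean;
  finishPushToTalk(): boolean;
  cancelPushToTalk(): boolean;
}

// Hypothetical wrapper that maps press, release, and abort to engine calls.
class PushToTalkButton {
  private engine: PushToTalkEngine;
  private talking = false;

  constructor(engine: PushToTalkEngine) {
    this.engine = engine;
  }

  // Turn push-to-talk mode on, for example right after the call starts.
  enable(): boolean {
    return this.engine.enablePushToTalk(true);
  }

  // Button pressed: start speaking (assumes `true` means success).
  press(): void {
    if (this.talking) return;
    this.talking = this.engine.startPushToTalk();
  }

  // Button released: finish speaking so that the agent returns a result.
  release(): void {
    if (!this.talking) return;
    this.talking = false;
    this.engine.finishPushToTalk();
  }

  // Pointer left the button or the gesture was cancelled: discard the input.
  abort(): void {
    if (!this.talking) return;
    this.talking = false;
    this.engine.cancelPushToTalk();
  }
}
```

Tracking the `talking` flag prevents a stray release or cancel from reaching the engine when no utterance is in progress.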

getRTCInstance

Returns the underlying RTC engine instance, or undefined if the instance is unavailable.

getRTCInstance(): AliRtcEngine | undefined

sendTextToAgent

Sends a text message to the agent.

sendTextToAgent(req: AICallSendTextToAgentRequest)

Parameter

Type

Description

req

AICallSendTextToAgentRequest

The message struct to send.

sendCustomMessageToServer

Sends a custom message to the AppServer. Call this operation after the call session is initiated.

sendCustomMessageToServer(msg: string)

Parameter

Type

Description

msg

string

The message content.

updateLlmSystemPrompt

Updates the system prompt for the LLM. Call this operation after the call session is initiated.

updateLlmSystemPrompt(prompt: string)

Parameter

Type

Description

prompt

string

The prompt.

startVisionCustomCapture

Starts custom frame capture. Once started, voice communication with the vision agent will be disabled. Call this operation after initiating the call with the vision agent.

startVisionCustomCapture(req: AICallVisionCustomCaptureRequest)

Parameter

Type

Description

req

AICallVisionCustomCaptureRequest

The configurations.

{
  /**
   * The text to send with the request to the multimodal LLM.
   */
  text?: string;

  /**
   * true: one-time frame capture
   * false: periodic frame capture
   */
  isSingle?: boolean;

  /**
   * The frame capture interval. Unit: seconds.
   * Default value: 5
   */
  eachDuration?: number;

  /**
   * The number of images to capture each time.
   * Default value: 2
   */
  num?: number;

  /**
   * The frame capture duration. Unit: seconds.
   */
  duration?: number;

  /**
   * The custom business information in JSON format, which is passed along with the text and frames to the LLM for processing. 
   */
  userData?: string;
}
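As an illustration of how these fields fit together, the following sketch builds a request for periodic frame capture and serializes the business data into the JSON string that `userData` expects. `VisionCaptureRequestSketch` and `buildPeriodicCapture` are hypothetical names, and treating `eachDuration` as seconds is an assumption based on the documented default of 5s.

```typescript
// Local copy of the request shape documented above (assumption: in a real
// project you would use AICallVisionCustomCaptureRequest from the SDK).
interface VisionCaptureRequestSketch {
  text?: string;
  isSingle?: boolean;
  eachDuration?: number;
  num?: number;
  duration?: number;
  userData?: string;
}

// Hypothetical helper: capture frames every `intervalSec` seconds for
// `totalSec` seconds and attach optional business data as a JSON string.
function buildPeriodicCapture(
  text: string,
  intervalSec: number,
  totalSec: number,
  userData?: Record<string, unknown>,
): VisionCaptureRequestSketch {
  if (intervalSec <= 0 || totalSec <= 0) {
    throw new RangeError("intervalSec and totalSec must be positive");
  }
  const req: VisionCaptureRequestSketch = {
    text,
    isSingle: false, // false selects repeated capture rather than one-time
    eachDuration: intervalSec,
    duration: totalSec,
  };
  if (userData !== undefined) {
    req.userData = JSON.stringify(userData);
  }
  return req;
}

const captureRequest = buildPeriodicCapture("Describe the scene", 5, 60, {
  orderId: "A1",
});
console.log(captureRequest);
// With a real engine, after the call with the vision agent has started:
//   engine.startVisionCustomCapture(captureRequest);
```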

stopVisionCustomCapture

Ends custom frame capture. Call this operation after initiating the call with the vision agent.

stopVisionCustomCapture()

destroy

Releases resources.

async destroy()

ARTCAICallEngine events

errorOccurred

An error occurred during the current call.

Parameter

Type

Description

code

AICallErrorCode

The error code.

callBegin

A call has started.

callEnd

A call has ended.

agentStateChanged

The agent's status has changed.

Parameter

Type

Description

state

AICallAgentState

The agent's status. It can be listening, thinking, or speaking.

speakingVolumeChanged

The volume has changed.

Parameter

Type

Description

uid

string

The UID of the current speaker. The value is an empty string if the speaker is the current user.

volume

number

The volume. Valid values: 0 to 100.
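A handler for this event usually needs to tell the local user apart from the remote side and to scale the volume for a level meter. The following sketch does both. `describeVolume` and `VolumeInfo` are hypothetical names, and treating every non-empty UID as "remote" is an assumption.

```typescript
interface VolumeInfo {
  speaker: "local" | "remote";
  level: number; // normalized to the range 0..1
}

// Hypothetical helper for a speakingVolumeChanged handler.
function describeVolume(uid: string, volume: number): VolumeInfo {
  // An empty UID means that the speaker is the current user.
  const speaker = uid === "" ? "local" : "remote";
  // Clamp to the documented 0..100 range before normalizing.
  const clamped = Math.min(100, Math.max(0, volume));
  return { speaker, level: clamped / 100 };
}

console.log(describeVolume("", 50));
console.log(describeVolume("agent-1", 150));
```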

userSubtitleNotify

The agent recognizes the user's question.

Parameter

Type

Description

subtitle

AICallSubtitleData

The subtitle.

agentSubtitleNotify

The agent returns an answer.

Parameter

Type

Description

subtitle

AICallSubtitleData

The subtitle.

voiceIdChanged

The agent's voice has changed.

Parameter

Type

Description

voiceId

string

The voice ID.

pushToTalkChanged

The push-to-talk mode has changed.

Parameter

Type

Description

enable

boolean

Indicates whether push-to-talk mode is enabled.

agentWillLeave

The agent is about to end the current call.

Parameter

Type

Description

reason

number

The reason why the agent is leaving. Valid values:

  • 2001: idle timeout.

  • 0: other reasons.

message

string

The description of the reason.
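To present the two parameters of this event to the user, a handler can map the documented reason codes to readable text. This is a minimal sketch: `describeLeaveReason` is a hypothetical name, and the wording for codes other than 2001 and 0 is an assumption.

```typescript
// Hypothetical helper for an agentWillLeave handler.
function describeLeaveReason(reason: number, message: string): string {
  let label: string;
  if (reason === 2001) {
    label = "idle timeout";
  } else if (reason === 0) {
    label = "other reason";
  } else {
    // Codes not listed in this topic are reported as-is.
    label = `unknown reason (${reason})`;
  }
  return message ? `${label}: ${message}` : label;
}

console.log(describeLeaveReason(2001, ""));
console.log(describeLeaveReason(0, "agent stopped"));
```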

receivedAgentCustomMessage

A custom message has been received from the current agent.

Parameter

Type

Description

data

Object

The message content.

voiceInterruptChanged

The status of voice interruption has changed.

Parameter

Type

Description

enable

boolean

Indicates whether voice interruption is enabled for the current call.

humanTakeoverWillStart

A human agent is stepping in to take over from the current agent.

Parameter

Type

Description

takeoverUid

string

The UID of the human agent.

takeoverMode

number

The takeover mode. Valid values:

  • 1: The human agent will take over using the human voice.

  • 0: The human agent will take over using the agent's voice.

humanTakeoverConnected

A human agent has taken over from the current agent.

Parameter

Type

Description

takeoverUid

string

The UID of the human agent.

agentEmotionNotify

The agent detects an emotion.

Parameter

Type

Description

emotion

string

The emotion tag, such as neutral, happy, angry, or sad.

userAsrSentenceId

number

The ID of the sentence that contains the user's question, as recognized by the agent.

agentDataChannelAvailable

The agent's data channel is ready for use. Once this callback is invoked, you can send messages to the agent.
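Because messages can be sent to the agent only after this callback, one approach is to buffer outgoing messages until the channel is ready. The following sketch shows such a buffer. `ReadyQueue` is a hypothetical helper that is not part of the SDK, and the `send` callback stands in for whichever engine method you use, such as `sendTextToAgent`.

```typescript
// Hypothetical buffer: holds items until the data channel is available.
class ReadyQueue<T> {
  private ready = false;
  private pending: T[] = [];
  private send: (item: T) => void;

  constructor(send: (item: T) => void) {
    this.send = send;
  }

  // Call this from your agentDataChannelAvailable handler.
  markReady(): void {
    this.ready = true;
    const items = this.pending;
    this.pending = [];
    items.forEach((item) => this.send(item));
  }

  // Sends immediately if the channel is ready; otherwise buffers the item.
  push(item: T): void {
    if (this.ready) {
      this.send(item);
    } else {
      this.pending.push(item);
    }
  }

  // Call this when the call ends so that the next call starts unready.
  reset(): void {
    this.ready = false;
    this.pending = [];
  }
}

const sentItems: string[] = [];
const queue = new ReadyQueue<string>((item) => sentItems.push(item));
queue.push("hello"); // buffered: the channel is not ready yet
queue.markReady(); // flushes "hello"
queue.push("how are you?"); // sent immediately
console.log(sentItems);
```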