Audio and video calls

Updated at: 2025-03-10 09:10

This topic explains how to integrate AI agents for audio and video calls using the AICallKit SDK.

Overview

The AICallKit SDK provides low-code solutions to integrate AI agents with real-time audio and video capabilities. It enables enterprises to quickly add communication with AI agents to their applications.

Benefits

  • Rapid integration and development: The AICallKit SDK offers pre-built interfaces, allowing developers to implement AI real-time interaction with minimal coding.

  • Cross-platform support: The AICallKit SDK supports mainstream operating systems and platforms, including iOS, Android, and Web. Its unified APIs keep functionality and user experience consistent across platforms.

  • Rich features: In addition to basic call functionality, the AICallKit SDK provides a variety of features, such as displaying agent status, real-time subtitles, and intelligent interruption. These features can be configured as needed if you use the integration solution without UI.

Integration solutions

Alibaba Cloud offers two integration solutions using the AICallKit SDK:

  • Integration solution with UI: This low-code solution includes UI components for audio and video applications. You can run a demo with simple configurations and integrate the UI components into your project.

  • Integration solution without UI: The AICallKit SDK encapsulates AI real-time interaction capabilities to reduce development workload related to AI agents and real-time communication (RTC). This solution is ideal if you want to customize the user interfaces and do not want to manage the underlying implementation.

Note

Alibaba Cloud provides a guide for implementing features in the integration solution without UI.

AICallKit SDK features

| Feature | Description | iOS & Android | Web |
| --- | --- | --- | --- |
| Voice call | Users can talk with AI agents and obtain instant feedback and services. | ✔️ | ✔️ |
| Avatar call | Users can make video calls with avatars, which provide more realistic interactions. | ✔️ | ✔️ |
| Visual call | In video calls with users, the agent provides feedback based on the voice and camera feed. | ✔️ | ✔️ |
| Agent status | You can display the status of the agent, including listening, thinking, and speaking. | ✔️ | ✔️ |
| Real-time subtitles | The dialogue between the agent and the user is transcribed in real time and displayed on the client. | ✔️ | ✔️ |
| Manual interruption | You can send an instruction to the agent to stop it from speaking. | ✔️ | ✔️ |
| Intelligent interruption | The AI agent intelligently detects the user's intent to interrupt the conversation. | ✔️ | ✔️ |
| Voice | You can configure the agent voice. For supported voices, see Intelligent voice samples and CosyVoice. | ✔️ | ✔️ |
| Intercom mode | Users can switch to intercom mode at the beginning of or during a call, and then press a button to talk. | ✔️ | ✔️ |
| Voiceprint recognition | In a multi-speaker scenario, the agent can identify the voiceprint characteristics of the main speaker to accurately capture their speech and minimize interference from background noise. | ✔️ | |
| Custom message | You can send a custom message via the RTC custom message channel. | ✔️ | ✔️ |
| Local device management | Users can turn off the speaker and mute the microphone during a call. | ✔️ | ✔️ |
| Callbacks | You can obtain information such as the main speaker's volume and network status through callbacks. | ✔️ | ✔️ |
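
The agent status and manual interruption features listed above can be pictured as a small state machine on the client. The following TypeScript sketch is only an illustration of that idea. It does not use the AICallKit SDK, and every name in it (AgentState, CallEvent, nextState, statusLabel, replay) is hypothetical. It assumes that the agent cycles through listening, thinking, and speaking, and that an interruption only has an effect while the agent is speaking.

```typescript
// Hypothetical client-side model; these names are NOT part of the AICallKit SDK.
type AgentState = "listening" | "thinking" | "speaking";

type CallEvent =
  | { kind: "userFinishedSpeaking" }   // the user's turn ended
  | { kind: "agentStartedSpeaking" }   // the agent begins its reply
  | { kind: "agentFinishedSpeaking" }  // the reply played to the end
  | { kind: "interrupt" };             // manual interruption request

// Pure transition function: returns the next status for a given event.
// Events that do not apply to the current status leave it unchanged.
function nextState(state: AgentState, event: CallEvent): AgentState {
  switch (event.kind) {
    case "userFinishedSpeaking":
      return state === "listening" ? "thinking" : state;
    case "agentStartedSpeaking":
      return state === "thinking" ? "speaking" : state;
    case "agentFinishedSpeaking":
      return state === "speaking" ? "listening" : state;
    case "interrupt":
      // Assumption: interrupting only stops an agent that is speaking.
      return state === "speaking" ? "listening" : state;
  }
}

// Text that a custom UI could show for each status.
function statusLabel(state: AgentState): string {
  const labels: Record<AgentState, string> = {
    listening: "Listening...",
    thinking: "Thinking...",
    speaking: "Speaking...",
  };
  return labels[state];
}

// Replays a sequence of events, starting from the listening status.
function replay(events: CallEvent[]): AgentState {
  return events.reduce<AgentState>((s, e) => nextState(s, e), "listening");
}
```

In a custom UI, a function like replay or nextState would be driven by whatever status callback the SDK actually exposes, and statusLabel would feed the on-screen indicator. Keeping the transition logic pure makes it easy to unit test independently of the call itself.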
