Audio and video calls - Intelligent Media Services - Alibaba Cloud Documentation Center

This topic explains how to integrate AI agents for audio and video calls using the AICallKit SDK.

Overview

The AICallKit SDK provides low-code solutions for integrating AI agents with real-time audio and video capabilities. It lets enterprises quickly add conversations with AI agents to their applications.

Benefits

Rapid integration and development: The AICallKit SDK offers pre-built interfaces, allowing developers to implement AI real-time interaction with minimal coding.
Cross-platform support: Compatible with multiple mainstream operating systems and platforms, including iOS, Android, and Web, the AICallKit SDK enables developers to use unified APIs, ensuring consistent functionality and user experience across platforms.
Rich features: In addition to basic call functionality, the AICallKit SDK provides a variety of features, such as displaying agent status, real-time subtitles, and intelligent interruption. These features can be configured as needed if you use the integration solution without UI.

Integration solutions

Alibaba Cloud offers two integration solutions using the AICallKit SDK:

Integration solution with UI: This low-code solution includes UI components for audio and video applications. You can run a demo with simple configurations and integrate the UI components into your project.
Integration solution without UI: The AICallKit SDK encapsulates AI real-time interaction capabilities to reduce development workload related to AI agents and real-time communication (RTC). This solution is ideal if you want to customize the user interfaces and do not want to manage the underlying implementation.

Note

Alibaba Cloud provides a guide to implementing features in the integration solution without UI.

AICallKit SDK features

Feature	Description	iOS & Android	Web
Voice call	Users can talk with AI agents and obtain instant feedback and services.	✔️	✔️
Avatar call	Users can make video calls with avatars, which provide more realistic interactions.	✔️	✔️
Visual call	In video calls with users, the agent provides feedback based on the voice and camera feed.	✔️	✔️
Agent status	You can display the status of the agent, including listening, thinking, and speaking.	✔️	✔️
Real-time subtitles	The dialogue between the agent and the user is transcribed in real time and displayed on the client.	✔️	✔️
Manual interruption	You can send an instruction to the agent to stop it from speaking.	✔️	✔️
Intelligent interruption	The AI agent intelligently detects the user's intent to interrupt the conversation.	✔️	✔️
Voice	You can configure the agent voice. For supported voices, see Intelligent voice samples and CosyVoice.	✔️	✔️
Intercom mode	Users can switch the call to intercom mode at the beginning of or during a call, and then press and hold a button to talk (push-to-talk).	✔️	✔️
Voiceprint recognition	In a multi-speaker scenario, the agent can identify the voiceprint characteristics of the main speaker to accurately capture their speech and minimize interference from background noise.	✔️	❌
Custom message	You can send a custom message via the RTC custom message channel.	✔️	✔️
Local device management	Users can turn off the speaker and mute the microphone during a call.	✔️	✔️
Callbacks	You can obtain information such as the main speaker's volume and network status through callbacks.	✔️	✔️
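
If you build your own interface on the integration solution without UI, the agent status and intercom mode rows above translate into a small amount of client-side state. The following is a minimal sketch of that state, not the AICallKit API: the names `AgentStatus`, `CallState`, `statusLabel`, and `shouldSendAudio` are hypothetical and introduced only for illustration. The rule that a muted microphone overrides push-to-talk is also an assumption.

```typescript
// Hypothetical types for illustration only; these are NOT AICallKit SDK names.
// The three statuses come from the "Agent status" row of the feature table.
type AgentStatus = "listening" | "thinking" | "speaking";

interface CallState {
  status: AgentStatus;     // latest agent status reported to the client
  intercomMode: boolean;   // true: push-to-talk, false: open microphone
  talkButtonDown: boolean; // whether the user is holding the talk button
  micMuted: boolean;       // local device management: microphone muted
}

// Text that a custom UI might display for each agent status.
function statusLabel(status: AgentStatus): string {
  switch (status) {
    case "listening":
      return "Listening...";
    case "thinking":
      return "Thinking...";
    case "speaking":
      return "Speaking...";
  }
}

// Decide whether the user's audio should currently be sent.
// Assumption: a muted microphone always wins; in intercom mode audio is
// sent only while the talk button is held; otherwise the microphone is open.
function shouldSendAudio(state: CallState): boolean {
  if (state.micMuted) {
    return false;
  }
  return state.intercomMode ? state.talkButtonDown : true;
}
```

In a real integration, the status value and the mute and intercom switches would come from the SDK's own callbacks and methods. Keeping the display and gating logic in pure functions like these makes it easy to test separately from the call itself.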
