This topic describes how to integrate AICallKit SDK to quickly set up a solution for Real-time Conversational AI.
Solution overview
AICallKit SDK is a development kit for managing AI agents. It provides a set of APIs for implementing conversations between AI agents and app users. By integrating AICallKit SDK, you can use most AI agent features directly instead of building them yourself, which shortens the development cycle and gives users a smoother speech interaction experience. For more information about integration on different devices, see the integration topic for your platform (Android, iOS, or web).
Flowchart
Your app can obtain an RTC token from your AppServer, and then call the call(config) method to start a call. During the call, you can call AICallKit APIs to implement interactive features such as live subtitles and interruptions for the AI agent. AICallKit depends on real-time audio and video capabilities, so the features of ApsaraVideo Real-Time Communication (ARTC) have been integrated into AICallKit SDK. If your business scenario requires live streaming and VOD capabilities, consider using ApsaraVideo MediaBox SDK. For more information, see Select and download SDKs.
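The flow described above can be sketched as follows. This is a minimal illustration, not the SDK's actual API surface: only the call(config) method name comes from this topic, while the `CallConfig` fields, the `CallEngine` interface, the `TokenFetcher` type, and the `FakeEngine` stand-in are assumptions introduced so that the example runs without the SDK or a server.

```typescript
// Hypothetical shapes: the real AICallKit types and signatures may differ.
interface CallConfig {
  agentId: string;  // assumed field: which AI agent to call
  userId: string;   // assumed field: the app user placing the call
  rtcToken: string; // RTC token issued by your AppServer
}

// Only the method name call(config) is taken from this topic.
interface CallEngine {
  call(config: CallConfig): Promise<void>;
}

// Hypothetical: how your app asks your AppServer for an RTC token.
type TokenFetcher = (userId: string) => Promise<string>;

// Step 1: obtain the token from the AppServer. Step 2: start the call.
async function startCall(
  engine: CallEngine,
  fetchToken: TokenFetcher,
  agentId: string,
  userId: string,
): Promise<CallConfig> {
  const rtcToken = await fetchToken(userId);
  if (!rtcToken) {
    throw new Error("AppServer returned an empty RTC token");
  }
  const config: CallConfig = { agentId, userId, rtcToken };
  await engine.call(config);
  return config;
}

// Stand-ins so the sketch runs without the SDK or a network connection.
class FakeEngine implements CallEngine {
  public started: CallConfig[] = [];
  async call(config: CallConfig): Promise<void> {
    this.started.push(config);
  }
}

const fakeFetchToken: TokenFetcher = async (userId) => `token-for-${userId}`;
```

In a real app, the token fetcher would send an authenticated request to your own AppServer, and the engine would be the object provided by AICallKit SDK for your platform.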
Solution benefits
After integrating AICallKit SDK, you can start calls with AI agents, query their status, display live subtitles, and handle interruptions from within your app.
You can develop your own AppServer based on business requirements.
AICallKit SDK can be integrated into Android, iOS, and web apps.
Features
AI agent calls: You can initiate a call with a voice, digital human, or visual understanding agent.
Agent status: You can query the AI agent status in real time.
Live subtitles: Conversations between the AI agent and users can be converted into text in real time and displayed on the client.
Interruption: The AI agent can intelligently detect the user's intention to interrupt the conversation.
Advanced configuration for agents: You can customize the AI agent's voice and interrupt its speech.
Local device management: You can turn off the speaker and mute the microphone during calls.
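To show how an app might consume the features listed above, here is a minimal sketch of client-side state handling for agent status, live subtitles, interruptions, and local devices. Every name in it (`AgentState`, `CallEvent`, `CallViewState`, `reduce`) is hypothetical and not an AICallKit identifier. The three agent states and the per-sentence subtitle updates are also assumptions made for illustration.

```typescript
// All names below are hypothetical; they are not AICallKit identifiers.
type AgentState = "listening" | "thinking" | "speaking"; // assumed states

type Speaker = "user" | "agent";

// Assumed event shapes that an app-level wrapper might emit during a call.
type CallEvent =
  | { kind: "agentState"; state: AgentState }
  | { kind: "subtitle"; speaker: Speaker; sentenceId: number; text: string }
  | { kind: "interrupted" }
  | { kind: "micMuted"; muted: boolean }
  | { kind: "speakerOn"; on: boolean };

interface SubtitleLine {
  speaker: Speaker;
  sentenceId: number;
  text: string;
}

interface CallViewState {
  agentState: AgentState;
  subtitles: SubtitleLine[];
  micMuted: boolean;
  speakerOn: boolean;
}

const initialState: CallViewState = {
  agentState: "listening",
  subtitles: [],
  micMuted: false,
  speakerOn: true,
};

// Pure reducer: returns the next UI state for each incoming event.
function reduce(state: CallViewState, event: CallEvent): CallViewState {
  switch (event.kind) {
    case "agentState":
      return { ...state, agentState: event.state };
    case "subtitle": {
      // A later update for the same sentence replaces the earlier partial text.
      const line: SubtitleLine = {
        speaker: event.speaker,
        sentenceId: event.sentenceId,
        text: event.text,
      };
      const idx = state.subtitles.findIndex(
        (l) => l.speaker === line.speaker && l.sentenceId === line.sentenceId,
      );
      const subtitles =
        idx >= 0
          ? state.subtitles.map((l, i) => (i === idx ? line : l))
          : [...state.subtitles, line];
      return { ...state, subtitles };
    }
    case "interrupted":
      // Assumption: after an interruption the agent goes back to listening.
      return { ...state, agentState: "listening" };
    case "micMuted":
      return { ...state, micMuted: event.muted };
    case "speakerOn":
      return { ...state, speakerOn: event.on };
  }
}
```

Keeping the state update in a pure function separates UI logic from the SDK callbacks, so the same reducer could be fed from whichever event mechanism the SDK exposes on each platform.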