This topic describes the AI real-time interaction solution that provides UI components.
Overview
This solution is based on AICallKit SDK and provides UI components for audio and video applications. You can reuse the functional modules of AUI Kits as your business requires to quickly add real-time, interactive AI to your app. The solution is designed for enterprises and developers who want to build AI real-time interaction scenarios quickly and efficiently. The functional modules of AUI Kits reduce development time and costs and help ensure app quality and stability. For more information about how to integrate AUI Kits for AI real-time interaction, see the following topics:
Integrate AUI Kits for AI real-time interaction into Android apps
Integrate AUI Kits for AI real-time interaction into iOS apps
For more information about server-side development, see Integrate AUI Kits AppServer for AI real-time interaction and API description.
Features
Feature | Description |
Real-time call (ARTC) | Built on ARTC of Alibaba Cloud. Users can make reliable, low-latency calls with intelligent agents from anywhere in the world. |
Real-time workflow | Flexibly orchestrate workflows of intelligent agents on the GUI. |
Custom profile | Upload an image for the AI agent that you created. The image is displayed during voice calls. |
Emotion recognition | Recognize users' emotions and generate empathetic responses. |
Welcome message | Configure the welcome message in the Intelligent Media Services (IMS) console. When the user starts a conversation, the agent broadcasts the welcome message. |
Proactive broadcasting | Configure the business server to allow the agent to proactively push audio and video content to the user by using OpenAPI. |
Live subtitles | The content of the conversation between the user and the agent can be presented in real time on the user interface. |
Intelligent noise reduction | Automatically filter the noise from the user side during a conversation. If multiple users are speaking at the same time, the voice with the highest volume is preferentially collected. |
Intelligent interruption | Recognize when a user intends to interrupt the conversation. |
Intelligent sentence segmentation | Automatically identify and segment long or complex sentences to improve text readability and user experience. |
Intercom mode | The user can set the call mode to the intercom mode at the beginning of or during a call, and interact with the intelligent agent by pressing a button. |
ASR hotwords | You can define business-related hotwords to improve the speech recognition accuracy of intelligent agents. |
Voiceprint-based noise suppression | In a multi-speaker scenario, the intelligent agent can identify the voiceprint characteristics of the main speaker to accurately capture their speech and minimize interference from background noise. |
Human takeover | When the intelligent agent encounters situations beyond its capabilities or requires critical decision-making, human agents can take over the conversations with users. |
Graceful shutdown | When the business server stops the intelligent agent, the agent is allowed to complete the current sentence before it stops. This prevents abrupt interruptions of conversations. |
Data archiving | The conversations between intelligent agents and users are converted into text for storage. You can call API operations to consume the data. In addition, you can store audio and video data of calls between intelligent agents and users in Object Storage Service (OSS) or ApsaraVideo VOD. |
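
The graceful shutdown behavior described in the table can be illustrated with a small simulation. The following is a minimal sketch, not AICallKit SDK or AppServer code: the function name `spoken_text` and its parameters are hypothetical and exist only for this illustration. It assumes the agent speaks word by word and that a stop request arrives after a given number of words. It compares an abrupt stop with a graceful stop that lets the sentence in progress finish.

```python
from typing import List


def spoken_text(sentences: List[str], stop_after_words: int, graceful: bool) -> str:
    """Simulate an agent speaking until a stop request arrives.

    Hypothetical helper for illustration only; not part of any SDK.
    stop_after_words: number of words already spoken when the stop arrives.
    graceful: if True, the sentence in progress is completed before stopping.
    """
    spoken: List[str] = []
    count = 0
    for sentence in sentences:
        for i, word in enumerate(sentence.split()):
            if count >= stop_after_words:
                # The stop request has arrived. A graceful stop keeps going
                # only while the agent is in the middle of a sentence.
                mid_sentence = i > 0
                if not (graceful and mid_sentence):
                    return " ".join(spoken)
            spoken.append(word)
            count += 1
    return " ".join(spoken)


reply = ["Your order has shipped.", "It arrives on Friday."]
print(spoken_text(reply, stop_after_words=2, graceful=False))
print(spoken_text(reply, stop_after_words=2, graceful=True))
```

With an abrupt stop, the output is cut off in the middle of the first sentence. With a graceful stop, the first sentence is completed and the second sentence is never started.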