Integration solution with UI - Intelligent Media Services

This topic describes the AI real-time interaction solution that provides UI components.

Overview

This solution is built on the AICallKit SDK and provides UI components for audio and video applications. You can reuse the functional modules of AUI Kits to match your business requirements and quickly add real-time, interactive AI to your app. The solution is designed for enterprises and developers who want to build AI real-time interaction scenarios efficiently. The AUI Kits modules reduce development time and cost while helping ensure app quality and stability. For more information about how to integrate AUI Kits for AI real-time interaction, see the following topics:

For more information about server-side development, see Integrate AUI Kits AppServer for AI real-time interaction and API description.

Features

Feature	Description
Real-time call (ARTC)	Built on Alibaba Cloud ARTC, this feature lets users make reliable, low-latency calls with intelligent agents from anywhere in the world.
Real-time workflow	Flexibly orchestrate intelligent agent workflows in a graphical interface. Speech-to-text: Alibaba Cloud Qwen is integrated to provide speech-to-text; you can also integrate the iFLYTEK speech-to-text capability as a third-party plug-in. Text-to-speech: Alibaba Cloud Qwen is integrated to provide speech synthesis; you can also connect your self-developed speech synthesis module based on standard protocols, or integrate the MiniMax voice capability as a third-party plug-in. LLM for text generation: Alibaba Cloud Qwen is integrated to provide LLM capabilities; you can select AI models in Alibaba Cloud Model Studio or connect self-developed LLMs based on OpenAPI or Alibaba Cloud specifications. Avatar: you can integrate the Faceunity avatar capability as a third-party plug-in. Video frame extraction (multimodal LLM): Alibaba Cloud Qwen is preset; you can also connect self-developed LLMs based on OpenAPI specifications.
Custom profile	Upload an image for the AI agent that you created. The image is displayed during voice calls.
Emotion recognition	Recognize users' emotions and generate empathetic responses.
Welcome message	Configure the welcome message in the Intelligent Media Services (IMS) console. When the user starts a conversation, the agent broadcasts the welcome message.
Proactive broadcasting	Configure your business server to call OpenAPI so that the agent proactively pushes audio and video content to the user.
Live subtitles	The content of the conversation between the user and the agent can be presented in real time on the user interface.
Intelligent noise reduction	Automatically filters noise on the user side during a conversation. If multiple users speak at the same time, the voice with the highest volume is given priority.
Intelligent interruption	Recognizes when a user intends to interrupt the agent.
Intelligent sentence segmentation	Automatically identify and segment long or complex sentences to improve text readability and user experience.
Intercom mode	The user can set the call mode to the intercom mode at the beginning of or during a call, and interact with the intelligent agent by pressing a button.
ASR hotwords	You can define business-specific hotwords to improve the speech recognition accuracy of intelligent agents.
Voiceprint-based noise suppression	In a multi-speaker scenario, the intelligent agent can identify the voiceprint characteristics of the main speaker to accurately capture their speech and minimize interference from background noise.
Human takeover	When the intelligent agent encounters situations beyond its capabilities or requires critical decision-making, human agents can take over the conversations with users.
Graceful shutdown	When the business server stops the intelligent agent, the agent first finishes its current sentence. This prevents abrupt interruptions of conversations.
Data archiving	Conversations between intelligent agents and users are converted into text and stored, and you can call API operations to consume the data. You can also store the audio and video data of these calls in Object Storage Service (OSS) or ApsaraVideo VOD.
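The Intelligent noise reduction feature gives priority to the loudest voice when several users speak at once. The Python snippet below is a conceptual sketch of that selection rule only. It is not the IMS or AICallKit implementation. It assumes that each participant's audio for the same time window is available as a list of float PCM samples in [-1.0, 1.0]. The names `rms` and `pick_loudest_speaker`, as well as the `floor` silence threshold, are hypothetical.

```python
import math
from typing import Dict, List, Optional


def rms(samples: List[float]) -> float:
    """Root-mean-square level of one audio frame (samples in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_loudest_speaker(
    frames: Dict[str, List[float]], floor: float = 0.01
) -> Optional[str]:
    """Return the ID of the speaker whose frame in this window is loudest.

    frames: hypothetical mapping of speaker ID -> PCM samples for one time window.
    floor:  hypothetical silence threshold; frames at or below it are ignored.
    Returns None if no frame rises above the threshold.
    """
    best_id: Optional[str] = None
    best_level = floor
    for speaker_id, samples in frames.items():
        level = rms(samples)
        if level > best_level:  # strictly louder than the current best (or the floor)
            best_id, best_level = speaker_id, level
    return best_id


if __name__ == "__main__":
    window = {"alice": [0.2, -0.2, 0.2, -0.2], "bob": [0.5, -0.5, 0.5, -0.5]}
    print(pick_loudest_speaker(window))  # bob
```

RMS is used here as a simple loudness measure. A production system would likely also smooth levels over time so that the selected speaker does not switch on every frame.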