Integration solution with UI

Updated at: 2025-03-13 08:14

This topic describes the AI real-time interaction solution that provides UI components.

Overview

This solution is built on the AICallKit SDK and provides UI components for audio and video applications. You can reuse the functional modules of AUI Kits as your business requires to quickly bring real-time, interactive AI to your app. The solution is designed for enterprises and developers who want to build AI real-time interaction scenarios quickly and efficiently. The functional modules of AUI Kits reduce development time and costs and help ensure app quality and stability. For more information about how to integrate AUI Kits for AI real-time interaction, see the following topics:

For more information about server-side development, see Integrate AUI Kits AppServer for AI real-time interaction and API description.

Features

Feature

Description

Real-time call (ARTC)

Built on ARTC of Alibaba Cloud, this feature lets users make reliable, low-latency calls with intelligent agents from anywhere in the world.

Real-time workflow

Flexibly orchestrate workflows of intelligent agents on the GUI.

  • Speech-to-text

    • Alibaba Cloud Qwen is integrated to implement the speech-to-text feature.

    • You can integrate the speech-to-text capability of iFLYTEK as a third-party plug-in.

  • Text-to-speech

    • Alibaba Cloud Qwen is integrated to implement the speech synthesis feature.

    • AI real-time interaction can be connected to your self-developed speech synthesis module based on standard protocols.

    • You can integrate the voice capability of MiniMax as a third-party plug-in.

  • LLM for text generation

    • Alibaba Cloud Qwen is integrated to provide LLM capabilities.

    • AI models in Alibaba Cloud Model Studio can be selected.

    • AI real-time interaction can be connected to self-developed LLMs based on OpenAPI or Alibaba Cloud specifications.

  • Avatar

    • You can integrate the avatar capability of Faceunity as a third-party plug-in.

  • Video frame extraction

  • Multi-modal LLM

    • Alibaba Cloud Qwen is preset.

    • AI real-time interaction can be connected to self-developed LLMs based on OpenAPI specifications.

Custom profile

Upload an image for the AI agent that you created. The image is displayed during voice calls.

Emotion recognition

Recognize users' emotions and generate empathetic responses.

Welcome message

Configure the welcome message in the Intelligent Media Services (IMS) console. When the user starts a conversation, the agent broadcasts the welcome message.

Proactive broadcasting

Configure the business server to allow the agent to proactively push audio and video content to the user by using OpenAPI.

Live subtitles

Display the conversation between the user and the agent on the user interface in real time.

Intelligent noise reduction

Automatically filter out noise on the user side during a conversation. If multiple users speak at the same time, the loudest voice is captured preferentially.

Intelligent interruption

Recognize when a user intends to interrupt the conversation.

Intelligent sentence segmentation

Automatically identify and segment long or complex sentences to improve text readability and user experience.

Intercom mode

The user can set the call mode to the intercom mode at the beginning of or during a call, and interact with the intelligent agent by pressing a button.

ASR hotwords

You can define business-related hotwords to improve the speech recognition accuracy of intelligent agents.

Voiceprint-based noise suppression

In a multi-speaker scenario, the intelligent agent can identify the voiceprint characteristics of the main speaker to accurately capture their speech and minimize interference from background noise.

Human takeover

When the intelligent agent encounters a situation that is beyond its capabilities or that requires critical decision-making, a human agent can take over the conversation with the user.

Graceful shutdown

When the business server stops the intelligent agent, it lets the agent finish the current sentence. This prevents the conversation from being cut off abruptly.

Data archiving

The conversations between intelligent agents and users are converted into text for storage. You can call API operations to consume the data. In addition, you can store the audio and video data of calls between intelligent agents and users in Object Storage Service (OSS) or ApsaraVideo VOD.
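
The graceful shutdown behavior listed in the table above can be pictured with a small sketch. This is not the AICallKit SDK or AppServer API: the class `AgentSpeaker` and its methods `request_stop` and `run` are hypothetical names used only for illustration, and the agent's speech is modeled as a plain list of words. The sketch assumes that a stop request only sets a flag and that the flag is checked at sentence boundaries, so the sentence in progress is always completed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentSpeaker:
    """Hypothetical model of an agent that speaks queued sentences word by word."""

    sentences: List[str]
    stop_requested: bool = False
    spoken: List[str] = field(default_factory=list)

    def request_stop(self) -> None:
        # Only a flag is set here; the sentence in progress is not cut off.
        self.stop_requested = True

    def run(self, stop_after_words: Optional[int] = None) -> List[str]:
        """Speak all sentences, simulating a stop request after N words."""
        words_emitted = 0
        for sentence in self.sentences:
            # The stop flag is checked only at sentence boundaries.
            if self.stop_requested:
                break
            for word in sentence.split():
                self.spoken.append(word)
                words_emitted += 1
                if stop_after_words is not None and words_emitted == stop_after_words:
                    # Simulates the business server stopping the agent mid-sentence.
                    self.request_stop()
        return self.spoken
```

For example, with the sentences "Your order has shipped." and "Anything else?", a stop request that arrives after the second word still lets the agent finish the first sentence, and the second sentence is never started.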
