Intelligent Speech Interaction for Human-Computer Interaction

Intelligent Speech Interaction

Intelligent Speech Interaction is built on state-of-the-art technologies such as speech recognition, speech synthesis, and natural language understanding. Enterprises can integrate Intelligent Speech Interaction into their products so that those products can listen to, understand, and converse with users, providing an immersive human-computer interaction experience. Intelligent Speech Interaction currently supports Mandarin Chinese, Cantonese, English, Japanese, Korean, French, and Indonesian, and support for more languages is planned.

Alibaba Cloud was named a Visionary in the 2021 Gartner Magic Quadrant for Cloud AI Developer Services.

Overview

Intelligent Speech Interaction is suitable for a variety of scenarios, including intelligent Q&A, intelligent quality inspection, real-time subtitling for speeches, and transcription of audio recordings. It has been successfully applied in industries such as finance, insurance, e-commerce, and smart home. You can use the self-learning platform to improve speech recognition accuracy, and the service provides a comprehensive management console and easy-to-use SDKs. Activate Intelligent Speech Interaction to get started.

Service Benefits

High Recognition Accuracy
Alibaba Cloud is the first cloud service provider in China to use word-level LC-BLSTM and DFSMN-CTC models. Compared with the conventional CTC approach used in the industry, these models reduce the error rate by 20%, significantly improving speech recognition accuracy.
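Recognition accuracy is usually reported as word error rate (WER): the minimum number of substitutions, deletions, and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words. For Chinese, character error rate is more common, and the same edit-distance computation applies to characters. The page does not say whether the 20% is a relative or absolute reduction, or on which test sets it was measured; the minimal sketch below assumes a relative reduction, and the function names and numbers are hypothetical illustrations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] is the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error
```

For example, `word_error_rate("turn on the light", "turn off the light")` returns 0.25 (one substitution in four words), and a hypothetical drop from 10% to 8% WER gives `relative_reduction(0.10, 0.08)` of roughly 0.2, that is, a 20% relative reduction.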

Ultra-high Decoding Speed
Alibaba Cloud is the first cloud service provider in China to use low frame rate (LFR) decoding. This technology more than triples decoding speed without compromising recognition accuracy, greatly shortening response time and improving the user experience.
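In the general LFR approach, acoustic feature frames (commonly computed every 10 ms) are stacked with their neighbors and then subsampled, so the acoustic model and decoder run at one third of the original frame rate. Alibaba Cloud's exact configuration is not described here; the minimal sketch below assumes a stack size and skip factor of 3, and `low_frame_rate` is a hypothetical helper, not part of any SDK.

```python
from typing import List

Frame = List[float]  # one feature vector, e.g. filterbank energies


def low_frame_rate(frames: List[Frame], stack: int = 3, skip: int = 3) -> List[Frame]:
    """Concatenate `stack` consecutive frames into one vector, emitting one
    output frame every `skip` input frames. Positions past the end of the
    input are padded by repeating the last frame."""
    if not frames:
        return []
    output: List[Frame] = []
    for start in range(0, len(frames), skip):
        stacked: Frame = []
        for offset in range(stack):
            index = min(start + offset, len(frames) - 1)  # pad at the tail
            stacked.extend(frames[index])
        output.append(stacked)
    return output
```

With 10 ms input frames and a skip of 3, each output frame covers 30 ms, so 3 seconds of audio (300 input frames) becomes 100 decoder steps. Each step carries a wider feature vector, which is how the approach keeps enough context to preserve accuracy.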

Novel Self-learning Platform
Intelligent Speech Interaction is the first system in the industry to provide a self-learning platform. It allows you to specify hotwords and upload business-related data to build custom models for better recognition accuracy.
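How the platform applies hotwords internally is not described on this page. One generic technique for the same goal is contextual biasing: rescoring the recognizer's candidate transcriptions (an n-best list) so that candidates containing configured hotwords receive a score bonus. The minimal sketch below illustrates that idea only; `rescore_with_hotwords`, the log-probability scores, and the weights are hypothetical, and only single-word hotwords are matched.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, float]  # (candidate text, recognizer score; higher is better)


def rescore_with_hotwords(
    nbest: List[Hypothesis],
    hotwords: Dict[str, float],
) -> List[Hypothesis]:
    """Add `weight` to a candidate's score for every occurrence of each
    single-word hotword, then return candidates sorted best first."""
    rescored: List[Hypothesis] = []
    for text, score in nbest:
        words = text.split()
        bonus = sum(weight * words.count(word) for word, weight in hotwords.items())
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

For example, if a business configures "austin" as a hotword with weight 1.0, the candidate "book a flight to austin" (score -4.5) overtakes "book a flight to boston" (score -4.0). Production systems typically bias inside the decoder rather than after it, but the effect on ranking is similar.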

Extensive Industry Coverage
Currently, Intelligent Speech Interaction has customers in a wide variety of industries, such as finance, insurance, e-commerce, and smart home. It is ideal for various scenarios, including intelligent Q&A, intelligent quality inspection, real-time subtitling for speeches, and voice assistants.

Products and Services

Recording File Recognition

Converts audio from files uploaded by users into text within 24 hours. Applicable to scenarios that are not time-sensitive, such as call center quality assurance, transcription of court trials from recordings, summarization of meeting minutes, and medical record filing.

Real-time Speech Recognition

Converts audio streams into text in real time. Intelligent segmentation is used to identify when sentences start and end. Real-time Speech Recognition is ideal for scenarios with high requirements for real-time response, such as real-time transcription for live videos, meetings, and court trials.
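The page does not describe how intelligent segmentation works. Production systems typically use trained voice activity detection (VAD) models; as a simplified stand-in, the sketch below detects sentence boundaries from per-frame energy: a segment starts at the first frame above a threshold and ends after a run of quiet frames. The threshold, the minimum silence length, and the function name are hypothetical.

```python
from typing import List, Optional, Tuple


def segment_by_energy(
    energies: List[float],
    threshold: float,
    min_silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments, end exclusive.

    A segment starts at the first frame whose energy exceeds `threshold`
    and ends once `min_silence_frames` consecutive frames fall at or below
    it. Shorter pauses stay inside the current segment."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    silence_run = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i  # sentence start
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # Sentence end: close the segment at the first quiet frame.
                segments.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # Stream ended mid-sentence; trim any trailing quiet frames.
        segments.append((start, len(energies) - silence_run))
    return segments
```

In a streaming setting, each closed segment can be finalized and sent for recognition immediately, which is what lets results appear sentence by sentence instead of after the whole stream ends.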

Short Sentence Recognition

Converts short audio clips (under 1 minute) to text. Suitable for real-time scenarios such as voice search, voice command control, and voice messaging. Short Sentence Recognition can be integrated into various applications, smart home appliances, and smart assistants.
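Because Short Sentence Recognition targets audio under 1 minute, a client may need to decide which service to send a clip to. The minimal sketch below reads the duration of a WAV file with Python's standard `wave` module and routes on the 60-second limit stated above. The service labels returned are hypothetical strings, not SDK identifiers; consult the developer documentation for actual limits and formats.

```python
import io
import wave

SHORT_SENTENCE_LIMIT_S = 60.0  # "under 1 minute", per the description above


def wav_duration_seconds(data: bytes) -> float:
    """Duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_service(data: bytes) -> str:
    """Hypothetical routing: short clips go to Short Sentence Recognition,
    longer recordings to Recording File Recognition."""
    if wav_duration_seconds(data) < SHORT_SENTENCE_LIMIT_S:
        return "short_sentence_recognition"
    return "recording_file_recognition"
```

Live audio of unknown length would instead go to Real-time Speech Recognition, since its duration is not known up front.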

Speech Synthesis

Converts text to natural speech. Speech Synthesis provides a variety of voices and allows you to adjust the speed, intonation, and volume. It is ideal for scenarios such as intelligent customer service, speech interaction, audiobooks, and broadcasting.
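Speed, intonation (pitch), and volume map naturally onto the `<prosody>` element of the W3C Speech Synthesis Markup Language (SSML). Whether this service accepts SSML, and which attribute values it supports, are assumptions to check against the developer documentation; many services instead expose these settings as request parameters. The sketch below builds an SSML document with Python's standard `xml.etree.ElementTree`, which also escapes special characters such as `&` in the text; `build_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(
    text: str,
    rate: str = "medium",    # SSML values include x-slow, slow, medium, fast, x-fast
    pitch: str = "medium",   # x-low, low, medium, high, x-high
    volume: str = "medium",  # silent, x-soft, soft, medium, loud, x-loud
    lang: str = "en-US",
) -> str:
    """Wrap `text` in an SSML <speak> document with a <prosody> element."""
    speak = ET.Element("speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang})
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch, "volume": volume})
    prosody.text = text  # ElementTree escapes &, <, and > on serialization
    return ET.tostring(speak, encoding="unicode")
```

For example, `build_ssml("Your order has shipped.", rate="slow", volume="loud")` produces markup asking for slower, louder speech at the default pitch.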

Self-learning Platform

Allows you to upload business-related data to improve recognition accuracy for specific use cases. Currently, you can upload only text data to customize language models. In the future, the Self-learning Platform will also allow you to upload audio data to customize acoustic models.
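The page does not describe how uploaded text is turned into a custom language model. As a generic illustration of why domain text helps, the sketch below estimates an add-one (Laplace) smoothed bigram model from sentences: phrases that occur in the uploaded corpus receive higher probability, which steers recognition toward domain vocabulary. `BigramLM` is a hypothetical class, and real systems use much larger n-gram or neural models.

```python
from collections import Counter
from typing import Iterable


class BigramLM:
    """Add-one smoothed bigram language model estimated from plain-text sentences."""

    def __init__(self, sentences: Iterable[str]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.context_counts.update(tokens[:-1])  # every token that precedes another
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # Vocabulary of tokens that can be predicted (excludes the start marker).
        self.vocab = {word for _, word in self.bigram_counts}

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) with add-one smoothing over the vocabulary."""
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator
```

Trained on two hypothetical insurance sentences, "claim the deductible" and "the deductible applies", the model assigns P(deductible | the) = 3/7 but only 1/7 to the acoustically similar "deductive", so a recognizer using it would prefer the domain term.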

Scenarios

Intelligent Customer Service Quality Control
Real-time Subtitling and Management
Service Call Analysis

Intelligent Customer Service Quality Control

Traditional quality inspection generally involves manually listening to customer service call recordings, which is inefficient and labor-intensive. Intelligent quality inspection checks all service interactions in real time, freeing enterprises from labor constraints and giving them full control over service quality.

Procedure and Benefits

Procedure

After the voice recordings are converted to text, the quality inspection engine generates inspection results and statistics. Quality inspectors can then verify the reported violations in the management console.

Benefits

1. Full automation - The quality of all customer service calls can be automatically inspected.
2. Real-time processing - Quality inspection can be completed immediately after a phone call ends and the results can be displayed in real time.
3. Flexibility in rule configuration - Rules can be flexibly configured in various complex business scenarios.
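The page does not document the rule format of the quality inspection engine. To illustrate what configurable rules over transcribed calls can look like, the minimal sketch below supports two hypothetical rule kinds: a phrase the agent is required to say (such as a greeting) and a phrase the agent must not say. The `Rule` and `Utterance` types and `inspect_call` are assumptions for illustration; real rule engines add features such as synonyms, regular expressions, and timing conditions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    speaker: str  # e.g. "agent" or "customer", from speaker separation
    text: str


@dataclass
class Rule:
    name: str
    phrase: str
    kind: str  # "required" or "forbidden"
    speaker: str = "agent"


def inspect_call(transcript: List[Utterance], rules: List[Rule]) -> List[str]:
    """Return the names of the rules that the transcript violates."""
    violations: List[str] = []
    for rule in rules:
        if rule.kind not in ("required", "forbidden"):
            raise ValueError(f"unknown rule kind: {rule.kind}")
        found = any(
            u.speaker == rule.speaker and rule.phrase.lower() in u.text.lower()
            for u in transcript
        )
        if (rule.kind == "required" and not found) or (rule.kind == "forbidden" and found):
            violations.append(rule.name)
    return violations
```

Because rules are plain data, each business scenario can load its own rule list without code changes, which is the kind of flexibility the third benefit describes.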

Real-time Subtitling and Management

Converts audio into subtitles in real time for live speeches and videos. In live video scenarios, Intelligent Speech Interaction can also manage video content.

Business Pain Points and Benefits

Business Pain Points

1. When you attend a conference or watch a live stream, you may not be able to hear the speech clearly because of distance or background noise.
2. Large volumes of video need subtitles and content management: one live streaming application generates over 100,000 hours of video every day. Live streams of formal events require subtitles, and entertainment live streams require content management.

Benefits

1. High accuracy: Intelligent Speech Interaction transcribes speeches delivered at the Apsara Conference, with accuracy exceeding that of the runner-up in an international stenography competition. It has become a standard product at the Apsara Conference.
2. Low latency: Provides real-time transcription of live streaming with low latency.

Service Call Analysis

In traditional intermediary businesses, the two parties often bypass the agent once they are in direct contact. For example, a landlord may persuade tenants to pay the landlord directly, causing financial loss to the agency. Such behavior can often be detected in the phone calls between the two parties. The Alibaba Cloud speech recognition service can help agencies detect this behavior promptly and thus avoid financial loss.

Procedure and Benefits

Procedure

When the service call analysis system receives a phone call recording from a customer, it processes the recording and returns the results in real time. Customers can then use the quality inspection system or their own systems to analyze the returned text and identify problems in a timely manner.
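For customers that analyze the returned text in their own systems, a simple starting point is to scan each transcribed sentence for cue phrases that suggest the parties are bypassing the agency, and keep the sentence start times so a reviewer can jump to that moment in the recording. The sketch below assumes the results arrive as sentences with start times and speaker labels; the `Sentence` type and the cue patterns are hypothetical examples, not part of the service's output format.

```python
import re
from typing import List, NamedTuple


class Sentence(NamedTuple):
    start_s: float  # sentence start time in the recording, in seconds
    speaker: str
    text: str


# Hypothetical cue patterns an agency might watch for.
BYPASS_CUES: List[re.Pattern] = [
    re.compile(r"\bpay (me )?directly\b", re.IGNORECASE),
    re.compile(r"\b(skip|avoid|bypass) the (agent|agency)\b", re.IGNORECASE),
    re.compile(r"\bcash only\b", re.IGNORECASE),
]


def flag_bypass(sentences: List[Sentence], cues: List[re.Pattern] = BYPASS_CUES) -> List[Sentence]:
    """Return the sentences matching any cue, in their original order."""
    return [s for s in sentences if any(cue.search(s.text) for cue in cues)]
```

Keyword cues produce false positives and miss paraphrases, so flagged calls are best treated as candidates for human review rather than as conclusions.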

Benefits

1. Requires no manual intervention, saving labor costs.
2. Provides excellent real-time performance to identify problems in a timely manner.
