To get started with Intelligent Speech Interaction, read the Quick Start documentation. Then read the following topics in sequence for up-to-date information about the service.
Topic | Description |
---|---|
Terms and concepts | Introduces the terms and concepts related to Intelligent Speech Interaction. |
Projects | Describes how to create projects and set project parameters in the Intelligent Speech Interaction console. |
Access token | Describes how to obtain an access token. You must obtain an access token before you call Intelligent Speech Interaction services. |
Call Intelligent Speech Interaction services | Describes how to call Intelligent Speech Interaction services. |
Customization tools | Describes how to use customization tools to improve speech recognition results. |
Differences among various Intelligent Speech Interaction services
Service | Real-time performance | Feature | Scenario | Audio coding format | Call method | Free quota | Purchase |
---|---|---|---|---|---|---|---|
Short sentence recognition | Real-time recognition. | Recognizes speech that lasts 1 minute or less. | Voice search in apps, customer service hotlines, chat conversations, and voice command control | PCM (uncompressed PCM or WAV files) and Opus | Java/C++/Android/iOS | Up to two concurrent requests | Separate resource package |
Real-time speech recognition | Real-time recognition. | Recognizes long, continuous speech streams. | Uninterrupted speech recognition, such as conference speeches and live streaming | PCM (uncompressed PCM or WAV files) | Java/C++/Android/iOS | Up to two concurrent requests | Separate resource package |
Speech synthesis | Real-time synthesis. | Converts text of up to 300 UTF-8 encoded characters to speech. | Scenarios that require text-to-speech synthesis | PCM, WAV, and MP3 | Java/C++/Android/iOS | Up to two concurrent requests | Separate resource package |
Recording file recognition | Non-real-time recognition. For free trial users, the result is returned within 24 hours after the recognition request is sent. For paying users, it is returned within 6 hours. Note: These turnaround times do not apply if the recording files uploaded within a 30-minute period total more than 500 hours of audio. To process that volume, contact pre-sales service. | Recognizes recording files of up to 512 MB. | Scenarios that do not require real-time recognition | Single-track and dual-track WAV and MP3 | Java/C++/Go/.NET/Node.js/PHP/Python | Up to 2 hours of recording files recognized per calendar day | Separate resource package |
Long text speech synthesis | Non-real-time synthesis. | Converts text of thousands to tens of thousands of characters to binary audio data. | Reading novels and articles aloud | PCM, WAV, and MP3 | Java/C++/RESTful API | No free trial available | Separate resource package |
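The speech synthesis service accepts at most 300 characters per request, so text slightly longer than that has to be split before it is sent (for thousands of characters, the long text speech synthesis service is the intended option). The following Python sketch shows one way to split text. `split_for_synthesis` is a hypothetical helper, not part of any Intelligent Speech Interaction SDK, and it assumes the limit counts Unicode characters (Python `str` length) rather than bytes.

```python
from __future__ import annotations

import re

# Per-request limit of the speech synthesis service, as stated in the table above.
MAX_CHARS = 300


def split_for_synthesis(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper. Prefers to break after sentence-ending punctuation
    (ASCII or CJK) and falls back to a hard cut when one sentence is too long.
    Concatenating the returned chunks reproduces the input exactly.
    """
    # Zero-width split right after each sentence-ending mark (Python 3.7+).
    sentences = [s for s in re.split(r"(?<=[.!?。！？])", text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Pack whole sentences into the current chunk while they fit.
        if len(current) + len(sentence) <= max_chars:
            current += sentence
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Because the chunks concatenate back to the original text, whitespace after a sentence stays at the start of the next chunk; strip each chunk before sending it if that matters for your use case.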
All services except recording file recognition support only mono (single-track) audio.
Intelligent Speech Interaction supports only 16-bit audio sampled at 8 kHz or 16 kHz.
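The channel, bit-depth, and sample-rate constraints can be checked locally before a file is sent. The following Python sketch uses only the standard-library `wave` module. `check_wav` and its `allow_dual_track` flag are hypothetical names, and the sketch assumes a WAV container; it does not handle MP3 or Opus input. Set `allow_dual_track=True` for recording file recognition, which the table lists as accepting dual-track files.

```python
from __future__ import annotations

import io
import wave

# Constraints stated in this section; assumed to apply to any WAV input.
ALLOWED_RATES = {8000, 16000}  # 8 kHz or 16 kHz
REQUIRED_SAMPLE_WIDTH = 2      # 16-bit samples are 2 bytes wide


def check_wav(source, allow_dual_track: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the WAV header looks compatible.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper, not part of any Intelligent Speech Interaction SDK.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()

    problems = []
    # Only recording file recognition accepts dual-track audio.
    max_channels = 2 if allow_dual_track else 1
    if channels > max_channels:
        problems.append(f"{channels} channels (at most {max_channels} allowed)")
    if width != REQUIRED_SAMPLE_WIDTH:
        problems.append(f"{8 * width}-bit samples (16-bit required)")
    if rate not in ALLOWED_RATES:
        problems.append(f"{rate} Hz sample rate (8000 or 16000 Hz required)")
    return problems


if __name__ == "__main__":
    # Build a 10 ms, 16 kHz, 16-bit mono WAV file in memory and check it.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(REQUIRED_SAMPLE_WIDTH)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 160)
    buf.seek(0)
    print(check_wav(buf))  # prints []
```

The check reads only the WAV header, so it is cheap to run on large recordings before upload. It does not validate the 512 MB size limit or the audio content itself.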