Before you begin

To get started with Intelligent Speech Interaction, read the Quick Start documentation first. Then read the following topics in sequence for up-to-date information about the service.

Topic	Description
Concepts	Introduces the terms and concepts related to Intelligent Speech Interaction.
Manage projects	Describes how to create projects and set project parameters in the Intelligent Speech Interaction console.
Obtain an access token	Describes how to obtain an access token. You must obtain an access token before you call Intelligent Speech Interaction services.
Call Intelligent Speech Interaction services	Covers the following services: short sentence recognition, real-time speech recognition, speech synthesis, and recording file recognition.
Use customization tools for speech recognition	Describes how to use customization tools to improve the effectiveness of speech recognition.

Differences among various Intelligent Speech Interaction services

Service	Real-time performance	Feature	Scenario	Audio coding format	Call method	Free quota	Purchase
Short sentence recognition	Real-time recognition.	Recognizes short speech that lasts for 1 minute or less.	Scenarios such as voice search in apps, customer service hotlines, chat conversations, and voice command control	Pulse-code modulation (PCM), as uncompressed PCM or WAV files, and Opus	Java/C++/Android/iOS	A maximum of two concurrent call requests	Separate resource package
Real-time speech recognition	Real-time recognition.	Recognizes speech data streams that last for a long period of time.	Uninterrupted speech recognition scenarios such as conference speeches and live streaming	PCM, as uncompressed PCM or WAV files	Java/C++/Android/iOS	A maximum of two concurrent call requests	Separate resource package
Speech synthesis	Real-time synthesis.	Converts text that contains a maximum of 300 UTF-8 encoded characters to speech.	Scenarios that require text-to-speech synthesis	PCM, WAV, and MP3	Java/C++/Android/iOS	A maximum of two concurrent call requests	Separate resource package
Recording file recognition	Non-real-time recognition. For a free trial user, the recognition result for a recording file is returned within 24 hours of the request. For a paying user, the result is returned within 6 hours. Note: These time limits do not apply if the recording files uploaded within a 30-minute period total more than 500 hours of audio. To convert that much data, contact the pre-sales service.	Recognizes a recording file that has a maximum size of 512 MB.	Scenarios that do not require real-time recognition	Single-track and dual-track WAV and MP3	Java/C++/Go/.NET/Node.js/PHP/Python	Call requests for recognizing recording files that are up to 2 hours in length for each calendar day	Separate resource package
Long text speech synthesis	Non-real-time synthesis.	Converts text data that contains thousands or tens of thousands of characters to binary audio data.	Scenarios such as reading novels and articles	PCM, WAV, and MP3	Java/C++/RESTful API	No trial edition available	Separate resource package
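
The speech synthesis service in the table above accepts a maximum of 300 characters per request. The following is a minimal sketch of how a client might split longer text before sending it. The function name `split_for_synthesis` is hypothetical and is not part of any Intelligent Speech Interaction SDK. The sketch also assumes that one Python string character counts as one character toward the limit, which may differ from how the service counts.

```python
MAX_SYNTHESIS_CHARS = 300  # per-request limit taken from the table above


def split_for_synthesis(text, limit=MAX_SYNTHESIS_CHARS):
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: breaks at the last space within the limit when
    one exists, and otherwise cuts at exactly `limit` characters.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Look for the last space at index 0..limit (inclusive).
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space, so fall back to a hard cut
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each returned chunk can then be submitted as a separate synthesis request. For text that runs to thousands of characters, the long text speech synthesis service listed above may be a better fit than chunking.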

Notice

All Intelligent Speech Interaction services except recording file recognition support only mono speech data.
Intelligent Speech Interaction supports only 16-bit audio files that are sampled at 8 kHz or 16 kHz.
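
The constraints in this notice can be checked locally before a file is submitted. The following is a minimal sketch that uses Python's standard `wave` module. The function name `check_wav` and its `allow_stereo` parameter are hypothetical, and the sketch covers WAV files only, not raw PCM, Opus, or MP3.

```python
import wave

ALLOWED_SAMPLE_RATES = {8000, 16000}  # Hz, from the notice above
REQUIRED_SAMPLE_WIDTH = 2             # bytes per sample, that is, 16-bit


def check_wav(path, allow_stereo=False):
    """Return a list of problems found in the WAV file (empty if none).

    Hypothetical helper. Set allow_stereo=True for recording file
    recognition, which also accepts dual-track audio.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        sample_rate = wav.getframerate()

    problems = []
    if sample_width != REQUIRED_SAMPLE_WIDTH:
        problems.append(f"sample width is {sample_width * 8}-bit, expected 16-bit")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"sample rate is {sample_rate} Hz, expected 8000 or 16000 Hz")
    max_channels = 2 if allow_stereo else 1
    if channels > max_channels:
        problems.append(f"{channels} channels, expected at most {max_channels}")
    return problems
```

An empty list means the file meets the bit depth, sample rate, and channel constraints stated above. It does not guarantee that the service accepts the file.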