The speech synthesis service converts input text to binary audio data.
Features
Supports the following audio coding formats: pulse-code modulation (PCM), WAV, and MP3.
Allows you to configure the speed, intonation, and volume of the speaker.
Allows you to set the speaker of the generated speech, including male voices and female voices for different languages or dialects.
Supports phoneme boundary detection for each Chinese character or English word. The speech synthesis service generates a timestamp for each word in the synthesized speech. This timestamp indicates the point in time of each Chinese character or English word in the speech. The timestamp information can be used for lip synchronization or dubbing. For more information, see Timestamp feature.
Name | Value of the voice parameter | Type | Scenario | Supported language | Supported sampling rate (Hz) | Phoneme boundary detection for each character or word | Remarks |
Xiaoyun | Xiaoyun | Standard female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | No | None |
Xiaogang | Xiaogang | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | No | None |
Ruoxi | Ruoxi | Gentle female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
Siqi | Siqi | Gentle female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | Yes | None |
Sijia | Sijia | Standard female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
Sicheng | Sicheng | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | Yes | None |
Aiqi | Aiqi | Gentle female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aijia | Aijia | Standard female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aicheng | Aicheng | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aida | Aida | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Ning'er | Ninger | Standard female voice | All scenarios | Chinese only | 8K/16K/24K | No | None |
Ruilin | Ruilin | Standard female voice | All scenarios | Chinese only | 8K/16K/24K | No | None |
Siyue | Siyue | Gentle female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
Aiya | Aiya | Harsh female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aixia | Aixia | Amiable female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aimei | Aimei | Sweet female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aiyu | Aiyu | Natural female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aiyue | Aiyue | Gentle female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Aijing | Aijing | Harsh female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
Xiaomei | Xiaomei | Sweet female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
Aina | Aina | Female voice with Zhejiang accent | Customer service | Chinese only | 8K/16K | Yes | None |
Yina | Yina | Female voice with Zhejiang accent | Customer service | Chinese only | 8K/16K/24K | No | None |
Sijing | Sijing | Harsh female voice | Customer service | Chinese only | 8K/16K/24K | Yes | None |
Sitong | Sitong | Child voice | Scenarios in which child voices are required | Chinese only | 8K/16K/24K | No | None |
Xiaobei | Xiaobei | Little girl voice | Scenarios in which child voices are required | Chinese only | 8K/16K/24K | Yes | None |
Aitong | Aitong | Child voice | Scenarios in which child voices are required | Chinese only | 8K/16K | Yes | None |
Aiwei | Aiwei | Little girl voice | Scenarios in which child voices are required | Chinese only | 8K/16K | Yes | None |
Aibao | Aibao | Little girl voice | Scenarios in which child voices are required | Chinese only | 8K/16K | Yes | None |
Harry | Harry | Male voice with British accent | English only | English only | 8K/16K | No | None |
Abby | Abby | Female voice with American accent | English only | English only | 8K/16K | No | None |
Andy | Andy | Male voice with American accent | English only | English only | 8K/16K | No | None |
Eric | Eric | Male voice with British accent | English only | English only | 8K/16K | No | None |
Emily | Emily | Female voice with British accent | English only | English only | 8K/16K | No | None |
Luna | Luna | Female voice with British accent | English only | English only | 8K/16K | No | None |
Luca | Luca | Male voice with British accent | English only | English only | 8K/16K | No | None |
Wendy | Wendy | Female voice with British accent | English only | English only | 8K/16K/24K | No | None |
William | William | Male voice with British accent | English only | English only | 8K/16K/24K | No | None |
Olivia | Olivia | Female voice with British accent | English only | English only | 8K/16K/24K | No | None |
Shanshan | Shanshan | Voice of a female that speaks Cantonese | Scenarios in which dialects are used | Cantonese (simplified) and bilingual (Cantonese and English) | 8K/16K/24K | No | None |
Xiaoyue | Xiaoyue | Female voice with Sichuan accent | Scenarios in which dialects are used | Chinese or bilingual (Chinese and English) | 8K/16K | No | In public preview |
Lydia | Lydia | Female voice with bilingual (Chinese and English) | English only | English only | 8K/16K | No | In public preview |
Aishuo | Aishuo | Natural male voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | In public preview |
Qingqing | Qingqing | Voice of a female that speaks Taiwanese | Scenarios in which dialects are used | Chinese only | 8K/16K | No | In public preview |
Cuijie | Cuijie | Voice of a female that speaks Northeastern Mandarin | Scenarios in which dialects are used | Chinese only | 8K/16K | No | In public preview |
Xiaoze | Xiaoze | Male voice with strong Hunan accent | Scenarios in which dialects are used | Chinese only | 8K/16K | Yes | In public preview |
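The voice list can be turned into a small lookup so that an unsupported combination is caught before a request is sent. The following is a minimal sketch, not part of the SDK: `VOICES` and `check_voice` are hypothetical names, the dictionary copies only four rows from the list, and it assumes that 8K, 16K, and 24K mean 8000, 16000, and 24000 Hz.

```python
# A few rows copied from the voice list above (hypothetical helper data).
# "rates" assumes 8K/16K/24K mean 8000/16000/24000 Hz.
VOICES = {
    "Xiaoyun": {"rates": (8000, 16000), "timestamps": False},
    "Siqi": {"rates": (8000, 16000, 24000), "timestamps": True},
    "Aiqi": {"rates": (8000, 16000), "timestamps": True},
    "Wendy": {"rates": (8000, 16000, 24000), "timestamps": False},
}


def check_voice(voice, sample_rate, need_timestamps=False):
    """Return a list of problems; an empty list means the combination is usable."""
    info = VOICES.get(voice)
    if info is None:
        return [f"unknown voice: {voice}"]
    problems = []
    if sample_rate not in info["rates"]:
        problems.append(f"{voice} does not support {sample_rate} Hz")
    if need_timestamps and not info["timestamps"]:
        problems.append(f"{voice} does not support phoneme boundary detection")
    return problems
```

For example, `check_voice("Siqi", 24000, need_timestamps=True)` returns an empty list, whereas `check_voice("Xiaoyun", 24000)` reports the unsupported sampling rate.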
Limits
The input text must be UTF-8 encoded.
The input text can be up to 300 characters in length. If the text contains more than 300 characters, the additional characters are deleted.
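Because characters beyond the limit are deleted, a client may prefer to split long text into several requests. The following is a minimal sketch under stated assumptions: `split_text` and `to_utf8` are hypothetical helpers, a "character" is counted as one Python code point (the service's exact counting rule is not stated here), and breaking after sentence punctuation is a design choice rather than a service requirement.

```python
MAX_CHARS = 300  # limit stated above


def split_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Prefers to break just after sentence punctuation; if none is found
    inside the current window, falls back to a hard cut at the limit.
    """
    breakers = "。！？.!?;；\n"
    pieces = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        # Index just past the last sentence breaker inside the window.
        cut = max(window.rfind(ch) for ch in breakers) + 1
        if cut <= 0:
            cut = limit  # no breaker found: hard cut
        pieces.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        pieces.append(rest)
    return pieces


def to_utf8(text):
    """Encode text as UTF-8 bytes; raises UnicodeEncodeError on lone surrogates."""
    return text.encode("utf-8")
```

Each returned piece can then be sent as a separate synthesis request, and the resulting audio can be concatenated on the client.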
Service addresses
Type | Description | URL |
Access from external networks | You can use the URL to access the speech synthesis service from all clients over the Internet. The URL for external access is specified as the default URL in the SDK. | wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1 |
1. Provide a token to pass the authentication
Establish a WebSocket connection from your client to the server and provide a token to pass the authentication. For more information about how to obtain a token, see Obtain a token.
2. Start the synthesis task
The client sends a request to start speech synthesis. You can use the SET method of the SpeechSynthesizer object in the SDK to configure request parameters. The following table describes the request parameters.
Parameter | Type | Required | Description |
appkey | String | Yes | The appkey of your project that is created in the Intelligent Speech Interaction console. |
text | String | Yes | The text that you want to synthesize. The text must be UTF-8 encoded. |
voice | String | No | The speaker that you want to use. Default value: |
format | String | No | The audio coding format. Default value: pcm. Valid values: pcm, wav, and mp3. |
sample_rate | Integer | No | The audio sampling rate. Unit: Hz. Default value: 16000. |
volume | Integer | No | The voice volume of the speaker. Value range: 0 to 100. Default value: 50. |
speech_rate | Integer | No | The speed at which the speaker speaks. Value range: -500 to 500. Default value: 0. |
pitch_rate | Integer | No | The intonation of the speaker. Value range: -500 to 500. Default value: 0. |
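The value ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch: `build_params` is a hypothetical helper, not an SDK function, and it only assembles and validates the parameters listed above. How the SDK wraps these values into the actual request message is not covered here, and `voice` is left out when it is not given because its default is applied by the service.

```python
VALID_FORMATS = ("pcm", "wav", "mp3")


def build_params(appkey, text, voice=None, fmt="pcm", sample_rate=16000,
                 volume=50, speech_rate=0, pitch_rate=0):
    """Validate the request parameters from the table and return them as a dict."""
    if not appkey:
        raise ValueError("appkey is required")
    if not text:
        raise ValueError("text is required")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}")
    if not 0 <= volume <= 100:
        raise ValueError("volume must be in the range 0 to 100")
    for name, value in (("speech_rate", speech_rate), ("pitch_rate", pitch_rate)):
        if not -500 <= value <= 500:
            raise ValueError(f"{name} must be in the range -500 to 500")

    params = {
        "appkey": appkey,
        "text": text,
        "format": fmt,
        "sample_rate": sample_rate,
        "volume": volume,
        "speech_rate": speech_rate,
        "pitch_rate": pitch_rate,
    }
    if voice is not None:
        params["voice"] = voice  # optional; omitted to use the service default
    return params
```

Calling `build_params("my-appkey", "Hello")` returns the documented defaults, while an out-of-range value such as `volume=120` raises `ValueError` before anything is sent.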
3. Receive the synthesized audio data
The server returns the synthesized audio data in binary format. The client receives and processes the audio data by using the SDK.
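When the default pcm format is requested, the received bytes have no container header, so most players cannot open them directly. The following is a minimal sketch that wraps received chunks in a WAV container by using the Python standard library. `pcm_chunks_to_wav` is a hypothetical helper, and the sketch assumes 16-bit mono samples, which is not stated in this document; verify the sample width and channel count against your actual output.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=16000, sample_width=2, channels=1):
    """Wrap raw PCM chunks in a WAV container and return the WAV bytes.

    Assumes 16-bit (2-byte) mono samples unless told otherwise.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            # Chunks are appended in arrival order.
            wav_file.writeframes(chunk)
    return buffer.getvalue()
```

The `chunks` argument can be any iterable of `bytes`, such as a list that the client fills as binary messages arrive.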
4. Complete the synthesis task
After the synthesis task is completed, the server sends a notification message. The following example shows a sample notification message:
{
    "header": {
        "message_id": "05450bf69c53413f8d88aed1ee60****",
        "task_id": "640bc797bb684bd6960185651307****",
        "namespace": "SpeechSynthesizer",
        "name": "SynthesisCompleted",
        "status": 20000000,
        "status_message": "GATEWAY|SUCCESS|Success."
    }
}
In the demo, the synthesized audio is stored in a file. If you want to play the synthesized audio during the reception process, we recommend that you use stream playback. The stream playback mode allows you to play the synthesized audio while audio data is being received. This reduces the amount of time that you need to wait before you can play the audio.
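A client that handles raw messages itself can detect the end of a task by inspecting the header of the notification message. The following is a minimal sketch: `is_synthesis_completed` is a hypothetical helper, and treating status 20000000 as the success value is an assumption based on the sample notification message.

```python
import json

SUCCESS_STATUS = 20000000  # status value shown in the sample notification


def is_synthesis_completed(raw_message):
    """Return True if the JSON text is a successful SynthesisCompleted notification."""
    header = json.loads(raw_message).get("header", {})
    return (
        header.get("namespace") == "SpeechSynthesizer"
        and header.get("name") == "SynthesisCompleted"
        and header.get("status") == SUCCESS_STATUS
    )
```

Once this check returns `True`, the client can close the output file or flush the playback buffer.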
Status codes
Each response contains a status code. The following tables describe the status codes.
Common errors
Status code | Cause | Solution |
40000001 | The client failed to pass the authentication. | Check whether the token that is used by the client is valid or expired. |
40000002 | The request is invalid. | Check whether the request that is sent by the client meets the requirements. |
403 | The token is expired or the request contains invalid parameters. | Check whether the token that is used by the client is expired. Then, check whether the parameter values are valid. |
40000004 | The client timed out. | Check whether the client went a long period of time, such as 10 seconds, without sending data to the server. |
40000005 | The number of requests exceeds the upper limit. | Check whether the number of concurrent connections or queries per second (QPS) value exceeds the upper limit. If the number of concurrent connections exceeds the upper limit, we recommend that you upgrade Intelligent Speech Interaction from Trial Edition to Commercial Edition. If you use Commercial Edition, we recommend that you purchase more resources to increase the concurrency. |
40000000 | A client error occurred. This is the default status code for client errors. | Resolve the error based on the error message or submit a ticket. |
50000000 | A server error occurred. This is the default status code for server errors. | If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket. |
50000001 | An internal call error occurred. | If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket. |
Gateway errors
Status code | Cause | Solution |
40010001 | The method is not supported. | If you use the SDK, submit a ticket. |
40010002 | The instruction is not supported. | If you use the SDK, submit a ticket. |
40010003 | The instruction format is invalid. | If you use the SDK, submit a ticket. |
40010004 | The client unexpectedly disconnected. | Check whether the client disconnected before the server completed the requested task. |
40010005 | The task status is invalid. | Check whether the instruction is supported when the task is in the current state. |
Configuration errors
Status code | Cause | Solution |
40020105 | The application does not exist. | Check whether the appkey is correct and belongs to the same Alibaba Cloud account as the token. |
Text-to-speech (TTS) service errors
Status code | Cause | Solution |
41020001 | One or more parameters are invalid. | Check whether the specified parameter values are valid. |
51020001 | A TTS server error occurred. | If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket. |
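The solutions in the tables above fall into a few groups, which a client can encode as a simple decision function. The following is a minimal sketch: `suggest_action`, the action names, and the limit of three attempts are hypothetical choices, and 20000000 is treated as success based on the sample notification message.

```python
SUCCESS = 20000000  # status shown in the sample completion message

# Codes whose documented solution is to ignore occasional occurrences.
TRANSIENT = {50000000, 50000001, 51020001}
# Codes whose documented solution starts with checking the token.
TOKEN_RELATED = {40000001, 403}
# Code returned when concurrent connections or QPS exceed the upper limit.
OVER_LIMIT = 40000005


def suggest_action(status, attempt, max_attempts=3):
    """Map a status code to a coarse client action (hypothetical action names)."""
    if status == SUCCESS:
        return "done"
    if status in TRANSIENT:
        # Occasional server errors can be retried; repeated ones need a ticket.
        return "retry" if attempt < max_attempts else "submit_ticket"
    if status in TOKEN_RELATED:
        return "refresh_token"
    if status == OVER_LIMIT:
        return "reduce_load"
    # Remaining codes point at the request, parameters, or client state.
    return "inspect_request"
```

For example, `suggest_action(50000000, attempt=1)` suggests a retry, while the same code on the third attempt suggests submitting a ticket.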