SSML overview

This topic describes the features and tags of Speech Synthesis Markup Language (SSML) and provides examples on how to use SSML.

Overview

SSML is an XML-based markup language for speech synthesis. Compared with plain text synthesis, SSML-based synthesis improves the quality of synthesized content and supports various synthesis effects. You can use SSML to specify both the content that the speech synthesis service reads and how the service reads it. For example, you can specify how to break sentences and words, and control pronunciation and pauses.

Note

The speech synthesis service provided by Alibaba Cloud is implemented based on SSML 1.0 of the World Wide Web Consortium (W3C). For more information, see Speech Synthesis Markup Language (SSML) Version 1.1. However, not all the markup types that are defined in the W3C standard are supported. The speech synthesis service supports markup types based on your business requirements.

Usage notes

SSML is supported for Chinese and English, and the SSML tags and content supported for each language vary. The following sections describe the tags and provide examples on the tags.
All texts must be enclosed between the <speak> and </speak> tags. You can use the combination of the <speak> and </speak> tags multiple times in a speech synthesis task and use SSML together with texts.
The XML header before the <speak> tag at the beginning of a text can be omitted.
If the text enclosed by a tag contains special XML characters, you must escape the characters. The following list describes the special characters and the corresponding escape sequences:
- Double quotation mark ("): &quot;
- Single quotation mark ('): &apos;
- Ampersand (&): &amp;
- Less-than sign (<): &lt;
- Greater-than sign (>): &gt;
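
The escaping rule above can be sketched in Python as follows. This is a minimal illustration, not part of the speech synthesis service: it uses `xml.sax.saxutils.escape` from the Python standard library, and the helper name `to_ssml` is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() handles &, <, and > by default; quotation marks are passed as extra entities.
_EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_ssml(text: str) -> str:
    """Escape the five special XML characters and wrap the text in <speak> tags."""
    return "<speak>" + escape(text, _EXTRA_ENTITIES) + "</speak>"


print(to_ssml('Tom & Jerry say "1 < 2"'))
# prints: <speak>Tom &amp; Jerry say &quot;1 &lt; 2&quot;</speak>
```

Escape only the text content. Escaping a string that already contains SSML tags would also escape the tags themselves.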

Note

Intelligent voices and human voice cloning (Basic Edition) support all SSML tags and attributes that are described in this topic.

Human voice cloning (Public Edition) supports the <speak>, <break>, <s>, <sub>, <w>, <phoneme>, and <say-as> tags. In this case, the <speak> tag supports the rate, pitch, and volume attributes. The other tags do not support attribute configuration and are referred to as tags with empty attributes.

Tags

<speak>

Description
The <speak> tag is the root node of all supported SSML tags. All text that uses SSML tags must be enclosed between the <speak> and </speak> tags.

Syntax

<speak>Text that needs to call SSML tags</speak>

Attributes

The following table describes the attributes that are supported by the <speak> tag.

Attribute name	Attribute type	Attribute value	Required	Description
voice	String	The name of the voice that can be called. The value of the voice attribute can only contain lowercase letters, such as siyue.	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the voice that is used for speech synthesis. The specified voice has a higher priority than the voice that is specified by the `voice` parameter in an API request. For more information, see Intelligent voice samples.
encodeType	String	PCM/WAV/MP3	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio file format for speech synthesis. The specified audio file format has a higher priority than the audio file format that is specified by the `format` parameter in an API request.
sampleRate	String	8000/16000/24000/48000	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio sampling rate for speech synthesis. The specified audio sampling rate has a higher priority than the audio sampling rate that is specified by the `sample_rate` parameter in an API request.
rate	String	Valid values: an integer ranging from -500 to 500. Default value: 0. A value greater than 0 indicates that the speech rate is increased. A value less than 0 indicates that the speech rate is reduced.	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio speed for speech synthesis. The specified audio speed has a higher priority than the audio speed that is specified by the `speech_rate` parameter in an API request.
pitch	String	Valid values: an integer ranging from -500 to 500. Default value: 0. A value greater than 0 indicates that the pitch rises. A value less than 0 indicates that the pitch falls.	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio pitch for speech synthesis. The specified audio pitch has a higher priority than the audio pitch that is specified by the `pitch_rate` parameter in an API request.
volume	String	Valid values: an integer ranging from 0 to 100. Default value: 50. A value greater than 50 indicates that the volume is increased. A value less than 50 indicates that the volume is reduced.	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio volume for speech synthesis. The specified audio volume has a higher priority than the audio volume that is specified by the `volume` parameter in an API request.
effect	String	robot/lolita/lowpass/echo/eq/lpfilter/hpfilter	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute can be used to produce various sound effects for the synthesized speech. Valid values: robot (robot voice), lolita (little girl voice), lowpass (low-pass effect), echo (echo effect), eq (equalizer), lpfilter (low-pass filter), and hpfilter (high-pass filter). Note: The eq, lpfilter, and hpfilter values specify advanced filters. If you set this attribute to eq, lpfilter, or hpfilter, you can configure the `effectValue` attribute to specify a custom effect for the specified filter. An SSML structure supports only one sound effect. You cannot set this attribute to multiple values. If you configure this attribute, the system latency may increase.
effectValue	String	The effect of a specific filter. If you set the effect attribute to eq, lpfilter, or hpfilter, you can configure this attribute to modify the default effect of the specified filter.	No	eq: the equalizer. The system provides eight default bands. Frequencies: ["40Hz", "100Hz", "200Hz", "400Hz", "800Hz", "1600Hz", "4000Hz", "12000Hz"]. Bandwidths: ["1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q"]. If you configure this attribute, you must specify a gain for each band as a string of eight integers separated by spaces. Each gain ranges from -20 dB to 20 dB, and the value 0 indicates that the gain of the band is not adjusted. For example, you can set the effectValue attribute to 1 1 1 1 1 1 1 1. lpfilter: the frequency of the low-pass filter. The value is an integer in the range of (0, Required sampling rate/2]. For example, you can set the effectValue attribute to 800. hpfilter: the frequency of the high-pass filter. The value is an integer in the range of (0, Required sampling rate/2]. For example, you can set the effectValue attribute to 1200.
bgm	String	The name of the background music (BGM) that can be called online. You can view the description of the bgm attribute to obtain more information.	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the BGM of the synthesized speech.
backgroundMusicVolume	String	Valid values: an integer ranging from 0 to 100. Default value: 50. A value greater than 50 indicates that the volume is increased. A value less than 50 indicates that the volume is reduced.	No	This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the volume of the BGM.
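
As a rough sketch of how the rate, pitch, and volume ranges in the preceding table could be checked before a request is sent, the following Python snippet builds a <speak> element. The function name `build_speak` is hypothetical, and the sketch covers only these three attributes.

```python
from xml.sax.saxutils import escape, quoteattr

# Integer ranges taken from the attribute table above.
_RANGES = {"rate": (-500, 500), "pitch": (-500, 500), "volume": (0, 100)}


def build_speak(text: str, **attrs: int) -> str:
    """Return a <speak> element with range-checked rate, pitch, and volume."""
    parts = []
    for name, value in attrs.items():
        if name not in _RANGES:
            raise ValueError(f"unsupported attribute in this sketch: {name}")
        low, high = _RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        # quoteattr() adds the surrounding quotation marks.
        parts.append(f" {name}={quoteattr(str(value))}")
    return f"<speak{''.join(parts)}>{escape(text)}</speak>"


print(build_speak("My voice is loud.", rate=200, pitch=-100, volume=80))
# prints: <speak rate="200" pitch="-100" volume="80">My voice is loud.</speak>
```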

The following table describes the bgm attribute.

Built-in BGM URL

The speech synthesis service provides several built-in BGM streams. You can click the following URLs to listen to the BGM streams:

Custom BGM URL

You can use custom BGM based on your business requirements. Before you specify custom BGM, you must store the BGM in your Alibaba Cloud Object Storage Service (OSS) bucket whose access control list (ACL) is public read or public read/write. For more information about how to create a bucket, see Create a bucket. You can use the HTTP or HTTPS protocol to generate a URL for the object that is stored in a bucket. For more information, see Step 2: Upload an object.

Requirements for audio files to be uploaded:

The audio file must be a mono WAV file with a sampling rate of 16 kHz.
For short-text speech synthesis, the audio file cannot exceed 3.5 MB. For long-text speech synthesis, the audio file cannot exceed 10 MB.
If the synthesis duration is longer than the BGM duration, the BGM is played in a loop. If your audio file is not in the WAV format, you can run the following command to convert the audio file into the WAV format: `ffmpeg -i Input audio file -acodec pcm_s16le -ac 1 -ar 16000 Required audio file.wav`
If the URL in the tag contains special XML characters, escape the characters.
The bit depth is 16 bits.
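
Before you upload a file, you may want to confirm that it meets the format requirements above. The following sketch uses the `wave` module from the Python standard library to check the channel count, sampling rate, and bit depth. The function names `check_bgm_wav` and `make_wav` are hypothetical, and the file size limit is not checked here.

```python
import io
import wave


def check_bgm_wav(source) -> list:
    """Return a list of problems; an empty list means the WAV header matches."""
    problems = []
    # source can be a file path or a binary file-like object.
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("audio must be mono")
        if wav.getframerate() != 16000:
            problems.append("sampling rate must be 16 kHz")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bits
            problems.append("bit depth must be 16 bits")
    return problems


def make_wav(channels: int, rate: int) -> io.BytesIO:
    """Build a tiny in-memory 16-bit WAV file for demonstration purposes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * 10)
    buffer.seek(0)
    return buffer


print(check_bgm_wav(make_wav(1, 16000)))  # prints: []
print(check_bgm_wav(make_wav(2, 44100)))
```

For a real file, pass its path instead of an in-memory buffer, for example `check_bgm_wav("bgm.wav")`, where `bgm.wav` is a placeholder file name.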

Important

You are legally liable for the copyright of the uploaded audio file.

Tag relationships
The <speak> tag can contain texts and the following tags:
- <break>
- <s>
- <w>
- <phoneme>
- <say-as>

Examples

Empty attribute

<speak>Text that needs to call SSML tags</speak>

Synthesis result: SSML-speak1.mp3

Attribute voice

<speak voice="xiaogang"> This is a male voice. </speak>

Synthesis result: SSML-speak2.mp3

Attribute encodeType

<speak encodeType="mp3">I can generate audio in the compressed format. </speak>

Synthesis result: SSML-encode.mp3

Attribute sampleRate

<speak sampleRate="8000">The size of the file is half of the audio at a sampling rate of 16 kHz. </speak>

Synthesis result: SSML-speak4.mp3

Attribute rate

<speak rate="200">I speak faster than the average. </speak>

Synthesis result: SSML-speak5.mp3

Attribute pitch

<speak pitch="-100">My voice pitch is lower than others. </speak>

Synthesis result: SSML-speak6.mp3

Attribute volume
```
<speak volume="80">My voice is loud. </speak>
```
Synthesis result: SSML-speak7.mp3

Combination of attributes that are separated by spaces

<speak rate="200" pitch="-100" volume="80">This is how my voice sounds when multiple attributes are used. </speak>

Synthesis result: SSML-speak8.mp3

Attribute effect

<speak effect="robot">Do you like the Wall-E robot? </speak>

Synthesis result: SSML-speak9.mp3

Attribute bgm

<speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="-500" volume="40"><break time="2s"/>The ancient trees on the shady cliffs are covered in a thick layer of moss<break time="700ms"/>The sound of rain can still be heard echoing through the bamboo forest<break time="700ms"/>The silk production contributes to the national economy<break time="700ms"/>The scenery of Mianzhou is worth seeing<break time="2s"/></speak>

Synthesis result: SSML-speak10.mp3

<emotion>

Description
The <emotion> tag is used to apply multi-emotional voices to speech synthesis. This tag is optional. If you configure the tag for a voice that does not support multiple emotions, an error occurs.

Syntax

<emotion category="happy" intensity="1.0">What a nice day! </emotion>

Attributes

The following table describes the attributes that are supported by the <emotion> tag.

Attribute name	Attribute type	Attribute value	Required	Description
category	String	Enumeration value, such as neutral and happy.	Yes	The speech emotion. The following table describes the supported emotions for each voice.
intensity	String	The value is a floating-point number within the range of 0.01 to 2.0.	No	The intensity of the emotion. The default value is 1.0, which indicates the predefined emotional intensity. The minimum value is 0.01, which indicates a slight inclination toward a specific emotion. The maximum value is 2.0, which indicates that the emotional intensity is doubled.

The multi-emotional voices support different emotion categories.

Voice name	Value of voice	Emotion category
Zhi Miao_multi-emotional	zhimiao_emo	serious, sad, disgust, jealousy, embarrassed, happy, fear, surprise, neutral, frustrated, affectionate, gentle, angry, newscast, customer-service, story, and living
Zhi Mi_multi-emotional	zhimi_emo	angry, fear, happy, hate, neutral, sad, and surprise
Zhi Yan_multi-emotional	zhiyan_emo	neutral, happy, angry, sad, fear, hate, surprise, and arousal
Zhi Bei_multi-emotional	zhibei_emo	neutral, happy, angry, sad, fear, hate, and surprise
Zhi Tian_multi-emotional	zhitian_emo	neutral, happy, angry, sad, fear, hate, and surprise
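
Because an unsupported emotion causes an error, it can help to check the voice and category on the client side first. The following Python sketch copies the categories from the preceding table into a lookup and builds the SSML string. The function name `build_emotion_ssml` is hypothetical and the check is only an illustration, not the validation that the service performs.

```python
from xml.sax.saxutils import escape

# Emotion categories copied from the table above.
_BASIC = {"neutral", "happy", "angry", "sad", "fear", "hate", "surprise"}
_EMOTIONS = {
    "zhimiao_emo": {
        "serious", "sad", "disgust", "jealousy", "embarrassed", "happy",
        "fear", "surprise", "neutral", "frustrated", "affectionate",
        "gentle", "angry", "newscast", "customer-service", "story", "living",
    },
    "zhimi_emo": _BASIC,
    "zhiyan_emo": _BASIC | {"arousal"},
    "zhibei_emo": _BASIC,
    "zhitian_emo": _BASIC,
}


def build_emotion_ssml(voice: str, category: str, text: str, intensity: float = 1.0) -> str:
    """Build a <speak><emotion> string after checking the voice, category, and intensity."""
    if category not in _EMOTIONS.get(voice, set()):
        raise ValueError(f"{voice} does not support the emotion {category}")
    if not 0.01 <= intensity <= 2.0:
        raise ValueError("intensity must be between 0.01 and 2.0")
    return (
        f'<speak voice="{voice}">'
        f'<emotion category="{category}" intensity="{intensity}">'
        f"{escape(text)}</emotion></speak>"
    )


print(build_emotion_ssml("zhitian_emo", "happy", "What a nice day!"))
```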

Tag relationships
The <emotion> tag can contain texts and the following tags:
- <s>
- <sub>
- <say-as>
- <w>
- <phoneme>
- <soundEvent/>
- <break/>

Example

Empty attribute

<speak voice="zhitian_emo"><emotion category="happy" intensity="1.0">What a nice day! </emotion></speak>

Synthesis result: SSML-emotion.wav

<break>

Description
The <break> tag inserts pauses in texts and is optional.
Syntax
```
<break time="string"/>
```

Attributes

Attribute name	Attribute type	Attribute value	Required	Description
time	String	[number]s/[number]ms		The pause length, in seconds or milliseconds. Example: 2 seconds or 50 milliseconds. If the pause is in seconds, the value of number is an integer within the range of [1, 10]. In this case, the value is in the format of [number]s. If the pause is in milliseconds, the value of number is an integer within the range of [50, 10000]. In this case, the value is in the format of [number]ms.
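
The two range rules for the time attribute can be expressed as a small check. The following Python sketch is one way to do it. The function name `is_valid_break_time` is hypothetical.

```python
import re

# An integer followed by the unit "ms" or "s".
_BREAK_TIME = re.compile(r"(\d+)(ms|s)")


def is_valid_break_time(value: str) -> bool:
    """Check a <break> time value against the documented ranges."""
    match = _BREAK_TIME.fullmatch(value)
    if match is None:
        return False
    number, unit = int(match.group(1)), match.group(2)
    if unit == "s":
        return 1 <= number <= 10
    return 50 <= number <= 10000


for candidate in ("500ms", "2s", "11s", "20ms", "1.5s"):
    print(candidate, is_valid_break_time(candidate))
```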

Tag relationships
The <break> tag is an empty tag and cannot contain any tags. If the <s> tag is used, you must enclose the <break> tag between the <s> and </s> tags, which indicates that a pause is inserted into the sentences or paragraphs.

Example

<speak>Close your eyes and have a rest.<break time="500ms"/>OK, please open your eyes. </speak>

Synthesis result: SSML-break.mp3

<s>

Description
The <s> tag specifies the sentence structure in a text and is optional.
Syntax
```
 <s>Text</s>
```
Attributes
N/A.
Tag relationships
The <s> tag can contain texts and the following tags:
- <break>
- <w>
- <phoneme>
- <say-as>

Example

<speak><s>This is the first sentence.</s><s>This is the second sentence.</s></speak>

Synthesis result: SSML-s.mp3

<sub>

Description
The <sub> tag is used to replace the text enclosed by the tag with an alias.
Syntax
```
 <sub alias="string">Text</sub>
```
Attributes
Attribute name	Attribute type	Attribute value	Required	Description
alias	String	The content of the new text.	Yes	The text that is used to replace the text in the tag.
Tag relationships
The <sub> tag can contain texts.

Example

<speak><sub alias="Network protocol standard">W3C</sub></speak>

Synthesis result: SSML-sub.mp3

<w>

Description
The <w> tag specifies the word structure in a text and is optional. In most cases, spaces are used for word segmentation in English texts. You do not need to use this tag. The text enclosed by the <w> and </w> tags must be an independent word or phrase only in English.
Syntax
```
 <w>Text</w>
```
Attributes
N/A.
Tag relationships
The <w> tag can contain texts.

Examples

<speak>Mayor of Nanjing<w>Jiang Daqiao</w>gave a speech today. </speak>

Synthesis result: SSML-w.mp3

<phoneme>

Description
The <phoneme> tag controls the pronunciation of the text enclosed by the tag and is optional. The tag is not supported for English texts.

Syntax

<phoneme alphabet="string" ph="string">Text</phoneme>

Attributes

Attribute name	Attribute type	Attribute value	Required	Description
alphabet	String	py	Yes	The value py indicates Pinyin.
ph	String	The Pinyin string that corresponds to the text enclosed by the tag.	Yes	Value assignment rules for Pinyin: Pinyin syllables are separated by spaces. The number of Pinyin syllables must be the same as the number of words. Each Pinyin syllable is composed of a sound and a tone mark. The tone marks are represented by tone numbers 1 to 5, in which 5 indicates the neutral tone.
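
The Pinyin rules above can be checked with a short script before the text is sent. The following Python sketch assumes that each syllable is written in lowercase ASCII letters followed by a single tone number. The function name `is_valid_ph` and the `word_count` parameter are hypothetical.

```python
import re

# Assumption: lowercase ASCII letters followed by one tone number from 1 to 5.
_SYLLABLE = re.compile(r"[a-z]+[1-5]")


def is_valid_ph(ph: str, word_count: int) -> bool:
    """Check that ph has one well-formed Pinyin syllable per word."""
    syllables = ph.split(" ")
    if len(syllables) != word_count:
        return False
    return all(_SYLLABLE.fullmatch(s) is not None for s in syllables)


print(is_valid_ph("dian3 dang4 hang2", 3))  # prints: True
print(is_valid_ph("dang4 diao4", 3))        # prints: False
```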

Tag relationships
The <phoneme> tag can contain texts.

Example

 <speak>qu<phoneme alphabet="py" ph="dian3 dang4 hang2">dian dang hang</phoneme>ba zhe ge wan yi<phoneme alphabet="py" ph="dang4 diao4">dang diao</phoneme></speak>

Synthesis result: SSML-phoneme.mp3

<soundEvent>

Description
The <soundEvent> tag is used to insert a sound cue in any position of the text during SSML-based synthesis.
Syntax
```
 <soundEvent src="URL"/>
```

Attributes

Attribute name	Attribute type	Attribute value	Required	Description
src	String	The URL of the sound cue.	Yes	You can use a custom sound cue based on your business requirements. Before you specify a custom sound cue, store the audio file in your OSS bucket whose ACL is public read or public read/write. For more information about how to create a bucket, see Create a bucket. You can use the HTTP or HTTPS protocol to generate a URL for the object that is stored in a bucket. For more information, see Step 2: Upload an object.

Requirements for audio files to be uploaded:

The audio file must be a mono WAV file with a sampling rate of 16 kHz.
The maximum file size is 2 MB.
The bit depth is 16 bits.

Important

You are legally liable for the copyright of the uploaded audio file.

Tag relationships
The <soundEvent> tag is an empty tag and cannot contain any tags.

Examples

 <speak>A horse was frightened<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>and people scattered to escape.</speak>

Synthesis result: SSML-sound-event.mp3

<say-as>

Description
The <say-as> tag specifies the type of the text enclosed by the tag, so that the text can be pronounced based on the default pronunciation method of this type.

Syntax

<say-as interpret-as="string">Text </say-as>

Attributes

Attribute name	Attribute type	Attribute value	Required	Description
interpret-as	String	cardinal/digits/telephone/name/address/id/characters/punctuation/date/time/currency/measure	Yes	The type of the text enclosed by the tag.

Valid values:

cardinal: The text is read as an integer or decimal number.
digits: The text is read as a digit.
telephone: The text is read as a phone number.
name: The text is read as a name.
address: The text is read as an address.
id: The text is read as an account name or nickname.
characters: The text is read by character.
punctuation: The text is read as a punctuation mark.
date: The text is read as a date.
time: The text is read as a time.
currency: The text is read as an amount.
measure: The text is read as a measurement unit.
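
As an illustration of how the interpret-as values listed above might be used from client code, the following Python sketch builds a <say-as> element and rejects values that are not in the list. The function name `say_as` is hypothetical. The text is XML-escaped, which matters for types such as punctuation.

```python
from xml.sax.saxutils import escape

# Valid values of the interpret-as attribute, as listed above.
_INTERPRET_AS = {
    "cardinal", "digits", "telephone", "name", "address", "id",
    "characters", "punctuation", "date", "time", "currency", "measure",
}


def say_as(interpret_as: str, text: str) -> str:
    """Return a <say-as> element for one of the documented text types."""
    if interpret_as not in _INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as}")
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


print("<speak>" + say_as("digits", "12034") + "</speak>")
print("<speak>" + say_as("punctuation", "<=>?@") + "</speak>")
```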

Text types that the <say-as> tag supports

cardinal

Format	Example	Description
Numeric string	145	Valid integers: positive and negative integers with a maximum of 20 digits in the range of [-99999999999999999999,99999999999999999999]. Valid decimals: No limits are imposed on the number of decimal places. However, we recommend that you retain up to 10 decimal places.
Minus sign + numeric string	-145
Numeric string with each three digits separated by a comma	10,000
Minus sign + numeric string with each three digits separated by a comma	-10,124
Numeric string + decimal point + two zeros	10.00
Minus sign + numeric string + decimal point + two zeros	-110.00
Numeric string + decimal point + numeric string	79.090
Minus sign + numeric string + decimal point + numeric string	-79.001

Format	Example	English output	Description
Numeric string	145	one hundred forty five	Valid integers: positive and negative integers with a maximum of 13 digits in the range of [-999999999999,999999999999]. Valid decimals: No limits are imposed on the number of decimal places. However, we recommend that you retain up to 10 decimal places.
A numeric string that starts with a zero	0145	one hundred forty five
Minus sign + numeric string	-145	minus one hundred forty five
Numeric string with each three digits separated by a comma	60,000	sixty thousand
Minus sign + numeric string with each three digits separated by a comma	-208,000	minus two hundred eight thousand
Numeric string + decimal point + zero	12.00	twelve
Numeric string + decimal point + numeric string	12.34	twelve point three four
Numeric string with each three digits separated by a comma + decimal point + numeric string	1,000.1	one thousand point one
Minus sign + numeric string + decimal point + numeric string	-12.34	minus twelve point three four
Minus sign + numeric string with each three digits separated by a comma + decimal point + numeric string	-1,000.1	minus one thousand point one
Numeric string (numeric string with each three digits separated by a comma) + hyphen + number (numeric string with each three digits separated by a comma)	1-1,000	one to one thousand
Other default readings	012.34	twelve point three four	None.
	1/2	one half
	-3/4	minus three quarters
	5.1/6	five point one over six
	-3 1/2	minus three and a half
	1,000.3^3	one thousand point three to the power of three
	3e9.1	three times ten to the power of nine point one
	23.10%	twenty three point one percent

digits

Format	Example	Description
Numeric string	129090909	No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits. If the numeric string contains more than 10 digits, you must insert a pause after each digit.

Format	Example	English output	Description
Numeric string	12034	one two zero three four	No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits. When digits in a numeric string are grouped by hyphens (-) or spaces, a comma is inserted between the groups to create a pause. Up to five groups are supported for a numeric string.
Numeric string + space or hyphen + numeric string + space or hyphen + numeric string + space or hyphen + numeric string	1-23-456 7890	one, two three, four five six, seven eight nine zero
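
To make the grouping rule for English digits concrete, the following Python sketch reproduces the two documented readings. It only illustrates the rule described above and is not the implementation that the service uses. The function name `read_digits` is hypothetical.

```python
import re

_NAMES = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def read_digits(text: str) -> str:
    """Read digits one by one, with a comma between groups split by hyphens or spaces."""
    groups = [g for g in re.split(r"[- ]", text) if g]
    if len(groups) > 5:
        raise ValueError("up to five groups are supported")
    for group in groups:
        if re.fullmatch(r"[0-9]+", group) is None:
            raise ValueError(f"not a numeric string: {group}")
    return ", ".join(" ".join(_NAMES[int(d)] for d in g) for g in groups)


print(read_digits("12034"))          # prints: one two zero three four
print(read_digits("1-23-456 7890"))  # prints: one, two three, four five six, seven eight nine zero
```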

telephone

Format	Example	Description
Landline number	4930286	A landline number can be seven or eight digits. You can use spaces or hyphens (-) to separate the digits. A 7-digit landline number can be divided into two groups. In this case, the first group contains 3 digits, and the second group contains 4 digits. An 8-digit landline number can be divided into two groups. In this case, each group contains 4 digits.
	493 0286
	493-0286
	62552560
	6255 2560
	6255-2560
Landline number + extension number	4930286-109	An extension number can have up to four digits.
	4930286, extension 109
Area code + landline number	01062552560	Area codes of 010, 02x, 03xx, 04xx, 05xx, 07xx, 08xx, and 09xx are supported.
	010 62552560
	010 6255 2560
	010 6255-2560
	010-62552560
	010-6255-2560
	(010)62552560
	03198907098
	0319-8907098
Area code + landline number + extension number	010 62552560-109	None.
	010-62552560-109
	(010)62552560-109
	(010)62552560, extension 109
Country code + area code + landline number	86-010-62791627	Country code formats of 86, (86), +86, (+86), and 0086 are supported, all of which are read as eight-six.
	(86)10-62791627
	+86-010-62791627
	0086-10-62791627
	(+86)-10-6279 1627
Country code + area code + landline number + extension number	(86)21-58118818-207	None.
	(86)021-5811-8818-207
	(86)021-58118818, x. 207
	(86)21-5811-8818, ex. 207
	+86-021-58118818, extension 207
Mobile phone number	139 0000 5678	A mobile phone number consists of 11 digits and can be separated in the formats of 3-3-5 and 3-4-4.
	139-000-05678
	139 000 05678
Country code + mobile number	+86-13900005678	None.
	(+86)-139-0000-5678
	+8613900005678
	0086-139 000 05678
Service number	123	Common service numbers are supported. A 10-digit service number can start with 400 or 800 and can be separated in the format of 3-3-4. A 16-digit service number can start with 12530, 17951, or 12593.
	95678
	4008110510
	800-810-8888
	1253013520638377
Remarks	(86)(21)9899-80800-0909	The numeric string and separators are supported. The separators can be parentheses and hyphens (-).

Format	Example	English output	Description
Numeric string	12034	one two oh three four	No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits. When digits in a numeric string are grouped by hyphens (-) or spaces, a comma is inserted between the groups to create a pause. Up to five groups are supported for a numeric string.
Numeric string + space or hyphen + numeric string + space or hyphen + numeric string	1-23-456 7890	one, two three, four five six, seven eight nine oh
Plus sign + numeric string + space or hyphen + numeric string	+43-211-0567	plus four three, two one one, oh five six seven
Left parenthesis + numeric string + right parenthesis + space + numeric string + space or hyphen + numeric string	(21) 654-3210	(two one) six five four, three two one oh

id

Format	Example	Description
String	dell0101	Uppercase and lowercase letters, digits from 0 to 9, and underscores (_) are supported. The output space indicates that a pause is inserted between characters, and characters are read one by one.
	myid_1998
	AiTest

In English texts, the id type is read in the same way as the characters type.

characters

Format	Example	Description
String	ISBN 1-001-099098-1	Chinese characters, uppercase and lowercase letters, digits from 0 to 9, and specific full-width and half-width characters are supported. The output space indicates that a pause is inserted between characters, and characters are read one by one. If the text enclosed by the tag contains special XML characters, you must escape the characters.
	x10b2345_u
	v1.0.1
	Version 2.0
	Su M MA000
	Airbus A330
	Models s01, s02, and s03
	αβγ

Format	Example	English output	Description
String	*b+3$.c-0'=α	asterisk B plus three dollar dot C dash zero apostrophe equals alpha	Chinese characters, uppercase and lowercase letters, digits from 0 to 9, and specific full-width and half-width characters are supported. The output space indicates that a pause is inserted between characters, and characters are read one by one. If the text enclosed by the tag contains special XML characters, you must escape the characters.

punctuation

Format	Example	Description
Punctuations	...	Common Chinese and English punctuation marks are supported. The output space indicates that a pause is inserted between characters, and characters are read one by one. If the text enclosed by the tag contains special XML characters, you must escape the characters.
	...
	!"#$%&
	'()*+
	,-./:;
	<=>?@
	[\]^_

In English texts, the punctuation type is read in the same way as the characters type.

date

Format	Example	Description
Year	71	Two-digit and four-digit years are supported. Two-digit years range from 60 to 99, 00 to 09, and 10 to 19. Four-digit years range from 1000 to 1999 and 2000 to 2099.
	04
	19
	1011
	1998
	2008
Year and month	April, 98	The months from January to September can be represented by a number with or without a zero. For example, in April 1908, April can be represented by 4 or 04.
	April 1998
	August, 08
	August 2008
Year, month, and day	April 23, 98	The days from the first to ninth day in a month can be represented by a number with or without a zero. For example, if you want to represent the date of April 8 in 1908, you can use 4 or 04 to indicate April and 8 or 08 to indicate Day 8.
	April 23, 1998
	August 8, 08
	August 08, 2008
Month and day	March 20	None.
	August 07
Year and month	2018/08	Forward slashes (/), hyphens (-), and periods (.) can be used as separators between the days, months and years.
	2018-08
	2018.08
Year, month, and day	2018/08/08
	2018-8-8
	2018.08.08
Year, month, and day~year, month, and day	September 1~30, 04	Tildes (~) and hyphens (-) can be used as separators between dates.
	September 01, 2004 - June 08, 2008
Year, month, and day~day	September 1~30, 04
	September 01, 2004 - June 08, 2008
Year and month~year and month	April, 01~April, 10
	April 2001 ~ April 2010
Month and day~month and day	October 1~October 7
	October 01~October 07
Month and day~day	October 1~7
	October 01~07
Year, month, and day	2018/03/03~2019/01/01	Forward slashes (/) and periods (.) can be used as separators between the days, months, and years, and tildes (~) and hyphens (-) can be used as separators between dates.
	1997.9.9~1998.9.9
Month and day	10/20~10/31
Month~month	Jan~Oct
	January~October
Year, month, and day	10/20/2018	Only 4-digit years are supported. Only forward slashes (/) can be used as the separators. Only the format of Month/Day/Year is supported.

Format	Example	English output	Description
Four digits/Two digits or four digits-Two digits	2000/01	two thousand, oh one	Cross-year range
	1900-01	nineteen hundred, oh one
	2001-02	twenty oh one, oh two
	2019-20	twenty nineteen, twenty
	1998-99	nineteen ninety eight, ninety nine
	1999-00	nineteen ninety nine, oh oh
A 4-digit number that starts with 1 or 2	2000	two thousand	4-digit years
	1900	nineteen hundred
	1905	nineteen oh five
	2021	twenty twenty one
Day of the week-Day of the week or Day of the week~Day of the week or Day of the week&Day of the week	mon-wed	monday to wednesday	If the text enclosed by the tag contains special XML characters, you must escape the characters.
	tue~fri	tuesday to friday
	sat&sun	saturday and sunday
DD-DD MMM, YYYY or DD~DD MMM, YYYY or DD&DD MMM, YYYY	19-20 Jan, 2000	the nineteenth to the twentieth of january two thousand	DD specifies the 2-digit day, MMM specifies the 3-letter abbreviation of the month or a full month name, and YYYY specifies the 4-digit year that starts with 1 or 2.
	01 ~ 10 Jul, 2020	the first to the tenth of july twenty twenty
	05&06 Apr, 2009	the fifth and the sixth of april two thousand nine
MMM DD-DD or MMM DD~DD or MMM DD&DD	Feb 01 - 03	february the first to the third	MMM specifies the 3-letter abbreviation of a month or a full month name, and DD specifies a 2-digit day.
	Aug 10~20	august the tenth to the twentieth
	Dec 11&12	december the eleventh and the twelfth
MMM-MMM or MMM~MMM or MMM&MMM	Jan-Jun	january to june	MMM specifies the 3-letter abbreviation of a month or a full month name.
	jul ~ dec	july to december
	sep&oct	september and october
YYYY-YYYY or YYYY~YYYY	1990 - 2000	nineteen ninety to two thousand	YYYY specifies the 4-digit year that starts with 1 or 2.
YYYY-YYYY or YYYY~YYYY	2001~2021	two thousand one to twenty twenty one	YYYY specifies the 4-digit year that starts with 1 or 2.
WWW DD MMM YYYY	Sun 20 Nov 2011	sunday the twentieth of november twenty eleven	WWW specifies the 3-letter abbreviation of the day of the week or the full name for the day of the week. DD specifies the 2-digit day. MMM specifies the 3-letter abbreviation of the month or a full month name. MM specifies the 2-digit month number, the 3-letter abbreviation of the month, or a full month name. YYYY specifies the 4-digit year that starts with 1 or 2.
WWW DD MMM	Sun 20 Nov	sunday the twentieth of november
WWW MMM DD YYYY	Sun Nov 20 2011	sunday november the twentieth twenty eleven
WWW MMM DD	Sun Nov 20	sunday november the twentieth
WWW YYYY-MM-DD	Sat 2010-10-01	saturday october the first twenty ten
WWW YYYY/MM/DD	Sat 2010/10/01	saturday october the first twenty ten
WWW MM/DD/YYYY	Sun 11/20/2011	sunday november the twentieth twenty eleven
MM/DD/YYYY	11/20/2011	november the twentieth twenty eleven
YYYY	1998	nineteen ninety eight
Other default readings	10 Mar, 2001	the tenth of march two thousand one	None.
	10 Mar	the tenth of march
	Mar 2001	march two thousand one
	Fri. 10/Mar/2001	friday the tenth of march two thousand one
	Mar 10th, 2001	march the tenth two thousand one
	Mar 10	march the tenth
	2001/03/10	march the tenth two thousand one
	2001-03-10	march the tenth two thousand one
	2000s	two thousands
	2010's	twenty tens
	1900's	nineteen hundreds
	1990s	nineteen nineties
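
Several of the range formats above use an ampersand (&), which is a special XML character and must be escaped inside the tag. The following is a minimal Python sketch of one way to do that. It assumes only the standard-library functions `xml.sax.saxutils.escape` and `quoteattr`. The helper names `say_as` and `speak` are hypothetical and are not part of any Alibaba Cloud SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in a <say-as> tag (hypothetical helper, for illustration only)."""
    # escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
    body = escape(text)
    # quoteattr() returns the attribute value already wrapped in quotation marks.
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{body}</say-as>"


def speak(*parts: str) -> str:
    """Enclose already-escaped SSML fragments between <speak> and </speak>."""
    return "<speak>" + "".join(parts) + "</speak>"


print(speak(say_as("sat&sun", "date")))
# <speak><say-as interpret-as="date">sat&amp;sun</say-as></speak>
```

Escaping the text at the point where the tag is built keeps the raw input (for example, `sat&sun`) readable in application code while the generated SSML stays well-formed.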

time

Format	Example	Description
Time	12:00	Common time and time range formats are supported.
	12:00:00
	10:20
	10:20:30
	09:18:14
Point in time~Point in time	11:00~12:00
	09:00-14:00
	11:00~11:30
	11:00-12:18
	10:30~11:00
	09:28-10:00
	10:20~11:20
	06:00~08:00
	10:20 a.m.~1:30 p.m.
Abbreviation of time	5:00 am
	5:30 am
	5:20:12 am
	7:00 am
	7:30 AM
	7:20:12 a.m.
	07:08:12 A.M.
	5:00 pm
	5:30 PM
	5:20:12 p.m.
	05:09:12 P.M.
	9:00 pm
	9:30 pm
	9:20:12 PM
	9:02:12 P.M.
	12:00 pm
	12:30 p.m.
	12:20:12 PM

Format	Example	English output	Description
HH:MM AM or PM	09:00 AM	nine A M	HH specifies 1- or 2-digit hours. MM specifies 2-digit minutes. AM specifies the time before noon. PM specifies the time after noon.
	09:03 PM	nine oh three P M
	09:13 p.m.	nine thirteen p m
HH:MM	21:00	twenty one hundred
HHMM	100	one oclock
Point in time-Point in time	8:00 am - 05:30 pm	eight a m to five thirty p m	Common time range formats are supported.
	7:05~10:15 AM	seven oh five to ten fifteen A M
	09:00-13:00	nine oclock to thirteen hundred

currency

Format	Example	Description
Number + currency code	12.00 RMB	The following currency codes are supported: AUD, CAD, HKD, JPY, USD, CHF, NOK, SEK, GBP, RMB, CNY, and EUR. Integers, decimals, and international expressions separated by commas (,) are supported.
	12.50 RMB
	12,000,000 RMB
	12,000,000.00 RMB
	12,000.35 RMB
Currency symbol + number	$12	The following currency symbols are supported: Canadian dollar ($), US dollar ($), French franc (Fr), Danish krone (kr), pound sterling (£), Chinese yuan (¥), and euro (€). Integers, decimals, and international expressions separated by commas (,) are supported.
	$12.00
	$12.12
	$12,000
	$12,000.00
	$12,000.99
Other default readings	1213	None.
	1213 KML
	1213.00 KML
	1213.9 KML
	1,000 KML
	1,000.00 KML
	1,000.98 KML
	12,000

Format	Example	English output	Description
Number + currency code	1.00 RMB	one yuan	Integers, decimals, and international expressions separated by commas (,) are supported. Supported currency codes: CN¥ (yuan), CNY (yuan), RMB (yuan), AUD (australian dollar), CAD (canadian dollar), CHF (swiss franc), DKK (danish krone), EUR (euro), GBP (british pound), HKD (Hong Kong (China) dollar), JPY (japanese yen), NOK (norwegian krone), SEK (swedish krona), SGD (singapore dollar), and USD (united states dollar).
	2.02 CNY	two point zero two yuan
	1,000.23 CN¥	one thousand point two three yuan
	1.01 SGD	one singapore dollar and one cent
	2.01 CAD	two canadian dollars and one cent
	3.1 HKD	three hong kong dollars and ten cents
	1,000.00 EUR	one thousand euros
Currency code + number	US$ 1.00	one US dollar	Integers, decimals, and international expressions separated by commas (,) are supported. Supported currency symbols and codes: US$ (US dollar), CA$ (Canadian dollar), AU$ (Australian dollar), SG$ (Singapore dollar), HK$ (Hong Kong dollar), C$ (Canadian dollar), A$ (Australian dollar), $ (dollar), £ (pound), € (euro), CN¥ (yuan), CNY (yuan), RMB (yuan), AUD (australian dollar), CAD (canadian dollar), CHF (swiss franc), DKK (danish krone), EUR (euro), GBP (british pound), HKD (Hong Kong (China) dollar), JPY (japanese yen), NOK (norwegian krone), SEK (swedish krona), SGD (singapore dollar), and USD (united states dollar).
	$0.01	one cent
	JPY 1.01	one japanese yen and one sen
	£1.1	one pound and ten pence
	€ 2.01	two euros and one cent
	USD 1,000	one thousand united states dollars
Number + numerical unit + currency code or Currency code + number + numerical unit	1.23 Tn RMB	one point two three trillion yuan	The following numerical units are supported: thousand, million, billion, trillion, Mil (million), mil (million), Bil (billion), bil (billion), MM (million), Bn (billion), bn (billion), Tn (trillion), tn (trillion), K (thousand), k (thousand), M (million), and m (million).
	$1.2 K	one point two thousand dollars
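
The "Number + currency code" format above can be produced programmatically. The following is a minimal Python sketch, assuming the letter codes listed in the table (the CN¥ form is left out for simplicity). The names `SUPPORTED_CODES`, `currency_text`, and `currency_ssml` are hypothetical and exist only for illustration.

```python
from decimal import Decimal

# Letter codes copied from the "Number + currency code" row above.
SUPPORTED_CODES = {
    "CNY", "RMB", "AUD", "CAD", "CHF", "DKK", "EUR", "GBP",
    "HKD", "JPY", "NOK", "SEK", "SGD", "USD",
}


def currency_text(amount: str, code: str) -> str:
    """Return text such as '1,000.00 EUR' (hypothetical helper)."""
    if code not in SUPPORTED_CODES:
        raise ValueError(f"unsupported currency code: {code}")
    value = Decimal(amount)
    # The ',' format option inserts a comma between groups of three digits.
    return f"{value:,.2f} {code}"


def currency_ssml(amount: str, code: str) -> str:
    """Wrap the formatted amount in <say-as interpret-as="currency">."""
    text = currency_text(amount, code)
    return f'<speak><say-as interpret-as="currency">{text}</say-as></speak>'


print(currency_ssml("13000000", "RMB"))
# <speak><say-as interpret-as="currency">13,000,000.00 RMB</say-as></speak>
```

Using `Decimal` rather than `float` avoids binary rounding surprises when amounts arrive as strings.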

measure

Format	Example	Description
Number + Chinese unit	2 pieces	Common Chinese units and unit abbreviations are supported.
	120 hectares
	More than 100 milligrams
	About 100 meters
	More than 100 persons
	1 centimeter and 20 millimeters
	120.00 square kilometers
Number + unit abbreviation	120.56 cm²
	One hundred twenty square meters fifty-six square centimeters
	100 m 12 cm 6 mm
Range	10~15 kg
	10.24 to 789.82 Mu
	10 meters to 15 meters
	10.24 cm~19.08 cm
Number + unit + "/" + unit	CNY 10/kg
	CNY 199 to 299/piece
	CNY 299.99/g to CNY 399.99/g
Other default readings	12 bunches
	30 rm
	400,000,000 fellows
	12.897 micrograms

Format	Examples	English output	Description
Number + measurement unit	1.0 kg	one kilogram	Integers, decimals, and international expressions separated by commas (,) are supported. Common unit abbreviations are supported.
	1,234.01 km	one thousand two hundred thirty four point zero one kilometres
Measurement unit	mm²	square millimetre

The following table describes the common notations that the <say-as> tag supports.

Notations	Pronunciation in English
!	exclamation mark
"	double quote
#	pound
$	dollar
%	percent
&	and
'	left quote
(	left parenthesis
)	right parenthesis
*	asterisk
+	plus
,	comma
-	dash
.	dot
/	slash
:	colon
;	semicolon
<	less than
=	equals
>	greater than
?	question mark
@	at
[	left bracket
\	back slash
]	right bracket
^	caret
_	underscore
`	back quote
{	left brace
\|	vertical bar
}	right brace
~	tilde
！	exclamation mark
"	left double quote
"	right double quote
'	left quote
'	right quote
(	left parenthesis
)	right parenthesis
,	comma
.	full stop
--	em dash
:	colon
;	semicolon
?	question mark
,	enumeration comma
...	ellipsis
...	ellipsis
《	left guillemet
》	right guillemet
￥	yuan
≥	greater than or equal to
≤	less than or equal to
≠	not equal
≈	approximately equal
±	plus or minus
×	times
π	pi
Α	alpha
Β	beta
Γ	gamma
Δ	delta
Ε	epsilon
Ζ	zeta
Θ	theta
Ι	iota
Κ	kappa
∧	lambda
Μ	mu
Ν	nu
Ξ	ksi
Ο	omicron
∏	pi
Ρ	rho
∑	sigma
Τ	tau
Υ	upsilon
Φ	phi
Χ	chi
Ψ	psi
Ω	omega
α	alpha
β	beta
γ	gamma
δ	delta
ε	epsilon
ζ	zeta
η	eta
θ	theta
ι	iota
κ	kappa
λ	lambda
μ	mu
ν	nu
ξ	ksi
ο	omicron
π	pi
ρ	rho
σ	sigma
τ	tau
υ	upsilon
φ	phi
χ	chi
ψ	psi
ω	omega

The following table describes the measurement units that the <say-as> tag supports.

Format	Type	English example
Abbreviation	Length	nm (nanometre), μm (micrometre), mm (millimetre), cm (centimetre), m (metre), km (kilometre), ft (foot), and in (inch)
	Area	cm² (square centimetre), ㎡ (square metre), km2 (square kilometre), and SqFt (square foot)
	Volume	cm³ (cubic centimetre), m³ (cubic metre), km³ (cubic kilometre), mL (millilitre), L (litre), and gal (gallon)
	Weight	μg (microgram), mg (milligram), g (gram), and kg (kilogram)
	Time	min (minute), sec (second), ms (millisecond)
	Electromagnetism	μA (microamp), mA (milliamp), Hz (hertz), kHz (kilohertz), MHz (megahertz), GHz (gigahertz), V (volt), kV (kilovolt), and kWh (kilowatt hour)
	Sound	dB (decibel)
	Pressure	Pa (pascal), kPa (kilopascal), MPa (megapascal)
Other common units		The following types of English measurement units are also supported: tsp (teaspoon), rpm (revolutions per minute), KB (kilobyte), and mmHg (millimetre of mercury).

Tag relationships
The <say-as> tag can contain text.

Examples

cardinal

<speak><say-as interpret-as="cardinal">12345</say-as></speak>

Synthesis result in Chinese: SSML-say-as_Cardinal.mp3

<speak><say-as interpret-as="cardinal">10234</say-as></speak>

Synthesis result in English: en-SSML-say-as_cardinal.mp3

digits

<speak><say-as interpret-as="digits">12345</say-as></speak>

Synthesis result in Chinese: SSML-say-as_digit.mp3

<speak><say-as interpret-as="digits">10234</say-as></speak>

Synthesis result in English: en-SSML-say-as_digits.mp3

telephone

<speak><say-as interpret-as="telephone">12345</say-as></speak>

Synthesis result in Chinese: SSML-say-as_Telephone.mp3

<speak><say-as interpret-as="telephone">10234</say-as></speak>

Synthesis result in English: en-SSML-say-as_telephone.mp3

name

<speak>Her previous name is <say-as interpret-as="name">Zeng Xiaofan</say-as>.</speak>

Synthesis result: SSML-say-as_Name.mp3

address

<speak><say-as interpret-as="address">No. 304 Unit 3 Building 1 Fuluguoji</say-as></speak>

Synthesis result: SSML-say-as_Address.mp3

id

<speak><say-as interpret-as="id">myid_1998</say-as></speak>

Synthesis result: SSML-say-as_id.mp3

characters

<speak><say-as interpret-as="characters">Greek letter αβ</say-as></speak>

Synthesis result in Chinese: SSML-say-as_characters.mp3

<speak><say-as interpret-as="characters">*b+3.c$=α</say-as></speak>

Synthesis result in English: en-SSML-say-as_characters.mp3

punctuation

<speak><say-as interpret-as="punctuation"> -./:;</say-as></speak>

Synthesis result: SSML-say-as_punctuation.mp3

date

<speak><say-as interpret-as="date">1000-10-10</say-as></speak>

Synthesis result in Chinese: SSML-say-as_date.mp3

<speak><say-as interpret-as="date">10-01-2020</say-as></speak>

Synthesis result in English: en-SSML-say-as_date.mp3

time

<speak><say-as interpret-as="time">5:00am</say-as></speak>

Synthesis result in Chinese: SSML-say-as_time.mp3

<speak><say-as interpret-as="time">0500</say-as></speak>

Synthesis result in English: en-SSML-say-as_time.mp3

currency

<speak><say-as interpret-as="currency">13,000,000.00RMB</say-as></speak>

Synthesis result in Chinese: SSML-say-as_currency.mp3

<speak><say-as interpret-as="currency">$1,000.01</say-as></speak>

Synthesis result in English: en-SSML-say-as_currency.mp3

measure

<speak><say-as interpret-as="measure">100m12cm6mm</say-as></speak>

Synthesis result in Chinese: SSML-say-as_measure.mp3

<speak><say-as interpret-as="measure">1,000.01kg</say-as></speak>

Synthesis result in English: en-SSML-say-as_measure.mp3
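
Before text like the examples above is sent for synthesis, it can be checked locally for well-formedness. The following is a minimal Python sketch that uses the standard-library `xml.etree.ElementTree` parser. The set `KNOWN_TYPES` contains only the interpret-as values used in the examples of this section and is an assumption, not an authoritative list of what the service accepts. The function name `check_say_as` is hypothetical.

```python
import xml.etree.ElementTree as ET

# interpret-as values used in the examples of this section (assumed list).
KNOWN_TYPES = {
    "cardinal", "digits", "telephone", "name", "address", "id",
    "characters", "punctuation", "date", "time", "currency", "measure",
}


def check_say_as(ssml: str) -> list[str]:
    """Return a list of problems found in one <speak>...</speak> block."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        # Unescaped special characters such as & end up here.
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")
    for node in root.iter("say-as"):
        kind = node.get("interpret-as")
        if kind not in KNOWN_TYPES:
            problems.append(f"unknown interpret-as value: {kind!r}")
    return problems


print(check_say_as('<speak><say-as interpret-as="cardinal">12345</say-as></speak>'))
# []
```

An empty list means that the block parsed as XML and used only the values in `KNOWN_TYPES`. It does not guarantee that the service synthesizes the text as intended.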

Comprehensive example

<speak>In the Northern Song Dynasty, <say-as interpret-as="date">on October 10, 1121</say-as>, <say-as interpret-as="address">the outskirts of Kaifeng City</say-as> was immersed in the joyful atmosphere of <sub alias="Double eleven">Double eleven</sub> shopping festival. As a caravan of pack mules entered the city gate, a beautiful woman <phoneme alphabet="py" ph="de5">approached</phoneme> a man named <say-as interpret-as="name">A Fa, who was in the front of the team. </say-as></speak>
<speak>"Hi there, our store has a special promotion today. All shoes are on sale <say-as interpret-as="digits">199</say-as> get <say-as interpret-as="cardinal">100 off</say-as>. Don't miss out." </speak>
<speak>"Thanks, but we really need to get going. It is <say-as interpret-as="time">09:59:59</say-as>. If we don't deliver these goods on time, the whole supply chain could fail." </speak>
<speak><say-as interpret-as="name">A Fa</say-as> wiped the sweat from his brow as he guided his team through the crowded alleys filled with vendors shouting out to their customers.</speak>
<speak>Get latest colored fabrics here. Buy two and get one free;</speak>
<speak>Best selling hats. We are offering a seven-day unconditional return policy;</speak>
<speak>Treat all types of intractable diseases for both men and women. </speak>
<speak>Suddenly, a horse got scared and started running quickly down the road. A child was also frightened and stumbled into the arms of his mother,<break time="50ms"/>crying:</speak>
<speak>"Mommy, mommy!"</speak>
<speak>At that moment, <say-as interpret-as="name">A Fa</say-as> thought</speak>
<speak>"I’m so scared!"</speak>
<speak>He quickly covered his <phoneme alphabet="py" ph="he2 bao1">wallet</phoneme> and continued on his way to deliver the goods. Along the way, <say-as interpret-as="address">Kaifeng City</say-as> whose bustling scene <say-as interpret-as="name">gives A Fa</say-as> a deep impression. </speak>
<speak>As time passed and the prosperity of the city faded, he picked up his brush and painted on a long scroll during the shopping festival. The scroll painting is named Along the River During the Qingming Festival. </speak>
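
A text that contains several <speak> blocks, like the comprehensive example, is not a single XML document, so each block has to be parsed on its own. The following is a minimal Python sketch of one way to split such a text and recover the plain text of each block for proofreading. The names `SPEAK_BLOCK`, `split_speak_blocks`, and `plain_text` are hypothetical, and the regular expression assumes that <speak> blocks are not nested.

```python
import re
import xml.etree.ElementTree as ET

# Matches one <speak>...</speak> block; assumes blocks are not nested.
SPEAK_BLOCK = re.compile(r"<speak\b.*?</speak>", re.DOTALL)


def split_speak_blocks(text: str) -> list[str]:
    """Return every <speak>...</speak> block found in the text, in order."""
    return SPEAK_BLOCK.findall(text)


def plain_text(block: str) -> str:
    """Parse one block and return its text with all tags removed."""
    root = ET.fromstring(block)
    return "".join(root.itertext())


sample = (
    '<speak>"Mommy, mommy!"</speak>\n'
    '<speak>He covered his <phoneme alphabet="py" ph="he2 bao1">wallet</phoneme>.</speak>'
)
for block in split_speak_blocks(sample):
    print(plain_text(block))
# "Mommy, mommy!"
# He covered his wallet.
```

`ET.fromstring` raises `ParseError` for a block that is not well-formed, which makes this a convenient place to catch unescaped special characters.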
