This topic describes the features and tags of Speech Synthesis Markup Language (SSML) and provides examples on how to use SSML.
Overview
SSML is an XML-based markup language for speech synthesis. Compared with plain text synthesis, SSML-based synthesis improves the quality of synthesized content and supports various synthesis effects. You can use SSML to specify both the content that the speech synthesis service reads and how the service reads it. For example, you can specify how to break sentences and words, and control pronunciation and pauses.
The speech synthesis service provided by Alibaba Cloud is implemented based on SSML 1.0 of World Wide Web Consortium (W3C). For more information, see Speech Synthesis Markup Language (SSML) Version 1.1. However, the service does not support every markup type that is defined in the W3C standard. It supports a subset of markup types that was selected based on business requirements.
Usage notes
SSML is supported for Chinese and English, and the SSML tags and content supported for each language vary. The following sections describe the tags and provide examples on the tags.
All text that uses SSML must be enclosed between the <speak> and </speak> tags. A speech synthesis task can contain multiple pairs of <speak> and </speak> tags and can mix SSML with plain text.
The XML header before the <speak> tag at the beginning of a text can be omitted.
If the text enclosed by a tag contains special XML characters, you must escape the characters. The following list describes the special characters and the corresponding escape sequences:
Double quotation mark ("): &quot;
Single quotation mark ('): &apos;
Ampersand (&): &amp;
Less-than sign (<): &lt;
Greater-than sign (>): &gt;
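The escaping rules above can be applied programmatically before text is placed inside a tag. The following is a minimal Python sketch that relies on the standard library function xml.sax.saxutils.escape; the helper name wrap_in_speak is hypothetical and is not part of any Alibaba Cloud SDK.

```python
from xml.sax.saxutils import escape

# escape() already handles &, <, and >; the two quotation marks are added explicitly.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def wrap_in_speak(text: str) -> str:
    """Escape the special XML characters in text and wrap the result in <speak> tags."""
    return "<speak>" + escape(text, _QUOTE_ENTITIES) + "</speak>"


print(wrap_in_speak('Tom & Jerry said "1 < 2"'))
# <speak>Tom &amp; Jerry said &quot;1 &lt; 2&quot;</speak>
```

Escape only the text content. Passing a string that already contains SSML tags to this helper would escape the tags as well.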
Intelligent voices and human voice cloning (Basic Edition) support all SSML tags and attributes that are described in this topic.
Human voice cloning (Public Edition) supports the <speak>, <break>, <s>, <sub>, <w>, <phoneme>, and <say-as> tags. In this case, the <speak> tag supports the rate, pitch, and volume attributes. Other tags do not support attribute configuration and are referred to as tags with empty attributes.
Tags
<speak>
Description
The <speak> tag is the root node of all supported SSML tags. All text that uses SSML tags must be enclosed between the <speak> and </speak> tags.
Syntax
<speak>Text that needs to call SSML tags</speak>
Attributes
The following table describes the attributes that are supported by the <speak> tag.
Attribute name
Attribute type
Attribute value
Required
Description
voice
String
The name of the voice that can be called. The value of the voice attribute can only contain lowercase letters, such as siyue.
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the voice that is used for speech synthesis. The specified voice has a higher priority than the voice that is specified by the voice parameter in an API request. For more information, see Intelligent voice samples.
encodeType
String
PCM/WAV/MP3
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio file format for speech synthesis. The specified audio file format has a higher priority than the audio file format that is specified by the format parameter in an API request.
sampleRate
String
8000/16000/24000/48000
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio sampling rate for speech synthesis. The specified audio sampling rate has a higher priority than the audio sampling rate that is specified by the sample_rate parameter in an API request.
rate
String
Valid values: an integer ranging from -500 to 500. Default value: 0.
A value greater than 0 indicates that the speech rate is increased.
A value less than 0 indicates that the speech rate is reduced.
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the speech rate for speech synthesis. The specified speech rate has a higher priority than the speech rate that is specified by the speech_rate parameter in an API request.
pitch
String
Valid values: an integer ranging from -500 to 500. Default value: 0.
A value greater than 0 indicates that the pitch rises.
A value less than 0 indicates that the pitch falls.
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio pitch for speech synthesis. The specified audio pitch has a higher priority than the audio pitch that is specified by the pitch_rate parameter in an API request.
volume
String
Valid values: an integer ranging from 0 to 100. Default value: 50.
A value greater than 50 indicates that the volume is increased.
A value less than 50 indicates that the volume is reduced.
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the audio volume for speech synthesis. The specified audio volume has a higher priority than the audio volume that is specified by the volume parameter in an API request.
effect
String
robot/lolita/lowpass/echo/eq/lpfilter/hpfilter
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute can be used to produce various sound effects for the synthesized speech. Valid values:
robot: robot voice
lolita: little girl voice
lowpass: low-pass effect
echo: echo effect
eq: equalizer
lpfilter: low-pass filter
hpfilter: high-pass filter
Note: The eq, lpfilter, and hpfilter values specify advanced filters. If you set this attribute to eq, lpfilter, or hpfilter, you can configure the effectValue attribute to specify a custom effect for the specified filter.
An SSML structure supports only one sound effect. You cannot set this attribute to multiple values.
If you configure this attribute, the system latency may increase.
effectValue
String
The effect of a specific filter. If you set the effect attribute to eq, lpfilter, or hpfilter, you can configure this attribute to modify the default effect of the specified filter.
No
eq: specifies the equalizer. The system provides eight default bands. Frequencies: ["40Hz", "100Hz", "200Hz", "400Hz", "800Hz", "1600Hz", "4000Hz", "12000Hz"]; bandwidths: ["1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q"]. If you configure this attribute, you must specify a gain for each band. The gain ranges from -20 dB to 20 dB. The input value is a string that consists of eight integers separated by spaces. For example, you can set the effectValue attribute to 1 1 1 1 1 1 1 1. The value 0 indicates that the gain of the band is not adjusted.
lpfilter: the frequency of the low-pass filter. The value is an integer in the range of (0, Required sampling rate/2]. For example, you can set the effectValue attribute to 800.
hpfilter: the frequency of the high-pass filter. The value is an integer in the range of (0, Required sampling rate/2]. For example, you can set the effectValue attribute to 1200.
bgm
String
The name of the background music (BGM) that can be called online. You can view the description of the bgm attribute to obtain more information.
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the BGM of the synthesized speech.
backgroundMusicVolume
String
Valid values: an integer ranging from 0 to 100. Default value: 50.
A value greater than 50 indicates that the volume is increased.
A value less than 50 indicates that the volume is reduced.
No
This attribute is included in the proprietary tag of Alibaba Cloud for speech synthesis. This attribute specifies the volume of the BGM.
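The integer ranges listed for the rate, pitch, and volume attributes can be checked before a request is sent. The following Python sketch builds an opening <speak> tag and rejects out-of-range values; the function name speak_open_tag is hypothetical, and the sketch deliberately covers only these three attributes.

```python
from xml.sax.saxutils import quoteattr

# Integer ranges taken from the attribute descriptions above.
_RANGES = {"rate": (-500, 500), "pitch": (-500, 500), "volume": (0, 100)}


def speak_open_tag(**attrs: int) -> str:
    """Return an opening <speak> tag after range-checking rate, pitch, and volume."""
    parts = []
    for name, value in attrs.items():
        if name not in _RANGES:
            raise ValueError(f"attribute not covered by this sketch: {name}")
        low, high = _RANGES[name]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be in the range [{low}, {high}]")
        parts.append(f"{name}={quoteattr(str(value))}")
    return "<speak " + " ".join(parts) + ">" if parts else "<speak>"


print(speak_open_tag(rate=200, pitch=-100, volume=80))
# <speak rate="200" pitch="-100" volume="80">
```

Validating on the client side is optional; it only surfaces range errors earlier than the service would.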
The following table describes the bgm attribute.
Built-in BGM URL
Custom BGM URL
The speech synthesis service provides several built-in BGM streams. You can click the following URLs to listen to the BGM streams:
You can use custom BGM based on your business requirements. Before you specify custom BGM, you must store the audio file in an Alibaba Cloud Object Storage Service (OSS) bucket whose access control list (ACL) is public read or public read/write. For more information about how to create a bucket, see Create a bucket. You can then generate an HTTP or HTTPS URL for the object that is stored in the bucket. For more information, see Step 2: Upload an object.
Requirements for audio files to be uploaded:
The audio file must be a mono WAV file with a sampling rate of 16 kHz.
The bit depth is 16 bits.
The audio file cannot exceed 3.5 MB for short-text speech synthesis or 10 MB for long-text speech synthesis.
If the synthesis duration is longer than the BGM duration, the BGM is played in a loop. If your audio file is not in the WAV format, you can run the following command to convert the audio file into the WAV format:
ffmpeg -i input_audio_file -acodec pcm_s16le -ac 1 -ar 16000 output_audio_file.wav
If the URL in the tag contains special XML characters, escape the characters.
Important: You are legally liable for the copyright of the uploaded audio file.
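Before you upload a custom BGM file, you may want to confirm locally that it meets the format requirements listed above. The following Python sketch uses the standard library wave module to check the channel count, sampling rate, and bit depth. The function name check_bgm_wav is hypothetical, and the file size limit is not checked here.

```python
import wave


def check_bgm_wav(path: str) -> list[str]:
    """Return the format problems found in a WAV file; an empty list means none were found."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("audio is not mono")
        if wav.getframerate() != 16000:
            problems.append("sampling rate is not 16 kHz")
        if wav.getsampwidth() != 2:  # 2 bytes per sample corresponds to a 16-bit depth
            problems.append("bit depth is not 16 bits")
    return problems
```

For example, check_bgm_wav("bgm.wav") returns an empty list for a file produced by the ffmpeg command above, assuming the conversion succeeded. The wave module raises wave.Error if the file is not a PCM WAV file.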
Tag relationships
The <speak> tag can contain texts and the following tags:
<break>
<s>
<w>
<phoneme>
<say-as>
Examples
Empty attribute
<speak>Text that needs to call SSML tags</speak>
Synthesis result: SSML-speak1.mp3
Attribute voice
<speak voice="xiaogang"> This is a male voice. </speak>
Synthesis result: SSML-speak2.mp3
Attribute encodeType
<speak encodeType="mp3">I can generate audio in the compressed format. </speak>
Synthesis result: SSML-encode.mp3
Attribute sampleRate
<speak sampleRate="8000">The size of the file is half of the audio at a sampling rate of 16 kHz. </speak>
Synthesis result: SSML-speak4.mp3
Attribute rate
<speak rate="200">I speak faster than the average. </speak>
Synthesis result: SSML-speak5.mp3
Attribute pitch
<speak pitch="-100">My voice pitch is lower than others. </speak>
Synthesis result: SSML-speak6.mp3
Attribute volume
<speak volume="80">My voice is loud. </speak>
Synthesis result: SSML-speak7.mp3
Combination of attributes that are separated by spaces
<speak rate="200" pitch="-100" volume="80">This is how my voice sounds when multiple attributes are used. </speak>
Synthesis result: SSML-speak8.mp3
Attribute effect
<speak effect="robot">Do you like the Wall-E robot? </speak>
Synthesis result: SSML-speak9.mp3
Attribute bgm
<speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="-500" volume="40"><break time="2s"/>The ancient trees on the shady cliffs are covered in a thick layer of moss<break time="700ms"/>The sound of rain can still be heard echoing through the bamboo forest<break time="700ms"/>The silk production contributes to the national economy<break time="700ms"/>The scenery of Mianzhou is worth seeing<break time="2s"/></speak>
Synthesis result: SSML-speak10.mp3
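The effectValue rules in the <speak> attribute descriptions (eight integer gains from -20 to 20 for eq, and a cutoff frequency in the range (0, sampling rate/2] for lpfilter and hpfilter) can be expressed as a small check. The following Python sketch is an illustration only; the function name is hypothetical, and the sampling rate must be supplied by the caller.

```python
import re

_INTEGER = re.compile(r"-?[0-9]+")


def is_valid_effect_value(effect: str, value: str, sample_rate: int) -> bool:
    """Check an effectValue string against the documented rules for eq, lpfilter, and hpfilter."""
    if effect == "eq":
        gains = value.split(" ")
        # Eight space-separated integer gains, each from -20 dB to 20 dB.
        return len(gains) == 8 and all(
            _INTEGER.fullmatch(g) is not None and -20 <= int(g) <= 20 for g in gains
        )
    if effect in ("lpfilter", "hpfilter"):
        # An integer frequency in the half-open range (0, sample_rate / 2].
        return _INTEGER.fullmatch(value) is not None and 0 < int(value) <= sample_rate / 2
    return False  # Other effects do not take an effectValue.


print(is_valid_effect_value("eq", "1 1 1 1 1 1 1 1", 16000))  # True
print(is_valid_effect_value("hpfilter", "9000", 16000))       # False
```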
<emotion>
Description
The <emotion> tag is used to apply multi-emotional voices to speech synthesis. This tag is optional. If you configure the tag for a voice that does not support multiple emotions, an error occurs.
Syntax
<emotion category="happy" intensity="1.0">What a nice day! </emotion>
Attributes
The following table describes the attributes that are supported by the <emotion> tag.
Attribute name
Attribute type
Attribute value
Required
Description
category
String
Enumeration value, such as neutral and happy.
Yes
The speech emotion. The following table describes the supported emotions for each voice.
intensity
String
The value is a floating-point number within the range of 0.01 to 2.0.
No
The intensity of the emotion. The default value is 1.0, which indicates the predefined emotional intensity. The minimum value is 0.01, which indicates a slight inclination toward a specific emotion. The maximum value is 2.0, which indicates that the emotional intensity is doubled.
The multi-emotional voices support different emotion categories.
Voice name
Value of voice
Emotion category
Zhi Miao_multi-emotional
zhimiao_emo
serious, sad, disgust, jealousy, embarrassed, happy, fear, surprise, neutral, frustrated, affectionate, gentle, angry, newscast, customer-service, story, and living
Zhi Mi_multi-emotional
zhimi_emo
angry, fear, happy, hate, neutral, sad, and surprise
Zhi Yan_multi-emotional
zhiyan_emo
neutral, happy, angry, sad, fear, hate, surprise, and arousal
Zhi Bei_multi-emotional
zhibei_emo
neutral, happy, angry, sad, fear, hate, and surprise
Zhi Tian_multi-emotional
zhitian_emo
neutral, happy, angry, sad, fear, hate, and surprise
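Because an unsupported emotion causes an error, it can help to check the voice, category, and intensity before you build the SSML. The following Python sketch copies three rows of the table above into a dictionary; the function name emotion_ssml is hypothetical, and the remaining voices can be added in the same way.

```python
from xml.sax.saxutils import escape, quoteattr

# A subset of the voice-to-category table above.
_EMOTIONS = {
    "zhimi_emo": {"angry", "fear", "happy", "hate", "neutral", "sad", "surprise"},
    "zhibei_emo": {"neutral", "happy", "angry", "sad", "fear", "hate", "surprise"},
    "zhitian_emo": {"neutral", "happy", "angry", "sad", "fear", "hate", "surprise"},
}


def emotion_ssml(voice: str, category: str, text: str, intensity: float = 1.0) -> str:
    """Build a <speak> document with one <emotion> element after validating the inputs."""
    if category not in _EMOTIONS.get(voice, set()):
        raise ValueError(f"voice {voice!r} is not listed with category {category!r}")
    if not 0.01 <= intensity <= 2.0:
        raise ValueError("intensity must be within the range of 0.01 to 2.0")
    return (
        f"<speak voice={quoteattr(voice)}>"
        f"<emotion category={quoteattr(category)} intensity={quoteattr(str(intensity))}>"
        f"{escape(text)}</emotion></speak>"
    )


print(emotion_ssml("zhitian_emo", "happy", "What a nice day!"))
# <speak voice="zhitian_emo"><emotion category="happy" intensity="1.0">What a nice day!</emotion></speak>
```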
Tag relationships
The <emotion> tag can contain texts and the following tags:
<s>
<sub>
<say-as>
<w>
<phoneme>
<soundEvent/>
<break/>
Example
Empty attribute
<speak voice="zhitian_emo"><emotion category="happy" intensity="1.0">What a nice day! </emotion></speak>
Synthesis result: SSML-emotion.wav
<break>
Description
The <break> tag inserts pauses in texts and is optional.
Syntax
<break time="string"/>
Attributes
Attribute name
Attribute type
Attribute value
Required
Description
time
String
[number]s/[number]ms
No
The pause length, in seconds or milliseconds. Example: 2s or 50ms.
If the pause is in seconds, the value of number is an integer within the range of [1, 10]. In this case, the value is in the format of [number]s.
If the pause is in milliseconds, the value of number is an integer within the range of [50, 10000]. In this case, the value is in the format of [number]ms.
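The two formats and ranges for the time attribute can be checked with a short regular expression. The following Python sketch is one way to do this; the function name is_valid_break_time is hypothetical.

```python
import re

# An integer followed by the unit "ms" or "s", with no spaces.
_BREAK_TIME = re.compile(r"([0-9]+)(ms|s)")


def is_valid_break_time(value: str) -> bool:
    """Return True if value is [1, 10] seconds or [50, 10000] milliseconds in the documented format."""
    match = _BREAK_TIME.fullmatch(value)
    if match is None:
        return False
    number, unit = int(match.group(1)), match.group(2)
    if unit == "s":
        return 1 <= number <= 10
    return 50 <= number <= 10000


print(is_valid_break_time("500ms"))  # True
print(is_valid_break_time("11s"))    # False
```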
Tag relationships
The <break> tag is an empty tag and cannot contain any tags. If the <s> tag is used, you must enclose the <break> tag between the <s> and </s> tags, which indicates that a pause is inserted into the sentences or paragraphs.
Example
<speak>Close your eyes and have a rest.<break time="500ms"/>OK, please open your eyes. </speak>
Synthesis result: SSML-break.mp3
<s>
Description
The <s> tag specifies the sentence structure in a text and is optional.
Syntax
<s>Text</s>
Attributes
N/A.
Tag relationships
The <s> tag can contain texts and the following tags:
<break>
<w>
<phoneme>
<say-as>
Example
<speak><s>This is the first sentence.</s><s>This is the second sentence.</s></speak>
Synthesis result: SSML-s.mp3
<sub>
Description
The <sub> tag replaces the text enclosed by the tag with an alias. The alias is read instead of the enclosed text.
Syntax
<sub alias="string">Text</sub>
Attributes
Attribute name
Attribute type
Attribute value
Required
Description
alias
String
The content of the new text.
Yes
The text that is used to replace the text in a tag.
Tag relationships
The <sub> tag can contain texts.
Example
<speak><sub alias="Network protocol standard">W3C</sub></speak>
Synthesis result: SSML-sub.mp3
<w>
Description
The <w> tag specifies the word structure in a text and is optional. In English text, words are usually separated by spaces, so you do not need to use this tag. If you use the tag in English text, the enclosed text must be a single independent word or phrase.
Syntax
<w>Text</w>
Attributes
N/A.
Tag relationships
The <w> tag can contain texts.
Examples
<speak>Mayor of Nanjing<w>Jiang Daqiao</w>gave a speech today. </speak>
Synthesis result: SSML-w.mp3
<phoneme>
Description
The <phoneme> tag controls the pronunciation of the text enclosed by the tag and is optional. The tag is not supported for English texts.
Syntax
<phoneme alphabet="string" ph="string">Text</phoneme>
Attributes
Attribute name
Attribute type
Attribute value
Required
Description
alphabet
String
py
Yes
The value of py indicates Pinyin.
ph
String
The Pinyin string that corresponds to the text enclosed by the tag.
Yes
Value assignment rules for Pinyin:
Pinyin syllables are separated by spaces. The number of Pinyin syllables must be the same as the number of characters in the enclosed text.
Each Pinyin syllable consists of a sound and a tone mark. The tone mark is a tone number from 1 to 5, in which 5 indicates the neutral tone.
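The value assignment rules above can be turned into a simple format check for the ph attribute. The following Python sketch assumes that each syllable is written in lowercase ASCII letters followed by one tone number; the exact Pinyin spelling accepted by the service is not specified in this topic, and the function name is hypothetical.

```python
import re

# Assumption: lowercase ASCII letters followed by a single tone number from 1 to 5.
_SYLLABLE = re.compile(r"[a-z]+[1-5]")


def is_valid_ph(ph: str, syllable_count: int) -> bool:
    """Check that ph holds exactly syllable_count space-separated Pinyin syllables with tone numbers."""
    syllables = ph.split(" ")
    return len(syllables) == syllable_count and all(
        _SYLLABLE.fullmatch(s) is not None for s in syllables
    )


print(is_valid_ph("dian3 dang4 hang2", 3))  # True
print(is_valid_ph("dian dang4 hang2", 3))   # False, the first syllable has no tone number
```

The caller supplies syllable_count, which is the number of units in the enclosed text that the syllables must match.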
Tag relationships
The <phoneme> tag can contain texts.
Example
<speak>qu<phoneme alphabet="py" ph="dian3 dang4 hang2">dian dang hang</phoneme>ba zhe ge wan yi<phoneme alphabet="py" ph="dang4 diao4">dang diao</phoneme></speak>
Synthesis result: SSML-phoneme.mp3
<soundEvent>
Description
The <soundEvent> tag is used to insert a sound cue in any position of the text during SSML-based synthesis.
Syntax
<soundEvent src="URL"/>
Attributes
Attribute name
Attribute type
Attribute value
Required
Description
src
String
The URL of the sound cue.
Yes
You can use a custom sound cue based on your business requirements. Before you specify a custom sound cue, store the audio file in your OSS bucket whose ACL is public read or public read/write. For more information about how to create a bucket, see Create a bucket. You can use the HTTP or HTTPS protocol to generate a URL for the object that is stored in a bucket. For more information, see Step 2: Upload an object.
Requirements for audio files to be uploaded:
The audio file must be a mono WAV file with a sampling rate of 16 kHz.
The maximum file size is 2 MB.
The bit depth is 16 bits.
Important
You are legally liable for the copyright of the uploaded audio file.
Tag relationships
The <soundEvent> tag is an empty tag and cannot contain any tags.
Examples
<speak>A horse was frightened<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>and people scattered to escape.</speak>
Synthesis result: SSML-sound-event.mp3
<say-as>
Description
The <say-as> tag specifies the type of the text enclosed by the tag, so that the text can be pronounced based on the default pronunciation method of this type.
Syntax
<say-as interpret-as="string">Text </say-as>
Attributes
Attribute name
Attribute type
Attribute value
Required
Description
interpret-as
String
cardinal/digits/telephone/name/address/id/characters/punctuation/date/time/currency/measure
Yes
The type of the text enclosed by the tag. Valid values:
cardinal: The text is read as an integer or decimal number.
digits: The text is read as a digit.
telephone: The text is read as a phone number.
name: The text is read as a name.
address: The text is read as an address.
id: The text is read as an account name or nickname.
characters: The text is read by character.
punctuation: The text is read as a punctuation mark.
date: The text is read as a date.
time: The text is read as a time.
currency: The text is read as an amount.
measure: The text is read as a measurement unit.
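The valid values of interpret-as listed above can be enforced when you generate <say-as> elements in code. The following Python sketch is a minimal illustration; the function name say_as is hypothetical.

```python
from xml.sax.saxutils import escape

# The valid values of the interpret-as attribute listed above.
_INTERPRET_AS = frozenset({
    "cardinal", "digits", "telephone", "name", "address", "id",
    "characters", "punctuation", "date", "time", "currency", "measure",
})


def say_as(interpret_as: str, text: str) -> str:
    """Return a <say-as> element for text, escaping special XML characters in the content."""
    if interpret_as not in _INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as}")
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


print(say_as("cardinal", "10,000"))
# <say-as interpret-as="cardinal">10,000</say-as>
print(say_as("punctuation", "<=>?@"))
# <say-as interpret-as="punctuation">&lt;=&gt;?@</say-as>
```

The second call shows why escaping matters for the punctuation and characters types, whose content often includes special XML characters.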
Text types that the <say-as> tag supports
cardinal
Format
Example
Description
Numeric string
145
Valid integers: positive and negative integers with a maximum of 20 digits in the range of [-99999999999999999999,99999999999999999999].
Valid decimals: No limits are imposed on the number of decimal places. However, we recommend that you retain up to 10 decimal places.
Minus sign + numeric string
-145
Numeric string with each three digits separated by a comma
10,000
Minus sign + numeric string with each three digits separated by a comma
-10,124
Numeric string + decimal point + two zeros
10.00
Minus sign + numeric string + decimal point + two zeros
-110.00
Numeric string + decimal point + numeric string
79.090
Minus sign + numeric string + decimal point + numeric string
-79.001
Format
Example
English output
Description
Numeric string
145
one hundred forty five
Valid integers: positive and negative integers with a maximum of 13 digits in the range of [-999999999999,999999999999].
Valid decimals: No limits are imposed on the number of decimal places. However, we recommend that you retain up to 10 decimal places.
A numeric string that starts with a zero
0145
one hundred forty five
Minus sign + numeric string
-145
minus one hundred forty five
Numeric string with each three digits separated by a comma
60,000
sixty thousand
Minus sign + numeric string with each three digits separated by a comma
-208,000
minus two hundred eight thousand
Numeric string + decimal point + zero
12.00
twelve
Numeric string + decimal point + numeric string
12.34
twelve point three four
Numeric string with each three digits separated by a comma + decimal point + numeric string
1,000.1
one thousand point one
Minus sign + numeric string + decimal point + numeric string
-12.34
minus twelve point three four
Minus sign + numeric string with each three digits separated by a comma + decimal point + numeric string
-1,000.1
minus one thousand point one
Numeric string (numeric string with each three digits separated by a comma) + hyphen + number (numeric string with each three digits separated by a comma)
1-1,000
one to one thousand
Other default readings
012.34
twelve point three four
None.
1/2
one half
-3/4
minus three quarters
5.1/6
five point one over six
-3 1/2
minus three and a half
1,000.3^3
one thousand point three to the power of three
3e9.1
three times ten to the power of nine point one
23.10%
twenty three point one percent
digits
Format
Example
Description
Numeric string
129090909
No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits.
If the numeric string contains more than 10 digits, you must insert a pause after each digit.
Format
Example
English output
Description
Numeric string
12034
one two zero three four
No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits.
When digits in a numeric string are grouped by hyphens (-) or spaces, a comma is inserted between the groups to create a pause. Up to five groups are supported for a numeric string.
Numeric string + space or hyphen + numeric string + space or hyphen + numeric string + space or hyphen + numeric string
1-23-456 7890
one, two three, four five six, seven eight nine zero
telephone
Format
Example
Description
Landline number
4930286
A landline number can contain seven or eight digits. You can use spaces or hyphens (-) to separate the digits.
A 7-digit landline number can be divided into two groups. In this case, the first group contains 3 digits, and the second group contains 4 digits. An 8-digit landline number can be divided into two groups. In this case, each group contains 4 digits.
493 0286
493-0286
62552560
6255 2560
6255-2560
Landline number + extension number
4930286-109
An extension number can have up to four digits.
4930286, extension 109
Area code + landline number
01062552560
Area codes of 010, 02x, 03xx, 04xx, 05xx, 07xx, 08xx, and 09xx are supported.
010 62552560
010 6255 2560
010 6255-2560
010-62552560
010-6255-2560
(010)62552560
03198907098
0319-8907098
Area code + landline number + extension number
010 62552560-109
None.
010-62552560-109
(010)62552560-109
(010)62552560, extension 109
Country code + area code + landline number
86-010-62791627
Country code formats of 86, (86), +86, (+86), and 0086 are supported, all of which are read as eight-six.
(86)10-62791627
+86-010-62791627
0086-10-62791627
(+86)-10-6279 1627
Country code + area code + landline number + extension number
(86)21-58118818-207
None.
(86)021-5811-8818-207
(86)021-58118818, x. 207
(86)21-5811-8818, ex. 207
+86-021-58118818, extension 207
Mobile phone number
139 0000 5678
A mobile phone number consists of 11 digits and can be separated in the formats of 3-3-5 and 3-4-4.
139-000-05678
139 000 05678
Country code + mobile number
+86-13900005678
None.
(+86)-139-0000-5678
+8613900005678
0086-139 000 05678
Service number
123
Common service numbers are supported.
A 10-digit service number can start with 400 or 800 and can be separated in the format of 3-4-4.
A 16-digit service number can start with 12530, 17951, or 12593.
95678
4008110510
800-810-8888
1253013520638377
Remarks
(86)(21)9899-80800-0909
The numeric string and separators are supported. The separators can be parentheses and hyphens (-).
Format
Example
English output
Description
Numeric string
12034
one two oh three four
No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits. When digits in a numeric string are grouped by hyphens (-) or spaces, a comma is inserted between the groups to create a pause. Up to five groups are supported for a numeric string.
Numeric string + space or hyphen + numeric string + space or hyphen + numeric string
1-23-456 7890
one, two three, four five six, seven eight nine oh
Plus sign + numeric string + space or hyphen + numeric string
+43-211-0567
plus four three, two one one, oh five six seven
Left parenthesis + numeric string + right parenthesis + space + numeric string + space or hyphen + numeric string
(21) 654-3210
(two one) six five four, three two one oh
id
Format
Example
Description
String
dell0101
Uppercase and lowercase letters, digits from 0 to 9, and underscores (_) are supported.
The output space indicates that a pause is inserted between characters, and characters are read one by one.
myid_1998
AiTest
In English text, this type is read in the same way as the characters type.
characters
Format
Example
Description
String
ISBN 1-001-099098-1
Chinese characters, uppercase and lowercase letters, digits from 0 to 9, and specific full-width and half-width characters are supported.
The output space indicates that a pause is inserted between characters, and characters are read one by one. If the text enclosed by the tag contains special XML characters, you must escape the characters.
x10b2345_u
v1.0.1
Version 2.0
Su M MA000
Airbus A330
Models s01, s02, and s03
Airbus A330
αβγ
Format
Example
English output
Description
String
*b+3$.c-0'=α
asterisk B plus three dollar dot C dash zero apostrophe equals alpha
Chinese characters, uppercase and lowercase letters, digits from 0 to 9, and specific full-width and half-width characters are supported.
The output space indicates that a pause is inserted between characters, and characters are read one by one.
If the text enclosed by the tag contains special XML characters, you must escape the characters.
punctuation
Format
Example
Description
Punctuation marks
...
Common Chinese and English punctuation marks are supported. The output space indicates that a pause is inserted between characters, and characters are read one by one.
If the text enclosed by the tag contains special XML characters, you must escape the characters.
...
!"#$%&
'()*+
,-./:;
<=>?@
[\]^_
In English text, this type is read in the same way as the characters type.
date
Format
Example
Description
Year
71
Two-digit and four-digit years are supported.
Two-digit years range from 60 to 99, 00 to 09, and 10 to 19.
Four-digit years range from 1000 to 1999 and 2000 to 2099.
04
19
1011
1998
2008
Year and month
April, 98
The months from January to September can be represented by a number with or without a zero. For example, in April 1908, April can be represented by 4 or 04.
April 1998
August, 08
August 2008
Year, month, and day
April 23, 98
The days from the first to ninth day in a month can be represented by a number with or without a zero. For example, if you want to represent the date of April 8 in 1908, you can use 4 or 04 to indicate April and 8 or 08 to indicate Day 8.
April 23, 1998
August 8, 08
August 08, 2008
Month and day
March 20
None.
August 07
Year and month
2018/08
Forward slashes (/), hyphens (-), and periods (.) can be used as separators between the days, months and years.
2018-08
2018.08
Year, month, and day
2018/08/08
2018-8-8
2018.08.08
Year, month, and day~year, month, and day
September 1~30, 04
Tildes (~) and hyphens (-) can be used as separators between dates.
September 01, 2004 - June 08, 2008
Year, month, and day~day
September 1~30, 04
September 01, 2004 - June 08, 2008
Year and month~year and month
April, 01~April, 10
April 2001 ~ April 2010
Month and day~month and day
October 1~October 7
October 01~October 07
Month and day~day
October 1~7
October 01~07
Year, month, and day
2018/03/03~2019/01/01
Forward slashes (/) and periods (.) can be used as separators between the days, months, and years, and tildes (~) and hyphens (-) can be used as separators between dates.
1997.9.9~1998.9.9
Month and day
10/20~10/31
Month~month
Jan~Oct
January~October
Year, month, and day
10/20/2018
Only 4-digit years are supported. Only forward slashes (/) can be used as the separators. Only the format of Month/Day/Year is supported.
Format
Example
English output
Description
Four digits/Two digits or four digits-Two digits
2000/01
two thousand, oh one
Cross-year range
1900-01
nineteen hundred, oh one
2001-02
twenty oh one, oh two
2019-20
twenty nineteen, twenty
1998-99
nineteen ninety eight, ninety nine
1999-00
nineteen ninety nine, oh oh
A 4-digit number that starts with 1 or 2
2000
two thousand
4-digit years
1900
nineteen hundred
1905
nineteen oh five
2021
twenty twenty one
Day of the week-Day of the week
or
Day of the week~Day of the week
or
Day of the week&Day of the week
mon-wed
monday to wednesday
If the text enclosed by the tag contains special XML characters, you must escape the characters.
tue~fri
tuesday to friday
sat&sun
saturday and sunday
DD-DD MMM, YYYY
or
DD~DD MMM, YYYY
or
DD&DD MMM, YYYY
19-20 Jan, 2000
the nineteenth to the twentieth of january two thousand
DD specifies the 2-digit day, MMM specifies the 3-letter abbreviation of the month or a full month name, and YYYY specifies the 4-digit year that starts with 1 or 2.
01 ~ 10 Jul, 2020
the first to the tenth of july twenty twenty
05&06 Apr, 2009
the fifth and the sixth of april two thousand nine
MMM DD-DD
or
MMM DD~DD
or
MMM DD&DD
Feb 01 - 03
february the first to the third
MMM specifies the 3-letter abbreviation of a month or a full month name, and DD specifies a 2-digit day.
Aug 10~20
august the tenth to the twentieth
Dec 11&12
december the eleventh and the twelfth
MMM-MMM
or
MMM~MMM
or
MMM&MMM
Jan-Jun
january to june
MMM specifies the 3-letter abbreviation of a month or a full month name.
jul ~ dec
july to december
sep&oct
september and october
YYYY-YYYY
or
YYYY~YYYY
1990 - 2000
nineteen ninety to two thousand
YYYY specifies the 4-digit year that starts with 1 or 2.
2001~2021
two thousand one to twenty twenty one
WWW DD MMM YYYY
Sun 20 Nov 2011
sunday the twentieth of november twenty eleven
WWW specifies the 3-letter abbreviation of the day of the week or the full name for the day of the week. DD specifies the 2-digit day. MMM specifies the 3-letter abbreviation of the month or a full month name. MM specifies the 2-digit month number, the 3-letter abbreviation of the month, or a full month name. YYYY specifies the 4-digit year that starts with 1 or 2.
WWW DD MMM
Sun 20 Nov
sunday the twentieth of november
WWW MMM DD YYYY
Sun Nov 20 2011
sunday november the twentieth twenty eleven
WWW MMM DD
Sun Nov 20
sunday november the twentieth
WWW YYYY-MM-DD
Sat 2010-10-01
saturday october the first twenty ten
WWW YYYY/MM/DD
Sat 2010/10/01
saturday october the first twenty ten
WWW MM/DD/YYYY
Sun 11/20/2011
sunday november the twentieth twenty eleven
MM/DD/YYYY
11/20/2011
november the twentieth twenty eleven
YYYY
1998
nineteen ninety eight
Other default readings
10 Mar, 2001
the tenth of march two thousand one
None.
10 Mar
the tenth of march
Mar 2001
march two thousand one
Fri. 10/Mar/2001
friday the tenth of march two thousand one
Mar 10th, 2001
march the tenth two thousand one
Mar 10
march the tenth
2001/03/10
march the tenth two thousand one
2001-03-10
march the tenth two thousand one
2000s
two thousands
2010's
twenty tens
1900's
nineteen hundreds
1990s
nineteen nineties
time
Format
Example
Description
Time
12:00
Common time and time range formats are supported.
12:00:00
10:20
10:20:30
09:18:14
Point in time~Point in time
11:00~12:00
09:00-14:00
11:00~11:30
11:00-12:18
10:30~11:00
09:28-10:00
10:20~11:20
06:00~08:00
10:20 a.m.~1:30 p.m.
Abbreviation of time
5:00 am
5:30 am
5:20:12 am
7:00 am
7:30 AM
7:20:12 a.m.
07:08:12 A.M.
5:00 pm
5:30 PM
5:20:12 p.m.
05:09:12 P.M.
9:00 pm
9:30 pm
9:20:12 PM
9:02:12 P.M.
12:00 pm
12:30 p.m.
12:20:12 PM
Format
Example
English output
Description
HH:MM AM or PM
09:00 AM
nine A M
HH specifies 1- or 2-digit hours. MM specifies 2-digit minutes. AM specifies the time before noon. PM specifies the time after noon.
09:03 PM
nine oh three P M
09:13 p.m.
nine thirteen p m
HH:MM
21:00
twenty one hundred
HHMM
100
one oclock
Point in time-Point in time
8:00 am - 05:30 pm
eight a m to five thirty p m
Common time and time range formats are supported.
7:05~10:15 AM
seven oh five to ten fifteen A M
09:00-13:00
nine oclock to thirteen hundred
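The HH:MM AM or PM format described above (1- or 2-digit hours, 2-digit minutes, and an AM or PM marker) can be sketched as a small parser. This is a hedged illustration: `parse_12h` is a hypothetical helper, the accepted marker spellings are taken from the examples in this topic, and the 0 to 12 hour limit is an assumption rather than a documented rule.

```python
import re

# HH: 1- or 2-digit hours; MM: 2-digit minutes; marker spelled as in the
# examples above (AM, am, a.m., A.M., and the matching PM forms).
_TIME_12H = re.compile(r"^(\d{1,2}):(\d{2})\s*([AaPp])\.?[Mm]\.?$")


def parse_12h(text: str):
    """Return (hour, minute, 'AM' or 'PM'), or None if text does not match."""
    m = _TIME_12H.match(text.strip())
    if m is None:
        return None
    hour, minute = int(m.group(1)), int(m.group(2))
    # Assumption: hours above 12 or minutes above 59 are rejected.
    if hour > 12 or minute > 59:
        return None
    return hour, minute, m.group(3).upper() + "M"
```

For example, `parse_12h("09:13 p.m.")` returns `(9, 13, "PM")`, and `parse_12h("21:00")` returns `None` because the text has no AM or PM marker.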
currency
Format
Example
Description
Number + currency code
12.00 RMB
The following currency codes are supported: AUD, CAD, HKD, JPY, USD, CHF, NOK, SEK, GBP, RMB, CNY, and EUR.
Integers, decimals, and numbers that use commas (,) as thousands separators are supported.
12.50 RMB
12,000,000 RMB
12,000,000.00 RMB
12,000.35 RMB
Currency symbol + number
$12
The following currency symbols are supported: Canadian dollar ($), US dollar ($), French franc (Fr), Danish krone (kr), pound sterling (£), Chinese yuan (¥), and euro (€).
Integers, decimals, and numbers that use commas (,) as thousands separators are supported.
$12.00
$12.12
$12,000
$12,000.00
$12,000.99
Other default readings
1213
None.
1213 KML
1213.00 KML
1213.9 KML
1,000 KML
1,000.00 KML
1,000.98 KML
12,000
Format
Example
English output
Description
Number + currency code
1.00 RMB
one yuan
Integers, decimals, and numbers that use commas (,) as thousands separators are supported.
Supported currency codes:
CN¥ (yuan)
CNY (yuan)
RMB (yuan)
AUD (australian dollar)
CAD (canadian dollar)
CHF (swiss franc)
DKK (danish krone)
EUR (euro)
GBP (british pound)
HKD (Hong Kong (China) dollar)
JPY (japanese yen)
NOK (norwegian krone)
SEK (swedish krona)
SGD (singapore dollar)
USD (united states dollar)
2.02 CNY
two point zero two yuan
1,000.23 CN¥
one thousand point two three yuan
1.01 SGD
one singapore dollar and one cent
2.01 CAD
two canadian dollars and one cent
3.1 HKD
three hong kong dollars and ten cents
1,000.00 EUR
one thousand euros
Currency code + number
US$ 1.00
one US dollar
Integers, decimals, and numbers that use commas (,) as thousands separators are supported.
Supported currency codes:
US$ (US dollar)
CA$ (Canadian dollar)
AU$ (Australian dollar)
SG$ (Singapore dollar)
HK$ (Hong Kong dollar)
C$ (Canadian dollar)
A$ (Australian dollar)
$ (dollar)
£ (pound)
€ (euro)
CN¥ (yuan)
CNY (yuan)
RMB (yuan)
AUD (australian dollar)
CAD (canadian dollar)
CHF (swiss franc)
DKK (danish krone)
EUR (euro)
GBP (british pound)
HKD (Hong Kong (China) dollar)
JPY (japanese yen)
NOK (norwegian krone)
SEK (swedish krona)
SGD (singapore dollar)
USD (united states dollar)
$0.01
one cent
JPY 1.01
one japanese yen and one sen
£1.1
one pound and ten pence
€ 2.01
two euros and one cent
USD 1,000
one thousand united states dollars
Number + numerical unit + currency code
or
Currency code + number + numerical unit
1.23 Tn RMB
one point two three trillion yuan
The following numerical units are supported:
thousand
million
billion
trillion
Mil (million)
mil (million)
Bil (billion)
bil (billion)
MM (million)
Bn (billion)
bn (billion)
Tn (trillion)
tn (trillion)
K (thousand)
k (thousand)
M (million)
m (million)
$1.2 K
one point two thousand dollars
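The numerical units listed above are case-sensitive abbreviations that map to four readings. The lookup can be sketched in Python as follows. The names `NUMERICAL_UNITS` and `expand_unit` are hypothetical, and the table is copied from this topic rather than from any service API.

```python
# Numerical unit abbreviations and their readings, copied from the list above.
NUMERICAL_UNITS = {
    "thousand": "thousand", "million": "million",
    "billion": "billion", "trillion": "trillion",
    "Mil": "million", "mil": "million",
    "Bil": "billion", "bil": "billion",
    "MM": "million",
    "Bn": "billion", "bn": "billion",
    "Tn": "trillion", "tn": "trillion",
    "K": "thousand", "k": "thousand",
    "M": "million", "m": "million",
}


def expand_unit(abbreviation: str) -> str:
    """Return the reading for a numerical unit; raise KeyError if it is not listed."""
    return NUMERICAL_UNITS[abbreviation]
```

The lookup is deliberately exact: a spelling that is not in the list, such as `B`, raises `KeyError` instead of being guessed.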
measure
Format
Example
Description
Number + Chinese unit
2 pieces
Common Chinese units and unit abbreviations are supported.
120 hectares
More than 100 milligrams
About 100 meters
More than 100 persons
1 centimeter and 20 millimeters
120.00 square kilometers
Number + unit abbreviation
120.56 cm²
One hundred twenty square meters fifty-six square centimeters
100 m 12 cm 6 mm
Range
10~15 kg
10.24 to 789.82 Mu
10 meters to 15 meters
10.24 cm~19.08 cm
Number + unit + "/" + unit
CNY 10/kg
CNY 199 to 299/piece
CNY 299.99/g to CNY 399.99/g
Other default readings
12 bunches
30 rm
400,000,000 fellows
12.897 micrograms
Format
Example
English output
Description
Number + measurement unit
1.0 kg
one kilogram
Integers, decimals, and numbers that use commas (,) as thousands separators are supported.
Common unit abbreviations are supported.
1,234.01 km
one thousand two hundred thirty four point zero one kilometres
Measurement unit
mm2
square millimetre
The following table describes the common notations that the <say-as> tag supports.
Notations
Pronunciation in English
!
exclamation mark
"
double quote
#
pound
$
dollar
%
percent
&
and
'
left quote
(
left parenthesis
)
right parenthesis
*
asterisk
+
plus
,
comma
-
dash
.
dot
/
slash
:
colon
;
semicolon
<
less than
=
equals
>
greater than
?
question mark
@
at
[
left bracket
\
back slash
]
right bracket
^
caret
_
underscore
`
back quote
{
left brace
|
vertical bar
}
right brace
~
tilde
!
exclamation mark
"
left double quote
"
right double quote
'
left quote
'
right quote
(
left parenthesis
)
right parenthesis
,
comma
.
full stop
--
em dash
:
colon
;
semicolon
?
question mark
,
enumeration comma
...
ellipsis
...
ellipsis
《
left guillemet
》
right guillemet
¥
yuan
≥
greater than or equal to
≤
less than or equal to
≠
not equal
≈
approximately equal
±
plus or minus
×
times
π
pi
Α
alpha
Β
beta
Γ
gamma
Δ
delta
Ε
epsilon
Ζ
zeta
Θ
theta
Ι
iota
Κ
kappa
Λ
lambda
Μ
mu
Ν
nu
Ξ
ksi
Ο
omicron
Π
pi
Ρ
rho
Σ
sigma
Τ
tau
Υ
upsilon
Φ
phi
Χ
chi
Ψ
psi
Ω
omega
α
alpha
β
beta
γ
gamma
δ
delta
ε
epsilon
ζ
zeta
η
eta
θ
theta
ι
iota
κ
kappa
λ
lambda
μ
mu
ν
nu
ξ
ksi
ο
omicron
π
pi
ρ
rho
σ
sigma
τ
tau
υ
upsilon
φ
phi
χ
chi
ψ
psi
ω
omega
The following table describes the measurement units that the <say-as> tag supports.
Format
Type
English example
Abbreviation
Length
nm (nanometre), μm (micrometre), mm (millimetre), cm (centimetre), m (metre), km (kilometre), ft (foot), and in (inch)
Area
cm² (square centimetre), ㎡ (square metre), km2 (square kilometre), and SqFt (square foot)
Volume
cm³ (cubic centimetre), m³ (cubic metre), km3 (cubic kilometre), mL (millilitre), L (litre), and gal (gallon)
Weight
μg (microgram), mg (milligram), g (gram), and kg (kilogram)
Time
min (minute), sec (second), ms (millisecond)
Electromagnetism
μA (microamp), mA (milliamp), Hz (hertz), kHz (kilohertz), MHz (megahertz), GHz (gigahertz), V (volt), kV (kilovolt), and kWh (kilowatt hour)
Sound
dB (decibel)
Pressure
Pa (pascal), kPa (kilopascal), MPa (megapascal)
Other common units
The following English measurement units are also supported: tsp (teaspoon), rpm (revolutions per minute), KB (kilobyte), and mmHg (millimetre of mercury).
Tag relationships
The <say-as> tag can contain text.
Examples
cardinal
<speak><say-as interpret-as="cardinal">12345</say-as></speak>
Synthesis result in Chinese: SSML-say-as_Cardinal.mp3
<speak><say-as interpret-as="cardinal">10234</say-as></speak>
Synthesis result in English: en-SSML-say-as_cardinal.mp3
digits
<speak><say-as interpret-as="digits">12345</say-as></speak>
Synthesis result in Chinese: SSML-say-as_digit.mp3
<speak><say-as interpret-as="digits">10234</say-as></speak>
Synthesis result in English: en-SSML-say-as_digits.mp3
telephone
<speak><say-as interpret-as="telephone">12345</say-as></speak>
Synthesis result in Chinese: SSML-say-as_Telephone.mp3
<speak><say-as interpret-as="telephone">10234</say-as></speak>
Synthesis result in English: en-SSML-say-as_telephone.mp3
name
<speak>Her previous name is <say-as interpret-as="name">Zeng Xiaofan</say-as>.</speak>
Synthesis result: SSML-say-as_Name.mp3
address
<speak><say-as interpret-as="address">No. 304 Unit 3 Building 1 Fuluguoji</say-as></speak>
Synthesis result: SSML-say-as_Address.mp3
id
<speak><say-as interpret-as="id">myid_1998</say-as></speak>
Synthesis result: SSML-say-as_id.mp3
characters
<speak><say-as interpret-as="characters">Greek letter αβ</say-as></speak>
Synthesis result in Chinese: SSML-say-as_characters.mp3
<speak><say-as interpret-as="characters">*b+3.c$=α</say-as></speak>
Synthesis result in English: en-SSML-say-as_characters.mp3
punctuation
<speak><say-as interpret-as="punctuation"> -./:;</say-as></speak>
Synthesis result: SSML-say-as_punctuation.mp3
date
<speak><say-as interpret-as="date">1000-10-10</say-as></speak>
Synthesis result in Chinese: SSML-say-as_date.mp3
<speak><say-as interpret-as="date">10-01-2020</say-as></speak>
Synthesis result in English: en-SSML-say-as_date.mp3
time
<speak><say-as interpret-as="time">5:00am</say-as></speak>
Synthesis result in Chinese: SSML-say-as_time.mp3
<speak><say-as interpret-as="time">0500</say-as></speak>
Synthesis result in English: en-SSML-say-as_time.mp3
currency
<speak><say-as interpret-as="currency">13,000,000.00RMB</say-as></speak>
Synthesis result in Chinese: SSML-say-as_currency.mp3
<speak><say-as interpret-as="currency">$1,000.01</say-as></speak>
Synthesis result in English: en-SSML-say-as_currency.mp3
measure
<speak><say-as interpret-as="measure">100m12cm6mm</say-as></speak>
Synthesis result in Chinese: SSML-say-as_measure.mp3
<speak><say-as interpret-as="measure">1,000.01kg</say-as></speak>
Synthesis result in English: en-SSML-say-as_measure.mp3
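The examples above all follow one pattern: text wrapped in a <say-as> tag with an interpret-as attribute, enclosed in <speak> tags. The following Python sketch builds such a string and escapes the special XML characters named in the usage notes. The helper name `say_as` is an assumption for illustration. The sketch only produces the SSML text and does not call the speech synthesis service.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(interpret_as: str, text: str) -> str:
    """Wrap text in <speak><say-as interpret-as="...">, escaping XML special characters."""
    # escape() handles &, < and >; the extra mapping also covers both quote characters.
    escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
    return (
        f"<speak><say-as interpret-as={quoteattr(interpret_as)}>"
        f"{escaped}</say-as></speak>"
    )
```

For example, `say_as("currency", "$1,000.01")` returns the same string as the English currency example above, and a value such as `a<b&c` is emitted as `a&lt;b&amp;c`.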
Comprehensive example
<speak>In the Northern Song Dynasty, <say-as interpret-as="date">on October 10, 1121</say-as>, <say-as interpret-as="address">the outskirts of Kaifeng City</say-as> were immersed in the joyful atmosphere of the <sub alias="Double eleven">Double eleven</sub> shopping festival. As a caravan of pack mules entered the city gate, a beautiful woman <phoneme alphabet="py" ph="de5">approached</phoneme> a man named <say-as interpret-as="name">A Fa</say-as>, who was at the front of the team.</speak>
<speak>"Hi there, our store has a special promotion today. All shoes are on sale: spend <say-as interpret-as="digits">199</say-as> and get <say-as interpret-as="cardinal">100</say-as> off. Don't miss out."</speak>
<speak>"Thanks, but we really need to get going. It is <say-as interpret-as="time">09:59:59</say-as>. If we don't deliver these goods on time, the whole supply chain could fail."</speak>
<speak><say-as interpret-as="name">A Fa</say-as> wiped the sweat from his brow as he guided his team through the crowded alleys filled with vendors shouting out to their customers.</speak>
<speak>Get latest colored fabrics here. Buy two and get one free;</speak>
<speak>Best selling hats. We are offering a seven-day unconditional return policy;</speak>
<speak>Treat all types of intractable diseases for both men and women. </speak>
<speak>Suddenly, a horse got scared and started running quickly down the road. A child was also frightened and stumbled into the arms of his mother, <break time="50ms"/> crying:</speak>
<speak>"Mommy, mommy!"</speak>
<speak>At that moment, <say-as interpret-as="name">A Fa</say-as> thought:</speak>
<speak>"I’m so scared!"</speak>
<speak>He quickly covered his <phoneme alphabet="py" ph="he2 bao1">wallet</phoneme> and continued on his way to deliver the goods. Along the way, the bustling scene of <say-as interpret-as="address">Kaifeng City</say-as> left a deep impression on <say-as interpret-as="name">A Fa</say-as>.</speak>
<speak>As time passed and the prosperity of the city faded, he picked up his brush and painted on a long scroll during the shopping festival. The scroll painting is named Along the River During the Qingming Festival. </speak>
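Because a comprehensive example such as the one above consists of many separate <speak> fragments, it can help to confirm that each fragment is well-formed XML before it is submitted. The following Python sketch uses the standard library parser for that purpose. The function name `check_ssml` is hypothetical. The check covers XML well-formedness and the root tag only, not whether the service supports a given tag or attribute.

```python
import xml.etree.ElementTree as ET


def check_ssml(fragment: str) -> list:
    """Parse one <speak> fragment and return the tag names it uses, in document order."""
    root = ET.fromstring(fragment)  # raises ET.ParseError if the XML is malformed
    if root.tag != "speak":
        raise ValueError("the root element must be <speak>")
    return [element.tag for element in root.iter()]
```

For a fragment like `<speak>It is <say-as interpret-as="time">09:59:59</say-as>.</speak>`, the function returns `["speak", "say-as"]`. An unclosed tag raises `ET.ParseError`.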