Intelligent Media Services: SSML overview

Last Updated: Dec 09, 2024

This topic describes the features and tags of Speech Synthesis Markup Language (SSML) and provides examples of how to use them.

Overview

SSML is an XML-based markup language for speech synthesis. Compared with plain text synthesis, SSML-based synthesis improves the quality of synthesized content and supports a wider range of synthesis effects. You can use SSML to specify both the content that the speech synthesis service reads and how the service reads it. For example, you can specify how to break sentences and words, control pronunciation, and insert pauses.

Note

The speech synthesis service provided by Alibaba Cloud is implemented based on SSML 1.1 of the World Wide Web Consortium (W3C). For more information, see Speech Synthesis Markup Language (SSML) Version 1.1. However, not all markup types that are defined in the W3C standard are supported. The service supports only the tags that are described in this topic.

Usage notes

  • SSML is supported for Chinese and English, but the supported tags and content differ by language. The following sections describe each tag and provide examples.

  • All text must be enclosed between the <speak> and </speak> tags. A speech synthesis task can contain multiple pairs of <speak> and </speak> tags, and SSML can be used together with plain text.

  • The XML header before the first <speak> tag can be omitted.

  • If the text enclosed by a tag contains special XML characters, you must escape the characters. The following list shows the special characters and the corresponding escape sequences:

    • Double quotation marks ("): &quot;

    • Single quotation mark ('): &apos;

    • Ampersand (&): &amp;

    • Less-than sign (<): &lt;

    • Greater-than sign (>): &gt;
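The escaping rules above can be applied programmatically before text is placed inside a tag. The following is a minimal Python sketch that uses the standard library; the helper names escape_ssml_text and wrap_in_speak are hypothetical and are not part of any Alibaba Cloud SDK.

```python
from xml.sax.saxutils import escape

# escape() handles &, <, and > by default; the two quotation marks
# are passed as extra entities so that all five characters are covered.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_ssml_text(text: str) -> str:
    """Escape the five special XML characters listed above."""
    return escape(text, _QUOTE_ENTITIES)


def wrap_in_speak(text: str) -> str:
    """Escape plain text and enclose it between <speak> and </speak>."""
    return f"<speak>{escape_ssml_text(text)}</speak>"


if __name__ == "__main__":
    print(wrap_in_speak('Tom & Jerry say "1 < 2"'))
```

Escape only the text content, not the SSML tags themselves. Escaping a string that already contains tags would turn the tags into literal text.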

Note

Intelligent voices and human voice cloning (Basic Edition) support all SSML tags and attributes that are described in this topic.

Human voice cloning (Public Edition) supports only the <speak>, <break>, <s>, <sub>, <w>, <phoneme>, and <say-as> tags. In this case, the <speak> tag supports the rate, pitch, and volume attributes. The other tags do not support attribute configuration and are referred to as tags with empty attributes.

Tags

<speak>

  • Description

    The <speak> tag is the root node of all supported SSML tags. All text that uses SSML tags must be enclosed between the <speak> and </speak> tags.

  • Syntax

    <speak>Text that needs to call SSML tags</speak>
  • Attributes

    The following table describes the attributes that are supported by the <speak> tag.

    | Attribute name | Attribute type | Attribute value | Required | Description |
    |---|---|---|---|---|
    | voice | String | The name of a voice that can be called. The value can contain only lowercase letters, such as siyue. | No | Alibaba Cloud proprietary attribute. Specifies the voice that is used for speech synthesis. This value takes priority over the voice parameter in the API request. For more information, see Intelligent voice samples. |
    | encodeType | String | PCM/WAV/MP3 | No | Alibaba Cloud proprietary attribute. Specifies the audio file format for speech synthesis. This value takes priority over the format parameter in the API request. |
    | sampleRate | String | 8000/16000/24000/48000 | No | Alibaba Cloud proprietary attribute. Specifies the audio sampling rate for speech synthesis. This value takes priority over the sample_rate parameter in the API request. |
    | rate | String | An integer from -500 to 500. Default value: 0. A value greater than 0 increases the speech rate. A value less than 0 reduces the speech rate. | No | Alibaba Cloud proprietary attribute. Specifies the speech rate for speech synthesis. This value takes priority over the speech_rate parameter in the API request. |
    | pitch | String | An integer from -500 to 500. Default value: 0. A value greater than 0 raises the pitch. A value less than 0 lowers the pitch. | No | Alibaba Cloud proprietary attribute. Specifies the pitch for speech synthesis. This value takes priority over the pitch_rate parameter in the API request. |
    | volume | String | An integer from 0 to 100. Default value: 50. A value greater than 50 increases the volume. A value less than 50 reduces the volume. | No | Alibaba Cloud proprietary attribute. Specifies the volume for speech synthesis. This value takes priority over the volume parameter in the API request. |
    | effect | String | robot/lolita/lowpass/echo/eq/lpfilter/hpfilter | No | Alibaba Cloud proprietary attribute. Applies a sound effect to the synthesized speech. Valid values: robot (robot voice), lolita (little girl voice), lowpass (low-pass effect), echo (echo effect), eq (equalizer), lpfilter (low-pass filter), and hpfilter (high-pass filter). See the note after this table. |
    | effectValue | String | The setting of a specific filter. If you set the effect attribute to eq, lpfilter, or hpfilter, you can configure this attribute to modify the default effect of the specified filter. | No | See the valid values of effectValue after this table. |
    | bgm | String | The background music (BGM) to use, specified as a URL that can be accessed online. For more information, see the description of the bgm attribute. | No | Alibaba Cloud proprietary attribute. Specifies the BGM of the synthesized speech. |
    | backgroundMusicVolume | String | An integer from 0 to 100. Default value: 50. A value greater than 50 increases the volume. A value less than 50 reduces the volume. | No | Alibaba Cloud proprietary attribute. Specifies the volume of the BGM. |

    Note

    The eq, lpfilter, and hpfilter values of the effect attribute specify advanced filters. If you set the effect attribute to eq, lpfilter, or hpfilter, you can configure the effectValue attribute to specify a custom effect for the specified filter.

    An SSML structure supports only one sound effect. You cannot set the effect attribute to multiple values.

    If you configure the effect attribute, the system latency may increase.

    Valid values of the effectValue attribute for each filter:

    • eq: the equalizer. The system provides eight default bands. Frequencies: ["40Hz", "100Hz", "200Hz", "400Hz", "800Hz", "1600Hz", "4000Hz", "12000Hz"]. Bandwidths: ["1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q", "1.0q"]. If you configure this attribute, you must specify a gain for each band. The gain ranges from -20 dB to 20 dB, and the value 0 indicates that the gain of the band is not adjusted. The input value is a string that consists of eight integers separated by spaces. For example, you can set the effectValue attribute to 1 1 1 1 1 1 1 1.

    • lpfilter: the frequency of the low-pass filter. The value is an integer in the range of (0, Target sampling rate/2]. For example, you can set the effectValue attribute to 800.

    • hpfilter: the frequency of the high-pass filter. The value is an integer in the range of (0, Target sampling rate/2]. For example, you can set the effectValue attribute to 1200.

    The bgm attribute accepts the following types of URLs.

    • Built-in BGM URL

      The speech synthesis service provides several built-in BGM streams. You can click the following URLs to listen to the BGM streams:

    • Custom BGM URL

      You can use custom BGM based on your business requirements. Before you specify custom BGM, you must store the audio file in an Alibaba Cloud Object Storage Service (OSS) bucket whose access control list (ACL) is public read or public read/write. For more information about how to create a bucket, see Create a bucket. You can use the HTTP or HTTPS protocol to generate a URL for the object that is stored in the bucket. For more information, see Step 2: Upload an object.

      Requirements for audio files to be uploaded:

      • The audio file must be a mono WAV file with a sampling rate of 16 kHz.

      • The bit depth is 16 bits.

      • The file size cannot exceed 3.5 MB for short-text speech synthesis or 10 MB for long-text speech synthesis.

      • If your audio file is not in the WAV format, you can run the following command to convert it: ffmpeg -i <input audio file> -acodec pcm_s16le -ac 1 -ar 16000 <output audio file>.wav

      • If the URL in the tag contains special XML characters, escape the characters.

      If the synthesis duration is longer than the BGM duration, the BGM is played in a loop.

    Important

    You are legally liable for the copyright of the uploaded audio file.
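Before you upload a custom BGM file, you can check it locally against the requirements listed above. The following is a minimal sketch that uses the Python standard library wave module. The function name check_bgm_wav is hypothetical, and the sketch assumes that 1 MB equals 1024 × 1024 bytes, which this topic does not state.

```python
import os
import wave

# Assumption: 1 MB is treated as 1024 * 1024 bytes.
SHORT_TEXT_LIMIT = int(3.5 * 1024 * 1024)
LONG_TEXT_LIMIT = 10 * 1024 * 1024


def check_bgm_wav(path: str, max_bytes: int = SHORT_TEXT_LIMIT) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if os.path.getsize(path) > max_bytes:
        problems.append(f"file is larger than {max_bytes} bytes")
    # wave.open raises wave.Error if the file is not a valid WAV file.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("audio must be mono")
        if wav.getframerate() != 16000:
            problems.append("sampling rate must be 16 kHz")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bits
            problems.append("bit depth must be 16 bits")
    return problems
```

For long-text speech synthesis, pass max_bytes=LONG_TEXT_LIMIT. The sketch checks only the file itself and does not verify that the OSS URL is publicly readable.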

  • Tag relationships

    The <speak> tag can contain texts and the following tags:

    • <break>

    • <s>

    • <w>

    • <phoneme>

    • <say-as>

  • Examples

    • No attributes

      <speak>Text that needs to call SSML tags</speak>

      Synthesis result: SSML-speak1.mp3

    • Attribute voice

      <speak voice="xiaogang"> This is a male voice. </speak>

      Synthesis result: SSML-speak2.mp3

    • Attribute encodeType

      <speak encodeType="mp3">I can generate audio in the compressed format. </speak>

      Synthesis result: SSML-encode.mp3

    • Attribute sampleRate

      <speak sampleRate="8000">The size of the file is half of the audio at a sampling rate of 16 kHz. </speak>

      Synthesis result: SSML-speak4.mp3

    • Attribute rate

      <speak rate="200">I speak faster than the average. </speak>

      Synthesis result: SSML-speak5.mp3

    • Attribute pitch

      <speak pitch="-100">My voice pitch is lower than others. </speak>

      Synthesis result: SSML-speak6.mp3

    • Attribute volume

      <speak volume="80">My voice is loud. </speak>

      Synthesis result: SSML-speak7.mp3

    • Multiple attributes, separated by spaces

      <speak rate="200" pitch="-100" volume="80">This is how my voice sounds when multiple attributes are used. </speak>

      Synthesis result: SSML-speak8.mp3

    • Attribute effect

      <speak effect="robot">Do you like the Wall-E robot? </speak>

      Synthesis result: SSML-speak9.mp3

    • Attribute bgm

      <speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="-500" volume="40"><break time="2s"/>The ancient trees on the shady cliffs are covered in a thick layer of moss<break time="700ms"/>The sound of rain can still be heard echoing through the bamboo forest<break time="700ms"/>The silk production contributes to the national economy<break time="700ms"/>The scenery of Mianzhou is worth seeing<break time="2s"/></speak>

      Synthesis result: SSML-speak10.mp3
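The attribute ranges documented for the <speak> tag can be checked on the client before a request is sent. The following Python sketch illustrates one way to do this. The function names build_speak and check_eq_effect_value are hypothetical, and the sketch checks only the numeric ranges and the eq format that are stated in this topic.

```python
from xml.sax.saxutils import escape, quoteattr

# Integer ranges taken from the attribute descriptions of the <speak> tag.
_INT_RANGES = {
    "rate": (-500, 500),
    "pitch": (-500, 500),
    "volume": (0, 100),
    "backgroundMusicVolume": (0, 100),
}


def check_eq_effect_value(effect_value: str) -> None:
    """Check an effectValue string for effect="eq": eight integer gains."""
    gains = effect_value.split()
    if len(gains) != 8:
        raise ValueError("eq requires exactly eight gains separated by spaces")
    for gain in gains:
        if not -20 <= int(gain) <= 20:  # int() also rejects non-integers
            raise ValueError(f"gain {gain} is outside [-20, 20] dB")


def build_speak(text: str, **attrs) -> str:
    """Wrap plain text in <speak>, validating numeric attributes first."""
    parts = []
    for name, value in attrs.items():
        if name in _INT_RANGES:
            low, high = _INT_RANGES[name]
            if not isinstance(value, int) or not low <= value <= high:
                raise ValueError(f"{name} must be an integer in [{low}, {high}]")
        # quoteattr() adds the quotation marks and escapes special characters.
        parts.append(f" {name}={quoteattr(str(value))}")
    return f"<speak{''.join(parts)}>{escape(text)}</speak>"


if __name__ == "__main__":
    print(build_speak("My voice is loud.", volume=80))
```

Because build_speak escapes its text argument, it is suitable only for plain text. Content that already contains nested tags such as <break> would need to be assembled separately.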

<emotion>

  • Description

    The <emotion> tag is used to apply multi-emotional voices to speech synthesis. This tag is optional. If you configure the tag for a voice that does not support multiple emotions, an error occurs.

  • Syntax

    <emotion category="happy" intensity="1.0">What a nice day! </emotion>
  • Attributes

    The following table describes the attributes that are supported by the <emotion> tag.

    | Attribute name | Attribute type | Attribute value | Required | Description |
    |---|---|---|---|---|
    | category | String | An enumeration value, such as neutral or happy. | Yes | The speech emotion. The supported emotions for each voice are listed in the voice table that follows. |
    | intensity | String | A floating-point number from 0.01 to 2.0. | No | The intensity of the emotion. The default value is 1.0, which indicates the predefined emotional intensity. The minimum value is 0.01, which indicates a slight inclination toward the specified emotion. The maximum value is 2.0, which indicates that the emotional intensity is doubled. |

    Each multi-emotional voice supports a different set of emotion categories.

    | Voice name | Value of voice | Emotion category |
    |---|---|---|
    | Zhi Miao_multi-emotional | zhimiao_emo | serious, sad, disgust, jealousy, embarrassed, happy, fear, surprise, neutral, frustrated, affectionate, gentle, angry, newscast, customer-service, story, and living |
    | Zhi Mi_multi-emotional | zhimi_emo | angry, fear, happy, hate, neutral, sad, and surprise |
    | Zhi Yan_multi-emotional | zhiyan_emo | neutral, happy, angry, sad, fear, hate, surprise, and arousal |
    | Zhi Bei_multi-emotional | zhibei_emo | neutral, happy, angry, sad, fear, hate, and surprise |
    | Zhi Tian_multi-emotional | zhitian_emo | neutral, happy, angry, sad, fear, hate, and surprise |

  • Tag relationships

    The <emotion> tag can contain texts and the following tags:

    • <s>

    • <sub>

    • <say-as>

    • <w>

    • <phoneme>

    • <soundEvent/>

    • <break/>

  • Example

    • category and intensity attributes

      <speak voice="zhitian_emo"><emotion category="happy" intensity="1.0">What a nice day! </emotion></speak>

      Synthesis result: SSML-emotion.wav
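Because an unsupported emotion causes an error, it can help to check the voice and category on the client first. The following Python sketch encodes the voice table of this section as a dictionary. The function name build_emotion is hypothetical, and the dictionary reflects only the voices that are listed in this topic.

```python
from xml.sax.saxutils import escape

_COMMON = {"neutral", "happy", "angry", "sad", "fear", "hate", "surprise"}

# Emotion categories per voice, copied from the voice table in this section.
EMOTIONS_BY_VOICE = {
    "zhimiao_emo": {
        "serious", "sad", "disgust", "jealousy", "embarrassed", "happy",
        "fear", "surprise", "neutral", "frustrated", "affectionate",
        "gentle", "angry", "newscast", "customer-service", "story", "living",
    },
    "zhimi_emo": set(_COMMON),
    "zhiyan_emo": _COMMON | {"arousal"},
    "zhibei_emo": set(_COMMON),
    "zhitian_emo": set(_COMMON),
}


def build_emotion(voice: str, category: str, text: str,
                  intensity: float = 1.0) -> str:
    """Build a <speak> document with one <emotion> element."""
    supported = EMOTIONS_BY_VOICE.get(voice)
    if supported is None:
        raise ValueError(f"{voice} is not a listed multi-emotional voice")
    if category not in supported:
        raise ValueError(f"{voice} does not support the {category} emotion")
    if not 0.01 <= intensity <= 2.0:
        raise ValueError("intensity must be within the range of 0.01 to 2.0")
    return (
        f'<speak voice="{voice}">'
        f'<emotion category="{category}" intensity="{intensity}">'
        f"{escape(text)}</emotion></speak>"
    )
```

For example, build_emotion("zhitian_emo", "happy", "What a nice day!") produces the same structure as the example above, without the trailing space in the text.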

<break>

  • Description

    The <break> tag inserts pauses in texts and is optional.

  • Syntax

    <break time="string"/>
  • Attributes

    | Attribute name | Attribute type | Attribute value | Required | Description |
    |---|---|---|---|---|
    | time | String | [number]s/[number]ms | No | The pause length, in seconds or milliseconds. Example: 2s or 50ms. |

    • If the pause is in seconds, the value of number is an integer within the range of [1, 10]. In this case, the value is in the format of [number]s.

    • If the pause is in milliseconds, the value of number is an integer within the range of [50, 10000]. In this case, the value is in the format of [number]ms.

  • Tag relationships

    The <break> tag is an empty tag and cannot contain any tags. If the <s> tag is used, you must enclose the <break> tag between the <s> and </s> tags, which indicates that a pause is inserted into the sentences or paragraphs.

  • Example

    <speak>Close your eyes and have a rest.<break time="500ms"/>OK, please open your eyes. </speak>

    Synthesis result: SSML-break.mp3
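The format and range rules for the time attribute can be expressed as a small check. The following Python sketch is an illustration only, and the function name is_valid_break_time is hypothetical.

```python
import re

# A non-negative integer followed by the unit "ms" or "s".
_BREAK_TIME = re.compile(r"(\d+)(ms|s)")


def is_valid_break_time(value: str) -> bool:
    """Return True if value follows the [number]s or [number]ms rules."""
    match = _BREAK_TIME.fullmatch(value)
    if match is None:
        return False
    number, unit = int(match.group(1)), match.group(2)
    if unit == "s":
        return 1 <= number <= 10       # seconds: [1, 10]
    return 50 <= number <= 10000       # milliseconds: [50, 10000]
```

With this check, "500ms" and "2s" are accepted, whereas "11s", "49ms", and "1.5s" are rejected.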

<s>

  • Description

    The <s> tag specifies the sentence structure in a text and is optional.

  • Syntax

     <s>Text</s>
  • Attributes

    N/A.

  • Tag relationships

    The <s> tag can contain texts and the following tags:

    • <break>

    • <w>

    • <phoneme>

    • <say-as>

  • Example

    <speak><s>This is the first sentence.</s><s>This is the second sentence.</s></speak>

    Synthesis result: SSML-s.mp3

<sub>

  • Description

    The <sub> tag replaces the text enclosed by the tag with an alias. The alias is read instead of the enclosed text.

  • Syntax

     <sub alias="string">Text</sub>
  • Attributes

    | Attribute name | Attribute type | Attribute value | Required | Description |
    |---|---|---|---|---|
    | alias | String | The content of the new text. | Yes | The text that is used to replace the text enclosed by the tag. |

  • Tag relationships

    The <sub> tag can contain texts.

  • Example

    <speak><sub alias="Network protocol standard">W3C</sub></speak>

    Synthesis result: SSML-sub.mp3

<w>

  • Description

    The <w> tag specifies the word structure in a text and is optional. English text is normally segmented into words by spaces, so you usually do not need this tag. In English text, the content enclosed by the <w> and </w> tags must be an independent word or phrase.

  • Syntax

     <w>Text</w>
  • Attributes

    N/A.

  • Tag relationships

    The <w> tag can contain texts.

  • Example

    <speak>Mayor of Nanjing <w>Jiang Daqiao</w> gave a speech today. </speak>

    Synthesis result: SSML-w.mp3

<phoneme>

  • Description

    The <phoneme> tag controls the pronunciation of the text enclosed by the tag and is optional. The tag is not supported for English texts.

  • Syntax

    <phoneme alphabet="string" ph="string">Text</phoneme>
  • Attributes

    | Attribute name | Attribute type | Attribute value | Required | Description |
    |---|---|---|---|---|
    | alphabet | String | py | Yes | The value py indicates Pinyin. |
    | ph | String | The Pinyin string that corresponds to the text enclosed by the tag. | Yes | See the Pinyin rules after this table. |

    Rules for the Pinyin string:

    • Pinyin syllables are separated by spaces. The number of Pinyin syllables must be the same as the number of characters in the enclosed text.

    • Each Pinyin syllable consists of the sound and a tone mark. Tone marks are represented by the numbers 1 to 5, in which 5 indicates the neutral tone.

  • Tag relationships

    The <phoneme> tag can contain texts.

  • Example

     <speak>qu<phoneme alphabet="py" ph="dian3 dang4 hang2">dian dang hang</phoneme>ba zhe ge wan yi<phoneme alphabet="py" ph="dang4 diao4">dang diao</phoneme></speak>

    Synthesis result: SSML-phoneme.mp3
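The two rules for the ph attribute can be checked before a request is sent. The following Python sketch is an illustration only. The function name validate_ph is hypothetical, and the sketch assumes that each syllable is written in lowercase ASCII letters followed by one tone number, which this topic does not state explicitly.

```python
import re

# Assumption: a syllable is lowercase ASCII letters plus a tone number 1-5.
_SYLLABLE = re.compile(r"[a-z]+[1-5]")


def validate_ph(ph: str, char_count: int) -> bool:
    """Check a ph value against the number of characters in the tag text."""
    syllables = ph.split(" ")
    if len(syllables) != char_count:
        return False  # one syllable is required per character
    return all(_SYLLABLE.fullmatch(s) is not None for s in syllables)
```

For example, validate_ph("dian3 dang4 hang2", 3) passes, whereas a string with a missing tone number, such as "dang4 diao", fails.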

<soundEvent>

  • Description

    The <soundEvent> tag is used to insert a sound cue in any position of the text during SSML-based synthesis.

  • Syntax

     <soundEvent src="URL"/>
  • Attributes

    | Attribute name | Attribute type | Attribute value | Required | Description |
    |---|---|---|---|---|
    | src | String | The URL of the sound cue. | Yes | See the requirements after this table. |

    You can use a custom sound cue based on your business requirements. Before you specify a custom sound cue, store the audio file in an OSS bucket whose ACL is public read or public read/write. For more information about how to create a bucket, see Create a bucket. You can use the HTTP or HTTPS protocol to generate a URL for the object that is stored in the bucket. For more information, see Step 2: Upload an object.

    Requirements for audio files to be uploaded:

    • The audio file must be a mono WAV file with a sampling rate of 16 kHz.

    • The maximum file size is 2 MB.

    • The bit depth is 16 bits.

    Important

    You are legally liable for the copyright of the uploaded audio file.

  • Tag relationships

    The <soundEvent> tag is an empty tag and cannot contain any tags.

  • Examples

     <speak>A horse was frightened<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>and people scattered to escape.</speak>

    Synthesis result: SSML-sound-event.mp3

<say-as>

  • Description

    The <say-as> tag specifies the type of the text enclosed by the tag, so that the text can be pronounced based on the default pronunciation method of this type.

  • Syntax

    <say-as interpret-as="string">Text </say-as>
  • Attributes

    | Attribute name | Attribute type | Attribute value | Required | Description |
    |---|---|---|---|---|
    | interpret-as | String | cardinal/digits/telephone/name/address/id/characters/punctuation/date/time/currency/measure | Yes | The type of the text enclosed by the tag. |

    Valid values of interpret-as:

    • cardinal: The text is read as an integer or decimal number.

    • digits: The text is read digit by digit.

    • telephone: The text is read as a phone number.

    • name: The text is read as a name.

    • address: The text is read as an address.

    • id: The text is read as an account name or nickname.

    • characters: The text is read character by character.

    • punctuation: The text is read as punctuation marks.

    • date: The text is read as a date.

    • time: The text is read as a time.

    • currency: The text is read as an amount of money.

    • measure: The text is read as a measurement.
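A client can reject unsupported interpret-as values before a request is sent. The following Python sketch builds a <say-as> element from the values listed above. The function name say_as is hypothetical and is not part of any Alibaba Cloud SDK.

```python
from xml.sax.saxutils import escape

# Valid values of the interpret-as attribute, as listed in this section.
INTERPRET_AS = {
    "cardinal", "digits", "telephone", "name", "address", "id",
    "characters", "punctuation", "date", "time", "currency", "measure",
}


def say_as(interpret_as: str, text: str) -> str:
    """Build a <say-as> element and escape special XML characters."""
    if interpret_as not in INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as}")
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


if __name__ == "__main__":
    print("<speak>" + say_as("punctuation", "<=>?@") + "</speak>")
```

Escaping matters most for the characters and punctuation types, whose text often contains characters such as < and &.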

  • Text types that the <say-as> tag supports

    • cardinal

      | Format | Example |
      |---|---|
      | Numeric string | 145 |
      | Minus sign + numeric string | -145 |
      | Numeric string with commas as thousands separators | 10,000 |
      | Minus sign + numeric string with commas as thousands separators | -10,124 |
      | Numeric string + decimal point + two zeros | 10.00 |
      | Minus sign + numeric string + decimal point + two zeros | -110.00 |
      | Numeric string + decimal point + numeric string | 79.090 |
      | Minus sign + numeric string + decimal point + numeric string | -79.001 |

      Description:

      • Valid integers: positive and negative integers with a maximum of 20 digits, in the range of [-99999999999999999999, 99999999999999999999].

      • Valid decimals: No limits are imposed on the number of decimal places. However, we recommend that you retain up to 10 decimal places.

      | Format | Example | English output |
      |---|---|---|
      | Numeric string | 145 | one hundred forty five |
      | Numeric string that starts with a zero | 0145 | one hundred forty five |
      | Minus sign + numeric string | -145 | minus one hundred forty five |
      | Numeric string with commas as thousands separators | 60,000 | sixty thousand |
      | Minus sign + numeric string with commas as thousands separators | -208,000 | minus two hundred eight thousand |
      | Numeric string + decimal point + zeros | 12.00 | twelve |
      | Numeric string + decimal point + numeric string | 12.34 | twelve point three four |
      | Numeric string with commas as thousands separators + decimal point + numeric string | 1,000.1 | one thousand point one |
      | Minus sign + numeric string + decimal point + numeric string | -12.34 | minus twelve point three four |
      | Minus sign + numeric string with commas as thousands separators + decimal point + numeric string | -1,000.1 | minus one thousand point one |
      | Numeric string + hyphen + numeric string (either string can use commas as thousands separators) | 1-1,000 | one to one thousand |
      | Other default readings | 012.34 | twelve point three four |
      | Other default readings | 1/2 | one half |
      | Other default readings | -3/4 | minus three quarters |
      | Other default readings | 5.1/6 | five point one over six |
      | Other default readings | -3 1/2 | minus three and a half |
      | Other default readings | 1,000.3^3 | one thousand point three to the power of three |
      | Other default readings | 3e9.1 | three times ten to the power of nine point one |
      | Other default readings | 23.10% | twenty three point one percent |

      Description:

      • Valid integers: positive and negative integers with a maximum of 12 digits, in the range of [-999999999999, 999999999999].

      • Valid decimals: No limits are imposed on the number of decimal places. However, we recommend that you retain up to 10 decimal places.

    • digits

      | Format | Example | Description |
      |---|---|---|
      | Numeric string | 129090909 | No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits. If the numeric string contains more than 10 digits, you must insert a pause after each digit. |

      | Format | Example | English output |
      |---|---|---|
      | Numeric string | 12034 | one two zero three four |
      | Numeric string + space or hyphen + numeric string + space or hyphen + numeric string + space or hyphen + numeric string | 1-23-456 7890 | one, two three, four five six, seven eight nine zero |

      Description: No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits. When digits in a numeric string are grouped by hyphens (-) or spaces, a comma is inserted between the groups to create a pause. Up to five groups are supported for a numeric string.
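The grouping rule for English digits output can be reproduced with a few lines of code. The following Python sketch mirrors the two examples in the table and is meant only to illustrate the documented rule. It is not the implementation that the service uses, and the function name read_digits is hypothetical.

```python
import re

_DIGIT_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]


def read_digits(text: str) -> str:
    """Spell out digits one by one, with a comma between groups."""
    # Groups are separated by hyphens or spaces.
    groups = [group for group in re.split(r"[- ]", text) if group]
    if len(groups) > 5:
        raise ValueError("up to five groups are supported")
    spoken = []
    for group in groups:
        if not (group.isascii() and group.isdigit()):
            raise ValueError(f"not a numeric string: {group!r}")
        spoken.append(" ".join(_DIGIT_WORDS[int(ch)] for ch in group))
    return ", ".join(spoken)


if __name__ == "__main__":
    print(read_digits("1-23-456 7890"))
```

In the output string, each comma marks a position where a pause is inserted.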

    • telephone

      | Format | Examples | Description |
      |---|---|---|
      | Landline number | 4930286; 493 0286; 493-0286; 62552560; 6255 2560; 6255-2560 | A landline number has seven or eight digits. You can use spaces or hyphens (-) to separate the digits. A 7-digit landline number can be divided into a 3-digit group followed by a 4-digit group. An 8-digit landline number can be divided into two 4-digit groups. |
      | Landline number + extension number | 4930286-109; 4930286, extension 109 | An extension number can have up to four digits. |
      | Area code + landline number | 01062552560; 010 62552560; 010 6255 2560; 010 6255-2560; 010-62552560; 010-6255-2560; (010)62552560; 03198907098; 0319-8907098 | Area codes of 010, 02x, 03xx, 04xx, 05xx, 07xx, 08xx, and 09xx are supported. |
      | Area code + landline number + extension number | 010 62552560-109; 010-62552560-109; (010)62552560-109; (010)62552560, extension 109 | None. |
      | Country code + area code + landline number | 86-010-62791627; (86)10-62791627; +86-010-62791627; 0086-10-62791627; (+86)-10-6279 1627 | Country code formats of 86, (86), +86, (+86), and 0086 are supported, all of which are read as eight-six. |
      | Country code + area code + landline number + extension number | (86)21-58118818-207; (86)021-5811-8818-207; (86)021-58118818, x. 207; (86)21-5811-8818, ex. 207; +86-021-58118818, extension 207 | None. |
      | Mobile phone number | 139 0000 5678; 139-000-05678; 139 000 05678 | A mobile phone number consists of 11 digits and can be separated in the formats of 3-3-5 and 3-4-4. |
      | Country code + mobile number | +86-13900005678; (+86)-139-0000-5678; +8613900005678; 0086-139 000 05678 | None. |
      | Service number | 123; 95678; 4008110510; 800-810-8888; 1253013520638377 | Common service numbers are supported. A 10-digit service number can start with 400 or 800 and can be separated in the format of 3-3-4. A 16-digit service number can start with 12530, 17951, or 12593. |
      | Remarks | (86)(21)9899-80800-0909 | Numeric strings and separators are supported. The separators can be parentheses and hyphens (-). |

      | Format | Example | English output |
      |---|---|---|
      | Numeric string | 12034 | one two oh three four |
      | Numeric string + space or hyphen + numeric string + space or hyphen + numeric string | 1-23-456 7890 | one, two three, four five six, seven eight nine oh |
      | Plus sign + numeric string + space or hyphen + numeric string | +43-211-0567 | plus four three, two one one, oh five six seven |
      | Left parenthesis + numeric string + right parenthesis + space + numeric string + space or hyphen + numeric string | (21) 654-3210 | (two one) six five four, three two one oh |

      Description: No limits are imposed on the length of the numeric string. We recommend that a numeric string contains up to 20 digits. When digits in a numeric string are grouped by hyphens (-) or spaces, a comma is inserted between the groups to create a pause. Up to five groups are supported for a numeric string.

    • id

      | Format | Examples | Description |
      |---|---|---|
      | String | dell0101; myid_1998; AiTest | Uppercase and lowercase letters, digits from 0 to 9, and underscores (_) are supported. Characters are read one by one, and each space in the output indicates a pause between characters. |

      In English text, the id type behaves the same as the characters type.

    • characters

      | Format | Examples | Description |
      |---|---|---|
      | String | ISBN 1-001-099098-1; x10b2345_u; v1.0.1; Version 2.0; Su M MA000; Airbus A330; Models s01, s02, and s03; αβγ | Chinese characters, uppercase and lowercase letters, digits from 0 to 9, and specific full-width and half-width characters are supported. Characters are read one by one, and each space in the output indicates a pause between characters. If the text enclosed by the tag contains special XML characters, you must escape the characters. |

      | Format | Example | English output | Description |
      |---|---|---|---|
      | String | *b+3$.c-0'=α | asterisk B plus three dollar dot C dash zero apostrophe equals alpha | Chinese characters, uppercase and lowercase letters, digits from 0 to 9, and specific full-width and half-width characters are supported. Characters are read one by one, and each space in the output indicates a pause between characters. If the text enclosed by the tag contains special XML characters, you must escape the characters. |

    • punctuation

      | Format | Example |
      |---|---|
      | Punctuation marks | ... |
      | Punctuation marks | !"#$%& |
      | Punctuation marks | '()*+ |
      | Punctuation marks | ,-./:; |
      | Punctuation marks | <=>?@ |
      | Punctuation marks | [\]^_ |

      Description: Common Chinese and English punctuation marks are supported. Characters are read one by one, and each space in the output indicates a pause between characters. If the text enclosed by the tag contains special XML characters, you must escape the characters.

      In English text, the punctuation type behaves the same as the characters type.

    • date

      Format

      Example

      Description

      Year

      71

      Two-digit and four-digit years are supported.

      • Two-digit years range from 60 to 99, 00 to 09, and 10 to 19.

      • Four-digit years range from 1000 to 1999 and 2000 to 2099.

      04

      19

      1011

      1998

      2008

      Year and month

      April, 98

      The months from January to September can be represented by a number with or without a zero. For example, in April 1908, April can be represented by 4 or 04.

      April 1998

      August, 08

      August 2008

      Year, month, and day

      April 23, 98

      The days from the first to ninth day in a month can be represented by a number with or without a zero. For example, if you want to represent the date of April 8 in 1908, you can use 4 or 04 to indicate April and 8 or 08 to indicate Day 8.

      April 23, 1998

      August 8, 08

      August 08, 2008

      Year, month, and day

      April 23, 98

      The days from the first to ninth day in a month can be represented by a number with or without a zero. For example, if you want to represent the date of April 8 in 1908, you can use 4 or 04 to indicate April and 8 or 08 to indicate Day 8.

      April 23, 1998

      August 8, 08

      August 08, 2008

      Month and day

      March 20

      None.

      August 07

      Year and month

      2018/08

      Forward slashes (/), hyphens (-), and periods (.) can be used as separators between the days, months and years.

      2018-08

      2018.08

      Year, month, and day

      2018/08/08

      2018-8-8

      2018.08.08

      Year, month, and day~year, month, and day

      September 1~30, 04

      Tildes (~) and hyphens (-) can be used as separators between dates.

      September 01, 2004 - June 08, 2008

      Year, month, and day~day

      September 1~30, 04

      September 01, 2004 - June 08, 2008

      Year and month~year and month

      April, 01~April, 10

      April 2001 ~ April 2010

      Month and day~month and day

      October 1~October 7

      October 01~October 07

      Month and day~day

      October 1~7

      October 01~07

      Year, month, and day

      2018/03/03~2019/01/01

      Forward slashes (/) and periods (.) can be used as separators between the days, months, and years, and tildes (~) and hyphens (-) can be used as separators between dates.

      1997.9.9~1998.9.9

      Month and day

      10/20~10/31

      Month~month

      Jan~Oct

      January~October

      Year, month, and day

      10/20/2018

      Only 4-digit years are supported. Only forward slashes (/) can be used as the separators. Only the format of Month/Day/Year is supported.
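The Month/Day/Year restriction above can be checked on the client before a date is sent for synthesis. The following Python sketch is only an illustration under stated assumptions: the function name `is_month_day_year` is hypothetical, and the check uses `datetime.strptime` from the Python standard library rather than any service API.

```python
from datetime import datetime


def is_month_day_year(text: str) -> bool:
    """Return True if text is Month/Day/Year with '/' separators and a 4-digit year."""
    try:
        # The format string accepts only '/' as the separator and exactly
        # four digits for the year; impossible dates such as 02/30/2018
        # are rejected as well.
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False


print(is_month_day_year("10/20/2018"))
print(is_month_day_year("10/20/18"))
```

A calendar-aware check is stricter than the format rule itself, so whether the service also rejects impossible dates is not something this sketch can confirm.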

      Format

      Example

      English output

      Description

      Four digits/Two digits or four digits-Two digits

      2000/01

      two thousand, oh one

      Cross-year range

      1900-01

      nineteen hundred, oh one

      2001-02

      twenty oh one, oh two

      2019-20

      twenty nineteen, twenty

      1998-99

      nineteen ninety eight, ninety nine

      1999-00

      nineteen ninety nine, oh oh

      A 4-digit number that starts with 1 or 2

      2000

      two thousand

      4-digit years

      1900

      nineteen hundred

      1905

      nineteen oh five

      2021

      twenty twenty one

      Day of the week-Day of the week

      or

      Day of the week~Day of the week

      or

      Day of the week&Day of the week

      mon-wed

      monday to wednesday

      If the text enclosed by the tag contains special XML characters, you must escape the characters.

      tue~fri

      tuesday to friday

      sat&sun

      saturday and sunday

      DD-DD MMM, YYYY

      or

      DD~DD MMM, YYYY

      or

      DD&DD MMM, YYYY

      19-20 Jan, 2000

      the nineteenth to the twentieth of january two thousand

      DD specifies the 2-digit day, MMM specifies the 3-letter abbreviation of the month or a full month name, and YYYY specifies the 4-digit year that starts with 1 or 2.

      01 ~ 10 Jul, 2020

      the first to the tenth of july twenty twenty

      05&06 Apr, 2009

      the fifth and the sixth of april two thousand nine

      MMM DD-DD

      or

      MMM DD~DD

      or

      MMM DD&DD

      Feb 01 - 03

      february the first to the third

      MMM specifies the 3-letter abbreviation of a month or a full month name, and DD specifies a 2-digit day.

      Aug 10~20

      august the tenth to the twentieth

      Dec 11&12

      december the eleventh and the twelfth

      MMM-MMM

      or

      MMM~MMM

      or

      MMM&MMM

      Jan-Jun

      january to june

      MMM specifies the 3-letter abbreviation of a month or a full month name.

      jul ~ dec

      july to december

      sep&oct

      september and october

      YYYY-YYYY

      or

      YYYY~YYYY

      1990 - 2000

      nineteen ninety to two thousand

      YYYY specifies the 4-digit year that starts with 1 or 2.

      2001~2021

      two thousand one to twenty twenty one

      WWW DD MMM YYYY

      Sun 20 Nov 2011

      sunday the twentieth of november twenty eleven

      WWW specifies the 3-letter abbreviation of the day of the week or the full name for the day of the week. DD specifies the 2-digit day. MMM specifies the 3-letter abbreviation of the month or a full month name. MM specifies the 2-digit month number, the 3-letter abbreviation of the month, or a full month name. YYYY specifies the 4-digit year that starts with 1 or 2.

      WWW DD MMM

      Sun 20 Nov

      sunday the twentieth of november

      WWW MMM DD YYYY

      Sun Nov 20 2011

      sunday november the twentieth twenty eleven

      WWW MMM DD

      Sun Nov 20

      sunday november the twentieth

      WWW YYYY-MM-DD

      Sat 2010-10-01

      saturday october the first twenty ten

      WWW YYYY/MM/DD

      Sat 2010/10/01

      saturday october the first twenty ten

      WWW MM/DD/YYYY

      Sun 11/20/2011

      sunday november the twentieth twenty eleven

      MM/DD/YYYY

      11/20/2011

      november the twentieth twenty eleven

      YYYY

      1998

      nineteen ninety eight

      Other default readings

      10 Mar, 2001

      the tenth of march two thousand one

      None.

      10 Mar

      the tenth of march

      Mar 2001

      march two thousand one

      Fri. 10/Mar/2001

      friday the tenth of march two thousand one

      Mar 10th, 2001

      march the tenth two thousand one

      Mar 10

      march the tenth

      2001/03/10

      march the tenth two thousand one

      2001-03-10

      march the tenth two thousand one

      2000s

      two thousands

      2010's

      twenty tens

      1900's

      nineteen hundreds

      1990s

      nineteen nineties

    • time

      Format

      Example

      Description

      Time

      12:00

      Common time and time range formats are supported.

      12:00:00

      10:20

      10:20:30

      09:18:14

      Point in time~Point in time

      11:00~12:00

      09:00-14:00

      11:00~11:30

      11:00-12:18

      10:30~11:00

      09:28-10:00

      10:20~11:20

      06:00~08:00

      10:20 a.m.~1:30 p.m.

      Abbreviation of time

      5:00 am

      5:30 am

      5:20:12 am

      7:00 am

      7:30 AM

      7:20:12 a.m.

      07:08:12 A.M.

      5:00 pm

      5:30 PM

      5:20:12 p.m.

      05:09:12 P.M.

      9:00 pm

      9:30 pm

      9:20:12 PM

      9:02:12 P.M.

      12:00 pm

      12:30 p.m.

      12:20:12 PM

      Format

      Example

      English output

      Description

      HH:MM AM or PM

      09:00 AM

      nine A M

      HH specifies 1- or 2-digit hours. MM specifies 2-digit minutes. AM specifies the time before noon. PM specifies the time after noon.

      09:03 PM

      nine oh three P M

      09:13 p.m.

      nine thirteen p m

      HH:MM

      21:00

      twenty one hundred

      HHMM

      100

      one oclock

      Point in time-Point in time

      8:00 am - 05:30 pm

      eight a m to five thirty p m

      Common time and time range formats are supported.

      7:05~10:15 AM

      seven oh five to ten fifteen A M

      09:00-13:00

      nine oclock to thirteen hundred
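The shape of the "HH:MM AM or PM" format described above (1- or 2-digit hours, 2-digit minutes, then an AM or PM marker) can be expressed as a regular expression. This is a rough sketch: the pattern and the name `looks_like_12h_time` are assumptions made for illustration, and the marker spellings it accepts are the ones that appear in the examples in this topic. It checks only the shape of the text, not whether the hour and minute values are in range.

```python
import re

# 1- or 2-digit hours, a colon, 2-digit minutes, optional whitespace,
# then a marker such as "AM", "pm", "a.m.", or "P.M.".
_TIME_12H = re.compile(r"\d{1,2}:\d{2}\s*[AaPp]\.?[Mm]\.?")


def looks_like_12h_time(text: str) -> bool:
    """Return True if text has the HH:MM AM/PM shape (no range checking)."""
    return _TIME_12H.fullmatch(text) is not None


print(looks_like_12h_time("09:03 PM"))
print(looks_like_12h_time("21:00"))
```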

    • currency

      Format

      Example

      Description

      Number + currency code

      12.00 RMB

      The following currency codes are supported: AUD, CAD, HKD, JPY, USD, CHF, NOK, SEK, GBP, RMB, CNY, and EUR.

      Integers, decimals, and international expressions separated by commas (,) are supported.

      12.50 RMB

      12,000,000 RMB

      12,000,000.00 RMB

      12,000.35 RMB

      Currency symbol + number

      $12

      The following currency symbols are supported: Canadian dollar ($), US dollar ($), French franc (Fr), Danish krone (kr), pound sterling (£), Chinese yuan (¥), and euro (€).

      Integers, decimals, and international expressions separated by commas (,) are supported.

      $12.00

      $12.12

      $12,000

      $12,000.00

      $12,000.99

      Other default readings

      1213

      None.

      1213 KML

      1213.00 KML

      1213.9 KML

      1,000 KML

      1,000.00 KML

      1,000.98 KML

      12,000

      Format

      Example

      English output

      Description

      Number + currency code

      1.00 RMB

      one yuan

      Integers, decimals, and international expressions separated by commas (,) are supported.

      Supported currency codes:

      CN¥ (yuan)

      CNY (yuan)

      RMB (yuan)

      AUD (australian dollar)

      CAD (canadian dollar)

      CHF (swiss franc)

      DKK (danish krone)

      EUR (euro)

      GBP (british pound)

      HKD (Hong Kong(China) dollar)

      JPY (japanese yen)

      NOK (norwegian krone)

      SEK (swedish krona)

      SGD (singapore dollar)

      USD (united states dollar)

      2.02 CNY

      two point zero two yuan

      1,000.23 CN¥

      one thousand point two three yuan

      1.01 SGD

      one singapore dollar and one cent

      2.01 CAD

      two canadian dollars and one cent

      3.1 HKD

      three hong kong dollars and ten cents

      1,000.00 EUR

      one thousand euros

      Currency code + number

      US$ 1.00

      one US dollar

      Integers, decimals, and international expressions separated by commas (,) are supported.

      Supported currency codes:

      US$ (US dollar)

      CA$ (Canadian dollar)

      AU$ (Australian dollar)

      SG$ (Singapore dollar)

      HK$ (Hong Kong dollar)

      C$ (Canadian dollar)

      A$ (Australian dollar)

      $ (dollar)

      £ (pound)

      € (euro)

      CN¥ (yuan)

      CNY (yuan)

      RMB (yuan)

      AUD (australian dollar)

      CAD (canadian dollar)

      CHF (swiss franc)

      DKK (danish krone)

      EUR (euro)

      GBP (british pound)

      HKD (Hong Kong(China) dollar)

      JPY (japanese yen)

      NOK (norwegian krone)

      SEK (swedish krona)

      SGD (singapore dollar)

      USD (united states dollar)

      $0.01

      one cent

      JPY 1.01

      one japanese yen and one sen

      £1.1

      one pound and ten pence

      € 2.01

      two euros and one cent

      USD 1,000

      one thousand united states dollars

      Number + numerical unit + currency code

      or

      Currency code + number + numerical unit

      1.23 Tn RMB

      one point two three trillion yuan

      The following numerical units are supported:

      thousand

      million

      billion

      trillion

      Mil (million)

      mil (million)

      Bil (billion)

      bil (billion)

      MM (million)

      Bn (billion)

      bn (billion)

      Tn (trillion)

      tn (trillion)

      K (thousand)

      k (thousand)

      M (million)

      m (million)

      $1.2 K

      one point two thousand dollars
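The numerical unit abbreviations listed above amount to a lookup table. The following Python sketch copies that list into a dictionary to show which abbreviation maps to which word. The names `NUMERICAL_UNITS` and `expand_unit` are hypothetical. The service performs this expansion itself, so the sketch is useful only for checking input text on the client side.

```python
# Abbreviation -> spoken word, copied from the list of supported numerical units.
NUMERICAL_UNITS = {
    "Mil": "million", "mil": "million", "MM": "million",
    "M": "million", "m": "million",
    "Bil": "billion", "bil": "billion", "Bn": "billion", "bn": "billion",
    "Tn": "trillion", "tn": "trillion",
    "K": "thousand", "k": "thousand",
}

# Units that are already written out in full.
FULL_WORDS = {"thousand", "million", "billion", "trillion"}


def expand_unit(unit: str) -> str:
    """Return the spoken word for a supported numerical unit.

    Raises KeyError for a unit that is not in the list.
    """
    if unit in FULL_WORDS:
        return unit
    return NUMERICAL_UNITS[unit]


print(expand_unit("Tn"))
print(expand_unit("K"))
```

The lookup is case-sensitive on purpose, because the list above names both cases explicitly for each abbreviation instead of stating that case is ignored.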

    • measure

      Format

      Example

      Description

      Number + Chinese unit

      2 pieces

      Common Chinese units and unit abbreviations are supported.

      120 hectares

      More than 100 milligrams

      About 100 meters

      More than 100 persons

      1 centimeter and 20 millimeters

      120.00 square kilometers

      Number + unit abbreviation

      120.56 cm²

      One hundred twenty square meters fifty-six square centimeters

      100 m 12 cm 6 mm

      Range

      10~15 kg

      10.24 to 789.82 Mu

      10 meters to 15 meters

      10.24 cm~19.08 cm

      Number + unit + "/" + unit

      CNY 10/kg

      CNY 199 to 299/piece

      CNY 299.99/g to CNY 399.99/g

      Other default readings

      12 bunches

      30 rm

      400,000,000 fellows

      12.897 micrograms

      Format

      Example

      English output

      Description

      Number + measurement unit

      1.0 kg

      one kilogram

      Integers, decimals, and international expressions separated by commas (,) are supported.

      Common unit abbreviations are supported.

      1,234.01 km

      one thousand two hundred thirty four point zero one kilometres.

      Measurement unit

      mm2

      square millimetre

    • The following table describes the common notations that the <say-as> tag supports.

      Notations

      Pronunciation in English

      !

      exclamation mark

      "

      double quote

      #

      pound

      $

      dollar

      %

      percent

      &

      and

      '

      left quote

      (

      left parenthesis

      )

      right parenthesis

      *

      asterisk

      +

      plus

      ,

      comma

      -

      dash

      .

      dot

      /

      slash

      :

      colon

      ;

      semicolon

      <

      less than

      =

      equals

      >

      greater than

      ?

      question mark

      @

      at

      [

      left bracket

      \

      back slash

      ]

      right bracket

      ^

      caret

      _

      underscore

      `

      back quote

      {

      left brace

      |

      vertical bar

      }

      right brace

      ~

      tilde

      exclamation mark

      "

      left double quote

      "

      right double quote

      '

      left quote

      '

      right quote

      (

      left parenthesis

      )

      right parenthesis

      ,

      comma

      .

      full stop

      --

      em dash

      :

      colon

      ;

      semicolon

      ?

      question mark

      ,

      enumeration comma

      ...

      ellipsis

      ...

      ellipsis

      left guillemet

      right guillemet

      ¥

      yuan

      ≥

      greater than or equal to

      ≤

      less than or equal to

      ≠

      not equal

      ≈

      approximately equal

      ±

      plus or minus

      ×

      times

      π

      pi

      Α

      alpha

      Β

      beta

      Γ

      gamma

      Δ

      delta

      Ε

      epsilon

      Ζ

      zeta

      Η

      eta

      Θ

      theta

      Ι

      iota

      Κ

      kappa

      Λ

      lambda

      Μ

      mu

      Ν

      nu

      Ξ

      ksi

      Ο

      omicron

      Π

      pi

      Ρ

      rho

      Σ

      sigma

      Τ

      tau

      Υ

      upsilon

      Φ

      phi

      Χ

      chi

      Ψ

      psi

      Ω

      omega

      α

      alpha

      β

      beta

      γ

      gamma

      δ

      delta

      ε

      epsilon

      ζ

      zeta

      η

      eta

      θ

      theta

      ι

      iota

      κ

      kappa

      λ

      lambda

      μ

      mu

      ν

      nu

      ξ

      ksi

      ο

      omicron

      π

      pi

      ρ

      rho

      σ

      sigma

      τ

      tau

      υ

      upsilon

      φ

      phi

      χ

      chi

      ψ

      psi

      ω

      omega

    • The following table describes the measurement units that the <say-as> tag supports.

      Format

      Type

      English example

      Abbreviation

      Length

      nm (nanometre), μm (micrometre), mm (millimetre), cm (centimetre), m (metre), km (kilometre), ft (foot), and in (inch)

      Area

      cm² (square centimetre), ㎡ (square metre), km2 (square kilometre), and SqFt (square foot)

      Volume

      cm³ (cubic centimetre), m³ (cubic metre), km3 (cubic kilometre), mL (millilitre), L (litre), and gal (gallon)

      Weight

      μg (microgram), mg (milligram), g (gram), and kg (kilogram)

      Time

      min (minute), sec (second), ms (millisecond)

      Electromagnetism

      μA (microamp), mA (milliamp), Hz (hertz), kHz (kilohertz), MHz (megahertz), GHz (gigahertz), V (volt), kV (kilovolt), and kWh (kilowatt hour)

      Sound

      dB (decibel)

      Pressure

      Pa (pascal), kPa (kilopascal), MPa (megapascal)

      Other common units

      The following types of English measurement units are also supported: tsp (teaspoon), rpm (revolutions per minute), KB (kilobyte), and mmHg (millimetre of mercury).

  • Tag relationships

    The <say-as> tag can contain text.

  • Examples

    • cardinal

      <speak><say-as interpret-as="cardinal">12345</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_Cardinal.mp3

      <speak><say-as interpret-as="cardinal">10234</say-as></speak>

      Synthesis result in English: en-SSML-say-as_cardinal.mp3

    • digits

      <speak><say-as interpret-as="digits">12345</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_digit.mp3

      <speak><say-as interpret-as="digits">10234</say-as></speak>

      Synthesis result in English: en-SSML-say-as_digits.mp3

    • telephone

      <speak><say-as interpret-as="telephone">12345</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_Telephone.mp3

      <speak><say-as interpret-as="telephone">10234</say-as></speak>

      Synthesis result in English: en-SSML-say-as_telephone.mp3

    • name

      <speak>Her previous name is <say-as interpret-as="name">Zeng Xiaofan</say-as>.</speak>

      Synthesis result: SSML-say-as_Name.mp3

    • address

      <speak><say-as interpret-as="address">No. 304 Unit 3 Building 1 Fuluguoji</say-as></speak>

      Synthesis result: SSML-say-as_Address.mp3

    • id

      <speak><say-as interpret-as="id">myid_1998</say-as></speak>

      Synthesis result: SSML-say-as_id.mp3

    • characters

      <speak><say-as interpret-as="characters">Greek letter αβ</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_characters.mp3

      <speak><say-as interpret-as="characters">*b+3.c$=α</say-as></speak>

      Synthesis result in English: en-SSML-say-as_characters.mp3

    • punctuation

      <speak><say-as interpret-as="punctuation"> -./:;</say-as></speak>

      Synthesis result: SSML-say-as_punctuation.mp3

    • date

      <speak><say-as interpret-as="date">1000-10-10</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_date.mp3

      <speak><say-as interpret-as="date">10-01-2020</say-as></speak>

      Synthesis result in English: en-SSML-say-as_date.mp3

    • time

      <speak><say-as interpret-as="time">5:00am</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_time.mp3

      <speak><say-as interpret-as="time">0500</say-as></speak>

      Synthesis result in English: en-SSML-say-as_time.mp3

    • currency

      <speak><say-as interpret-as="currency">13,000,000.00RMB</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_currency.mp3

      <speak><say-as interpret-as="currency">$1,000.01</say-as></speak>

      Synthesis result in English: en-SSML-say-as_currency.mp3

    • measure

      <speak><say-as interpret-as="measure">100m12cm6mm</say-as></speak>

      Synthesis result in Chinese: SSML-say-as_measure.mp3

      <speak><say-as interpret-as="measure">1,000.01kg</say-as></speak>

      Synthesis result in English: en-SSML-say-as_measure.mp3

Comprehensive example

<speak>In the Northern Song Dynasty, on <say-as interpret-as="date">October 10, 1121</say-as>, <say-as interpret-as="address">the outskirts of Kaifeng City</say-as> was immersed in the joyful atmosphere of the <sub alias="Double eleven">Double eleven</sub> shopping festival. As a caravan of pack mules entered the city gate, a beautiful woman <phoneme alphabet="py" ph="de5">approached</phoneme> a man named <say-as interpret-as="name">A Fa</say-as>, who was at the front of the team.</speak>
<speak>"Hi there, our store has a special promotion today. All shoes are on sale: spend <say-as interpret-as="digits">199</say-as>, get <say-as interpret-as="cardinal">100</say-as> off. Don't miss out."</speak>
<speak>"Thanks, but we really need to get going. It is <say-as interpret-as="time">09:59:59</say-as>. If we don't deliver these goods on time, the whole supply chain could fail."</speak>
<speak><say-as interpret-as="name">A Fa</say-as> wiped the sweat from his brow as he guided his team through the crowded alleys filled with vendors shouting out to their customers.</speak>
<speak>Get latest colored fabrics here. Buy two and get one free;</speak>
<speak>Best selling hats. We are offering a seven-day unconditional return policy;</speak>
<speak>Treat all types of intractable diseases for both men and women. </speak>
<speak>Suddenly, a horse got scared and started running quickly down the road. A child was also frightened and stumbled into the arms of his mother, <break time="50ms"/>crying:</speak>
<speak>"Mommy, mommy!"</speak>
<speak>At that moment, <say-as interpret-as="name">A Fa</say-as> thought:</speak>
<speak>"I’m so scared!"</speak>
<speak>He quickly covered his <phoneme alphabet="py" ph="he2 bao1">wallet</phoneme> and continued on his way to deliver the goods. Along the way, the bustling scene of <say-as interpret-as="address">Kaifeng City</say-as> left a deep impression on <say-as interpret-as="name">A Fa</say-as>.</speak>
<speak>As time passed and the prosperity of the city faded, he picked up his brush and painted on a long scroll during the shopping festival. The scroll painting is named Along the River During the Qingming Festival. </speak>
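
A text that combines several `<speak>` blocks, like the comprehensive example, can be checked for well-formed XML before it is submitted. The following Python sketch is an illustration only: the function name `check_speak_blocks` is hypothetical, and the check uses `xml.etree.ElementTree` from the Python standard library. It verifies XML well-formedness and counts the top-level `<speak>` blocks. It does not check whether the service supports a given tag or attribute.

```python
import xml.etree.ElementTree as ET


def check_speak_blocks(ssml: str) -> int:
    """Return the number of top-level <speak> blocks in ssml.

    Raises ET.ParseError if the markup is not well-formed, and ValueError
    if a top-level element is not <speak>.
    """
    # A synthetic root element lets the parser accept several sibling
    # <speak> elements in one string.
    root = ET.fromstring(f"<root>{ssml}</root>")
    for child in root:
        if child.tag != "speak":
            raise ValueError(f"unexpected top-level tag: {child.tag}")
    return len(root)


sample = (
    '<speak>"Mommy, mommy!"</speak>\n'
    '<speak>It is <say-as interpret-as="time">09:59:59</say-as>.</speak>'
)
print(check_speak_blocks(sample))
```

An unescaped ampersand or an unclosed tag raises `ET.ParseError`, which is a quick way to catch the escaping mistakes described in the usage notes.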