
Intelligent Media Services: Produce a video after intelligent data processing

Last Updated: Oct 23, 2025

This topic describes how to configure the Timeline parameter of the SubmitMediaProducingJob operation to produce a video with intelligent processing, such as speech recognition, speech synthesis, and matting.

Usage notes

  • Intelligent production supports editing and compositing, effect rendering, and templates. It can process live streams, video-on-demand (VOD) files, and material files stored in Object Storage Service (OSS). For more information, see Intelligent production overview.

  • You can produce a video from one or more videos, audio files, images, and subtitle materials by configuring Timeline parameters and calling the SubmitMediaProducingJob operation.

  • A timeline is created when you add materials and configure effects to create a video. A timeline consists of tracks, materials, and effects. For more information, see Timeline configurations.

  • For more information about how to use the IMS SDK to edit audio and video files, see Preparations.
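The Timeline parameter is a JSON string whose nesting mirrors tracks, clips, and effects, so it can be convenient to build it as a Python dictionary and serialize it. The following is a minimal sketch that uses only the standard library. The helper name build_timeline is hypothetical, and the sketch places each list of clips on a single track. Submitting the resulting string through the IMS SDK is covered in Preparations and is not shown here.

```python
import json


def build_timeline(video_clips=None, audio_clips=None, subtitle_clips=None):
    """Assemble a Timeline dict from lists of clip dicts (hypothetical helper).

    Each non-empty list becomes one track of the matching type.
    """
    timeline = {}
    if video_clips:
        timeline["VideoTracks"] = [{"VideoTrackClips": list(video_clips)}]
    if audio_clips:
        timeline["AudioTracks"] = [{"AudioTrackClips": list(audio_clips)}]
    if subtitle_clips:
        timeline["SubtitleTracks"] = [{"SubtitleTrackClips": list(subtitle_clips)}]
    return timeline


if __name__ == "__main__":
    clip = {"MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h5.mp4"}
    timeline = build_timeline(video_clips=[clip])
    # This string is the value you would pass as the Timeline parameter.
    print(json.dumps(timeline, ensure_ascii=False))
```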

Use AI_ASR to convert speech to text and merge the captions into a video

Add an effect whose "Type" is "AI_ASR" to a clip to convert the speech in its audio or video to text and render the text as captions. You can also set caption styles, such as font, font size, color, and outline.

Note

The speech-to-text service is available only in the China (Shanghai), China (Beijing), China (Hangzhou), and China (Shenzhen) regions.

Effect

Timeline example

{
  "VideoTracks": [{
    "VideoTrackClips": [{
      "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h5.mp4",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 910,
        "Outline": 10,
        "OutlineColour": "#ffffff",
        "FontSize": 60,
        "FontColor": "#000079",
        "FontFace": {
          "Bold": true,
          "Italic": false,
          "Underline": false
        }
      }]
    }]
  }]
}
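To reuse one caption style across several clips, you can generate the AI_ASR effect with a small function. This sketch reproduces only the fields from the example above. The function name asr_caption_effect and its defaults are hypothetical, and it does not cover other caption fields described in Timeline configurations.

```python
import json


def asr_caption_effect(font="AlibabaPuHuiTi", font_size=60, font_color="#000079",
                       alignment="TopCenter", y=910, outline=None,
                       outline_colour=None, bold=True):
    """Return an AI_ASR effect dict using the caption fields from the example."""
    effect = {
        "Type": "AI_ASR",
        "Font": font,
        "Alignment": alignment,
        "Y": y,
        "FontSize": font_size,
        "FontColor": font_color,
        "FontFace": {"Bold": bold, "Italic": False, "Underline": False},
    }
    # Outline fields are added only when an outline width is given.
    if outline is not None:
        effect["Outline"] = outline
        effect["OutlineColour"] = outline_colour or "#ffffff"
    return effect


if __name__ == "__main__":
    clip = {
        "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h5.mp4",
        "Effects": [asr_caption_effect(outline=10, outline_colour="#ffffff")],
    }
    print(json.dumps({"VideoTracks": [{"VideoTrackClips": [clip]}]}, indent=2))
```

The parameter is spelled OutlineColour, as in the example, so the generated keys match the Timeline fields exactly.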

Use AI_TTS to convert text to speech and merge the speech into a video

Add an audio track clip whose "Type" is "AI_TTS" to convert text to speech. The Content parameter specifies the text to convert. You can also set speech properties, such as Voice, SpeechRate, PitchRate, and Format. To generate captions from the synthesized speech, add an AI_ASR effect to the AI_TTS clip.

Note
  • The text-to-speech and speech-to-text services are available only in the China (Shanghai), China (Beijing), and China (Hangzhou) regions.

  • By default, AI_TTS splits the text into sentences at Chinese punctuation marks, such as commas and periods. You can control the caption style and line break mode for each sentence segment.
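Before you submit a job, you may want to check roughly how a long Content string will be segmented. The sketch below approximates the default splitting locally. It is an assumption and does not reproduce the service's actual algorithm. It splits after the full-width comma (，), period (。), exclamation mark (！), question mark (？), and semicolon (；). The function name preview_segments is hypothetical.

```python
import re

# Assumed boundary marks: full-width comma, period, exclamation mark,
# question mark, and semicolon. The service's real rules may differ.
_BOUNDARY = re.compile(r"(?<=[，。！？；])")


def preview_segments(content):
    """Split content after each assumed boundary mark and drop empty pieces."""
    return [seg.strip() for seg in _BOUNDARY.split(content) if seg.strip()]


if __name__ == "__main__":
    for segment in preview_segments("今天天气很好，我们去公园。"):
        print(segment)
```

ASCII punctuation is not treated as a boundary here, because the note above refers only to Chinese punctuation marks.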

Effect

Timeline example

{
  "VideoTracks": [{
    "VideoTrackClips": [{
      "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h3.mp4",
      "Effects": [{
        "Type":"Volume",
        "Gain":0
      }]
    }]
  }],
  "AudioTracks": [{
    "AudioTrackClips": [{
      "Type": "AI_TTS",
      "Content": "Do you not see the Yellow River come from the sky, rushing into the sea and never come back? Do you not see the mirrors bright in chambers high, grieve over your snow-white hair though once it was silk-black?",
      "Voice": "sicheng",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 900,
        "FontSize": 80,
        "FontColor": "#ffffff",
        "FontFace": {
          "Bold": true,
          "Italic": false,
          "Underline": false
        }
      },{
        "Type":"Volume",
        "Gain":2
      }]
    }]
  }]
}
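The example above follows a common pattern. It sets Volume Gain to 0 on the video clip to silence the original audio, adds an AI_TTS clip on an audio track, and attaches an AI_ASR effect and a Volume effect to that clip. The following sketch captures this pattern in one hypothetical function, tts_over_video. Extra keyword arguments such as SpeechRate or PitchRate are copied into the AI_TTS clip unchanged, and the sketch does not validate them. See Timeline configurations for valid values.

```python
import json


def tts_over_video(video_url, text, voice, caption_effect=None, tts_gain=2,
                   **speech_options):
    """Build a timeline that mutes a video and narrates it with AI_TTS."""
    video_clip = {
        "MediaURL": video_url,
        # Gain 0 on the video clip silences its original audio, as in the example.
        "Effects": [{"Type": "Volume", "Gain": 0}],
    }
    tts_effects = [] if caption_effect is None else [caption_effect]
    # Volume effect for the synthesized speech.
    tts_effects.append({"Type": "Volume", "Gain": tts_gain})
    tts_clip = {"Type": "AI_TTS", "Content": text, "Voice": voice}
    # Optional speech properties, for example SpeechRate or PitchRate.
    tts_clip.update(speech_options)
    tts_clip["Effects"] = tts_effects
    return {
        "VideoTracks": [{"VideoTrackClips": [video_clip]}],
        "AudioTracks": [{"AudioTrackClips": [tts_clip]}],
    }


if __name__ == "__main__":
    captions = {"Type": "AI_ASR", "Font": "AlibabaPuHuiTi", "Alignment": "TopCenter",
                "Y": 900, "FontSize": 80, "FontColor": "#ffffff"}
    timeline = tts_over_video(
        "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h3.mp4",
        "Text to narrate.", "sicheng", caption_effect=captions)
    print(json.dumps(timeline, ensure_ascii=False, indent=2))
```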

Use AI_TTS to convert text to speech and control the rhythm or pronunciation with SSML

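The Content field of an AI_TTS clip supports Speech Synthesis Markup Language (SSML). You can use SSML to correct the pronunciation of technical terms, insert pauses, and add emotional sound effects. In the following example, the first AI_TTS clip reads plain text, and the second wraps "PU" in <sub alias="Pee You"> tags so that the term is pronounced correctly. A subtitle track labels each version on screen.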
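If a term appears many times, you can generate the SSML string instead of tagging it by hand. The sketch below handles only the <sub alias> tag used in the example. It wraps every whole-word occurrence of a term and XML-escapes the rest of the text so that characters such as & or < do not break the markup. The function name ssml_with_alias is hypothetical.

```python
import json
import re
from xml.sax.saxutils import escape, quoteattr


def ssml_with_alias(text, term, alias):
    """Return <speak>...</speak> with each whole-word `term` read as `alias`."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches.
        parts.append(escape(text[last:match.start()]))
        parts.append("<sub alias=%s>%s</sub>" % (quoteattr(alias), escape(term)))
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    content = ssml_with_alias("PU line is made from rigid PU foam.", "PU", "Pee You")
    # json.dumps escapes the inner double quotes as \", as in the Timeline example.
    print(json.dumps({"Type": "AI_TTS", "Voice": "zhichu", "Content": content}))
```

Unlike the example, which leaves "PU stands for" untagged, this helper tags every whole-word occurrence. Words such as "PUMP" are not affected, because the match requires word boundaries.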

Effect

Timeline example

{
  "VideoTracks": [{
    "VideoTrackClips": [{
      "Type": "Image",
      "MediaURL": "https://your-bucket***.oss-cn-shanghai.aliyuncs.com/your-image1.jpg",
      "Duration": 3,
      "Effects": [{
          "Radius": 0.1,
          "Type": "Background",
          "SubType": "Blur"
        },
        {
          "Type": "Transition",
          "SubType": "windowslice",
          "Duration": 0.3
        }
      ]
    }, {
      "Type": "Image",
      "MediaURL": "https://your-bucket***.oss-cn-shanghai.aliyuncs.com/your-image2.jpg",
      "Duration": 3,
      "Effects": [{
          "Radius": 0.1,
          "Type": "Background",
          "SubType": "Blur"
        },
        {
          "Type": "Transition",
          "SubType": "windowslice",
          "Duration": 0.3
        }
      ]
    }, {
      "Type": "Image",
      "MediaURL": "https://your-bucket***.oss-cn-shanghai.aliyuncs.com/your-image3.jpg",
      "Duration": 3,
      "Effects": [{
          "Radius": 0.1,
          "Type": "Background",
          "SubType": "Blur"
        },
        {
          "Type": "Transition",
          "SubType": "windowslice",
          "Duration": 0.3
        }
      ]
    }]
  }],
  "AudioTracks": [{
    "MainTrack": true,
    "AudioTrackClips": [{
      "Type": "AI_TTS",
      "Voice": "zhichu",
      "Content": "PU line, short for Polyurethane line, is a molding made from synthetic PU materials. PU stands for polyurethane, and the molding is made from rigid PU foam.",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 1000,
        "FontSize": 50,
        "FontColor": "#ffffff",
        "AdaptMode": "AutoWrap",
        "Outline": 1,
        "OutlineColour": "#0e0100",
        "FontFace": {
          "Bold": true,
          "Italic": false,
          "Underline": false
        }
      }, {
        "Type": "Volume",
        "Gain": 1
      }]
    }, {
      "Type": "AI_TTS",
      "Voice": "zhichu",
      "Content": "<speak><sub alias=\"Pee You\">PU</sub> line, short for Polyurethane line, is a molding made from synthetic <sub alias=\"Pee You\">PU</sub> materials. PU stands for polyurethane, and the molding is made from rigid <sub alias=\"Pee You\">PU</sub> foam.</speak>",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 1000,
        "FontSize": 50,
        "FontColor": "#ffffff",
        "AdaptMode": "AutoWrap",
        "Outline": 1,
        "OutlineColour": "#0e0100",
        "FontFace": {
          "Bold": true,
          "Italic": false,
          "Underline": false
        }
      }, {
        "Type": "Volume",
        "Gain": 1
      }]
    }]
  }],
  "SubtitleTracks": [{
    "SubtitleTrackClips": [{
      "Type": "Text",
      "X": 0,
      "Y": 200,
      "Font": "AlibabaPuHuiTi",
      "Content": "Standard AI_TTS: The pronunciation of the technical term \"PU\" is inaccurate.",
      "Alignment": "TopCenter",
      "FontSize": 70,
      "FontColorOpacity": 1,
      "FontColor": "#990000",
      "AaiMotionLoopEffect1": "slingshot_in",
      "Outline": 1,
      "OutlineColour": "#ffffff",
      "TimelineIn": 0,
      "TimelineOut": 13,
      "AdaptMode": "AutoWrap",
      "FontFace": {
        "Bold": true,
        "Italic": false,
        "Underline": false
      }
    }, {
      "Type": "Text",
      "X": 0,
      "Y": 200,
      "Font": "AlibabaPuHuiTi",
      "Content": "AI_TTS with SSML tags: The pronunciation of the technical term \"PU\" is corrected.",
      "Alignment": "TopCenter",
      "FontSize": 70,
      "FontColorOpacity": 1,
      "FontColor": "#006633",
      "Outline": 1,
      "OutlineColour": "#ffffff",
      "TimelineIn": 13,
      "AdaptMode": "AutoWrap",
      "FontFace": {
        "Bold": true,
        "Italic": false,
        "Underline": false
      }
    }]
  }]
}
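In the example above, the three image clips on the video track differ only in MediaURL. Each one has the same blurred Background effect and the same windowslice transition. The sketch below generates such clips in a loop. The function name image_slideshow_clips is hypothetical, and the your-bucket*** URLs are the placeholders from the example and must be replaced with your own OSS objects.

```python
import json


def image_slideshow_clips(image_urls, duration=3, blur_radius=0.1,
                          transition="windowslice", transition_duration=0.3):
    """Return one Image clip per URL with a blurred background and a transition."""
    clips = []
    for url in image_urls:
        clips.append({
            "Type": "Image",
            "MediaURL": url,
            "Duration": duration,
            "Effects": [
                {"Radius": blur_radius, "Type": "Background", "SubType": "Blur"},
                {"Type": "Transition", "SubType": transition,
                 "Duration": transition_duration},
            ],
        })
    return clips


if __name__ == "__main__":
    urls = ["https://your-bucket***.oss-cn-shanghai.aliyuncs.com/your-image%d.jpg" % i
            for i in (1, 2, 3)]
    video_tracks = [{"VideoTrackClips": image_slideshow_clips(urls)}]
    print(json.dumps({"VideoTracks": video_tracks}, indent=2))
```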

AI_Matting: Green screen matting

Set "Type" to "AI_Matting" to extract a subject from a green screen background and superimpose it onto a specified background video or image.

Note

The green screen matting service is available only in the China (Hangzhou), China (Shanghai), and China (Beijing) regions.

Effect

Timeline example

{
  "VideoTracks": [{
    "VideoTrackClips": [{
      "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/background_v2.jpg",
      "Type": "GlobalImage",
      "Width": 1,
      "Height": 1,
      "AdaptMode": "Cover"
    }]
  }, {
    "VideoTrackClips": [{
      "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/green-matting-1.mp4",
      "Effects": [{
        "Type": "AI_Matting",
        "Color": "green",
        "Auto": 1,
        "Thres": 10
      }]
    }]
  }]
}
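The green screen example uses two video tracks. The first holds a full-frame GlobalImage background, and the second holds the green screen video with the AI_Matting effect. The following sketch rebuilds that layout with the same track order and parameter values. The function name green_screen_timeline is hypothetical, and only Thres is exposed as an argument.

```python
import json


def green_screen_timeline(background_url, foreground_url, thres=10):
    """Build the two-track AI_Matting layout from the example."""
    # Track 1: background image stretched over the whole frame.
    background = {
        "MediaURL": background_url,
        "Type": "GlobalImage",
        "Width": 1,
        "Height": 1,
        "AdaptMode": "Cover",
    }
    # Track 2: green screen video with the matting effect applied.
    foreground = {
        "MediaURL": foreground_url,
        "Effects": [{"Type": "AI_Matting", "Color": "green", "Auto": 1,
                     "Thres": thres}],
    }
    return {"VideoTracks": [{"VideoTrackClips": [background]},
                            {"VideoTrackClips": [foreground]}]}


if __name__ == "__main__":
    base = "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/"
    timeline = green_screen_timeline(base + "background_v2.jpg", base + "green-matting-1.mp4")
    print(json.dumps(timeline, indent=2))
```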

AI_RealMatting: Background replacement

Set "Type" to "AI_RealMatting" to extract a person from any real-world background and superimpose them onto a specified background video or image.

Note

The background replacement service is available only in the China (Hangzhou), China (Shanghai), and China (Beijing) regions.

Effect

Timeline example

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/image/03.jpg",
          "Type": "GlobalImage",
          "Width": 0.5,
          "Height": 1,
          "X": 0.5,
          "Y": 0,
          "AdaptMode": "Cover"
        }
      ]
    },
    {
      "VideoTrackClips": [
        {
          "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h6.mov",
          "In": 0,
          "Out": 10,
          "Width": 0.5,
          "Height": 1,
          "AdaptMode": "Cover",
          "X": 0.5,
          "Effects": [
            {
              "Type": "AI_RealMatting",
              "Thres": 8
            },
            {
              "Type": "Crop",
              "X": 0.25,
              "Height": 1,
              "Width": 0.5
            },
            {
              "Type": "Text"
            }
          ]
        }
      ]
    },
    {
      "VideoTrackClips": [
        {
          "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h6.mov",
          "In": 0,
          "Out": 10,
          "Width": 0.5,
          "Height": 1,
          "AdaptMode": "Cover",
          "Effects": [
            {
              "Type": "Crop",
              "X": 0.25,
              "Height": 1,
              "Width": 0.5
            },
            {
              "Type": "Volume",
              "Gain": 0
            }
          ]
        }
      ]
    }
  ]
}
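The background replacement example builds a split screen from three video tracks. The first is a background image at X 0.5, the second is the video with AI_RealMatting also at X 0.5, and the third is the same video with no X set and Volume Gain 0. The sketch below rebuilds that layout. The function name side_by_side_matting is hypothetical. It copies the Crop values unchanged and assumes they are fractions of the source frame, so that X 0.25 with Width 0.5 keeps the center half. It also assumes that the third clip's Gain of 0 prevents the same audio from playing twice. The sketch omits the parameterless Text effect that appears in the example.

```python
import json


def side_by_side_matting(background_url, video_url, out=10, thres=8):
    """Rebuild the three-track split screen from the AI_RealMatting example."""
    # Crop values copied from the example; assumed to be fractions of the frame.
    crop = {"Type": "Crop", "X": 0.25, "Height": 1, "Width": 0.5}
    # Track 1: background image in the right half (X 0.5, Width 0.5).
    background = {"MediaURL": background_url, "Type": "GlobalImage",
                  "Width": 0.5, "Height": 1, "X": 0.5, "Y": 0, "AdaptMode": "Cover"}
    # Track 2: the person extracted with AI_RealMatting, also in the right half.
    matted = {"MediaURL": video_url, "In": 0, "Out": out, "Width": 0.5,
              "Height": 1, "AdaptMode": "Cover", "X": 0.5,
              "Effects": [{"Type": "AI_RealMatting", "Thres": thres}, dict(crop)]}
    # Track 3: the unmodified video with no X set; Gain 0 mutes this copy.
    original = {"MediaURL": video_url, "In": 0, "Out": out, "Width": 0.5,
                "Height": 1, "AdaptMode": "Cover",
                "Effects": [dict(crop), {"Type": "Volume", "Gain": 0}]}
    return {"VideoTracks": [{"VideoTrackClips": [clip]}
                            for clip in (background, matted, original)]}


if __name__ == "__main__":
    base = "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/"
    print(json.dumps(side_by_side_matting(base + "image/03.jpg", base + "h6.mov"), indent=2))
```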

Automatically highlight key content in captions with AI_ASR

In the AI_ASR effect, set "NeedHighlighting" to true and configure the highlight style in "HighlightingStyle". Key content in the recognized captions is then highlighted automatically.
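Because highlighting is an addition to an existing AI_ASR effect, you can write a small function that returns a highlighted copy of any caption effect. The function name with_highlighting and its defaults are hypothetical. The defaults copy the colors from the example below, which writes highlight colors without a leading "#". The function rejects dicts whose Type is not AI_ASR.

```python
import copy
import json


def with_highlighting(asr_effect, font_color="F6DD14", outline_colour="873600",
                      outline=4):
    """Return a copy of an AI_ASR effect with automatic highlighting enabled."""
    if asr_effect.get("Type") != "AI_ASR":
        raise ValueError("expected an AI_ASR effect")
    # Deep copy so that the caller's effect dict is left unchanged.
    effect = copy.deepcopy(asr_effect)
    effect["NeedHighlighting"] = True
    effect["HighlightingStyle"] = {
        "FontColor": font_color,
        "OutlineColour": outline_colour,
        "Outline": outline,
    }
    return effect


if __name__ == "__main__":
    base = {"Type": "AI_ASR", "Font": "AlibabaPuHuiTi", "Alignment": "TopCenter",
            "Y": 820, "FontSize": 60, "FontColor": "#FFFFFF"}
    print(json.dumps(with_highlighting(base), indent=2))
```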

Effect

Timeline example

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h4.mp4",
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "AlibabaPuHuiTi",
              "Alignment": "TopCenter",
              "Y": 820,
              "FontSize": 60,
              "FontColor": "#FFFFFF",
              "FontFace": {
                "Bold": true,
                "Italic": false,
                "Underline": false
              },
              "NeedHighlighting": true,
              "HighlightingStyle": {
                "FontColor": "F6DD14",
                "OutlineColour": "873600",
                "Outline": 4
              },
              "SubtitleEffects": [
                {
                  "Type": "Box",
                  "Color": "000000",
                  "Opacity": "0.9",
                  "XBord": 30,
                  "YBord": 20
                }
              ]
            }
          ]
        },
        {
          "MediaURL": "https://ice-document-materials.oss-cn-shanghai.aliyuncs.com/test_media/h1.png",
          "Type": "Image",
          "Duration": 12.31,
          "ClipId": "image",
          "Effects": [
            {
              "Type": "Volume",
              "Gain": 0
            }
          ]
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "Type": "AI_TTS",
          "Content": "Alibaba Cloud Intelligent Media Services (IMS) is a one-stop service for live streaming and video-on-demand scenarios. It provides capabilities for media ingestion, asset management, content production, and distribution.",
          "ReferenceClipId": "image",
          "Voice": "sicheng",
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "AlibabaPuHuiTi",
              "Alignment": "TopCenter",
              "Y": 820,
              "FontSize": 80,
              "FontColor": "#ffffff",
              "FontFace": {
                "Bold": true,
                "Italic": false,
                "Underline": false
              },
              "TextWidth": "0.8",
              "AdaptMode": "AutoWrap",
              "NeedHighlighting": true,
              "HighlightingStyle": {
                "FontColor": "F6DD14",
                "OutlineColour": "873600",
                "Outline": 4
              }
            },
            {
              "Type": "Volume",
              "Gain": 2
            }
          ]
        }
      ]
    }
  ]
}

References