Intelligent Media Services: Automatically align materials and material lengths

Last Updated: Dec 03, 2024

This topic describes how to configure inter-track material alignment in the timeline.

I. Background information

On a regular timeline, if you want the audio and video materials of multiple tracks to start and end at the same time, you must specify the TimelineIn and TimelineOut parameters for every material and make sure that the materials have the same length. Inter-track material alignment is more convenient: you do not need to specify the start time or end time of each material on the timeline. You only need to configure the alignment parameters described in the following section. You can align audios with videos, audios with audios, videos with audios, and videos with videos, provided that the materials are in different tracks.

II. Detailed introduction

2.1. Introduction to the timeline protocol

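You can use the ClipId and ReferenceClipId parameters to specify the alignment relationship between materials. Set ClipId on the material that serves as the reference, and set ReferenceClipId to the same value on the material that you want to align with it.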

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "In": 0,
          "Out": 5,
          "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/head.mp4"
        },
        {
          "ReferenceClipId": "audio_1",
          "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/video1.mp4"
        },
        {
          "MediaURL": "https://your-bucket.oss-cn-shanghai.aliyuncs.com/end.mp4",
          "In": 0,
          "Out": 5
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "TimelineIn": 5,
          "ClipId": "audio_1",
          "MediaId": "7980d8f************e6f7e5696301",
          "In": 0,
          "Out": 10
        }
      ]
    }
  ]
}

In the preceding example, the second material of the video track references audio_1. Its length, start time, and end time on the timeline are therefore automatically set based on the first material of the audio track.
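
The resolution described above can be sketched in Python. This is an illustrative model, not the service's implementation: the helper names find_clip and resolve_reference are hypothetical, and the sketch assumes that the referenced clip sets TimelineIn, In, and Out explicitly and plays at normal speed, so that its length on the timeline is Out minus In.

```python
def find_clip(timeline, clip_id):
    """Return the clip whose ClipId matches, searching every audio and video track."""
    for tracks_key, clips_key in (("VideoTracks", "VideoTrackClips"),
                                  ("AudioTracks", "AudioTrackClips")):
        for track in timeline.get(tracks_key, []):
            for clip in track.get(clips_key, []):
                if clip.get("ClipId") == clip_id:
                    return clip
    raise KeyError(f"no clip with ClipId {clip_id!r}")


def resolve_reference(timeline, clip):
    """Return the (timeline_in, timeline_out) span that the aligned clip inherits.

    Assumption: the referenced clip sets TimelineIn, In and Out and plays at 1x.
    """
    ref = find_clip(timeline, clip["ReferenceClipId"])
    start = ref["TimelineIn"]
    return start, start + (ref["Out"] - ref["In"])


# Same structure as the example timeline, with media sources omitted.
timeline = {
    "VideoTracks": [{"VideoTrackClips": [
        {"In": 0, "Out": 5},
        {"ReferenceClipId": "audio_1"},
        {"In": 0, "Out": 5},
    ]}],
    "AudioTracks": [{"AudioTrackClips": [
        {"TimelineIn": 5, "ClipId": "audio_1", "In": 0, "Out": 10},
    ]}],
}

aligned = timeline["VideoTracks"][0]["VideoTrackClips"][1]
print(resolve_reference(timeline, aligned))  # (5, 15)
```

Under these assumptions, the second video material occupies seconds 5 to 15 of the timeline, directly after the 5-second opening material.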

Limits:

  1. The ClipId and ReferenceClipId parameters are supported only in audio tracks and video tracks. They are not supported in effect tracks or image tracks.

  2. A material can be aligned only with a material in a different track. If the two materials are in the same track, the timeline is invalid and video production fails.

  3. If the TimelineIn, TimelineOut, and ReferenceClipId parameters are all configured for a clip, material alignment does not take effect and the TimelineIn and TimelineOut parameters prevail.

  4. If a clip is shorter than the clip that it is aligned with, its playback speed is decreased to increase its length. For example, if clip A, which is 10 seconds long, is aligned with clip B, which is 20 seconds long, clip A plays at 0.5x speed and therefore lasts 20 seconds.

  5. If a clip is longer than the clip that it is aligned with, it is automatically truncated. For example, if clip A, which is 20 seconds long, is aligned with clip B, which is 10 seconds long, only the first 10 seconds of clip A are retained.
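
The arithmetic behind limits 4 and 5 can be sketched as a small Python function. The function name fit_to_reference is hypothetical and exists only for illustration. The sketch covers only what the two limits state and makes no claim about how the service implements speed changes internally.

```python
def fit_to_reference(clip_length, reference_length):
    """Return (speed, source_seconds_used) for a clip aligned with a reference clip.

    Mirrors limits 4 and 5: a shorter clip is slowed down, a longer clip is truncated.
    """
    if clip_length <= 0 or reference_length <= 0:
        raise ValueError("lengths must be positive")
    if clip_length < reference_length:
        # Limit 4: slow the clip down so that it fills the reference length.
        return clip_length / reference_length, clip_length
    # Limit 5: play at normal speed and keep only the first reference_length seconds.
    return 1.0, reference_length


print(fit_to_reference(10, 20))  # (0.5, 10)
print(fit_to_reference(20, 10))  # (1.0, 10)
```

The first call reproduces the example in limit 4 (0.5x speed), and the second reproduces the example in limit 5 (the first 10 seconds are retained).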

2.2. Common scenarios

This section describes several common scenarios that require inter-track material alignment.

2.2.1. Simple audio and video alignment

Align audios with videos

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 0,
          "Out": 4
        },
        {
          "ClipId":"video_1",
          "MediaId": "e6f7e57980************d8f696301",
          "In": 2,
          "Out": 10
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "ReferenceClipId": "video_1",
          "MediaId": "7980d8f************e6f7e5696301",
          "Effects": [
            {
              "Type": "Volume",
              "Gain": "0.2"
            }
          ]
        }
      ]
    }
  ]
}

Align videos with audios

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 0,
          "Out": 5
        },
        {
          "ReferenceClipId":"audio_1",
          "MediaId": "e6f7e57980************d8f696301"
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "TimelineIn": 5,
          "ClipId": "audio_1",
          "MediaId": "7980d8f************e6f7e5696301"
        }
      ]
    }
  ]
}

2.2.2. Align videos with audios: The video track contains transitions, the audio track contains multiple speeches, and each video plays for the length of its speech

The following timeline is used as an example:

  1. The audio track contains three materials, which are speeches generated by AI_TTS.

  2. The video track contains five materials. A 2-second transition exists between the second and third videos, and between the third and fourth videos.

  3. The second, third, and fourth video materials of the video track are aligned with the three speeches on the audio track. The speeches start and end at the middle points of the transitions.

{
  "VideoTracks": [{
    "VideoTrackClips": [{
      "Out": 5,
      "MediaId": "e6f7e57980************d8f696301"
    },{
      "ReferenceClipId":"speech_1",
      "MediaId": "e6f7e57980************d8f696301",
      "Effects": [{
        "Type": "Transition",
        "SubType": "waterdrop",
        "Duration": 2
      }]
    }, {
      "ReferenceClipId":"speech_2",
      "MediaId": "e6f7e57980************d8f696301",
      "Effects": [{
        "Type": "Transition",
        "SubType": "waterdrop",
        "Duration": 2
      }]
    }, {
      "ReferenceClipId":"speech_3",
      "MediaId": "e6f7e57980************d8f696301"
    }, {
      "Out": 10,
      "MediaId": "e6f7e57980************d8f696301"
    }]
  }],
  "AudioTracks": [{
    "AudioTrackClips": [{
      "TimelineIn":5,
      "Type": "AI_TTS",
      "Content": "Speech 1 Speech 1 Speech 1. Speech 1 Speech 1 Speech 1 Speech 1. Speech 1 Speech 1 Speech 1. Speech 1 Speech 1 Speech 1. Speech 1 Speech 1. Speech 1. Speech 1 Speech 1 Speech 1 Speech 1.",
      "Voice": "sicheng",
      "ClipId":"speech_1",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 90,
        "FontSize": 56,
        "FontColor": "#ffffff"
      }]
    }, {
      "Type": "AI_TTS",
      "Content": "Speech 2 Speech 2 Speech 2 Speech 2 Speech 2. Speech 2 Speech 2 Speech 2 Speech 2. Speech 2 Speech 2 Speech 2 Speech 2 Speech 2 Speech 2 Speech 2. Speech 2 Speech 2 Speech 2 Speech 2.",
      "Voice": "sicheng",
      "ClipId":"speech_2",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 90,
        "FontSize": 56,
        "FontColor": "#ffffff"
      }]
    }, {
      "Type": "AI_TTS",
      "Content": "Speech 3 Speech 3 Speech 3 Speech 3 Speech 3. Speech 3 Speech 3 Speech 3. Speech 3 Speech 3 Speech 3 Speech 3 Speech 3. Speech 3 Speech 3 Speech 3 Speech 3 Speech 3. Speech 3 Speech 3.",
      "Voice": "sicheng",
      "ClipId":"speech_3",
      "Effects": [{
        "Type": "AI_ASR",
        "Font": "AlibabaPuHuiTi",
        "Alignment": "TopCenter",
        "Y": 90,
        "FontSize": 56,
        "FontColor": "#ffffff"
      }]
    }]
  }]
}
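
The statement that the speeches start and end at the middle points of the transitions can be turned into a quick calculation. The following Python sketch is only an illustration: the function name transition_windows is hypothetical, the speech lengths are made-up values because the real lengths depend on the synthesized audio, and the sketch assumes that the speeches play back to back starting at TimelineIn 5.

```python
def transition_windows(first_start, speech_lengths, transition_duration):
    """Return the (start, end) window of each transition between aligned videos.

    Each transition is centred on the boundary between two consecutive speeches.
    """
    windows = []
    boundary = first_start
    half = transition_duration / 2
    # The last speech is not followed by a transition, so it is skipped.
    for length in speech_lengths[:-1]:
        boundary += length
        windows.append((boundary - half, boundary + half))
    return windows


# Hypothetical speech lengths in seconds (assumption, not produced by the service).
print(transition_windows(5, [12, 10, 11], 2))  # [(16.0, 18.0), (26.0, 28.0)]
```

With these sample lengths, the first speech ends at second 17, which is the middle of a transition that runs from second 16 to second 18.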

2.2.3. Align audios with videos: The audio is a speech and is truncated based on the video length

The following timeline is used as an example:

  1. The video track contains three materials, and the length of the second video is 8 seconds.

  2. The audio track contains a speech generated by AI_TTS. The original speech is longer than 8 seconds.

  3. The audio material is aligned with the second video material. The audio plays for only 8 seconds, and the excess length is automatically truncated.

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 0,
          "Out": 5
        },
        {
          "ClipId":"video_1",
          "MediaId": "e6f7e57980************d8f696301",
          "In": 10,
          "Out": 18
        },
        {
          "MediaId": "e6f7e57980************d8f696301",
          "In": 3,
          "Out": 10
        }
      ]
    }
  ],
  "AudioTracks": [
    {
      "AudioTrackClips": [
        {
          "ReferenceClipId": "video_1",
          "Type": "AI_TTS",
          "Content": "Hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone, hello everyone.",
          "Voice": "Siqi",
          "SpeechRate": 0,
          "PitchRate": 0,
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "WenQuanYi Zen Hei Mono",
              "FontSize": 26,
              "FontColorOpacity": 1,
              "FontColor": "#000000",
              "FontFace": {
                "Bold": true,
                "Italic": true,
                "Underline": false
              }
            }
          ]
        }
      ]
    }
  ]
}
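
To see where the truncated speech lands on the timeline, the video track in the preceding example can be laid out by hand. The following Python sketch is illustrative only: layout_track is a hypothetical helper, and it assumes that clips without TimelineIn are placed back to back in track order and play at normal speed.

```python
def layout_track(clips):
    """Place clips back to back and return their (timeline_in, timeline_out) spans.

    Assumption: every clip sets In and Out, has no TimelineIn, and plays at 1x.
    """
    spans = []
    cursor = 0
    for clip in clips:
        length = clip["Out"] - clip["In"]
        spans.append((cursor, cursor + length))
        cursor += length
    return spans


# In and Out values copied from the video track of the example timeline.
video_clips = [
    {"In": 0, "Out": 5},
    {"ClipId": "video_1", "In": 10, "Out": 18},
    {"In": 3, "Out": 10},
]
spans = layout_track(video_clips)
print(spans)  # [(0, 5), (5, 13), (13, 20)]

# The speech references video_1, so it inherits that span and is cut to its length.
speech_in, speech_out = spans[1]
print(speech_out - speech_in)  # 8
```

Under these assumptions, the speech plays from second 5 to second 13, and anything beyond those 8 seconds is dropped.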

2.2.4. Align videos with videos: The background video plays based on the length of the avatar video

The following timeline is used as an example:

  1. The timeline has two video tracks, and each track contains one video. The material of the first track is a regular video, and the material of the second track is a video that is composed of an avatar, subtitles, and a speech.

  2. The material of the first video track is muted, used as the background, and aligned with the material of the second video track.

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "ReferenceClipId": "avatar2",
          "MediaId": "e6f7e57980************d8f696301",
          "Effects": [
            {
              "Type": "Volume",
              "Gain": 0
            }
          ]
        }
      ]
    },
    {
      "VideoTrackClips": [
        {
          "ClipId": "avatar2",
          "Type": "AI_Avatar",
          "AvatarId": "yunxin",
          "Content": "This shopping method stores goods in warehouses, which improves logistics efficiency and the safety of goods. Many e-commerce companies have already started experimenting with this model.",
          "X": 50,
          "Y": 0,
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "AlibabaPuHuiTi",
              "Alignment": "BottomCenter",
              "Y": 50,
              "FontSize": 40,
              "FontColor": "#ffffff",
              "FontFace": {
                "Bold": true,
                "Italic": false,
                "Underline": false
              }
            }
          ]
        }
      ]
    }
  ]
}
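
If you generate such timelines programmatically, a small builder keeps the ClipId and ReferenceClipId values in sync. The following Python sketch is an assumption-laden illustration: build_background_timeline is a hypothetical helper, the media ID is a placeholder, and only parameters that appear in the preceding example are used. Subtitles and positioning are omitted for brevity.

```python
import json


def build_background_timeline(background_media_id, avatar_id, content, clip_id="avatar_main"):
    """Return a two-track timeline whose muted background follows the avatar clip."""
    background = {
        "ReferenceClipId": clip_id,  # aligned with the avatar clip below
        "MediaId": background_media_id,
        "Effects": [{"Type": "Volume", "Gain": 0}],  # mute the background
    }
    avatar = {
        "ClipId": clip_id,  # reference clip that defines the length
        "Type": "AI_Avatar",
        "AvatarId": avatar_id,
        "Content": content,
    }
    # The two clips are placed in different tracks, as alignment requires.
    return {"VideoTracks": [{"VideoTrackClips": [background]},
                            {"VideoTrackClips": [avatar]}]}


timeline = build_background_timeline("<your-media-id>", "yunxin", "Welcome to the show.")
print(json.dumps(timeline, indent=2))
```

Passing the clip ID once and reusing it for both clips avoids a mismatch between the reference clip and the aligned clip.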

2.2.5. Align images with videos: An avatar video is used as the background and an image is overlaid on the video

The following timeline is used as an example:

  1. The first video track contains three materials. The first material is a 5-second opening segment, and the last material is a 5-second closing segment. The middle material is an avatar video that is composed of an avatar, subtitles, and speeches.

  2. The material of the second video track is an image that is aligned with the avatar video of the first track. The image is overlaid on the video.

{
  "VideoTracks": [
    {
      "VideoTrackClips": [
        {
          "MediaURL": "http://your-bucket.oss-cn-shanghai.aliyuncs.com/opening.mp4",
          "Out": 5
        },
        {
          "ClipId": "avatar2",
          "Type": "AI_Avatar",
          "AvatarId": "yunxin",
          "Content": "This shopping method stores goods in warehouses, which improves logistics efficiency and the safety of goods. Many e-commerce companies have already started experimenting with this model.",
          "X": 50,
          "Y": 0,
          "Effects": [
            {
              "Type": "AI_ASR",
              "Font": "AlibabaPuHuiTi",
              "Alignment": "BottomCenter",
              "Y": 50,
              "FontSize": 40,
              "FontColor": "#ffffff",
              "FontFace": {
                "Bold": true,
                "Italic": false,
                "Underline": false
              }
            }
          ]
        },
        {
          "MediaURL": "http://your-bucket.oss-cn-shanghai.aliyuncs.com/ending.mp4",
          "Out": 5
        }
      ]
    },
    {
      "VideoTrackClips": [
        {
          "ReferenceClipId": "avatar2",
          "Type": "Image",
          "MediaId": "e6f7e57980************d8f696301",
          "Width": 0.2,
          "Height": 0.2,
          "X": 0.1,
          "Y": 0.1
        }
      ]
    }
  ]
}
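
Because a reference to a clip in the same track makes the timeline invalid (limit 2 in section 2.1), it can be useful to check a timeline locally before you use it. The following Python sketch is a hypothetical pre-check, not part of the service: the helper names are invented for illustration, and the additional check for a ReferenceClipId that matches no ClipId is an assumption about what you would want to catch, not a documented rule.

```python
def iter_clips(timeline):
    """Yield (track_identity, clip) for every clip in the audio and video tracks."""
    for tracks_key, clips_key in (("VideoTracks", "VideoTrackClips"),
                                  ("AudioTracks", "AudioTrackClips")):
        for index, track in enumerate(timeline.get(tracks_key, [])):
            for clip in track.get(clips_key, []):
                yield (tracks_key, index), clip


def check_references(timeline):
    """Return a list of problems; an empty list means every reference looks usable."""
    owners = {clip["ClipId"]: track
              for track, clip in iter_clips(timeline) if "ClipId" in clip}
    problems = []
    for track, clip in iter_clips(timeline):
        ref = clip.get("ReferenceClipId")
        if ref is None:
            continue
        if ref not in owners:
            # Assumption: a dangling reference is treated as an error here.
            problems.append(f"{ref}: no clip declares this ClipId")
        elif owners[ref] == track:
            # Limit 2: alignment works only across different tracks.
            problems.append(f"{ref}: referenced clip is in the same track")
    return problems


good = {"VideoTracks": [
    {"VideoTrackClips": [{"ClipId": "avatar2"}]},
    {"VideoTrackClips": [{"ReferenceClipId": "avatar2"}]},
]}
bad = {"VideoTracks": [
    {"VideoTrackClips": [{"ClipId": "avatar2"}, {"ReferenceClipId": "avatar2"}]},
]}
print(check_references(good))  # []
print(check_references(bad))   # ['avatar2: referenced clip is in the same track']
```

The first timeline mirrors the track layout of the preceding example and passes. The second places both clips in one track and is reported.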