ApsaraVideo Media Processing (MPS) allows you to convert an audio or video file to one or more files to adapt to different network bandwidths, terminal processing capabilities, and user needs. MPS performs multimodal analysis on the content, text, speech, and scenes of media files and offers various features, such as automated review, content recognition, and smart editing.
Audio and video transcoding
The audio and video transcoding feature allows you to convert the definition, encoding format, or container format of audio and video streams to adapt to different network bandwidths and playback devices. MPS supports mainstream encoding and container formats, and allows you to perform simple edit operations and add watermarks and captions during transcoding. The following table describes the specifications of the audio and video transcoding feature. To use specifications that are unavailable in the MPS console or API operations, contact technical support through your sales representative.
To use the feature described in the following table, you must submit a transcoding job. The regular transcoding fee is charged based on the specifications and length of the output video. For more information, see Audio and video transcoding fees.
Item | Parameter | Description |
Input file | Container format |
|
Video encoding format | Apple ProRes, AVS+, AVS, AVS2, H.263, H.263+, H.264/AVC, H.265/HEVC, H.266/VVC, MJPEG, MPEG-1, MPEG-2, MPEG-4, QuickTime, RealVideo, VP8, VP9, and WMV. | |
Audio encoding format | AAC, AC3, ADPCM, AMR, DSD, EAC3, MP1, MP2, MP3, PCM, RealAudio, Vorbis, and WMA. | |
File size | The maximum size is 100 GB. | |
Chroma | Examples: 4:2:2 and 4:2:0. | |
Output file | Container format | Note
|
Encoding format |
| |
Encoding profile |
| |
Resolution |
| |
Bitrate |
| |
Frame rate | The maximum output frame rate is 60 frames per second (FPS). | |
Sampling bit depth |
| |
Pixel format | Examples: yuv420p, yuvj420p, yuv422p, yuvj422p, yuv444p, and yuvj444p. | |
Bitrate control | Variable bitrate (VBR), constant bitrate (CBR), average bitrate (ABR), and constant rate factor (CRF). | |
Scan mode | The following modes are supported: same scan mode as the input video, automatic deinterlacing, interlaced scan, and progressive scan. |
Narrowband HD™
Narrowband HD™ is a media processing feature based on the transcoding technologies supported by Alibaba Cloud. This feature allows you to improve video compression efficiency and reduce file sizes without compromising the image quality. This way, you can reduce video stuttering during playback and save storage and traffic costs.
To use the Narrowband HD™ feature described in the following table, you must select an appropriate transcoding template when you submit a transcoding job. The Narrowband HD™ transcoding fee is charged based on the specifications and length of the output video.
Feature | Description |
Narrowband HD™ 1.0 | MPS intelligently analyzes details such as scenes, actions, content, and textures in a video. This reduces the bitrate by 20% to 40% without changing the image quality, or improves the definition of videos under the same network bandwidth conditions. Supported codecs are H.264 and H.265. Other configuration items are the same as those of audio and video transcoding. |
Narrowband HD™ 2.0 | MPS improves the upper limit of the encoder and integrates the definition restoration and enhancement features. This reduces the bitrate by 40% to 60% without changing the image quality, or improves the definition of videos under lower network bandwidth conditions. Supported codecs are H.264 and H.265. Other configuration items are the same as those of audio and video transcoding. |
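The bitrate reductions quoted in the table translate directly into storage and traffic savings. The following is a minimal arithmetic sketch, not an MPS API call. The source bitrate and duration are hypothetical values chosen for illustration, and the function names are not part of MPS.

```python
def reduced_bitrate_range(bitrate_kbps: float, min_cut: float, max_cut: float) -> tuple[float, float]:
    """Return the (lowest, highest) output bitrate after a cut of min_cut..max_cut."""
    return bitrate_kbps * (1 - max_cut), bitrate_kbps * (1 - min_cut)


def size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate stream size in megabytes: kbit/s * seconds / 8 bits / 1000."""
    return bitrate_kbps * duration_s / 8 / 1000


# Hypothetical input: a 10-minute video encoded at 4,000 kbit/s.
SOURCE_KBPS = 4000
DURATION_S = 600

# Reduction ranges taken from the table above.
for name, min_cut, max_cut in [
    ("Narrowband HD 1.0", 0.20, 0.40),
    ("Narrowband HD 2.0", 0.40, 0.60),
]:
    low, high = reduced_bitrate_range(SOURCE_KBPS, min_cut, max_cut)
    print(
        f"{name}: {low:.0f} to {high:.0f} kbit/s, "
        f"about {size_mb(low, DURATION_S):.0f} to {size_mb(high, DURATION_S):.0f} MB "
        f"(source: {size_mb(SOURCE_KBPS, DURATION_S):.0f} MB)"
    )
```

The actual reduction depends on the content of each video, so treat the ranges as planning estimates only.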
Audio enhancement
ApsaraVideo Audio Lab provides full-scenario audio enhancement and repair solutions by combining signal processing and deep learning technologies.
To use the audio enhancement features described in the following table, you must select an appropriate transcoding template when you submit a transcoding job. The audio enhancement fee is charged based on the specifications and length of the output audio. The video transcoding fee is charged based on the billing rules of the feature that you use. To configure a transcoding template for audio enhancement, search for and join the DingTalk group (ID 32171220) to contact Alibaba Cloud technical support.
Feature | Description |
Sound enhancement | MPS supports sound enhancement for mono audio streams, binaural audio streams, and audio streams that use the 5.1 or 7.1 surround sound format. When you use earphones or speakers to play music, a speech, or a video, MPS provides a high-quality, natural, clear, and customizable sound effect. |
Volume normalization | MPS intelligently normalizes the volume of videos. This resolves unstable volume caused by loudness differences between content sources when short videos or music tracks are played back continuously. |
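To illustrate the idea behind volume normalization, the sketch below scales two clips to the same root mean square (RMS) level. This is a simple textbook technique chosen for illustration, not the algorithm that MPS uses. The sample values, the target level, and the function names are all assumptions.

```python
import math


def rms(samples: list[float]) -> float:
    """Root mean square level of samples in the range -1.0..1.0."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples: list[float], target_rms: float = 0.1) -> list[float]:
    """Scale samples so that their RMS level matches target_rms.

    Samples are clamped to -1.0..1.0 after scaling. If clamping occurs,
    the resulting RMS is slightly lower than the target.
    """
    current = rms(samples)
    if current == 0:
        return list(samples)  # Pure silence: nothing to scale.
    gain = target_rms / current
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# Two hypothetical clips from different sources: one quiet, one loud.
quiet_clip = [0.01, -0.01, 0.02, -0.02]
loud_clip = [0.5, -0.5, 0.4, -0.4]

for name, clip in [("quiet", quiet_clip), ("loud", loud_clip)]:
    leveled = normalize_rms(clip)
    print(f"{name}: RMS {rms(clip):.4f} -> {rms(leveled):.4f}")
```

After scaling, both clips play back at a similar level, which is the effect described for continuous playback of short videos or music.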
High-speed transcoding
MPS supports the high-speed transcoding feature to split a video into multiple segments and then transcode them in parallel. This increases the transcoding speed by 5 to 30 times and significantly reduces the processing duration. This feature is suitable for important content that requires high timeliness, such as news and events.
To use this feature, you must enable an MPS queue for high-speed transcoding and submit a transcoding job to this MPS queue. The high-speed transcoding fee is charged based on the specifications, length, and transcoding speed of the output video. In addition, you are charged for the audio and video transcoding or audio-visual enhancement feature.
Feature | Description |
Speed boost | The transcoding speed can be boosted by 5 to 30 times depending on the properties of the input video, such as the format, resolution, and bitrate of the video. You can specify the expected speed boost for an MPS queue for high-speed transcoding, such as 5 times, 10 times, 20 times, or 30 times. |
Scenarios | We recommend that you use high-speed transcoding for videos that are longer than 30 minutes, or videos that require high frame rates, ultra high definition, and audio-visual enhancement. For more information, see the Limits on high-speed transcoding section of the "Limits" topic. |
Policy | Splitting is not supported for all videos. If you submit a video that is not supported by high-speed transcoding to an MPS queue for high-speed transcoding, the video is transcoded in the regular way by default. |
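The split-and-parallelize approach described above can be sketched as follows. This is a conceptual illustration under stated assumptions, not the MPS implementation: the segment length, the worker count, and the `transcode_segment` placeholder are hypothetical, and a real system would also have to split at suitable frame boundaries and concatenate the results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s: float, segment_s: float) -> list[tuple[float, float]]:
    """Return (start, end) pairs that cover the whole video."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcode_segment(segment: tuple[float, float]) -> str:
    """Placeholder for the real work: returns a label instead of media data."""
    start, end = segment
    return f"segment[{start:.0f}-{end:.0f}]"


def high_speed_transcode(duration_s: float, segment_s: float = 60.0,
                         workers: int = 4, splittable: bool = True) -> list[str]:
    """Transcode segments in parallel, or fall back to one regular pass."""
    if not splittable:
        # Mirrors the policy in the table: unsupported videos are
        # transcoded in the regular way.
        return [transcode_segment((0.0, duration_s))]
    segments = split_into_segments(duration_s, segment_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so segments stay sorted.
        return list(pool.map(transcode_segment, segments))


print(high_speed_transcode(150))
print(high_speed_transcode(150, splittable=False))
```

The achievable speedup is bounded by the number of segments and parallel workers, which is one reason the feature is recommended for longer videos.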
More features
Media information
MPS can obtain information about audio and video files that are stored in Object Storage Service (OSS), including the resolution, bitrate, frame rate, codec, and format of the files.
You must call the SubmitMediaInfoJob operation to use this feature. You are charged based on the number of API requests. For more information, see the Pricing for API calls section of the "Audio and video transcoding fees" topic.
Video editing
You can perform simple edit operations on videos. For example, you can extract audio or videos, merge videos, clip videos, and mix audio.
To use the video editing features described in the following table, you must configure relevant parameters when you submit a transcoding job. The transcoding fee is charged based on the specifications and length of the output video.
Feature | Description | Parameter of an API operation | MPS console |
Audio extraction | This feature allows you to extract the audio stream from a video by disabling the video stream. | Remove | Supported |
Video extraction | This feature allows you to extract the video stream from a video by disabling the audio stream. | Remove | Supported |
Black bar removal | This feature allows you to detect whether black bars exist in a video. If black bars exist, the system automatically removes the black bars. | Crop | Not supported |
Video cropping | This feature allows you to resize the video image, adjust the position of the resized image, and remove the gaps between the original image and the resized image. | Crop | Not supported |
Black bar addition | This feature allows you to resize the video image, adjust the position of the resized image, and fill the gaps between the original image and the resized image by using black bars. | Pad | Not supported |
Auto-rotate screen | This feature allows you to convert the resolution of a video based on the long and short sides instead of the width and height of the video. If the input videos include videos in landscape mode and portrait mode, we recommend that you enable this feature. | LongShortMode | Supported |
Video rotation | This feature allows you to set the rotation angle of a video. | Rotate | Supported |
Video merging | This feature allows you to merge up to 100 videos into one. You can set the start point in time and length of each video to be merged. | MergeList or MergeConfigUrl | Not supported |
Video clipping |
| Clip | Supported |
Video head and tail | This feature allows you to add dynamic logos at the beginning of a video and specify the content for the video tail. This helps increase product recognition and highlight copyright information. | OpeningList and TailSlateList | Video tail addition is supported. |
Blurring | This feature allows you to blur the specified area of a video. | DeWatermark | Not supported |
Audio mixing | This feature allows you to merge two audio tracks into one. You can use this feature to add background music. | Amix | Not supported |
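The auto-rotate screen behavior in the table can be illustrated with a small resolution calculation. The sketch assumes that, when the feature is enabled, the larger target dimension is applied to the long side of the input and the smaller one to the short side. The function and its arguments are hypothetical and are not the `LongShortMode` API itself.

```python
def output_resolution(in_width: int, in_height: int,
                      target_width: int, target_height: int,
                      long_short_mode: bool) -> tuple[int, int]:
    """Return the (width, height) of the output video.

    Without long/short mode, the target width and height are used as is.
    With long/short mode, the targets are matched to the long and short
    sides of the input, so portrait videos stay in portrait orientation.
    """
    if not long_short_mode:
        return target_width, target_height
    long_side = max(target_width, target_height)
    short_side = min(target_width, target_height)
    if in_width >= in_height:
        return long_side, short_side  # Landscape (or square) input.
    return short_side, long_side      # Portrait input.


# A 1280x720 target applied to a landscape input and a portrait input.
print(output_resolution(1920, 1080, 1280, 720, long_short_mode=True))
print(output_resolution(1080, 1920, 1280, 720, long_short_mode=True))
print(output_resolution(1080, 1920, 1280, 720, long_short_mode=False))
```

With the mode disabled, the portrait input in the last call would be forced into a landscape frame, which is the distortion the feature is meant to avoid for mixed-orientation inputs.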
Video snapshot
You can use the video snapshot feature to take snapshots of a specific size at a specific point in time of a video. The snapshots are used for video thumbnails, sprites, and progress bar thumbnails.
To use the video snapshot features described in the following table, you must submit a snapshot job. The snapshot fee is charged based on the number of snapshots.
Feature | Description | Parameter of an API operation | MPS console |
Static snapshot | This feature allows you to take snapshots of specific sizes at specific points in time of a video in the JPG format. The following snapshot modes are provided:
| SnapshotConfig | Supported |
Sprite snapshot | This feature allows you to create a sprite by merging the snapshots that are taken into a single image based on specific rules. The sprites are in the JPG format. This type of snapshot is taken asynchronously. Users can send a request to query the information about multiple images at a time. This greatly reduces the number of API requests for images and improves client performance. | TileOut and TileOutputFile | Not supported |
WebVTT snapshot | This feature allows you to generate VTT files for the snapshots that are taken or the sprites that are created. VTT files contain the time when snapshots are taken, paths of snapshots, and coordinates of snapshots in sprites. When a client requests an image, the image is displayed after the corresponding VTT file is obtained and parsed. This feature can be used to display thumbnails on the progress bar. | SubOut | Supported |
Keyframe snapshot | This feature allows you to capture only keyframes. If the frame at the specified point in time is not a keyframe, the adjacent keyframe is captured. | FrameType | Supported |
Black screen detection | This feature allows you to detect whether the first snapshot is a black screen. To use this feature, set the Time parameter to 0, which specifies that the snapshots are taken from the start of a video. You can define a black screen by specifying the portion of black pixels in an image and the color value of black pixels. If the black screen detection feature is enabled, the system checks the frames in the first 5 seconds of a video. If a non-black frame exists, the non-black frame is captured. Otherwise, the job fails if the job is a single-snapshot job, or the first black frame is captured if the job is a multi-snapshot job. | BlackLevel and PixelBlackThreshold | Supported |
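The black screen detection rules in the table can be sketched as a small decision function. This is an illustrative model only: frames are represented as grids of grayscale values, and the names `pixel_threshold` and `black_ratio` are hypothetical stand-ins whose exact mapping to `PixelBlackThreshold` and `BlackLevel` is an assumption.

```python
def is_black(frame: list[list[int]], pixel_threshold: int, black_ratio: float) -> bool:
    """Treat a frame as black if enough of its pixels are dark enough."""
    pixels = [p for row in frame for p in row]
    dark = sum(1 for p in pixels if p <= pixel_threshold)
    return dark / len(pixels) >= black_ratio


def pick_first_snapshot(frames: list[list[list[int]]], fps: float, multi_snapshot: bool,
                        pixel_threshold: int = 30, black_ratio: float = 0.95) -> int:
    """Return the index of the frame to capture as the first snapshot.

    Only frames in the first 5 seconds are checked. If all of them are
    black, a multi-snapshot job captures the first black frame, and a
    single-snapshot job fails.
    """
    window = frames[: int(5 * fps)]
    for index, frame in enumerate(window):
        if not is_black(frame, pixel_threshold, black_ratio):
            return index
    if multi_snapshot:
        return 0
    raise ValueError("no non-black frame in the first 5 seconds")


# Tiny 2x2 grayscale frames: two black frames followed by a bright one.
black = [[0, 0], [0, 0]]
bright = [[200, 200], [200, 200]]
print(pick_first_snapshot([black, black, bright], fps=1, multi_snapshot=False))
```

Passing only black frames with `multi_snapshot=False` raises an error, which corresponds to the failed single-snapshot job described in the table.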
Video watermarking
This feature allows you to add visible watermarks, such as enterprise logos and TV station logos, to a video to highlight brands, protect copyrights, and raise product popularity. You can also add blind watermarks to a video for copyright tracing. For more information, see the Digital watermarking section of this topic.
To use the video watermarking features described in the following table, you must submit a transcoding job and specify the watermark materials and watermark template. The watermark template is optional. The transcoding fee is charged based on the specifications and length of the output video. The watermarking fee is charged based on the number of watermarks.
Feature | Description | Parameter of an API operation | MPS console |
Image watermark |
| WaterMarks | |
Text watermark |
| WaterMarks | Not supported |
Caption
This feature allows you to add captions to videos for better comprehension and appreciation.
To use the caption feature described in the following table, you must submit a transcoding job or create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.
Feature | Description | Parameter of an API operation | MPS console |
Caption packaging | You can integrate caption files and audio and video streams into a master playlist in the M3U8 or MPD format by using a packaging workflow. You can add up to four captions to a master playlist. This allows you to switch between captions of different versions. An HTTP Live Streaming (HLS) packaging workflow supports captions in the VTT format. A Dynamic Adaptive Streaming over HTTP (DASH) packaging workflow supports captions in the VTT, STL, and TTML formats. |
| Supported |
Video packaging
Packaging indicates the process in which a master playlist is generated for multiple video streams at different bitrates, multiple caption streams, and multiple audio streams. The packaging feature allows you to perform the following operations during streaming media playback:
Adaptive streaming: supports automatic bitrate adjustment to ensure smooth live streaming.
Ad placement: supports video ad placement between segments.
To use the video packaging features described in the following table, you must create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.
Feature | Description | Parameter of an API operation | MPS console |
HLS packaging | HLS that supports secondary indexes is used for video packaging. HLS supports index files in the M3U8 format and video files in the TS format. | For more information, see How do I perform HLS package? | Supported |
CMAF packaging | Common Media Application Format (CMAF) that supports the output format of HLS or DASH is used for packaging. | N/A | Not supported |
Custom segment length | You can specify a maximum of 10 points in time at which you want to segment a video, and the length of segments. The segment length ranges from 1 second to 60 seconds. This feature allows you to adapt the media segment length to the network bandwidth of playback clients. This way, the loading time of the first frame is reduced. | Segment | Not supported |
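To make the notion of a master playlist concrete, the sketch below builds a minimal HLS master playlist for several video renditions. It uses the standard `#EXTM3U` and `#EXT-X-STREAM-INF` tags from the HLS specification. The rendition values and URIs are hypothetical, and the output that an MPS packaging workflow produces may contain additional tags for audio and caption streams.

```python
def master_playlist(variants: list[dict]) -> str:
    """Build a minimal HLS master playlist from a list of renditions.

    Each variant needs the keys "bandwidth" (bits per second), "width",
    "height", and "uri" (the media playlist of that rendition).
    """
    lines = ["#EXTM3U"]
    for variant in sorted(variants, key=lambda v: v["bandwidth"]):
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={variant["bandwidth"]},'
            f'RESOLUTION={variant["width"]}x{variant["height"]}'
        )
        lines.append(variant["uri"])
    return "\n".join(lines) + "\n"


# Hypothetical renditions of the same video at different bitrates.
renditions = [
    {"bandwidth": 2500000, "width": 1280, "height": 720, "uri": "hd/index.m3u8"},
    {"bandwidth": 800000, "width": 640, "height": 360, "uri": "sd/index.m3u8"},
]
print(master_playlist(renditions))
```

A player that supports adaptive streaming reads this playlist and switches between the listed renditions based on the available bandwidth.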
Video encryption
To use the video encryption features described in the following table, you must create a transcoding workflow and trigger the workflow. The transcoding fee is charged based on the specifications and length of the output video.
Feature | Description | Parameter of an API operation | MPS console |
HLS encryption | This feature allows you to encrypt a video based on the HLS AES-128 protocol by using a self-managed or Key Management Service (KMS) key. You can decrypt and play the video on a player that supports HLS streams. This ensures video security on mobile devices, and offers high-level security and excellent terminal compatibility. | N/A | Supported |
Alibaba Cloud proprietary cryptography | This feature allows you to encrypt a video based on the Alibaba Cloud proprietary cryptography protocol and convert the video to an encrypted HLS format. Only KMS keys are supported. You must use ApsaraVideo Player to decrypt and play the video. Otherwise, you cannot play or transmit the video even if you download it to an on-premises device. This ensures video security on mobile devices and Flash players. This feature can be used in scenarios such as online education and subscription-based viewing, which require high-level security. | N/A | Supported |
Video AI
Automated review
This feature allows you to review the content in a media file, such as the title, overview, thumbnail, video, and audio. This way, you can efficiently detect prohibited content in a video. This feature can be used in multiple scenarios, such as short video review, live streaming review, and media review.
To use the automated review feature described in the following table, you must submit a media review job. The automated review fee is charged based on the length of the processed video.
Feature | Content | Description |
Content moderation | Pornography detection | Detects pornographic and sexually suggestive content across voice, text, and visual dimensions. |
Terrorist content detection | Detects terrorist content from more than 10 dimensions, such as weapons, bloody scenes, specific costumes, smoke and light scenes, special symbols, crowds, and parades. | |
Ad violation detection | Detects different forms of ads, such as advertising text, watermarks, QR codes, illegal ads, and mini program codes. | |
Logo detection | Detects logos in a video or image, such as TV station logos, trademarks, and watermarks. This helps protect copyrights. | |
Undesirable content detection | Detects undesirable scenes in a video or image, such as picture-in-picture (PiP), smoking, live broadcasting while driving, and meaningless images. | |
Audio anti-spam | Detects illegal content in audio, such as pornography, terrorism, and abuse. Speech recognition is supported for Chinese and English. |
Media fingerprinting
The media fingerprinting feature is implemented based on video recognition technologies developed by Alibaba Cloud. This feature uses a fingerprint to uniquely mark a media file, and allows you to extract and compare the fingerprints among media files. This helps detect duplicate videos and trace the source of video clips.
To use the media fingerprinting features described in the following table, you must submit a media fingerprinting job. The media fingerprinting fee is charged based on the length of the processed video or audio.
Feature | Description |
Video fingerprinting | You can use this feature to extract the fingerprints of videos, import and analyze video fingerprints in the fingerprint library, and search for similar videos. |
Audio fingerprinting | You can use this feature to extract the fingerprints of audio, import and analyze audio fingerprints in the fingerprint library, and search for similar audio. |
Image fingerprinting | You can use this feature to extract the fingerprints of images, import and analyze image fingerprints in the fingerprint library, and search for similar images. |
Text fingerprinting | You can use this feature to extract the fingerprints of text, import and analyze text fingerprints in the fingerprint library, and search for similar text. |
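The extract-and-compare workflow can be illustrated with a toy fingerprint. The sketch below uses an average hash over a small grayscale image and compares fingerprints by Hamming distance. This is a well-known simple technique swapped in for illustration and is not the recognition technology that MPS uses. All names and thresholds are assumptions.

```python
def average_hash(gray: list[list[int]]) -> int:
    """Fingerprint a grayscale image: one bit per pixel, set if above the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_similar(a: int, b: int, max_distance: int = 1) -> bool:
    """Two fingerprints match if they differ in at most max_distance bits."""
    return hamming_distance(a, b) <= max_distance


# Hypothetical 2x2 images: an original, a brighter copy, and unrelated content.
original = [[10, 200], [220, 30]]
brighter_copy = [[20, 210], [230, 40]]
different = [[200, 10], [30, 220]]

library = {"original": average_hash(original)}
for name, image in [("brighter_copy", brighter_copy), ("different", different)]:
    match = is_similar(average_hash(image), library["original"])
    print(f"{name}: similar to original = {match}")
```

The brighter copy keeps the same fingerprint because the hash depends on each pixel relative to the mean, which shows in miniature how a fingerprint can identify duplicates despite small changes.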
Service management
Feature | Description | Parameter of an API operation | MPS console |
Media management | You can upload, manage, and publish media files. | N/A | N/A |
Workflow orchestration | MPS automatically runs a workflow in the cloud after an audio or video file is uploaded. | N/A | Supported |
Transcoding template | A transcoding template is a collection of transcoding parameters. You can use a transcoding template to simplify the operations when you create a transcoding job or use a workflow. Transcoding templates can be classified into the following types: custom templates, customized templates, and preset templates. | TemplateId | Supported |
Watermark template | A watermark template specifies the settings of multiple parameters, such as the parameters that determine the position and size of watermarks. You can use a watermark template to simplify the watermarking process. | WaterMarkTemplateId | Supported |
Transcoding priority | You can specify the priority of transcoding jobs in an MPS queue. A maximum of 10 priority levels can be specified. | Priority | Not supported |
Conditional transcoding | If the video bitrate, video resolution, or audio bitrate of an input video is less than the specified output settings, the video is transcoded in original quality or no transcoding is performed. | Examples: IsCheckReso and IsCheckResoFail | Supported |
MPS queue | MPS jobs such as transcoding and asynchronous snapshot jobs are asynchronously processed. You must add the jobs to an MPS queue for scheduling and execution. You can create multiple MPS queues and specify the priority of jobs in an MPS queue. A maximum of 10 priority levels are supported. | Priority | Not supported |
Message notification | MPS jobs such as transcoding and asynchronous snapshot jobs are asynchronously processed. You can integrate Message Service (MNS) to associate an MNS topic or queue with an MPS queue or workflow. If a job in the MPS queue is complete or the workflow starts or stops, MPS sends a notification to the specified contact. | NotifyConfig | Supported |
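The conditional transcoding rule in the table can be sketched as a planning function that compares the input against the requested output settings. The dictionary keys, the `on_lower` option, and the function itself are hypothetical and only model the described behavior. They do not represent parameters such as `IsCheckReso`.

```python
from typing import Optional


def plan_output(source: dict, target: dict, on_lower: str = "keep") -> Optional[dict]:
    """Decide the effective output settings for one transcoding job.

    For every setting where the source is lower than the target, either
    keep the source value (original quality) or skip the job entirely.
    Returns the effective settings, or None if the job is skipped.
    """
    if on_lower not in ("keep", "skip"):
        raise ValueError("on_lower must be 'keep' or 'skip'")
    lower = [key for key in target if source[key] < target[key]]
    if not lower:
        return dict(target)  # The source meets or exceeds every target value.
    if on_lower == "skip":
        return None          # No transcoding is performed.
    plan = dict(target)
    for key in lower:
        plan[key] = source[key]  # Never upscale beyond the source.
    return plan


# Hypothetical 720p source submitted to a 1080p target.
source = {"video_kbps": 1500, "height": 720, "audio_kbps": 128}
target = {"video_kbps": 3000, "height": 1080, "audio_kbps": 128}

print(plan_output(source, target, on_lower="keep"))
print(plan_output(source, target, on_lower="skip"))
```

Keeping the source values avoids spending bitrate on upscaled output that carries no additional detail.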