Multimedia analysis

Platform for AI (PAI) provides out-of-the-box multimedia analysis services that are powered by advanced algorithms. These services include Basic Model Service and Advanced Model Service. This topic describes how multimedia analysis is billed and how to use it.

Background information

Multimedia analysis provides the following algorithm-powered services:

Basic Model Service: model services for images, including tagging, quality evaluation, facial attribute analysis (such as appearance, face shape, hairstyle, and hair color), age analysis, figure modification (such as slimming or enlarging), and watermark removal.
Advanced Model Service: model services for videos, including tagging, quality evaluation, multi-modal content classification and tagging (used for posts that contain text, images, and videos), and AI-generated image tagging (used to improve the performance of text-to-image models).

Billing

Multimedia analysis supports the pay-as-you-go and subscription billing methods. For more information, see Billing of multimedia analysis.

Work with multimedia analysis

Enable multimedia analysis and purchase a resource plan

To enable multimedia analysis in the Scenario-based Solution section of the PAI console, perform the following steps:

Log on to the PAI console.
Follow the instructions in the following figure to enable multimedia analysis.
By default, the pay-as-you-go billing method is used. Fees are calculated based on the number of service calls.

You can purchase resource plans to use multimedia analysis services at a more favorable price. To purchase a resource plan, perform the following steps:

On the Basic Model Service tab of the Multimedia Analytics page, click Purchase Resource Plan.
On the Subscription Model Service page, configure the Quantity, Scenarios, and API Calls parameters, and then click Buy Now.
To use multimedia analysis services, set the Scenarios parameter to Multimedia Analysis-Basic Model Service or Multimedia Analysis-Advanced Model Service. Configure the other parameters based on your business requirements.
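
When you size a resource plan, it can help to estimate how many calls a workload consumes in each scenario. The following Python sketch is a minimal illustration, not part of the multimedia analysis SDK. It assumes that each request consumes the number of calls listed in the Calls consumed per service column of the feature table in this topic. The function name `estimate_calls`, the request counts, and the value used for N of the custom model service are hypothetical.

```python
BASIC = "Basic Model Service"
ADVANCED = "Advanced Model Service"

# Service name -> (scenario, calls consumed per request), copied from the
# feature table in this topic. The custom model service is handled separately
# because its per-request cost N depends on the model.
SERVICES = {
    "Image quality evaluation": (BASIC, 1),
    "Facial attribute analysis": (BASIC, 1),
    "Age analysis": (BASIC, 1),
    "Image tagging": (BASIC, 1),
    "Figure modification": (BASIC, 1),
    "Watermark removal": (BASIC, 1),
    "AI-generated image tagging": (BASIC, 1),
    "Multi-modal content classification and tagging": (ADVANCED, 1),
    "Video quality evaluation": (ADVANCED, 1),
    "Video classification and tagging": (ADVANCED, 1),
}


def estimate_calls(planned_requests, custom_calls_per_request=None):
    """Return the total calls consumed per scenario.

    planned_requests maps a service name to the number of planned requests.
    custom_calls_per_request is N for the custom model service (assumption:
    you obtain this value for your own custom model).
    """
    totals = {}
    for service, requests in planned_requests.items():
        if service == "Custom model service":
            if custom_calls_per_request is None:
                raise ValueError("N for the custom model service is required")
            scenario, per_request = BASIC, custom_calls_per_request
        else:
            scenario, per_request = SERVICES[service]
        totals[scenario] = totals.get(scenario, 0) + per_request * requests
    return totals


# Hypothetical monthly workload.
planned = {
    "Image tagging": 10_000,
    "Watermark removal": 2_500,
    "Video quality evaluation": 400,
}
print(estimate_calls(planned))
# {'Basic Model Service': 12500, 'Advanced Model Service': 400}
```

The per-scenario totals can then be compared with the API Calls value of a resource plan for the matching Scenarios setting.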

Usage notes on multimedia analysis SDK for Python

After you enable multimedia analysis, you can use the multimedia analysis SDK for Python to call the model services. For more information, see Usage notes on multimedia analysis SDK for Python.

Usage notes on multimedia analysis SDK for Java

After you enable multimedia analysis, you can use the multimedia analysis SDK for Java to call the model services. For more information, see Java SDK on GitHub. The SDK for Java has the same parameters as the SDK for Python. For more information about the parameters, see Usage notes on multimedia analysis SDK for Python.

Features of multimedia analysis

Scenario	Service name	Calls consumed per service	Description	Sample value
Basic Model Service	Image quality evaluation	1	Returns a score of the image quality. The score is a floating-point number from 0 to 100.	`"iqa_result":66.88`
	Facial attribute analysis	1	Returns information about facial attributes, including face shape, hair color, hairstyle, and appearance. Detects multiple faces by analyzing facial area coordinates. If no faces are detected, an empty array is returned.	Face shapes: triangular, round, heart, square, oval, diamond, and oblong. Female hairstyles: bangs types (curtain bangs, braided bangs, side-swept bangs, no bangs, see-through bangs, and blunt bangs); curl types (flow perm, big wavy curls, small curls, smooth curls, air wave perm, jelly perm, and cone curls); hairstyles (curly hair, bun, straight hair, tied hair, and braided hair); hair length (medium-length hair, short hair, and long hair). Male hairstyles: curtains, buzz cut, bullet, crew cut, induction buzz cut, disconnected buzz cut, disconnected bob, and slicked back. Hair colors: black, coffee brown, granny gray, chestnut, brown, gradient, claret, blond, buff, and other colors. Appearance: a score from 0 to 5.
	Age analysis	1	Determines the age of the most prominent face in the image. If multiple faces are detected, only the result for the face that occupies the largest area is returned. If no faces are detected, an error message is returned.	Age groups: `'0-2'`, `'3-9'`, `'10-19'`, `'20-29'`, `'30-39'`, `'40-49'`, `'50-59'`, `'60-69'`, and `'70+'`.
	Image tagging	1	Adds multiple tags to an image. This service can return the Top K tags that have the highest probabilities and their respective probabilities, or return high-dimensional features of the image.	Examples of most-used tags: female, selfie, male, lifestyle, screenshot, food, vehicle, delicacy, game, cartoon, animal, and Korean-style outfit.
	Figure modification	1	Modifies the figure in the image. You can upload a portrait image and make the figure larger or slimmer by adjusting the `degree` parameter. For example, you can set `degree` to a value that is greater than 0 to slim the figure.	The modified image is returned as a Base64-encoded string.
	Watermark removal	1	Removes the watermarks in the image.	The image without watermarks is returned as a Base64-encoded string.
	AI-generated image tagging	1	Adds tags to images that are generated by text-to-image models, such as Stable Diffusion, to improve the model performance.	Supported tagging models: WD14, Bootstrapping Language-Image Pre-training (BLIP), GenerativeImage2Text (GIT), and Recognize Anything Model (RAM). Sample caption: `"sensitive, 1girl, solo, long hair, looking at viewer, smile, black hair, brown eyes, scarf, lips, realistic"`.
	Custom model service	N. The value of N varies based on the complexity of the custom model service.	Provides customized model services for images and videos.	Determined by the actual model type.
Advanced Model Service	Multi-modal content classification and tagging	1	Adds tags to multi-modal content (such as social media posts). You can categorize and add tags to content that consists of text and images or text and videos. The model service also returns the embedding results of high-dimensional features.	Examples of most-used categories: life, film and television show, sports, traveling, game, food, and fitness. Examples of most-used tags: sports, food, dance, fitness, cooking, traveling, and selfie. Example of embedding results: `0.915,0.882,0.943,0.978,1.027,1.181,1.066,1.029,0.866,0.716,0.628,1.203,0.689,0.533,0.734,1.038,0.98,0.613,0.96,0.88,0.586,0.702,1.515,0.697,0.987,0.699,1.179,4.274,0.757,0.89,0.805,0.901`
	Video quality evaluation	1	Returns a score of the video quality. The score is a floating-point number from 0 to 100.	`"video_score":20.57`
	Video classification and tagging	1	Returns the category of the video, the top K tags that have the highest probabilities and their respective probabilities, and the high-dimensional features of the video.	Examples of most-used categories: life, knowledge, music, technology, and game. Examples of most-used tags: subtitles, young lady, social news, slimming and body shaping, plots, drama clips, and natural scenery.
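
Several sample values in the preceding table need light post-processing before you can use them: figure modification and watermark removal return a Base64-encoded string, and the multi-modal service returns embeddings as comma-separated numbers. The following Python sketch shows one way to handle both with the standard library only. It does not call the multimedia analysis SDK. The helper names `decode_image` and `parse_embedding` are hypothetical, and the sketch assumes that you have already extracted the relevant string from the service response.

```python
import base64


def decode_image(b64_string):
    """Decode a Base64-encoded image string into raw image bytes.

    validate=True rejects characters outside the Base64 alphabet instead of
    silently discarding them.
    """
    return base64.b64decode(b64_string, validate=True)


def parse_embedding(text):
    """Parse a comma-separated embedding string into a list of floats."""
    return [float(value) for value in text.split(",") if value.strip()]


# Stand-in for a returned image: the 8-byte PNG signature, Base64-encoded.
# A real response would carry the full image data.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(png_signature).decode("ascii")
image_bytes = decode_image(encoded)
print(image_bytes == png_signature)  # True

# First three values of the embedding sample in the table.
print(parse_embedding("0.915,0.882,0.943"))  # [0.915, 0.882, 0.943]
```

To save a decoded image, write `image_bytes` to a file opened in binary mode, for example `open("result.png", "wb")`. The file name and image format here are placeholders, because the table does not state which image format the services return.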

Technical support

For additional tests and services, contact Alibaba Cloud technical support.