Parameters for intelligent production API operations - ApsaraVideo Media Processing

This topic describes the JobParams and Output request parameters of the SubmitIProductionJob operation, and the Job response parameter of the QueryIProductionJob operation.

CaptionExtraction

Parameter	Type	Description
Output	STRING	If the JobParams parameter is configured to separate Chinese and English, `{resultType}` placeholders are supported in the output file path to specify whether the output caption file is in Chinese or English. zh indicates Chinese and en indicates English.

Parameter description of JobParams

Parameter	Type	Required	Description
fps	INT	No	The sampling frame rate. The value must be an integer. Valid values: [2,10]. Default value: 5.
roi	LIST	No	The region of interest. If you specify this parameter, only the text within the region is extracted, and the text outside the region is ignored. If you do not specify this parameter, the text within the bottom quarter of the video image is extracted. Set the value in the following format: [[top, bottom], [left, right]]. Default value: N/A.
sep	BOOLEAN	No	Specifies whether to generate separate Chinese and English SRT files. Default value: False.
formatter	STRING	No	The format string of the SRT caption. Example: "{\an8}". Default value: N/A.
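
As a minimal sketch of the constraints in the preceding table, the following Python helper assembles and validates a JobParams value for CaptionExtraction. The helper name `build_caption_params` is hypothetical, and passing JobParams as a JSON string is an assumption that this topic does not confirm.

```python
import json


def build_caption_params(fps=5, roi=None, sep=False, formatter=None):
    """Return a JSON string for CaptionExtraction JobParams (hypothetical helper)."""
    # fps must be an integer in [2,10] according to the table.
    if not isinstance(fps, int) or not 2 <= fps <= 10:
        raise ValueError("fps must be an integer in [2,10]")
    params = {"fps": fps, "sep": sep}
    if roi is not None:
        # Expected shape: [[top, bottom], [left, right]]
        if len(roi) != 2 or any(len(pair) != 2 for pair in roi):
            raise ValueError("roi must look like [[top, bottom], [left, right]]")
        params["roi"] = roi
    if formatter is not None:
        params["formatter"] = formatter
    return json.dumps(params)


print(build_caption_params(fps=8, sep=True, formatter="{\\an8}"))
```

Optional keys are omitted when they are not set, so the service defaults described in the table apply.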

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"CaptionExtraction",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Code":"Success","Message":"Successful.","Data":"{\"result\":[{\"file\":\"captionextraction/b48d02b58e9b6a0d1c13271bcf9aa6d7-161121379****.srt\"}]}"}`.
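
In the preceding example, the Result value is a JSON string whose `Data` field is itself a JSON-encoded string, so it has to be decoded twice. The following Python sketch shows one way to do that. The function name `output_files` is hypothetical, and the error handling is an assumption because this topic shows only the success case.

```python
import json

# Sample Result value copied from the example above.
result = (
    '{"Code":"Success","Message":"Successful.",'
    '"Data":"{\\"result\\":[{\\"file\\":\\"captionextraction/'
    'b48d02b58e9b6a0d1c13271bcf9aa6d7-161121379****.srt\\"}]}"}'
)


def output_files(result_text):
    """Decode Result, then decode its Data field, and list the output files."""
    outer = json.loads(result_text)
    if outer.get("Code") != "Success":
        raise RuntimeError(outer.get("Message", "job failed"))
    inner = json.loads(outer["Data"])  # Data is a JSON document stored as a string
    return [item["file"] for item in inner["result"]]


print(output_files(result))
```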

VideoGreenScreenMatting

Parameter description of JobParams

Parameter	Type	Required	Description
bgimage	STRING	No	The background image for replacement. Example: http://example-image-****.example-location.aliyuncs.com/example/example.jpg. If you specify this parameter, an MP4 video whose background image is replaced is returned. If you do not specify this parameter, a WebM video with alpha channels is returned.

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoGreenScreenMatting",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}
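
A callback body such as the preceding one can be checked with a few lines of Python. This is only a sketch: the function name `parse_callback` is hypothetical, and treating any value other than Success as a failure is an assumption, because this topic lists only the success case.

```python
import json

# Callback body copied from the example above.
body = """{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoGreenScreenMatting",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}"""


def parse_callback(text):
    """Return (job_id, function_name, succeeded) for one callback body."""
    message = json.loads(text)
    # Assumption: the job succeeded only if both Code and State say so.
    succeeded = message.get("Code") == "Success" and message.get("State") == "Success"
    return message["JobId"], message["FunctionName"], succeeded


print(parse_callback(body))
```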

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Code":"Success","Message":"Successful.","Data":"{\"result\":[{\"file\":\"videogreenscreenmatting/16e6bc5ca802e12429d082010164dba3-160275535****_matting.mp4\"}]}"}`.

MusicSegmentDetect

Parameter description of JobParams

Parameter	Type	Required	Description
None	None	None	None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"MusicSegmentDetect",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Code":"Success","Data":"{\"result\":[{\"start\":39.32,\"end\":63.85,\"title\":\"Chorus\"},{\"start\":86.69,\"end\":114.45,\"title\":\"Chorus\"},{\"start\":135.75,\"end\":160.27,\"title\":\"Chorus\"}]}","Message":"Successful."}`.
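
The detected segments can be turned into durations once the nested `Data` string is decoded. The following Python sketch uses the first two segments from the preceding example. The function name `segment_lengths` is hypothetical, and the unit of `start` and `end` is not stated in this topic, so the results are in whatever unit the service returns.

```python
import json

# First two segments of the sample Result value above.
result = (
    '{"Code":"Success","Data":"{\\"result\\":['
    '{\\"start\\":39.32,\\"end\\":63.85,\\"title\\":\\"Chorus\\"},'
    '{\\"start\\":86.69,\\"end\\":114.45,\\"title\\":\\"Chorus\\"}]}",'
    '"Message":"Successful."}'
)


def segment_lengths(result_text):
    """Return (title, length) pairs, with length = end - start."""
    data = json.loads(json.loads(result_text)["Data"])
    return [(s["title"], round(s["end"] - s["start"], 2)) for s in data["result"]]


print(segment_lengths(result))  # [('Chorus', 24.53), ('Chorus', 27.76)]
```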

VideoDetext

Parameter description of JobParams

Parameter	Type	Required	Description
Text	LIST	No	The location of a caption box that you want to remove. A maximum of two caption boxes are supported. Example: [[bx1, by1, bw1, bh1], [bx2, by2, bw2, bh2]]. Note: You must specify bx, by, bw, and bh together for each caption box. bx: the x-coordinate of the upper-left corner of the caption box as a ratio of the video width. Example: 0.1. by: the y-coordinate of the upper-left corner of the caption box as a ratio of the video height. Example: 0.0. bw: the width of the caption box as a ratio of the video width. Example: 0.3. bh: the height of the caption box as a ratio of the video height. Example: 0.2.
LimitRegion	LIST	No	The area in which you want to remove captions. The system detects the captions within the specified area and removes the detected captions. This parameter has a lower priority than the Text parameter, which directly specifies the location of the caption boxes to be removed. Example: [[0, 0.6, 1, 0.4]]. In this example, the system detects the captions within the bottom 40% of the video image and removes the detected captions.
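
Because caption boxes are given as ratios of the video size, a box measured in pixels has to be converted first. The following Python sketch shows the arithmetic. The function names and the 1920x1080 frame size are illustrative assumptions and are not part of the API.

```python
def normalize_box(x, y, w, h, video_width, video_height):
    """Convert a pixel rectangle to [bx, by, bw, bh] ratios."""
    # Validate in pixels so that floating-point rounding cannot cause false errors.
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("box must have a non-negative origin and a positive size")
    if x + w > video_width or y + h > video_height:
        raise ValueError("box must lie inside the frame")
    return [
        round(x / video_width, 4),
        round(y / video_height, 4),
        round(w / video_width, 4),
        round(h / video_height, 4),
    ]


def build_text_param(boxes):
    """Return the Text list, enforcing the documented limit of two boxes."""
    if len(boxes) > 2:
        raise ValueError("a maximum of two caption boxes are supported")
    return list(boxes)


# A box 192 px from the left edge, at the top, 576 px wide and 216 px high.
box = normalize_box(192, 0, 576, 216, 1920, 1080)
print(build_text_param([box]))  # [[0.1, 0.0, 0.3, 0.2]]
```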

Callback format

JSON format

{
  "Code":"Success",
  "Details":[], 
  "FunctionName":"VideoDetext",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Details":[],"Message":"success","Code":"Success"}`.

VideoH2V

Parameter description of JobParams

Parameter	Type	Required	Description
None	None	None	None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoH2V",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Details":[],"Message":"success","Code":"Success"}`.

VideoDelogo

Parameter description of JobParams

Parameter	Type	Required	Description
Logo	STRING	No	The position of a logo that you want to remove. Set the value in the format of [xmin, ymin, width, height]. You can remove up to two logos at a time. Example: [[0, 0, 0.3, 0.3], [0.7, 0, 0.3, 0.3]].

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoDelogo",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Details":[],"Message":"success","Code":"Success"}`.

Cover

Parameter description of JobParams

Parameter	Type	Required	Description
Model	STRING	No	The smart thumbnail model. A still thumbnail is generated if this parameter is left empty, and an animated thumbnail is generated if this parameter is set to gif.

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"Cover",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Message":"success","Data":"[{\"Score\":8.270855992569906,\"Time\":\"28278.25\",\"Url\":\"cover/test-00001.jpg\"},{\"Score\":7.474117489692728,\"Time\":\"25942.583333333332\",\"Url\":\"cover/test-00002.jpg\"}]","Code":"Success"}`. In this example, `Score` indicates the confidence of the thumbnail result, `Time` indicates the timestamp of the thumbnail frame, and `Url` indicates the URL of the thumbnail.
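
For Cover jobs, the decoded `Data` value is a JSON array of candidates rather than an object with a `result` key. As a sketch, the candidate with the highest `Score` can be picked as follows. The function name `best_cover` is hypothetical, and choosing by highest score is only one possible selection rule.

```python
import json

# Sample Result value for a Cover job, with two candidates.
result = (
    '{"Message":"success","Data":"['
    '{\\"Score\\":8.270855992569906,\\"Time\\":\\"28278.25\\",'
    '\\"Url\\":\\"cover/test-00001.jpg\\"},'
    '{\\"Score\\":7.474117489692728,\\"Time\\":\\"25942.583333333332\\",'
    '\\"Url\\":\\"cover/test-00002.jpg\\"}'
    ']","Code":"Success"}'
)


def best_cover(result_text):
    """Return the thumbnail candidate with the highest Score."""
    candidates = json.loads(json.loads(result_text)["Data"])
    return max(candidates, key=lambda c: c["Score"])


print(best_cover(result)["Url"])  # cover/test-00001.jpg
```

Note that `Time` is returned as a string in the sample, so convert it with `float()` before doing arithmetic on it.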

VideoClip

Parameter description of JobParams

Parameter	Type	Required	Description
None	None	None	None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"VideoClip",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Code":"Success","Message":"Successful.","Data":"{\"result\":[{\"file\":\"videoclip/16e6bc5ca802e12429d082010164****-1602755353502-origin.mp4\"}]}"}`.

ImageH2V

Parameter description of JobParams

Parameter	Type	Required	Description
None	None	None	None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"ImageH2V",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Details":[],"Message":"success","Code":"Success"}`.

ImageDelogo

Parameter description of JobParams

Parameter	Type	Required	Description
None	None	None	None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"ImageDelogo",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Details":[],"Message":"success","Code":"Success"}`.

AudioBeatDetection

Parameter description of JobParams

Parameter	Type	Required	Description
None	None	None	None

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"AudioBeatDetection",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Code":"Success","Data":"{\"result\":[{\"file\":\"detectresult/normalvideo-161225931****.txt\"}]}","Message":"Successful."}`.

AudioMixing

Parameter description of JobParams

Parameter	Type	Required	Description
inputs	STRING	No	The list of URLs of the audio track files to be mixed. You can specify only one URL. Example: `{"file":"http://example-bucket-****.oss-cn-shanghai.aliyuncs.com/2.mp4"}`.

Callback format

JSON format

{
  "Code":"Success",
  "FunctionName":"AudioMixing",
  "JobId":"158688059d8443a68b78a65e55b3****",
  "Message":"Successful.",
  "State":"Success",
  "Type":"IProduction",
  "UserData":"test"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Message":"Successful.","Data":"{\"result\":[{\"file\":\"audiomix/alibaba-161283935****-origin.mp4\"}]}","Code":"Success"}`.

ImageCartoonize

Parameter description of Output

Parameter	Type	Description
Output	STRING	`{resultType}` placeholders are supported in the path to distinguish whether the result file is a cartoonized image or the original image. result indicates a cartoonized image, and origin indicates the original image.
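
To preview which object names a path with the `{resultType}` placeholder would produce, the substitution can be sketched in Python. The service performs the actual substitution. The function name `expand_result_type` and the sample path `iproduction/test-{resultType}.jpg` are assumptions made for illustration.

```python
def expand_result_type(output_path, result_types):
    """Return the paths obtained by filling in the {resultType} placeholder."""
    if "{resultType}" not in output_path:
        raise ValueError("output path does not contain {resultType}")
    return [output_path.replace("{resultType}", t) for t in result_types]


# result: the cartoonized image; origin: the original image.
print(expand_result_type("iproduction/test-{resultType}.jpg", ["result", "origin"]))
# ['iproduction/test-result.jpg', 'iproduction/test-origin.jpg']
```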

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"ImageCartoonize",
  "JobId":"39f8e0bc005e4f309379701645f4744c",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Code":"Success","Data":"{\"result\":[{\"file\":\"iproduction/test-result.jpg\"},{\"file\":\"iproduction/test-origin.jpg\"}]}","Message":"Successful."}`.

AudioQualityAssessment

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. The following code shows an example of success result information.

{
  "Code" : "Success",
  "Data" : "{
    \"result\":[{
        \"Discontinuity\":\"Good\",
        \"Loudness\":\"Excellent\",
        \"Worst MOS(0-5)\":\"0.38\",
        \"Discontinuity(0-5)\":\"3.52\",
        \"Speech Ratio\":\"48.55\",
        \"Loudness(0-5)\":\"4.91\",
        \"Worst Discontinuity(0-5)\":\"0.88\",
        \"Worst Coloration(0-5)\":\"0.42\",
        \"Channel\":\"1\",
        \"Coloration(0-5)\":\"0.99\",
        \"Bad Mute Ratio(%)\":\"0.0\",
        \"Time\":\"2022-12-02 16:14:06\",
        \"Noisiness(0-5)\":\"3.28\",
        \"MOS\":\"Poor\",
        \"Worst Noisiness(0-5)\":\"0.91\",
        \"Double Talk Ratio(%)\":\"19.23\",
        \"Input\":\"/home/admin/algo/quality****/example.wav\",
        \"Total Duration\":\"42.78\",
        \"Noisiness\":\"Good\",
        \"Tag\":\"Valid\",
        \"MOS(0-5)\":\"1.01\",
        \"Loudness(-90dB-0dB)\":\"-0.59\",
        \"Coloration\":\"Bad\",
        \"Saturated Ratio(%)\":\"37.55\"
    },
    {
        \"Discontinuity\":\"Fair\",
        \"Loudness\":\"Excellent\",
        \"Worst MOS(0-5)\":\"0.65\",
        \"Discontinuity(0-5)\":\"2.45\",
        \"Speech Ratio\":\"41.68\",
        \"Loudness(0-5)\":\"4.52\",
        \"Worst Discontinuity(0-5)\":\"0.66\",
        \"Worst Coloration(0-5)\":\"0.72\",
        \"Channel\":\"2\",
        \"Coloration(0-5)\":\"2.34\",
        \"Bad Mute Ratio(%)\":\"0.0\",
        \"Time\":\"2022-12-02 16:14:06\",
        \"Noisiness(0-5)\":\"2.53\",
        \"MOS\":\"Poor\",
        \"Worst Noisiness(0-5)\":\"0.67\",
        \"Double Talk Ratio(%)\":\"25.93\",
        \"Input\":\"/home/admin/algo/quality****/example.wav\",
        \"Total Duration\":\"42.78\",
        \"Noisiness\":\"Fair\",
        \"Tag\":\"Valid\",
        \"MOS(0-5)\":\"1.69\",
        \"Loudness(-90dB-0dB)\":\"-4.82\",
        \"Coloration\":\"Fair\",
        \"Saturated Ratio(%)\":\"0.0\"
    }]
  }",
  "Message" : "Successful."
}

Parameters

Parameter	Description
Time	The timestamp generated when the input file was scored.
Input	The name of the input file.
Total Duration	The duration of the input file. Unit: seconds.
Speech Ratio	The ratio of the duration of the audio data to the duration of the input file. Valid values: [0,100]. Unit: percentage.
Tag	The tag for the input file, which is used to indicate the validity of the detection. Valid values: Valid: The detection is valid, which indicates that subsequent key metrics and the mean opinion score (MOS) are valid. File too Short: The duration of the input file is less than 2s. Mute: The input file does not contain audio data. Voice too Short: The duration of the audio data is less than 2s. Note The preceding four events are mutually exclusive. If the tag for an input file is one of the last three tags, the MOS, Discontinuity, Coloration, and Noisiness parameters are meaningless for the file and the parameter values are 0.
MOS(0-5)	The MOS of the input file, which describes the quality of the audio data. Valid values: [0,5].
MOS	The description of the MOS. Valid values: (4,5]: The quality of the audio data is excellent. [3,4): The quality of the audio data is good. [2,3): The quality of the audio data is fair. [1,2): The quality of the audio data is poor. [0,1): The quality of the audio data is bad.
Discontinuity(0-5)	The continuity score of the audio data. The continuity score decreases due to the following reasons: the stuttering issue of audio data capture, echo issue due to multi-channel audio, and packet loss issue due to poor network connectivity. Valid values: [0,5].
Discontinuity	The description of the continuity score. Valid values: (4,5]: The continuity of the audio data is excellent. [3,4): The continuity of the audio data is good. [2,3): The continuity of the audio data is fair. [1,2): The continuity of the audio data is poor. [0,1): The continuity of the audio data is bad.
Coloration(0-5)	The intelligibility score of the audio data. The intelligibility score decreases due to the following reasons: large reverberation, low bitrate, encoding error, and ambiguous pronunciation. Valid values: [0,5].
Coloration	The description of the intelligibility score. Valid values: (4,5]: The intelligibility of the audio data is excellent. [3,4): The intelligibility of the audio data is good. [2,3): The intelligibility of the audio data is fair. [1,2): The intelligibility of the audio data is poor. [0,1): The intelligibility of the audio data is bad.
Noisiness(0-5)	The noise score of the audio data. Valid values: [0,5]. Note The noise in the audio data includes environmental noise, such as the noise from fans and streets, background noise from low-quality devices, and residual noise caused by incomplete echo processing in the audio pickup equipment. If noise is not eliminated well during audio data processing, the noise score decreases.
Noisiness	The description of the noise score. Valid values: (4,5]: The noiselessness of the audio data is excellent. [3,4): The noiselessness of the audio data is good. [2,3): The noiselessness of the audio data is fair. [1,2): The noiselessness of the audio data is poor. [0,1): The noiselessness of the audio data is bad.
Loudness(0-5)	The loudness score of the human voice. If the human voice is clear and strong, the loudness score is high; if the human voice is hard to hear, the loudness score tends to be 0. Valid values: [0,5].
Loudness	The description of the loudness score. Valid values: (4,5]: The loudness of the human voice is excellent. [3,4): The loudness of the human voice is good. [2,3): The loudness of the human voice is fair. [1,2): The loudness of the human voice is poor. [0,1): The loudness of the human voice is bad.
Loudness(-90dB-0dB)	The average volume of the human voice. Valid values: [-90,0]. Unit: decibel. This parameter describes the volume of the human voice in decibels. In most cases, if the parameter value is less than -24, the human voice sounds low. Default value: -90.0. This value indicates that no explicit human voice is detected.
Double Talk Ratio(%)	The ratio of the duration of two-channel audio data to the duration of the audio data. Valid values: [0,100]. Unit: percentage. Note Two-channel audio data indicates that sounds exist in two channels at the same time, such as when the device leaks residual echo. This scenario may result in a low continuity score, so this parameter helps determine the possible causes of a low continuity score.
Bad Mute Ratio(%)	The percentage of abnormal mute frames. All abnormal mute frames of the audio data that does not include two-channel audio data are counted, excluding mute frames caused by cutting two-channel audio data. Valid values: [0,100]. Unit: percentage.
Saturated Ratio(%)	The percentage of the sonic boom segment to the voice segment. This parameter helps determine whether the excessive volume results in a large-scale sonic boom. Valid values: [0,100]. Unit: percentage.
Worst MOS(0-5)	The lowest MOS during the scoring process. Valid values: [0,5].
Worst Discontinuity(0-5)	The lowest continuity score during the scoring process. Valid values: [0,5].
Worst Noisiness(0-5)	The lowest noise score during the scoring process. Valid values: [0,5].
Worst Coloration(0-5)	The lowest intelligibility score during the scoring process. Valid values: [0,5].
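
The metric values in the sample are strings, and the scores are meaningful only when `Tag` is `Valid`. The following Python sketch applies both rules to decoded channel entries. The function name `usable_scores` is hypothetical, and the second entry is invented for illustration.

```python
SCORE_KEYS = ("MOS(0-5)", "Discontinuity(0-5)", "Coloration(0-5)", "Noisiness(0-5)")


def usable_scores(channel):
    """Return numeric scores for one channel, or None if Tag marks them meaningless."""
    if channel.get("Tag") != "Valid":
        return None
    # The service returns the scores as strings, so convert them to floats.
    return {key: float(channel[key]) for key in SCORE_KEYS}


channels = [
    # Values taken from channel 1 of the sample result above.
    {"Channel": "1", "Tag": "Valid", "MOS(0-5)": "1.01", "Discontinuity(0-5)": "3.52",
     "Coloration(0-5)": "0.99", "Noisiness(0-5)": "3.28"},
    # Hypothetical entry for a silent input file.
    {"Channel": "2", "Tag": "Mute", "MOS(0-5)": "0", "Discontinuity(0-5)": "0",
     "Coloration(0-5)": "0", "Noisiness(0-5)": "0"},
]

for channel in channels:
    print(channel["Channel"], usable_scores(channel))
```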

FaceBeauty

Parameter description of JobParams

Parameter	Type	Required	Description
beauty_params	STRING	No	The parameters of the FaceBeauty operation, specified as comma-separated key=value pairs. Example: "whiten=20,smooth=50,face_thin=50".

Callback format

JSON format

{
  "Code":"Success",
  "Details":[],
  "FunctionName":"FaceBeauty",
  "JobId":"39f8e0bc005e4f309379701645f4****",
  "Message":"success",
  "State":"Success",
  "Type":"IProduction"
}

Parameters

Parameter	Type	Description
skin_beauty_enable	INT	Specifies whether to enable skin polishing. Valid values: [0,1]. 0: disables skin polishing. 1: enables skin polishing. Default value: 1.
shape_beauty_enable	INT	Specifies whether to enable face shaping. Valid values: [0,1]. 0: disables face shaping. 1: enables face shaping. Default value: 1.
whiten	INT	The degree of skin whitening. The greater the value, the whiter the skin looks. Valid values: [0,100]. Default value: 20.
smooth	INT	The degree of skin smoothing. The greater the value, the smoother the skin looks. Valid values: [0,100]. Default value: 20.
detail	INT	The degree of skin granularity. The greater the value, the more fine-grained the skin is, and the more skin details exist. Valid values: [0,100]. Default value: 20.
skin_model	INT	Specifies whether to enable the skin model feature. If you enable this feature, skin whitening is valid only for sections that are detected as skin. Valid values: [0,1]. 0: disables the skin model feature. 1: enables the skin model feature. Default value: 1.
cheek_thin	FLOAT	The degree of frontal bone thinning. Valid values: [0,100]. Default value: 0.
face_cut	FLOAT	The degree of cheekbone narrowing. Valid values: [0,100]. Default value: 0.
face_thin	FLOAT	The degree of face thinning. Valid values: [0,100]. Default value: 0.
face_length	FLOAT	The degree of face length adjustment (two-way). Valid values: [-100,100]. Default value: 0.
chin_length	FLOAT	The degree of chin length adjustment (two-way). Valid values: [-100,100]. Default value: 0.
chin_thin	FLOAT	The degree of chin thinning. Valid values: [0,100]. Default value: 0.
eye_size	FLOAT	The degree of eye widening. Valid values: [0,100]. Default value: 0.
eye_corner1	FLOAT	The degree of vertical canthus adjustment (two-way). Valid values: [-100,100]. Default value: 0.
eye_distance	FLOAT	The degree of eye distance adjustment (two-way). Valid values: [-100,100]. Default value: 0.
nose_thin	FLOAT	The degree of nose slimming (two-way). Valid values: [-100,100]. Default value: 0.
nose_wing	FLOAT	The degree of nasal alar slimming (two-way). Valid values: [-100,100]. Default value: 0.
nose_length	FLOAT	The degree of nose length adjustment (two-way). Valid values: [-100,100]. Default value: 0.
mouth_size	FLOAT	The degree of mouth size adjustment (two-way). Valid values: [-100,100]. Default value: 0.
mouth_position	FLOAT	The degree of philtrum length adjustment (two-way). Valid values: [-100,100]. Default value: 0.
lip_thickness	FLOAT	The degree of lip thickness adjustment (two-way). Valid values: [-100,100]. Default value: 0.
hair_line	FLOAT	The degree of hairline adjustment (two-way). Valid values: [-100,100]. Default value: 0.
smile	FLOAT	The degree of smiling adjustment. Valid values: [0,100]. Default value: 0.
detect_mode	FLOAT	The facial detection mode. Valid values: [0,1]. 0: video mode. 1: image mode. Default value: 1. Note In video mode, multiple frames are used to trace faces to ensure more stable results.
detect_level	FLOAT	The resolution of the face detector. Smaller faces may not be detected at low resolution. Valid values: [0,2]. 0: the lowest resolution at the fastest detection speed. 1: the medium resolution at medium detection speed. 2: the highest resolution at the slowest detection speed. Default value: 1.
threshold	FLOAT	The threshold of confidence of a facial detection. Valid values: [0,1]. Default value: 0.8.
detect_interval	FLOAT	The number of frames between two consecutive facial detections in video mode. Valid values: [1,65535]. Default value: 5.
max_face_num	FLOAT	The maximum number of faces that can be detected. Valid values: [0,32]. Default value: 32.
min_face	FLOAT	The smallest width of a face. Valid values: [10,1024]. Default value: 40.
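
The `beauty_params` example uses comma-separated `key=value` pairs, and the preceding table gives a valid range for each key. As a minimal sketch, the string can be built and range-checked in Python as follows. The helper name `build_beauty_params` is hypothetical, and only a subset of the keys is included for brevity.

```python
# Valid ranges copied from the table above (subset only).
RANGES = {
    "whiten": (0, 100),
    "smooth": (0, 100),
    "face_thin": (0, 100),
    "face_length": (-100, 100),
    "threshold": (0, 1),
}


def build_beauty_params(**settings):
    """Return a beauty_params string such as "whiten=20,smooth=50"."""
    parts = []
    for name, value in settings.items():
        if name not in RANGES:
            raise KeyError(f"parameter not listed in RANGES: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be in [{low},{high}]")
        parts.append(f"{name}={value}")
    return ",".join(parts)


print(build_beauty_params(whiten=20, smooth=50, face_thin=50))
# whiten=20,smooth=50,face_thin=50
```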

Parameter description of Job

Parameter	Type	Description
Result	STRING	The detailed information about the job result. Example of success result information: `{"Code":"Success","Data":"{\"result\":[{\"file\":\"result.mp4\"}]}","Message":"Successful."}`.

SpeechDenoise

The input audio file must be in the WAV format with a sampling rate of 16,000 Hz or 48,000 Hz.

The format and sampling rate of the output audio file are the same as those of the input audio file.
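
Before you submit a SpeechDenoise job, the input requirements above can be checked locally. The following sketch uses the `wave` module from the Python standard library. The function names are hypothetical, and the `wave` module may reject some valid WAV variants (for example, non-PCM encodings), so a failure here does not prove that the service would reject the file.

```python
import io
import wave

SUPPORTED_RATES = {16000, 48000}


def check_wav(source):
    """Return the sampling rate of a WAV input, or raise ValueError."""
    try:
        # source can be a file path or a binary file-like object.
        with wave.open(source, "rb") as wav:
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable WAV file: {exc}") from exc
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sampling rate: {rate} Hz")
    return rate


def make_silent_wav(rate):
    """Build one second of 16-bit mono silence in memory, for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate)
    buffer.seek(0)
    return buffer


print(check_wav(make_silent_wav(16000)))  # 16000
```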