Queries the information about a smart tagging job.
Authorization information
The following table shows the authorization information corresponding to the API. The authorization information can be used in the Action policy element to grant a RAM user or RAM role the permissions to call this API operation. Description:
- Operation: the value that you can use in the Action element to specify the operation on a resource.
- Access level: the access level of each operation. The levels are read, write, and list.
- Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
- The required resource types are displayed in bold characters.
- If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
- Condition Key: the condition key that is defined by the cloud service.
- Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation.
Operation | Access level | Resource type | Condition key | Associated operation |
---|---|---|---|---|
ice:QuerySmarttagJob |  | *All Resources* | none |  |
Request parameters
Parameter | Type | Required | Description | Example |
---|---|---|---|---|
JobId | string | Yes | The ID of the smart tagging job that you want to query. You can obtain the job ID from the response parameters of the SubmitSmarttagJob operation. | 88c6ca184c0e47098a5b665e2**** |
Params | string | No | The extra parameters of the request. The value must be a JSON string. Example: {"labelResultType":"auto"}. The value of labelResultType is of the STRING type. | {"labelResultType":"auto"} |
Response parameters
Callback parameters
When the status of the smart tagging job changes, ApsaraVideo Media Processing (MPS) sends a message to the specified SMQ queue. For more information about how to specify an SMQ queue for receiving callbacks, see the UpdatePipeline topic. The callback message is a JSON string that contains the parameters described in the following table.
Parameter | Type | Description |
---|---|---|
Type | String | The type of the job. The fixed value is smarttag, which indicates a smart tagging job. |
JobId | String | The unique ID of the job. |
State | String | The current status of the job. The value is the same as that of the JobStatus response parameter of the QuerySmarttagJob operation. |
UserData | String | The user data that you passed when you called the SubmitSmarttagJob operation. |
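The following sketch shows how a callback message with the fields in the preceding table might be handled. The message body and its values are illustrative placeholders.

```python
import json

# Illustrative callback body delivered to the SMQ queue; the values are placeholders.
callback_body = r'''{
  "Type": "smarttag",
  "JobId": "88c6ca184c0e47098a5b665e2****",
  "State": "Success",
  "UserData": "{\"userId\":\"123432412831\"}"
}'''

message = json.loads(callback_body)
if message["Type"] == "smarttag" and message["State"] == "Success":
    # UserData is passed through unchanged from the SubmitSmarttagJob operation.
    print("Job finished:", message["JobId"], "user data:", message["UserData"])
```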
Parameters of different result types
Parameters of VideoLabel
Parameter | Type | Description |
---|---|---|
persons | JSONArray | The information about the figures identified by the smart tagging job. |
persons.name | String | The name of the identified figure. |
persons.category | String | The type of the identified figure. Valid values: celebrity, politician, sensitive, and unknown. A value of unknown indicates that the figure is identified based on the custom figure library. In this case, the ID of the custom figure is returned. |
persons.ratio | double | The appearance rate of the figure. Valid values: 0 to 1. |
persons.occurrences | JSONArray | The details of the appearances of the figure. |
persons.occurrences.score | double | The score for the confidence level. |
persons.occurrences.from | double | The point in time when the figure appears. Unit: seconds. |
persons.occurrences.to | double | The point in time when the figure disappears. Unit: seconds. |
persons.occurrences.position | JSONObject | The face coordinates of the figure. |
persons.occurrences.position.leftTop | int[] | The x and y coordinates of the upper-left corner. |
persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the lower-right corner. |
persons.occurrences.timestamp | double | The timestamp of the face coordinates. Unit: seconds. |
persons.occurrences.scene | String | The camera shot of the figure. Valid values: closeUp, medium-closeUp, medium, and medium-long. |
tags | JSONArray | The tags of the detected objects. For more information, see the following table. |
tags.mainTagName | String | The main tag. |
tags.subTagName | String | The subtag. |
tags.ratio | double | The appearance rate of the tag. Valid values: 0 to 1. |
tags.occurrences | JSONArray | The details of the appearances of the tag. |
tags.occurrences.score | double | The score for the confidence level. |
tags.occurrences.from | double | The point in time when the tag appears. Unit: seconds. |
tags.occurrences.to | double | The point in time when the tag disappears. Unit: seconds. |
classifications | JSONArray | The category of the video. |
classifications.score | double | The score for the confidence level. |
classifications.category1 | String | The level-1 category, such as daily life activity, animation, and automobile. |
classifications.category2 | String | The level-2 category, such as health and home under the level-1 category daily life activity. |
Examples of video tags
mainTagName | subTagName |
---|---|
Program | Examples: Dad Where Are We Going and Top Funny Comedian. |
Figure role | Examples: doctor, nurse, and teacher. |
Object | Examples: piano, cup, table, scrambled eggs with tomatoes, car, and cosmetics. |
TV channel logo | Examples: CCTV-1, CCTV-2, YOUKU, and Dragon TV. |
Action | Examples: dancing, kissing, hugging, meeting, singing, telephoning, horseback riding, and fighting. |
Location | Examples: Tian'anmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and America. |
Scene | Examples: bedroom, subway station, terraced field, beach, and desert. |
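The sketch below iterates over a VideoLabel result that follows the structure described in the preceding tables. In practice the result is obtained by deserializing the JSON payload of the job result; all values here are placeholders.

```python
# Illustrative VideoLabel result; in practice it is deserialized from the
# JSON payload of the job result. All values are placeholders.
video_label = {
    "persons": [
        {"name": "example-name", "category": "celebrity", "ratio": 0.35,
         "occurrences": [{"score": 0.98, "from": 1.0, "to": 5.0,
                          "position": {"leftTop": [10, 20], "rightBottom": [110, 220]},
                          "timestamp": 1.5, "scene": "closeUp"}]},
    ],
    "tags": [
        {"mainTagName": "Scene", "subTagName": "beach", "ratio": 0.2,
         "occurrences": [{"score": 0.9, "from": 12.0, "to": 18.0}]},
    ],
    "classifications": [
        {"score": 0.8, "category1": "daily life activity", "category2": "home"},
    ],
}

for person in video_label.get("persons", []):
    for occ in person.get("occurrences", []):
        print(f'{person["name"]} ({person["category"]}): {occ["from"]}s to {occ["to"]}s, {occ["scene"]}')

for tag in video_label.get("tags", []):
    print(f'{tag["mainTagName"]}/{tag["subTagName"]}: appearance rate {tag["ratio"]}')

for c in video_label.get("classifications", []):
    print(f'{c["category1"]} > {c["category2"]} (score {c["score"]})')
```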
Parameters of ImageLabel
Parameter | Type | Description |
---|---|---|
persons | JSONArray | The information about the figures identified by the smart tagging job. |
persons.name | String | The name of the identified figure. |
persons.category | String | The type of the identified figure. Valid values: celebrity, politician, and sensitive. |
persons.score | double | The score for the confidence level of the identified figure. |
persons.position | JSONObject | The face coordinates of the figure. |
persons.position.leftTop | int[] | The x and y coordinates of the upper-left corner. |
persons.position.rightBottom | int[] | The x and y coordinates of the lower-right corner. |
persons.scene | String | The camera shot of the figure. Valid values: closeUp, medium-closeUp, medium, and medium-long. |
tags | JSONArray | The tags of the detected objects. For more information, see the following table. |
tags.mainTagName | String | The main tag. |
tags.subTagName | String | The subtag. |
tags.score | double | The score for the confidence level. |
Examples of image tags
mainTagName | subTagName |
---|---|
Figure role | Examples: doctor, nurse, and teacher. |
Location | Examples: Tian'anmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and America. |
Action | Example: talking. |
TV channel logo | Examples: CCTV-1, CCTV-2, YOUKU, and Dragon TV. |
Action | Examples: dancing, kissing, hugging, meeting, singing, telephoning, horseback riding, and fighting. |
Object | Examples: piano, cup, table, scrambled eggs with tomatoes, car, and cosmetics. |
Scene | Examples: bedroom, subway station, terraced field, beach, and desert. |
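A minimal sketch of reading an ImageLabel result: unlike VideoLabel, each figure and tag carries a score and position directly instead of an occurrences array. All values are placeholders.

```python
# Illustrative ImageLabel result; all values are placeholders.
image_label = {
    "persons": [{"name": "example-name", "category": "celebrity", "score": 0.97,
                 "position": {"leftTop": [10, 20], "rightBottom": [110, 220]},
                 "scene": "closeUp"}],
    "tags": [{"mainTagName": "Object", "subTagName": "cup", "score": 0.88}],
}

for person in image_label.get("persons", []):
    box = person["position"]
    print(person["name"], person["score"], box["leftTop"], box["rightBottom"])

for tag in image_label.get("tags", []):
    print(tag["mainTagName"], tag["subTagName"], tag["score"])
```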
Parameters of TextLabel (from ASR and OCR)
Parameter | Type | Description |
---|---|---|
tags | JSONArray | The text tags. For more information, see the following table. |
tags.name | String | The type of the tag. |
tags.value | String | The values of the tag. Multiple tag values are separated by commas (,). |
Examples of text tags
name | value |
---|---|
Location | Examples: Tian'anmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and America. |
Organization | Examples: China Wildlife Conservation Association and China Media Group (CMG). |
Brand name | Examples: Nike and Li-Ning. |
Keyword | Example: backbone force. |
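Because a tag value is a single comma-separated string, the individual values can be split out as in the sketch below (placeholder values).

```python
# Illustrative TextLabel result; tag values are comma-separated in a single string.
text_label = {
    "tags": [
        {"name": "Location", "value": "Tian'anmen Square,Leshan Giant Buddha"},
        {"name": "Keyword", "value": "backbone force"},
    ]
}

for tag in text_label["tags"]:
    values = [v.strip() for v in tag["value"].split(",")]
    print(tag["name"], "->", values)
```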
Parameters of CPVLabel
- cates: the tagging category, including level-1 category, level-2 category, and level-3 category.
- entities: the properties of the tagging category, including the knowledge graph information.
- hotwords: the hotwords to which you pay attention.
- freeTags: keywords.
Parameter | Type | Example | Description |
---|---|---|---|
type | String | hmi | The type of the result. Valid values: hmi and auto. A value of hmi indicates the results of tagging by human and machine. A value of auto indicates the results of machine tagging. |
cates | JSONArray | - | The information about the category of the tagging result. |
cates.labelLevel1 | String | Tourism | The level-1 tag. |
cates.labelLevel2 | String | Tourist landscape | The level-2 tag. |
cates.label | String | "" | The name of the tag. An empty value may be returned by the algorithm. |
cates.appearanceProbability | double | 0.96 | The appearance rate of the tag. |
cates.detailInfo | JSONArray | - | - |
cates.detailInfo.score | double | 0.9 | The score for the confidence level. |
cates.detailInfo.startTime | double | 0.021 | The point in time when the object appears in the video. |
cates.detailInfo.endTime | double | 29.021 | The point in time when the object disappears in the video. |
entities | JSONArray | - | - |
entities.labelLevel1 | String | Location | The level-1 tag. |
entities.labelLevel2 | String | Landmark | The level-2 tag. |
entities.label | String | Huangguoshu Waterfall | The name of the tag. |
entities.appearanceProbability | double | 0.067 | The appearance rate of the tag. |
entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the four largest waterfalls in Asia"} | The knowledge graph information. The fields that can be contained in the knowledge graph information, such as the fields related to the intellectual property (IP) that is featured in films and television shows, music, figures, landmarks, and objects, are described in the Appendix. |
entities.detailInfo | JSONArray | - | - |
entities.detailInfo.score | double | 0.33292606472969055 | The score for the confidence level. |
entities.detailInfo.startTime | double | 6.021 | The point in time when the object appears in the video. |
entities.detailInfo.endTime | double | 8.021 | The point in time when the object disappears in the video. |
entities.detailInfo.trackData | JSONArray | - | The structured information about the tag of the object. |
entities.detailInfo.trackData.score | double | 0.32 | The score for the confidence level. |
entities.detailInfo.trackData.bbox | integer[] | 23, 43, 45, 67 | The coordinates of the object. |
entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp of the coordinates. Unit: seconds. |
hotwords | JSONArray | - | The information about the hotwords. |
hotwords.labelLevel1 | String | Hotword | The level-1 tag. |
hotwords.labelLevel2 | String | "" | The level-2 tag. |
hotwords.label | String | China Meteorological Administration | The content of the hotword. |
hotwords.appearanceProbability | double | 0.96 | The appearance rate of the hotword. |
hotwords.detailInfo | JSONArray | - | - |
hotwords.detailInfo.score | double | 1.0 | The score for the confidence level. |
hotwords.detailInfo.startTime | double | 0.021 | The point in time when the hotword appears in the video. |
hotwords.detailInfo.endTime | double | 29.021 | The point in time when the hotword disappears in the video. |
freeTags | JSONArray | - | The information about the keywords. |
freeTags.labelLevel1 | String | Keyword | The level-1 tag. |
freeTags.labelLevel2 | String | "" | The level-2 tag. |
freeTags.label | String | Central Meteorological Observatory | The content of the keyword. |
freeTags.appearanceProbability | double | 0.96 | The appearance rate of the keyword. |
freeTags.detailInfo | JSONArray | - | - |
freeTags.detailInfo.score | double | 0.9 | The score for the confidence level. |
freeTags.detailInfo.startTime | double | 0.021 | The point in time when the keyword appears in the video. |
freeTags.detailInfo.endTime | double | 29.021 | The point in time when the keyword disappears in the video. |
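The sketch below reads a CPVLabel result that follows the preceding table. Note that entities.knowledgeInfo is itself a JSON string and therefore needs a second deserialization step; all values are placeholders.

```python
import json

# Illustrative CPVLabel result; all values are placeholders.
cpv_label = {
    "type": "hmi",
    "cates": [{"labelLevel1": "Tourism", "labelLevel2": "Tourist landscape",
               "label": "", "appearanceProbability": 0.96,
               "detailInfo": [{"score": 0.9, "startTime": 0.021, "endTime": 29.021}]}],
    "entities": [{"labelLevel1": "Location", "labelLevel2": "Landmark",
                  "label": "Huangguoshu Waterfall", "appearanceProbability": 0.067,
                  "knowledgeInfo": "{\"name\": \"Huangguoshu Waterfall\"}",
                  "detailInfo": [{"score": 0.33, "startTime": 6.021, "endTime": 8.021,
                                  "trackData": [{"score": 0.32, "bbox": [23, 43, 45, 67],
                                                 "timestamp": 7.9}]}]}],
}

for cate in cpv_label.get("cates", []):
    print("cate:", cate["labelLevel1"], ">", cate["labelLevel2"], cate["appearanceProbability"])

for entity in cpv_label.get("entities", []):
    # knowledgeInfo is a JSON string, not a nested JSON object.
    knowledge = json.loads(entity["knowledgeInfo"]) if entity.get("knowledgeInfo") else {}
    print("entity:", entity["label"], "->", knowledge.get("name"))
```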
Parameters of the ASR result
Parameter | Type | Description |
---|---|---|
details | JSONArray | The details of the result. |
details.from | double | The start timestamp of the recognition. Unit: seconds. |
details.to | double | The end timestamp of the recognition. Unit: seconds. |
details.content | String | The recognized text. |
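A minimal sketch that turns an ASR result with the structure above into a time-coded transcript (placeholder values).

```python
# Illustrative ASR result; all values are placeholders.
asr_result = {
    "details": [
        {"from": 0.5, "to": 3.2, "content": "example sentence one"},
        {"from": 3.4, "to": 6.8, "content": "example sentence two"},
    ]
}

for item in asr_result["details"]:
    print(f'[{item["from"]:.1f}s - {item["to"]:.1f}s] {item["content"]}')
```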
Parameters of the OCR result
Parameter | Type | Description |
---|---|---|
details | JSONArray | The details of the result. |
details.timestamp | double | The timestamp information. Unit: seconds. |
details.info | JSONArray | The details of the recognized text at the specified timestamp. |
details.info.score | double | The score for the confidence level. |
details.info.position | JSONObject | The coordinates of the text. |
details.info.position.leftTop | int[] | The x and y coordinates of the upper-left corner. |
details.info.position.rightBottom | int[] | The x and y coordinates of the lower-right corner. |
details.info.content | String | The recognized text. |
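A minimal sketch that prints the recognized text and its bounding boxes from an OCR result with the structure above (placeholder values).

```python
# Illustrative OCR result; all values are placeholders.
ocr_result = {
    "details": [
        {"timestamp": 2.0,
         "info": [{"score": 0.95,
                   "position": {"leftTop": [12, 34], "rightBottom": [120, 64]},
                   "content": "example on-screen text"}]},
    ]
}

for frame in ocr_result["details"]:
    for text in frame["info"]:
        box = text["position"]
        print(f'{frame["timestamp"]}s: {text["content"]!r} at {box["leftTop"]}-{box["rightBottom"]}')
```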
Parameter of returned metadata
Note If you do not use tagging by human and machine and you specify the needMetaData parameter when you call the SubmitSmarttagJob operation, the original title of the video is returned in the result.
Parameter | Type | Description |
---|---|---|
title | String | The title of the video. |
Parameters of the extracted caption
Parameter | Type | Description |
---|---|---|
details | JSONArray | The details of the result. |
details.allResultUrl | String | The URL of the file that contains all captions. The URL is valid for half a year after the job is complete. |
details.chResultUrl | String | The URL of the file that contains only Chinese captions. The URL is valid for half a year after the job is complete. |
details.engResultUrl | String | The URL of the file that contains only English captions. The URL is valid for half a year after the job is complete. |
Note The content of the caption file is in the Serial number + Time range + Caption content format. Each line in the file contains a sentence.
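The caption URLs can be downloaded like any other HTTP resource before they expire. The following sketch uses only the Python standard library; the URL is a placeholder for the allResultUrl, chResultUrl, or engResultUrl value returned in the job result.

```python
import urllib.request

# Placeholder URL; use the allResultUrl, chResultUrl, or engResultUrl value
# from the job result. The URLs expire half a year after the job is complete.
caption_url = "https://example.com/captions/all-result.txt"

with urllib.request.urlopen(caption_url) as response:
    caption_text = response.read().decode("utf-8")

# Each line contains one sentence in the
# "Serial number + Time range + Caption content" format.
for line in caption_text.splitlines():
    print(line)
```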
Parameters of the NLP-based result
Parameter | Type | Description |
---|---|---|
transcription | object | The speech-to-text result. |
autoChapters | object | The chapter overview. |
summarization | object | The summary generated by the large model. |
meetingAssistance | object | The intelligent minutes. |
translation | object | The text translation result. |
Parameters of transcription
Parameter | Type | Description |
---|---|---|
transcription | object | The speech-to-text result. |
transcription.paragraphs | list[] | A list of paragraphs that contain the speech-to-text result. |
transcription.paragraphs[i].paragraphId | string | The paragraph ID. |
transcription.paragraphs[i].speakerId | string | The speaker ID. |
transcription.paragraphs[i].words | list[] | The words contained in the paragraph. |
transcription.paragraphs[i].words[i].id | int | The word ID. You do not need to pay attention to it. |
transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. The words that have the same sentence ID can be assembled into a sentence. |
transcription.paragraphs[i].words[i].start | long | The start time of the word. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
transcription.paragraphs[i].words[i].end | long | The end time of the word. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
transcription.paragraphs[i].words[i].text | string | The word. |
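Because words that share a sentenceId belong to the same sentence, a sentence-level transcript can be rebuilt by grouping the words, as in the sketch below (placeholder values; timestamps are in milliseconds).

```python
from collections import defaultdict

# Illustrative transcription result; all values are placeholders.
transcription = {
    "paragraphs": [
        {"paragraphId": "p1", "speakerId": "s1",
         "words": [
             {"id": 1, "sentenceId": 1, "start": 0, "end": 400, "text": "Hello"},
             {"id": 2, "sentenceId": 1, "start": 400, "end": 900, "text": "everyone"},
             {"id": 3, "sentenceId": 2, "start": 1200, "end": 1600, "text": "Welcome"},
         ]},
    ]
}

for paragraph in transcription["paragraphs"]:
    # Group words by sentenceId to rebuild sentences.
    sentences = defaultdict(list)
    for word in paragraph["words"]:
        sentences[word["sentenceId"]].append(word)
    for sentence_id, words in sorted(sentences.items()):
        text = " ".join(w["text"] for w in words)
        start_ms, end_ms = words[0]["start"], words[-1]["end"]
        print(f'speaker {paragraph["speakerId"]} [{start_ms}-{end_ms} ms]: {text}')
```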
Parameters of summarization
Parameter | Type | Description |
---|---|---|
summarization | object | The summary results. The results may be empty or contain summaries of different types. |
summarization.paragraphSummary | string | The summary of the full text. |
summarization.conversationalSummary | list[] | A list of summary results for a conversation. |
summarization.conversationalSummary[i].speakerId | string | The speaker ID. |
summarization.conversationalSummary[i].speakerName | string | The name of the speaker. |
summarization.conversationalSummary[i].summary | string | The summary corresponding to the speaker. |
summarization.questionsAnsweringSummary | list[] | A list of summary results for a Q&A. |
summarization.questionsAnsweringSummary[i].question | string | The question. |
summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | A list of IDs of the sentences that are generated based on the original speech corresponding to the question. |
summarization.questionsAnsweringSummary[i].answer | string | The answer to the question. |
summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | A list of IDs of the sentences that are generated based on the original speech corresponding to the answer. |
summarization.mindMapSummary | list[object] | The mind map of the summary results. The mind map may contain the summary of each topic and the relationship between topics. |
summarization.mindMapSummary[i].title | string | The title of the topic. |
summarization.mindMapSummary[i].topic | list[object] | An array that contains each topic and its subtopics. |
summarization.mindMapSummary[i].topic[i].title | string | The title of the topic. |
summarization.mindMapSummary[i].topic[i].topic | list[object] | An array that contains the subtopics of the topic. The array can be empty. |
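Because mindMapSummary nests topics inside topics, it is convenient to walk it recursively. The sketch below prints the hierarchy with indentation (placeholder values).

```python
# Illustrative mindMapSummary structure; all values are placeholders.
mind_map_summary = [
    {"title": "Project status",
     "topic": [
         {"title": "Milestones", "topic": []},
         {"title": "Risks", "topic": [{"title": "Schedule", "topic": []}]},
     ]},
]

def print_topics(topics, depth=0):
    # Each node has a title and a (possibly empty) list of subtopics.
    for node in topics:
        print("  " * depth + node["title"])
        print_topics(node.get("topic", []), depth + 1)

print_topics(mind_map_summary)
```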
Parameters of translation
Parameter | Type | Description |
---|---|---|
translation | object | The translation result. |
translation.paragraphs | list[] | A list of paragraphs that contain the translation result, which corresponds to the ASR result. |
translation.paragraphs.paragraphId | string | The paragraph ID, which corresponds to the paragraph ID in the ASR result. |
translation.paragraphs.sentences | list[] | A list of translated text sentences. |
translation.paragraphs.sentences[i].sentenctId | long | The sentence ID. |
translation.paragraphs.sentences[i].start | long | The start time of the sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
translation.paragraphs.sentences[i].end | long | The end time of the sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
translation.paragraphs.sentences[i].text | string | The translated text, which corresponds to the ASR result. |
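A minimal sketch that prints the translated sentences with their timestamps, following the structure above (placeholder values; timestamps are in milliseconds).

```python
# Illustrative translation result; all values are placeholders.
translation = {
    "paragraphs": [
        {"paragraphId": "p1",
         "sentences": [
             {"sentenctId": 1, "start": 0, "end": 2300,
              "text": "Example translated sentence."},
         ]},
    ]
}

for paragraph in translation["paragraphs"]:
    for sentence in paragraph["sentences"]:
        print(f'[{sentence["start"]}-{sentence["end"]} ms] {sentence["text"]}')
```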
Parameters of autoChapters
Parameter | Type | Description |
---|---|---|
autoChapters | list[] | The chapter overview result that may contain the overview of zero, one, or multiple chapters. |
autoChapters[i].id | int | The serial number of the chapter. |
autoChapters[i].start | long | The start time of the chapter. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
autoChapters[i].end | long | The end time of the chapter. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
autoChapters[i].headline | string | The headline of the chapter. |
autoChapters[i].summary | string | The chapter overview. |
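Because chapter boundaries are given in milliseconds, they can be converted into readable chapter markers, as in the sketch below (placeholder values).

```python
# Illustrative autoChapters result; all values are placeholders.
auto_chapters = [
    {"id": 1, "start": 0, "end": 185000, "headline": "Opening", "summary": "..."},
    {"id": 2, "start": 185000, "end": 421000, "headline": "Main discussion", "summary": "..."},
]

def fmt(ms: int) -> str:
    # Convert a millisecond offset into an HH:MM:SS marker.
    seconds = ms // 1000
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

for chapter in auto_chapters:
    print(f'{fmt(chapter["start"])} - {fmt(chapter["end"])}  {chapter["headline"]}')
```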
Parameters of meetingAssistance
Parameter | Type | Description |
---|---|---|
meetingAssistance | object | The result of the intelligent minutes, which may be empty or of different types. |
meetingAssistance.keywords | list[] | A list of extracted keywords. |
meetingAssistance.keySentences | list[] | A list of extracted key sentences. |
meetingAssistance.keySentences[i].id | long | The serial number of the key sentence. |
meetingAssistance.keySentences[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
meetingAssistance.keySentences[i].start | long | The start time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.keySentences[i].end | long | The end time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.keySentences[i].text | string | The key sentence information. |
meetingAssistance.actions | list[] | A list of to-do items. |
meetingAssistance.actions[i].id | long | The serial number of the to-do item. |
meetingAssistance.actions[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
meetingAssistance.actions[i].start | long | The start time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.actions[i].end | long | The end time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.actions[i].text | string | The content of the to-do item. |
meetingAssistance.classifications | object | The scenario type. Only three types of scenarios are supported. |
meetingAssistance.classifications.interview | float | The score for the confidence level of the interview scenario. |
meetingAssistance.classifications.lecture | float | The score for the confidence level of the presentation scenario. |
meetingAssistance.classifications.meeting | float | The score for the confidence level of the meeting scenario. |
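Because classifications reports a confidence score for each of the three supported scenarios, the most likely scenario can be picked by taking the highest score, as in the sketch below (placeholder values).

```python
# Illustrative meetingAssistance result; all values are placeholders.
meeting_assistance = {
    "keywords": ["roadmap", "budget"],
    "actions": [{"id": 1, "sentenceId": 20, "start": 60000, "end": 64000,
                 "text": "example to-do item"}],
    "classifications": {"interview": 0.05, "lecture": 0.15, "meeting": 0.80},
}

# Pick the scenario with the highest confidence score.
scenario, score = max(meeting_assistance["classifications"].items(), key=lambda kv: kv[1])
print("Most likely scenario:", scenario, "score:", score)

for action in meeting_assistance["actions"]:
    print("To-do:", action["text"])
```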
Examples
Sample success responses
JSON format
{
"JobStatus": "Success",
"RequestId": "******11-DB8D-4A9A-875B-275798******",
"UserData": "{\"userId\":\"123432412831\"}",
"Results": {
"Result": [
{
"Type": "Meta",
"Data": "{\"title\":\"example-title-****\"}\t\n"
}
]
}
}
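In the sample response, each entry in Results.Result carries its payload in the Data field as a JSON string, so it must be deserialized a second time. The sketch below also strips the trailing whitespace that appears in the sample value.

```python
import json

# The sample success response shown above.
response = json.loads(r'''{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {"Result": [{"Type": "Meta", "Data": "{\"title\":\"example-title-****\"}\t\n"}]}
}''')

if response["JobStatus"] == "Success":
    for result in response["Results"]["Result"]:
        # Data is a JSON string; strip trailing whitespace before parsing it.
        data = json.loads(result["Data"].strip())
        print(result["Type"], "->", data)
```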
Error codes
For a list of error codes, see Service error codes.
Change history
Change time | Summary of changes | Operation |
---|---|---|
2022-08-25 | Add Operation | View Change Details |