Queries the information about a smart tagging job.
Authorization information
The following table shows the authorization information corresponding to the API. The authorization information can be used in the Action policy element to grant a RAM user or RAM role the permissions to call this API operation. Description:
- Operation: the value that you can use in the Action element to specify the operation on a resource.
- Access level: the access level of each operation. The levels are read, write, and list.
- Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
- The required resource types are displayed in bold characters.
- If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
- Condition Key: the condition key that is defined by the cloud service.
- Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation.
Operation | Access level | Resource type | Condition key | Associated operation |
---|---|---|---|---|
ice:QuerySmarttagJob |  | *All Resources* | none |  |
Request parameters
Parameter | Type | Required | Description | Example |
---|---|---|---|---|
JobId | string | Yes | The ID of the smart tagging job that you want to query. You can obtain the job ID from the response parameters of the SubmitSmarttagJob operation. | 88c6ca184c0e47098a5b665e2**** |
Params | string | No | The extra parameters of the request. The value must be a JSON string. Example: {"labelResultType":"auto"}. The value of labelResultType is of the STRING type. | {"labelResultType":"auto"} |
Response parameters
Callback parameters
When the status of the smart tagging job changes, ApsaraVideo Media Processing (MPS) sends a message to the specified SMQ queue. For more information about how to specify an SMQ queue for receiving callbacks, see the UpdatePipeline topic. The callback message is a JSON string that contains the parameters described in the following table.
Parameter | Type | Description |
---|---|---|
Type | String | The type of the job. The fixed value is smarttag, which indicates a smart tagging job. |
JobId | String | The unique ID of the job. |
State | String | The current status of the job. The value is the same as that of the JobStatus response parameter of the QuerySmarttagJob operation. |
UserData | String | The user data that you passed when you called the SubmitSmarttagJob operation. |
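The following sketch shows how a callback message with the fields in the preceding table might be handled. The message body and its values are illustrative placeholders.

```python
import json

# Illustrative callback body delivered to the SMQ queue; the values are placeholders.
callback_body = r'''{
  "Type": "smarttag",
  "JobId": "88c6ca184c0e47098a5b665e2****",
  "State": "Success",
  "UserData": "{\"userId\":\"123432412831\"}"
}'''

message = json.loads(callback_body)
if message["Type"] == "smarttag" and message["State"] == "Success":
    # UserData is passed through unchanged from the SubmitSmarttagJob operation.
    print("Job finished:", message["JobId"], "user data:", message["UserData"])
```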
Parameters of different result types
Parameters of VideoLabel
Parameter | Type | Description |
---|---|---|
persons | JSONArray | The information about the figures identified by the smart tagging job. |
persons.name | String | The name of the identified figure. |
persons.category | String | The type of the identified figure. Valid values: celebrity, politician, sensitive, and unknown. A value of unknown indicates that the figure is identified based on the custom figure library. In this case, the ID of the custom figure is returned. |
persons.ratio | double | The appearance rate of the figure. Valid values: 0 to 1. |
persons.occurrences | JSONArray | The details of the appearances of the figure. |
persons.occurrences.score | double | The score for the confidence level. |
persons.occurrences.from | double | The point in time when the figure appears. Unit: seconds. |
persons.occurrences.to | double | The point in time when the figure disappears. Unit: seconds. |
persons.occurrences.position | JSONObject | The face coordinates of the figure. |
persons.occurrences.position.leftTop | int[] | The x and y coordinates of the upper-left corner. |
persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the lower-right corner. |
persons.occurrences.timestamp | double | The timestamp of the face coordinates. Unit: seconds. |
persons.occurrences.scene | String | The camera shot of the figure. Valid values: closeUp, medium-closeUp, medium, and medium-long. |
tags | JSONArray | The tags of the detected objects. For more information, see the following table. |
tags.mainTagName | String | The main tag. |
tags.subTagName | String | The subtag. |
tags.ratio | double | The appearance rate of the tag. Valid values: 0 to 1. |
tags.occurrences | JSONArray | The details of the appearances of the tag. |
tags.occurrences.score | double | The score for the confidence level. |
tags.occurrences.from | double | The point in time when the tag appears. Unit: seconds. |
tags.occurrences.to | double | The point in time when the tag disappears. Unit: seconds. |
classifications | JSONArray | The category of the video. |
classifications.score | double | The score for the confidence level. |
classifications.category1 | String | The level-1 category, such as daily life activity, animation, and automobile. |
classifications.category2 | String | The level-2 category, such as health and home under the level-1 category daily life activity. |
Examples of video tags
mainTagName | subTagName |
---|---|
Program | Examples: Dad Where Are We Going and Top Funny Comedian. |
Figure role | Examples: doctor, nurse, and teacher. |
Object | Examples: piano, cup, table, scrambled eggs with tomatoes, car, and cosmetics. |
TV channel logo | Examples: CCTV-1, CCTV-2, YOUKU, and Dragon TV. |
Action | Examples: dancing, kissing, hugging, meeting, singing, telephoning, horseback riding, and fighting. |
Location | Examples: Tian'anmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and America. |
Scene | Examples: bedroom, subway station, terraced field, beach, and desert. |
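The sketch below iterates over a VideoLabel result that follows the structure described in the preceding tables. In practice the result is obtained by deserializing the JSON payload of the job result; all values here are placeholders.

```python
# Illustrative VideoLabel result; in practice it is deserialized from the
# JSON payload of the job result. All values are placeholders.
video_label = {
    "persons": [
        {"name": "example-name", "category": "celebrity", "ratio": 0.35,
         "occurrences": [{"score": 0.98, "from": 1.0, "to": 5.0,
                          "position": {"leftTop": [10, 20], "rightBottom": [110, 220]},
                          "timestamp": 1.5, "scene": "closeUp"}]},
    ],
    "tags": [
        {"mainTagName": "Scene", "subTagName": "beach", "ratio": 0.2,
         "occurrences": [{"score": 0.9, "from": 12.0, "to": 18.0}]},
    ],
    "classifications": [
        {"score": 0.8, "category1": "daily life activity", "category2": "home"},
    ],
}

for person in video_label.get("persons", []):
    for occ in person.get("occurrences", []):
        print(f'{person["name"]} ({person["category"]}): {occ["from"]}s to {occ["to"]}s, {occ["scene"]}')

for tag in video_label.get("tags", []):
    print(f'{tag["mainTagName"]}/{tag["subTagName"]}: appearance rate {tag["ratio"]}')

for c in video_label.get("classifications", []):
    print(f'{c["category1"]} > {c["category2"]} (score {c["score"]})')
```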
Parameters of ImageLabel
Parameter | Type | Description |
---|---|---|
persons | JSONArray | The information about the figures identified by the smart tagging job. |
persons.name | String | The name of the identified figure. |
persons.category | String | The type of the identified figure. Valid values: celebrity, politician, and sensitive. |
persons.score | double | The score for the confidence level of the identified figure. |
persons.position | JSONObject | The face coordinates of the figure. |
persons.position.leftTop | int[] | The x and y coordinates of the upper-left corner. |
persons.position.rightBottom | int[] | The x and y coordinates of the lower-right corner. |
persons.scene | String | The camera shot of the figure. Valid values: closeUp, medium-closeUp, medium, and medium-long. |
tags | JSONArray | The tags of the detected objects. For more information, see the following table. |
tags.mainTagName | String | The main tag. |
tags.subTagName | String | The subtag. |
tags.score | double | The score for the confidence level. |
Examples of image tags
mainTagName | subTagName |
---|---|
Figure role | Examples: doctor, nurse, and teacher. |
Location | Examples: Tian'anmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and America. |
Action | Example: talking. |
TV channel logo | Examples: CCTV-1, CCTV-2, YOUKU, and Dragon TV. |
Action | Examples: dancing, kissing, hugging, meeting, singing, telephoning, horseback riding, and fighting. |
Object | Examples: piano, cup, table, scrambled eggs with tomatoes, car, and cosmetics. |
Scene | Examples: bedroom, subway station, terraced field, beach, and desert. |
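A minimal sketch of reading an ImageLabel result: unlike VideoLabel, each figure and tag carries a score and position directly instead of an occurrences array. All values are placeholders.

```python
# Illustrative ImageLabel result; all values are placeholders.
image_label = {
    "persons": [{"name": "example-name", "category": "celebrity", "score": 0.97,
                 "position": {"leftTop": [10, 20], "rightBottom": [110, 220]},
                 "scene": "closeUp"}],
    "tags": [{"mainTagName": "Object", "subTagName": "cup", "score": 0.88}],
}

for person in image_label.get("persons", []):
    box = person["position"]
    print(person["name"], person["score"], box["leftTop"], box["rightBottom"])

for tag in image_label.get("tags", []):
    print(tag["mainTagName"], tag["subTagName"], tag["score"])
```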
Parameters of TextLabel (from ASR and OCR)
Parameter | Type | Description |
---|---|---|
tags | JSONArray | The text tags. For more information, see the following table. |
tags.name | String | The type of the tag. |
tags.value | String | The values of the tag. Multiple tag values are separated by commas (,). |
Examples of text tags
name | value |
---|---|
Location | Examples: Tian'anmen Square, the Statue of Liberty, Leshan Giant Buddha, China, and America. |
Organization | Examples: China Wildlife Conservation Association and China Media Group (CMG). |
Brand name | Examples: Nike and Li-Ning. |
Keyword | Example: backbone force. |
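Because a tag value is a single comma-separated string, the individual values can be split out as in the sketch below (placeholder values).

```python
# Illustrative TextLabel result; tag values are comma-separated in a single string.
text_label = {
    "tags": [
        {"name": "Location", "value": "Tian'anmen Square,Leshan Giant Buddha"},
        {"name": "Keyword", "value": "backbone force"},
    ]
}

for tag in text_label["tags"]:
    values = [v.strip() for v in tag["value"].split(",")]
    print(tag["name"], "->", values)
```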
Parameters of CPVLabel
- cates: the tagging category, including level-1 category, level-2 category, and level-3 category.
- entities: the properties of the tagging category, including the knowledge graph information.
- hotwords: the hotwords to which you pay attention.
- freeTags: keywords.
Parameter | Type | Example | Description |
---|---|---|---|
type | String | hmi | The type of the result. Valid values: hmi and auto. A value of hmi indicates the results of tagging by human and machine. A value of auto indicates the results of machine tagging. |
cates | JSONArray | - | The information about the category of the tagging result. |
cates.labelLevel1 | String | Tourism | The level-1 tag. |
cates.labelLevel2 | String | Tourist landscape | The level-2 tag. |
cates.label | String | "" | The name of the tag. An empty value may be returned by the algorithm. |
cates.appearanceProbability | double | 0.96 | The appearance rate of the tag. |
cates.detailInfo | JSONArray | - | - |
cates.detailInfo.score | double | 0.9 | The score for the confidence level. |
cates.detailInfo.startTime | double | 0.021 | The point in time when the object appears in the video. |
cates.detailInfo.endTime | double | 29.021 | The point in time when the object disappears in the video. |
entities | JSONArray | - | - |
entities.labelLevel1 | String | Location | The level-1 tag. |
entities.labelLevel2 | String | Landmark | The level-2 tag. |
entities.label | String | Huangguoshu Waterfall | The name of the tag. |
entities.appearanceProbability | double | 0.067 | The appearance rate of the tag. |
entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the four largest waterfalls in Asia"} | The knowledge graph information. The fields that can be contained in the knowledge graph information, such as the fields related to the intellectual property (IP) that is featured in films and television shows, music, figures, landmarks, and objects, are described in the Appendix. |
entities.detailInfo | JSONArray | - | - |
entities.detailInfo.score | double | 0.33292606472969055 | The score for the confidence level. |
entities.detailInfo.startTime | double | 6.021 | The point in time when the object appears in the video. |
entities.detailInfo.endTime | double | 8.021 | The point in time when the object disappears in the video. |
entities.detailInfo.trackData | JSONArray | - | The structured information about the tag of the object. |
entities.detailInfo.trackData.score | double | 0.32 | The score for the confidence level. |
entities.detailInfo.trackData.bbox | integer[] | 23, 43, 45, 67 | The coordinates of the object. |
entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp of the coordinates. Unit: seconds. |
hotwords | JSONArray | - | The information about the hotwords. |
hotwords.labelLevel1 | String | Hotword | The level-1 tag. |
hotwords.labelLevel2 | String | "" | The level-2 tag. |
hotwords.label | String | China Meteorological Administration | The content of the hotword. |
hotwords.appearanceProbability | double | 0.96 | The appearance rate of the hotword. |
hotwords.detailInfo | JSONArray | - | - |
hotwords.detailInfo.score | double | 1.0 | The score for the confidence level. |
hotwords.detailInfo.startTime | double | 0.021 | The point in time when the hotword appears in the video. |
hotwords.detailInfo.endTime | double | 29.021 | The point in time when the hotword disappears in the video. |
freeTags | JSONArray | - | The information about the keywords. |
freeTags.labelLevel1 | String | Keyword | The level-1 tag. |
freeTags.labelLevel2 | String | "" | The level-2 tag. |
freeTags.label | String | Central Meteorological Observatory | The content of the keyword. |
freeTags.appearanceProbability | double | 0.96 | The appearance rate of the keyword. |
freeTags.detailInfo | JSONArray | - | - |
freeTags.detailInfo.score | double | 0.9 | The score for the confidence level. |
freeTags.detailInfo.startTime | double | 0.021 | The point in time when the keyword appears in the video. |
freeTags.detailInfo.endTime | double | 29.021 | The point in time when the keyword disappears in the video. |
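The sketch below reads a CPVLabel result that follows the preceding table. Note that entities.knowledgeInfo is itself a JSON string and therefore needs a second deserialization step; all values are placeholders.

```python
import json

# Illustrative CPVLabel result; all values are placeholders.
cpv_label = {
    "type": "hmi",
    "cates": [{"labelLevel1": "Tourism", "labelLevel2": "Tourist landscape",
               "label": "", "appearanceProbability": 0.96,
               "detailInfo": [{"score": 0.9, "startTime": 0.021, "endTime": 29.021}]}],
    "entities": [{"labelLevel1": "Location", "labelLevel2": "Landmark",
                  "label": "Huangguoshu Waterfall", "appearanceProbability": 0.067,
                  "knowledgeInfo": "{\"name\": \"Huangguoshu Waterfall\"}",
                  "detailInfo": [{"score": 0.33, "startTime": 6.021, "endTime": 8.021,
                                  "trackData": [{"score": 0.32, "bbox": [23, 43, 45, 67],
                                                 "timestamp": 7.9}]}]}],
}

for cate in cpv_label.get("cates", []):
    print("cate:", cate["labelLevel1"], ">", cate["labelLevel2"], cate["appearanceProbability"])

for entity in cpv_label.get("entities", []):
    # knowledgeInfo is a JSON string, not a nested JSON object.
    knowledge = json.loads(entity["knowledgeInfo"]) if entity.get("knowledgeInfo") else {}
    print("entity:", entity["label"], "->", knowledge.get("name"))
```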
Parameters of the ASR result
Parameter | Type | Description |
---|---|---|
details | JSONArray | The details of the result. |
details.from | double | The start timestamp of the recognition. Unit: seconds. |
details.to | double | The end timestamp of the recognition. Unit: seconds. |
details.content | String | The recognized text. |
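A minimal sketch that turns an ASR result with the structure above into a time-coded transcript (placeholder values).

```python
# Illustrative ASR result; all values are placeholders.
asr_result = {
    "details": [
        {"from": 0.5, "to": 3.2, "content": "example sentence one"},
        {"from": 3.4, "to": 6.8, "content": "example sentence two"},
    ]
}

for item in asr_result["details"]:
    print(f'[{item["from"]:.1f}s - {item["to"]:.1f}s] {item["content"]}')
```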
Parameters of the OCR result
Parameter | Type | Description |
---|---|---|
details | JSONArray | The details of the result. |
details.timestamp | double | The timestamp information. Unit: seconds. |
details.info | JSONArray | The details of the recognized text at the specified timestamp. |
details.info.score | double | The score for the confidence level. |
details.info.position | JSONObject | The coordinates of the text. |
details.info.position.leftTop | int[] | The x and y coordinates of the upper-left corner. |
details.info.position.rightBottom | int[] | The x and y coordinates of the lower-right corner. |
details.info.content | String | The recognized text. |
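A minimal sketch that prints the recognized text and its bounding boxes from an OCR result with the structure above (placeholder values).

```python
# Illustrative OCR result; all values are placeholders.
ocr_result = {
    "details": [
        {"timestamp": 2.0,
         "info": [{"score": 0.95,
                   "position": {"leftTop": [12, 34], "rightBottom": [120, 64]},
                   "content": "example on-screen text"}]},
    ]
}

for frame in ocr_result["details"]:
    for text in frame["info"]:
        box = text["position"]
        print(f'{frame["timestamp"]}s: {text["content"]!r} at {box["leftTop"]}-{box["rightBottom"]}')
```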
Parameter of returned metadata
Note If you do not use tagging by human and machine and you specify the needMetaData parameter when you call the SubmitSmarttagJob operation, the original title of the video is returned in the result.
Parameter | Type | Description |
---|---|---|
title | String | The title of the video. |
Parameters of the extracted caption
Parameter | Type | Description |
---|---|---|
details | JSONArray | The details of the result. |
details.allResultUrl | String | The URL of the file that contains all captions. The URL is valid for half a year after the job is complete. |
details.chResultUrl | String | The URL of the file that contains only Chinese captions. The URL is valid for half a year after the job is complete. |
details.engResultUrl | String | The URL of the file that contains only English captions. The URL is valid for half a year after the job is complete. |
Note The content of the caption file is in the Serial number + Time range + Caption content format. Each line in the file contains a sentence.
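The caption URLs can be downloaded like any other HTTP resource before they expire. The following sketch uses only the Python standard library; the URL is a placeholder for the allResultUrl, chResultUrl, or engResultUrl value returned in the job result.

```python
import urllib.request

# Placeholder URL; use the allResultUrl, chResultUrl, or engResultUrl value
# from the job result. The URLs expire half a year after the job is complete.
caption_url = "https://example.com/captions/all-result.txt"

with urllib.request.urlopen(caption_url) as response:
    caption_text = response.read().decode("utf-8")

# Each line contains one sentence in the
# "Serial number + Time range + Caption content" format.
for line in caption_text.splitlines():
    print(line)
```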
Parameters of the NLP-based result
Parameter | Type | Description |
---|---|---|
transcription | object | The speech-to-text result. |
autoChapters | object | The chapter overview. |
summarization | object | The summary generated by the large model. |
meetingAssistance | object | The intelligent minutes. |
translation | object | The text translation result. |
Parameters of transcription
Parameter | Type | Description |
---|---|---|
transcription | object | The speech-to-text result. |
transcription.paragraphs | list[] | A list of paragraphs that contain the speech-to-text result. |
transcription.paragraphs[i].paragraphId | string | The paragraph ID. |
transcription.paragraphs[i].speakerId | string | The speaker ID. |
transcription.paragraphs[i].words | list[] | The words contained in the paragraph. |
transcription.paragraphs[i].words[i].id | int | The word ID. You do not need to pay attention to it. |
transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. The words that have the same sentence ID can be assembled into a sentence. |
transcription.paragraphs[i].words[i].start | long | The start time of the word. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
transcription.paragraphs[i].words[i].end | long | The end time of the word. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
transcription.paragraphs[i].words[i].text | string | The word. |
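Because words that share a sentenceId belong to the same sentence, a sentence-level transcript can be rebuilt by grouping the words, as in the sketch below (placeholder values; timestamps are in milliseconds).

```python
from collections import defaultdict

# Illustrative transcription result; all values are placeholders.
transcription = {
    "paragraphs": [
        {"paragraphId": "p1", "speakerId": "s1",
         "words": [
             {"id": 1, "sentenceId": 1, "start": 0, "end": 400, "text": "Hello"},
             {"id": 2, "sentenceId": 1, "start": 400, "end": 900, "text": "everyone"},
             {"id": 3, "sentenceId": 2, "start": 1200, "end": 1600, "text": "Welcome"},
         ]},
    ]
}

for paragraph in transcription["paragraphs"]:
    # Group words by sentenceId to rebuild sentences.
    sentences = defaultdict(list)
    for word in paragraph["words"]:
        sentences[word["sentenceId"]].append(word)
    for sentence_id, words in sorted(sentences.items()):
        text = " ".join(w["text"] for w in words)
        start_ms, end_ms = words[0]["start"], words[-1]["end"]
        print(f'speaker {paragraph["speakerId"]} [{start_ms}-{end_ms} ms]: {text}')
```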
Parameters of summarization
Parameter | Type | Description |
---|---|---|
summarization | object | The summary results. The results may be empty or contain summaries of different types. |
summarization.paragraphSummary | string | The summary of the full text. |
summarization.conversationalSummary | list[] | A list of summary results for a conversation. |
summarization.conversationalSummary[i].speakerId | string | The speaker ID. |
summarization.conversationalSummary[i].speakerName | string | The name of the speaker. |
summarization.conversationalSummary[i].summary | string | The summary corresponding to the speaker. |
summarization.questionsAnsweringSummary | list[] | A list of summary results for a Q&A. |
summarization.questionsAnsweringSummary[i].question | string | The question. |
summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | A list of IDs of the sentences that are generated based on the original speech corresponding to the question. |
summarization.questionsAnsweringSummary[i].answer | string | The answer to the question. |
summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | A list of IDs of the sentences that are generated based on the original speech corresponding to the answer. |
summarization.mindMapSummary | list[object] | The mind map of the summary results. The mind map may contain the summary of each topic and the relationship between topics. |
summarization.mindMapSummary[i].title | string | The title of the topic. |
summarization.mindMapSummary[i].topic | list[object] | An array that contains each topic and its subtopics. |
summarization.mindMapSummary[i].topic[i].title | string | The title of the topic. |
summarization.mindMapSummary[i].topic[i].topic | list[object] | An array that contains the subtopics of the topic. The array can be empty. |
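Because mindMapSummary nests topics inside topics, it is convenient to walk it recursively. The sketch below prints the hierarchy with indentation (placeholder values).

```python
# Illustrative mindMapSummary structure; all values are placeholders.
mind_map_summary = [
    {"title": "Project status",
     "topic": [
         {"title": "Milestones", "topic": []},
         {"title": "Risks", "topic": [{"title": "Schedule", "topic": []}]},
     ]},
]

def print_topics(topics, depth=0):
    # Each node has a title and a (possibly empty) list of subtopics.
    for node in topics:
        print("  " * depth + node["title"])
        print_topics(node.get("topic", []), depth + 1)

print_topics(mind_map_summary)
```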
Parameters of translation
Parameter | Type | Description |
---|---|---|
translation | object | The translation result. |
translation.paragraphs | list[] | A list of paragraphs that contain the translation result, which corresponds to the ASR result. |
translation.paragraphs.paragraphId | string | The paragraph ID, which corresponds to the paragraph ID in the ASR result. |
translation.paragraphs.sentences | list[] | A list of translated text sentences. |
translation.paragraphs.sentences[i].sentenctId | long | The sentence ID. |
translation.paragraphs.sentences[i].start | long | The start time of the sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
translation.paragraphs.sentences[i].end | long | The end time of the sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
translation.paragraphs.sentences[i].text | string | The translated text, which corresponds to the ASR result. |
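A minimal sketch that prints the translated sentences with their timestamps, following the structure above (placeholder values; timestamps are in milliseconds).

```python
# Illustrative translation result; all values are placeholders.
translation = {
    "paragraphs": [
        {"paragraphId": "p1",
         "sentences": [
             {"sentenctId": 1, "start": 0, "end": 2300,
              "text": "Example translated sentence."},
         ]},
    ]
}

for paragraph in translation["paragraphs"]:
    for sentence in paragraph["sentences"]:
        print(f'[{sentence["start"]}-{sentence["end"]} ms] {sentence["text"]}')
```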
Parameters of autoChapters
Parameter | Type | Description |
---|---|---|
autoChapters | list[] | The chapter overview result that may contain the overview of zero, one, or multiple chapters. |
autoChapters[i].id | int | The serial number of the chapter. |
autoChapters[i].start | long | The start time of the chapter. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
autoChapters[i].end | long | The end time of the chapter. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
autoChapters[i].headline | string | The headline of the chapter. |
autoChapters[i].summary | string | The chapter overview. |
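Because chapter boundaries are given in milliseconds, they can be converted into readable chapter markers, as in the sketch below (placeholder values).

```python
# Illustrative autoChapters result; all values are placeholders.
auto_chapters = [
    {"id": 1, "start": 0, "end": 185000, "headline": "Opening", "summary": "..."},
    {"id": 2, "start": 185000, "end": 421000, "headline": "Main discussion", "summary": "..."},
]

def fmt(ms: int) -> str:
    # Convert a millisecond offset into an HH:MM:SS marker.
    seconds = ms // 1000
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

for chapter in auto_chapters:
    print(f'{fmt(chapter["start"])} - {fmt(chapter["end"])}  {chapter["headline"]}')
```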
Parameters of meetingAssistance
Parameter | Type | Description |
---|---|---|
meetingAssistance | object | The result of the intelligent minutes, which may be empty or of different types. |
meetingAssistance.keywords | list[] | A list of extracted keywords. |
meetingAssistance.keySentences | list[] | A list of extracted key sentences. |
meetingAssistance.keySentences[i].id | long | The serial number of the key sentence. |
meetingAssistance.keySentences[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
meetingAssistance.keySentences[i].start | long | The start time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.keySentences[i].end | long | The end time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.keySentences[i].text | string | The key sentence information. |
meetingAssistance.actions | list[] | A list of to-do items. |
meetingAssistance.actions[i].id | long | The serial number of the to-do item. |
meetingAssistance.actions[i].sentenceId | long | The ID of the key sentence, which corresponds to the sentence ID in the original ASR result. |
meetingAssistance.actions[i].start | long | The start time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.actions[i].end | long | The end time of the key sentence. The value is a timestamp representing the number of milliseconds that have elapsed since the audio starts. |
meetingAssistance.actions[i].text | string | The content of the to-do item. |
meetingAssistance.classifications | object | The scenario type. Only three types of scenarios are supported. |
meetingAssistance.classifications.interview | float | The score for the confidence level of the interview scenario. |
meetingAssistance.classifications.lecture | float | The score for the confidence level of the presentation scenario. |
meetingAssistance.classifications.meeting | float | The score for the confidence level of the meeting scenario. |
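Because classifications reports a confidence score for each of the three supported scenarios, the most likely scenario can be picked by taking the highest score, as in the sketch below (placeholder values).

```python
# Illustrative meetingAssistance result; all values are placeholders.
meeting_assistance = {
    "keywords": ["roadmap", "budget"],
    "actions": [{"id": 1, "sentenceId": 20, "start": 60000, "end": 64000,
                 "text": "example to-do item"}],
    "classifications": {"interview": 0.05, "lecture": 0.15, "meeting": 0.80},
}

# Pick the scenario with the highest confidence score.
scenario, score = max(meeting_assistance["classifications"].items(), key=lambda kv: kv[1])
print("Most likely scenario:", scenario, "score:", score)

for action in meeting_assistance["actions"]:
    print("To-do:", action["text"])
```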
Examples
Sample success responses
JSON format
{
"JobStatus": "Success",
"RequestId": "******11-DB8D-4A9A-875B-275798******",
"UserData": "{\"userId\":\"123432412831\"}",
"Results": {
"Result": [
{
"Type": "Meta",
"Data": "{\"title\":\"example-title-****\"}\t\n"
}
]
}
}
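In the sample response, each entry in Results.Result carries its payload in the Data field as a JSON string, so it must be deserialized a second time. The sketch below also strips the trailing whitespace that appears in the sample value.

```python
import json

# The sample success response shown above.
response = json.loads(r'''{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {"Result": [{"Type": "Meta", "Data": "{\"title\":\"example-title-****\"}\t\n"}]}
}''')

if response["JobStatus"] == "Success":
    for result in response["Results"]["Result"]:
        # Data is a JSON string; strip trailing whitespace before parsing it.
        data = json.loads(result["Data"].strip())
        print(result["Type"], "->", data)
```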
Error codes
For a list of error codes, see Service error codes.
Change history
Change time | Summary of changes | Operation |
---|---|---|
2022-08-25 | Add Operation | View Change Details |