Intelligent Media Management (IMM) uses a semantic vector retrieval model to retrieve media data based on semantics. This topic describes how to use semantic retrieval.
Feature description
Traditional scalar retrieval relies on metadata attributes, such as the file name, creation time, and format, to retrieve information. Semantic retrieval instead uses vector retrieval to find information based on the meaning of the content, with queries such as "a bird's-eye view of forests", "snow city", and "grassland of last summertime". You can use the semantic retrieval feature to retrieve data stored in Object Storage Service (OSS) and Photo and Drive Service.
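To make the difference from scalar retrieval concrete, the following is a minimal sketch of the general idea behind vector retrieval: files and queries are represented as vectors, and files are ranked by similarity to the query. It does not describe the model or index that IMM actually uses. The three-dimensional embeddings, the file names, and the rank_files helper are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings. Real embeddings come from a model and have
# far more dimensions; these values are made up for illustration.
FILE_EMBEDDINGS: Dict[str, List[float]] = {
    "forest_aerial.jpg": [0.9, 0.1, 0.0],
    "snow_city.jpg": [0.1, 0.9, 0.1],
    "grassland_summer.jpg": [0.2, 0.1, 0.9],
}


def rank_files(
    query_embedding: List[float],
    file_embeddings: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Rank files by similarity to the query, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, vector))
        for name, vector in file_embeddings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# A made-up embedding for a query like "a bird's-eye view of forests".
ranking = rank_files([0.8, 0.2, 0.1], FILE_EMBEDDINGS)
print(ranking[0][0])
```

The query never has to share a file name or attribute value with the matching file. Only the closeness of the vectors matters, which is why descriptive phrases work as queries.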
Scenarios
Work file search
Semantic retrieval allows you to search for work files based on semantic content or keywords, such as "ERP system instructions", "device repair process", and "business operations analysis for 2024". This makes files faster to find and improves work efficiency.
Multimedia retrieval
Semantic retrieval allows you to integrate fast and efficient media search into your multimedia network applications. For example, you can implement semantic retrieval in an image-intensive social network application so that users can search for images by semantic content such as "suburb outing in the last spring", "Chinese New Year reunion", and "my oceanic experiences". This makes image search more convenient and more fun.
Online storage
Many online storage services provide scalar file search on attributes such as the file name, creation time, and extension. Semantic retrieval allows you to efficiently retrieve specific types of data, such as documents and images, based on semantic content.
Surveillance video retrieval
Semantic retrieval allows surveillance videos to be searched for and retrieved based on semantic keywords, such as "outdoor snow surveillance video yesterday" and "orchard on sunny days."
Limits
Semantic retrieval supports only images and documents.
Semantic retrieval is available only in the China (Beijing) region.
Semantic retrieval supports the following image formats: JPG, PNG, BMP, GIF, WebP, TIFF, HEIC, and AVIF.
An image can be up to 20 MB in size. Neither its width nor its height can exceed 30,000 pixels, and its total number of pixels cannot exceed 250 million. The total number of pixels of a dynamic image, such as a GIF image, is calculated by using the following formula: Width × Height × Number of frames. The total number of pixels of a static image, such as a PNG image, is calculated by using the following formula: Width × Height.
A document can contain up to 300,000 characters. Characters beyond this limit are truncated.
Data indexing and analysis are performed asynchronously. When you call an indexing API operation such as IndexFileMeta to create an index, you need to use a callback to check whether data analysis is complete. Data analysis takes from several seconds to several minutes, depending on the data type, size, and analysis complexity. After data analysis is complete, the storage engine creates an index, which takes several seconds. After the index is created, you can search for the data by using semantic retrieval.
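The image and document limits above can be checked on the client before you index a file. The following is a minimal sketch, not part of the IMM SDK. The helper names are hypothetical, 20 MB is assumed to mean 20 × 1024 × 1024 bytes, and the format is matched by a simple name comparison.

```python
from typing import List

# Limits taken from this topic. The byte value of "20 MB" is an assumption.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
MAX_SIDE_PIXELS = 30_000
MAX_TOTAL_PIXELS = 250_000_000
MAX_DOCUMENT_CHARS = 300_000
SUPPORTED_IMAGE_FORMATS = {"jpg", "png", "bmp", "gif", "webp", "tiff", "heic", "avif"}


def total_pixels(width: int, height: int, frames: int = 1) -> int:
    """Width x Height for static images, Width x Height x Frames for dynamic ones."""
    return width * height * frames


def image_limit_violations(
    size_bytes: int, width: int, height: int, frames: int = 1, fmt: str = "jpg"
) -> List[str]:
    """Return a list of violated limits. An empty list means the image fits."""
    problems = []
    if fmt.lower() not in SUPPORTED_IMAGE_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file is larger than 20 MB")
    if width > MAX_SIDE_PIXELS or height > MAX_SIDE_PIXELS:
        problems.append("width or height exceeds 30,000 pixels")
    if total_pixels(width, height, frames) > MAX_TOTAL_PIXELS:
        problems.append("total number of pixels exceeds 250 million")
    return problems


def truncate_document_text(text: str) -> str:
    """Mimic the documented behavior: characters beyond the limit are dropped."""
    return text[:MAX_DOCUMENT_CHARS]


# A 5,000 x 5,000 GIF with 20 frames has 500 million pixels in total,
# so it exceeds the limit even though each frame is small.
print(image_limit_violations(5 * 1024 * 1024, 5000, 5000, frames=20, fmt="gif"))
```

The GIF case shows why the frame count matters: a single frame of that image is well within the limit, but the product of width, height, and frame count is not.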
Prerequisites
An index is created based on the metadata in your application scenario. For more information, see Create a metadata index.
The dataset uses the Official:CognitionImageManagement workflow template for semantic retrieval of images or the Official:DocumentManagement workflow template for semantic retrieval of documents.
If you no longer require semantic retrieval, we recommend that you delete the dataset. A dataset automatically and continuously extracts metadata from OSS, which incurs API call fees.
Billing
IMM billing: Semantic retrieval generates fees related to metadata management. For more information, see Billable items.
To use semantic retrieval, you must select the Official:CognitionImageManagement or Official:DocumentManagement workflow template. The Official:CognitionImageManagement workflow template is required for semantic retrieval of images, and the Official:DocumentManagement workflow template is required for semantic retrieval of documents. The two workflow templates contain many operators. Semantic retrieval operators are free of charge, whereas other operators generate fees. For more information, see Workflow templates and operators.
OSS billing: For more information, see Billing overview.
Usage
Call the SemanticQuery operation to perform semantic search. This topic provides examples to show how to perform semantic search in the test-dataset dataset of the test-project project.
Semantic retrieval of images
For example, a photo album contains many travel photos, some of which are panda photos taken in the Chengdu Research Base of Giant Panda Breeding in July 2020. To use semantic retrieval to search for the panda photos, you can create a dataset to store and index the metadata of photos in the photo album. Then, you can use phrases such as "pandas in Chengdu in July 2020" to retrieve the panda photos.
The following example searches the test-dataset dataset of the test-project project for panda photos that were taken in Chengdu in July 2020:
Sample request
{
  "ProjectName": "test-project",
  "DatasetName": "test-dataset",
  "Query": "Pandas in Chengdu in July 2020"
}
Sample response
{
  "RequestId": "645FB6D9-5EA0-02C9-B253-****",
  "Files": [
    {
      "ProduceTime": "2020-07-19T17:11:11+08:00",
      "ObjectACL": "default",
      "ContentType": "image/jpeg",
      "ProjectName": "test-project",
      "Size": 22868,
      "URI": "oss://test-bucket/test-object.jpg",
      "Addresses": [
        {
          "Language": "zh-Hans",
          "Township": "Sanhe Sub-district",
          "AddressLine": "Chengdu Research Base of Giant Panda Breeding, Sanhe Sub-district, Xindu District, Chengdu City, Sichuan Province",
          "Country": "China",
          "City": "Chengdu",
          "District": "Xindu District",
          "Province": "Sichuan Province"
        }
      ],
      "ObjectType": "file",
      "OwnerId": "****",
      "FileModifiedTime": "2021-05-13T10:22:44+08:00",
      "ImageWidth": 270,
      "OSSStorageClass": "Standard",
      "MediaType": "image",
      "ObjectId": "****",
      "CreateTime": "2022-07-06T07:10:18.497753661+08:00",
      "Filename": "1.jpg",
      "Labels": [
        {
          "CentricScore": 0.757,
          "Language": "zh-Hans",
          "LabelConfidence": 0.946,
          "LabelName": "Panda",
          "LabelLevel": 2,
          "ParentLabelName": "Wildlife"
        },
        ...
      ],
      "Orientation": 1,
      "EXIF": "...",
      "ContentMd5": "HZwoCnxPZ/fvhz4oRJ****",
      "ImageHeight": 270,
      "ImageScore": {
        "OverallQualityScore": 0.719
      },
      "ETag": "\"1D9C280A7C4F67F7EF873E28449D****\"",
      "DatasetName": "test-dataset",
      "FileHash": "\"1D9C280A7C4F67F7EF873E2****\"",
      "UpdateTime": "2022-07-06T07:10:18.497753661+08:00",
      "OSSCRC64": "5634447745650079669",
      "OSSTaggingCount": 0,
      "LatLong": "34.000000,119.000000",
      "OSSObjectType": "Normal"
    }
  ]
}
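A response like the one above carries many fields per file, and an application usually needs only a few of them. The following is a minimal sketch of extracting the object URI, the capture time, the cities, and the high-confidence labels from the response body after it is converted to a dictionary. The summarize_image_hits helper and the 0.8 confidence threshold are hypothetical, and sample_body is a trimmed copy of the sample response rather than real output.

```python
from typing import Any, Dict, List


def summarize_image_hits(
    response_body: Dict[str, Any], min_label_confidence: float = 0.8
) -> List[Dict[str, Any]]:
    """Keep only the URI, capture time, cities, and confident labels of each hit."""
    summaries = []
    for file_info in response_body.get("Files", []):
        labels = [
            label["LabelName"]
            for label in file_info.get("Labels", [])
            if label.get("LabelConfidence", 0.0) >= min_label_confidence
        ]
        cities = [
            address["City"]
            for address in file_info.get("Addresses", [])
            if address.get("City")
        ]
        summaries.append(
            {
                "uri": file_info.get("URI"),
                "taken_at": file_info.get("ProduceTime"),
                "cities": cities,
                "labels": labels,
            }
        )
    return summaries


# Trimmed copy of the sample response, limited to the fields used here.
sample_body = {
    "Files": [
        {
            "ProduceTime": "2020-07-19T17:11:11+08:00",
            "URI": "oss://test-bucket/test-object.jpg",
            "Addresses": [{"City": "Chengdu", "Province": "Sichuan Province"}],
            "Labels": [{"LabelName": "Panda", "LabelConfidence": 0.946}],
        }
    ]
}

print(summarize_image_hits(sample_body))
```

The helper reads every field with a default because not all files carry addresses or labels, as the document sample response in this topic shows.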
Sample code (IMM SDK for Python 1.27.3)
# -*- coding: utf-8 -*-
import os

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Semantic retrieval is available only in the China (Beijing) region.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main() -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. To prevent security risks, we recommend that you call API operations or perform routine O&M as a RAM user.
        # We recommend that you do not include your AccessKey pair (AccessKey ID and AccessKey secret) in your project code. Otherwise, the AccessKey pair may be leaked and the security of all the resources within your account may be compromised.
        # In this example, the AccessKey pair is read from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        semantic_query_request = imm_20200930_models.SemanticQueryRequest(
            query='Pandas in Chengdu in July 2020',
            project_name='test-project',
            dataset_name='test-dataset',
            max_results=100
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Print the response of the API operation.
            response = client.semantic_query_with_options(semantic_query_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # Print the error. Not every exception has a message attribute, so print the exception itself.
            print(error)


if __name__ == '__main__':
    Sample.main()
Semantic retrieval of documents
For example, you store various documents in online storage. To use semantic retrieval to retrieve a document about the IT service process from the online storage, you can create a dataset to index the documents and use keywords such as "IT service process" to retrieve the document.
The following example searches the test-dataset dataset of the test-project project for the document about the IT service process:
Sample request
{
  "ProjectName": "test-project",
  "DatasetName": "test-dataset",
  "Query": "IT service process"
}
Sample response
{
  "RequestId": "CD870E69-D2E8-031B-BD3E-****",
  "Files": [
    {
      "ObjectACL": "default",
      "ContentType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "ProjectName": "test-project",
      "ObjectId": "2f66ba6e902e5ad42341a9e7365b19f6130d4a077e4f57150450e281d0b7afd9",
      "Size": 28340,
      "CreateTime": "2024-03-08T10:13:19.569053164+08:00",
      "Filename": "3839a9a0-c630-420d-ae69-ea24792412fd.docx",
      "URI": "oss://test-bucket/test-object.docx",
      "ObjectType": "file",
      "ContentMd5": "Y7SmYa831Hq1qryuRyl6mg==",
      "OwnerId": "****",
      "FileModifiedTime": "2024-01-10T16:18:31+08:00",
      "ETag": "\"63B4A661AF37D47AB5AABCAE47297A9A\"",
      "DatasetName": "test-dataset",
      "FileHash": "63B4A661AF37D47AB5AABCAE47297A9A",
      "UpdateTime": "2024-03-08T10:13:19.569053164+08:00",
      "OSSStorageClass": "Standard",
      "MediaType": "document",
      "OSSCRC64": "6833019149643646551",
      "OSSTaggingCount": 0,
      "OSSObjectType": "Normal"
    }
  ]
}
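The timestamps in the sample response mix two precisions: FileModifiedTime has whole seconds, whereas CreateTime and UpdateTime have nine fractional digits. Python's datetime type stores at most microseconds, and older Python versions reject longer fractions in datetime.fromisoformat. The following is a minimal sketch that truncates the fraction to six digits before parsing and then sorts results by modification time on the client. The parse_timestamp and newest_first helpers and the a.docx and b.docx file names are hypothetical, and timestamps are assumed to use a numeric UTC offset as in the samples.

```python
import re
from datetime import datetime
from typing import Any, Dict, List

# Matches the fractional-seconds part of a timestamp, such as ".569053164".
_FRACTION = re.compile(r"\.(\d+)")


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp, truncating fractional seconds to microseconds."""

    def to_microseconds(match: "re.Match") -> str:
        # Keep at most six digits and pad shorter fractions with zeros.
        return "." + match.group(1)[:6].ljust(6, "0")

    return datetime.fromisoformat(_FRACTION.sub(to_microseconds, value, count=1))


def newest_first(files: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort files by FileModifiedTime, most recently modified first."""
    return sorted(
        files,
        key=lambda file_info: parse_timestamp(file_info["FileModifiedTime"]),
        reverse=True,
    )


# Hypothetical results that reuse the two timestamp shapes from the sample.
files = [
    {"Filename": "a.docx", "FileModifiedTime": "2024-01-10T16:18:31+08:00"},
    {"Filename": "b.docx", "FileModifiedTime": "2024-03-08T10:13:19.569053164+08:00"},
]

print([file_info["Filename"] for file_info in newest_first(files)])
```

Sorting on the parsed value rather than on the raw string keeps the order correct even if results carry different UTC offsets.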
Sample code (IMM SDK for Python 1.27.3)
# -*- coding: utf-8 -*-
import os

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Semantic retrieval is available only in the China (Beijing) region.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main() -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. To prevent security risks, we recommend that you call API operations or perform routine O&M as a RAM user.
        # We recommend that you do not include your AccessKey pair (AccessKey ID and AccessKey secret) in your project code. Otherwise, the AccessKey pair may be leaked and the security of all the resources within your account may be compromised.
        # In this example, the AccessKey pair is read from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://help.aliyun.com/document_detail/2361894.html.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        semantic_query_request = imm_20200930_models.SemanticQueryRequest(
            query='IT service process',
            project_name='test-project',
            dataset_name='test-dataset',
            max_results=100
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Print the response of the API operation.
            response = client.semantic_query_with_options(semantic_query_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # Print the error. Not every exception has a message attribute, so print the exception itself.
            print(error)


if __name__ == '__main__':
    Sample.main()