Intelligent Media Management: Create a dataset

Last Updated: Nov 04, 2024

A dataset is a container for metadata in Intelligent Media Management (IMM). This topic describes how to create a dataset.

Usage notes

  • Searches across datasets are not supported. Therefore, we recommend that you store related files in the same dataset and unrelated files in different datasets.

  • The number of datasets in a project cannot exceed the specified upper limit.

  • The number of files in a dataset cannot exceed the maximum number of files specified for the dataset. The total number of files in all datasets of a project cannot exceed the maximum number of files specified for the project.

  • The number of Object Storage Service (OSS) buckets mapped to a dataset cannot exceed the maximum number of mapped buckets specified for the dataset. The total number of OSS buckets mapped to all datasets of a project cannot exceed the maximum number of mapped buckets specified for the project.

  • When you create a metadata index for a dataset in a project, the workflow template of the dataset takes precedence over the workflow template of the project. If the workflow template of the dataset is empty, the workflow template of the project is used. For more information about workflow templates, see Workflow templates and operators.
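
The precedence rule in the last usage note can be sketched as a small function. This is an illustration only: resolve_workflow_template is a hypothetical helper, not part of the IMM SDK, and "project-template" is a placeholder template ID. It assumes that an empty template is represented as an empty string or None.

```python
from typing import Optional


def resolve_workflow_template(
    dataset_template_id: Optional[str],
    project_template_id: Optional[str],
) -> Optional[str]:
    """Return the template that would apply to a dataset's metadata index.

    Hypothetical helper: it mirrors the documented precedence rule only
    and does not call IMM.
    """
    # A non-empty dataset template takes precedence over the project template.
    if dataset_template_id:
        return dataset_template_id
    # Otherwise fall back to the project template, which may also be empty.
    return project_template_id or None


# The dataset template is set, so it is used.
print(resolve_workflow_template("Official:AllFunction", "project-template"))
# The dataset template is empty, so the project template is used.
print(resolve_workflow_template("", "project-template"))
```

The actual resolution happens on the IMM side. The sketch is only meant to help you predict which template applies before you create an index.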

Prerequisites

  • An AccessKey pair is created and obtained. For more information, see Create an AccessKey pair.

  • OSS is activated, a bucket is created, and objects are uploaded to the bucket. For more information, see Upload objects.

  • IMM is activated. For more information, see Activate IMM.

  • A project is created in the IMM console. For more information, see Create a project.

    Note
    • You can call the CreateProject operation to create a project. For more information, see CreateProject.

    • You can call the ListProjects operation to query the existing projects in a specific region. For more information, see ListProjects.

Examples

Create a dataset

The following sample code shows how to call the CreateDataset operation to create a dataset named test-dataset, with the description Dataset 1, in the test-project project by using the Official:AllFunction workflow template.

Warning

The Official:AllFunction workflow template contains all IMM capabilities. IMM adds operators to or removes operators from the workflow template based on feature availability. Adding or removing operators may change the fees that you are charged. If you need only a specific set of capabilities, select a purpose-specific workflow template. For more information about workflow templates, see Workflow templates and operators. For more information about operators in workflow templates and the fees that operators incur, see Billable items. To avoid unexpected fees, we recommend that you use a bucket that contains a small amount of data for test purposes.

  • Sample request

    {
     "ProjectName": "test-project",
     "DatasetName": "test-dataset",
     "Description": "Dataset 1",
     "TemplateId": "Official:AllFunction"
    }
  • Sample response

    {
        "RequestId": "9AB4BD43-C4E5-06AA-A7AB-****",
        "Dataset": {
            "FileCount": 0,
            "BindCount": 0,
            "ProjectName": "test-project",
            "CreateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxTotalFileSize": 90000000000000000,
            "DatasetMaxRelationCount": 100000000000,
            "DatasetMaxFileCount": 100000000,
            "DatasetName": "test-dataset",
            "DatasetMaxBindCount": 10,
            "UpdateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxEntityCount": 10000000000,
            "TotalFileSize": 0,
            "TemplateId": "Official:AllFunction"
        }
    }
  • Complete sample code (for IMM SDK for Python V1.27.3)

    # -*- coding: utf-8 -*-
    
    import os
    from alibabacloud_imm20200930.client import Client as imm20200930Client
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_imm20200930 import models as imm_20200930_models
    from alibabacloud_tea_util import models as util_models
    from alibabacloud_tea_util.client import Client as UtilClient
    
    
    class Sample:
        def __init__(self):
            pass
    
        @staticmethod
        def create_client(
            access_key_id: str,
            access_key_secret: str,
        ) -> imm20200930Client:
            """
            Use your AccessKey ID and AccessKey secret to initialize the client. 
            @param access_key_id:
            @param access_key_secret:
            @return: Client
            @throws Exception
            """
            config = open_api_models.Config(
                access_key_id=access_key_id,
                access_key_secret=access_key_secret
            )
            # Specify the endpoint of the region in which the project resides. This example uses the China (Shenzhen) region.
            config.endpoint = 'imm.cn-shenzhen.aliyuncs.com'
            return imm20200930Client(config)
    
        @staticmethod
        def main() -> None:
            # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. To prevent security risks, we recommend that you call API operations or perform routine O&M as a RAM user. 
            # We recommend that you do not include your AccessKey pair (AccessKey ID and AccessKey secret) in your project code for data security reasons. 
            # In this example, the AccessKey pair is read from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://help.aliyun.com/document_detail/2361894.html. 
            imm_access_key_id = os.getenv("AccessKeyId")
            imm_access_key_secret = os.getenv("AccessKeySecret")
            client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
            create_dataset_request = imm_20200930_models.CreateDatasetRequest(
                project_name='test-project',
                dataset_name='test-dataset',
                description='Dataset 1',
                template_id='Official:AllFunction'
            )
            runtime = util_models.RuntimeOptions()
            try:
                # Print the response of the API operation. 
                response = client.create_dataset_with_options(create_dataset_request, runtime)
                print(response.body.to_map())
            except Exception as error:
                # Print the error message. Not every exception has a message attribute, so fall back to the exception itself.
                print(getattr(error, 'message', error))
    
    
    if __name__ == '__main__':
        Sample.main()
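
The usage notes mention per-dataset limits, and the sample response above carries fields whose names suggest both the current usage and those limits. The following sketch computes the remaining headroom from such a response. It is an illustration under assumptions: remaining_capacity is a hypothetical helper, and the pairing of each usage field with a DatasetMax field is inferred from the field names, not confirmed by this topic.

```python
from typing import Dict


def remaining_capacity(dataset: Dict[str, int]) -> Dict[str, int]:
    """Compute headroom from the "Dataset" object of a response.

    Hypothetical helper. It assumes that each DatasetMax* field is the
    limit for the usage field it is paired with below.
    """
    return {
        "files": dataset["DatasetMaxFileCount"] - dataset["FileCount"],
        "bindings": dataset["DatasetMaxBindCount"] - dataset["BindCount"],
        "bytes": dataset["DatasetMaxTotalFileSize"] - dataset["TotalFileSize"],
    }


# Values copied from the sample response of CreateDataset.
sample_dataset = {
    "FileCount": 0,
    "BindCount": 0,
    "TotalFileSize": 0,
    "DatasetMaxFileCount": 100000000,
    "DatasetMaxBindCount": 10,
    "DatasetMaxTotalFileSize": 90000000000000000,
}

print(remaining_capacity(sample_dataset))
```

With the SDK sample, you could pass response.body.to_map()["Dataset"] to the helper instead of the hard-coded dictionary.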

Query dataset information

The following sample code shows how to call the GetDataset operation to query information about the test-dataset dataset in the test-project project.

  • Sample request

    {
     "ProjectName": "test-project",
     "DatasetName": "test-dataset"
    }
  • Sample response

    {
        "RequestId": "9AB4BD43-C4E5-06AA-E4B2-****",
        "Dataset": {
            "FileCount": 0,
            "BindCount": 0,
            "ProjectName": "test-project",
            "CreateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxTotalFileSize": 90000000000000000,
            "DatasetMaxRelationCount": 100000000000,
            "DatasetMaxFileCount": 100000000,
            "DatasetName": "test-dataset",
            "DatasetMaxBindCount": 10,
            "UpdateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxEntityCount": 10000000000,
            "TotalFileSize": 0,
            "TemplateId": "Official:AllFunction"
        }
    }
  • Complete sample code (for IMM SDK for Python V1.27.3)

    # -*- coding: utf-8 -*-
    
    import os
    from alibabacloud_imm20200930.client import Client as imm20200930Client
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_imm20200930 import models as imm_20200930_models
    from alibabacloud_tea_util import models as util_models
    from alibabacloud_tea_util.client import Client as UtilClient
    
    
    class Sample:
        def __init__(self):
            pass
    
        @staticmethod
        def create_client(
            access_key_id: str,
            access_key_secret: str,
        ) -> imm20200930Client:
            """
            Use your AccessKey ID and AccessKey secret to initialize the client. 
            @param access_key_id:
            @param access_key_secret:
            @return: Client
            @throws Exception
            """
            config = open_api_models.Config(
                access_key_id=access_key_id,
                access_key_secret=access_key_secret
            )
            # Specify the endpoint of the region in which the project resides. This example uses the China (Shenzhen) region.
            config.endpoint = 'imm.cn-shenzhen.aliyuncs.com'
            return imm20200930Client(config)
    
        @staticmethod
        def main() -> None:
            # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. To prevent security risks, we recommend that you call API operations or perform routine O&M as a RAM user. 
            # We recommend that you do not include your AccessKey pair (AccessKey ID and AccessKey secret) in your project code for data security reasons. 
            # In this example, the AccessKey pair is read from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://help.aliyun.com/document_detail/2361894.html. 
            imm_access_key_id = os.getenv("AccessKeyId")
            imm_access_key_secret = os.getenv("AccessKeySecret")
            client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
            get_dataset_request = imm_20200930_models.GetDatasetRequest(
                # Specify the name of the IMM project. 
                project_name='test-project',
                # Specify the name of the dataset. 
                dataset_name='test-dataset',
                # Specify that the operation does not return statistics such as the number of files and file size. 
                with_statistics=False
            )
            runtime = util_models.RuntimeOptions()
            try:
                # Print the response of the API operation. 
                response = client.get_dataset_with_options(get_dataset_request, runtime)
                print(response.body.to_map())
            except Exception as error:
                # Print the error message. Not every exception has a message attribute, so fall back to the exception itself.
                print(getattr(error, 'message', error))
    
    
    if __name__ == '__main__':
        Sample.main()
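
Both samples read the AccessKey pair with os.getenv, which returns None when a variable is not set, so a missing variable only surfaces later as an authentication error. The following sketch checks the two variables up front. load_access_key is a hypothetical helper, not part of the IMM SDK. The variable names AccessKeyId and AccessKeySecret are the ones used in the samples.

```python
import os
from typing import Mapping, Tuple


def load_access_key(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (AccessKey ID, AccessKey secret) or raise if either is unset.

    Hypothetical helper. The env parameter exists so that the check can be
    exercised without touching the real environment.
    """
    names = ("AccessKeyId", "AccessKeySecret")
    # Treat both unset and empty variables as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AccessKeyId"], env["AccessKeySecret"]


# Demonstration with placeholder values instead of real credentials.
demo_env = {"AccessKeyId": "demo-id", "AccessKeySecret": "demo-secret"}
access_key_id, access_key_secret = load_access_key(demo_env)
print(access_key_id)
```

In the samples, the two os.getenv calls could be replaced with a single call to load_access_key() so that the script stops with a clear message before any request is sent.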