Resource Orchestration Service: ALIYUN::PAI::Dataset

Last Updated: Oct 31, 2024

ALIYUN::PAI::Dataset is used to create a dataset.

Syntax

{
  "Type": "ALIYUN::PAI::Dataset",
  "Properties": {
    "Options": String,
    "Description": String,
    "Accessibility": String,
    "DatasetName": String,
    "SourceType": String,
    "SourceId": String,
    "DataSourceType": String,
    "WorkspaceId": String,
    "DataType": String,
    "Uri": String,
    "Property": String
  }
}

Properties

Property

Type

Required

Editable

Description

Constraint

Options

String

No

Yes

The extended fields.

When you use the dataset in Deep Learning Containers (DLC), you can use the mountPath field to specify the default mount path of the dataset. Example:

{ "mountPath": "/mnt/data/" }

Description

String

No

Yes

The description of the dataset.

The dataset is used for labeling scenarios.

Accessibility

String

No

Yes

The visibility of the dataset in the workspace.

Valid values:

  • PRIVATE (default): The dataset is visible only to you and the administrator of the workspace.

  • PUBLIC: The dataset is visible to all users in the workspace.

DatasetName

String

Yes

Yes

The name of the dataset.

The name must meet the following requirements:

  • It must start with a letter or digit.  

  • It can contain underscores (_) and hyphens (-).  

  • It must be 1 to 127 characters in length.  

SourceType

String

No

No

The type of the data source.

Valid values:

  • USER (default): The data source is provided by you.

  • ITAG: The data source is provided by iTAG.

  • PAI_PUBLIC_DATASET: The data source is provided by a public dataset of Machine Learning Platform for AI (PAI).

SourceId

String

No

No

The ID of the data source.

Valid values:

  • Set the value of this property to a custom ID when SourceType is set to USER.

  • Set the value of this property to a job ID when SourceType is set to ITAG. The job refers to the labeling job that iTAG processes based on the dataset.  

  • This property is automatically left empty when SourceType is set to PAI_PUBLIC_DATASET. A value of PAI_PUBLIC_DATASET specifies that the dataset is created from a public dataset of PAI.  

DataSourceType

String

Yes

No

The storage service in which the data source is stored.

Valid values:

  • NAS: File Storage NAS (NAS)

  • OSS: Object Storage Service (OSS)

WorkspaceId

String

Yes

No

The ID of the workspace to which the dataset belongs.

None.

DataType

String

No

No

The type of the dataset.

Valid values:

  • COMMON (default): regular

  • PIC: picture

  • TEXT: text

  • VIDEO: video

  • AUDIO: audio

Uri

String

Yes

No

The URI configuration.

Value formats:

  • Value format when DataSourceType is set to OSS: oss://bucket.endpoint/object

  • Value formats when DataSourceType is set to NAS:

    • Value format for a General-purpose NAS file system: nas://<nasfisid>.region/subpath/to/dir/

    • Value format for a Cloud Parallel File Storage (CPFS) 1.0 file system:

      nas://<cpfs-fsid>.region/subpath/to/dir/

    • Value format for a CPFS 2.0 file system:

      nas://<cpfs-fsid>.region/<protocolserviceid>/

    Note

    You can distinguish CPFS 1.0 and CPFS 2.0 file systems based on the format of the file system ID. The ID of a CPFS 1.0 file system is in the CPFS-<8 ASCII characters> format. The ID of a CPFS 2.0 file system is in the CPFS-<16 ASCII characters> format.

Property

String

Yes

No

The property of the dataset.

Valid values:

  • FILE: file

  • DIRECTORY: folder
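
The following is a minimal sketch of how the optional properties in the table might be combined with the required ones. The logical ID, dataset name, workspace ID, source ID, bucket, and endpoint are invented placeholders. Because Options is typed as String, the sketch passes the JSON object as a quoted string; this assumes the string is handed to PAI unchanged.

```yaml
ROSTemplateFormatVersion: '2015-09-01'
Resources:
  ExampleDataset:                  # hypothetical logical ID
    Type: ALIYUN::PAI::Dataset
    Properties:
      # Required properties
      DatasetName: example-dataset
      WorkspaceId: '1234'          # hypothetical workspace ID
      DataSourceType: OSS
      Property: DIRECTORY
      # Follows the oss://bucket.endpoint/object format; bucket and endpoint are made up
      Uri: oss://examplebucket.oss-cn-hangzhou.aliyuncs.com/images/
      # Optional properties
      Description: The dataset is used for labeling scenarios.
      Accessibility: PUBLIC        # PRIVATE is the default
      DataType: PIC
      SourceType: USER             # the default, shown for clarity
      SourceId: my-source-001      # custom ID, as allowed when SourceType is USER
      # Options is a String, so the JSON object is written as a quoted string
      Options: '{"mountPath": "/mnt/data/"}'
```

Only Options, Description, Accessibility, and DatasetName are marked as editable in the table, so changing any other property in this sketch after creation should not be expected to update the dataset in place.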

Return values

Fn::GetAtt

  • Options: the extended field.

  • Description: the description of the dataset.

  • Accessibility: the visibility of the dataset in the workspace.

  • SourceId: the ID of the data source.

  • CreateTime: the time when the dataset was created.

  • SourceType: the type of the data source.

  • WorkspaceId: the ID of the workspace to which the dataset belongs.

  • Uri: the URI configuration.

  • GmtModifiedTime: the time when the dataset was updated.

  • DatasetId: the ID of the dataset.

  • OwnerId: the ID of the Alibaba Cloud account.

  • DatasetName: the name of the dataset.

  • UserId: the user ID.

  • DataSourceType: the storage service in which the data source is stored.

  • DataType: the type of the dataset.

  • Property: the property of the dataset.

Examples

YAML format

ROSTemplateFormatVersion: '2015-09-01'
Parameters:
  DataSourceType:
    AllowedValues:
    - OSS
    - NAS
    Description: 'The data source type. The following values are supported:

      - OSS: Object Storage Service (OSS).

      - NAS: File Storage NAS (NAS).'
    Type: String
  DatasetName:
    Description: 'The name of the dataset. The naming rules are as follows:

      - Start with a lowercase letter, uppercase letter, digit, or Chinese character.

      - Can contain underscores (_) and hyphens (-).

      - Must be 1 to 127 characters in length.'
    Type: String
  Property:
    AllowedValues:
    - FILE
    - DIRECTORY
    Description: 'The properties of the dataset. The following values are supported:

      - FILE: file.

      - DIRECTORY: folder.'
    Type: String
  Uri:
    Description: 'The URI configuration. Value formats:

      - When the data source type is OSS: ''oss://bucket.endpoint/object''

      - When the data source type is NAS:

      General-purpose NAS: ''nas://<nasfisid>.region/subpath/to/dir/'';

      CPFS 1.0: ''nas://<cpfs-fsid>.region/subpath/to/dir/'';

      CPFS 2.0: ''nas://<cpfs-fsid>.region/<protocolserviceid>/''.

      CPFS 1.0 and CPFS 2.0 are distinguished by the format of the file system ID: cpfs-<8 ASCII characters> for CPFS 1.0 and cpfs-<16 ASCII characters> for CPFS 2.0.'
    Type: String
  WorkspaceId:
    Description: 'The ID of the workspace where the dataset is located. For details
      about how to obtain the workspace ID, see ListWorkspaces.

      If this parameter is not configured, the default workspace is used. If the default
      workspace does not exist, an error is reported.'
    Type: String
Resources:
  ExtensionResource:
    Properties:
      DataSourceType:
        Ref: DataSourceType
      DatasetName:
        Ref: DatasetName
      Property:
        Ref: Property
      Uri:
        Ref: Uri
      WorkspaceId:
        Ref: WorkspaceId
    Type: ALIYUN::PAI::Dataset
Outputs:
  Accessibility:
    Description: Workspace visibility.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Accessibility
  CreateTime:
    Description: The creation time of the resource.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - CreateTime
  DataSourceType:
    Description: The data source type.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DataSourceType
  DataType:
    Description: The dataset type. The default value is COMMON.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DataType
  DatasetId:
    Description: The ID of the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DatasetId
  DatasetName:
    Description: The name of the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DatasetName
  Description:
    Description: Custom descriptions of datasets to distinguish between different
      datasets.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Description
  GmtModifiedTime:
    Description: Update time.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - GmtModifiedTime
  Options:
    Description: The extended field, which is of the JsonString type.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Options
  OwnerId:
    Description: The ID of the primary account.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - OwnerId
  Property:
    Description: The properties of the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Property
  SourceId:
    Description: The data source ID.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - SourceId
  SourceType:
    Description: The data source type. The default value is USER.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - SourceType
  Uri:
    Description: The URI configuration.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Uri
  UserId:
    Description: The ID of the user to which the dataset belongs.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - UserId
  WorkspaceId:
    Description: The ID of the workspace where the dataset is located. For details
      about how to obtain the workspace ID, see ListWorkspaces.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - WorkspaceId
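
The template above takes every value from a parameter. For contrast, here is a minimal sketch that hard-codes the values for a dataset stored on a CPFS 2.0 file system, following the nas://<cpfs-fsid>.region/<protocolserviceid>/ format from the Uri property. The logical ID, dataset name, workspace ID, file system ID, and protocol service ID are invented placeholders rather than values from a real account, and the ptc- prefix on the protocol service ID is an assumption.

```yaml
ROSTemplateFormatVersion: '2015-09-01'
Resources:
  CpfsDataset:                     # hypothetical logical ID
    Type: ALIYUN::PAI::Dataset
    Properties:
      DatasetName: cpfs-training-data
      WorkspaceId: '1234'          # hypothetical workspace ID
      DataSourceType: NAS          # CPFS URIs are listed under the NAS data source type
      Property: DIRECTORY
      # cpfs- followed by 16 characters indicates a CPFS 2.0 file system (made-up ID)
      Uri: nas://cpfs-0123456789abcdef.cn-hangzhou/ptc-0123456789abcdef/
Outputs:
  DatasetId:
    Description: The ID of the dataset.
    Value:
      Fn::GetAtt:
      - CpfsDataset
      - DatasetId
```

For a CPFS 1.0 or General-purpose NAS file system, the last URI segment would be a directory subpath instead of a protocol service ID.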

JSON format

{
  "ROSTemplateFormatVersion": "2015-09-01",
  "Parameters": {
    "DataSourceType": {
      "AllowedValues": [
        "OSS",
        "NAS"
      ],
      "Description": "The data source type. The following values are supported:\n- OSS: Object Storage Service (OSS).\n- NAS: File Storage NAS (NAS).",
      "Type": "String"
    },
    "DatasetName": {
      "Description": "The name of the dataset. The naming rules are as follows:\n- Start with a lowercase letter, uppercase letter, digit, or Chinese character.\n- Can contain underscores (_) and hyphens (-).\n- Must be 1 to 127 characters in length.",
      "Type": "String"
    },
    "Property": {
      "AllowedValues": [
        "FILE",
        "DIRECTORY"
      ],
      "Description": "The properties of the dataset. The following values are supported:\n- FILE: file.\n- DIRECTORY: folder.",
      "Type": "String"
    },
    "Uri": {
      "Description": "The URI configuration. Value formats:\n- When the data source type is OSS: 'oss://bucket.endpoint/object'\n- When the data source type is NAS:\nGeneral-purpose NAS: 'nas://<nasfisid>.region/subpath/to/dir/';\nCPFS 1.0: 'nas://<cpfs-fsid>.region/subpath/to/dir/';\nCPFS 2.0: 'nas://<cpfs-fsid>.region/<protocolserviceid>/'.\nCPFS 1.0 and CPFS 2.0 are distinguished by the format of the file system ID: cpfs-<8 ASCII characters> for CPFS 1.0 and cpfs-<16 ASCII characters> for CPFS 2.0.",
      "Type": "String"
    },
    "WorkspaceId": {
      "Description": "The ID of the workspace where the dataset is located. For details about how to obtain the workspace ID, see ListWorkspaces.\nIf this parameter is not configured, the default workspace is used. If the default workspace does not exist, an error is reported.",
      "Type": "String"
    }
  },
  "Resources": {
    "ExtensionResource": {
      "Properties": {
        "DataSourceType": {
          "Ref": "DataSourceType"
        },
        "DatasetName": {
          "Ref": "DatasetName"
        },
        "Property": {
          "Ref": "Property"
        },
        "Uri": {
          "Ref": "Uri"
        },
        "WorkspaceId": {
          "Ref": "WorkspaceId"
        }
      },
      "Type": "ALIYUN::PAI::Dataset"
    }
  },
  "Outputs": {
    "Accessibility": {
      "Description": "Workspace visibility.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Accessibility"
        ]
      }
    },
    "CreateTime": {
      "Description": "The creation time of the resource.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "CreateTime"
        ]
      }
    },
    "DataSourceType": {
      "Description": "The data source type.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DataSourceType"
        ]
      }
    },
    "DataType": {
      "Description": "The dataset type. The default value is COMMON.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DataType"
        ]
      }
    },
    "DatasetId": {
      "Description": "The ID of the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DatasetId"
        ]
      }
    },
    "DatasetName": {
      "Description": "The name of the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DatasetName"
        ]
      }
    },
    "Description": {
      "Description": "Custom descriptions of datasets to distinguish between different datasets.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Description"
        ]
      }
    },
    "GmtModifiedTime": {
      "Description": "Update time.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "GmtModifiedTime"
        ]
      }
    },
    "Options": {
      "Description": "The extended field, which is of the JsonString type.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Options"
        ]
      }
    },
    "OwnerId": {
      "Description": "The ID of the primary account.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "OwnerId"
        ]
      }
    },
    "Property": {
      "Description": "The properties of the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Property"
        ]
      }
    },
    "SourceId": {
      "Description": "The data source ID.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "SourceId"
        ]
      }
    },
    "SourceType": {
      "Description": "The data source type. The default value is USER.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "SourceType"
        ]
      }
    },
    "Uri": {
      "Description": "The URI configuration.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Uri"
        ]
      }
    },
    "UserId": {
      "Description": "The ID of the user to which the dataset belongs.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "UserId"
        ]
      }
    },
    "WorkspaceId": {
      "Description": "The ID of the workspace where the dataset is located. For details about how to obtain the workspace ID, see ListWorkspaces.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "WorkspaceId"
        ]
      }
    }
  }
}