When you submit a training job in Deep Learning Containers (DLC), you can access Object Storage Service (OSS), File Storage NAS (NAS), Cloud Parallel File Storage (CPFS), or MaxCompute storage in code or by mounting. This lets the job read data from and write data to the storage directly during training. This topic describes how to configure OSS, NAS, CPFS, or MaxCompute storage for a DLC job.
Prerequisites
PAI-DLC is activated and a workspace is created. For more information, see Activate PAI and create a default workspace.
(Optional) For OSS:
OSS is activated and related permissions are granted to PAI. For more information, see Get started by using the OSS console and Grant permissions to the service-linked role.
An OSS bucket is created. For more information, see Get started by using the OSS console.
(Optional) For NAS: A general-purpose NAS file system is created. For more information, see Create a file system.
(Optional) For MaxCompute storage: MaxCompute is activated and a project is created. For more information, see Activate MaxCompute and DataWorks and Create a MaxCompute project.
Use OSS
Configure OSS by mounting
You can mount an OSS dataset when creating a job. The following table describes the supported mounting types. For more information, see Submit training jobs.
| Mounting type | Description |
| --- | --- |
| Dataset | Mount a custom or public dataset. Public datasets support only read-only mounting. Select a dataset of the OSS type and configure Mount Path. During the DLC job, the system accesses OSS data through this path. |
| Direct mount | Mount a path in an OSS bucket directly. |
DLC uses JindoFuse to mount OSS. The default DLC configuration has limitations and may not be suitable for all scenarios. Adjust the parameters as needed. For more information, see JindoFuse.
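Once mounted, the OSS data appears as an ordinary directory inside the job container. A minimal sketch of reading it from training code, assuming a hypothetical mount path of `/mnt/data` (the actual path is whatever you set as Mount Path):

```python
from pathlib import Path

# Assumption: the OSS dataset is mounted at /mnt/data in the job container.
MOUNT_PATH = Path("/mnt/data")

def list_training_files(mount_path: Path, pattern: str = "*.csv"):
    """Return the data files visible under the OSS mount point, sorted by name."""
    return sorted(mount_path.glob(pattern))

def read_sample(path: Path) -> str:
    """Read a mounted OSS object exactly like a local file."""
    return path.read_text(encoding="utf-8")
```

Because the mount behaves like a local file system, existing data-loading code usually works unchanged.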
Configure OSS without mounting
DLC jobs can also read and write OSS data in code, without mounting, by using OSS Connector for AI/ML or the OSS SDK. Configure the code when you create the job. For code samples, see OSS Connector for AI/ML or OSS SDK.
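A minimal sketch of the OSS SDK approach, assuming the `oss2` package is installed (`pip install oss2`) and AccessKey credentials are supplied through environment variables; the bucket name, region, and object keys below are placeholders:

```python
import os

# Assumptions (placeholders): bucket region and name.
OSS_REGION = "cn-hangzhou"
OSS_BUCKET = "my-training-bucket"

def endpoint_for(region: str) -> str:
    """Build the public OSS endpoint URL for a region."""
    return f"https://oss-{region}.aliyuncs.com"

def read_and_write_oss():
    import oss2  # imported here so the module loads even without the SDK

    auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"],
                     os.environ["OSS_ACCESS_KEY_SECRET"])
    bucket = oss2.Bucket(auth, endpoint_for(OSS_REGION), OSS_BUCKET)

    # Read a training file directly from OSS.
    data = bucket.get_object("datasets/train.csv").read()

    # Write results (for example, a checkpoint) back to OSS.
    bucket.put_object("checkpoints/model.bin", b"...model bytes...")
    return data

# Call read_and_write_oss() from the training code of the DLC job.
```

Reading in code avoids mount configuration entirely, at the cost of managing credentials and endpoints yourself.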
Use NAS or CPFS
You can mount NAS or CPFS datasets when creating a job. For more information, see Use NAS.
Use MaxCompute storage
You can use MaxCompute storage in your job code when creating a job. For code samples, see Use MaxCompute.
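A minimal sketch of reading a MaxCompute table with the PyODPS SDK, assuming `pyodps` is installed (`pip install pyodps`) and AccessKey credentials come from environment variables; the project, region, endpoint format, and table name are placeholders:

```python
import os

# Assumptions (placeholders): MaxCompute project and region.
MC_PROJECT = "my_project"
MC_REGION = "cn-hangzhou"

def maxcompute_endpoint(region: str) -> str:
    """Build a MaxCompute service endpoint for a region (assumed URL format)."""
    return f"https://service.{region}.maxcompute.aliyun.com/api"

def iter_table_records(table_name: str = "training_samples"):
    from odps import ODPS  # imported here so the module loads without the SDK

    o = ODPS(os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
             os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
             project=MC_PROJECT,
             endpoint=maxcompute_endpoint(MC_REGION))

    table = o.get_table(table_name)
    # Stream records instead of loading the whole table into memory.
    with table.open_reader() as reader:
        for record in reader:
            yield record
```

Streaming through the reader keeps memory usage bounded even for large tables.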
FAQ
Why does the log show "killed" even though no error occurs when PAIIO reads data from a table?
PAIIO does not limit memory usage, and MaxCompute data may expand significantly when loaded into memory. The operating system and other system components also consume part of the memory. When the job exceeds its memory quota, the system terminates the process, which appears as "killed" in the log.
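One way to lower the peak memory footprint is to process the table in fixed-size batches instead of materializing all rows at once. A generic sketch (the batch size is arbitrary, and `rows` stands in for whatever row iterator your reader provides; this is not a PAIIO API):

```python
from itertools import islice

def iter_batches(rows, batch_size=1024):
    """Group an iterable of rows into lists of at most batch_size rows,
    so only one batch is held in memory at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Usage sketch: `rows` would come from the table reader in the real job.
# for batch in iter_batches(rows, batch_size=1024):
#     train_step(batch)
```

Tune `batch_size` against the memory quota of the job so that one batch plus the model comfortably fits.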