Object Storage Service: Use OSS Connector for AI/ML to access and store OSS data in PyTorch training jobs

Last Updated: Dec 05, 2024

Object Storage Service (OSS) Connector for AI/ML is a Python library for efficiently accessing and storing OSS data in PyTorch training jobs.

Benefits

| Item | Without OSS Connector for AI/ML | With OSS Connector for AI/ML |
| --- | --- | --- |
| Performance | You must manually optimize performance, which may be inefficient. | OSS Connector for AI/ML automatically optimizes the performance of OSS data downloads and checkpoint storage. |
| Data loading method | You must download data in advance, which increases costs and management workloads. | OSS Connector for AI/ML supports streaming data loading, which reduces cost and management complexity. |
| Data access | You must read and write data by using adapters, which increases access complexity. | OSS Connector for AI/ML reads and writes data in OSS directly, which simplifies access. |
| Configuration difficulty | You must compile code, which makes configuration difficult. | OSS Connector for AI/ML provides simple configuration items, which improves development efficiency. |

How it works

The following figure shows how OSS Connector for AI/ML runs PyTorch training jobs by using data in OSS.

[Figure: how OSS Connector for AI/ML runs PyTorch training jobs by using data in OSS]

Feature description

The following table describes the main features of OSS Connector for AI/ML.

Item

Feature

Class

Method

Map-style dataset

Suitable for random access, which allows quick access to specific data during training.

OssMapDataset

The OssMapDataset and OssIterableDataset classes provide the same methods to build a dataset.

  • from_prefix()

    Builds a dataset from an OSS_URI prefix. This method is suitable when the storage paths of the OSS data follow a uniform pattern.

  • from_objects()

    Builds a dataset from a list of OSS_URIs. This method is suitable when the storage paths of the OSS data are known but scattered.

  • from_manifest_file()

    Builds a dataset from a manifest file that you create. This method is suitable when the dataset contains a large number of files (for example, tens of millions), is loaded frequently, and data indexing is enabled for the bucket.

Iterable-style dataset

Suitable for sequential streaming reads, which allows you to efficiently process large volumes of continuous data.

OssIterableDataset
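
The difference between the two dataset styles, and between building a dataset from a prefix and from an explicit object list, can be sketched in plain Python. The example below does not call OSS Connector for AI/ML. The in-memory FAKE_BUCKET dictionary stands in for a bucket, and SketchMapDataset and SketchIterableDataset are hypothetical stand-ins whose from_prefix() and from_objects() only mirror the method names listed above. The parameters of the real methods are not described in this topic and may differ.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for an OSS bucket: URI -> object bytes.
FAKE_BUCKET: Dict[str, bytes] = {
    "oss://examplebucket/train/0001.bin": b"sample-1",
    "oss://examplebucket/train/0002.bin": b"sample-2",
    "oss://examplebucket/val/0001.bin": b"sample-3",
}


class _SketchBase:
    """Shared constructors. Names mirror the table; signatures are assumed."""

    def __init__(self, uris: List[str]) -> None:
        self._uris = list(uris)

    @classmethod
    def from_prefix(cls, prefix: str):
        # Paths follow a uniform pattern, so one prefix selects the objects.
        return cls(sorted(u for u in FAKE_BUCKET if u.startswith(prefix)))

    @classmethod
    def from_objects(cls, uris: List[str]):
        # Paths are known but scattered, so they are listed explicitly.
        return cls(uris)


class SketchMapDataset(_SketchBase):
    """Map-style: random access by index."""

    def __len__(self) -> int:
        return len(self._uris)

    def __getitem__(self, index: int) -> Tuple[str, bytes]:
        uri = self._uris[index]
        return uri, FAKE_BUCKET[uri]


class SketchIterableDataset(_SketchBase):
    """Iterable-style: sequential streaming, one object at a time."""

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        for uri in self._uris:
            yield uri, FAKE_BUCKET[uri]


map_ds = SketchMapDataset.from_prefix("oss://examplebucket/train/")
iter_ds = SketchIterableDataset.from_objects(
    ["oss://examplebucket/val/0001.bin", "oss://examplebucket/train/0002.bin"]
)

second_sample = map_ds[1]                 # jump straight to one sample
streamed = [data for _, data in iter_ds]  # consume objects in listed order
```

The map-style stand-in exposes a length and index lookup, which is what shuffled sampling relies on. The iterable-style stand-in exposes only iteration, which fits data that is read once from start to end.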

Checkpoint API operations

Loads checkpoints from OSS during model training and saves checkpoints to OSS at regular intervals during training, which simplifies the workflow.

OssCheckpoint

  • OssCheckpoint()

    Initialize an OssCheckpoint object that is used to read and write checkpoints during model training.

  • reader()

    Read checkpoints from OSS.

  • writer()

    Write checkpoints to OSS.

Procedure

Use cases

  • To quickly learn how to run a PyTorch training job on OSS data and save the training results to OSS, try the demo that uses OSS Connector for AI/ML to train a handwritten digit recognition model. For more information, see Get started with OSS Connector for AI/ML.

  • To further improve the performance of OSS Connector for AI/ML, we recommend that you use the accelerated endpoint of an OSS accelerator instead of the OSS internal endpoint. For a performance comparison between the two endpoints, see Performance testing.

  • To use OSS Connector for AI/ML in a containerized environment, use a Docker image that contains an OSS Connector for AI/ML environment. For more information, see Build a Docker image that contains an OSS Connector for AI/ML environment.