Object Storage Service (OSS) Connector for AI/ML is a Python library that is used to efficiently access and store OSS data in PyTorch training jobs.
Benefits
Item | Do not use OSS Connector for AI/ML | Use OSS Connector for AI/ML |
Performance | You must manually optimize performance, which may be inefficient. | OSS Connector for AI/ML automatically optimizes the performance of OSS data download and checkpoint storage. |
Data loading method | You must download data in advance, which increases costs and management workloads. | OSS Connector for AI/ML supports streaming data loading, which reduces costs and management complexity. |
Data access | You must read and write data by using adapters, which increases access complexity. | OSS Connector for AI/ML directly reads and writes data in OSS to simplify access. |
Configuration difficulty | You must compile code, which makes configuration difficult. | OSS Connector for AI/ML provides simple configuration items to improve development efficiency. |
How it works
The following figure shows how OSS Connector for AI/ML runs PyTorch training jobs by using data in OSS.
Feature description
The following table describes the main features of OSS Connector for AI/ML.
Item | Feature | Class | Method |
Map-style dataset | Suitable for random access, which allows quick access to specific data during training. | OssMapDataset | The OssMapDataset and OssIterableDataset classes provide the same methods to build a dataset. |
Iterable-style dataset | Suitable for sequential streaming reads, which allows efficient processing of large continuous data streams. | OssIterableDataset | The OssMapDataset and OssIterableDataset classes provide the same methods to build a dataset. |
Checkpoint API operations | Loads checkpoints from OSS during model training and periodically saves checkpoints to OSS, which simplifies the workflow. | OssCheckpoint | |
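The difference between the two dataset styles can be sketched without the connector itself. The following is a minimal illustration in plain Python, assuming an in-memory dictionary (`FAKE_BUCKET`) in place of an OSS bucket. The class names `InMemoryMapDataset` and `InMemoryIterableDataset` are hypothetical and are not part of OSS Connector for AI/ML. They only show the access pattern that each style implies.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical stand-in for an OSS bucket: object key -> object content.
FAKE_BUCKET: Dict[str, bytes] = {
    "train/0001.bin": b"sample-1",
    "train/0002.bin": b"sample-2",
    "train/0003.bin": b"sample-3",
    "val/0001.bin": b"val-1",
}


class InMemoryMapDataset:
    """Map-style access: supports len() and random access by integer index."""

    def __init__(self, bucket: Dict[str, bytes], prefix: str) -> None:
        self._bucket = bucket
        # The keys are listed up front so that any index can be resolved later.
        self._keys = sorted(k for k in bucket if k.startswith(prefix))

    def __len__(self) -> int:
        return len(self._keys)

    def __getitem__(self, index: int) -> Tuple[str, bytes]:
        key = self._keys[index]
        return key, self._bucket[key]


class InMemoryIterableDataset:
    """Iterable-style access: yields objects one by one in listing order."""

    def __init__(self, bucket: Dict[str, bytes], prefix: str) -> None:
        self._bucket = bucket
        self._prefix = prefix

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        # No index is built; objects are produced sequentially as a stream.
        for key in sorted(self._bucket):
            if key.startswith(self._prefix):
                yield key, self._bucket[key]


map_ds = InMemoryMapDataset(FAKE_BUCKET, "train/")
iter_ds = InMemoryIterableDataset(FAKE_BUCKET, "train/")

random_item = map_ds[2]                      # jump straight to the third object
streamed_keys = [key for key, _ in iter_ds]  # consume the stream in order
```

A map-style dataset fits workloads that shuffle or sample by index, while an iterable-style dataset fits workloads that read everything once in sequence.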
Procedure
Before you access and store data in OSS in a PyTorch training job, you must install and configure OSS Connector for AI/ML. For more information, see Install OSS Connector for AI/ML and Configure OSS Connector for AI/ML.
After you install and configure OSS Connector for AI/ML, you can perform the following operations in PyTorch training jobs:
Use OssMapDataset to build a map-style dataset suitable for random reading. For more information, see Use data in OSS to build a map dataset suitable for random reading.
Use OssIterableDataset to build an iterable-style dataset suitable for sequential streaming reading. For more information, see Use data in OSS to build an iterable dataset suitable for sequential streaming reading.
Use OssCheckpoint to store and access checkpoints. For more information, see Store and access checkpoints in OSS.
Note
Data in map-style datasets, iterable-style datasets, and checkpoints is of the same type. For more information about the methods that this data type supports, see Data type in OSS Connector for AI/ML.
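The checkpoint step in the preceding list amounts to saving state periodically and then resuming from the most recent save. The following is a minimal sketch of that workflow in plain Python. It assumes an in-memory dictionary (`FAKE_STORE`) in place of OSS and uses `pickle` in place of a framework serializer. The helper names `save_checkpoint` and `load_checkpoint`, the `SAVE_EVERY` interval, and the `oss://example-bucket/...` URIs are hypothetical and are not part of OSS Connector for AI/ML.

```python
import io
import pickle
from typing import Any, Dict

# Hypothetical stand-in for checkpoint storage in OSS: URI -> serialized bytes.
FAKE_STORE: Dict[str, bytes] = {}


def save_checkpoint(uri: str, state: Dict[str, Any]) -> None:
    """Serialize the state into a binary stream and store it under the URI."""
    buffer = io.BytesIO()
    pickle.dump(state, buffer)
    FAKE_STORE[uri] = buffer.getvalue()


def load_checkpoint(uri: str) -> Dict[str, Any]:
    """Read the stored bytes back through a binary stream and deserialize."""
    return pickle.load(io.BytesIO(FAKE_STORE[uri]))


SAVE_EVERY = 2  # assumed interval: save after every second epoch
state: Dict[str, Any] = {"epoch": 0, "weights": [0.0, 0.0]}
last_saved_uri = ""

for epoch in range(1, 6):
    # Placeholder for a real training step: nudge each weight by 0.5.
    state = {"epoch": epoch, "weights": [w + 0.5 for w in state["weights"]]}
    if epoch % SAVE_EVERY == 0:
        last_saved_uri = f"oss://example-bucket/checkpoints/epoch-{epoch}.ckpt"
        save_checkpoint(last_saved_uri, state)

# Resume from the most recent checkpoint, for example after an interruption.
restored = load_checkpoint(last_saved_uri)
```

In this sketch, only the state from the last saved epoch is recoverable, so the save interval bounds how much training is lost after an interruption.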
Use cases
If you want to quickly learn how to use OSS data to run a PyTorch training job and save the training results to OSS, we provide a demo that uses OSS Connector for AI/ML to train a handwritten digit recognition model. For more information, see Get started with OSS Connector for AI/ML.
To further improve the performance of OSS Connector for AI/ML, we recommend that you use the accelerated endpoint of an OSS accelerator instead of the OSS internal endpoint. For a performance comparison between the two endpoints, see Performance testing.
If you want to use OSS Connector for AI/ML in a containerized environment, you can use a Docker image that contains an OSS Connector for AI/ML environment. For more information about how to build a Docker image, see Build a Docker image that contains an OSS Connector for AI/ML environment.