To efficiently access and store datasets in Object Storage Service (OSS) from PyTorch training jobs, install OSS Connector for AI/ML.
Deployment environment
Operating system: 64-bit x86 Linux
glibc: 2.17 or later
Python: 3.8 to 3.12
PyTorch: 2.0 or later
To use the OSS checkpoint feature, the Linux kernel must support userfaultfd.
Note: In this example, Ubuntu is used. To check whether the Linux kernel supports userfaultfd, run the following command:
sudo grep CONFIG_USERFAULTFD /boot/config-$(uname -r)
If CONFIG_USERFAULTFD=y is returned, the Linux kernel supports userfaultfd. If CONFIG_USERFAULTFD=n is returned, the Linux kernel does not support userfaultfd, and you cannot use the OSS checkpoint feature.
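The requirements above can also be checked from Python before installation. The following is a minimal sketch, not part of OSS Connector for AI/ML: the helper names (parse_userfaultfd, major_minor, kernel_supports_userfaultfd, check_environment) are hypothetical, and the sketch assumes the kernel config file lives at /boot/config-<release>, as in the Ubuntu example. It also treats the line "# CONFIG_USERFAULTFD is not set", which kernel config files commonly use for disabled options, as equivalent to CONFIG_USERFAULTFD=n.

```python
import platform
import re
import sys


def parse_userfaultfd(config_text):
    """Return True/False for CONFIG_USERFAULTFD, or None if the option is absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_USERFAULTFD=y":
            return True
        # Disabled options usually appear as a "... is not set" comment line.
        if line in ("CONFIG_USERFAULTFD=n", "# CONFIG_USERFAULTFD is not set"):
            return False
    return None


def major_minor(version_text):
    """Extract (major, minor) from strings such as '2.17' or '2.1.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def kernel_supports_userfaultfd():
    """Read the running kernel's config file; None means the result is unknown."""
    path = f"/boot/config-{platform.release()}"  # same value as `uname -r`
    try:
        with open(path) as config_file:
            return parse_userfaultfd(config_file.read())
    except OSError:
        # File missing or not readable without elevated privileges.
        return None


def check_environment():
    """Collect pass/fail results for the requirements listed above."""
    results = {
        "linux_x86_64": platform.system() == "Linux"
        and platform.machine() == "x86_64",
        "python_3.8_to_3.12": (3, 8) <= sys.version_info[:2] <= (3, 12),
    }
    libc_name, libc_version = platform.libc_ver()
    glibc = major_minor(libc_version) if libc_name == "glibc" else None
    results["glibc_2.17_or_later"] = glibc is not None and glibc >= (2, 17)
    try:
        import torch  # imported lazily so the check also runs without PyTorch

        torch_version = major_minor(torch.__version__)
        results["pytorch_2.0_or_later"] = (
            torch_version is not None and torch_version >= (2, 0)
        )
    except ImportError:
        results["pytorch_2.0_or_later"] = False
    results["userfaultfd"] = kernel_supports_userfaultfd()
    return results
```

Calling check_environment() returns a dictionary with one entry per requirement. A userfaultfd value of None means that the config file could not be read, in which case the grep command in the note remains the reference check.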
Procedure
The following example describes how to install OSS Connector for AI/ML for Python 3.12.
In a Linux environment or a container built from a Linux-based image, run the following command to install OSS Connector for AI/ML:
pip3.12 install osstorchconnector
Run the following command to check whether OSS Connector for AI/ML is installed:
pip3.12 show osstorchconnector
If the version information of osstorchconnector is returned, OSS Connector for AI/ML is installed.
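The same verification can be done from inside Python, which may be convenient in a training script or a container health check. The following is a minimal sketch that uses the standard importlib.metadata module; the function name installed_version is hypothetical, and the sketch assumes the distribution name matches the one used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="osstorchconnector"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

If installed_version() returns None, the package is not visible to the interpreter that runs the script. This can happen when pip3.12 installed it for a different Python installation than the one in use.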
What to do next
To ensure that OSS Connector for AI/ML can communicate with OSS and correctly initialize the configuration items, you need to configure access credentials and complete the OSS connector settings. For more information, see Configure OSS Connector for AI/ML.