Data Science Workshop (DSW) allows you to mount datasets or Object Storage Service (OSS) paths so that you can access and process data in the cloud directly from your instance. Mounting makes it easier for multiple users to share data and collaborate, simplifies data management and maintenance, and keeps data consistent and up to date. This topic describes how to mount datasets or OSS paths in DSW.
Background information
Platform for AI (PAI) provides a cloud disk with a specific quota for each DSW instance that is created by using the public resource group. You can use the disk to persistently store data. If you stop a DSW instance and do not start it again within 15 days, the disk is cleared. DSW instances that are created by using a dedicated resource group provide only non-persistent local storage. To persist DSW data, create a File Storage NAS (NAS), OSS, or Cloud Parallel File Storage (CPFS) dataset and mount the dataset to a path in the DSW instance. You can then read data from and write data to the dataset in DSW.
Mount modes
DSW allows you to mount datasets or OSS paths in different mount modes. The following table provides the details.
Mount item | Supported mount mode
Non-OSS dataset | None.
OSS dataset | Quick Read/write, Incremental Read/Write, Consistent Read/write, and Read-only.
The following items describe each mount mode and the Jindo configuration that it uses. For more information about how to use JindoFuse, see User guide of JindoFuse.
Quick Read/write: provides fast reads and writes, but data inconsistency may occur during concurrent reads or writes. Use this mode to mount training data and models. We recommend that you do not use a path that is mounted in this mode as your working directory.
{
  "fs.oss.download.thread.concurrency": "Twice the number of CPU cores",
  "fs.oss.upload.thread.concurrency": "Twice the number of CPU cores",
  "fs.jindo.args": "-oattr_timeout=3 -oentry_timeout=0 -onegative_timeout=0 -oauto_cache -ono_symlink"
}
Incremental Read/Write: ensures data consistency when new data is written incrementally, but data inconsistency may occur if existing data is overwritten. Reads are slightly slower than in Quick Read/write mode. You can use this mode to save the model weight files that are generated during training.
{
  "fs.oss.upload.thread.concurrency": "Twice the number of CPU cores",
  "fs.jindo.args": "-oattr_timeout=3 -oentry_timeout=0 -onegative_timeout=0 -oauto_cache -ono_symlink"
}
Consistent Read/write: ensures data consistency during concurrent reads or writes and is suitable for scenarios that require high data consistency and do not require quick reads. You can use this mode to save the code of your projects.
{
  "fs.jindo.args": "-oattr_timeout=0 -oentry_timeout=0 -onegative_timeout=0 -oauto_cache -ono_symlink"
}
Read-only: allows only reads. You can use this mode to mount public datasets.
{
  "fs.oss.download.thread.concurrency": "Twice the number of CPU cores",
  "fs.jindo.args": "-oro -oattr_timeout=7200 -oentry_timeout=7200 -onegative_timeout=7200 -okernel_cache -ono_symlink"
}
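In the configurations above, "Twice the number of CPU cores" is a placeholder, not a literal value. The following POSIX shell sketch only illustrates what the placeholder resolves to on a given machine; it does not configure or create a mount. The function name jindo_config and the mode labels quick, incremental, consistent, and readonly are hypothetical names chosen for this example, and the sketch assumes that nproc (GNU coreutils) is available when no core count is passed.

```shell
# Hypothetical helper (illustration only): print the Jindo settings listed
# above for one mount mode, with "Twice the number of CPU cores" resolved.
# Usage: jindo_config <quick|incremental|consistent|readonly> [cpu_cores]
jindo_config() {
  local mode="$1" cores="${2:-$(nproc)}" threads
  threads=$((cores * 2))   # "Twice the number of CPU cores"
  case "$mode" in
    quick)
      printf '{"fs.oss.download.thread.concurrency": "%d", "fs.oss.upload.thread.concurrency": "%d", "fs.jindo.args": "%s"}\n' \
        "$threads" "$threads" "-oattr_timeout=3 -oentry_timeout=0 -onegative_timeout=0 -oauto_cache -ono_symlink" ;;
    incremental)
      printf '{"fs.oss.upload.thread.concurrency": "%d", "fs.jindo.args": "%s"}\n' \
        "$threads" "-oattr_timeout=3 -oentry_timeout=0 -onegative_timeout=0 -oauto_cache -ono_symlink" ;;
    consistent)
      # Consistent mode sets no thread concurrency, only FUSE arguments.
      printf '{"fs.jindo.args": "%s"}\n' \
        "-oattr_timeout=0 -oentry_timeout=0 -onegative_timeout=0 -oauto_cache -ono_symlink" ;;
    readonly)
      printf '{"fs.oss.download.thread.concurrency": "%d", "fs.jindo.args": "%s"}\n' \
        "$threads" "-oro -oattr_timeout=7200 -oentry_timeout=7200 -onegative_timeout=7200 -okernel_cache -ono_symlink" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```

For example, on an instance with 8 CPU cores, jindo_config quick 8 prints the Quick Read/write settings with both thread-concurrency values set to 16. Note how the modes trade speed for consistency: Read-only uses long cache timeouts (7200) because data does not change, while Consistent Read/write sets every timeout to 0 so that each lookup sees the latest state.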
Limits
You cannot mount multiple datasets to the same path.
We recommend that you do not frequently perform write operations on the path to which an OSS path is mounted.
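One way to follow the second recommendation is to keep write-heavy work on local storage and copy only the finished file to the OSS mount path in a single operation. The following POSIX shell sketch shows this pattern. The function name run_and_publish and the loop that simulates 100 small writes are hypothetical; OSS_MOUNT defaults to /mnt/data_oss, the example OSS mount path used later in this topic, and you should set it to your own mount path.

```shell
# Assumption: replace /mnt/data_oss with the OSS mount path of your instance.
OSS_MOUNT="${OSS_MOUNT:-/mnt/data_oss}"

# Hypothetical example: write many small updates to local scratch space,
# then copy the finished file to the OSS mount path once.
run_and_publish() {
  local name="$1" scratch i status
  scratch=$(mktemp -d) || return 1   # local, non-OSS scratch directory
  i=1
  while [ "$i" -le 100 ]; do
    echo "step $i" >> "$scratch/$name"   # frequent small writes stay local
    i=$((i + 1))
  done
  cp "$scratch/$name" "$OSS_MOUNT/$name"  # one sequential write to OSS
  status=$?
  rm -rf "$scratch"
  return "$status"
}
```

For example, run_and_publish results.txt writes 100 lines locally and then creates results.txt under the OSS mount path with one copy. If you mount the path in Incremental Read/Write mode, writing each result to a new file name instead of overwriting an existing file also avoids the inconsistency that overwriting can cause in that mode.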
Mount a custom dataset
Step 1: Create a dataset
In the PAI console, choose AI Asset Management > Datasets. On the Custom Dataset tab of the Dataset page, click Create Dataset. DSW can mount only folders, not individual files. Therefore, you must set the Property parameter to Folder in the Create Dataset panel.
For more information, see Create and manage datasets.
Step 2: Mount the dataset
Choose Model Training > Data Science Workshop (DSW). On the Data Science Workshop (DSW) page, click Create Instance to create an instance, or modify the configurations of an existing instance. On the Create Instance page, set Dataset to the custom dataset that you created and configure Mount Path and Mount Mode based on your business requirements.
For more information about other parameters, see Create a DSW instance.
If you use a CPFS dataset, you must configure a virtual private cloud (VPC) for the instance. The VPC you select must be the same as the VPC where the CPFS dataset resides. Otherwise, the DSW instance may fail to be created.
If you use a NAS dataset, you must configure network settings and select a security group for the instance.
If you select a dedicated resource group, the first dataset that you add must be a NAS dataset, because NAS supports the Filesystem in Userspace (FUSE) interface better than OSS does. This dataset is mounted both to the mount path that you specify and to the default DSW working directory /home/admin/workspace.
Mount a public dataset
Step 1: Create a dataset
Choose AI Asset Management > Datasets and click the Public Dataset tab. For more information, see the "Create a dataset by registering a public dataset" section in the Create and manage datasets topic.
Step 2: Mount the dataset
Choose Model Training > Data Science Workshop (DSW). On the Data Science Workshop (DSW) page, click Create Instance to create an instance, or modify the configurations of an existing instance. On the Create Instance page, set Dataset to the existing public dataset and configure Mount Path and Mount Mode based on your business requirements.
For more information about other parameters, see Create a DSW instance.
Mount an OSS path
Step 1: Create an OSS bucket
Activate OSS and create a bucket. For more information, see Get started with OSS and Create a bucket.
The region where the bucket resides must be the same as the region where PAI resides. You cannot change the region of a bucket after the bucket is created.
Step 2: Mount an OSS path
Choose Model Training > Data Science Workshop (DSW). On the Data Science Workshop (DSW) page, click Create Instance to create an instance, or modify the configurations of an existing instance. In the Mount Settings section, select the path of the created OSS bucket for the OSS parameter and configure Mount Path and Mount Mode based on your business requirements.
View mount configurations
In the instance list on the Data Science Workshop (DSW) page, click Open in the Actions column of the DSW instance that you want to manage.
In the top navigation bar of the Data Science Workshop page, click the Terminal tab. Follow the instructions to open the terminal.
On the Terminal page, run the following commands to check whether the NAS and OSS datasets are mounted:
# Query the mount path of a NAS dataset.
mount | grep nas
# Query the mount path of an OSS dataset.
mount | grep oss
The datasets are mounted if the output lists their mount paths. For example:
The NAS dataset is mounted to the /mnt/data_nas, /mnt/workspace, and /home/admin/workspace paths. /mnt/data_nas is the mount path that you specified when you created the DSW instance. The other two paths are the default DSW working directories, to which your first NAS dataset is also mounted. As long as your NAS resources and servers work as expected, your data and code persist.
The OSS dataset is mounted to the /mnt/data_oss path.
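The grep commands above print whole lines of mount output. If you need only the mount paths, for example to use them in a script, the following POSIX shell sketch extracts them with awk. It assumes the usual Linux mount output format, `<source> on <target> type <fstype> (<options>)`, and mount paths that contain no spaces. The function name list_mount_points is hypothetical. Like grep, it matches the keyword anywhere in a line, so a short keyword can also match unrelated mounts.

```shell
# Hypothetical helper: read `mount` output on stdin and print the mount
# path (third field) of every line that contains the given keyword.
# Assumes lines of the form "<source> on <target> type <fstype> (<options>)".
list_mount_points() {
  awk -v kw="$1" 'index($0, kw) > 0 && $2 == "on" { print $3 }'
}

# Example usage in a DSW terminal:
#   mount | list_mount_points nas
#   mount | list_mount_points oss
```

Reading from stdin instead of calling mount inside the function keeps the parsing separate, so you can also run it on saved mount output.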