By default, DSW instances in public and dedicated resource groups have limited data storage, and the data is cleared after a set period. To expand your instance's storage, persist data, or share it, you can mount a Dataset or a storage path directly to the instance.
For DSW instances in a public resource group, data is stored on a free Cloud Disk with limited space (100 GiB). After you delete the instance, or if it is stopped for more than 15 days, the system clears the data on the Cloud Disk.
For DSW instances in a dedicated resource group, data is stored on the instance's system disk. After the instance is stopped or deleted, the system clears this temporary storage.
Differences between mounting a dataset and mounting a storage path directly
If you need long-term storage and team collaboration, choose to mount a Dataset. If you only need storage for temporary tasks or to quickly expand storage capacity, mount a storage path directly.
Feature | Mount a dataset | Mount a storage path
Supported cloud products | Object Storage Service (OSS), File Storage NAS, Cloud Parallel File Storage (CPFS) | Object Storage Service (OSS), File Storage NAS, Cloud Parallel File Storage (CPFS)
Version management | Supports version management and data acceleration | Does not support version management
Data sharing | Supports sharing across multiple instances | Available only to the current instance
Operational complexity | Requires creating and configuring a dataset | Simple; requires only providing the path
Scenarios | Long-term storage, team collaboration, and high security requirements | Temporary tasks, rapid storage expansion
Differences between mounting at startup and dynamic mounting
There are two ways to mount storage: mounting at startup and dynamic mounting.
Mount at startup: Configure this option when you create an instance or change its configuration. The instance must be restarted to apply the change.
Dynamic mounting: You mount storage using the PAI software development kit (SDK) in a running instance. This method does not require an instance restart.
Limitations
Unique path: The mount path for each Dataset must be unique.
Write limit: Avoid frequent write operations in an OSS mount directory. Frequent writes can degrade performance or cause operations to fail.
Git limit: Git operations are not supported in OSS mount directories. Execute Git commands in a local directory or another non-mounted path.
Dynamic mounting limits
Read-only limit: Dynamic mounting is read-only. It is suitable for scenarios that require fast mounting or temporary read-only access.
Storage type restrictions: Dynamic mounting only supports OSS and NAS.
Resource restrictions: Dynamic mounting does not support Lingjun resources.
Mount at startup
To mount storage at startup, configure the Dataset Mounting or Storage Path Mounting parameters on the instance configuration page. You must restart the instance to apply the configuration.
Mount a dataset
Create a dataset
Log in to the PAI console. Go to the AI Asset Management > Dataset page and create a custom or public dataset. For more information, see Create and manage datasets.
Mount the dataset
On the configuration page that appears when you create a new DSW instance, find the Mount Dataset section. For an existing instance, click Change Settings to open the page. Click Custom Dataset, select the dataset that you created, and then enter a Mount Path.
Notes on mounting a custom dataset:
CPFS dataset: When you configure a CPFS Dataset, the Virtual Private Cloud (VPC) of the DSW instance must be the same as that of the CPFS file system. Otherwise, instance creation will fail.
NAS dataset: When you configure a NAS Dataset, set up the network and select a Security Group.
Using a dedicated resource group: When using a dedicated resource group, the first Dataset must be a NAS type. This Dataset is mounted to both the path you specify and the default DSW working directory at /home/admin/workspace.
Mount a storage path directly
This section uses mounting an Object Storage Service (OSS) path as an example.
Create an OSS bucket
Activate OSS and create a bucket.
Important: The region of the bucket must be the same as the region of PAI. You cannot change the region of a bucket after it is created.
Mount the OSS path
On the DSW instance configuration page (opened when you create an instance or by clicking Change Settings for an existing one), find the Storage Path Mounting section. Click OSS, select the path of the OSS bucket that you created, and enter a Mount Path. Advanced Configuration is empty by default; configure it as needed. For more information, see Advanced mount configuration.
Dynamic mounting
Dynamic mounting lets you mount a Dataset or storage path using the PAI SDK within a DSW instance, without needing to restart the instance.
Note: Dynamic mounting is read-only, supports only OSS and NAS mounts, and does not currently support Lingjun resources.
Preparations
Install the PAI Python SDK: Open the DSW instance Terminal and run the following command. Python 3.8 or later is required.

python -m pip install "pai>=0.4.11"

Configure the SDK access key for PAI using one of the following methods:
Method 1: Configure the DSW instance with the default PAI role or a custom RAM role. Open the instance configuration page, and at the bottom, click Show More to select the instance RAM role. For more information, see Configure an instance RAM role for a DSW instance.
Method 2: Configure it manually using the command-line tool provided by the PAI Python SDK. Run the following command in the Terminal to configure access parameters. For an example, see Initialization.
python -m pai.toolkit.config
Examples
Dynamic mounting lets you mount storage without reconfiguring and restarting your DSW instance.
Mount to the default path
The data is mounted to the default mount path inside the instance. The default path for official pre-built instance images is /mnt/dynamic/.

from pai.dsw import mount

# Mount an OSS path
mount_point = mount("oss://<YourBucketName>/Path/Data/Directory/")

# Mount a dataset. The input parameter is the dataset ID.
# mount_point = mount("d-m7rsmu350********")
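To quickly verify the mount, you can list the contents of the returned mount point. This is a minimal sketch that assumes the same placeholder OSS path as above; the directory contents depend on your bucket.

import os
from pai.dsw import mount

# Mount the OSS path; mount() returns the local path where the data is mounted
mount_point = mount("oss://<YourBucketName>/Path/Data/Directory/")

# List the mounted directory to confirm that the files are visible
print(os.listdir(mount_point))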
Mount to a specified path
Dynamic mounting requires data to be mounted to a specific path (or one of its subdirectories) inside the container. Get this dynamic mount path using the SDK API, as shown in the following example.
from pai.dsw import mount, default_dynamic_mount_path # Get the default mount path of the instance default_path = default_dynamic_mount_path() mount_point = mount("oss://<YourBucketName>/Path/Data/Directory" , mount_point=default_path + "tmp/output/model")
Dynamically mount NAS
from pai.dsw import mount, default_dynamic_mount_path

# Get the default mount path of the instance
default_path = default_dynamic_mount_path()

# Mount NAS. The NAS endpoint and the instance must be in the same VPC.
# Replace <region> with the region ID, such as cn-hangzhou.
mount("nas://06ba748***-xxx.<region>.nas.aliyuncs.com/", default_path + "mynas3/")

View all mount configurations in the instance

from pai.dsw import list_dataset_configs

print(list_dataset_configs())

Unmount the mounted data
from pai.dsw import mount, unmount

mount_point = mount("oss://<YourBucketName>/Path/Data/Directory/")

# The input parameter is the mount path, which is the MountPath returned by list_dataset_configs.
# After you run the unmount command, it takes a few seconds for the change to take effect.
unmount(mount_point)
Advanced mount configuration
To adapt to different read/write scenarios, such as fast reads/writes, incremental writes, and read-only access, and to optimize read and write performance, you can set advanced parameters when configuring a mount.
View mount configurations
Open the DSW instance and, in the Terminal, enter the following commands to verify that the NAS and OSS Datasets are mounted.
# View all mounts
mount
# Query the NAS mount path
mount | grep nas
# Query the OSS mount path
mount | grep oss

If the output lists the expected mount paths, the mount is successful.

The NAS Dataset is mounted to the /mnt/data_nas, /mnt/workspace, and /home/admin/workspace directories. Here, /mnt/data_nas is the mount path specified when you created the DSW instance, and the other two paths are the default working directories where the first NAS Dataset is mounted. As long as your NAS volume and service are running correctly, your data and code are stored persistently.

The OSS Dataset is mounted to the /mnt/data_oss directory in the DSW instance.
FAQ
Q: Why are my mounted OSS files not showing up in the JupyterLab file browser?
This happens because the JupyterLab file browser displays the default working directory (/home/admin/workspace), but your OSS path was likely mounted to a different location (e.g., /mnt/data).
Here are three ways to access your files:
Use the absolute path in code: Your files are already mounted successfully. In your code, use the full mount path to access them, for example, open('/mnt/data/my_file.csv'). See the sketch after this list.

Mount to a subdirectory of the workspace: To easily see the files in the UI, set the mount path to a subdirectory of the working directory when you configure the mount, such as /mnt/workspace/my_oss_data. After the mount is complete, you can see your OSS files in the my_oss_data folder in the file browser.

Access via the Terminal: In the DSW Terminal, use the cd /mnt/data command to enter the mount directory. Then, use commands such as ls to view and manage the files.
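The following is a minimal sketch of the first option. The mount path /mnt/data and the file name my_file.csv are hypothetical examples; substitute the mount path and file that you configured.

# Both the mount path and the file name below are hypothetical examples.
with open('/mnt/data/my_file.csv') as f:
    # Read and print the first line to confirm the file is accessible
    print(f.readline())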
Q: Why do I get a "Transport endpoint is not connected" or "Input/output error" when accessing a mounted OSS path in DSW?
These errors indicate that the connection between your DSW instance and the OSS mount has been lost. This is often due to one of the following reasons:
Insufficient RAM Role Permissions: The RAM role configured for your DSW instance may lack the necessary permissions to access OSS. Ensure the role (e.g., AliyunPAIDLCAccessingOSSRole) is correctly assigned and has read/write permissions for the target bucket.

Mount Service Crash (OOM): During intensive I/O operations (e.g., reading many small files), the underlying mount service (ossfs or JindoFuse) can run out of memory and crash. You can mitigate this by adjusting memory limits or disabling the metadata cache in the Advanced Configuration of your mount settings. For more information, see JindoFuse.

How to Restore the Connection:
For startup mounts: The simplest solution is to restart the DSW instance. The system will automatically re-establish the mount connection.
For dynamic mounts: You can execute a remount command using the PAI SDK in your notebook or terminal without restarting the instance.
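The following minimal sketch shows one way to re-establish a dynamic mount with the pai.dsw functions described earlier. The OSS path and mount path are hypothetical placeholders; use the MountPath reported by list_dataset_configs for the affected mount.

from pai.dsw import mount, unmount, list_dataset_configs

# Inspect the current mounts to find the affected MountPath
print(list_dataset_configs())

# Hypothetical example values; replace with the affected mount path and your OSS path
broken_mount_path = "/mnt/dynamic/mydata/"
oss_uri = "oss://<YourBucketName>/Path/Data/Directory/"

# Unmount the disconnected mount point, then mount the same path again
unmount(broken_mount_path)
mount_point = mount(oss_uri, mount_point=broken_mount_path)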
Q: What storage can I mount in DSW, and is it possible to mount Alibaba Cloud Drive or MaxCompute tables?
You can mount storage from OSS, NAS, and CPFS by creating a Dataset or by mounting a storage path directly. However, some services cannot be mounted like a file system:
Alibaba Cloud Drive: Direct mounting is not supported. The recommended approach is to first upload the data you need to an OSS bucket and then mount that bucket in your DSW instance.
MaxCompute Tables: You cannot mount a MaxCompute table as a directory. To access data in MaxCompute, you must use the appropriate SDK, such as PyODPS, within your DSW code. For more information, see Use PyODPS to read and write MaxCompute tables.
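As a rough illustration of the SDK-based approach, reading a MaxCompute table with PyODPS might look like the following sketch. The project, endpoint, table name, and credential environment variables are placeholders; replace them with your own values.

import os
from odps import ODPS

# All connection details below are placeholders; replace them with your own values.
o = ODPS(
    os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID"),
    os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET"),
    project="<your_maxcompute_project>",
    endpoint="<your_maxcompute_endpoint>",
)

# Read the first few records from a hypothetical table
table = o.get_table("<your_table>")
with table.open_reader() as reader:
    for record in reader[:10]:
        print(record)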
Q: Will my code and data be lost if my DSW instance is stopped or deleted? How do I persist data and code after a DSW instance is stopped or deleted?
Yes. Any data stored on the instance's local storage (the free Cloud Disk in a public resource group or the system disk in a dedicated resource group) is temporary and will be deleted:
For instances in a public resource group, data is cleared if the instance is stopped for more than 15 days.
For instances in a dedicated resource group, data is cleared as soon as the instance is stopped or deleted.
To ensure your work is not lost, you must use an externally mounted storage service.
Persistence Solution: Save all your important files (including code, datasets, and models) to a mounted OSS or NAS path. Data stored in your personal OSS or NAS is persistent and independent of the DSW instance's lifecycle.
Migration Solution: To move your work to a new DSW instance, simply mount the same OSS or NAS path that contains your persisted data. This is the most efficient way to migrate your environment.
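For example, the following minimal sketch copies a temporary working directory to a mounted NAS path before you stop the instance. Both paths are illustrative; use the mount path configured for your instance.

import shutil

# Hypothetical example paths: a local working directory and a mounted, persistent destination
src = "/home/admin/workspace/my_project"
dst = "/mnt/data_nas/backup/my_project"

# Copy the project into the mounted directory so it survives when the instance is stopped or deleted
shutil.copytree(src, dst, dirs_exist_ok=True)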
References
For more information, see DSW FAQ.