Data Science Workshop (DSW) provides a cloud-based integrated development environment (IDE) for AI development. If you are familiar with tools such as Jupyter Notebook or Visual Studio Code, you can quickly start developing models. This topic describes how to create a DSW instance and provides answers to frequently asked questions (FAQs) about starting and deleting instances.
Quickly create a basic DSW instance
Log on to the PAI console, select a Region, and in the navigation pane on the left, click Workspaces. Select and go to the target workspace.
In the navigation pane on the left, click Interactive Modeling (DSW) > Create Instance. Configure the following key parameters and use the default settings for the others. For a complete list of console parameters, see Complete list of console parameters.
Parameter | Description |
Instance Name | Example: dsw_test. |
Resource Type | Select Public Resources. This resource type uses the pay-as-you-go billing method. |
Instance Type | Select an instance type, such as ecs.gn7i-c8g1.2xlarge (1 × A10 GPU, 8 vCPUs, 30 GiB memory). If this instance type is out of stock, try selecting another one from the list. |
Image config | Select Alibaba Cloud Image, then search for and select modelscope:1.31.0-pytorch2.8.0-gpu-py311-cu124-ubuntu22.04 (Python 3.11, CUDA 12.4). We recommend using ModelScope images because they offer broad compatibility and a comprehensive set of third-party libraries. |
Click OK to create the instance. The instance is created when its status changes to Running.
If the instance fails to start, see DSW instance startup.
On the DSW instance list page, find the instance and click Open in the Actions column to go to the DSW instance and start developing your models.
For more information about the DSW instance interface and how to stop, delete, or modify a DSW instance, see Access and manage DSW instances from the console.
Warning After you create a DSW instance that uses public resources, the instance starts to be billed based on its runtime as soon as it enters the Running state, even if you do not open the WebIDE or run any code. To avoid unnecessary charges, stop or delete the instance promptly when you no longer need it.
The data of a DSW instance created from public resources is stored on a free cloud disk by default. If the instance remains stopped for more than 15 days, the content on the cloud disk is deleted and cannot be recovered. Promptly back up important data, or mount a cloud storage service and transfer your data to it.
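One way to act on the backup advice above is to archive the working directory into a directory that lives on mounted storage. The following is a minimal sketch, not an official tool: the function name `back_up_workspace` is hypothetical, and the example paths in the usage note are assumptions that you should replace with your own working directory and mount path.

```python
import shutil
import time
from pathlib import Path


def back_up_workspace(source_dir: str, target_dir: str) -> Path:
    """Archive source_dir into a timestamped .tar.gz file under target_dir."""
    source = Path(source_dir)
    target = Path(target_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"source directory not found: {source}")
    # target_dir should sit on mounted storage, outside source_dir.
    target.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_base = target / f"workspace-backup-{stamp}"
    # make_archive appends ".tar.gz" and returns the final archive path.
    archive_path = shutil.make_archive(
        str(archive_base), "gztar", root_dir=str(source)
    )
    return Path(archive_path)
```

For example, `back_up_workspace("/mnt/workspace", "/mnt/data/backups")` would archive the default working directory, assuming that `/mnt/data` is a path where you mounted cloud storage.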
Configurations for typical scenarios
A basic DSW instance configuration may not meet all your AI development needs. The following table summarizes configurations for typical scenarios.
Scenario | Needs or pain points | Key configurations | Related documentation |
Persistently store code and data | The system disk of a DSW instance provides only temporary storage. Data is deleted if the instance is deleted or remains stopped for an extended period. You need to save important files for long-term use or share data among multiple instances. | Use Dataset Mounting or Mount storage to mount cloud storage, such as Object Storage Service (OSS), to a specified directory on an instance. | Mount a dataset, OSS bucket, NAS file system, or CPFS file system |
Increase public network download speed | DSW instances use a shared gateway by default. Because of bandwidth limits, the download speed for large files may be insufficient. | In the network information section, configure a virtual private cloud (VPC) and use a Private Gateway. You also need to create a NAT Gateway and an Elastic IP Address (EIP) for the VPC. | Use a dedicated gateway to increase the public network access speed |
Develop remotely over SSH | You prefer to develop and debug with local tools such as VS Code or PyCharm and do not want to be limited to a web IDE. | In the access configuration, select Enable SSH, enter the SSH Public Key, and select Access over Internet. Associate an existing NAT Gateway and Elastic IP Address (EIP). | Remote connection: Connect directly using SSH |
Access web services within the instance | You need to publish a web application that runs inside the instance to the Internet so it can be accessed or shared using a URL. | In the access configuration, add a Custom Services entry, configure the service port, and enable public network access. You must also add an inbound rule to the security group to allow traffic on that port. | Access services in an instance over the Internet |
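For the web service scenario in the table above, the application inside the instance has to listen on the port that you configure as the service port. The following sketch uses only the Python standard library. The handler, the `make_server` helper, and the port used in the usage note are hypothetical placeholders for your own application.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"hello from DSW\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the terminal quiet


def make_server(port: int, host: str = "0.0.0.0") -> ThreadingHTTPServer:
    # Binding to 0.0.0.0 accepts connections on all interfaces,
    # not only on localhost inside the instance.
    return ThreadingHTTPServer((host, port), HelloHandler)
```

Calling `make_server(9000).serve_forever()` in a terminal starts the service. Port 9000 is an arbitrary example and must match the service port and the security group rule that you configure.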
Complete list of console parameters
Basic information
Parameter | Description |
Instance Name | Configure the instance name based on the on-screen prompts. |
Tag | Add tags to the instance as needed. Tags help you search for, locate, manage, and allocate costs for resources from multiple dimensions. |
Resource information
Parameter | Description |
Resource Type | Public Resources: This resource type uses the pay-as-you-go billing method. You cannot change the billing method to subscription.
Note GPU card limit: When you use public resources, each Alibaba Cloud account is limited to two GPU cards in each region. If you use more resources than the quota, an error may occur. To increase the quota, submit a ticket.
Instance Type: You can select GPU, CPU, or free trial resources. For more information about instance types, see Instance families.
Bidding Purchase: You can use a spot instance to reduce running costs. If the No preemptible instances in stock message appears, try a different instance type. This parameter is supported only in the China (Hangzhou), China (Shanghai), China (Beijing), China (Ulanqab), China (Shenzhen), China (Guangzhou), Japan (Tokyo), and Singapore regions.
Driver Settings: For GPU-accelerated instances that use public resources, you can set the driver version. The drop-down list displays the major driver versions supported by the GPU type.
Resource Quota: This resource type uses the subscription billing method.
Resource Quota: You can select general computing resources or Lingjun resources. If no resources are available, click Associate Resource Quota to configure one.
Instance Type: Set the GPU, CPU, and memory as needed.
Priority: The priority level ranges from 1 to 9. A larger value indicates a higher priority.
CPU Affinity: Binds the processes in a container or pod to specific CPU cores for execution. This reduces CPU cache misses and context switches, which improves CPU utilization and application performance. This feature is suitable for performance-sensitive and real-time scenarios. This parameter is supported only in the China (Beijing) and China (Shenzhen) regions.
Driver Settings: For Lingjun resource quotas, you can set the driver version for the instance. The drop-down list displays the major driver versions that are supported by the GPU type.
|
Environment information
Parameter | Description |
Image Configuration | The following image types are supported:
Alibaba Cloud Image: PAI provides images for popular open source frameworks and Python versions. For example, the pytorch:2.4.1-gpu-py312-cu124-ubuntu22.04 image is designed for PyTorch 2.4.1, runs on a GPU-accelerated instance, and includes Python 3.12 with CUDA 12.4. To find an image with a specific version dependency, you can search for a keyword in the search box. For example, searching for cu124 returns images that use CUDA 12.4.
Custom Image: You can use a custom image that has been added to PAI. The image repository must be set to allow public pulls, or the image must be stored in Container Registry (ACR). For more information, see Custom images.
Image Address: You can configure the URL of a custom or official image that is accessible on the public network. To increase the image pull speed, see Image acceleration. If it is a private image URL, click Enter Username and Password and configure the image repository username and password.
|
System Disk | Used to store files during development. When you set Resource Type to Public Resources, or when you set Resource Quota to subscription general computing resources (CPU cores ≥ 2 and memory ≥ 4 GB, or equipped with a GPU), each instance receives a free 100 GiB system disk. You can expand the disk. The price of the expansion is shown in the console.
Warning If you use only the free quota for the cloud disk, its contents are deleted if the instance is stopped for more than 15 days.
After you expand the disk, the entire disk, including its free and paid portions, is no longer released when the instance stays stopped for more than 15 days. However, billing for the expanded portion continues until the instance is deleted.
You cannot shrink a disk after you expand it, so expand it only as much as you need.
When an instance is deleted, the cloud disk is also deleted. Back up all necessary data before deletion.
To use persistent storage, configure Dataset Mounting or Mount storage. |
Dataset Mounting | Stores datasets for reading or persists files created during development. The following two dataset types are supported: Custom Dataset: You can create a custom dataset to store your training data files. You can set it to read-only and select a specific version. Public Dataset: PAI provides pre-built public datasets, which can only be mounted in read-only mode.
Mount Path: The path where the dataset is mounted in the DSW instance, for example, /mnt/data. Access the dataset from your code using this path.
Note The mount paths for multiple datasets cannot be the same.
If you configure a CPFS type dataset, you must configure the network settings and ensure the selected VPC is the same as the one used by CPFS. Otherwise, the DSW instance may fail to be created.
When the resource group is a dedicated resource group, the first dataset must be a NAS type, and it will be mounted to both your specified path and the default DSW working directory /mnt/workspace/.
For more information about mounting, see Mount a dataset, OSS bucket, NAS file system, or CPFS file system. |
Mount storage | You can also use storage mounting to access datasets or persist files. For more information about mounting, see Mount a dataset, OSS bucket, NAS file system, or CPFS file system. |
Working Directory | The startup directory for JupyterLab and the Web IDE. The default is /mnt/workspace. |
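Once a dataset is mounted, code running in the instance reads it like any local directory under the mount path. The following sketch lists the files under a mount path. The helper name `list_dataset_files` is hypothetical, and `/mnt/data` in the usage note is only the example path from the table above.

```python
from pathlib import Path
from typing import List


def list_dataset_files(mount_path: str, pattern: str = "*") -> List[Path]:
    """Return all regular files under mount_path whose names match pattern."""
    root = Path(mount_path)
    if not root.is_dir():
        # A missing directory usually means the dataset is not mounted here.
        raise FileNotFoundError(f"mount path not found: {root}")
    # rglob walks subdirectories too; sorting gives a stable order.
    return sorted(p for p in root.rglob(pattern) if p.is_file())
```

For example, `list_dataset_files("/mnt/data", "*.csv")` would return every CSV file in the mounted dataset, assuming the dataset was mounted at `/mnt/data`.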
Additional configurations
Parameter | Description |
Custom Startup Script | Customizes the environment or performs initialization tasks during instance startup. The custom script runs after the image and resources are ready but before development applications such as JupyterLab and Code Server start.
Note Timeout is 3 minutes: The custom script increases instance startup time and has a timeout of 3 minutes. Do not perform long-running tasks such as image downloads in the script.
View script run logs: After the instance starts, find the logs generated by the custom script in the /var/log/user-command/ directory.
|
Environment Variable | Used for the main container startup, system processes, and user processes. Add custom environment variables or override system defaults as needed.
Note: Do not modify the following environment variables:
# Modifications do not take effect
USER_NAME # Overwritten by the logic in the service
# Do not modify the following system variables. Modifications may affect normal use.
JUPYTER_NAME: Constructed from instance information by default. Can be used to modify the JupyterLab URL access path.
JUPYTER_COMMAND: Jupyter startup command. Default is set to lab to start JupyterLab.
JUPYTER_SERVER_ADDR: JupyterLab service listening address. Default is 0.0.0.0.
JUPYTER_SERVER_PORT: JupyterLab service listening port. Default is 8088.
JUPYTER_SERVER_AUTH: JupyterLab access password. Default is empty.
JUPYTER_SERVER_ROOT: Jupyter working directory. Priority is lower than WORKSPACE_DIR.
CODE_SERVER_ADDR: code-server service listening address. Default is 0.0.0.0.
CODE_SERVER_PORT: code-server service listening port. Default is 8082.
CODE_SERVER_AUTH: code-server access password. Default is empty.
WORKSPACE_DIR: The system sets this environment variable based on the working directory parameter set when the instance is created. It can change the startup directory of Jupyter and code-server. An error may occur if the path does not exist.
|
Advanced Configurations | Allows users to adjust certain secure kernel parameters required by their services. This is currently supported only for Lingjun resource group instances. For parameter details, see the table below. |
Advanced configuration parameter | Default value | Description | Notes |
VmMaxMapCount | 65530 | Sets the maximum number of memory mapping (mmap) areas a process can have. For example, you can set it to 1024000. | Values below 65530 do not take effect. Excessively high values can lead to wasted memory resources. |
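To see which values the environment variables described in the Environment Variable row actually have inside a running instance, you can read them from a terminal or notebook. The following sketch is read-only and falls back to the documented defaults when a variable is not set. The names `DOCUMENTED_DEFAULTS` and `effective_settings` are hypothetical, and the `WORKSPACE_DIR` fallback assumes the default working directory.

```python
import os
from typing import Dict, Mapping

# Defaults as listed in the table above. WORKSPACE_DIR has no fixed default:
# the value here assumes the default working directory /mnt/workspace.
DOCUMENTED_DEFAULTS = {
    "JUPYTER_SERVER_ADDR": "0.0.0.0",
    "JUPYTER_SERVER_PORT": "8088",
    "CODE_SERVER_ADDR": "0.0.0.0",
    "CODE_SERVER_PORT": "8082",
    "WORKSPACE_DIR": "/mnt/workspace",
}


def effective_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return each variable's current value, or its documented default."""
    return {
        name: env.get(name, default)
        for name, default in DOCUMENTED_DEFAULTS.items()
    }
```

Printing `effective_settings()` shows the current values without modifying them, which is consistent with the advice not to change these system variables.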
Network information
Parameter | Description |
VPC Settings | This parameter is available only when Resource Type is set to Public Resources. To use a DSW instance within a virtual private cloud (VPC), create a VPC in the same region as the DSW instance and configure this parameter. You also need to configure a vSwitch and a Security Group. For details on configuration policies for different scenarios, see Network configuration. |
vSwitch | This parameter can be configured when a VPC is configured. A vSwitch is a subnet within a VPC. Your DSW instance and other cloud resources connect to the vSwitch. |
Security Group | This parameter is required when a VPC is configured. A security group is a virtual firewall for a DSW instance. It controls all inbound and outbound network traffic. |
Internet Access Gateway | The following configuration methods are supported:
Public Gateway: The network bandwidth is limited. During periods of high user concurrency or when downloading large files, the network speed may be insufficient.
Private Gateway: To solve the bandwidth limits of the shared gateway, create an Internet NAT gateway in the DSW instance's VPC, bind an EIP, and configure SNAT entries. For more information, see Use a dedicated gateway to increase the public network access speed.
The following parameters are available only when a CPFS dataset is selected for Mount Configuration:
Note If a CPFS dataset is mounted, you must configure a VPC, and the selected VPC must be the same as the one used by CPFS. |
Extended CIDR Block | This parameter can be configured after you configure a vSwitch. If the number of available IP addresses in the VPC is insufficient for your growing business, or if the initial network planning resulted in an address shortage, you can use a secondary CIDR block to expand the VPC address space. For more information, see Use a secondary CIDR block to expand the address space of a CIDR block. |
Access configuration
Parameter | Description |
Enable SSH | For remote connection to the instance. This option is available only after you select a VPC. When enabled, a Custom Services named SSH appears. If you use a custom image, ensure that sshd is installed. |
SSH Public Key | You can configure this parameter after you turn on the SSH Configuration switch.
Note To support both VPC and public network logon, add public keys from multiple clients. Add each public key on a new line. You can add up to 10 public keys. |
Custom Services | Used to configure SSH remote access or access services in an instance over the Internet. |
Create Private Zone in VPC | Creates an internal authoritative domain name (PrivateZone). Use this domain name within the VPC to access the instance's SSH service or other custom services. This avoids the inconvenience of a changing instance IP address. Creating a PrivateZone domain name incurs charges. For more information, see Alibaba Cloud DNS product billing. |
NAT Gateway | When accessing a service in the instance from the public network, this gateway maps public requests (EIP:Port) to the private DSW instance (Private IP:Port). |
EIP | Provides a public IP address for accessing services in the instance from the public network. |
Roles and permissions
Parameter | Description |
Visibility | You can select Visible to the Instance Owner or Visible to Current Workspace. |
Instance Owner | Only the workspace administrator can change the instance owner. |
Show advanced configurations
Parameter | Description |
Instance RAM Role | When you access other cloud resources from a DSW instance, you can associate a RAM role with the instance. This method uses temporary credentials from Security Token Service (STS) to access other cloud resources. This avoids using long-term AccessKey pairs and reduces the risk of key exposure. You can configure one of the following instance RAM roles:
Default Roles of PAI: Has permissions to access internal PAI products, MaxCompute, and OSS. Temporary access credentials issued based on the default PAI role have permissions equivalent to the DSW instance owner when accessing internal PAI products and MaxCompute tables. When accessing OSS, it can only access the default storage path bucket configured for the current workspace.
Custom Roles: Configure a custom role for customized or more fine-grained permission management.
Does Not Associate Role: Select this if you want to access other cloud products directly using an AccessKey pair.
For more information about how to configure instance RAM roles, see Configure an instance RAM role for a DSW instance. |
FAQ
DSW instance startup
Q: How can I diagnose and fix DSW instance startup failures?
First, check the Events tab on the instance details page for the specific error message. Common errors and their solutions are listed below.
Your requested resource type [ecs.******] is not enough currently, please try other regions or other resource types
Cause: The selected instance type is unavailable in the current region.
Solution: Try creating the instance again later, or switch to a different instance type or region.
Your resource usage has exceeded the default limitation. Please contact us via ticket system to raise the limitation.
Cause: Your account has a default quota that limits instance specifications (e.g., a maximum of two GPUs per region per creation). This error occurs if your selection exceeds this limit.
Solution: To request a quota increase, submit a ticket.
Sales of this resource are temporarily suspended in the specified zone. We recommend that you use the multi-zone creation function to avoid the risk of insufficient resource.
CommodityInstanceNotAvailableError: Commodity instance has been released due to prolonged arrears at past. Please create a new instance for use
The charge of current ECI instance has been stopped, but the related resources are still being cleaned. or The cluster resources are fully utilized. Please try later or other regions. or Create ECI failed because the specified instance is out of stock.
back-off 10s restarting failed container=dsw-notebook pod
Cause: This error indicates your system disk is full. You can check disk usage in the DSW terminal, for example with df -h.


Solution: Expand the system disk. Go to the instance details page and use the Change Configuration feature.

Important An expanded system disk is billed continuously, even when the instance is stopped. To stop all billing, you must delete the instance. Before you delete the instance, make sure that you have backed up all necessary data.
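To confirm from a notebook or script that a full system disk is the cause, you can read the disk usage with the Python standard library. This is a minimal sketch: the function name `disk_usage_report` is hypothetical, and which path sits on the system disk is an assumption. The root path `/` and the working directory are reasonable candidates to check.

```python
import shutil
from typing import Dict, Union


def disk_usage_report(path: str = "/") -> Dict[str, Union[str, float]]:
    """Summarize the disk usage of the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gib = 1024 ** 3
    used_percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gib": round(usage.total / gib, 2),
        "used_gib": round(usage.used / gib, 2),
        "free_gib": round(usage.free / gib, 2),
        "used_percent": round(used_percent, 1),
    }
```

A `used_percent` value close to 100 for the system disk suggests that you should free up space or expand the disk before you restart the instance.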
the available zone with vSwitch is out of stock
Cause: If you configured a VPC at creation, the associated vSwitch locks the resource search to a single availability zone, which may have a resource shortage.
Solution: Change the configuration of the DSW instance and set the VPC to empty. This allows the system to search for resources across all zones in the region.

Note To use a VPC, we recommend that you switch to another zone and create a new vSwitch and DSW instance. This expands the range of available resources and prevents shortages caused by a limited resource scope.
Startup failed with the message "Workspace member not found"
failed to create containerd container: failed to prepare layer from archive: failed to validate archive quota ...
Cause: The Image used for the instance is too large for the current system disk size.
Solution: Expand the System Disk via the Change Settings feature on the instance details page. Note that this incurs additional storage charges.

Other Cause: Overdue Payments
Q: Can I run a Python file when a DSW instance starts?
Use the Custom Startup Script option. You can set this when creating a new instance or by modifying an existing one via the Change Settings panel.

The script runs after the instance's underlying resources are ready but before development tools like JupyterLab or WebIDE are started. This is useful for environment customization or initialization tasks.
Note 3-Minute Timeout: The script has a 3-minute timeout. Avoid long-running tasks like downloading large files.
Log Location: After startup, you can find the script's execution logs in the /var/log/user-command/ directory inside the instance.
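Because of the 3-minute timeout, a Python initialization file works best when it runs a few short steps and skips the rest once a time budget is used up. The following sketch shows one way to structure such a file. The `run_init_steps` helper, the step names, and the 150-second budget are hypothetical choices, and how you invoke the file from the Custom Startup Script field depends on your image. The sketch only checks the budget between steps, so it cannot interrupt a single step that runs too long.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Stay safely below the documented 3-minute (180-second) timeout.
TIME_BUDGET_SECONDS = 150.0


def run_init_steps(
    steps: Sequence[Tuple[str, Callable[[], None]]],
    budget_seconds: float = TIME_BUDGET_SECONDS,
) -> List[str]:
    """Run the named steps in order and return the names that were run."""
    start = time.monotonic()
    completed = []
    for name, step in steps:
        elapsed = time.monotonic() - start
        if elapsed >= budget_seconds:
            # Out of time: skip the step instead of risking the timeout.
            print(f"skipping {name!r}: time budget exhausted", flush=True)
            continue
        step()
        completed.append(name)
        print(f"finished {name!r}", flush=True)
    return completed
```

A file that ends with a call such as `run_init_steps([("create cache dir", make_cache_dir)])`, where `make_cache_dir` is your own function, keeps each startup predictable. Move anything slow, such as large downloads, out of the startup script.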
Q: How do I find my DSW instance if it's not visible in the console?
Your instance may be in a different region or workspace. Use the dropdown menus on the PAI-DSW console page to switch between available regions and workspaces.

Q: Why is my DSW page blank, or why is the Notebook/Terminal unresponsive?
These issues are typically caused by your local browser or network environment. Try the following steps in order:
Clear your browser's cache and cookies, then reload the page.
Open the DSW page in your browser's incognito or private mode.
Switch your network. For example, if you are on a corporate network, try a mobile hotspot to rule out firewall issues.
Try a different browser, such as Chrome or Firefox.
Q: What happens to my data on a DSW instance's Cloud Disk when I stop, restart, or modify the instance?
Data persistence depends on the action taken and whether the instance uses a Cloud Disk or Temporary Storage for its System Disk. This answer applies to instances with a Cloud Disk System Disk.
Stopping the instance: Data may be lost.
If the Cloud Disk was expanded or the instance is stopped for fewer than 15 days, your data is preserved.
If the Cloud Disk was not expanded and the instance remains stopped for more than 15 days, the system permanently erases all data.
Restarting the instance: Data is not lost. All files and installed packages on the System Disk are retained.
Changing the instance specification (CPU/GPU/Memory): Data is not lost.
Changing the instance image: Data on the System Disk may be lost. The system may reset the disk contents. Data on mounted storage (like OSS or NAS) is unaffected. Always back up your System Disk data before changing the image.
For instances using Temporary Storage, all data on the System Disk is permanently deleted upon stopping, restarting, or any configuration change.
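The rules in this answer can be summarized as a small decision function, which is convenient when you want to check a planned action before you run it. This is only a restatement of the rules above as a sketch. The function name, the action labels, and the return values are hypothetical, and the behavior at exactly 15 days is not specified above, so the sketch treats only more than 15 days as erased.

```python
def system_disk_data_outcome(
    action: str,
    disk_type: str = "cloud_disk",
    expanded: bool = False,
    days_stopped: int = 0,
) -> str:
    """Return "preserved", "at_risk", or "erased" for system disk data."""
    if action not in {"stop", "restart", "change_spec", "change_image"}:
        raise ValueError(f"unknown action: {action}")
    if disk_type == "temporary":
        # Temporary storage loses its data on any of these actions.
        return "erased"
    if disk_type != "cloud_disk":
        raise ValueError(f"unknown disk type: {disk_type}")
    if action in ("restart", "change_spec"):
        return "preserved"
    if action == "change_image":
        # The disk contents may be reset, so back up first.
        return "at_risk"
    # action == "stop": only an unexpanded disk is erased, and only after
    # the instance has stayed stopped for more than 15 days (assumption:
    # exactly 15 days still counts as preserved).
    if expanded or days_stopped <= 15:
        return "preserved"
    return "erased"
```

For example, `system_disk_data_outcome("stop", expanded=False, days_stopped=20)` returns `"erased"`, which matches the rule for an unexpanded cloud disk.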
Q: How can I recover a DSW instance from a public resource group that was auto-released after 15 days of being stopped?
You can't. An instance created from public resources with an un-expanded Cloud Disk is automatically and permanently deleted if it remains stopped for more than 15 consecutive days. The data cannot be recovered.
Stopping or releasing a DSW instance
Q: What is the correct way to release a DSW instance and its resources?
On the DSW console, you can either Stop or Delete an instance.

Stop: Halts compute billing but preserves the instance for later use.
Delete: Permanently removes the instance and all its resources, stopping all billing.
Important: If you expanded the System Disk, it continues to incur storage charges even when the instance is stopped. To stop all charges, you must delete the instance.
Q: Do I need to manually release a free trial resource package?
No. Free trial resource packages expire automatically and do not need to be manually stopped or released.
Q: What is the difference between stopping and deleting a DSW instance, and how do I stop all billing?
To completely stop all billing associated with a DSW instance, you must delete it.
Stop: This action releases the compute resources (CPU/GPU) and pauses billing for them. However, if you have expanded the System Disk, storage charges for the disk will continue.
Delete: This action permanently removes the instance and all its resources, including the system disk. All associated billing stops completely.
How to choose:
Use Stop if you plan to use the instance again and want to preserve its environment and data.
Use Delete if you no longer need the instance and want to avoid all future charges. Always back up your data before deleting.
Q: What should I do if my DSW instance is stuck in the 'Stopping' or 'Deleting' state?
This can happen if processes inside the instance are not terminating correctly or if high memory usage prevents the instance from responding.
Solution: Wait a few minutes and refresh the page. The system is designed to eventually terminate the instance safely. A stopped instance then shows the Stopped status, and a deleted instance disappears from the list.
Q: Is my data preserved when I stop or delete a DSW instance?
It depends on the action and the type of system disk your instance uses.
Recommendation: To ensure data persistence, save your important files to a mounted storage solution like OSS or NAS. Always back up your data before deleting an instance.
Q: Why did my DSW instance shut down automatically while it was running?
Your instance was likely stopped by the Idle Auto-shutdown policy. This feature automatically stops an instance if its CPU and GPU utilization remains below a set threshold for a specified period (e.g., 3 hours). It is enabled by default on free trial instances to conserve resources.
How to disable or modify the policy:
Manual stop: To ensure resource savings, you can manually stop the instance when it is not in use. The auto-shutdown policy is not guaranteed to be triggered every time.
Modify policy: To run long-term tasks, you can modify or disable this policy. The steps are as follows:
Modify the DSW auto-shutdown policy
Navigate to your workspace details page and go to Configure Workspace > Configure Scheduling.

In the DSW configuration section, you can modify the shutdown policy or add your instance's name to the exclusion list to prevent it from being automatically stopped.

Q: Why am I still being billed or seeing a 'Running' status after stopping/deleting all my DSW instances?
This usually happens for one of three reasons:
"Running" status refers to a resource package: The "Running" status you see on a billing or free trial page may refer to an active resource package (e.g., "250 compute hours/month"), not an actual DSW instance. The package remains active until it expires.
An expanded system disk is still being billed: Stopping an instance only pauses compute charges. If you expanded the System Disk, you are still being charged for storage. To stop these charges, you must delete the instance.
Billing data has a delay: There is a delay between resource usage and when the bill is generated. The charges you see may be for usage that occurred before you stopped or deleted the instance.