You can upload and download data files in Data Science Workshop (DSW). You can use the data files that you upload for model training and evaluation. After model training is complete, you can download the prediction results to evaluate the model performance or download the trained model for service deployment in other applications or systems. This topic describes how to upload and download data files on the Notebook or WebIDE page of a DSW instance.
Background information
When you upload or download a data file in DSW, you can select a method based on the file size.
If the file size is less than or equal to 5 GB, you can upload or download the data file on the Notebook page or on the WebIDE page of the DSW instance.
Note: DSW provides a file transfer station in Notebook. When you upload a large model or a large data file from your on-premises machine to a DSW instance, the file transfer station accelerates the upload at no additional charge. A large file that you upload can be saved to and used on multiple DSW instances that belong to the same RAM user.
If the file size is greater than 5 GB, we recommend that you use ossutil to upload the file to an Object Storage Service (OSS) bucket. Then, create an OSS dataset based on the data file and mount the dataset to a DSW instance. This way, the DSW instance can read data from the data file. For more information, see Read and write dataset data.
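The size-based choice described above can be sketched as a small shell helper. This is only an illustration: the function names are hypothetical, and the sketch assumes that "5 GB" means 5 × 1024³ bytes, which this topic does not state explicitly.

```shell
#!/bin/sh
# Sketch only: pick a transfer path from the 5 GB threshold described above.
# Assumption: 5 GB is interpreted as 5 * 1024^3 bytes.
LIMIT_BYTES=$((5 * 1024 * 1024 * 1024))

# Hypothetical helper: takes a size in bytes and prints "dsw-ui" if the file
# fits the Notebook or WebIDE path, or "ossutil" if it exceeds 5 GB.
choose_method_for_size() {
    if [ "$1" -le "$LIMIT_BYTES" ]; then
        echo "dsw-ui"
    else
        echo "ossutil"
    fi
}

# Hypothetical helper: same decision, but for a file on disk.
choose_method_for_file() {
    # wc -c prints the byte count; the arithmetic expansion strips whitespace.
    size=$(( $(wc -c < "$1") ))
    choose_method_for_size "$size"
}

# Example for the "ossutil" case (bucket name and object path are placeholders):
#   ossutil cp ./big_model.bin oss://examplebucket/models/big_model.bin
```

Depending on the ossutil version that you installed, the binary may have a different name, such as ossutil64. After the upload, the OSS dataset still has to be created and mounted as described above.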
Limits
Take note of the following limits when you use the file transfer station of Notebook to upload data files:
You can upload up to five data files at a time. If you upload more than five data files, the excess files wait in a queue to be uploaded.
You cannot upload a folder. If you want to upload a folder, we recommend that you compress the folder into a package and upload the package. For more information, see FAQ.
The default validity period of data files in the file transfer station is seven days. Before a data file expires, you can extend the validity period of the file for another seven days with one click.
You cannot extend the validity period of a data file that expires.
The capacity of the file transfer station is 10 GB. You can store up to 1,000 data files in the file transfer station.
The working directory of Notebook is /mnt/workspace/. If the file that you want to download is stored in another directory, move the file to the /mnt/workspace/ directory.
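As a minimal sketch of the last limit, the following helper moves a file into the Notebook working directory so that it appears in the file list and can be downloaded. The function name stage_for_download and the WORKSPACE variable are hypothetical and are used only for illustration. The default path is the working directory named above.

```shell
#!/bin/sh
# Sketch only: move a file into the Notebook working directory.
# Assumption: WORKSPACE may be overridden; it defaults to /mnt/workspace.
WORKSPACE="${WORKSPACE:-/mnt/workspace}"

# Hypothetical helper: moves one regular file into $WORKSPACE.
stage_for_download() {
    src=$1
    if [ ! -f "$src" ]; then
        # Folders cannot be downloaded directly; compress them first.
        echo "not a regular file: $src" >&2
        return 1
    fi
    mv -- "$src" "$WORKSPACE"/
}

# Example (the source path is a placeholder):
#   stage_for_download /tmp/predictions.csv
```

Use cp instead of mv in the helper if you want to keep the original file in place.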
Prerequisites
A DSW instance is created. For more information, see Create a DSW instance.
Upload or download data files on the Notebook page
Log on to the PAI console and open the DSW instance that you want to manage.
Upload or download a data file on the Notebook page.
Area 1 (Upload): Click the icon or drag the data file to the blank area of the file list to upload the file. The system determines the upload method based on the file size.
- If the file size is less than or equal to 10 MB, the data file is uploaded to the DSW instance by using the current browser.
- If the file size is greater than 10 MB but less than or equal to 5 GB, the data file is automatically uploaded to the file transfer station and saved to the instance.
Area 2 (Download): Right-click the data file and select Download. The data file is downloaded to your on-premises machine.
(Optional) Click the icon in the left-side toolbar to go to the File Transfer Station page. On the page, you can view the transfer list or perform operations on data files.
Area 1: Click the icon to view the transfer list in the file transfer station.
Area 2: Click the icon to the right of the data file that you want to manage to perform the following operations:
- Save to Instance: After the file transfer is complete, the data file is automatically saved to the instance. To save the data file to the current instance again, or to use the data file on another instance that belongs to the same RAM user, click Save to Instance.
- Extend Expiration: The default validity period of data files in the file transfer station is seven days. Click Extend Expiration to extend the validity period of a data file before the validity period ends.
- Delete: Click Delete to delete the data file from the file transfer station. The data file is not deleted from the file list.
Upload or download data files on the WebIDE page
Open the DSW instance that you want to manage and click WebIDE in the top navigation bar of the page that appears.
On the WebIDE page, click the icon in the left-side toolbar to upload data files to or download data files from the file list.
Upload a data file: Right-click the directory in which you want to save the data file, select Upload from the shortcut menu, and then upload the file as prompted.
Download a data file: Right-click the data file that you want to download and select Download from the shortcut menu.
Note: You can download only data files. You cannot download folders. If you want to download a folder, compress the folder on the Terminal page, right-click the package on the WebIDE page, and then select Download from the shortcut menu.
FAQ
How do I upload or download a folder?
DSW does not support uploading and downloading folders. However, you can compress a folder into a package and then upload or download the package. DSW Terminal provides a Linux environment that allows you to compress files by using standard Linux CLI tools, such as tar, gzip, and unzip. In this example, tar is used.
Run the tar --version command to check whether tar is installed. If tar is not installed, run the command for your operating system to install tar:
    # Debian-based operating systems, such as Ubuntu
    sudo apt install tar
    # Red Hat-based operating systems, such as CentOS and Fedora
    sudo yum install tar
Run the following commands to package a folder or extract a package:
    # Package the /path/to/directory folder into archive_name.tar.
    # The -c option creates an uncompressed archive. Add the -z option to also compress the archive with gzip.
    tar -cvf archive_name.tar /path/to/directory
    # Extract the package into the current directory.
    tar -xvf archive_name.tar
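For a complete round trip, the following sketch packs a folder into a gzip-compressed package and extracts it again. The helper names pack_folder and unpack_archive are hypothetical. The sketch uses only the standard tar options -c, -x, -z, -f, and -C. The -C option is used so that the package stores the folder name alone instead of the full path.

```shell
#!/bin/sh
# Sketch only: compress a folder for upload or download, then extract it.

# Hypothetical helper: pack_folder <folder> <archive.tar.gz>
pack_folder() {
    folder=$1
    archive=$2
    # -C switches to the parent directory first, so the package contains
    # "<folder name>/..." rather than the full path.
    tar -czf "$archive" -C "$(dirname "$folder")" "$(basename "$folder")"
}

# Hypothetical helper: unpack_archive <archive.tar.gz> <destination directory>
unpack_archive() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}

# Example (all paths are placeholders):
#   pack_folder /mnt/workspace/my_dataset /mnt/workspace/my_dataset.tar.gz
#   unpack_archive /mnt/workspace/my_dataset.tar.gz /mnt/workspace/restored
```

To inspect a package without extracting it, run tar -tzf with the package name.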
References
DSW is a cloud-based integrated development environment (IDE) for machine learning in PAI. After you upload a data file, you can start to use DSW. For more information, see DSW overview.
DSW supports a variety of data sources, such as OSS, File Storage NAS (NAS), and MaxCompute data sources. For more information, see Read and write data.