Platform For AI: Submit a standalone training job that uses PyTorch

Last Updated: Jan 10, 2025

This topic describes how to use Deep Learning Containers (DLC) to run an offline transfer learning training job based on PyTorch.

Step 1: Prepare data

The training data used in this topic is pre-stored in publicly readable storage. You can download the data directly; no additional data preparation is required.
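
If you want to inspect the data before you submit the job, you can download and extract the same archive that the startup command in Step 3 uses. The following Python sketch assumes only that the archive is publicly readable at the URL shown in that command; the printed directory names depend on the archive contents.

    import os
    import tarfile
    import urllib.request

    # URL of the sample dataset; the startup command in Step 3 downloads the same archive.
    DATA_URL = "https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz"

    # Download the archive to the current directory.
    urllib.request.urlretrieve(DATA_URL, "data.tar.gz")

    # Extract the archive; the startup command expects it to contain a hymenoptera_data directory.
    with tarfile.open("data.tar.gz") as tar:
        tar.extractall(".")

    # List the extracted top-level entries to confirm that the download succeeded.
    print(os.listdir("hymenoptera_data"))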

Step 2: Prepare the training code and model storage file

The training code package used in this topic is pre-stored in publicly readable storage. You can download the code package directly; no additional code development is required.
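
The code package centers on a main.py script that the startup command in Step 3 runs as python main.py -i ./input -o ./output. For orientation, the following is a minimal sketch of what a PyTorch transfer learning script with that command-line interface might look like. The model choice, data layout (an ImageFolder-style train directory), and hyperparameters are illustrative assumptions, not the actual contents of the provided main.py.

    # Hypothetical sketch of a transfer learning script; the actual main.py may differ.
    import argparse
    import os

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("-i", "--input", required=True, help="dataset root, e.g. ./input")
        parser.add_argument("-o", "--output", required=True, help="directory for the trained model")
        args = parser.parse_args()

        # Standard ImageNet-style preprocessing.
        preprocess = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])

        # Assumes an ImageFolder-style layout such as <input>/train/<class>/*.jpg.
        train_ds = datasets.ImageFolder(os.path.join(args.input, "train"), preprocess)
        train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Transfer learning: load a pretrained backbone and replace the classifier head.
        # The weights argument requires torchvision 0.13 or later.
        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))
        model = model.to(device)

        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

        model.train()
        for epoch in range(5):
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()

        # Save the fine-tuned model to the output directory, which the job lists at the end.
        os.makedirs(args.output, exist_ok=True)
        torch.save(model.state_dict(), os.path.join(args.output, "model.pth"))

    if __name__ == "__main__":
        main()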

Step 3: Create a job

  1. Go to the Create Job page.

    1. Log on to the PAI console. Select a region and a workspace. Then, click Enter Deep Learning Containers (DLC).

    2. On the Deep Learning Containers (DLC) page, click Create Job.

  2. On the Create Job page, configure the following key parameters and use the default values for the remaining parameters.

    Basic Information

    • Job Name: Enter a name for the job. Example: torch-sample.

    Environment Information

    • Node Image: Click Alibaba Cloud Image and select a PyTorch image.

    • Data Set: If you want to retain the training results after the job ends, mount a custom dataset and save the results to the file system of the dataset. Take an OSS dataset as an example. Click Custom Dataset and configure the following parameters:

      • Custom Dataset: Select a created OSS dataset. For more information about how to create a dataset, see Create and manage datasets.

      • Mount Path: Set the value to /mnt/data/.

    • Startup Command: Enter the following command, which downloads the data, downloads the code package, runs the training job, and lists the generated model files in the ./output directory. To save the results to the mounted dataset, see the sketch after this step.

      wget https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz && tar -xf ./data.tar.gz && mv ./hymenoptera_data/ ./input && mkdir output && wget https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/main.py && python main.py -i ./input -o ./output && ls ./output

    Resource Information

    • Source: Select Public Resources.

    • Framework: Select PyTorch.

    • Job Resource:

      • Number of Nodes: Set the value to 1.

      • Instance Type: Select an instance type, such as ecs.gn6e-c12g1.3xlarge. If this type is not available in your current region, create the job in another region. For information about regions that support the pay-as-you-go billing method, see Deep Learning Containers (DLC).

  3. Click OK.

    The Deep Learning Containers (DLC) page appears.
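
The startup command writes the trained model to ./output inside the container, which is discarded when the job ends. If you mounted a dataset at /mnt/data/ as described above, one way to persist the results is to copy the output directory to the mount path with a short Python step appended to the startup command. The destination /mnt/data/output below is an assumed path; adjust it to your own mount configuration.

    # Copy the training output to the mounted dataset so it persists after the job ends.
    # Assumes the dataset is mounted at /mnt/data/ as configured in the Data Set parameter.
    import shutil

    shutil.copytree("./output", "/mnt/data/output", dirs_exist_ok=True)

Alternatively, you can have main.py write to the mount path directly by replacing -o ./output with -o /mnt/data/output in the startup command.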

Step 4: View the details and logs of the training job

  1. On the Deep Learning Containers (DLC) page, click the job name.

  2. On the job details page, view Basic Information and Resource Information of the job.

  3. In the Instance section at the bottom of the job details page, find the desired node and click Log in the Actions column to view the logs of the node.


  4. Go to the file system of the mounted dataset to view the training results. Take OSS as an example: you can browse the output files in the OSS console, or list them programmatically as shown in the sketch that follows.

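If you prefer to check the results from your own machine instead of the console, you can list the objects in the OSS bucket that backs the dataset by using the OSS Python SDK (oss2). The endpoint, bucket name, credentials, and prefix below are placeholders you must replace; the actual prefix depends on where your dataset and training output are stored.

    # List the training results in the OSS bucket that backs the mounted dataset.
    # All values in angle brackets are placeholders; the prefix "output/" is an assumption.
    import oss2

    auth = oss2.Auth("<AccessKeyId>", "<AccessKeySecret>")
    bucket = oss2.Bucket(auth, "https://oss-cn-beijing.aliyuncs.com", "<your-bucket-name>")

    # Print every object under the assumed output prefix, together with its size in bytes.
    for obj in oss2.ObjectIterator(bucket, prefix="output/"):
        print(obj.key, obj.size)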