This topic describes how to install the Pai-Megatron-Patch image in DLC or DSW to accelerate model training.
Limits
- Pai-Megatron-Patch requires GPU-accelerated instances.
- The GPU driver version must be 460.32 or later.
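The driver requirement above can be checked from inside a running instance. The following is a minimal sketch, not part of Pai-Megatron-Patch: it assumes `nvidia-smi` is on the PATH of the instance, and the helper names (`parse_driver_version`, `driver_meets_minimum`, `installed_driver_versions`) are hypothetical names chosen for illustration.

```python
import subprocess

# Minimum driver version taken from the Limits section.
MIN_DRIVER = (460, 32)


def parse_driver_version(text):
    """Turn a string such as '470.82.01' into a tuple of ints (470, 82, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(text, minimum=MIN_DRIVER):
    """Return True if the major.minor part of the version is at least `minimum`."""
    version = parse_driver_version(text)
    return version[:len(minimum)] >= minimum


def installed_driver_versions():
    """Ask nvidia-smi for the driver version of each GPU.

    Returns an empty list if nvidia-smi is missing or fails.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("nvidia-smi is not available; cannot check the driver version")
    for version in versions:
        status = "OK" if driver_meets_minimum(version) else "older than 460.32"
        print(f"driver {version}: {status}")
```

Only the major and minor components are compared, so a patch suffix such as `.03` does not affect the result.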
Procedure
Install a Pai-Megatron-Patch image in DLC
Deep Learning Containers (DLC) is a cloud-native deep learning training platform that supports custom images, distributed training, and multiple frameworks.
DLC lets you load custom images for Pai-Megatron-Patch deployment. After installation, you can run large-scale distributed training on multi-GPU servers.
Perform the following steps:
- Log on to the PAI console.
- In the left pane, click Workspace List. On the Workspace List page, click a workspace.
- In the left pane, choose Model Development and Training > Deep Learning Containers (DLC), and click Create Job.
- Configure the following parameters. For other parameters, see Create a training job.
  - Environment Information: Set Node Image to Image Address, and enter the following address:
    pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:2.0-ubuntu20.04-py3.10-cuda11.8-megatron-patch-llm
  - Resource Information:
    - Set Framework to PyTorch.
    - Job Resource: In the Resource Specification column, click the icon, and select a GPU-accelerated node type and specifications.
- Click OK.
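Because the image address is long and must be entered exactly, it can help to split it into its parts and confirm each one before you submit the job. The sketch below assumes the usual `registry/repository:tag` layout of container image addresses. The function name `split_image_address` is hypothetical, and reading the second label of the registry host as a region ID is an assumption, not something this topic states.

```python
# Image address copied from the procedure above.
IMAGE = (
    "pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/"
    "pai/pytorch-training:2.0-ubuntu20.04-py3.10-cuda11.8-megatron-patch-llm"
)


def split_image_address(address):
    """Split 'registry/repository:tag' into a dict of its three parts."""
    registry, _, rest = address.partition("/")
    if ":" in rest:
        repository, _, tag = rest.rpartition(":")
    else:
        # No tag given; leave it empty rather than guessing a default.
        repository, tag = rest, ""
    return {"registry": registry, "repository": repository, "tag": tag}


def registry_region(registry):
    """Return the second host label, assumed here to be the region ID."""
    labels = registry.split(".")
    return labels[1] if len(labels) > 1 else ""


if __name__ == "__main__":
    parts = split_image_address(IMAGE)
    for name, value in parts.items():
        print(f"{name}: {value}")
    print(f"region (assumed): {registry_region(parts['registry'])}")
```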
Install a Pai-Megatron-Patch image in DSW
Data Science Workshop (DSW) is a cloud-based deep learning development environment that integrates JupyterLab and supports custom plug-ins without O&M configuration.
DSW also supports custom images. After installation, you can debug Pai-Megatron-Patch training acceleration programs.
Perform the following steps:
- Log on to the PAI console.
- In the left pane, click Workspace List. On the Workspace List page, click a workspace.
- In the left pane, choose Model Development and Training > Data Science Workshop (DSW), and click Create Instance.
- Configure the following parameters. For other parameters, see Create a DSW instance.
  - Resource Quota: Select Public Resources (Pay-as-you-go).
  - Resource Specification: Click the icon, and select a GPU-accelerated instance specification.
  - Image: Enter the following address:
    pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:2.0-ubuntu20.04-py3.10-cuda11.8-megatron-patch-llm
- Click OK to create the DSW instance.
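After the DSW instance starts, a quick check in a notebook cell can confirm that the environment matches what the image tag suggests (Python 3.10, a CUDA 11.8 build of PyTorch) and that a GPU is visible. This is a sketch under those assumptions: the function name `environment_report` is hypothetical, and the code returns partial information instead of failing if PyTorch cannot be imported.

```python
import platform


def environment_report():
    """Collect the Python, PyTorch, and CUDA details of the current environment."""
    report = {
        "python": platform.python_version(),
        "torch": None,       # PyTorch version, if PyTorch is importable
        "cuda_build": None,  # CUDA version PyTorch was built against
        "gpu_count": 0,      # number of GPUs visible to PyTorch
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

A `gpu_count` of 0 on a GPU-accelerated specification usually points to a driver or image problem and is worth resolving before you start debugging training code.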
Post-installation usage
After installation, you can find usage examples in the examples folder of Pai-Megatron-Patch.