Pai-Megatron-Patch applies a range of optimization techniques to the training of PyTorch Transformer models to improve training performance. This topic describes how Pai-Megatron-Patch works and how to use it.
Background information
Pai-Megatron-Patch is a tool developed by the Alibaba Cloud Platform for AI (PAI) team based on best practices for foundation models on the intelligent computing LINGJUN platform. It supports foundation model developers throughout the development process, including getting started with LINGJUN services, efficient distributed training of large language models (LLMs), supervised fine-tuning (SFT), and offline model inference. Pai-Megatron-Patch provides a Megatron-LM-based training and offline inference verification process for mainstream open source foundation models, which helps you get started with foundation model training.
How Pai-Megatron-Patch works
Pai-Megatron-Patch extends the capabilities of Megatron-LM through patches instead of modifying the Megatron-LM source code. This way, Pai-Megatron-Patch can build an independent LLM training process without changing the core library of Megatron-LM, and it remains compatible with Megatron-LM updates without affecting the user experience.
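The following is a minimal sketch of the general patching idea, not the actual Pai-Megatron-Patch code. It assumes a stand-in `upstream` namespace with a hypothetical `BaseTokenizer` class and `build_tokenizer` factory (neither is a real Megatron-LM API) and shows how a separate layer can add a feature by subclassing and wrapping, while the upstream definitions stay unmodified.

```python
import types

# Stand-in for an upstream library (hypothetical; not Megatron-LM's real API).
upstream = types.SimpleNamespace()


class BaseTokenizer:
    """Pretend this class ships with the upstream library."""

    def tokenize(self, text):
        return text.split()


def build_tokenizer(name):
    """Pretend upstream factory that only knows the built-in tokenizer."""
    if name == "base":
        return BaseTokenizer()
    raise ValueError(f"unknown tokenizer: {name}")


upstream.BaseTokenizer = BaseTokenizer
upstream.build_tokenizer = build_tokenizer


# Patch layer: lives outside the upstream code, which is left untouched.
class LowercaseTokenizer(upstream.BaseTokenizer):
    """New feature added by subclassing the upstream class."""

    def tokenize(self, text):
        return super().tokenize(text.lower())


_original_build = upstream.build_tokenizer


def patched_build_tokenizer(name):
    """Handle the new name, and delegate everything else to upstream."""
    if name == "lowercase":
        return LowercaseTokenizer()
    return _original_build(name)


# Swap in the wrapper at run time; no upstream source file is edited.
upstream.build_tokenizer = patched_build_tokenizer

tokens = upstream.build_tokenizer("lowercase").tokenize("Hello Megatron")
print(tokens)  # ['hello', 'megatron']
```

Because the wrapper delegates unknown names to the original factory, existing behavior is preserved, which is the property that keeps a patch layer compatible when the upstream library is updated.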
Pai-Megatron-Patch provides a model library, tokenizers, model conversion tools, reinforcement learning features, offline text generation, and multiple usage examples and toolsets to help you quickly deploy foundation model training and inference.
The model library includes multiple popular foundation models, such as Baichuan, BLOOM, ChatGLM, Falcon, Galactica, GLM, Llama, Qwen, and StarCoder. In addition, the patch supports conversion between Hugging Face model weights and Megatron model weights. This allows you to load Hugging Face model weights in the Megatron environment for pre-training or fine-tuning, or convert Megatron model weights to Hugging Face model weights for evaluation and inference.
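Conceptually, part of a two-way weight conversion is renaming parameters between the two naming schemes. The sketch below illustrates only that renaming step with plain dictionaries. The key names in `HF_TO_MEGATRON` and the helpers `rename_keys` and `invert` are hypothetical and are not taken from the Pai-Megatron-Patch conversion tools. A real conversion typically also has to split or merge tensors to match the model-parallel layout, which is omitted here.

```python
# Hypothetical parameter-name mapping. Real checkpoints use
# model-specific names that differ from these placeholders.
HF_TO_MEGATRON = {
    "hf.embed.weight": "mg.embedding.weight",
    "hf.layer0.attn.weight": "mg.layers.0.attention.weight",
    "hf.final_norm.weight": "mg.final_layernorm.weight",
}


def invert(mapping):
    """Build the reverse mapping, refusing mappings that are not one-to-one."""
    inverse = {target: source for source, target in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return inverse


def rename_keys(state_dict, mapping):
    """Return a new dict with every key renamed according to the mapping."""
    unmapped = [key for key in state_dict if key not in mapping]
    if unmapped:
        raise KeyError(f"no mapping for: {unmapped}")
    return {mapping[key]: value for key, value in state_dict.items()}


# Placeholder values stand in for weight tensors.
hf_weights = {
    "hf.embed.weight": [0.1, 0.2],
    "hf.layer0.attn.weight": [0.3, 0.4],
    "hf.final_norm.weight": [1.0, 1.0],
}

megatron_weights = rename_keys(hf_weights, HF_TO_MEGATRON)
round_trip = rename_keys(megatron_weights, invert(HF_TO_MEGATRON))

print(sorted(megatron_weights))
print(round_trip == hf_weights)  # True
```

Failing loudly on an unmapped key is a deliberate choice in this sketch: silently dropping a parameter during conversion would produce a checkpoint that loads but behaves incorrectly.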
For reinforcement learning, Pai-Megatron-Patch provides a Proximal Policy Optimization (PPO) training process that works with SFT models and reward models (RMs). Together, the Pai-Megatron-Patch tools and usage examples provide a comprehensive solution for foundation model training and evaluation.
Procedure
Perform the following operations to use Pai-Megatron-Patch: