This topic helps foundation model developers get started with PAI-Lingjun AI Computing Service and develop Qwen-7B, Qwen-14B, and Qwen-72B foundation models. The development process includes distributed training, fine-tuning, offline inference, and online deployment. In this example, a Qwen-7B model is used to describe the best practice for developing a Qwen model in PAI-Lingjun AI Computing Service.
Prerequisites
In this example, Qwen-7B V1.1.4 is used. Before you start, make sure that the following prerequisites are met:
Platform for AI (PAI) is activated, including Data Science Workshop (DSW), Deep Learning Containers (DLC), and Elastic Algorithm Service (EAS). The default workspace is created. For more information, see Activate PAI and create a default workspace.
Lingjun resources are purchased, and a resource quota is created for the purchased Lingjun resources. The following table describes the resource specifications that are supported by different numbers of model parameters. Select appropriate resource specifications based on your actual number of model parameters. For more information about the node specifications of Lingjun resources, see the Pricing of nodes section of the "Billing of Lingjun resources (Serverless Edition)" topic. For more information, see Create a resource group and purchase Lingjun resources and Lingjun resource quotas.
Number of model parameters | Full-parameter training resources | Minimum inference resources | Model parallelism for Megatron-based training |
7 billion | Eight gu7xf GPUs or eight gu7ef GPUs | One NVIDIA V100 GPU (32 GB of memory) or one NVIDIA A10 GPU (24 GB of memory) | TP1 and PP1 |
14 billion | Eight gu7xf GPUs or eight gu7ef GPUs | Two NVIDIA V100 GPUs (32 GB of memory) or two NVIDIA A10 GPUs (24 GB of memory) | TP2 and PP1 |
72 billion | Four servers, each with eight gu7xf GPUs or eight gu7ef GPUs | Six NVIDIA V100 GPUs (32 GB of memory) or two gu7xf GPUs | TP8 and PP2 |
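As a rough sanity check on these specifications (an assumption based on the standard Megatron layout in which the number of GPUs equals TP × PP × data-parallel size): the 72 billion parameter configuration uses 4 × 8 = 32 GPUs, and each model replica spans TP8 × PP2 = 16 GPUs, which leaves a data-parallel degree of 32 / 16 = 2. For the 7 billion and 14 billion configurations, a replica spans 1 or 2 GPUs respectively, so the remaining GPUs of the single 8-GPU server are used for data parallelism.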
A dataset is created based on a General-purpose NAS file system of File Storage NAS to store the files and result files required for training. The default mount directory is /mnt/data/nas. For more information, see Create and manage datasets.
A DSW instance is created based on the following key parameters. For more information, see Create a DSW instance.
Resource Quota: Select the resource quota that is created for the purchased Lingjun resources.
Instance Type: Configure the following resource specifications:
vCPUs: 90
Memory (GiB): 1024
Shared Memory (GiB): 1024
GPUs: at least 8
Mount Settings: Click Add, select the created dataset, and then specify the default mount directory.
Image: Click Image Address and enter the following image URL: pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm.
A Resource Access Management (RAM) user is granted the required permissions on DSW, DLC, and EAS if you perform the operations in this best practice as the RAM user. For more information, see Grant the permissions that are required to use DSW, Grant the permissions that are required to use DLC, and Grant the permissions that are required to use EAS.
Limits
This best practice is supported only in the China (Ulanqab) region.
Step 1: Prepare a Qwen model
You can download a model by using one of the methods described in this best practice. Perform the following steps:
Go to the development environment of DSW.
Log on to the PAI console.
In the upper-left corner of the page, select the China (Ulanqab) region.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane, choose Model Development and Training > Interactive Modeling (DSW).
Find the DSW instance that you want to manage and click Open in the Actions column.
In the top navigation bar, click Terminal. On this tab, click Create Terminal or the plus (+) icon in the upper-right corner.
Download a Qwen model.
Download a model from the ModelScope community
Run the following command on the Terminal tab to install ModelScope:
pip install modelscope
Run the following command to go to the Python environment:
python
Run the following sample code to download the package of a Qwen-7B model:
# ### Loading Model and Tokenizer
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('qwen/Qwen-7B', 'v1.1.4')
# model_dir = snapshot_download('qwen/Qwen-14B', 'v1.0.4')
# model_dir = snapshot_download('qwen/Qwen-72B')
# Display the directory of the downloaded model.
print(model_dir)
# /root/.cache/modelscope/hub/qwen/Qwen-7B
Press Ctrl+D to exit the Python environment.
Run the following commands to move the downloaded model to the corresponding folder:
# mkdir -p /mnt/workspace/qwen-ckpts/${The ckpt folder with the hf suffix}
mkdir -p /mnt/workspace/qwen-ckpts/qwen-7b-hf
# cp -r ${The directory of the downloaded model}/* /mnt/workspace/qwen-ckpts/${The ckpt folder with the hf suffix}
cp -r /root/.cache/modelscope/hub/qwen/Qwen-7B/* /mnt/workspace/qwen-ckpts/qwen-7b-hf
Download a model from the Hugging Face community
Run the following commands on the Terminal tab of DSW to download the package of a model. In this example, the package of a Qwen-7B model is downloaded. If you want to download the package of a Qwen-14B or Qwen-72B model, modify the following sample code based on your business requirements:
mkdir /mnt/workspace/qwen-ckpts
cd /mnt/workspace/qwen-ckpts
git clone https://huggingface.co/Qwen/Qwen-7B
# git clone https://huggingface.co/Qwen/Qwen-7B-Chat
# git clone https://huggingface.co/Qwen/Qwen-14B
# git clone https://huggingface.co/Qwen/Qwen-14B-Chat
# git clone https://huggingface.co/Qwen/Qwen-72B
# git clone https://huggingface.co/Qwen/Qwen-72B-Chat
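The weight files in these repositories are several gigabytes in size and are stored with Git LFS. The following commands are a hedged addition that is not part of the original procedure: they assume that git-lfs can be installed in the DSW image by using apt-get. Run them before git clone if the cloned folders contain only small pointer files instead of the actual weights.
# Install and enable Git LFS so that large weight files are fetched during git clone.
apt-get update
apt-get install -y git-lfs
git lfs install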
Step 2: Prepare data for pre-training
We recommend that you prepare the data used for pre-training in the DSW instance. In this example, the WuDaoCorpora 2.0 dataset is used to describe how to preprocess data for Megatron-based training. This dataset is used only for research. You can directly download the small-scale sample data processed by PAI. You can also prepare the data used for pre-training on your own.
Use the small-scale sample data processed by PAI
To help you use this best practice, PAI provides the processed small-scale sample data. You can run the following commands on the Terminal tab of DSW to download the sample data:
mkdir /mnt/workspace/qwen-datasets/
cd /mnt/workspace/qwen-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-valid.json
mkdir -p /mnt/workspace/qwen-datasets/wudao
cd /mnt/workspace/qwen-datasets/wudao
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.idx
Process data on your own
Download the open source WuDaoCorpora 2.0 dataset to the /mnt/workspace/qwen-datasets working directory. In this example, the extracted folder is named wudao_200g. The small-scale sample data processed by PAI is also sourced from this dataset. You can run the following commands on the Terminal tab of DSW to download and decompress the dataset:
mkdir /mnt/workspace/qwen-datasets
cd /mnt/workspace/qwen-datasets
wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/datasets/WuDaoCorpus2.0_base_sample.tgz
tar zxvf WuDaoCorpus2.0_base_sample.tgz
mv WuDaoCorpus2.0_base_sample wudao_200g
Run the following commands on the Terminal tab to perform data cleansing on the WuDaoCorpora 2.0 dataset, convert the file format, and then generate the merged_wudao_cleaned.json file:
#! /bin/bash
set -ex
# Specify the directory of the WuDaoCorpora 2.0 dataset.
data_dir=/mnt/workspace/qwen-datasets/wudao_200g

# Start the data cleansing process.
dataset_dir=$(dirname $data_dir)
mkdir -p ${dataset_dir}/cleaned_wudao_dataset
cd ${dataset_dir}/cleaned_wudao_dataset
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/llama2-codes/preprocess_wudao2.py
# Set the -k option to text.
python preprocess_wudao2.py -i ${data_dir} -o ${dataset_dir}/cleaned_wudao_dataset -k text -p 32

# Merge the cleansed data.
mkdir ${dataset_dir}/wudao
cd ${dataset_dir}/wudao
find ${dataset_dir}/cleaned_wudao_dataset -name "*.json" -exec cat {} + > ${dataset_dir}/wudao/merged_wudao_cleaned.json
rm -rf ${dataset_dir}/cleaned_wudao_dataset
The following sample code shows the structure of the qwen-datasets directory after the preceding commands are run. The wudao folder is created.
qwen-datasets
├── wudao_200g
└── wudao
    └── merged_wudao_cleaned.json
Run the following commands on the Terminal tab to split the generated merged_wudao_cleaned.json file into several groups and compress each group. This facilitates multithreaded processing in subsequent operations.
apt-get update
apt-get install zstd

# Split data into 10 groups. If data processing is slow, you can split data into more groups.
NUM_PIECE=10

# Process the merged_wudao_cleaned.json file.
mkdir -p ${dataset_dir}/cleaned_zst/
# Query the total length of data and split the data.
NUM=$(sed -n '$=' ${dataset_dir}/wudao/merged_wudao_cleaned.json)
echo "total line of dataset is $NUM, data will be split into $NUM_PIECE pieces for processing"
NUM=`expr $NUM / $NUM_PIECE`
echo "each group is processing $NUM sample"
split_dir=${dataset_dir}/split
mkdir $split_dir
split -l $NUM --numeric-suffixes --additional-suffix=.jsonl ${dataset_dir}/wudao/merged_wudao_cleaned.json $split_dir/

# Compress the data of each group.
o_path=${dataset_dir}/cleaned_zst/
mkdir -p $o_path
files=$(ls $split_dir/*.jsonl)
for filename in $files
do
  f=$(basename $filename)
  zstd -z $filename -o $o_path/$f.zst &
done
rm -rf $split_dir
rm ${dataset_dir}/wudao/merged_wudao_cleaned.json
The following sample code shows the structure of the qwen-datasets directory after the preceding commands are run. The cleaned_zst folder is created and contains 10 compressed files.
qwen-datasets
├── wudao_200g
├── wudao
└── cleaned_zst
    ├── 00.jsonl.zst
    │   ...
    └── 09.jsonl.zst
Generate the dataset used for pre-training in the MMAP format.
MMAP is a file format in which data is tokenized in advance. It reduces the amount of time required to read data from the dataset during training and fine-tuning, especially when you process large amounts of data. Perform the following steps:
Run the following commands on the Terminal tab of DSW to obtain the Pai-Megatron-Patch package, which contains the source code of the Megatron-based training tool, and place it in the /mnt/workspace/ working directory of DSW:
cd /mnt/workspace/
# Method 1: Obtain the source code of the training tool from GitHub.
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
# Method 2: Obtain the source code of the training tool by running the wget command. Then, run the tar zxvf Pai-Megatron-Patch.tgz command to decompress the downloaded file.
wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/models/Pai-Megatron-Patch.tgz
Run the following commands on the Terminal tab to convert the dataset to the MMAP format:
# Install the tokenizer library on which Qwen depends.
pip install tiktoken
# Specify the directory of the dataset and the working directory.
export dataset_dir=/mnt/workspace/qwen-datasets
export WORK_DIR=/mnt/workspace

# Generate the training set and validation set used for pre-training in the MMAP format.
cd ${WORK_DIR}/Pai-Megatron-Patch/toolkits/pretrain_data_preprocessing
bash run_make_pretraining_dataset.sh \
../../Megatron-LM-23.04 \
${WORK_DIR}/Pai-Megatron-Patch/ \
${dataset_dir}/cleaned_zst/ \
qwenbpe \
${dataset_dir}/wudao/ \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf
rm -rf ${dataset_dir}/cleaned_zst
After the commands are run, the .bin and .idx files are generated in the /mnt/workspace/qwen-datasets/wudao directory.
The following table describes the six parameters that you must specify to run the run_make_pretraining_dataset.sh script.
Parameter | Description |
MEGATRON_PATH=$1 | The directory of the source code of the Megatron-based training tool. |
MEGATRON_PATCH_PATH=$2 | The directory of the Pai-Megatron-Patch folder. |
input_data_dir=$3 | The directory of the processed and packaged WuDaoCorpora 2.0 dataset. |
tokenizer=$4 | The type of the tokenizer. In this example, the value is set to qwenbpe. |
output_data_dir=$5 | The directory of the generated .bin and .idx files. |
load_dir=$6 | The directory of the tokenizer_config.json file. |
The following sample code shows the structure of the qwen-datasets directory after the script is run:
qwen-datasets
├── wudao_200g
└── wudao
    ├── wudao_qwenbpe_content_document.bin
    └── wudao_qwenbpe_content_document.idx
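Optionally, you can verify that the MMAP files exist and are non-empty before you move on to training. This check is not part of the original procedure. Note that the DATASET_PATH parameter used in Step 3 references the two files by their common prefix, wudao_qwenbpe_content_document, without a file name extension.
# List the generated MMAP files and their sizes.
ls -lh /mnt/workspace/qwen-datasets/wudao/wudao_qwenbpe_content_document.*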
Step 3: Perform Megatron-based training
Perform the following operations to complete Megatron-based training:
Convert the model format
You must convert the model format from Hugging Face to Megatron.
Download the converted Megatron model
To help you use this best practice, PAI provides the model whose format has been converted. You can run the following commands on the Terminal tab to download the model:
cd /mnt/workspace/
mkdir qwen-ckpts
cd qwen-ckpts
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-ckpts/qwen-7b-hf-to-mg-tp1-pp1.tgz
tar -zxf qwen-7b-hf-to-mg-tp1-pp1.tgz
mv qwen-7b-hf-to-mg-tp1-pp1 qwen-7b-hf-to-megatron-tp1-pp1
Convert the model format from Hugging Face to Megatron
Run the following commands on the Terminal tab to use the model conversion tool provided by PAI to convert the model format from Hugging Face to Megatron:
# Convert the model format.
cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
sh model_convertor.sh \
../../../Megatron-LM-main \
/mnt/workspace/qwen-ckpts/qwen-7b-hf \
/mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
1 \
1 \
qwen-7b \
0 \
false
The following table describes the parameters that you must specify to run the model_convertor.sh script.
Parameter | Description |
MEGATRON_PATH=$1 | The directory of the source code of the Megatron-based training tool. |
SOURCE_CKPT_PATH=$2 | The directory of the Hugging Face model. |
TARGET_CKPT_PATH=$3 | The directory of the converted Megatron model. |
TP=$4 | The size of tensor parallelism, which must be the same as that for training. The size varies based on the number of model parameters: Qwen-7B: 1, Qwen-14B: 2, Qwen-72B: 8. You must modify the size when you convert the model format. |
PP=$5 | The size of pipeline parallelism, which must be the same as that for training. The size varies based on the number of model parameters: Qwen-7B: 1, Qwen-14B: 1, Qwen-72B: 2. You must modify the size when you convert the model format. |
MN=$6 | The name of the model, such as qwen-7b, qwen-14b, or qwen-72b. |
EXTRA_VOCAB_SIZE=$7 | The size of the extra vocabulary. |
mg2hf=$8 | Specifies whether to convert the model format from Megatron to Hugging Face. |
Pre-train the model
You can submit a standalone job to train the model in DSW, or submit a distributed job to train the model on multiple multi-GPU servers in DLC. The training process lasts about 2 hours. After the job is run, a model file is exported to the /mnt/workspace/output_megatron_qwen/ directory.
Run a standalone job to pre-train the model in DSW
Run the following commands on the Terminal tab to submit a standalone job that trains a Qwen-7B model:
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
8 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
fp16 \
1 \
1 \
sel \
true \
false \
false \
false \
100000 \
${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
100000000 \
10000 \
${WORK_DIR}/output_megatron_qwen/
The following table describes the parameters that you must specify to run the run_pretrain_megatron_qwen.sh script.
Parameter | Description |
ENV=$1 | The runtime environment. Valid values: dsw and dlc. |
MEGATRON_PATH=$2 | The directory of the source code of the Megatron-based training tool. |
MODEL_SIZE=$3 | The number of model parameters. Valid values: 7B, 14B, and 72B. |
BATCH_SIZE=$4 | The number of samples on each GPU for each training iteration. Valid values: 4 and 8. |
GLOBAL_BATCH_SIZE=$5 | The total number of samples for training iterations. |
LR=$6 | The learning rate. Valid values: 1e-5 and 5e-5. |
MIN_LR=$7 | The minimum learning rate. Valid values: 1e-6 and 5e-6. |
SEQ_LEN=$8 | The length of the sequence. |
PAD_LEN=${9} | The length of the padding sequence. |
EXTRA_VOCAB_SIZE=${10} | The size of the extra vocabulary. The size varies based on the number of model parameters. Qwen-7B: 85. Qwen-14B: 213. Qwen-72B: 213. |
PR=${11} | The training precision. Valid values: fp16 and bf16. |
TP=${12} | The size of tensor parallelism. |
PP=${13} | The size of pipeline parallelism. |
AC=${14} | The activation checkpointing mode. Valid values: full and sel. |
DO=${15} | Specifies whether to use the ZeRO-1 optimizer for Megatron. Valid values: true and false. |
FL=${16} | Specifies whether to enable Flash Attention. Valid values: true and false. |
SP=${17} | Specifies whether to use sequence parallelism. Valid values: true and false. |
TE=${18} | Specifies whether to enable the acceleration technology of Transformer Engine. If you want to enable this technology, gu8xf GPUs are required. |
SAVE_INTERVAL=${19} | The interval at which the checkpoint file is saved. |
DATASET_PATH=${20} | The directory of the training set. |
PRETRAIN_CHECKPOINT_PATH=${21} | The directory of the pre-trained model. |
TRAIN_TOKENS=${22} | The number of tokens for training. |
WARMUP_TOKENS=${23} | The number of tokens for warm-up. |
OUTPUT_BASEPATH=${24} | The directory of the output model file generated after training. |
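As a rough, hedged estimate based on the sample values above (assuming the usual convention that the number of iterations is approximately TRAIN_TOKENS divided by GLOBAL_BATCH_SIZE × SEQ_LEN): 100,000,000 training tokens with a global batch size of 8 and a sequence length of 2048 correspond to about 100,000,000 / (8 × 2048) ≈ 6,100 iterations. Treat this only as a sanity check when you size TRAIN_TOKENS and SAVE_INTERVAL; the script computes the exact schedule.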
Run a distributed job to pre-train the model in DLC
After you train the model in DSW, you can configure a distributed job to train the model on multiple multi-GPU servers in DLC. Perform the following steps:
Go to the Create Job page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane, choose Model Development and Training > Deep Learning Containers (DLC). On the Deep Learning Containers (DLC) page, click Create Job. The Create Job page appears.
On the Create Job page, configure the parameters that are described in the following table. You can use the default values for other parameters. For more information, see Submit training jobs.
Parameter
Description
Basic Information
Job Name
The name of the training job. In this example, the value is set to test_qwen_dlc.
Environment Information
Node Image
Click Image Address and enter the following image URL in the field: pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm.
Mount Settings
Click Add. Select Custom Dataset as Mount Type and configure the following parameters:
Datasets: Select the dataset created based on the General-purpose NAS file system of File Storage NAS.
Mount Path: Enter /mnt/workspace/.
Startup Command
Enter the following commands. The parameters that you must specify to run the run_pretrain_megatron_qwen.sh script are the same as those that you specify when you submit a standalone job to train the model in DSW.
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh \
dlc \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
8 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
fp16 \
1 \
1 \
sel \
true \
false \
false \
false \
100000 \
${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
100000000 \
10000 \
${WORK_DIR}/output_megatron_qwen/
Resource Information
Resource Type
Select Lingjun Resources.
Source
Select Resource Quota.
Resource Quota
Select the resource quota that is created for the purchased Lingjun resources.
Framework
Select PyTorch.
Job Resource
Configure the following parameters for worker nodes:
Number of Nodes: Enter 2. If you want to train the model on more servers, you can increase the value of the Number of Nodes parameter.
GPUs: Enter 8.
vCPUs: Enter 90.
Note: The number of CPU cores cannot be greater than 96.
Memory (GiB): Enter 1024.
Shared Memory (GiB): Enter 1024.
Click OK. You are navigated to the Deep Learning Containers (DLC) page. If the state of the job changes to Succeeded, the training job is complete.
Perform supervised fine-tuning
You can submit a standalone job to fine-tune the model in DSW, or submit a distributed job to fine-tune the model on multiple multi-GPU servers in DLC. The fine-tuning process lasts about 2 hours. After the job is run, a model file is exported to the /mnt/workspace/output_megatron_qwen/ directory.
Before you fine-tune the model, go to Step 2: Prepare data for pre-training. Use the sample code on the Use the small-scale sample data processed by PAI tab to download the JSON files.
Fine-tune the model.
Run a standalone job to fine-tune the model in DSW
Run the following commands on the Terminal tab to submit a standalone job that fine-tunes a Qwen-7B model:
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
96 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
bf16 \
1 \
1 \
sel \
true \
false \
false \
false \
1000 \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
2000 \
10 \
${WORK_DIR}/output_megatron_qwen/
The following table describes the parameters that you must specify to run the run_finetune_megatron_qwen_withGA.sh script.
Parameter | Description |
ENV=$1 | The runtime environment. Valid values: dsw and dlc. |
MEGATRON_PATH=$2 | The directory of the source code of the Megatron-based training tool. |
MODEL_SIZE=$3 | The number of model parameters. Valid values: 7B, 14B, and 72B. |
BATCH_SIZE=$4 | The number of samples on each GPU for each fine-tuning iteration. Valid values: 1, 2, 4, and 8. |
GLOBAL_BATCH_SIZE=$5 | The total number of samples for fine-tuning iterations. Valid values: 64, 96, and 128. |
LR=$6 | The learning rate. Valid values: 1e-5 and 5e-5. |
MIN_LR=$7 | The minimum learning rate. Valid values: 1e-6 and 5e-6. |
SEQ_LEN=$8 | The length of the sequence. |
PAD_LEN=$9 | The length of the padding sequence. |
EXTRA_VOCAB_SIZE=${10} | The size of the extra vocabulary. The size varies based on the number of model parameters. Qwen-7B: 85. Qwen-14B: 213. Qwen-72B: 213. |
PR=${11} | The training precision. Valid values: fp16 and bf16. |
TP=${12} | The size of tensor parallelism. |
PP=${13} | The size of pipeline parallelism. |
AC=${14} | The activation checkpointing mode. Valid values: full and sel. |
DO=${15} | Specifies whether to use the ZeRO-1 optimizer for Megatron. Valid values: true and false. |
FL=${16} | Specifies whether to enable Flash Attention. Valid values: true and false. |
SP=${17} | Specifies whether to use sequence parallelism. Valid values: true and false. |
TE=${18} | Specifies whether to enable the acceleration technology of Transformer Engine. If you want to enable this technology, gu8xf GPUs are required. |
SAVE_INTERVAL=${19} | The interval at which the model is saved. |
DATASET_PATH=${20} | The directory of the training set. |
VALID_DATASET_PATH=${21} | The directory of the validation set. |
PRETRAIN_CHECKPOINT_PATH=${22} | The directory of the pre-trained model. |
TRAIN_ITERS=${23} | The number of training iterations. |
LR_WARMUP_ITERS=${24} | The number of warm-up iterations for the learning rate. |
OUTPUT_BASEPATH=${25} | The directory of the output model file generated after training. |
Run a distributed job to fine-tune the model in DLC
After you fine-tune the model in DSW, you can configure a distributed job to fine-tune the model on multiple multi-GPU servers in DLC. When you submit a training job in DLC, enter the following commands for the Startup Command parameter. For more information about other parameters, see the Pre-train the model section of this topic.
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_finetune_megatron_qwen_withGA.sh \
dlc \
${WORK_DIR}/Pai-Megatron-Patch \
7B \
1 \
96 \
1e-5 \
1e-6 \
2048 \
2048 \
85 \
bf16 \
1 \
1 \
sel \
true \
false \
false \
false \
1000 \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json \
${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1 \
2000 \
10 \
${WORK_DIR}/output_megatron_qwen/
The parameters that you must specify to run the run_finetune_megatron_qwen_withGA.sh script are the same as those that you specify when you submit a standalone job to fine-tune the model in DSW.
Step 4: Use the model for offline inference
After the model is trained, you can perform offline inference by using the model based on Megatron to evaluate the effects of the model. Perform the following steps:
Download the pred_input.jsonl file that contains test samples and upload the file to the /mnt/workspace directory of DSW. For more information, see Upload or download data files.
Note: The data used for inference must be organized in the same way as the data used for fine-tuning.
Copy all the JSON files and the tokenizer.model file from the model directory used before training to the directory of the output model file generated after training, so that the files are placed in the {OUTPUT_BASEPATH}/checkpoint directory, in the same folder as the latest_checkpointed_iteration.txt file.
Note: Replace the directories in the commands with your actual directories.
cd /mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1
cp *.json /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
cp tokenizer.model /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
Run the following commands on the Terminal tab to perform offline inference by using the model. The inference results are generated in the /mnt/workspace/qwen_pred.txt file. You can evaluate the effects of the model based on the inference results.
Note: Before you run the commands, you must set the CUDA_VISIBLE_DEVICES parameter to 0 and the GPUS_PER_NODE parameter to 1 in the run_text_generation_megatron_qwen.sh script.
export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
bash run_text_generation_megatron_qwen.sh \
dsw \
${WORK_DIR}/Pai-Megatron-Patch \
/mnt/workspace/output_megatron_qwen/checkpoint/dswXXX \
7B \
1 \
1 \
1024 \
1024 \
85 \
fp16 \
10 \
512 \
512 \
${WORK_DIR}/pred_input.jsonl \
${WORK_DIR}/qwen_pred.txt \
0 \
1.0 \
1.2
The following table describes the parameters that you must specify to run the run_text_generation_megatron_qwen.sh script.
Parameter | Description |
ENV=$1 | The runtime environment. Valid values: dsw and dlc. |
MEGATRON_PATCH_PATH=$2 | The directory of the Pai-Megatron-Patch folder. |
CHECKPOINT_PATH=$3 | The directory of the model saved during training. Important: Replace this directory with your actual model directory. |
MODEL_SIZE=$4 | The number of model parameters. Valid values: 7B, 14B, and 72B. |
TP=$5 | The size of tensor parallelism. Important: If you set this parameter to 1, you can use a single GPU for inference. If you set this parameter to a value greater than 1, you must use the corresponding number of GPUs for inference. |
BS=$6 | The number of samples on each GPU for each inference iteration. Valid values: 1, 4, and 8. |
SEQ_LEN=$7 | The length of the sequence. Valid values: 256, 512, and 1024. |
PAD_LEN=$8 | The length of the padding sequence, which is the length of the concatenated text. |
EXTRA_VOCAB_SIZE=${9} | The number of tokens added during model conversion. The number varies based on the number of model parameters. Qwen-7B: 85. Qwen-14B: 213. Qwen-72B: 213. |
PR=${10} | The inference precision. Valid values: fp16 and bf16. |
TOP_K=${11} | The number of top candidate words to be selected. Valid values: 0 to n. Examples: 0, 5, 10, and 20. |
INPUT_SEQ_LEN=${12} | The length of the input sequence. Set the value to 512. |
OUTPUT_SEQ_LEN=${13} | The length of the output sequence. Set the value to 256. |
INPUT_FILE=${14} | The file that contains the text to be used for inference. In this example, the pred_input.jsonl file is used, in which each line contains a sample. |
OUTPUT_FILE=${15} | The output file generated after inference. In this example, the qwen_pred.txt file is used. |
TOP_P=${16} | The percentage of top candidate words to be selected. Valid values: 0 to 1. Examples: 0, 0.85, and 0.95. Note: You must set one of the TOP_K and TOP_P parameters to 0. |
TEMPERATURE=${17} | The randomness of the sampling process. Valid values: 1 to n. |
REPETITION_PENALTY=${18} | The repetition penalty for the content generated by the model. Valid values: 1 to 2. Default value: 1.2. |
Step 5: Convert the model format
If the effects of the model meet your expectations after offline inference is performed by using the model, you can convert the model format from Megatron to Hugging Face. Then, you can deploy the converted Hugging Face model as a model service.
Run the following commands on the Terminal tab to convert the model format from Megatron to Hugging Face:
export WORK_DIR=/mnt/workspace
cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
sh model_convertor.sh \
../../../Megatron-LM-main \
${WORK_DIR}/output_megatron_qwen/checkpoint/${Directory}/iter_******* \
/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/ \
1 \
1 \
qwen-7b \
0 \
true
The following table describes the parameters that you must specify to run the model_convertor.sh script.
Parameter | Description |
MEGATRON_PATH=$1 | The directory of the source code of the Megatron-based training tool. |
SOURCE_CKPT_PATH=$2 | The directory of the trained model in the Megatron format, including the iter_* folder. Example: ${WORK_DIR}/output_megatron_qwen/checkpoint/dsw-pretrain-megatron-qwen-7B-lr-1e-5-bs-1-seqlen-2048-pr-bf16-tp-1-pp-1-ac-sel-do-true-sp-false-tt--wt-/iter_*******. Important: Replace this directory with your actual model directory. If you need to convert the format of a pre-trained model, you must delete all the distrib_optim.pt files in the model directory. |
TARGET_CKPT_PATH=$3 | The directory of the converted Hugging Face model. |
TP=$4 | The size of tensor parallelism, which must be the same as that for training. |
PP=$5 | The size of pipeline parallelism, which must be the same as that for training. |
MN=$6 | The name of the model, such as qwen-7b, qwen-14b, or qwen-72b. |
EXTRA_VOCAB_SIZE=$7 | The size of the extra vocabulary. |
mg2hf=$8 | Specifies whether to convert the model format from Megatron to Hugging Face. |
Copy the .json, .py, and .tiktoken files of the open source Hugging Face model from the /mnt/workspace/qwen-ckpts/qwen-7b-hf directory to the /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1 directory to ensure that the model can be properly used.
Important: Note that you do not need to copy the pytorch_model.bin.index.json file.
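The following commands are a minimal sketch of this copy step based on the directories used in this example; they are not part of the original instructions, so adjust the paths to your actual environment:
cd /mnt/workspace/qwen-ckpts/qwen-7b-hf
# Copy the configuration, code, and tokenizer files of the open source model.
cp *.json *.py *.tiktoken /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/
# The pytorch_model.bin.index.json file does not need to be copied, so remove it from the target directory.
rm -f /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/pytorch_model.bin.index.json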
Step 6: Deploy the model as a model service and call the model service
After you perform offline inference and evaluate the effects of the model, you can deploy the converted Hugging Face model as an online model service and call the model service in the actual production environment to perform inference. Perform the following steps:
Deploy the model as a model service
Go to the EAS-Online Model Services page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace to which you want to deploy the model.
In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the Elastic Algorithm Service (EAS) page.
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.
On the Create Service page, configure the parameters that are described in the following table. You can use the default values for other parameters.
Parameter
Description
Model Service Information
Service Name
The custom name of the model service. The name must be unique in a region. In this example, the value is set to test_qwen.
Deployment Method
In this example, Deploy Web App by Using Image is selected.
Select Image
Select Image Address, enter pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/llm-inference:vllm-0.2.1-v4 in the field, and then read and agree to the Machine Learning Platform for AI Terms of Service by selecting the check box.
Model Settings
Click Specify Model Settings and configure the model directory. Select Mount NAS File System and configure the following parameters:
NAS Mount Target: the General-purpose NAS file system and mount target based on which the dataset is created.
NAS Source Path: the directory of the converted Hugging Face model that is stored in the NAS file system. In this example, the /qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1 directory is used.
Mount Path: the mount directory of the model. In this example, the value is set to /qwen-7b.
Command to Run
In this example, the following command is run (a Qwen-72B variant is sketched at the end of this deployment procedure):
nohup python -m fastchat.serve.controller > tmp1.log 2>&1 &
python -m fastchat.serve.gradio_web_server_pai --model-list-mode reload > tmp2.log 2>&1 &
python -m fastchat.serve.vllm_worker --model-path /qwen-7b --tensor-parallel-size 1 --trust-remote-code
Where:
--model-path: the mount directory of the model, which must be the same as the mount path in the model settings.
--tensor-parallel-size: the size of tensor parallelism, which must be adjusted based on the number of GPUs. For example, set this parameter to 1 for a Qwen-7B model or 2 for a Qwen-72B model.
Port number: In this example, port 7860 is used.
Resource Deployment Information
Resource Group Type
In this example, Intelligent Computing Lingjun Resources is selected.
Select Quota
Select the resource quota that is created for the purchased Lingjun resources.
Instance Count
Configure the parameters based on the model and the selected resources. For a Qwen-7B model, set the Instance Count parameter to 1 and select the instance type based on the following resource specifications:
vCPUs: 16
Memory: 64,000 MB
GPUs: 1
VPC Settings
VPC
After you configure the NAS Mount Target parameter, the system automatically matches the virtual private cloud (VPC), vSwitch, and security group of the specified NAS file system.
vSwitch
Security Group Name
Click Deploy.
If the state of the service changes to Running, the service is deployed.
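For reference, the following command is a hedged sketch of how the Command to Run parameter might look for a Qwen-72B deployment. It assumes a hypothetical /qwen-72b mount path and the tensor parallelism size of 2 mentioned above; adjust both to your actual model settings and GPU count.
# Start the controller, the web server, and the vLLM worker for a Qwen-72B model on 2 GPUs.
nohup python -m fastchat.serve.controller > tmp1.log 2>&1 &
python -m fastchat.serve.gradio_web_server_pai --model-list-mode reload > tmp2.log 2>&1 &
python -m fastchat.serve.vllm_worker --model-path /qwen-72b --tensor-parallel-size 2 --trust-remote-code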
Call the model service
After the model service is deployed, you can call the service to perform inference. Perform the following steps:
On the Inference Service tab, find the service that you want to call and click View Web App in the Service Type column.
On the WebUI page, perform inference.