
Platform for AI: Develop Qwen models in PAI-Lingjun AI Computing Service

Last Updated: Oct 31, 2024

This topic helps foundation model developers get started with PAI-Lingjun AI Computing Service and develop the Qwen-7B, Qwen-14B, and Qwen-72B foundation models. The development process includes distributed training, fine-tuning, offline inference, and online deployment. In this example, a Qwen-7B model is used to describe the best practice for developing a Qwen model in PAI-Lingjun AI Computing Service.

Prerequisites

In this example, Qwen-7B V1.1.4 is used. Before you start, make sure that the following prerequisites are met:

  • Platform for AI (PAI) is activated, including Data Science Workshop (DSW), Deep Learning Containers (DLC), and Elastic Algorithm Service (EAS). The default workspace is created. For more information, see Activate PAI and create a default workspace.

  • Lingjun resources are purchased, and a resource quota is created for the purchased Lingjun resources. The following table describes the resource specifications that are supported by different numbers of model parameters. Select appropriate resource specifications based on your actual number of model parameters. For more information about the node specifications of Lingjun resources, see the Pricing of nodes section of the "Billing of Lingjun resources (Serverless Edition)" topic. For more information, see Create a resource group and purchase Lingjun resources and Lingjun resource quotas.

    Number of model parameters | Full-parameter training resources | Minimum inference resources | Model parallelism for Megatron-based training
    7 billion | Eight gu7xf GPUs or eight gu7ef GPUs | One NVIDIA V100 GPU (32 GB of memory) or one NVIDIA A10 GPU (24 GB of memory) | TP1 and PP1
    14 billion | Eight gu7xf GPUs or eight gu7ef GPUs | Two NVIDIA V100 GPUs (32 GB of memory) or two NVIDIA A10 GPUs (24 GB of memory) | TP2 and PP1
    72 billion | Four servers, each with eight gu7xf GPUs or eight gu7ef GPUs | Six NVIDIA V100 GPUs (32 GB of memory) or two gu7xf GPUs | TP8 and PP2

  • A dataset is created based on a General-purpose NAS file system of File Storage NAS to store the files required for training and the generated result files. The default mount directory is /mnt/data/nas. For more information, see Create and manage datasets.

  • A DSW instance is created based on the following key parameters. For more information, see Create a DSW instance.

    • Resource Quota: Select the resource quota that is created for the purchased Lingjun resources.

    • Instance Type: Configure the following resource specifications:

      • vCPUs: 90

      • Memory (GiB): 1024

      • Shared Memory (GiB): 1024

      • GPUs: at least 8

    • Mount Settings: Click Add, select the created dataset, and then specify the default mount directory.

    • Image: Click Image Address and enter the following image URL: pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm.

  • A Resource Access Management (RAM) user is granted the required permissions on DSW, DLC, and EAS if you perform the operations in this best practice as the RAM user. For more information, see Grant the permissions that are required to use DSW, Grant the permissions that are required to use DLC, and Grant the permissions that are required to use EAS.

Limits

This best practice is supported only in the China (Ulanqab) region.

Step 1: Prepare a Qwen model

You can download a Qwen model from the ModelScope community or from the Hugging Face community, as described in this best practice. Perform the following steps:

  1. Go to the development environment of DSW.

    1. Log on to the PAI console.

    2. In the upper-left corner of the page, select the China (Ulanqab) region.

    3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    4. In the left-side navigation pane, choose Model Training > Data Science Workshop (DSW).

    5. Find the DSW instance that you want to manage and click Open in the Actions column.

  2. In the top navigation bar, click Terminal. On this tab, click Create Terminal or the plus (+) icon in the upper-right corner.

  3. Download a Qwen model.

    Download a model from the ModelScope community

    1. Run the following command on the Terminal tab to install ModelScope:

      pip install modelscope

      View the returned results. You can ignore the WARNING information.

      Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple
      Collecting modelscope
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/ac/05/75b5d750608d7354dc3dd023dca7101e5f3b4645cb3e5b816536d472a058/modelscope-1.9.5-py3-none-any.whl (5.4 MB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.4/5.4 MB 104.7 MB/s eta 0:00:00
      Requirement already satisfied: pyyaml in /opt/*/lib/python3.8/site-packages (from modelscope) (5.4.1)
      Requirement already satisfied: pandas in /opt/*/lib/python3.8/site-packages (from modelscope) (1.5.3)
      Requirement already satisfied: addict in /opt/*/lib/python3.8/site-packages (from modelscope) (2.4.0)
      Requirement already satisfied: numpy in /opt/*/lib/python3.8/site-packages (from modelscope) (1.22.2)
      Collecting simplejson>=3.3.0
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/33/5f/b9506e323ea89737b34c97a6eda9d22ad6b771190df93f6eb72657a3b996/simplejson-3.19.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (136 kB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 136.6/136.6 kB 70.2 MB/s eta 0:00:00
      Collecting gast>=0.2.2
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/fa/39/5aae571e5a5f4de9c3445dae08a530498e5c53b0e74410eeeb0991c79047/gast-0.5.4-py3-none-any.whl (19 kB)
      Requirement already satisfied: Pillow>=6.2.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (9.3.0)
      Requirement already satisfied: oss2 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.17.0)
      Requirement already satisfied: filelock>=3.3.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (3.11.0)
      Requirement already satisfied: urllib3>=1.26 in /opt/*/lib/python3.8/site-packages (from modelscope) (1.26.12)
      Requirement already satisfied: datasets<=2.13.0,>=2.8.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.11.0)
      Requirement already satisfied: attrs in /opt/*/lib/python3.8/site-packages (from modelscope) (22.2.0)
      Requirement already satisfied: scipy in /opt/*/lib/python3.8/site-packages (from modelscope) (1.9.3)
      Requirement already satisfied: yapf in /opt/*/lib/python3.8/site-packages (from modelscope) (0.32.0)
      Requirement already satisfied: pyarrow!=9.0.0,>=6.0.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (11.0.0)
      Requirement already satisfied: setuptools in /opt/*/lib/python3.8/site-packages (from modelscope) (65.5.0)
      Requirement already satisfied: requests>=2.25 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.28.1)
      Requirement already satisfied: einops in /opt/*/lib/python3.8/site-packages (from modelscope) (0.6.0)
      Requirement already satisfied: python-dateutil>=2.1 in /opt/*/lib/python3.8/site-packages (from modelscope) (2.8.2)
      Collecting sortedcontainers>=1.5.9
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/32/46/9cb0e58b2deb7f82b84065f37f3bffeb12413f947f9388e4cac22c4621ce/sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
      Requirement already satisfied: tqdm>=4.64.0 in /opt/*/lib/python3.8/site-packages (from modelscope) (4.65.0)
      Requirement already satisfied: dill<0.3.7,>=0.3.0 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.3.6)
      Requirement already satisfied: multiprocess in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.70.14)
      Requirement already satisfied: aiohttp in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (3.8.4)
      Requirement already satisfied: responses<0.19 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.18.0)
      Requirement already satisfied: huggingface-hub<1.0.0,>=0.11.0 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (0.16.4)
      Requirement already satisfied: fsspec[http]>=2021.11.1 in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (2023.4.0)
      Requirement already satisfied: packaging in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (21.3)
      Requirement already satisfied: xxhash in /opt/*/lib/python3.8/site-packages (from datasets<=2.13.0,>=2.8.0->modelscope) (3.2.0)
      Requirement already satisfied: six>=1.5 in /opt/*/lib/python3.8/site-packages (from python-dateutil>=2.1->modelscope) (1.16.0)
      Requirement already satisfied: certifi>=2017.4.17 in /opt/*/lib/python3.8/site-packages (from requests>=2.25->modelscope) (2022.9.24)
      Requirement already satisfied: charset-normalizer<3,>=2 in /opt/*/lib/python3.8/site-packages (from requests>=2.25->modelscope) (2.0.4)
      Requirement already satisfied: idna<4,>=2.5 in /opt/*/lib/python3.8/site-packages (from requests>=2.25->modelscope) (3.4)
      Requirement already satisfied: aliyun-python-sdk-kms>=2.4.1 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (2.16.0)
      Requirement already satisfied: aliyun-python-sdk-core>=2.13.12 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (2.13.36)
      Requirement already satisfied: crcmod>=1.7 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (1.7)
      Requirement already satisfied: pycryptodome>=3.4.7 in /opt/*/lib/python3.8/site-packages (from oss2->modelscope) (3.15.0)
      Requirement already satisfied: pytz>=2020.1 in /opt/*/lib/python3.8/site-packages (from pandas->modelscope) (2022.7.1)
      Requirement already satisfied: cryptography>=2.6.0 in /opt/*/lib/python3.8/site-packages (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (38.0.3)
      Requirement already satisfied: jmespath<1.0.0,>=0.9.3 in /opt/*/lib/python3.8/site-packages (from aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (0.10.0)
      Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (4.0.2)
      Requirement already satisfied: yarl<2.0,>=1.0 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (1.8.2)
      Requirement already satisfied: frozenlist>=1.1.1 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (1.3.3)
      Requirement already satisfied: multidict<7.0,>=4.5 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (6.0.4)
      Requirement already satisfied: aiosignal>=1.1.2 in /opt/*/lib/python3.8/site-packages (from aiohttp->datasets<=2.13.0,>=2.8.0->modelscope) (1.3.1)
      Requirement already satisfied: typing-extensions>=3.7.*.* in /opt/*/lib/python3.8/site-packages (from huggingface-hub<1.0.0,>=0.11.0->datasets<=2.13.0,>=2.8.0->modelscope) (4.4.0)
      Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/*/lib/python3.8/site-packages (from packaging->datasets<=2.13.0,>=2.8.0->modelscope) (3.0.9)
      Requirement already satisfied: cffi>=1.12 in /opt/*/lib/python3.8/site-packages (from cryptography>=2.6.0->aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (1.15.1)
      Requirement already satisfied: pycparser in /opt/*/lib/python3.8/site-packages (from cffi>=1.12->cryptography>=2.6.0->aliyun-python-sdk-core>=2.13.12->oss2->modelscope) (2.21)
      Installing collected packages: sortedcontainers, simplejson, gast, modelscope
      Successfully installed gast-0.5.4 modelscope-1.9.5 simplejson-3.19.2 sortedcontainers-2.4.0
      WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
    2. Run the following command to go to the Python environment:

      python

    3. The following sample code downloads the package of a Qwen-7B model:

      # ### Loading Model and Tokenizer
      from modelscope.hub.snapshot_download import snapshot_download
      model_dir = snapshot_download('qwen/Qwen-7B', 'v1.1.4')
      # model_dir = snapshot_download('qwen/Qwen-14B', 'v1.0.4')
      # model_dir = snapshot_download('qwen/Qwen-72B')
      # Display the directory of the downloaded model.
      print(model_dir)
      # /root/.cache/modelscope/hub/qwen/Qwen-7B
    4. Press Ctrl+D to exit the Python environment.

    5. Run the following commands to move the downloaded model to the corresponding folder:

      # mkdir -p /mnt/workspace/qwen-ckpts/${The ckpt folder with the hf suffix}
      mkdir -p /mnt/workspace/qwen-ckpts/qwen-7b-hf
      # cp -r ${The directory of the downloaded model}/* /mnt/workspace/qwen-ckpts/${The ckpt folder with the hf suffix}
      cp -r /root/.cache/modelscope/hub/qwen/Qwen-7B/* /mnt/workspace/qwen-ckpts/qwen-7b-hf

    Download a model from the Hugging Face community

    Run the following commands on the Terminal tab of DSW to download the package of a model. In this example, the package of a Qwen-7B model is downloaded. If you want to download the package of a Qwen-14B or Qwen-72B model, modify the following sample code based on your business requirements. If the cloned directory contains only small pointer files instead of the weight files, see the Git LFS sketch after these commands:

    mkdir /mnt/workspace/qwen-ckpts
    cd /mnt/workspace/qwen-ckpts
    git clone https://huggingface.co/Qwen/Qwen-7B
    # git clone https://huggingface.co/Qwen/Qwen-7B-Chat
    # git clone https://huggingface.co/Qwen/Qwen-14B
    # git clone https://huggingface.co/Qwen/Qwen-14B-Chat
    # git clone https://huggingface.co/Qwen/Qwen-72B
    # git clone https://huggingface.co/Qwen/Qwen-72B-Chat
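
    The Hugging Face repositories store the model weight files with Git LFS. If the cloned directory contains only small pointer files, Git LFS is probably not enabled in the environment. The following commands are a hedged sketch that installs and enables Git LFS and then clones the repository again; whether the git-lfs package is already present in the DSW image is an assumption that you should verify.

    # A hedged sketch: install and enable Git LFS only if it is not already available in the image.
    apt-get update
    apt-get install -y git-lfs
    git lfs install
    # Remove the incomplete clone and clone the repository again so that the weight files are fetched through Git LFS.
    cd /mnt/workspace/qwen-ckpts
    rm -rf Qwen-7B
    git clone https://huggingface.co/Qwen/Qwen-7B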

Step 2: Prepare data for pre-training

We recommend that you prepare the data used for pre-training in the DSW instance. In this example, the WuDaoCorpora 2.0 dataset, which is intended only for research purposes, is used to describe how to preprocess data for Megatron-based training. You can directly download the small-scale sample data processed by PAI, or prepare the pre-training data on your own.

Use the small-scale sample data processed by PAI

To help you use this best practice, PAI provides the processed small-scale sample data. You can run the following commands on the Terminal tab of DSW to download the sample data:

mkdir /mnt/workspace/qwen-datasets/
cd /mnt/workspace/qwen-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/alpaca_zh-qwen-valid.json
mkdir -p /mnt/workspace/qwen-datasets/wudao
cd /mnt/workspace/qwen-datasets/wudao
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_content_document.idx

Process data on your own

  1. Download the open source WuDaoCorpora 2.0 dataset to the /mnt/workspace/qwen-datasets working directory. In this example, the extracted folder is named wudao_200g.

    The small-scale sample data processed by PAI is also sourced from this dataset. You can run the following commands on the Terminal tab of DSW to download and decompress the dataset:

    mkdir /mnt/workspace/qwen-datasets
    cd /mnt/workspace/qwen-datasets
    wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/datasets/WuDaoCorpus2.0_base_sample.tgz
    tar zxvf WuDaoCorpus2.0_base_sample.tgz 
    mv WuDaoCorpus2.0_base_sample wudao_200g
  2. Run the following commands on the Terminal tab to perform data cleansing on the WuDaoCorpora 2.0 dataset, convert the file format, and then generate the merged_wudao_cleaned.json file:

    #! /bin/bash
    set -ex
    # Specify the directory of the WuDaoCorpora 2.0 dataset. 
    data_dir=/mnt/workspace/qwen-datasets/wudao_200g
    
    # Start the data cleansing process. 
    dataset_dir=$(dirname $data_dir)
    mkdir -p ${dataset_dir}/cleaned_wudao_dataset
    cd ${dataset_dir}/cleaned_wudao_dataset
    wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/llama2-codes/preprocess_wudao2.py
    # Set the -k option to text. 
    python preprocess_wudao2.py -i ${data_dir} -o ${dataset_dir}/cleaned_wudao_dataset -k text -p 32
    
    # Merge the cleansed data. 
    mkdir ${dataset_dir}/wudao
    cd ${dataset_dir}/wudao
    find ${dataset_dir}/cleaned_wudao_dataset -name "*.json" -exec cat {} + > ${dataset_dir}/wudao/merged_wudao_cleaned.json
    rm -rf ${dataset_dir}/cleaned_wudao_dataset
    

    The following sample code shows the structure of the qwen-datasets directory after the preceding commands are run. The wudao folder is created.

    qwen-datasets
    ├── wudao_200g 
    └── wudao
        └── merged_wudao_cleaned.json
  3. Run the following commands on the Terminal tab to split the generated merged_wudao_cleaned.json file into several groups and compress each group. This facilitates multithreaded processing in subsequent operations.

    apt-get update
    apt-get install zstd
    
    # Split data into 10 groups. If data processing is slow, you can split data into more groups. 
    NUM_PIECE=10
    
    # Process the merged_wudao_cleaned.json file. 
    mkdir -p ${dataset_dir}/cleaned_zst/
    # Query the total length of data and split the data. 
    NUM=$(sed -n '$=' ${dataset_dir}/wudao/merged_wudao_cleaned.json)
    echo "total line of dataset is $NUM, data will be split into $NUM_PIECE pieces for processing"
    NUM=`expr $NUM / $NUM_PIECE`
    echo "each group is processing $NUM sample"
    split_dir=${dataset_dir}/split
    mkdir $split_dir
    split -l $NUM --numeric-suffixes --additional-suffix=.jsonl ${dataset_dir}/wudao/merged_wudao_cleaned.json $split_dir/
    
    # Compress the data of each group. 
    o_path=${dataset_dir}/cleaned_zst/
    mkdir -p $o_path
    files=$(ls $split_dir/*.jsonl)
    for filename in $files
    do
       f=$(basename $filename)
       zstd -z $filename -o $o_path/$f.zst &
    done
    rm -rf $split_dir
    rm ${dataset_dir}/wudao/merged_wudao_cleaned.json
    

    The following sample code shows the structure of the qwen-datasets directory after the preceding commands are run. The cleaned_zst folder is created and contains 10 compressed files.

    qwen-datasets
    ├── wudao_200g
    ├── wudao
    └── cleaned_zst
        ├── 00.jsonl.zst
        ├── ...
        └── 09.jsonl.zst
  4. Generate the dataset used for pre-training in the MMAP format.

    MMAP is a file format in which data is tokenized in advance. It reduces the amount of time required to read data from the dataset during training and fine-tuning, especially when you process large amounts of data. Perform the following steps:

    1. Run the following commands on the Terminal tab of DSW to obtain the Pai-Megatron-Patch code, which contains the source code of the Megatron-based training tool, and place it in the /mnt/workspace/ working directory of DSW:

      cd /mnt/workspace/
      # Method 1: Obtain the source code of the training tool from GitHub. 
      git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
      # Method 2: Obtain the source code of the training tool by running the wget command. Then, run the tar zxvf Pai-Megatron-Patch.tgz command to decompress the downloaded file. 
      wget https://atp-modelzoo.oss-cn-hangzhou.aliyuncs.com/release/models/Pai-Megatron-Patch.tgz
    2. Run the following commands on the Terminal tab to convert the dataset to the MMAP format:

      After the commands are run, the .bin and .idx files are generated in the /mnt/workspace/qwen-datasets/wudao directory.

      # Install the tokenizer library on which Qwen depends. 
      pip install tiktoken
      # Specify the directory of the dataset and the working directory. 
      export dataset_dir=/mnt/workspace/qwen-datasets
      export WORK_DIR=/mnt/workspace
      
      # Generate the training set and validation set used for pre-training in the MMAP format. 
      cd ${WORK_DIR}/Pai-Megatron-Patch/toolkits/pretrain_data_preprocessing
      bash run_make_pretraining_dataset.sh \
      ../../Megatron-LM-23.04 \
      ${WORK_DIR}/Pai-Megatron-Patch/ \
      ${dataset_dir}/cleaned_zst/ \
      qwenbpe \
      ${dataset_dir}/wudao/ \
      ${WORK_DIR}/qwen-ckpts/qwen-7b-hf
      rm -rf ${dataset_dir}/cleaned_zst

      The following table describes the six parameters that you must specify to run the run_make_pretraining_dataset.sh script.

      Parameter

      Description

      MEGATRON_PATH=$1

      The directory of the source code of the Megatron-based training tool.

      MEGATRON_PATCH_PATH=$2

      The directory of the Pai-Megatron-Patch folder.

      input_data_dir=$3

      The directory of the processed and packaged WuDaoCorpora 2.0 dataset.

      tokenizer=$4

      The type of the tokenizer. In this example, the value is set to qwenbpe.

      output_data_dir=$5

      The directory of the generated .bin and .idx files.

      load_dir=$6

      The directory of the generated tokenizer_config.json file.

      The following sample code shows the structure of the qwen-datasets directory after the script is run:

      qwen-datasets
      ├── wudao_200g
      └── wudao
         ├── wudao_qwenbpe_content_document.bin
         └── wudao_qwenbpe_content_document.idx

Step 3: Perform Megatron-based training

Perform the following operations to complete Megatron-based training:

Convert the model format

Before training, you must convert the model format from Hugging Face to Megatron. You can download the converted model provided by PAI, or convert the model format on your own.

Download the converted Megatron model

To help you use this best practice, PAI provides the model whose format has been converted. You can run the following commands on the Terminal tab to download the model:

cd /mnt/workspace/
mkdir qwen-ckpts
cd qwen-ckpts
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-ckpts/qwen-7b-hf-to-mg-tp1-pp1.tgz
tar -zxf qwen-7b-hf-to-mg-tp1-pp1.tgz
mv qwen-7b-hf-to-mg-tp1-pp1 qwen-7b-hf-to-megatron-tp1-pp1

Convert the model format from Hugging Face to Megatron

Run the following commands on the Terminal tab to use the model conversion tool provided by PAI to convert the model format from Hugging Face to Megatron:

# Convert the model format. 
cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
sh model_convertor.sh \
../../../Megatron-LM-main        \
/mnt/workspace/qwen-ckpts/qwen-7b-hf         \
/mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1  \
1  \
1  \
qwen-7b \
0 \
false

The following table describes the parameters that you must specify to run the model_convertor.sh script.

Parameter

Description

MEGATRON_PATH=$1

The directory of the source code of the Megatron-based training tool.

SOURCE_CKPT_PATH=$2

The directory of the Hugging Face model.

TARGET_CKPT_PATH=$3

The directory of the converted Megatron model.

TP=$4

The size of tensor parallelism, which must be the same as that for training. The size varies based on the number of model parameters. You must modify the size when you convert the model format.

  • Qwen-7B: 1

  • Qwen-14B: 2

  • Qwen-72B: 8

PP=$5

The size of pipeline parallelism, which must be the same as that for training. The size varies based on the number of model parameters. You must modify the size when you convert the model format.

  • Qwen-7B: 1

  • Qwen-14B: 1

  • Qwen-72B: 2

MN=$6

The name of the model, such as qwen-7b, qwen-14b, or qwen-72b.

EXTRA_VOCAB_SIZE=$7

The size of the extra vocabulary.

mg2hf=$8

Specifies whether to convert the model format from Megatron to Hugging Face. In this example, the value is set to false because the conversion is from Hugging Face to Megatron.

Pre-train the model

You can submit a standalone job to train the model in DSW, or submit a distributed job to train the model on multiple multi-GPU servers in DLC. The training process lasts about 2 hours. After the job is run, a model file is exported to the /mnt/workspace/output_megatron_qwen/ directory.

Run a standalone job to pre-train the model in DSW

Run the following commands on the Terminal tab to submit a standalone job that pre-trains a Qwen-7B model:

export WORK_DIR=/mnt/workspace
cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
sh run_pretrain_megatron_qwen.sh  \
dsw  \
${WORK_DIR}/Pai-Megatron-Patch  \
7B   \
1    \
8 \
1e-5   \
1e-6   \
2048  \
2048  \
85   \
fp16  \
1   \
1  \
sel  \
true   \
false  \
false   \
false  \
100000  \
${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document   \
${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1   \
100000000   \
10000   \
${WORK_DIR}/output_megatron_qwen/   

The following table describes the parameters that you must specify to run the run_pretrain_megatron_qwen.sh script.

Parameter

Description

ENV=$1

The runtime environment. Valid values:

  • dsw

  • dlc

MEGATRON_PATH=$2

The directory of the source code of the Megatron-based training tool.

MODEL_SIZE=$3

The number of model parameters. Valid values: 7B, 14B, and 72B.

BATCH_SIZE=$4

The number of samples on each GPU for each training iteration. Valid values: 4 and 8.

GLOBAL_BATCH_SIZE=$5

The total number of samples processed in each training iteration across all GPUs. The sketch after this table shows how this value relates to BATCH_SIZE and the number of GPUs.

LR=$6

The learning rate. Valid values: 1e-5 and 5e-5.

MIN_LR=$7

The minimum learning rate. Valid values: 1e-6 and 5e-6.

SEQ_LEN=$8

The length of the sequence.

PAD_LEN=${9}

The length of the padding sequence.

EXTRA_VOCAB_SIZE=${10}

The size of the extra vocabulary. The size varies based on the number of model parameters.

  • Qwen-7B: 85

  • Qwen-14B: 213

  • Qwen-72B: 213

PR=${11}

The training precision. Valid values: fp16 and bf16.

TP=${12}

The size of tensor parallelism.

PP=${13}

The size of pipeline parallelism.

AC=${14}

The activation checkpointing mode. Valid values:

  • full

  • sel

DO=${15}

Specifies whether to use the ZeRO-1 optimizer for Megatron. Valid values:

  • true

  • false

FL=${16}

Specifies whether to enable Flash Attention. Valid values:

  • true

  • false

SP=${17}

Specifies whether to use sequence parallelism. Valid values:

  • true

  • false

TE=${18}

Specifies whether to enable the acceleration technology of Transformer Engine. If you want to enable this technology, gu8xf GPUs are required.

SAVE_INTERVAL=${19}

The interval at which the checkpoint file is saved.

DATASET_PATH=${20}

The directory of the training set.

PRETRAIN_CHECKPOINT_PATH=${21}

The directory of the pre-trained model.

TRAIN_TOKENS=${22}

The number of tokens for training.

WARMUP_TOKENS=${23}

The number of tokens for warm-up.

OUTPUT_BASEPATH=${24}

The directory of the output model file generated after training.
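
The relationship between BATCH_SIZE, GLOBAL_BATCH_SIZE, and the number of GPUs follows the standard Megatron convention: the global batch size equals the per-GPU micro batch size multiplied by the data parallel size and the number of gradient accumulation steps. The following sketch checks the values used in the preceding standalone job; the formula is an assumption based on standard Megatron behavior, not an output of the script itself.

#!/bin/bash
# Hedged sanity check. Assumption (standard Megatron behavior):
#   GLOBAL_BATCH_SIZE = BATCH_SIZE x data_parallel_size x gradient_accumulation_steps
GPUS=8                 # GPUs used by the standalone DSW job
TP=1                   # tensor parallelism (TP=${12})
PP=1                   # pipeline parallelism (PP=${13})
BATCH_SIZE=1           # per-GPU micro batch size (BATCH_SIZE=$4)
GLOBAL_BATCH_SIZE=8    # global batch size (GLOBAL_BATCH_SIZE=$5)

DP=$((GPUS / (TP * PP)))                         # data parallel size: 8
GA=$((GLOBAL_BATCH_SIZE / (BATCH_SIZE * DP)))    # gradient accumulation steps: 1
echo "data parallel size = ${DP}, gradient accumulation steps = ${GA}"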

Run a distributed job to pre-train the model in DLC

After you train the model in DSW, you can configure a distributed job to train the model on multiple multi-GPU servers in DLC. Perform the following steps:

  1. Go to the Create Job page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Training > Deep Learning Containers (DLC). On the Deep Learning Containers (DLC) page, click Create Job. The Create Job page appears.

  2. On the Create Job page, configure the parameters that are described in the following table. You can use the default values for other parameters. For more information, see Submit training jobs.

    Parameter

    Description

    Basic Information

    Job Name

    The name of the training job. In this example, the value is set to test_qwen_dlc.

    Environment Information

    Node Image

    Click Image Address and enter the following image URL in the field: pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pytorch-training:1.12-ubuntu20.04-py3.10-cuda11.3-megatron-patch-llm.

    Mount Settings

    Click Add. Select Custom Dataset as Mount Type and configure the following parameters:

    • Datasets: Select the dataset created based on the General-purpose NAS file system of File Storage NAS.

    • Mount Path: Enter /mnt/workspace/.

    Startup Command

    Enter the following commands. The parameters that you must specify to run the run_pretrain_megatron_qwen.sh script are the same as those that you specify when you submit a standalone job to train the model in DSW.

    export WORK_DIR=/mnt/workspace
    cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
    sh run_pretrain_megatron_qwen.sh  \
    dlc  \
    ${WORK_DIR}/Pai-Megatron-Patch  \
    7B   \
    1    \
    8 \
    1e-5   \
    1e-6   \
    2048  \
    2048  \
    85   \
    fp16  \
    1   \
    1  \
    sel  \
    true   \
    false  \
    false   \
    false \
    100000  \
    ${WORK_DIR}/qwen-datasets/wudao/wudao_qwenbpe_content_document   \
    ${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1   \
    100000000   \
    10000   \
    ${WORK_DIR}/output_megatron_qwen/    

    Resource Information

    Resource Type

    Select Lingjun Resources.

    Source

    Select Resource Quota.

    Resource Quota

    Select the resource quota that is created for the purchased Lingjun resources.

    Framework

    Select PyTorch.

    Job Resource

    Configure the following parameters for worker nodes:

    • Number of Nodes: Enter 2. If you want to train the model on more servers, you can increase the value of the Number of Nodes parameter.

    • GPUs: Enter 8.

    • vCPUs: Enter 90.

      Note

      The number of CPU cores cannot be greater than 96.

    • Memory (GiB): Enter 1024.

    • Shared Memory (GiB): Enter 1024.

  3. Click OK. You are navigated to the Deep Learning Containers (DLC) page. If the state of the job changes to Succeeded, the training job has completed.

Perform supervised fine-tuning

You can submit a standalone job to fine-tune the model in DSW, or submit a distributed job to fine-tune the model on multiple multi-GPU servers in DLC. The fine-tuning process lasts about 2 hours. After the job is run, a model file is exported to the /mnt/workspace/output_megatron_qwen/ directory.

  1. Before you fine-tune the model, download the JSON files by running the commands in the Use the small-scale sample data processed by PAI section of Step 2: Prepare data for pre-training.

  2. Fine-tune the model.

    Run a standalone job to fine-tune the model in DSW

    Run the following commands on the Terminal tab to submit a standalone job that fine-tunes a Qwen-7B model:

    export WORK_DIR=/mnt/workspace
    cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
    sh run_finetune_megatron_qwen_withGA.sh  \
    dsw  \
    ${WORK_DIR}/Pai-Megatron-Patch  \
    7B     \
    1      \
    96 \
    1e-5   \
    1e-6   \
    2048   \
    2048     \
    85      \
    bf16   \
    1      \
    1      \
    sel    \
    true   \
    false  \
    false  \
    false \
    1000 \
    ${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json   \
    ${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json   \
    ${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1   \
    2000   \
    10 \
    ${WORK_DIR}/output_megatron_qwen/

    The following table describes the parameters that you must specify to run the run_finetune_megatron_qwen_withGA.sh script.

    Parameter

    Description

    ENV=$1

    The runtime environment. Valid values:

    • dlc

    • dsw

    MEGATRON_PATH=$2

    The directory of the source code of the Megatron-based training tool.

    MODEL_SIZE=$3

    The number of model parameters. Valid values: 7B, 14B, and 72B.

    BATCH_SIZE=$4

    The number of samples on each GPU for each fine-tuning iteration. Valid values: 1, 2, 4, and 8.

    GLOBAL_BATCH_SIZE=$5

    The total number of samples for fine-tuning iterations. Valid values: 64, 96, and 128.

    LR=$6

    The learning rate. Valid values: 1e-5 and 5e-5.

    MIN_LR=$7

    The minimum learning rate. Valid values: 1e-6 and 5e-6.

    SEQ_LEN=$8

    The length of the sequence.

    PAD_LEN=$9

    The length of the padding sequence.

    EXTRA_VOCAB_SIZE=${10}

    The size of the extra vocabulary. The size varies based on the number of model parameters.

    • Qwen-7B: 85

    • Qwen-14B: 213

    • Qwen-72B: 213

    PR=${11}

    The training precision. Valid values: fp16 and bf16.

    TP=${12}

    The size of tensor parallelism.

    PP=${13}

    The size of pipeline parallelism.

    AC=${14}

    The activation checkpointing mode. Valid values: full and sel.

    DO=${15}

    Specifies whether to use the ZeRO-1 optimizer for Megatron. Valid values:

    • true

    • false

    FL=${16}

    Specifies whether to enable Flash Attention. Valid values:

    • true

    • false

    SP=${17}

    Specifies whether to use sequence parallelism. Valid values:

    • true

    • false

    TE=${18}

    Specifies whether to enable the acceleration technology of Transformer Engine. If you want to enable this technology, gu8xf GPUs are required.

    SAVE_INTERVAL=${19}

    The interval at which the model is saved.

    DATASET_PATH=${20}

    The directory of the training set.

    VALID_DATASET_PATH=${21}

    The directory of the validation set.

    PRETRAIN_CHECKPOINT_PATH=${22}

    The directory of the pre-trained model.

    TRAIN_ITERS=${23}

    The number of training iterations.

    LR_WARMUP_ITERS=${24}

    The number of warm-up iterations for the learning rate.

    OUTPUT_BASEPATH=${25}

    The directory of the output model file generated after training.

    Run a distributed job to fine-tune the model in DLC

    After you fine-tune the model in DSW, you can configure a distributed job to fine-tune the model on multiple multi-GPU servers in DLC. When you submit a training job in DLC, enter the following commands for the Startup Command parameter. For more information about other parameters, see the Pre-train the model section of this topic.

    export WORK_DIR=/mnt/workspace
    cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
    sh run_finetune_megatron_qwen_withGA.sh  \
    dlc  \
    ${WORK_DIR}/Pai-Megatron-Patch  \
    7B     \
    1      \
    96 \
    1e-5   \
    1e-6   \
    2048   \
    2048     \
    85      \
    bf16   \
    1      \
    1      \
    sel    \
    true   \
    false  \
    false  \
    false \
    1000 \
    ${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-train.json   \
    ${WORK_DIR}/qwen-datasets/alpaca_zh-qwen-valid.json   \
    ${WORK_DIR}/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1   \
    2000   \
    10 \
    ${WORK_DIR}/output_megatron_qwen/

    The parameters that you must specify to run the run_finetune_megatron_qwen_withGA.sh script are the same as those that you specify when you submit a standalone job to fine-tune the model in DSW.

Step 4: Use the model for offline inference

After the model is trained, you can perform Megatron-based offline inference to evaluate the model. Perform the following steps:

  1. Download the pred_input.jsonl file that contains test samples and upload the file to the /mnt/workspace directory of DSW. For more information, see Upload or download data files.

    Note

    The data used for inference must be organized in the same way as that for fine-tuning.

  2. Copy all the JSON files and the tokenizer.model file from the model directory that is used before training to the directory of the output model file generated after training, that is, the ${OUTPUT_BASEPATH}/checkpoint directory that contains the latest_checkpointed_iteration.txt file.

    Note

    Replace the directories in the commands with your actual directories.

    cd /mnt/workspace/qwen-ckpts/qwen-7b-hf-to-megatron-tp1-pp1
    cp *.json /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
    cp tokenizer.model /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX/
  3. Run the following commands on the Terminal tab to perform offline inference by using the model. The inference results are generated in the /mnt/workspace/qwen_pred.txt file. You can evaluate the effects of the model based on the inference results.

    Note

    Before you run the commands, you must set the CUDA_VISIBLE_DEVICES parameter to 0 and the GPUS_PER_NODE parameter to 1 in the run_text_generation_megatron_qwen.sh script.

    export WORK_DIR=/mnt/workspace
    cd ${WORK_DIR}/Pai-Megatron-Patch/examples/qwen
    bash run_text_generation_megatron_qwen.sh \
    dsw \
    ${WORK_DIR}/Pai-Megatron-Patch \
    /mnt/workspace/output_megatron_qwen/checkpoint/dswXXX \
    7B \
    1 \
    1 \
    1024 \
    1024 \
    85 \
    fp16 \
    10 \
    512 \
    512 \
    ${WORK_DIR}/pred_input.jsonl \
    ${WORK_DIR}/qwen_pred.txt \
    0 \
    1.0 \
    1.2

    The following table describes the parameters that you must specify to run the run_text_generation_megatron_qwen.sh script.

    Parameter

    Description

    ENV=$1

    The runtime environment. Valid values:

    • dlc

    • dsw

    MEGATRON_PATCH_PATH=$2

    The directory of the Pai-Megatron-Patch folder.

    CHECKPOINT_PATH=$3

    The directory of the model during training.

    Important

    Replace this directory with your actual model directory.

    MODEL_SIZE=$4

    The number of model parameters. Valid values: 7B, 14B, and 72B.

    TP=$5

    The size of tensor parallelism.

    Important
    • If you set this parameter to 1, you can use a single GPU for inference.

    • If you set this parameter to a value greater than 1, you must use the corresponding number of GPUs for inference.

    BS=$6

    The number of samples on each GPU for each inference iteration. Valid values: 1, 4, and 8.

    SEQ_LEN=$7

    The length of the sequence. Valid values: 256, 512, and 1024.

    PAD_LEN=$8

    The length of the padding sequence, which is the length of the concatenated text.

    EXTRA_VOCAB_SIZE=${9}

    The number of tokens increased during model conversion. The number varies based on the number of model parameters.

    • Qwen-7B: 85

    • Qwen-14B: 213

    • Qwen-72B: 213

    PR=${10}

    The inference precision. Valid values: fp16 and bf16.

    TOP_K=${11}

    The number of top candidate words from which the next token is sampled (top-k sampling). Valid values: 0 to n. Examples: 0, 5, 10, and 20.

    INPUT_SEQ_LEN=${12}

    The length of the input sequence. Set the value to 512.

    OUTPUT_SEQ_LEN=${13}

    The length of the output sequence. Set the value to 256.

    INPUT_FILE=${14}

    The file that contains the text to be used for inference. In this example, the pred_input.jsonl file is used, in which each line contains a sample.

    OUTPUT_FILE=${15}

    The output file generated after inference. In this example, the qwen_pred.txt file is used.

    TOP_P=${16}

    The percentage of top candidate words to be selected. Valid values: 0 to 1. Examples: 0, 0.85, and 0.95.

    Note

    You must set one of the TOP_K and TOP_P parameters to 0.

    TEMPERATURE=${17}

    The randomness of the sampling process. Valid values: 1 to n.

    REPETITION_PENALTY=${18}

    The repetition penalty of the content generated by the model. Valid values: 1 to 2. Default value: 1.2.

Step 5: Convert the model format

If the effects of the model meet your expectations after offline inference is performed by using the model, you can convert the model format from Megatron to Hugging Face. Then, you can deploy the converted Hugging Face model as a model service.

  1. Run the following commands on the Terminal tab to convert the model format from Megatron to Hugging Face:

    export WORK_DIR=/mnt/workspace
    cd /mnt/workspace/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen
    sh model_convertor.sh \
    ../../../Megatron-LM-main        \
    ${WORK_DIR}/output_megatron_qwen/checkpoint/${Directory}/iter_*******         \
    /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/  \
    1  \
    1  \
    qwen-7b \
    0 \
    true

    The following table describes the parameters that you must specify to run the model_convertor.sh script.

    Parameter

    Description

    MEGATRON_PATH=$1

    The directory of the source code of the Megatron-based training tool.

    SOURCE_CKPT_PATH=$2

    The directory of the trained model in the Megatron format, including the iter_* folder. Example: ${WORK_DIR}/output_megatron_qwen/checkpoint/dsw-pretrain-megatron-qwen-7B-lr-1e-5-bs-1-seqlen-2048-pr-bf16-tp-1-pp-1-ac-sel-do-true-sp-false-tt--wt-/iter_*******.

    Important
    • Replace this directory with your actual model directory.

    • If you need to convert the format of a pre-trained model, you must delete all the distrib_optim.pt files in the model directory.

    TARGET_CKPT_PATH=$3

    The directory of the converted Hugging Face model.

    TP=$4

    The size of tensor parallelism, which must be the same as that for training.

    PP=$5

    The size of pipeline parallelism, which must be the same as that for training.

    MN=$6

    The name of the model, such as qwen-7b, qwen-14b, or qwen-72b.

    EXTRA_VOCAB_SIZE=$7

    The size of the extra vocabulary.

    mg2hf=$8

    Specifies whether to convert the model format from Megatron to Hugging Face.

  2. Copy the .json, .py, and .tiktoken files in the /mnt/workspace/qwen-ckpts/qwen-7b-hf directory of the open source Hugging Face model to the /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1 directory to ensure that the model can be properly used. A command sketch is provided after this step.

    Important

    Do not copy the pytorch_model.bin.index.json file.
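
    The following commands are a minimal sketch of this step, using the example directories from this topic. Adjust the paths if your directories differ.

    cd /mnt/workspace/qwen-ckpts/qwen-7b-hf
    # Copy the .json, .py, and .tiktoken files, but skip pytorch_model.bin.index.json as noted above.
    for f in *.json *.py *.tiktoken; do
        [ "$f" = "pytorch_model.bin.index.json" ] && continue
        cp "$f" /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1/
    done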

Use the Hugging Face model for offline inference

You can perform offline inference by using the converted Hugging Face model together with Hugging Face Transformers and DeepSpeed. For example, for a Qwen-7B model, create an infer.py file that contains the following content on the Terminal tab, and then run the file to perform offline inference and evaluate the model based on the inference results.

#!/usr/bin/env python
# encoding=utf-8
from transformers import AutoModelForCausalLM, AutoTokenizer

# Directory of the converted Hugging Face model.
checkpoint = '/mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1'
print(checkpoint)

# Load the tokenizer and the model. trust_remote_code is required for Qwen models.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)

# Build the prompt and generate a response.
prompts = 'Write a quick sorting algorithm.'
p = f"Human:{prompts}"
print(p)
inputs = tokenizer.encode(p, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))

Replace checkpoint with the directory of the converted Hugging Face model. In this example, the /mnt/workspace/qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1 directory is used.

Step 6: Deploy the model as a model service and call the model service

After you perform offline inference and evaluate the effects of the model, you can deploy the converted Hugging Face model as an online model service and call the model service in the actual production environment to perform inference. Perform the following steps:

Deploy the model as a model service

  1. Go to the EAS-Online Model Services page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace to which you want to deploy the model.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the Elastic Algorithm Service (EAS) page.

  2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

  3. On the Create Service page, configure the parameters that are described in the following table. You can use the default values for other parameters.

    Parameter

    Description

    Model Service Information

    Service Name

    The custom name of the model service. The name must be unique in a region. In this example, the value is set to test_qwen.

    Deployment Method

    In this example, Deploy Web App by Using Image is selected.

    Select Image

    Select Image Address, enter pai-image-manage-registry.cn-wulanchabu.cr.aliyuncs.com/pai/llm-inference:vllm-0.2.1-v4 in the field, and then read and agree to the Machine Learning Platform for AI Terms of Service by selecting the check box.

    Model Settings

    Click Specify Model Settings and configure the model directory.

    • Select Mount NAS File System and configure the following parameters:

      • NAS Mount Target: the General-purpose NAS file system and mount target based on which the dataset is created.

      • NAS Source Path: the directory of the converted Hugging Face model that is stored in the NAS file system. In this example, the /qwen-ckpts/qwen-7b-mg-to-hf-tp1-pp1 directory is used.

      • Mount Path: the mount directory of the model. In this example, the value is set to /qwen-7b.

    Command to Run

    • In this example, the following command is run: nohup python -m fastchat.serve.controller > tmp1.log 2>&1 & python -m fastchat.serve.gradio_web_server_pai --model-list-mode reload > tmp2.log 2>&1 & python -m fastchat.serve.vllm_worker --model-path /qwen-7b --tensor-parallel-size 1 --trust-remote-code.

      Where:

      • --model-path: the mount directory of the model, which must be the same as that in the model settings.

      • --tensor-parallel-size: the size of tensor parallelism, which must be adjusted based on the number of GPUs. For example, set this parameter to 1 for a Qwen-7B model or 2 for a Qwen-72B model.

    • Port number: In this example, port 7860 is used.

    Resource Deployment Information

    Resource Group Type

    In this example, Intelligent Computing Lingjun Resources is selected.

    Select Quota

    Select the resource quota that is created for the purchased Lingjun resources.

    Instance Count

    Configure the parameters based on the model and the selected resources. For a Qwen-7B model, set the Instance Count parameter to 1 and select the instance type based on the following resource specifications:

    • vCPUs: 16

    • Memory: 64,000 MB

    • GPUs: 1

    VPC Settings

    VPC

    After you configure the NAS Mount Target parameter, the system automatically matches the virtual private cloud (VPC), vSwitch, and security group of the specified NAS file system.

    vSwitch

    Security Group Name

  4. Click Deploy.

    If the state of the service changes to Running, the service is deployed.

Call the model service

After the model service is deployed, you can call the service to perform inference. Perform the following steps:

  1. On the Inference Service tab, find the service that you want to call and click View Web App in the Service Type column.

  2. On the WebUI page, perform inference.