Fine-Tuning a Llama3-8B Model in PAI DSW

This article describes how to fine-tune the parameters of a Llama 3 model in DSW to enable the model to better align with and adapt to specific scenarios.

Data Science Workshop (DSW) of Platform for AI (PAI) is an interactive modeling platform that you can use to fine-tune custom models and optimize model performance. This topic describes how to fine-tune a Llama 3 model in DSW so that it better fits specific scenarios and performs better on specific tasks. The Meta-Llama-3-8B-Instruct model is used as an example.

Background Information

Llama 3 is the latest model family in the Llama series, released by Meta in April 2024. Llama 3 is trained on more than 15 trillion tokens, a dataset approximately 7 times the size of the one used for Llama 2. Llama 3 supports a context length of 8K tokens and uses an improved tokenizer with a vocabulary of 128K tokens, which enables more precise and efficient processing of complex contexts and technical terms.

Llama 3 provides pretrained and instruction-tuned versions of models in 8B and 70B sizes, which are suitable for various scenarios.

8B: Llama 3 8B is suitable for efficient deployment and development based on consumer-grade GPUs. You can use Llama 3 8B in scenarios that require high model response speed and cost-effectiveness.
- Meta-Llama-3-8B: pretrained version
- Meta-Llama-3-8B-Instruct: instruction-tuned version
70B: Llama 3 70B has a much larger parameter count and is suitable for large-scale AI applications. You can use this model for advanced, complex tasks that demand higher model performance.
- Meta-Llama-3-70B: pretrained version
- Meta-Llama-3-70B-Instruct: instruction-tuned version

Prerequisites

A workspace is created. For more information, see Create a workspace.
A DSW instance is created. Take note of the following key parameters. For more information, see Create a DSW instance.
- Instance type: We recommend that you use an instance whose GPU memory is at least 16 GB, such as the V100 GPU.
- Python: Python 3.9 or later.
- Image: In this example, the following image URL is used: dsw-registry-vpc.REGION.cr.aliyuncs.com/pai-training-algorithm/llm_deepspeed_peft:v0.0.3. Replace REGION with the ID of the region in which your DSW instance resides. Example: cn-hangzhou or cn-shanghai. The following table describes the region IDs.

Region	Region ID
China (Hangzhou)	cn-hangzhou
China (Shanghai)	cn-shanghai
China (Beijing)	cn-beijing
China (Shenzhen)	cn-shenzhen
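
As a small illustration of the REGION substitution described above, the following sketch builds the image URL from a region ID. The helper name dsw_image_url and the REGION_IDS mapping are hypothetical conveniences. Only the URL pattern and the four region IDs come from this article, and the check against the table is an assumption that you can relax if your region is not listed.

```python
# Region IDs listed in the table above.
REGION_IDS = {
    "China (Hangzhou)": "cn-hangzhou",
    "China (Shanghai)": "cn-shanghai",
    "China (Beijing)": "cn-beijing",
    "China (Shenzhen)": "cn-shenzhen",
}

# URL pattern from the prerequisites, with REGION as a placeholder.
IMAGE_TEMPLATE = (
    "dsw-registry-vpc.{region}.cr.aliyuncs.com/"
    "pai-training-algorithm/llm_deepspeed_peft:v0.0.3"
)


def dsw_image_url(region_id: str) -> str:
    """Return the image URL for a region ID listed in the table."""
    if region_id not in REGION_IDS.values():
        raise ValueError(f"region ID not in the table: {region_id!r}")
    return IMAGE_TEMPLATE.format(region=region_id)


print(dsw_image_url("cn-shanghai"))
```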

Before you use the Llama 3 model, read the official Meta license.

Note: If you cannot access the web page, you may need to configure a proxy and then try again.

Step 1: Download the Model

Method 1: Download the Model in DSW

1. Go to the DSW development environment.

a) Log on to the PAI console.

b) In the top navigation bar, select the region in which the DSW instance resides.

c) In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the default workspace.

d) In the left-side navigation pane, choose Model Development and Training > Interactive Modeling (DSW).

e) Click Open in the Actions column of the DSW instance that you want to manage to go to the development environment of the DSW instance.

2. On the Launcher tab, click Python 3 in the Notebook pane of the Quick Start section.

3. Run the following code in the Notebook to download the model file. The system automatically selects an appropriate download address and downloads the model file to the current directory.

! pip install modelscope==1.12.0 transformers==4.37.0

from modelscope.hub.snapshot_download import snapshot_download
snapshot_download('LLM-Research/Meta-Llama-3-8B-Instruct', cache_dir='.', revision='master')
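
Before you move on, it can help to confirm that the download completed. The following sketch checks the target directory for a few files that Hugging Face-format checkpoints typically contain. The file names in EXPECTED_FILES and the helper missing_files are assumptions for illustration, not a list taken from this article. Adjust them to match what you see in the directory.

```python
from pathlib import Path

# Assumed file names; typical for Hugging Face-format checkpoints.
EXPECTED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


# Directory created by snapshot_download with cache_dir='.'.
model_dir = "./LLM-Research/Meta-Llama-3-8B-Instruct"
missing = missing_files(model_dir)
if missing:
    print(f"Missing from {model_dir}: {missing}")
else:
    print("All expected files are present.")
```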

Method 2: Download the Model from Meta

Go to the Meta website to apply for the model.

Note: If you cannot access the web page, you may need to configure a proxy and then try again.

Step 2: Prepare a Dataset

In this example, an English poetry dataset is used to fine-tune the Llama 3 model to improve the poetic expressiveness of the generated poems. Run the following command in the Notebook of DSW to download the training dataset required by the model:

!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/tutorial/llm_instruct/en_poetry_train.json

You can also prepare your own dataset for your business scenario by following the format of the sample training dataset. Fine-tuning a large language model (LLM) on task-specific data can improve its response accuracy for those tasks.
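
To follow the format of the sample dataset, you first need to see it. The following sketch loads the downloaded file and reports how many records it holds and which field names appear, without assuming any particular schema. It does assume that the file is a JSON array of objects, and the helper name summarize_dataset is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_dataset(path):
    """Return (record_count, Counter of field names) for a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    fields = Counter()
    for record in records:
        if isinstance(record, dict):
            fields.update(record.keys())
    return len(records), fields


# Runs only if the sample dataset has already been downloaded.
dataset_path = Path("en_poetry_train.json")
if dataset_path.exists():
    count, fields = summarize_dataset(dataset_path)
    print(f"{count} records")
    for name, seen in fields.items():
        print(f"  {name}: present in {seen} records")
```

If every field appears in every record, a custom dataset with the same field names should be a drop-in replacement for the sample.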

Step 3: Fine-Tune the Model

Lightweight LoRA Training

In this example, the /ml/code/sft.py training script is used to perform lightweight Low-Rank Adaptation (LoRA) training on the model. The model weights are loaded in 4-bit precision (--load_in_4bit) to reduce GPU memory consumption.
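
As a rough illustration of why lower-precision weights reduce GPU memory, the following sketch estimates the memory taken by the weights alone at different precisions. The parameter count of 8 billion is a round-number assumption, and the estimate ignores activations, the KV cache, optimizer state, and quantization metadata, so actual usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: roughly 8 billion parameters

for label, bits in [("16-bit (bfloat16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
```

Under these assumptions, 16-bit weights alone would occupy about 16 GB, while 4-bit weights need about a quarter of that.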

When you run the accelerate launch command, the system launches the specified Python script with the given parameters and performs training based on the computing resources that are specified in the multi_gpu.yaml configuration file.

! accelerate launch --num_processes 1 --config_file /ml/code/multi_gpu.yaml /ml/code/sft.py \
    --model_name  ./LLM-Research/Meta-Llama-3-8B-Instruct/ \
    --model_type llama \
    --train_dataset_name en_poetry_train.json \
    --num_train_epochs 3 \
    --batch_size 8 \
    --seq_length 128 \
    --learning_rate 5e-4 \
    --lr_scheduler_type linear \
    --target_modules k_proj o_proj q_proj v_proj \
    --output_dir lora_model/ \
    --apply_chat_template \
    --use_peft \
    --load_in_4bit \
    --peft_lora_r 32 \
    --peft_lora_alpha 32

The following section describes the parameters used in this example. Modify the parameters based on your business requirements.

● The accelerate launch command is used to launch and manage deep learning training scripts on multiple GPUs.

--num_processes: the number of parallel processes. In this example, this parameter is set to 1, which disables multi-process parallelism.
--config_file /ml/code/multi_gpu.yaml: the path of the configuration file.
/ml/code/sft.py: the path of the Python script that you want to run.

● To run the /ml/code/sft.py script, configure the following parameters:

--model_name ./LLM-Research/Meta-Llama-3-8B-Instruct/: the path of the pretrained model.
--model_type: the type of the model. In this example, this parameter is set to llama.
--train_dataset_name: the path of the training dataset. In this example, the dataset that is downloaded in Step 2 is used.
--num_train_epochs: the number of training epochs. In this example, this parameter is set to 3.
--batch_size: the batch size. In this example, this parameter is set to 8.
--seq_length: the sequence length. In this example, this parameter is set to 128.
--learning_rate: the learning rate. In this example, this parameter is set to 5e-4, which is equal to 0.0005.
--lr_scheduler_type: the type of the learning rate scheduler. In this example, this parameter is set to linear.
--target_modules: the modules of the model to which LoRA is applied during fine-tuning. In this example, this parameter is set to k_proj o_proj q_proj v_proj.
--output_dir: the output directory in which the fine-tuned LoRA weights are saved. In this example, this parameter is set to lora_model/.
--apply_chat_template: applies the chat template of the model to the training data.
--use_peft: uses Parameter-Efficient Fine-Tuning (PEFT) during training.
--load_in_4bit: loads the model weights in 4-bit precision to reduce memory consumption.
--peft_lora_r: the LoRA rank. In this example, this parameter is set to 32.
--peft_lora_alpha: the LoRA alpha value. In this example, this parameter is set to 32.
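
To give a feel for what --peft_lora_r and --target_modules control, the following sketch counts the parameters that LoRA adds when it is applied to the four attention projections. LoRA adds two low-rank matrices per adapted layer, with r * (d_in + d_out) parameters in total. The layer shapes are assumptions taken from the published Llama 3 8B configuration (hidden size 4096, 32 layers, 1024-dimensional key and value projections), not from this article, and sft.py may configure LoRA differently.

```python
# Assumed (d_in, d_out) of each attention projection in Llama 3 8B.
PROJECTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32  # assumed number of transformer layers


def lora_param_count(rank, target_modules,
                     shapes=PROJECTION_SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA parameters: rank * (d_in + d_out) per adapted projection."""
    per_layer = sum(rank * (shapes[m][0] + shapes[m][1]) for m in target_modules)
    return per_layer * num_layers


rank, alpha = 32, 32  # values of --peft_lora_r and --peft_lora_alpha
total = lora_param_count(rank, ["k_proj", "o_proj", "q_proj", "v_proj"])
print(f"LoRA parameters: {total:,}")
print(f"LoRA scaling factor alpha / r: {alpha / rank}")
```

Under these assumptions, the adapters hold about 27 million parameters, well under 1% of the roughly 8 billion parameters in the base model, and the update is scaled by alpha / r = 1.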

Fuse LoRA Weights with the Model

Run the following command to fuse the LoRA weights with the Llama 3 model:

! RANK=0 python /ml/code/convert.py \
    --model_name ./LLM-Research/Meta-Llama-3-8B-Instruct/ \
    --model_type llama \
    --output_dir trained_model/ \
    --adapter_dir lora_model/

The following section describes the parameters and values used in this example:

RANK=0: The RANK environment variable is used to specify the sequence number of the current process among all processes in distributed training. A value of 0 specifies that the current process is an independent process or acts as the main process in distributed training.
python /ml/code/convert.py: runs the convert.py script, which fuses the LoRA adapter weights with the base model.
--model_name ./LLM-Research/Meta-Llama-3-8B-Instruct/: the path to the model.
--model_type llama: the type of the model. In this example, Llama is used.
--output_dir trained_model/: the output directory in which the converted model and weights are saved.
--adapter_dir lora_model/: the directory in which the LoRA adapter weights reside.
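
The internals of convert.py are not shown in this article. As a minimal sketch of what fusing LoRA weights generally means, the following example applies the standard LoRA merge formula W' = W + (alpha / r) * B * A to a tiny weight matrix in pure Python. The function names and the toy values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight: d_out x d_in, lora_b: d_out x rank, lora_a: rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2 x 2 weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # rank x d_in
lora_b = [[1.0], [3.0]]      # d_out x rank
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the inference step can load trained_model/ like an ordinary checkpoint.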

Step 4: Perform Model Inference

Run the following code to perform model inference and verify the effect of fine-tuning. In this example, the model is asked to generate a poem about spring:

import torch, transformers

# model_id = "./LLM-Research/Meta-Llama-3-8B-Instruct/"
model_id = "./trained_model/"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "Write a poem on a topic 'spring' "},
]

prompt = pipeline.tokenizer.apply_chat_template(
        messages, 
        tokenize=False, 
        add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=1024,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])

The following section provides a sample response of the model, which shows that the fine-tuned model can generate a coherent, well-structured poem.

Here's a poem on the topic of "Spring":

As winter's chill begins to fade,
The earth awakens from its shade,
And spring's sweet breath begins to blow,
Bringing life to all that's cold and slow.

The trees regain their vibrant hue,
And flowers bloom, both old and new,
Their petals dancing in the breeze,
As sunshine warms the world with ease.

The air is filled with sweet perfume,
As blossoms burst forth in their room,
And robins sing their morning song,
As spring's awakening is strong.

The world is fresh, and new, and bright,
As spring's warm light begins to take flight,
And all around, new life unfolds,
As winter's grip begins to grow old.

So let us bask in spring's warm rays,
And let our spirits soar and sway,
For in this season, we're reborn,
And all around, new life is sworn.

I hope you enjoy it!
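
For reference, the inference code above relies on apply_chat_template to turn the message list into a prompt string, and on <|eot_id|> as an extra stop token. The following sketch rebuilds that prompt by hand, assuming the prompt format published for Llama 3 Instruct models. The function build_llama3_prompt is hypothetical, and the template shipped with the tokenizer remains the authoritative version.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Format chat messages in the assumed Llama 3 Instruct prompt layout."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, and <|eot_id|>.
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [{"role": "user", "content": "Write a poem on a topic 'spring' "}]
print(build_llama3_prompt(messages))
```

Because every turn ends with <|eot_id|>, passing that token in eos_token_id lets generation stop when the model finishes its reply.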

Step 5: Deploy the Model

You can upload the fine-tuned model weights to Object Storage Service (OSS) and deploy the fine-tuned Llama 3 model in Elastic Algorithm Service (EAS) of PAI. For more information, see Deploy LLM applications in EAS.
