Fine-tune the Llama3-8B large language model - Platform For AI

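DSW is an interactive modeling platform for developers who need to fine-tune models for their own tasks and want to optimize the results. This topic uses the Llama-3-8B-Instruct model as an example. It shows how to fine-tune the parameters of a Llama 3 model in DSW so that the model better understands and adapts to a specific task and performs better on it.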

Background

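Llama 3 is the generation of Meta's Llama series that was released as open models in April 2024. It was trained on more than 15 trillion tokens, about seven times the size of the Llama 2 training dataset. The model supports an 8K-token context. It also ships with an improved tokenizer that has a 128K-token vocabulary, which lets it handle complex context and specialized terminology more accurately and efficiently.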

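Llama 3 comes in 8B and 70B versions to suit different scenarios. Each version is available both as a base model and as an instruction-tuned model:

8B version
This version is suited to consumer-grade GPUs and allows fast deployment and development with limited compute resources. It fits applications that prioritize response speed and cost efficiency.
- Meta-Llama-3-8B: the 8B base model
- Meta-Llama-3-8B-Instruct: the instruction-tuned version of the 8B base model
70B version
This version is built for large-scale AI applications. Its large parameter count makes it suitable for highly complex tasks and for high-end projects that demand the best model performance.
- Meta-Llama-3-70B: the 70B base model
- Meta-Llama-3-70B-Instruct: the instruction-tuned version of the 70B base model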

Prerequisites

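Create a workspace. For details, see Create a workspace.
Create a DSW instance with the following key settings. For details, see Create a DSW instance.
- Instance type: a V100 (16 GB) or a GPU with more memory is recommended.
- Python environment: Python 3.9 or later is recommended.
- Image: in the image URL field, enter dsw-registry-vpc.REGION.cr.aliyuncs.com/pai-training-algorithm/llm_deepspeed_peft:v0.0.3. Replace REGION with the code of the region where the DSW instance resides, such as cn-hangzhou or cn-shanghai. The following table lists more regions and their REGION codes.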
  Region      REGION code
  Hangzhou    cn-hangzhou
  Shanghai    cn-shanghai
  Beijing     cn-beijing
  Shenzhen    cn-shenzhen
Read Meta's official license before you use Llama 3.
Note
If you cannot access the license page, you may need to configure a proxy and try again.
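If you script instance creation, or just want to avoid typos, you can fill in the image URL from the region code. The helper `dsw_image_url` below is hypothetical. It only substitutes REGION in the URL pattern given above.

```python
# Image URL pattern from the prerequisites; REGION is the only variable part.
IMAGE_TEMPLATE = (
    "dsw-registry-vpc.{region}.cr.aliyuncs.com/"
    "pai-training-algorithm/llm_deepspeed_peft:v0.0.3"
)

# Region codes from the table above; other regions use their own codes.
REGION_CODES = {
    "Hangzhou": "cn-hangzhou",
    "Shanghai": "cn-shanghai",
    "Beijing": "cn-beijing",
    "Shenzhen": "cn-shenzhen",
}


def dsw_image_url(region_code: str) -> str:
    """Hypothetical helper: fill the REGION placeholder with a region code."""
    return IMAGE_TEMPLATE.format(region=region_code)


print(dsw_image_url(REGION_CODES["Shanghai"]))
```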

Step 1: Download the model

Method 1: Download the model in DSW

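Go to the DSW development environment.
1. Log on to the PAI console.
2. In the upper-left corner of the page, select the region where the DSW instance resides.
3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the default workspace to open it.
4. In the left-side navigation pane, choose Model Development and Training > Interactive Modeling (DSW).
5. Find the instance you want to open and click Open in the Actions column to enter the DSW development environment.
On the Launcher page, click Python3 under Notebook in the Quick Start section.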

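Run the following code in the Notebook to download the model files. The code automatically selects a suitable download source and saves the model files to the current directory.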

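# Install pinned versions of ModelScope (used for the download) and Transformers
! pip install modelscope==1.12.0 transformers==4.37.0

# Download the model snapshot; the files land in ./LLM-Research/Meta-Llama-3-8B-Instruct/
from modelscope.hub.snapshot_download import snapshot_download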
snapshot_download('LLM-Research/Meta-Llama-3-8B-Instruct', cache_dir='.', revision='master')
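After the download finishes, you can check that the directory looks complete before moving on. The sketch below assumes a Hugging Face-format checkpoint that contains config.json, the tokenizer files and one or more *.safetensors weight shards. That file list is an assumption, so compare it with what was actually downloaded. `missing_model_files` is a hypothetical helper. snapshot_download also returns the local directory path, which you can pass in instead of the hard-coded path.

```python
from pathlib import Path

# ASSUMPTION: files expected in a Hugging Face-format checkpoint directory.
REQUIRED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def missing_model_files(model_dir: str) -> list:
    """Hypothetical helper: list expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Weights are usually sharded across several *.safetensors files.
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing


# An empty list means every expected file was found.
print(missing_model_files("./LLM-Research/Meta-Llama-3-8B-Instruct/"))
```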

Method 2: Download the model from Meta

Apply to Meta for access, then download the model.

Note

If you cannot access the Meta site, you may need to configure a proxy and try again.

Step 2: Prepare the dataset

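This example uses an English poetry dataset to fine-tune Llama 3 and improve its ability to write poems. Run the following command in the DSW Notebook to download the training dataset.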

!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/tutorial/llm_instruct/en_poetry_train.json

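You can also prepare your own dataset for your use case by following the format of this dataset. Fine-tuning on task-specific data can improve the accuracy of the model's answers on that task.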
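Before you train on your own data, it helps to check each record programmatically. The sketch below assumes that records are JSON objects with `instruction` and `output` string fields. That field layout is an assumption: inspect en_poetry_train.json first (for example, print the keys of its first record) and adjust REQUIRED_KEYS to match. The file name my_train.json and the helper `validate_records` are placeholders.

```python
import json

# ASSUMPTION: field names; verify them against en_poetry_train.json, e.g.
#   json.load(open("en_poetry_train.json", encoding="utf-8"))[0].keys()
REQUIRED_KEYS = ("instruction", "output")


def validate_records(records: list) -> list:
    """Hypothetical helper: return readable problems found in training records."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty {key!r}")
    return problems


# A hypothetical custom record in the assumed format, saved as a JSON list.
my_records = [
    {"instruction": "Write a poem on a topic 'autumn':", "output": "Leaves fall ..."},
]
with open("my_train.json", "w", encoding="utf-8") as f:
    json.dump(my_records, f, ensure_ascii=False, indent=2)

print(validate_records(my_records))  # an empty list means no problems found
```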

Step 3: Fine-tune the model

Lightweight LoRA training

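Use the existing training script /ml/code/sft.py to run lightweight LoRA training on the model. To reduce GPU memory usage, the base model weights are loaded in 4-bit precision during training (the --load_in_4bit option).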
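A back-of-the-envelope estimate shows why 4-bit loading helps on a 16 GB GPU such as the V100. The sketch below counts only the memory needed to store roughly 8 billion weights at different precisions. It ignores activations, optimizer state, LoRA adapters and quantization overhead, so actual usage is higher.

```python
# Rough weight-only memory estimate; real usage is higher (see lead-in).
NUM_PARAMS = 8.0e9  # approximate parameter count of Llama-3-8B

BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gib(dtype: str, num_params: float = NUM_PARAMS) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>5}: {weight_memory_gib(dtype):5.1f} GiB")
```

In bf16, the weights alone take about 14.9 GiB, which leaves almost no headroom on a 16 GB card. At 4 bits, they take about 3.7 GiB.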

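The accelerate launch command starts the specified Python script with these arguments. It trains within the available compute resources, according to the settings in the multi_gpu.yaml configuration file.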

! accelerate launch --num_processes 1 --config_file /ml/code/multi_gpu.yaml /ml/code/sft.py \
    --model_name  ./LLM-Research/Meta-Llama-3-8B-Instruct/ \
    --model_type llama \
    --train_dataset_name en_poetry_train.json \
    --num_train_epochs 3 \
    --batch_size 8 \
    --seq_length 128 \
    --learning_rate 5e-4 \
    --lr_scheduler_type linear \
    --target_modules k_proj o_proj q_proj v_proj \
    --output_dir lora_model/ \
    --apply_chat_template \
    --use_peft \
    --load_in_4bit \
    --peft_lora_r 32 \
    --peft_lora_alpha 32
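You may prefer to run the training from Python rather than from a `!` shell line. In that case, a small helper can rebuild the same command and let you override individual arguments, for example to point --train_dataset_name at your own dataset. The helper `build_sft_command` is hypothetical and uses only the flags shown in this guide. It defaults to en_poetry_train.json, the file downloaded in Step 2. Other sft.py options are not documented here.

```python
import shlex

# Defaults mirror the command in this guide; only these flags are known.
DEFAULTS = {
    "model_name": "./LLM-Research/Meta-Llama-3-8B-Instruct/",
    "model_type": "llama",
    "train_dataset_name": "en_poetry_train.json",
    "num_train_epochs": 3,
    "batch_size": 8,
    "seq_length": 128,
    "learning_rate": 5e-4,
    "lr_scheduler_type": "linear",
    "output_dir": "lora_model/",
    "peft_lora_r": 32,
    "peft_lora_alpha": 32,
}
TARGET_MODULES = ("k_proj", "o_proj", "q_proj", "v_proj")
BOOL_FLAGS = ("apply_chat_template", "use_peft", "load_in_4bit")


def build_sft_command(num_processes: int = 1, **overrides) -> list:
    """Hypothetical helper: return the argv list for `accelerate launch ... sft.py`."""
    args = {**DEFAULTS, **overrides}
    cmd = [
        "accelerate", "launch",
        "--num_processes", str(num_processes),
        "--config_file", "/ml/code/multi_gpu.yaml",
        "/ml/code/sft.py",
    ]
    for key, value in args.items():
        cmd += [f"--{key}", str(value)]
    cmd += ["--target_modules", *TARGET_MODULES]
    cmd += [f"--{flag}" for flag in BOOL_FLAGS]
    return cmd


cmd = build_sft_command(num_train_epochs=1)
print(shlex.join(cmd))  # the equivalent shell command line
```

Passing the list to `subprocess.run(cmd, check=True)` runs the training and raises an error if the command fails.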

The parameters used in the example are described below. Adjust them as needed:

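accelerate launch: a command-line tool that launches and manages deep learning training scripts, including across multiple GPUs.
- --num_processes 1: sets the number of parallel processes to 1, so training runs as a single process without multi-process parallelism.
- --config_file /ml/code/multi_gpu.yaml: the path of the configuration file.
- /ml/code/sft.py: the path of the Python script to launch.
Arguments accepted by the /ml/code/sft.py script:
- --model_name ./LLM-Research/Meta-Llama-3-8B-Instruct/: the path of the pretrained model.
- --model_type llama: the model type, here Llama.
- --train_dataset_name: the path of the training dataset file. Set it to the file downloaded in Step 2 (en_poetry_train.json) or to your own dataset.
- --num_train_epochs 3: trains for 3 epochs.
- --batch_size 8: sets the batch size to 8.
- --seq_length 128: sets the sequence length to 128.
- --learning_rate 5e-4: sets the learning rate to 0.0005.
- --lr_scheduler_type linear: uses a linear learning-rate scheduler.
- --target_modules k_proj o_proj q_proj v_proj: the model modules that are fine-tuned. Here, LoRA adapters are applied to the key, output, query and value projection layers of the attention blocks.
- --output_dir lora_model/: the output directory where the fine-tuned (LoRA adapter) weights are saved.
- --apply_chat_template: applies the chat template to the training data.
- --use_peft: uses Parameter-Efficient Fine-Tuning (PEFT) during training.
- --load_in_4bit: loads the model weights in 4-bit precision to reduce memory usage.
- --peft_lora_r 32: sets the LoRA rank r to 32.
- --peft_lora_alpha 32: sets the LoRA scaling parameter alpha to 32.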
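To put --peft_lora_r and --peft_lora_alpha in context, the sketch below counts how many trainable parameters LoRA adds when it targets q_proj, k_proj, v_proj and o_proj. It assumes the published Llama-3-8B configuration: hidden size 4096, 32 layers, and 32 query heads plus 8 key/value heads of dimension 128. It also assumes the standard LoRA formulation, in which each adapted linear layer of shape (in, out) gains r·(in + out) parameters and the update is scaled by alpha / r.

```python
# ASSUMPTION: shapes follow the published Llama-3-8B config.
HIDDEN = 4096
NUM_LAYERS = 32
KV_DIM = 8 * 128  # k_proj / v_proj output size under grouped-query attention

# (in_features, out_features) of each targeted linear layer
PROJ_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_trainable_params(r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) to every targeted linear layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in PROJ_SHAPES.values())
    return per_layer * NUM_LAYERS


r, alpha = 32, 32
print(lora_trainable_params(r))  # 27262976
print(alpha / r)  # scale applied to the LoRA update: 1.0
```

The result is about 27.3 million trainable parameters, roughly 0.3% of the ~8 billion base parameters. Together with 4-bit loading, this keeps memory requirements low. Doubling r doubles the count. With alpha equal to r, the update is applied with a scale of 1.0.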

Merge the LoRA weights into the base model

Run the following command to merge the LoRA weights into the base model.

! RANK=0 python /ml/code/convert.py \
    --model_name ./LLM-Research/Meta-Llama-3-8B-Instruct/ \
    --model_type llama \
    --output_dir trained_model/ \
    --adapter_dir lora_model/

The parameters used in the example are described below:

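RANK=0: in distributed training, the RANK environment variable gives the index of the current process among all processes. A value of 0 marks a single process, or the main process in distributed training.
python /ml/code/convert.py: runs the convert.py script, which merges the LoRA adapter weights into the base model and saves the result.
--model_name ./LLM-Research/Meta-Llama-3-8B-Instruct/: the path of the base model.
--model_type llama: the model type, here Llama.
--output_dir trained_model/: the directory where the merged model and its weights are saved.
--adapter_dir lora_model/: the directory that contains the LoRA adapter weights.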
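The internals of convert.py are not shown in this guide. For a standard LoRA adapter, merging folds the low-rank update into each targeted weight matrix, W' = W + (alpha / r) * (B @ A). After the merge, the model can be loaded without PEFT, as Step 4 does. The pure-Python sketch below illustrates only that arithmetic on tiny matrices, and `merge_lora` is a hypothetical helper. With the PEFT library, the same operation is commonly done with `PeftModel.from_pretrained(base_model, adapter_dir).merge_and_unload()`.

```python
# Illustration of the LoRA merge: W_merged = W + (alpha / r) * (B @ A).
# Real checkpoints apply this to every targeted layer; here the matrices are tiny.


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Hypothetical helper: return W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(b, a)  # (out x r) @ (r x in) -> (out x in), same shape as W
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # base weight, out=2, in=2
A = [[1.0, 2.0]]              # LoRA A: r=1, in=2
B = [[0.5], [0.0]]            # LoRA B: out=2, r=1
print(merge_lora(W, A, B, alpha=2, r=1))  # [[2.0, 2.0], [0.0, 1.0]]
```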

Step 4: Run inference with the model

Run the following code to perform inference and check the effect of fine-tuning. Here, the model is asked to write a poem about spring:

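import torch
import transformers

# To compare against the original model, use the base model path instead:
# model_id = "./LLM-Research/Meta-Llama-3-8B-Instruct/"
model_id = "./trained_model/"  # merged model produced by convert.py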
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "Write a poem on a topic 'spring' "},
]

prompt = pipeline.tokenizer.apply_chat_template(
        messages, 
        tokenize=False, 
        add_generation_prompt=True
)

# Stop at either the tokenizer's EOS token or Llama 3's end-of-turn token <|eot_id|>
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=1024,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
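The code above adds <|eot_id|> as a terminator and slices off the prompt. To show what apply_chat_template produces and why those two steps are needed, here is a pure-Python approximation of the Llama 3 Instruct prompt format. It is a sketch based on the published Llama 3 chat template. The `llama3_prompt` helper and the sample output string are hypothetical. In practice, keep using the tokenizer's own template.

```python
# Approximation of the Llama 3 Instruct chat template (for illustration only).


def llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama 3 Instruct prompt format."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = llama3_prompt([{"role": "user", "content": "Write a poem on a topic 'spring' "}])
print(repr(prompt))

# The model ends its turn with <|eot_id|>, hence the extra terminator above.
# The pipeline's generated_text starts with the prompt, so slicing by
# len(prompt) keeps only the newly generated text.
generated_text = prompt + "Spring arrives..."  # hypothetical model output
print(generated_text[len(prompt):])
```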

The model's output is shown below. In this sample, the fine-tuned model writes a coherent, rhyming poem on the requested topic:

Here's a poem on the topic of "Spring":

As winter's chill begins to fade,
The earth awakens from its shade,
And spring's sweet breath begins to blow,
Bringing life to all that's cold and slow.

The trees regain their vibrant hue,
And flowers bloom, both old and new,
Their petals dancing in the breeze,
As sunshine warms the world with ease.

The air is filled with sweet perfume,
As blossoms burst forth in their room,
And robins sing their morning song,
As spring's awakening is strong.

The world is fresh, and new, and bright,
As spring's warm light begins to take flight,
And all around, new life unfolds,
As winter's grip begins to grow old.

So let us bask in spring's warm rays,
And let our spirits soar and sway,
For in this season, we're reborn,
And all around, new life is sworn.

I hope you enjoy it!

Step 5: Deploy the model

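You can upload the fine-tuned model weights to OSS and then use EAS ChatLLM to deploy the fine-tuned Llama 3 model as a service. For details, see Deploy an LLM application with EAS in 5 minutes.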
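Before you upload, it can be useful to list the files in trained_model/ and check their total size. The sketch below maps each local file to an object path under an OSS prefix. The helper `plan_oss_upload` and the prefix oss://my-bucket/llama3-poetry are placeholders. The upload itself (through the console, ossutil or an SDK) is not shown.

```python
from pathlib import Path


def plan_oss_upload(model_dir: str, oss_prefix: str):
    """Hypothetical helper: map local files to OSS object paths; return (mapping, GiB)."""
    root = Path(model_dir)
    if not root.is_dir():
        return {}, 0.0
    files = sorted(p for p in root.rglob("*") if p.is_file())
    prefix = oss_prefix.rstrip("/")
    # Keep the directory layout relative to model_dir under the chosen prefix.
    mapping = {str(p): f"{prefix}/{p.relative_to(root).as_posix()}" for p in files}
    total_gib = sum(p.stat().st_size for p in files) / 1024**3
    return mapping, total_gib


# "oss://my-bucket/llama3-poetry" is a placeholder; use your own bucket and path.
mapping, size = plan_oss_upload("./trained_model/", "oss://my-bucket/llama3-poetry")
for local, remote in mapping.items():
    print(local, "->", remote)
print(f"total: {size:.1f} GiB")
```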

Appendix: Run Llama 3 from DSW Gallery

DSW Gallery now provides a preset Notebook example for Llama 3. You can open the example in a DSW instance and run it with one click, or modify it as needed. For details, see Notebook Gallery.
