
Platform For AI: One-click fine-tuning of DeepSeek-R1 distill models

Last Updated: Jan 26, 2026

DeepSeek-R1 excels at math, coding, and reasoning tasks. DeepSeek also open-sourced six dense models, based on Llama and Qwen, distilled from DeepSeek-R1. This topic demonstrates how to fine-tune DeepSeek-R1-Distill-Qwen-7B in PAI Model Gallery.

Supported models

Model Gallery supports LoRA SFT for the six distill models. The following table lists minimum configurations with default parameters:

| Distill model | Base model | Training method | Minimum configuration |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | LoRA supervised fine-tuning | 1 x A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | LoRA supervised fine-tuning | 1 x A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | LoRA supervised fine-tuning | 1 x A10 (24 GB video memory) |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | LoRA supervised fine-tuning | 1 x GU8IS (48 GB video memory) |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | LoRA supervised fine-tuning | 2 x GU8IS (48 GB video memory) |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | LoRA supervised fine-tuning | 8 x GU100 (80 GB video memory) |

Train the model

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the upper-left corner, select a region.

    3. In the left pane, click Workspaces. On the Workspaces page, click a workspace name.

    4. In the left pane, choose QuickStart > Model Gallery.

  2. On the Model Gallery page, click the DeepSeek-R1-Distill-Qwen-7B model card to go to the details page.

    This page describes deployment and training details, the SFT data format, and invocation methods.

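The exact SFT data format is documented on the model detail page. As a sketch, instruction-tuning data for Model Gallery is typically a JSONL file with one JSON object per line; the field names below (`instruction`, `output`) are illustrative and should be confirmed against the detail page before uploading to OSS:

```python
import json

# Illustrative SFT samples; confirm the exact field names on the
# model detail page before preparing your own data.
samples = [
    {
        "instruction": "Solve: what is 12 * 7?",
        "output": "12 * 7 = 84. The answer is 84.",
    },
    {
        "instruction": "Write a Python function that reverses a string.",
        "output": "def reverse(s):\n    return s[::-1]",
    },
]

# Write one JSON object per line (JSONL), then upload the file to OSS.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```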

  3. Click Train in the upper-right corner and configure the following key parameters:

    • Dataset Configuration: Upload prepared data to an OSS bucket.

    • Computing Resources: Minimum configurations are listed in Supported models. Adjusting hyperparameters may require more memory.

    • Hyperparameters: Adjust these LoRA SFT hyperparameters based on your data and resources. For details, see Guide to fine-tuning LLMs.

      | Hyperparameter | Type | Default (7B model) | Description |
      | --- | --- | --- | --- |
      | learning_rate | float | 5e-6 | Controls the magnitude of weight updates. |
      | num_train_epochs | int | 6 | Number of training epochs (full passes over the dataset). |
      | per_device_train_batch_size | int | 2 | Samples per GPU per iteration. Higher values increase training efficiency and memory usage. |
      | gradient_accumulation_steps | int | 2 | Number of steps over which gradients are accumulated before a weight update; multiplies the effective batch size. |
      | max_length | int | 1024 | Maximum number of tokens per training sample. |
      | lora_rank | int | 8 | Rank (dimension) of the LoRA adapter matrices. |
      | lora_alpha | int | 32 | LoRA scaling factor. |
      | lora_dropout | float | 0 | Dropout rate of the LoRA layers. Randomly drops neurons during training to prevent overfitting. |
      | lorap_lr_ratio | float | 16 | LoRA+ learning rate ratio (λ = ηB/ηA), which uses different learning rates for adapter matrices A and B. Set to 0 for standard LoRA. |
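Note that `per_device_train_batch_size` and `gradient_accumulation_steps` multiply together (along with the GPU count) to give the effective global batch size, which is what actually determines memory usage and update granularity:

```python
# Effective global batch size =
#   per-GPU batch size x gradient accumulation steps x number of GPUs.
per_device_train_batch_size = 2   # default for the 7B model
gradient_accumulation_steps = 2   # default for the 7B model
num_gpus = 1                      # 1 x A10 minimum configuration

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 4
```

Raising `gradient_accumulation_steps` increases the effective batch size without increasing per-step GPU memory, unlike raising `per_device_train_batch_size`.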

  4. Click Train. The training page shows job status and logs.


    • On success, the model is registered in AI Asset Management - Models for deployment. See Register and manage models.

    • On failure, click the icon next to Status or check the Task log tab. For common errors, see FAQ and Model Gallery FAQ.

    • Metric Curve shows the loss progression.


  5. After training completes, click Deploy to create an EAS service. The invocation method is the same as for the original distill model. See the model detail page or One-click deployment of DeepSeek-V3 and DeepSeek-R1 models.
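As a sketch of invocation, EAS LLM services commonly expose an OpenAI-compatible chat completions API. The endpoint URL, token, and path below are placeholders; copy the real values from the service's invocation information page and confirm the API style there:

```python
import json

# Placeholder values: replace with the endpoint URL and token shown
# on the EAS service's invocation information page.
endpoint = "http://<your-eas-service-endpoint>/v1/chat/completions"
token = "<your-eas-service-token>"

headers = {
    "Authorization": token,
    "Content-Type": "application/json",
}
payload = {
    "model": "DeepSeek-R1-Distill-Qwen-7B",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "max_tokens": 1024,
}

# Uncomment to send the request (requires the `requests` package):
# import requests
# resp = requests.post(endpoint, headers=headers, data=json.dumps(payload))
# print(resp.json()["choices"][0]["message"]["content"])
```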


Billing

Model Gallery training uses DLC, billed by job duration. Resources stop automatically when jobs end. See Billing of Deep Learning Containers (DLC).

FAQ

Why does my Model Gallery training job fail?

  • Cause: max_length is too small. Samples that exceed this limit are discarded. If too much data is discarded, the training or validation dataset may become empty, causing the job to fail.

    Solution: Increase max_length.
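Before submitting a job, you can estimate how many samples would exceed max_length. Exact counts require the model's tokenizer; the whitespace-based count below is only a rough proxy (and undercounts, especially for non-English text), but it flags obviously oversized samples:

```python
max_length = 1024  # the max_length hyperparameter of the training job

# Illustrative samples; field names assumed to match your SFT data file.
samples = [
    {"instruction": "What is 2 + 2?", "output": "2 + 2 = 4."},
    {"instruction": "q", "output": "word " * 2000},  # far over the limit
]

def approx_token_count(sample):
    # Rough whitespace-based proxy; the real count comes from the
    # model's tokenizer and is usually higher.
    return len((sample["instruction"] + " " + sample["output"]).split())

kept = [s for s in samples if approx_token_count(s) <= max_length]
dropped = len(samples) - len(kept)
print(f"{len(kept)} samples kept, {dropped} would be discarded")
```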

  • Error: failed to compose dlc job specs, resource limiting triggered, you are trying to use more GPU resources than the threshold

    Solution: Training is limited to 2 simultaneous GPUs. Wait for ongoing jobs to finish, or submit a ticket to increase quota.

  • Error: the specified vswitch vsw-**** cannot create the required resource ecs.gn7i-c32g1.8xlarge, zone not match

    Solution: The requested instance type is unavailable in the current zone. Try one of these:

    • Leave vSwitch empty. DLC auto-selects one based on inventory.

    • Switch to a different instance type.

How do I download the trained model from Model Gallery?

Set the model output path to an OSS directory when creating the training job, then download the model from OSS.


How can I improve poor model performance after fine-tuning?

Try the following approaches:

  1. Use a larger model with better baseline performance, such as a DeepSeek or Qwen3 series model with a higher parameter count.

  2. Refine your prompts.

  3. Increase max_tokens.

  4. Break complex tasks into smaller subtasks for the model to handle separately.

References