
Fine-tune, evaluate, compress, and deploy a Qwen2.5-Coder model

Updated at: 2025-01-02 09:56

Tongyi Qianwen 2.5-Coder (Qwen2.5-Coder or CodeQwen) is a series of large language models (LLMs) released by Alibaba Cloud and developed for code processing. The series is available in the following mainstream sizes to meet the diverse requirements of developers: 0.5B, 1.5B, 3B, 7B, 14B, and 32B. By training on massive amounts of code data, Qwen2.5-Coder achieves optimized performance in coding scenarios while maintaining strong mathematical and inference capabilities. Platform for AI (PAI) provides full support for these models. This topic describes how to fine-tune, evaluate, compress, and deploy a Qwen2.5-Coder model in Model Gallery, using the Qwen2.5-Coder-32B-Instruct model as an example.

Overview

Qwen2.5-Coder is a series of models with powerful programming capabilities launched by Alibaba Cloud. Qwen2.5-Coder supports a context length of up to 128,000 tokens and is compatible with 92 programming languages. The models perform well in various code-related tasks, including multilingual code generation, code completion, and code repair. Alibaba Cloud performed instruction fine-tuning on Qwen2.5-Coder and released the Qwen2.5-Coder-Instruct model, which further improves performance on various tasks and demonstrates excellent generalization capabilities.

  • Multilingual programming capabilities

    The Qwen2.5-Coder-Instruct model demonstrates superior multilingual programming capabilities. The model was extensively tested on the McEval benchmark, which covers more than 40 programming languages, including several niche ones. The results show that the model performs well across multilingual programming tasks.

  • Code inference

    The Qwen2.5-Coder-Instruct model is outstanding in code inference tasks. On the CRUXEval benchmark, the model demonstrates strong inference capabilities, and its performance on complex instruction execution improves as its code inference capabilities grow. This provides a new perspective for subsequent research on how code capabilities affect general inference capabilities.

  • Mathematical capabilities

    The Qwen2.5-Coder-Instruct model excels in mathematics and coding tasks. As a foundational discipline for coding, mathematics is closely related to programming. The outstanding performance of the model in both areas demonstrates its strong capabilities in the sciences.

  • Basic capabilities

    Based on general capability evaluations, the Qwen2.5-Coder-Instruct model retains the strengths of Qwen2.5, demonstrating that it remains reliable and stable across a wide range of tasks.

The preceding features allow the Qwen2.5-Coder series of models to provide strong technical support for multilingual programming and complex task processing.

Environment requirements

  • The Qwen2.5-Coder-Instruct model can be run in Model Gallery in the China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Ulanqab), or Singapore region.

  • Make sure that your computing resources match the model size. The following table describes the requirements for each model size.

    Model size

    Requirement


    Qwen2.5-Coder-0.5B/1.5B

    • Training: Use GPUs, such as T4, P100, or V100, with 16 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least one P4 GPU. We recommend that you use one GU30, A10, V100, or T4 GPU.

    Qwen2.5-Coder-3B/7B

    • Training: Use GPUs, such as A10 or T4, with 24 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least one P100, P4, or V100 (gn6v) GPU. We recommend that you use one GU30 or A10 GPU.

    Qwen2.5-Coder-14B

    • Training: Use GPUs, such as V100, with 32 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least one L20 GPU, one GU60 GPU, or two GU30 GPUs. We recommend that you use two GU60 GPUs or two L20 GPUs.

    Qwen2.5-Coder-32B

    • Training: Use GPUs, such as A800 or H800, with 80 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least two GU60 GPUs, two L20 GPUs, or four A10 GPUs. We recommend that you use four GU60 GPUs, four L20 GPUs, or eight V100 GPUs with 32 GB of memory.
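As a rough rule of thumb (not part of the official sizing table above), GPU memory needs can be sanity-checked from the parameter count. The overhead factor in this sketch is an assumption covering KV cache, activations, and runtime buffers:

```python
def estimate_weight_memory_gb(num_params_billion, bytes_per_param=2, overhead=1.2):
    """Rough GPU memory estimate for serving a model.

    bytes_per_param: 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit weights.
    overhead: assumed multiplier for KV cache, activations, and runtime buffers.
    """
    return num_params_billion * bytes_per_param * overhead

# Qwen2.5-Coder-32B in BF16 needs on the order of 77 GB, which is consistent
# with the two-GU60/two-L20 (48 GB each) deployment minimum above.
print(round(estimate_weight_memory_gb(32), 1))
```

Such an estimate only bounds the weight memory; actual requirements also depend on context length and batch size, so the table above remains authoritative.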

Use a model in Model Gallery

Deploy and call a model service

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the upper-left corner, select a region based on your business requirements.

    3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to use.

    4. In the left-side navigation pane, choose QuickStart > Model Gallery.

  2. In the model list of the Model Gallery page, search for and click the Qwen2.5-Coder-32B-Instruct model.

  3. In the upper-right corner of the model details page, click Deploy. In the Deploy panel, configure the parameters to deploy the model to Elastic Algorithm Service (EAS) as a model service.


  4. Call the model service.

    On the Model Gallery page, click Job Management. On the Job Management page, click the Deployment Jobs tab, and then click the name of the model service. On the service details page, click View Web App in the upper-right corner.

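Besides the web app, a deployed EAS service can also be called over HTTP. The snippet below is a hedged sketch: the endpoint URL and token are placeholders that you copy from the service details page, and the OpenAI-style chat payload schema is an assumption that depends on your deployment template.

```python
import json
import urllib.request

# Placeholders: copy the real endpoint and token from the EAS service
# details page. The payload schema below is an assumption; check the
# calling guide for your deployment template.
EAS_URL = "https://<your-eas-endpoint>/v1/chat/completions"
EAS_TOKEN = "<your-token>"

def build_chat_payload(prompt, model="Qwen2.5-Coder-32B-Instruct"):
    """Build an OpenAI-style chat payload for the deployed service."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def call_service(prompt):
    """POST the payload to the EAS endpoint and return the model's reply text."""
    req = urllib.request.Request(
        EAS_URL,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Authorization": EAS_TOKEN, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

Replace the placeholders with the values shown under the service's invocation information before calling `call_service`.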

Train a model

Model Gallery provides out-of-the-box fine-tuning algorithms for the Qwen2.5-Coder-32B-Instruct model, including the Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) algorithms.


The SFT algorithm supports inputs in the JSON format. Each data record consists of a question specified by the instruction field and an answer specified by the output field. Examples:

[
  {
    "instruction": "Create a function to calculate the sum of a sequence of integers.",
    "output": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum"
  },
  {
    "instruction": "Generate a Python code for crawling a website for a specific type of data.",
    "output": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))"
  }
]
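Before uploading a dataset, it can help to verify that every record matches this schema. A minimal validation sketch (the file path you pass in is your own dataset file):

```python
import json

REQUIRED_FIELDS = ("instruction", "output")

def validate_sft_records(path):
    """Load an SFT dataset file and check that every record has non-empty
    'instruction' and 'output' string fields. Returns the records."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {i}: missing or empty '{field}' field")
    return records
```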

The DPO algorithm supports inputs in the JSON format. Each data record consists of a question specified by the prompt field, an expected answer specified by the chosen field, and an unexpected answer specified by the rejected field. Examples:

[
  {
    "prompt": "Create a function to calculate the sum of a sequence of integers.",
    "chosen": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum",
    "rejected": "[x*x for x in [1, 2, 3, 5, 8, 13]]"
  },
  {
    "prompt": "Generate a Python code for crawling a website for a specific type of data.",
    "chosen": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))",
    "rejected": "def remove_duplicates(string): \n    result = \"\" \n    prev = '' \n\n    for char in string:\n        if char != prev: \n            result += char\n            prev = char\n    return result\n\nresult = remove_duplicates(\"AAABBCCCD\")\nprint(result)"
  }
]
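An existing SFT dataset can be repurposed for DPO by pairing each instruction's reference answer with a deliberately weaker candidate, for example a draft from a smaller model. A minimal sketch using the field names above (the sample record is illustrative):

```python
def to_dpo_record(sft_record, rejected_answer):
    """Map one SFT record plus a weaker candidate answer onto the DPO schema."""
    return {
        "prompt": sft_record["instruction"],
        "chosen": sft_record["output"],
        "rejected": rejected_answer,
    }

sft = {
    "instruction": "Create a function to reverse a string.",
    "output": "def reverse(s):\n    return s[::-1]",
}
# The rejected answer here is a deliberately weaker draft.
dpo = to_dpo_record(sft, "def reverse(s):\n    return reversed(s)")
```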
  1. In the upper-right corner of the model details page, click Train. In the Train panel, configure the following parameters:

    • Dataset Configuration: You can specify the Object Storage Service (OSS) path that contains datasets you prepared or select a dataset file that is stored in File Storage NAS (NAS) or Cloud Parallel File Storage (CPFS). You can also select the default path to use the public datasets provided by PAI.

    • Computing resources: The fine-tuning algorithm requires A800 or H800 GPUs with 80 GB of memory, or GPUs with higher specifications. Make sure that the resource quota that you use has sufficient computing resources. For information about the resource specifications required for models of other sizes, see Environment requirements.

    • Hyper-parameters: Configure the hyperparameters of the fine-tuning algorithm based on your business requirements. The following list describes each hyperparameter, its type, default value, and whether it is required.

      • training_strategy (string, default: sft, required): The fine-tuning algorithm. Valid values: sft and dpo.

      • learning_rate (float, default: 5e-5, required): The learning rate, which controls the extent to which the model is adjusted.

      • num_train_epochs (int, default: 1, required): The number of epochs. An epoch is a full cycle of exposing each sample in the training dataset to the algorithm.

      • per_device_train_batch_size (int, default: 1, required): The number of samples processed by each GPU in one training iteration. A higher value results in higher training efficiency and higher memory usage.

      • seq_length (int, default: 128, required): The length of the input data processed by the model in one training iteration.

      • lora_dim (int, default: 32, optional): The inner dimension of the low-rank matrices used in Low-Rank Adaptation (LoRA) or Quantized Low-Rank Adaptation (QLoRA) training. Set this parameter to a value greater than 0 to enable LoRA or QLoRA.

      • lora_alpha (int, default: 32, optional): The LoRA or QLoRA scaling weight. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0.

      • dpo_beta (float, default: 0.1, optional): The extent to which the model relies on preference information during DPO training.

      • load_in_4bit (bool, default: true, optional): Specifies whether to load the model in 4-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0 and the load_in_8bit parameter to false.

      • load_in_8bit (bool, default: false, optional): Specifies whether to load the model in 8-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0 and the load_in_4bit parameter to false.

      • gradient_accumulation_steps (int, default: 8, optional): The number of gradient accumulation steps.

      • apply_chat_template (bool, default: true, optional): Specifies whether the algorithm combines the training data with the default chat template. For Qwen2 models, the format is:

        • Question: <|im_end|>\n<|im_start|>user\n + instruction + <|im_end|>\n

        • Answer: <|im_start|>assistant\n + output + <|im_end|>\n

      • system_prompt (string, default: "You are a helpful assistant", optional): The system prompt used to train the model.

  2. After you configure the parameters, click Train. On the training job details page, you can view the status and log of the training job.


    The trained model is automatically registered in the Models list of the AI Asset Management module, where you can view or deploy it. For more information, see Register and manage models.
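Based on the apply_chat_template and system_prompt hyperparameters described above, one training sample is assembled roughly as follows. This is a sketch: the exact template used by the training algorithm may differ, and the placement of the system turn here is an assumption.

```python
def apply_qwen2_chat_template(instruction, output,
                              system_prompt="You are a helpful assistant"):
    """Assemble one training sample with Qwen2-style chat markers (assumed layout)."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{output}<|im_end|>\n"
    )
```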

Evaluate a model

Scientific model evaluation helps developers efficiently measure and compare the performance of different models, and guides them in selecting and optimizing models with greater accuracy. This accelerates AI innovation and application development.

Model Gallery provides out-of-the-box evaluation algorithms for the Qwen2.5-Coder-32B-Instruct model or the trained Qwen2.5-Coder-32B-Instruct model. For more information about model evaluation, see Model evaluation and Best practices for LLM evaluation.

Compress a model

Before you deploy a trained model, you can quantize and compress the model. This effectively reduces the consumption of storage and computing resources. For more information, see Model compression.
