
Fine-tune, evaluate, compress, and deploy a Qwen2.5-Coder model

Updated at: 2025-01-02 09:56

Tongyi Qianwen 2.5-Coder (Qwen2.5-Coder or CodeQwen) is a series of large language models (LLMs) released by Alibaba Cloud and developed for code processing. The series is available in the following mainstream sizes to meet the diverse requirements of developers: 0.5B, 1.5B, 3B, 7B, 14B, and 32B. By training on massive amounts of code data, Qwen2.5-Coder achieves optimized performance in coding scenarios while maintaining strong mathematical and inference capabilities. Platform for AI (PAI) provides full support for these models. This topic describes how to fine-tune, evaluate, compress, and deploy a Qwen2.5-Coder model in Model Gallery, using the Qwen2.5-Coder-32B-Instruct model as an example.

Overview

Qwen2.5-Coder is a series of models with powerful programming capabilities launched by Alibaba Cloud. Qwen2.5-Coder supports a context length of up to 128,000 tokens and is compatible with 92 programming languages. The models perform well in various code-related tasks, including multilingual code generation, code completion, and code repair. Alibaba Cloud performed instruction fine-tuning on Qwen2.5-Coder and released the Qwen2.5-Coder-Instruct model, which further improves performance on various tasks and demonstrates excellent generalization capabilities.

  • Multilingual programming capabilities

    The Qwen2.5-Coder-Instruct model demonstrates superior multilingual programming capabilities. The model was extensively tested on the McEval benchmark, which covers more than 40 programming languages, including several niche ones. The results show that the model performs well across multilingual programming tasks.

  • Code inference

    The Qwen2.5-Coder-Instruct model is outstanding in code inference tasks. On the CRUXEval benchmark, the model demonstrates strong inference capabilities, and its performance on complex instruction execution improves as its code inference capabilities grow. This provides a new perspective for subsequent research on how code capabilities affect general inference capabilities.

  • Mathematical capabilities

    The Qwen2.5-Coder-Instruct model excels in mathematics and coding tasks. As a foundational discipline for coding, mathematics is closely related to programming. The outstanding performance of the model in both areas demonstrates its strong capabilities in the sciences.

  • Basic capabilities

    Based on general capability evaluations, the Qwen2.5-Coder-Instruct model retains the strengths of Qwen2.5, demonstrating that it remains reliable and stable across a wide range of tasks.

The preceding features allow the Qwen2.5-Coder series of models to provide strong technical support for multilingual programming and complex task processing.

Environment requirements

  • The Qwen2.5-Coder-Instruct model can be run in Model Gallery in the China (Beijing), China (Shanghai), China (Shenzhen), China (Hangzhou), China (Ulanqab), or Singapore region.

  • Make sure that your computing resources match the model size. The following table describes the requirements for each model size.

    Model size

    Requirement


    Qwen2.5-Coder-0.5B/1.5B

    • Training: Use GPUs, such as T4, P100, or V100, with 16 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least one P4 GPU. We recommend that you use one GU30, A10, V100, or T4 GPU.

    Qwen2.5-Coder-3B/7B

    • Training: Use GPUs, such as A10 or T4, with 24 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least one P100, P4, or V100 (gn6v) GPU. We recommend that you use one GU30 or A10 GPU.

    Qwen2.5-Coder-14B

    • Training: Use GPUs, such as V100, with 32 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least one L20 GPU, one GU60 GPU, or two GU30 GPUs. We recommend that you use two GU60 GPUs or two L20 GPUs.

    Qwen2.5-Coder-32B

    • Training: Use GPUs, such as A800 or H800, with 80 GB of memory or GPUs with higher specifications.

    • Deployment: Use at least two GU60 GPUs, two L20 GPUs, or four A10 GPUs. We recommend that you use four GU60 GPUs, four L20 GPUs, or eight V100 GPUs with 32 GB of memory.
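As a rough rule of thumb (not part of the official sizing table above), GPU memory needs can be sanity-checked from the parameter count. The overhead factor in this sketch is an assumption covering KV cache, activations, and runtime buffers:

```python
def estimate_weight_memory_gb(num_params_billion, bytes_per_param=2, overhead=1.2):
    """Rough GPU memory estimate for serving a model.

    bytes_per_param: 2 for FP16/BF16, 1 for INT8, 0.5 for 4-bit weights.
    overhead: assumed multiplier for KV cache, activations, and runtime buffers.
    """
    return num_params_billion * bytes_per_param * overhead

# Qwen2.5-Coder-32B in BF16 needs on the order of 77 GB, which is consistent
# with the two-GU60/two-L20 (48 GB each) deployment minimum above.
print(round(estimate_weight_memory_gb(32), 1))
```

Such an estimate only bounds the weight memory; actual requirements also depend on context length and batch size, so the table above remains authoritative.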

Use a model in Model Gallery

Deploy and call a model service

  1. Go to the Model Gallery page.

    1. Log on to the PAI console.

    2. In the upper-left corner, select a region based on your business requirements.

    3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to use.

    4. In the left-side navigation pane, choose QuickStart > Model Gallery.

  2. In the model list of the Model Gallery page, search for and click the Qwen2.5-Coder-32B-Instruct model.

  3. In the upper-right corner of the model details page, click Deploy. In the Deploy panel, configure the parameters to deploy the model to Elastic Algorithm Service (EAS) as a model service.


  4. Call the model service.

    On the Model Gallery page, click Job Management. On the Job Management page, click the Deployment Jobs tab, and then click the name of the model service. On the service details page, click View Web App in the upper-right corner.

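Besides the web app, a deployed EAS service can also be called over HTTP. The snippet below is a hedged sketch: the endpoint URL and token are placeholders that you copy from the service details page, and the OpenAI-style chat payload schema is an assumption that depends on your deployment template.

```python
import json
import urllib.request

# Placeholders: copy the real endpoint and token from the EAS service
# details page. The payload schema below is an assumption; check the
# calling guide for your deployment template.
EAS_URL = "https://<your-eas-endpoint>/v1/chat/completions"
EAS_TOKEN = "<your-token>"

def build_chat_payload(prompt, model="Qwen2.5-Coder-32B-Instruct"):
    """Build an OpenAI-style chat payload for the deployed service."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def call_service(prompt):
    """POST the payload to the EAS endpoint and return the model's reply text."""
    req = urllib.request.Request(
        EAS_URL,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Authorization": EAS_TOKEN, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

Replace the placeholders with the values shown under the service's invocation information before calling `call_service`.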

Train a model

Model Gallery provides out-of-the-box fine-tuning algorithms for the Qwen2.5-Coder-32B-Instruct model, including the Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) algorithms.


The SFT algorithm supports inputs in the JSON format. Each data record consists of a question specified by the instruction field and an answer specified by the output field. Examples:

[
  {
    "instruction": "Create a function to calculate the sum of a sequence of integers.",
    "output": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum"
  },
  {
    "instruction": "Generate a Python code for crawling a website for a specific type of data.",
    "output": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))"
  }
]
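Before uploading a dataset, it can help to verify that every record matches this schema. A minimal validation sketch (the file path you pass in is your own dataset file):

```python
import json

REQUIRED_FIELDS = ("instruction", "output")

def validate_sft_records(path):
    """Load an SFT dataset file and check that every record has non-empty
    'instruction' and 'output' string fields. Returns the records."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {i}: missing or empty '{field}' field")
    return records
```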

The DPO algorithm supports inputs in the JSON format. Each data record consists of a question specified by the prompt field, an expected answer specified by the chosen field, and an unexpected answer specified by the rejected field. Examples:

[
  {
    "prompt": "Create a function to calculate the sum of a sequence of integers.",
    "chosen": "# Python code\ndef sum_sequence(sequence):\n  sum = 0\n  for num in sequence:\n    sum += num\n  return sum",
    "rejected": "[x*x for x in [1, 2, 3, 5, 8, 13]]"
  },
  {
    "prompt": "Generate a Python code for crawling a website for a specific type of data.",
    "chosen": "import requests\nimport re\n\ndef crawl_website_for_phone_numbers(website):\n    response = requests.get(website)\n    phone_numbers = re.findall('\\d{3}-\\d{3}-\\d{4}', response.text)\n    return phone_numbers\n    \nif __name__ == '__main__':\n    print(crawl_website_for_phone_numbers('www.example.com'))",
    "rejected": "def remove_duplicates(string): \n    result = \"\" \n    prev = '' \n\n    for char in string:\n        if char != prev: \n            result += char\n            prev = char\n    return result\n\nresult = remove_duplicates(\"AAABBCCCD\")\nprint(result)"
  }
]
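An existing SFT dataset can be repurposed for DPO by pairing each instruction's reference answer with a deliberately weaker candidate, for example a draft from a smaller model. A minimal sketch using the field names above (the sample record is illustrative):

```python
def to_dpo_record(sft_record, rejected_answer):
    """Map one SFT record plus a weaker candidate answer onto the DPO schema."""
    return {
        "prompt": sft_record["instruction"],
        "chosen": sft_record["output"],
        "rejected": rejected_answer,
    }

sft = {
    "instruction": "Create a function to reverse a string.",
    "output": "def reverse(s):\n    return s[::-1]",
}
# The rejected answer here is a deliberately weaker draft.
dpo = to_dpo_record(sft, "def reverse(s):\n    return reversed(s)")
```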
  1. In the upper-right corner of the model details page, click Train. In the Train panel, configure the following parameters:

    • Dataset Configuration: You can specify the Object Storage Service (OSS) path that contains datasets you prepared or select a dataset file that is stored in File Storage NAS (NAS) or Cloud Parallel File Storage (CPFS). You can also select the default path to use the public datasets provided by PAI.

    • Computing resources: The fine-tuning algorithm requires A800 or H800 GPUs with 80 GB of memory, or GPUs with higher specifications. Make sure that the resource quota that you use has sufficient computing resources. For information about the resource specifications required for models of other sizes, see Environment requirements.

    • Hyper-parameters: Configure the hyperparameters of the fine-tuning algorithm based on your business requirements. The following list describes each hyperparameter, its type, default value, and whether it is required.

      • training_strategy (string, default: sft, required): The fine-tuning algorithm. Valid values: sft and dpo.

      • learning_rate (float, default: 5e-5, required): The learning rate, which controls the extent to which the model is adjusted.

      • num_train_epochs (int, default: 1, required): The number of epochs. An epoch is a full cycle of exposing each sample in the training dataset to the algorithm.

      • per_device_train_batch_size (int, default: 1, required): The number of samples processed by each GPU in one training iteration. A higher value results in higher training efficiency and higher memory usage.

      • seq_length (int, default: 128, required): The length of the input data processed by the model in one training iteration.

      • lora_dim (int, default: 32, optional): The inner dimension of the low-rank matrices used in Low-Rank Adaptation (LoRA) or Quantized Low-Rank Adaptation (QLoRA) training. Set this parameter to a value greater than 0 to enable LoRA or QLoRA.

      • lora_alpha (int, default: 32, optional): The LoRA or QLoRA scaling weight. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0.

      • dpo_beta (float, default: 0.1, optional): The extent to which the model relies on preference information during DPO training.

      • load_in_4bit (bool, default: true, optional): Specifies whether to load the model in 4-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0 and the load_in_8bit parameter to false.

      • load_in_8bit (bool, default: false, optional): Specifies whether to load the model in 8-bit quantization. This parameter takes effect only if you set the lora_dim parameter to a value greater than 0 and the load_in_4bit parameter to false.

      • gradient_accumulation_steps (int, default: 8, optional): The number of gradient accumulation steps.

      • apply_chat_template (bool, default: true, optional): Specifies whether the algorithm combines the training data with the default chat template. For Qwen2 models, the format is:

        • Question: <|im_end|>\n<|im_start|>user\n + instruction + <|im_end|>\n

        • Answer: <|im_start|>assistant\n + output + <|im_end|>\n

      • system_prompt (string, default: "You are a helpful assistant", optional): The system prompt used to train the model.

  2. After you configure the parameters, click Train. On the training job details page, you can view the status and log of the training job.


    The trained model is automatically registered in the Models list of the AI Asset Management module, where you can view or deploy it. For more information, see Register and manage models.
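Based on the apply_chat_template and system_prompt hyperparameters described above, one training sample is assembled roughly as follows. This is a sketch: the exact template used by the training algorithm may differ, and the placement of the system turn here is an assumption.

```python
def apply_qwen2_chat_template(instruction, output,
                              system_prompt="You are a helpful assistant"):
    """Assemble one training sample with Qwen2-style chat markers (assumed layout)."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{output}<|im_end|>\n"
    )
```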

Evaluate a model

Scientific model evaluation helps developers efficiently measure and compare the performance of different models, and guides them in selecting and optimizing models with greater accuracy. This accelerates AI innovation and application development.

Model Gallery provides out-of-the-box evaluation algorithms for the Qwen2.5-Coder-32B-Instruct model or the trained Qwen2.5-Coder-32B-Instruct model. For more information about model evaluation, see Model evaluation and Best practices for LLM evaluation.

Compress a model

Before you deploy a trained model, you can quantize and compress the model. This effectively reduces the consumption of storage and computing resources. For more information, see Model compression.
