The QuickStart module in the Platform for AI (PAI) console provides a no-code method to train and deploy machine learning models. QuickStart enables you to deploy Llama 2 models as an online service that can be accessed through a web UI or by calling APIs. QuickStart also supports customization of Llama 2 models before deployment: you can use custom datasets to fine-tune Llama 2 models based on your business requirements.
Llama 2 is a collection of generative text models that are developed by Meta and primarily pre-trained on English datasets. Llama 2 models come in three parameter sizes: 7 billion (7b), 13 billion (13b), and 70 billion (70b). Each size also has a variant called Llama-2-chat, which is fine-tuned to enhance performance in dialogue scenarios.
The llama-2-7b-chat model provided by PAI is an adaptation of the Llama-2-7b-chat model provided by Hugging Face. llama-2-7b-chat is a large language model (LLM) that is built on the Transformer architecture and trained on diverse open-source datasets. You can use the model in a wide range of common English dialogue scenarios.
This article uses llama-2-7b-chat as an example to describe how to use QuickStart to deploy a model as a service in Elastic Algorithm Service (EAS) and call the service.
QuickStart is available in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Ulanqab), and Singapore.
Note: To use QuickStart in the China (Ulanqab) region, contact your account manager.
The llama-2-7b-chat model is suitable for most non-professional scenarios. If you want to tailor the model to your business scenario or add domain-specific knowledge to the model, you can fine-tune the model to improve performance.
Before you consider fine-tuning, note that LLMs, including llama-2-7b-chat, can also acquire basic knowledge through conversational interactions, that is, from information that you supply in the conversation itself. Fine-tuning in QuickStart is based on Low-Rank Adaptation of Large Language Models (LoRA), which significantly reduces training costs and time compared with other methods such as supervised fine-tuning (SFT).
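To see why LoRA lowers training cost, it helps to count parameters. LoRA freezes a weight matrix of shape d × k and trains only two low-rank factors, B (d × r) and A (r × k), so the number of trainable values drops from d·k to r·(d + k). The following sketch works this out for a single 4096 × 4096 projection matrix (4096 is the hidden size of Llama 2 7B). The rank r = 8 is an illustrative assumption, not necessarily the value that QuickStart uses.

```python
def full_params(d: int, k: int) -> int:
    """Trainable values when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable values when only the LoRA factors B (d x r) and A (r x k) are updated."""
    return r * (d + k)


d = k = 4096  # hidden size of Llama 2 7B
r = 8         # assumed LoRA rank, chosen only for illustration

full = full_params(d, k)
lora = lora_params(d, k, r)
ratio = lora / full

print(f"full update : {full:,} trainable values")
print(f"LoRA update : {lora:,} trainable values")
print(f"LoRA / full : {ratio:.4%}")
```

Under these assumptions, LoRA trains 65,536 values for this matrix instead of 16,777,216, which is 1/256 of the full update.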
1. Log on to the PAI console. In the left-side navigation pane, click QuickStart.
2. Select a workspace and click Enter Quick Start.
3. On the Model List page, enter llama-2-7b-chat in the search box and click Search.
Note: You can select another model based on your business requirements. The llama-2-7b-chat model requires at least 64 GiB of memory and 24 GiB of GPU memory. Make sure that the computing resources of the workspace are sufficient. Otherwise, the deployment may fail.
4. Click the llama-2-7b-chat card. On the model details page, click Deploy.
5. In the lower part of the Deploy panel, click Deploy.
6. In the Billing Notification dialog box, click OK.
7. The Service details tab appears. When the Status of the service changes to In operation, the model is deployed as an inference service.
8. After you deploy the model, you can call the inference service.
a) To use the web UI, go to the Web Application section of the Service details tab and click View Web App.
b) Alternatively, call the inference service through its API.
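An API call to the deployed service might look like the following minimal sketch. It only builds the HTTP request, so it can be inspected without a live service. The endpoint URL, the token value, and the `{"prompt": ...}` payload are all placeholders and assumptions: check the invocation information of your own service for the actual endpoint, token, and request schema.

```python
import json
import urllib.request

# Placeholders: replace with the endpoint and token of your own service.
SERVICE_URL = "https://example.invalid/api/predict/llama2_demo"  # hypothetical
SERVICE_TOKEN = "YOUR_SERVICE_TOKEN"                              # hypothetical


def build_request(service_url: str, token: str, prompt: str) -> urllib.request.Request:
    """Build a POST request that carries the prompt as a JSON body.

    The {"prompt": ...} body is an assumed schema used only for illustration.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        headers={
            "Authorization": token,  # the service token authenticates the call
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_service(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_request(SERVICE_URL, SERVICE_TOKEN, "What is the capital of Canada?")
print(request.get_method(), request.full_url)
# To send it against a real service: print(call_service(request))
```

Keeping request construction separate from sending makes it easy to check the headers and body before any billable call is made.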
1. Log on to the PAI console. In the left-side navigation pane, click QuickStart.
2. Select a workspace and click Enter Quick Start.
3. On the Model List page, enter llama-2-7b-chat in the search box and click Search.
Note: You can select another model based on your business requirements. The llama-2-7b-chat model requires at least 64 GiB of memory and 24 GiB of GPU memory. Make sure that the computing resources of the workspace are sufficient. Otherwise, the deployment may fail.
4. Click the llama-2-7b-chat card. On the model details page, click Fine-tune.
5. Configure fine-tuning parameters.
By default, QuickStart uses the most common configurations of Computing resources and Hyper-parameters. You can modify the configurations based on your business requirements. The following table describes the parameters that are used in this example.
Category | Parameter | Description |
Job Configuration | Output Path | The path of the OSS bucket where the generated model file is stored. Note: If you configured a default storage path on the Workspace Details page, the path is automatically specified as the value of the Output Path parameter. For information about how to configure a default storage path for a workspace, see Manage workspaces. |
Dataset Configuration | Training dataset | You can use the default training dataset that is provided by QuickStart. To use a custom dataset, prepare the training data in the required format and upload it by using one of the following options: • Dataset Selection: For more information, see Create and manage datasets. • OSS file or directory: For more information, see Get started with OSS. The training data must be in the JSON format. Each data record consists of a question, an answer, and an ID, which are specified by the instruction, output, and id fields, respectively. In addition to the training dataset, we recommend that you prepare a separate validation dataset, which is used to evaluate model performance and to tune the fine-tuning parameters. |
7. In the Billing Notification dialog box, click OK.
8. The Task details tab appears. When the Job Status changes to Success, the model training is completed.
The trained model is saved in the OSS path that you specify. You can view the OSS path in the Output Path field under the Basic Information section.
Note: If you use the default training dataset and the default configurations of the hyperparameters and computing resources, the training job can complete in 1.5 hours. If you use custom training data and configurations, the training duration may vary. Most training jobs can complete within a few hours.
9. Deploy the fine-tuned model.
The procedure for deploying a fine-tuned model is the same as the procedure for deploying a model that is not fine-tuned. For more information, see the deployment steps earlier in this article.
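The training-data format described in step 5 (JSON records with instruction, output, and id fields) can be sketched as follows. The two records are invented for illustration, and writing the records as a single JSON array is an assumption; confirm the exact file layout that QuickStart expects before you upload a custom dataset.

```python
import json

# Field names taken from the dataset description: question, answer, and ID.
REQUIRED_FIELDS = ("instruction", "output", "id")


def validate_records(records):
    """Return a list of problems; an empty list means every record has all three fields."""
    problems = []
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append(f"record {index}: missing {', '.join(missing)}")
    return problems


# Hypothetical sample records, not taken from the default dataset.
records = [
    {"id": 0, "instruction": "What is the capital of France?", "output": "Paris."},
    {"id": 1, "instruction": "Translate 'good morning' into French.", "output": "Bonjour."},
]

problems = validate_records(records)
serialized = json.dumps(records, ensure_ascii=False, indent=2)

print(problems or "all records are well formed")
print(serialized)
```

Running a check like this before uploading catches records with a missing field early, which is cheaper than finding the problem after a training job has started.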
On the QuickStart page, you can click Job Management to view the details of the training jobs and model deployment.