Platform for AI: Quickly deploy a Llama 3 model in EAS

Last Updated: Sep 05, 2024

This topic describes how to quickly deploy a Llama 3 model and use the deployed web application in Elastic Algorithm Service (EAS) of Platform for AI (PAI).

Background information

Llama 3 is available in pretrained and instruction-tuned versions at the 8B and 70B parameter sizes, which suit a variety of scenarios. Llama 3 retains the overall architecture of Llama 2 but increases the context length from 4K to 8K tokens. In performance evaluations, both the pretrained and instruction-tuned Llama 3 models showed significant improvements over the previous generation across capabilities such as subject-matter knowledge, reasoning, and comprehension.

Deploy a model service in EAS

  1. Go to the Elastic Algorithm Service (EAS) page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace in which you want to deploy the model.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the Elastic Algorithm Service (EAS) page.

  2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click LLM Deployment.

  3. On the LLM Deployment page, configure the parameters. The following table describes the key parameters. Use the default values for other parameters.

    Service Name: The name of the service. In this example, chat_llama3_demo is used.

    Model Source: Select Open Source Model.

    Model Type: Select llama3-8b.

    Resource Configuration: We recommend that you select ml.gu7i.c8m30.1-gu30 for the Instance Type parameter in the China (Beijing) region. If this instance type is unavailable, you can use the ecs.gn6i-c24g1.12xlarge instance type instead.

  4. Click Deploy. The model deployment requires approximately 3 minutes.

    When the Service Status changes to Running, the service is deployed.
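
In addition to watching the console, you can check readiness from a script. The following is a minimal sketch that polls the service endpoint until it responds; the SERVICE_URL and SERVICE_TOKEN values are placeholders, so replace them with the endpoint and token shown in the invocation details of your service in the EAS console.

    import time

    import requests

    # Placeholders: copy the real endpoint and token from the EAS console.
    SERVICE_URL = "http://<your-eas-endpoint>/api/predict/chat_llama3_demo"
    SERVICE_TOKEN = "<your-service-token>"

    def wait_until_reachable(timeout=600, interval=15):
        """Poll the endpoint until it answers with a non-5xx status or time out."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                resp = requests.get(
                    SERVICE_URL,
                    headers={"Authorization": SERVICE_TOKEN},
                    timeout=10,
                )
                if resp.status_code < 500:
                    print("Service reachable, HTTP status:", resp.status_code)
                    return True
            except requests.RequestException:
                pass  # Not reachable yet; retry after the interval.
            time.sleep(interval)
        return False

    if __name__ == "__main__":
        wait_until_reachable()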

Use the web application to perform model inference

  1. Find the service that you want to manage and click View Web App in the Service Type column.

  2. Perform model inference by using the web application.

    Enter a prompt in the input text box, such as "Give me a plan for learning the basics of personal finance". Then, click Send.
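
In addition to the web application, the service can be called over HTTP. The following is a minimal sketch that assumes the service accepts a JSON body with a prompt field and returns the generated text in the response body; the endpoint URL, token, and request schema are assumptions, so verify them against the invocation details of your service in the EAS console.

    import requests

    # Placeholders: copy the real endpoint and token from the EAS console.
    SERVICE_URL = "http://<your-eas-endpoint>/api/predict/chat_llama3_demo"
    SERVICE_TOKEN = "<your-service-token>"

    def ask(prompt):
        # Assumption: the service accepts {"prompt": ...}; check the actual
        # request schema for your deployed LLM service.
        resp = requests.post(
            SERVICE_URL,
            headers={"Authorization": SERVICE_TOKEN},
            json={"prompt": prompt},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.text

    print(ask("Give me a plan for learning the basics of personal finance."))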

Reference

For more information about the versions of ChatLLM-WebUI, see the release notes for ChatLLM-WebUI.