This topic describes how to quickly deploy a Llama 3 model and use the deployed web application in Elastic Algorithm Service (EAS) of Platform for AI (PAI).
Background information
Llama 3 provides pretrained and instruction-tuned versions of models in 8B and 70B sizes, which are suitable for various scenarios. Llama 3 inherits the overall architecture of Llama 2 but increases the context length from 4K to 8K tokens. In benchmark evaluations, the pretrained and instruction-tuned versions of Llama 3 models demonstrated significant improvements over the previous generation in capabilities such as domain knowledge, reasoning, and comprehension.
Deploy a model service in EAS
Go to the Elastic Algorithm Service (EAS) page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace in which you want to deploy the model.
In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the Elastic Algorithm Service (EAS) page.
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click LLM Deployment.
On the LLM Deployment page, configure the parameters. The following table describes the key parameters. Use the default values for other parameters.
Service Name: The name of the service. In this example, chat_llama3_demo is used.
Model Source: Select Open Source Model.
Model Type: Select llama3-8b.
Resource Configuration: We recommend that you select ml.gu7i.c8m30.1-gu30 for the Instance Type parameter in the China (Beijing) region.
Note: If the preceding instance type is unavailable, you can also use the ecs.gn6i-c24g1.12xlarge instance type.
Click Deploy. The deployment takes approximately 3 minutes.
When the Service Status changes to Running, the service is deployed.
Use the web application to perform model inference
Find the service that you want to manage and click View Web App in the Service Type column.
Perform model inference by using the web application.
Enter a prompt in the input text box, such as "Give me a plan for learning the basics of personal finance". Then, click Send.
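Besides the web application, an EAS service can also be invoked over HTTP. The sketch below is a minimal, hypothetical example of sending a prompt to the deployed service from Python; the endpoint URL, token, and request body schema are placeholders and assumptions, so replace them with the actual values and payload format shown in your service's invocation details in the EAS console.

```python
# Hypothetical sketch: calling the deployed EAS service over HTTP.
# SERVICE_URL and SERVICE_TOKEN are placeholders -- copy the real values
# from the service's invocation information in the EAS console. The JSON
# body below assumes a simple {"prompt": ...} schema; adjust it to match
# the schema your service actually expects.
import json
from urllib import request

SERVICE_URL = "http://<your-eas-endpoint>/"   # placeholder
SERVICE_TOKEN = "<your-service-token>"        # placeholder

def build_request(prompt: str) -> request.Request:
    """Wrap a prompt into an authorized POST request for the service."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return request.Request(
        SERVICE_URL,
        data=body,
        headers={
            "Authorization": SERVICE_TOKEN,
            "Content-Type": "application/json",
        },
    )

req = build_request("Give me a plan for learning the basics of personal finance")
# response = request.urlopen(req)  # uncomment once the real endpoint and token are set
```

Because the request body is built separately from the network call, you can inspect the payload locally before pointing the script at the real endpoint.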
Reference
For more information about the versions of ChatLLM-WebUI, see Release notes for ChatLLM WebUI.