Platform for AI: Quickly deploy a Llama 3 model in EAS

Last Updated: Sep 05, 2024

This topic describes how to quickly deploy a Llama 3 model and use the deployed web application in Elastic Algorithm Service (EAS) of Platform for AI (PAI).

Background information

Llama 3 is available in pretrained and instruction-tuned versions at the 8B and 70B parameter sizes, which suit a variety of scenarios. Llama 3 retains the overall architecture of Llama 2 but increases the context length from 4K to 8K tokens. In performance evaluations, both the pretrained and instruction-tuned Llama 3 models showed significant improvements over the previous generation across capabilities such as subject-matter knowledge, reasoning, and comprehension.

Deploy a model service in EAS

  1. Go to the Elastic Algorithm Service (EAS) page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace in which you want to deploy the model.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the Elastic Algorithm Service (EAS) page.

  2. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section, click LLM Deployment.

  3. On the LLM Deployment page, configure the parameters. The following table describes the key parameters. Use the default values for other parameters.

    Service Name: The name of the service. In this example, chat_llama3_demo is used.

    Model Source: Select Open Source Model.

    Model Type: Select llama3-8b.

    Resource Configuration: We recommend that you select ml.gu7i.c8m30.1-gu30 for the Instance Type parameter in the China (Beijing) region. If this instance type is unavailable, you can use the ecs.gn6i-c24g1.12xlarge instance type instead.

  4. Click Deploy. The model deployment requires approximately 3 minutes.

    When the Service Status changes to Running, the service is deployed.
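
In addition to watching the console, you can check readiness from a script. The following is a minimal sketch that polls the service endpoint until it responds; the SERVICE_URL and SERVICE_TOKEN values are placeholders, so replace them with the endpoint and token shown in the invocation details of your service in the EAS console.

    import time

    import requests

    # Placeholders: copy the real endpoint and token from the EAS console.
    SERVICE_URL = "http://<your-eas-endpoint>/api/predict/chat_llama3_demo"
    SERVICE_TOKEN = "<your-service-token>"

    def wait_until_reachable(timeout=600, interval=15):
        """Poll the endpoint until it answers with a non-5xx status or time out."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                resp = requests.get(
                    SERVICE_URL,
                    headers={"Authorization": SERVICE_TOKEN},
                    timeout=10,
                )
                if resp.status_code < 500:
                    print("Service reachable, HTTP status:", resp.status_code)
                    return True
            except requests.RequestException:
                pass  # Not reachable yet; retry after the interval.
            time.sleep(interval)
        return False

    if __name__ == "__main__":
        wait_until_reachable()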

Use the web application to perform model inference

  1. Find the service that you want to manage and click View Web App in the Service Type column.

  2. Perform model inference by using the web application.

    Enter a prompt in the input text box, such as "Give me a plan for learning the basics of personal finance". Then, click Send.
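
In addition to the web application, the service can be called over HTTP. The following is a minimal sketch that assumes the service accepts a JSON body with a prompt field and returns the generated text in the response body; the endpoint URL, token, and request schema are assumptions, so verify them against the invocation details of your service in the EAS console.

    import requests

    # Placeholders: copy the real endpoint and token from the EAS console.
    SERVICE_URL = "http://<your-eas-endpoint>/api/predict/chat_llama3_demo"
    SERVICE_TOKEN = "<your-service-token>"

    def ask(prompt):
        # Assumption: the service accepts {"prompt": ...}; check the actual
        # request schema for your deployed LLM service.
        resp = requests.post(
            SERVICE_URL,
            headers={"Authorization": SERVICE_TOKEN},
            json={"prompt": prompt},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.text

    print(ask("Give me a plan for learning the basics of personal finance."))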

Reference

For more information about the versions of ChatLLM-WebUI, see the release notes for ChatLLM-WebUI.