
Platform for AI: Deploy a model service in the PAI console

Last Updated: Aug 21, 2024

Elastic Algorithm Service (EAS) of Platform for AI (PAI) allows you to deploy trained models as inference services or AI-powered web applications. You can use models that you trained yourself or trained models obtained from open source communities. EAS provides multiple deployment methods for models obtained in different ways, as well as scenario-based deployment methods that you can use to quickly deploy a model as an online service in the PAI console. This topic describes how to deploy models and manage EAS online services in the PAI console.

Prerequisites

A trained model is obtained.

Background information

You can deploy models and manage EAS online services in the console.

  • Upload and deploy models in the console

    You can deploy the model by using one of the following methods:

    • Custom deployment: Custom deployment allows you to deploy models in a more flexible manner. You can deploy a model as an AI-powered web application or an inference service by using images, models, or processors.

    • Scenario-based model deployment: EAS provides various scenario-specific deployment solutions that are suitable for different models, such as ModelScope, Hugging Face, Triton, TFServing, Stable Diffusion (for AI painting), and pre-trained large language models (LLMs). EAS provides simplified deployment solutions for these deployment scenarios.

  • Manage online model services

    You can manage deployed model services in the PAI console, such as viewing service details, updating service resource configurations, adding a version for a deployed model service, or scaling resources.

Upload and deploy models in the console

On the Elastic Algorithm Service (EAS) page, you can upload a model that you trained or a public model that you obtained from an open source community and then deploy the model as an online model service.

Step 1: Go to the Elastic Algorithm Service (EAS) page

  1. Log on to the PAI console.

  2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

  3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS). The Elastic Algorithm Service (EAS) page appears.

Step 2: Select a deployment method

  1. On the Inference Service tab, click Deploy Service.

  2. On the page that appears, select a deployment method.

    Custom Model Deployment

    • Custom Deployment

      A more flexible deployment method. You can quickly deploy a model as an online inference service by using a processor, or by configuring a preset image and a third-party code library, mounting models and code, and running commands. For more information, see the Configure parameters for custom deployment section of this topic.

    • JSON Deployment

      The model is deployed based on the content of a JSON file. For more information, see the Configure parameters for JSON deployment section of this topic.

    Scenario-based Model Deployment

    Note

    For information about the parameters of each scenario, see the Configure parameters for scenario-based model deployment section of this topic.

    • AI Painting - SD Web UI Deployment

      This method allows you to quickly deploy an AI painting service based on the open source Stable Diffusion (SD) web application and call the deployed service by using the web application or API operations. EAS isolates users and computing resources to implement enterprise-level applications.

    • Large Language Model (LLM)

      This method allows you to quickly deploy an LLM as a web application that you can call by using the web page or API operations. You can use LangChain to integrate the application with your business data and build an enterprise knowledge base to implement intelligent dialogue and other automated services. You can also use the built-in inference acceleration provided by PAI-Blade to deploy models in a simplified, cost-effective manner.

    • RAG-based Smart Dialogue Deployment

      This method allows you to deploy an intelligent dialogue system based on an LLM and the Retrieval-Augmented Generation (RAG) technique. The system is suitable for Q&A, summarization, and other natural language processing tasks that rely on custom knowledge bases.

    • AI Video Generation: ComfyUI-based Deployment

      This method allows you to deploy web applications for AI video generation based on the ComfyUI and Stable Video Diffusion models. EAS can help you quickly implement AI-powered text-to-video generation for industries such as short video platforms and animation production.

    • ModelScope Model Deployment

      This method allows you to quickly deploy an open source ModelScope model and start the model service.

    • Hugging Face Model Deployment

      This method allows you to quickly deploy an open source Hugging Face model and start the model service.

    • Triton Deployment

      This method allows you to quickly deploy a model trained in an AI framework, such as TensorRT, TensorFlow, PyTorch, or ONNX, as an online inference service by using Triton Inference Server.

    • TensorFlow Serving Deployment

      This method allows you to quickly deploy a model in the standard SavedModel format as an online service by using the TensorFlow Serving engine.

Step 3: Deploy the service

Configure the parameters based on the deployment method. After you configure the parameters, click Deploy. When the service status changes to Running, the service is deployed.

Configure parameters for custom deployment

  1. On the Create Service page, configure the parameters in the Model Service Information section.

    • Service Name: Specify a service name as prompted.

    • Deployment Method: The following deployment methods are supported: Deploy Service by Using Image, Deploy Web App by Using Image, and Deploy Service by Using Model and Processor.

      Note

      In complex model inference scenarios, such as AI content generation and video processing, inference takes a long time to complete. We recommend that you turn on Asynchronous Service to deploy the service as an asynchronous inference service. For more information, see Asynchronous inference services. Asynchronous inference services are available only if you set the Deployment Method parameter to Deploy Service by Using Image or Deploy Service by Using Model and Processor.

      • Deploy Service by Using Image: Select this deployment method if you want to quickly deploy AI inference services by mounting images, code, and models.

      • Deploy Web App by Using Image: Select this deployment method if you want to quickly deploy a web application by mounting images, code, and models.

      • Deploy Service by Using Model and Processor: Select this deployment method if you want to deploy AI inference services by using models and processors, such as built-in processors or custom processors. For more information, see Deploy model services by using built-in processors and Deploy services by using custom processors.

      Deploy a service or web application by using an image

      The following table describes the parameters if you set the Deployment Method parameter to Deploy Service by Using Image or Deploy Web App by Using Image.

      Parameter

      Description

      Select Image

      Valid values:

      • PAI Image: Select an Alibaba Cloud image.

      • Custom Image: Select a custom image. For more information about how to create a custom image, see View and add images.

      • Image Address: The URL of the image that is used to deploy the model service. Example: registry.cn-shanghai.aliyuncs.com/xxx/image:tag. You can specify the address of an image provided by PAI or a custom image. For more information about how to obtain the image address, see View and add images.

        Important

        The specified image must be in the same region as the service that you want to deploy.

        If you want to use an image from a private repository, click enter and specify the username and password of the image repository.

      Specify Model Settings

      Click Specify Model Settings to configure the model. You can use one of the following methods to configure model files:

      • Mount OSS Path

        • The path of the source OSS bucket.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

      • Mount NAS File System

        • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

        • NAS Source Path: the NAS path where the files are stored.

        • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

      • Mount PAI Model

        • Configure the Model Name and Model Version parameters for an existing model that you want to use. For more information about how to view registered models, see Register and manage models.

        • Mount Path: the mount path of the service instance. The mount path is used to read the model file.

      Code Settings

      Click Specify Code Settings to configure the code. You can use one of the following mounting methods to provide access to the code that is required in the service deployment process.

      • Mount OSS Path

        • The path of the source OSS bucket.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

      • Mount NAS File System

        • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

        • NAS Source Path: the NAS path where the files are stored.

        • Mount Path: the mount path of the service instance. The mount path is used to read files from the specified NAS path.

      • Mount Git Path

        • Git Repository Address: the address of the Git repository.

        • Mount Path: the mount path of the service instance. The path is used to read the code file from the Git directory.

      • Mount PAI Dataset

        • Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

      • Mount PAI Code

        • Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

      Third-party Libraries

      Click Specify Third-party Libraries to configure the third-party library. Valid values:

      • Third-party Libraries: Specify a third-party library in the field.

      • Path of requirements.txt: Specify the path of the requirements.txt file in the field. You must include the address of the third-party library in the requirements.txt file.

      Environment Variables

      Click Specify Environment Variables to configure environment variables.

      Specify Name and Value for the environment variable.

      • Variable Name: the name of the environment variable.

      • Variable Value: the value of the environment variable.

      Execution Command

      The command to run the image. Example: python /run.py.

      You also need to enter the port number, which is the local HTTP port on which the model service listens after the image is deployed.

      Important

      You cannot specify port 8080 or 9090 because the EAS engine listens on ports 8080 and 9090.
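      The run command must start an HTTP server that listens on the port that you specify. The following Python example is only a minimal sketch of such a server under assumed settings: the port 8000, the MODEL_DIR environment variable, and the /code/run.py file path are placeholders that you replace with your own configuration.

        import json
        import os
        from http.server import BaseHTTPRequestHandler, HTTPServer

        # Placeholder environment variable and mount path; replace them with your own settings.
        MODEL_DIR = os.environ.get("MODEL_DIR", "/models")
        # The port must match the port number that you enter in the console
        # and cannot be 8080 or 9090.
        PORT = 8000

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers.get("Content-Length", 0))
                payload = json.loads(self.rfile.read(length) or b"{}")
                # Replace this echo logic with real inference against the files in MODEL_DIR.
                body = json.dumps({"model_dir": MODEL_DIR, "echo": payload}).encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

        if __name__ == "__main__":
            HTTPServer(("0.0.0.0", PORT), Handler).serve_forever()

      If a file such as this one is mounted as /code/run.py, the corresponding run command would be python /code/run.py with the port number set to 8000.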

      Deploy service by using model and processor

      The following table describes the parameters if you set the Deployment Method parameter to Deploy Service by Using Model and Processor.

      Parameter

      Description

      Model file

      Valid values:

      • Mount OSS Path

        Select the OSS path in which the model file is stored.

      • Upload Data

        1. Select an OSS path in the current region.

        2. Click Browse Local Files and select the on-premises model file that you want to upload. You can also directly drag the model file to the blank area.

      • Publicly Accessible Download URL

        Select Publicly Accessible Download URL. Then, enter a publicly accessible URL in the field below the parameter.

      • Select Model

        Configure the Model Name and Model Version parameters for an existing model that you want to use. For more information about how to view registered models, see Register and manage models.

      Processor Type

      The type of processor. You can select a built-in official processor or a custom processor based on your business requirements. For more information about built-in official processors, see Built-in processors.

      Model Type

      This parameter is required only if you set the Processor Type parameter to EasyVision(CPU), EasyVision(GPU), EasyTransfer(CPU), EasyTransfer(GPU), EasyNLP, or EasyCV. The available model types vary based on the processor type. You can configure the Processor Type and Model Type parameters based on your business requirements.

      Processor Language

      This parameter is available only if you set the Processor Type parameter to Custom Processor.

      Valid values: Cpp, Java, and python.

      Processor package

      This parameter is available only if you set the Processor Type parameter to Custom Processor. Valid values:

      • Import OSS File

        Select Import OSS File. Then, select the OSS path in which the processor package is stored.

      • Upload Local File

        1. Select Upload Local File.

        2. Select an OSS path in the current region.

        3. Click the folder icon and select the on-premises processor package that you want to upload. You can also directly drag the processor package to the blank area.

          The package is uploaded to the OSS path in the current region. The Processor Package parameter is automatically configured.

          Note

          You can accelerate the loading speed of a processor during model deployment by uploading an on-premises processor package.

      • Download from Internet

        Select Download from Internet. Then, enter a public URL.

      Processor Master File

      This parameter is available only if you set the Processor Type parameter to Custom Processor. This parameter specifies the main file of the processor package.

      Mount Settings

      Click Specify Mount Settings to configure the mounting method. You can use one of the following mounting methods.

      • Mount OSS Path

        • The path of the source OSS bucket.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

      • Mount NAS File System

        • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

        • NAS Source Path: the NAS path where the files are stored.

        • Mount Path: the mount path of the service instance. The mount path is used to read files from the specified NAS path.

      • Mount PAI Dataset

        • Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

      • Mount PAI Code

        • Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

        • In the Mount Path section, specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

      Environment Variables

      Click Specify Environment Variables to configure environment variables.

      Specify Name and Value for the environment variable.

      • Variable Name: the name of the environment variable.

      • Variable Value: the value of the environment variable.

  2. In the Resource Deployment Information section of the Create Service page, configure the parameters. The following table describes the parameters.

    Parameter

    Description

    Resource Group Type

    The type of resource group in which you want to deploy the model. You can deploy the model by using the public resource group or a dedicated resource group. For more information, see Work with dedicated resource groups.

    Note

    If you run a small number of tasks and do not have high requirements for latency, we recommend that you use the public resource group.

    GPU Sharing

    This parameter is available only if you set the Resource Group Type parameter to a dedicated resource group. For more information, see GPU sharing.

    Note

    The GPU sharing feature is available only to users in the whitelist. If you want to use the GPU sharing feature, submit a ticket.

    Instance Count

    To prevent risks caused by single-instance deployment, we recommend that you specify multiple service instances.

    If you set the Resource Group Type parameter to a dedicated resource group, you must configure the CPU, Memory (MB), and GPU parameters for each service instance.

    Resource Configuration Mode

    This parameter is available only if you set the Resource Group Type parameter to Public Resource Group. This parameter supports the following configurations:

    • General

      You can select a single CPU or GPU instance type.

    • Cost-effective Resource Configuration

      You can configure multiple instance types or use preemptible instances. For more information, see Specify multiple instance types and Create and use preemptible instances.

      • Preemptible Instance Protection Period: You can specify a protection period of 1 hour for a preemptible instance. The system guarantees that you can use the instance during this protection period.

      • Deployment: You can configure Common instances and Preemptible instances at the same time. Resources are started based on the sequence in which the instance types are configured. You can add up to five resource types. If you use Preemptible instances, you must set a bid price.

    Elastic Resource Pool

    This parameter is available only if you set the Resource Group Type parameter to a dedicated resource group.

    You can turn on Elastic Resource Pool and configure your resources based on the instructions in the Resource Configuration Mode section.

    If you enable Elastic Resource Pool and the dedicated resource group that you use to deploy services is fully occupied, the system automatically adds pay-as-you-go instances to the public resource group during scale-outs. The added instances are billed as public resources. The instances in the public resource group are released first during scale-ins. For more information, see Elastic resource pool.

    Extra System Storage

    This parameter is available only if you set the Resource Group Type parameter to Public Resource Group.

    Click Extra System Storage to configure additional system disks for the EAS service. Unit: GB. Valid values: 0 to 2000. You have a free quota of 30 GB on the system disk. If you specify 20 in the field, the available storage space is 50 GB.

    Additional system disks are billed based on their capacity and usage duration. For more information, see Billing of EAS.

  3. Optional. In the VPC Settings section, set the VPC, vSwitch, and Security Group Name parameters to enable VPC direct connection for the EAS service deployed in the public resource group.

    After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created elastic network interface (ENI). In addition, the EAS services can access other cloud services that reside in the VPC.

  4. Optional. In the Service Configuration section, configure the parameters. The following table describes the parameters.

    Parameter

    Description

    Memory Caching

    If you enable this feature, the model files of an EAS service are cached in a local directory of the service instance to accelerate data reading and reduce latency. For more information, see Enable memory caching for a local directory.

    Specify a service response timeout period

    The service response timeout period. Default value: 15. Unit: seconds.

    Shared Memory Configuration

    The size of the shared memory. Unit: GB.

    Scalable Job Mode

    You can enable this feature to deploy an inference service as an elastic job service. For more information, see Overview of Elastic Job.

    Save Service Calls

    You can enable this feature to persistently save all service requests and responses to MaxCompute tables or Simple Log Service (SLS). Turn on the switch and select a Save Method:

    • MaxCompute

      • MaxCompute Project: Select an existing project from the drop-down list. If no project is available, you can click Create MaxCompute Project to create a project. For more information, see the "Create a MaxCompute project in the MaxCompute console" section in the Create a MaxCompute project topic.

      • MaxCompute Table: Specify the name of the table. When you deploy the service, the system automatically creates a table in the MaxCompute project.

    • Simple Log Service

      • Select a SLS project: Select an SLS project to isolate and control resources. If no project is available, you can click Go to Create SLS Project to create a project. For more information, see Manage a project.

      • logstore: Specify a logstore to collect, store, and query logs in SLS. When you deploy the service, the system automatically creates the specified logstore in the SLS project.

    Dedicated Gateway

    Click Dedicated Gateway and select a dedicated gateway from the drop-down list. You can configure a dedicated gateway to enhance access control and improve the security and efficiency of service access. For more information, see Use a dedicated gateway.

    Health Check

    You can configure health check for the service. For more information, see Configure the health check feature.

    LLM Intelligent Router

    You can configure LLM intelligent router for the service. If no intelligent router is available, you can click Create LLM Intelligent Router to create an intelligent router. For more information, see Use LLM Gateway to improve inference efficiency.

    LLM Intelligent Router is a specialized EAS service that can be bound to an LLM inference service. When an LLM inference service has multiple backend instances, the intelligent router dynamically distributes requests based on the backend load. This ensures a balanced distribution of computing power and memory usage among backend instances and improves the utilization of cluster resources.

  5. In the Configuration Editor section, the configurations of the service are displayed in the code editor. You can add configuration items that are not included in the preceding steps. For more information, see the "Create a service" section in the Run commands to use the EASCMD client topic.


Configure parameters for JSON deployment

Prepare a JSON file that is used to deploy the service. For more information, see Parameters of model services. On the JSON Deployment page, enter the content of the JSON file in the JSON editor and click Deploy.
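The following Python snippet builds a minimal service configuration and prints the JSON that you can paste into the JSON editor. It is only a sketch: the field names (name, model_path, processor, and metadata) follow the common EAS service description format, and the service name, model path, processor, and resource values are placeholders. See Parameters of model services for the authoritative field reference.

    import json

    # Placeholder values; replace the name, model path, processor, and resources
    # with settings that match your own service.
    service_config = {
        "name": "demo_service",
        "model_path": "oss://examplebucket/models/model.tar.gz",
        "processor": "tensorflow_cpu_1.15",
        "metadata": {
            "instance": 1,
            "cpu": 2,
            "memory": 4000,
        },
    }

    # Print the JSON content that you can paste into the JSON editor.
    print(json.dumps(service_config, indent=2))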

Configure parameters for scenario-based model deployment

The following section describes the parameters for different scenarios.

AI Painting - SD Web UI Deployment

Parameter

Description

Basic Information

Service Name

The name of the service.

Edition

Valid values:

  • Standard Edition

    Standard Edition is suitable for individual users who deploy common tests and applications. This edition supports both web application access and API calls.

  • API Edition

    API Edition is suitable for scenarios in which you need to integrate your business by calling API operations. The system automatically converts the service into an asynchronous inference service. For more information, see Asynchronous inference services.

  • Cluster Edition WebUI

    Cluster Edition WebUI is suitable for teamwork scenarios in which multiple members use the web application to generate images. This edition ensures that each user has an independent model and output path. The backend computing resources are shared and scheduled in a centralized manner to improve cost-effectiveness.

  • Serverless Edition

    The deployment of a Serverless Edition service is free of charge. You are charged only based on the time required to generate images. The service automatically scales based on the volume of requests. You can call a Serverless Edition service only by using the web UI.

    Note

    You can deploy Serverless Edition services only in the China (Shanghai) and China (Hangzhou) regions.

Model Settings

You can specify model settings in the following scenarios: (1) you want to use an open source model that you downloaded from a community or a model that you fine-tuned; (2) you want to save the output data to your data source; (3) you need to install third-party plug-ins or configurations. Click Add to configure model settings. Valid values:

  • Mount OSS: an empty file directory in the OSS bucket. For more information about how to create a bucket, see Create a bucket. For more information about how to create an empty directory, see Manage directories.

  • Mount NAS

    • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

    • NAS Source Path: the NAS path where the files are stored.

Resource Configuration

Instance Count

The number of instances on which the service is deployed. Default value: 1. To prevent risks caused by single-instance deployment, we recommend that you specify multiple service instances.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported. To achieve cost-effectiveness, we recommend that you use the ml.gu7i.c16m60.1-gu30 instance type.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS services deployed in the public resource group.

After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created ENI. In addition, the EAS services can access other cloud services that reside in the VPC.

Large Language Model (LLM)

Parameter

Description

Basic Information

Service Name

The name of the service.

Model Source

Valid values:

  • Open Source Model: You can select a model from the Model Type drop-down list to quickly load and deploy a built-in LLM without the need to upload your model.

  • Custom Fine-tuned Model: You need to configure model settings to mount the fine-tuned model and configure the parameters to deploy the model.

Model Type

Select a model category.

Model Settings

This parameter is required only if you set the Model Source parameter to Custom Fine-tuned Model.

Valid values:

  • Mount OSS: the OSS bucket directory in which the fine-tuned model is stored.

  • Mount NAS

    • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system.

    • NAS Source Path: the source path of the NAS file system in which the fine-tuned model is stored.

  • Mount PAI Model: Select a registered model by specifying the model name and the model version. For more information about how to register models, see Register and manage models.

Resource Configuration

Instance Count

The number of instances on which the service is deployed. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported. To achieve cost-effectiveness, we recommend that you use the ml.gu7i.c16m60.1-gu30 instance type.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS services deployed in the public resource group.

After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created ENI. In addition, the EAS services can access other cloud services that reside in the VPC.

RAG-based Smart Dialogue Deployment

Parameter

Description

Basic Information

Service Name

The name of the service.

Model Source

Valid values:

  • Open Source Model: You can select a model from the Model Type drop-down list to quickly load and deploy a built-in LLM without the need to upload your model.

  • Custom Fine-tuned Model: You need to configure model settings to mount the fine-tuned model and configure the parameters to deploy the model.

Model Type

Select a model category.

Resource Configuration

Instances

The number of instances on which the service is deployed. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

  • If you set Model Source to Open Source Model, the system automatically selects an instance type based on the selected model type as the default value.

  • If you set Model Source to Custom Fine-tuned Model, you need to select an instance type that matches the model. For more information, see Deploy LLM applications in EAS.

Inference Acceleration

Inference acceleration can be enabled for the Qwen, Llama2, ChatGLM, or Baichuan2 model that is deployed on A10 or GU30 instances. Valid values:

  • BladeLLM Inference Acceleration: The BladeLLM inference acceleration engine ensures high concurrency and low latency. You can use BladeLLM to accelerate LLM inference in a cost-effective manner.

  • Open-source vLLM Inference Acceleration

Vector Database Settings

Select a database as your vector database. Valid values: FAISS, Elasticsearch, Milvus, Hologres, and AnalyticDB. For more information about how to create and configure a vector database, see Step 1: Prepare a vector database and Step 2: Deploy a RAG service.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

  • If you use Hologres, AnalyticDB for PostgreSQL, or Elasticsearch to build a vector database, select the VPC in which the vector database is deployed.

  • If you use FAISS to build a vector database, you do not need to configure these parameters.

AI Video Generation: ComfyUI-based Deployment

Parameter

Description

Basic Information

Service Name

The name of the model service.

Edition

The edition of the service. Valid values:

  • Standard Edition: suitable for scenarios in which a single user calls the service by using the web UI or by using API operations when the service is deployed on a single instance.

  • API Edition: suitable for high-concurrency scenarios. The system automatically deploys the service as an asynchronous service. This edition supports service calls only by using API operations.

  • Cluster Edition WebUI: suitable when multiple users call the service by using the web UI at the same time. This edition supports service calls only by using the web UI. For information about how a Cluster Edition service works, see Principles of the Cluster Edition service.

For information about the scenarios of each edition, see the Background information section of this topic.

Model Settings

If you deploy a fine-tuned model, install the ComfyUI plug-in, or call the API Edition or Standard Edition service by using API operations, click Add to configure the model. This makes it easier to upload the model and plug-in and obtain the inference result. Valid values:

  • Mount OSS: Click the image icon to select an existing OSS directory.

  • Mount NAS: Configure a NAS mount target and NAS source path.

You can upload the custom model and ComfyUI plug-in to the specific OSS or NAS path in subsequent steps. For more information, see the "How do I mount a custom model and the ComfyUI plug-in?" section of this topic.

Resource Configuration

Instance Count

If you select Standard Edition, we recommend that you set the number of instances to 1.

Resource Configuration

We recommend that you use the GU30, A10, or T4 GPU types. By default, the system uses the GPU-accelerated > ml.gu7i.c16m60.1-gu30 instance type to ensure cost-effectiveness.

Note

ComfyUI supports only single-GPU mode, which means tasks can run on a single-GPU instance or multiple single-GPU instances. ComfyUI does not support multi-GPU concurrent operation.

ModelScope Model Deployment

Parameter

Description

Basic Information

Service Name

The name of the service.

Select Model

Select a ModelScope model from the drop-down list.

Model Version

Select a model version from the drop-down list. By default, the latest version is used.

Model Type

After you select a model, the system automatically configures the Model Type parameter.

Resource Configuration

Instance Count

The number of instances on which the service is deployed. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS services deployed in the public resource group.

After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created ENI. In addition, the EAS services can access other cloud services that reside in the VPC.

Hugging Face Model Deployment

Parameter

Description

Basic Information

Service Name

The name of the service.

Model ID

The ID of the Hugging Face model. Example: distilbert-base-uncased-finetuned-sst-2-english.

Model Type

The type of the Hugging Face model. Example: text-classification.

Model Version

The version of the Hugging Face model. Example: main.

Resource Configuration

Instance Count

The number of instances on which the service is deployed. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS services deployed in the public resource group.

After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created ENI. In addition, the EAS services can access other cloud services that reside in the VPC.
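After the Hugging Face model service is deployed and enters the Running state, you can call it by using API operations. The following Python sketch assumes a text-classification model and a request body in the {"data": [...]} format; the endpoint URL and token are placeholders that you copy from the service details page, and the payload format may differ for other model types.

    import json
    import urllib.request

    # Placeholder endpoint and token; replace them with the values shown on the service details page.
    ENDPOINT = "http://<your-endpoint>/api/predict/<service_name>"
    TOKEN = "<your-service-token>"

    # Assumed request format for a text-classification model; adjust it for your model type.
    payload = json.dumps({"data": ["This movie was great!"]}).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Authorization": TOKEN, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))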

Triton Deployment

Parameter

Description

Basic Information

Service Name

The name of the service.

Model Settings

Make sure that the model that you deploy meets the structure requirements of Triton. For more information, see Model deployment by using Triton Server. A sketch of a typical model repository layout is provided after this table. After you prepare the model, select one of the following methods to deploy the model:

  • Mount OSS: Select the OSS bucket directory in which the model is stored.

  • Mount NAS

    • NAS Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • NAS Source Path: the source path of the model in NAS.

  • Mount PAI Model: Select a registered model by specifying the model name and the model version. For more information about how to register models, see Register and manage models.

Resource Configuration

Instance Count

The number of instances on which the service is deployed. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS services deployed in the public resource group.

After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created ENI. In addition, the EAS services can access other cloud services that reside in the VPC.
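For reference, a Triton model repository that you mount from OSS or NAS typically uses a layout similar to the following sketch. The model name, version directory, and model file name are placeholders, and the exact files depend on the framework backend; see Model deployment by using Triton Server for the authoritative requirements.

    model_repository/
      resnet50/                # one directory per model
        config.pbtxt           # model configuration file
        1/                     # numeric version directory
          model.onnx           # model file; the file name depends on the backend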

TensorFlow Serving Deployment

Parameter

Description

Basic Information

Service Name

The name of the service.

Deployment method

The following deployment methods are supported:

  • Standard Model Deployment: used to deploy a single-model service.

  • Configuration File Deployment: used to deploy a multi-model service.

Model Settings

TensorFlow Serving has specific structure requirements for deployed models. For more information, see Model deployment by using TensorFlow Serving.

  • If you set the Deployment Method parameter to Standard Model Deployment, you must configure the OSS bucket directory in which the model file is stored.

  • If you set the Deployment Method parameter to Configuration File Deployment, you must configure the following parameters:

    • OSS: the OSS bucket directory in which the model is stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read the model file.

    • Configuration File: the OSS path in which the configuration file is stored. A sample configuration file is provided after this table.

Resource Configuration

Instance Count

The number of instances on which the service is deployed. Default value: 1. We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

Resource Configuration

Select the instance type for model deployment based on your business requirements. Only the public resource group is supported.

VPC Configuration (Optional)

VPC, vSwitch, and Security Group Name

Configure these parameters to enable VPC direct connection for the EAS services deployed in the public resource group.

After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created ENI. In addition, the EAS services can access other cloud services that reside in the VPC.
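For Configuration File Deployment, the configuration file follows the TensorFlow Serving model_config_list format. The following is a minimal sketch for a service that serves two models; the model names and base paths are placeholders and must be consistent with the mount path that you configure.

    model_config_list {
      config {
        name: "model_a"
        base_path: "/models/model_a"
        model_platform: "tensorflow"
      }
      config {
        name: "model_b"
        base_path: "/models/model_b"
        model_platform: "tensorflow"
      }
    }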

Manage online model services in EAS

On the Inference Service tab of the Elastic Algorithm Service (EAS) page, you can view deployed services, and stop, start, or delete services.

Warning

If you stop or delete a model service, requests that rely on the model service fail. Proceed with caution.

  • View service details

    • Click the name of the service that you want to manage to go to the Service Details page. On the Service Details page, you can view the basic information, instances, and configurations of the service.

    • On the Service Details page, you can click different tabs to view information about service monitoring, logs, and deployment events.

  • View container logs

    EAS implements log aggregation and filtering at the service instance level. If a service instance fails, you can troubleshoot error messages based on the container logs.

    1. Click the name of the service to go to the Service Details page.

    2. In the Service Instance section, click Containers in the Actions column.

    3. In the Containers pane, click Logs in the Actions column.

  • Update service resource configurations

    On the Service Details tab, click Resource Configuration in the Resource Information section. In the Resource Configuration dialog box, update the resources that are used to run the service. For more information, see Upload and deploy models in the console.

  • Add a version for a deployed model service

    On the Elastic Algorithm Service (EAS) page, find the service that you want to update and click Update Service in the Actions column. For more information, see Upload and deploy models in the console.

    Warning

    When you add a version for a model service, the service is temporarily interrupted. Consequently, the requests that rely on the service fail until the service recovers. Proceed with caution.

    After you update the service, click the version number in the Current Version column to view the Version Information or change the service version.

  • Scale resources

    On the Elastic Algorithm Service (EAS) page, find the service that you want to manage and click Scale in the Actions column. In the Scale dialog box, set the Instances parameter to adjust the number of instances that are used to run the model service.

  • Enable auto scaling

    You can configure automatic scaling for the service to enable the service to automatically adjust the resources that are used to run the online model services in EAS based on your business requirements. For more information, see the "Method 1: Manage the horizontal auto scaling feature in the console" section in the Enable or disable the horizontal auto-scaling feature topic.
