Platform for AI: Deploy a model service in the PAI console

Last Updated: Feb 05, 2025

Elastic Algorithm Service (EAS) of Platform for AI (PAI) allows you to deploy trained models as inference services or AI-powered web applications. You can deploy models that you trained or trained models obtained from open source communities. EAS provides multiple deployment methods for models that are obtained in different ways, including scenario-based deployment methods that you can use to quickly deploy a model as an online service in the PAI console. This topic describes how to deploy models and manage EAS online services in the PAI console.

Background information

You can deploy models and manage EAS online services in the console.

  • You can deploy a model by using one of the following methods:

    • Custom deployment: Custom deployment allows you to deploy models in a more flexible manner. You can deploy a model as an AI-powered web application or an inference service by using images or processors.

    • Scenario-based model deployment: EAS provides simplified, scenario-specific deployment solutions for different types of models, such as ModelScope models, Hugging Face models, Triton Inference Server, TensorFlow Serving, Stable Diffusion (for AI painting), and pre-trained large language models (LLMs).

  • Manage online model services

    You can manage deployed model services in the PAI console. For example, you can view service details, update service resource configurations, add a version for a deployed model service, or scale resources.

Procedure

  1. Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Enter Elastic Algorithm Service (EAS).

  2. On the Inference Service tab of the Elastic Algorithm Service (EAS) page, click Deploy Service. On the Deploy Service page, select a deployment method.

    The following deployment methods are supported:

    Custom Model Deployment

      • Custom Deployment: A more flexible deployment method. You can quickly deploy a model as an online inference service by using a processor, or by configuring a preset image and a third-party code library, mounting models and code, and running commands. For more information, see Configure parameters for custom deployment.

      • JSON Deployment: The model is deployed based on the content of a JSON file. For more information, see Parameters of model services.

    Scenario-based Model Deployment

      • AI Painting - SD Web UI Deployment: This method allows you to quickly deploy an AI painting service based on an open source Stable Diffusion (SD) web application and call the deployed service by using the web application or API operations. EAS isolates users and computing resources to implement enterprise-level applications.

      • LLM Deployment: This method allows you to quickly deploy an LLM as a web application that you can call by using the web UI or API operations. You can use LangChain to integrate the application with your business data and build an enterprise knowledge base that supports intelligent dialogue and other automated services. You can also use the built-in inference acceleration provided by PAI-Blade to deploy models in a simplified and cost-effective manner.

      • RAG-based Smart Dialogue Deployment: This method allows you to deploy an intelligent dialogue system based on an LLM and the Retrieval-Augmented Generation (RAG) technique. The system is suitable for Q&A, summarization, and other natural language processing tasks that rely on custom knowledge bases.

      • AI Video Generation: ComfyUI-based Deployment: This method allows you to deploy web applications for AI video generation based on ComfyUI and Stable Video Diffusion models. EAS can help you quickly implement AI-powered text-to-video generation for industries such as short video platforms and animation production.

      • ModelScope Model Deployment: This method allows you to quickly deploy an open source ModelScope model and start the model service.

      • Triton Deployment: This method allows you to quickly deploy a model that uses an AI framework such as TensorRT, TensorFlow, PyTorch, or ONNX as an online inference service by using Triton Inference Server.

      • TensorFlow Serving Deployment: This method allows you to quickly deploy a model in the standard SavedModel format as an online service by using TensorFlow Serving.

  3. After you configure the parameters, click Deploy. When the service status changes to Running, the service is deployed.

Configure parameters for custom deployment

Basic Information

Parameter

Description

Service Name

Specify a service name as prompted.

Group

A service group has a unified ingress. You can use service groups to perform canary releases, blue-green deployments, heterogeneous resource inference, and asynchronous inference. For more information, see Manage service groups.

Environment Information

You can deploy a model by using an image or a processor.

  • Image-based Deployment: Select this deployment method if you want to quickly deploy an AI inference service by using an image and mounting code and models.

  • Processor-based Deployment: Select this deployment method if you want to deploy AI inference services by using models and processors, such as built-in processors or custom processors. For more information, see Deploy services by using built-in processors and Deploy services by using custom processors.

Note

In complex model inference scenarios, such as AI-generated content (AIGC) and video processing, inference takes a long time to complete. We recommend that you turn on Asynchronous Services to implement asynchronous inference. For more information, see Deploy an asynchronous inference service.

Deploy a service or web application by using an image

Image-based deployment supports asynchronous services and allows you to enable web applications. If the image that you use is integrated with a web UI application, the system automatically starts the web server after you enable web applications, so that you can directly access the web UI page.

Parameter

Description

Image Configuration

Valid values:

  • Alibaba Cloud Image: Select an Alibaba Cloud image.

  • Custom Image: Select a custom image. For more information about how to create a custom image, see Custom images.

  • Image Address: the URL of the image that is used to deploy the model service. Example: registry.cn-shanghai.aliyuncs.com/xxx/image:tag. You can specify the address of an image provided by PAI or a custom image. For more information about how to obtain the image address, see Custom images.

    Important

    The specified image must be in the same region as the service that you want to deploy.

    If you want to use an image from a private repository, click enter the username and password, and then specify the username and password of the image repository.

Model Settings

You can use one of the following methods to configure model files:

  • OSS

    • OSS: the path of the source Object Storage Service (OSS) bucket.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

  • General-purpose File Storage NAS (NAS) file system

    • Select a file system: the ID of the created NAS file system. You can log on to the NAS console to view the ID of the NAS file system in the region. You can also view the ID of the NAS file system from the drop-down list.

    • Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • File System Path: the NAS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

  • CPFS for Lingjun: If you use Lingjun resource quotas to deploy a service, you can mount CPFS for Lingjun storage resources.

    • Select a file system: Select a Cloud Parallel File Storage (CPFS) file system of your Alibaba Cloud account. For more information about how to create a CPFS file system, see Create a file system.

    • File System Path: the CPFS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the CPFS file system.

  • PAI Model

    • PAI Model: Select a registered model based on the model name and version. For more information about how to view registered models, see Register and manage models.

    • Mount Path: the mount path of the service instance. The mount path is used to read the model file.

Command

The command to run the image. Example: python /run.py.

You also need to enter the port number, which is the local HTTP port on which the model service listens after the image is deployed.

Important

You cannot specify port 8080 or 9090 because the EAS engine listens on these ports.

Code Build

You can use one of the following methods to configure the code:

  • OSS

    • OSS: the path of the source OSS bucket.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

  • General-purpose NAS file system

    • Select a file system: the ID of the created NAS file system. You can log on to the NAS console to view the ID of the NAS file system in the region. You can also view the ID of the NAS file system from the drop-down list.

    • Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • File System Path: the NAS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

  • CPFS for Lingjun: If you use Lingjun resource quotas to deploy a service, you can mount CPFS for Lingjun storage resources.

    • Select a file system: Select a CPFS file system of your Alibaba Cloud account. For more information about how to create a CPFS file system, see Create a file system.

    • File System Path: the CPFS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the CPFS file system.

  • Git

    • Git Repository Address: the address of the Git repository.

    • Mount Path: the mount path of the service instance. The path is used to read the code file from the Git directory.

  • Code Build

    • Code Build: Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

  • Custom Dataset

    • Custom Dataset: Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

Third-party Library Settings

Valid values:

  • Third-party Libraries: Specify third-party libraries in the field.

  • Path of requirements.txt: Specify the path of the requirements.txt file in the field. You must include the addresses of the third-party libraries in the requirements.txt file.

Environment Variables

Specify Key and Value for the environment variable.

  • Key: the name of the environment variable.

  • Value: the value of the environment variable.
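
The parameters in this section correspond to fields in the JSON configuration that is shown in the Service Configuration section. The following example is a minimal sketch of an image-based deployment, assuming the field names that are described in Parameters of model services; the service name, image address, command, port, OSS path, and resource values are placeholders, and the exact field names (for example, script versus command) should be verified against that reference.

  {
    "name": "demo_image_service",
    "containers": [
      {
        "image": "registry.cn-shanghai.aliyuncs.com/xxx/image:tag",
        "script": "python /run.py",
        "port": 8000
      }
    ],
    "storage": [
      {
        "oss": {
          "path": "oss://examplebucket/models/"
        },
        "mount_path": "/models"
      }
    ],
    "metadata": {
      "instance": 1,
      "cpu": 4,
      "memory": 8000
    }
  }

In this sketch, the containers entry carries the Image Configuration, Command, and port number settings, and the storage entry carries an OSS mount from Model Settings or Code Build.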

Deploy a service by using processors

The following table describes the parameters if you set the Deployment Method parameter to Processor-based Deployment.

Parameter

Description

Model Settings

Valid values:

  • OSS: Select the OSS path in which the model file is stored.

  • Download URL: Enter a public URL.

  • PAI Model: Select a registered model by specifying the model name and the model version. For more information about how to view registered models, see Register and manage models.

Processor Type

The type of processor. You can select a built-in official processor or a custom processor based on your business requirements. For more information about built-in official processors, see Built-in processors.

Model Type

This parameter is required only if you set the Processor Type parameter to EasyVision(CPU), EasyVision(GPU), EasyTransfer(CPU), EasyTransfer(GPU), EasyNLP, or EasyCV. The available model types vary based on the processor type. You can configure the Processor Type and Model Type parameters based on your business requirements.

Processor Language

This parameter is available only if you set the Processor Type parameter to Custom Processor.

Valid values: cpp, java, and python.

Processor Package

This parameter is available only if you set the Processor Type parameter to Custom Processor. Valid values:

  • OSS: Select the OSS path in which the model file is stored.

  • Download URL: Enter a public URL.

Processor Main File

This parameter is available only if you set the Processor Type parameter to Custom Processor. This parameter specifies the main file of the processor package.

Mount Configurations

The following mount modes are supported:

  • OSS

    • OSS: the path of the source OSS bucket.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified OSS path.

  • General-purpose NAS file system

    • Select a file system: the ID of the created NAS file system. You can log on to the NAS console to view the ID of the NAS file system in the region. You can also view the ID of the NAS file system from the drop-down list.

    • Mount Target: the mount point of the NAS file system. The EAS service uses the mount point to access the NAS file system. For more information about how to create a general-purpose NAS file system, see Create a file system.

    • File System Path: the NAS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the NAS file system.

  • CPFS for Lingjun: If you use Lingjun resource quotas to deploy a service, you can mount CPFS-based storage resources.

    • Select a file system: Select a CPFS file system of your Alibaba Cloud account. For more information about how to create a CPFS file system, see Create a file system.

    • File System Path: the CPFS path where the files are stored.

    • Mount Path: the mount path of the service instance. The mount path is used to read files from the CPFS file system.

  • Git

    • Git Repository Address: the address of the Git repository.

    • Mount Path: the mount path of the service instance. The path is used to read the code file from the Git directory.

  • Code Build

    • Code Build: Select an existing code build. If no code build is available, you can click Create Code Build to create a code build.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI code build.

  • Custom Dataset

    • Custom Dataset: Select an existing dataset. If no dataset is available, you can click Create Dataset to create a dataset.

    • Mount Path: Specify the mount path of the service instance. The mount path is used to read files from the specified PAI dataset.

Environment Variables

Specify Key and Value for the environment variable.

  • Key: the name of the environment variable.

  • Value: the value of the environment variable.
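
For reference, the following example is a minimal sketch of the JSON configuration that a processor-based deployment produces, assuming the field names that are described in Parameters of model services; the processor code, model path, and resource values are placeholders.

  {
    "name": "demo_processor_service",
    "processor": "tensorflow_cpu_1.12",
    "model_path": "oss://examplebucket/models/model.tar.gz",
    "metadata": {
      "instance": 2,
      "cpu": 2,
      "memory": 4000
    }
  }

The processor field corresponds to the Processor Type parameter, and model_path corresponds to the Model Settings parameter. A custom processor requires additional fields for the processor package and main file; see Deploy services by using custom processors.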

Resource Deployment

In the Resource Deployment section, configure the parameters described in the following table.

Parameter

Description

Resource Type

The type of the resource group in which you want to deploy the model. You can deploy the model by using the public resource group or a dedicated resource group that you have purchased. For more information, see Work with dedicated resource groups.

Note

If you run a small number of tasks and do not have high requirements for latency, we recommend that you use the public resource group.

GPU Sharing

This parameter is available only if you set the Resource Type parameter to EAS Resource Group. For more information, see GPU sharing.

Note
  • The GPU sharing feature is available only to users in the whitelist. If you want to use the GPU sharing feature, submit a ticket.

  • The GPU sharing feature does not support instances of the GU series. Make sure the EAS dedicated resources you purchase are not of the GU series.

Instances

We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment.

If you set the Resource Type parameter to a dedicated EAS resource group, you must configure the GPUs, vCPUs, and Memory (MB) parameters for each service instance.

Deployment Resources

This parameter is available only if you set the Resource Type parameter to Public Resources.

  • You can select a single CPU or GPU instance type.

  • You can configure multiple instance types or use preemptible instances. For more information, see Specify multiple instance types and Specify preemptible instances.

    • Preemptible Instance Protection Period: You can specify a 1-hour protection period for preemptible instances. The system ensures that you can use the instance during the protection period.

    • Deployment Resources: You can configure Common instances and Preemptible instances at the same time. Resources are started based on the sequence in which the instance types are configured. You can add up to five resource types. If you use preemptible instances, you must set a bid price to bid for the resources.

Elastic Resource Pool

This parameter is available only if you set the Resource Type parameter to EAS Resource Group.

You can turn on Elastic Resource Pool and configure your resources based on the instructions in the Deployment Resources section.

If you enable Elastic Resource Pool and the dedicated resource group that you use to deploy services is fully occupied, the system automatically adds pay-as-you-go instances to the public resource group during scale-outs. The added instances are billed as public resources. The instances in the public resource group are released first during scale-ins. For more information, see Elastic resource pool.

Additional System Disk

This parameter is available if you set the Resource Type parameter to Public Resources or EAS Resource Group and configure an elastic resource pool.

Configure additional system disks for the EAS service. Unit: GB. Valid values: 0 to 2000. You have a free quota of 30 GB on the system disk. If you specify 20 in the field, the available storage space is 50 GB.

Additional system disks are billed based on their capacity and usage duration. For more information, see Billing of EAS.
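
In the JSON configuration, the resource settings in this section are expressed in the metadata and cloud sections. The following fragment is a sketch for a service that runs on public resources, assuming the cloud.computing.instance_type field that is described in Parameters of model services; the instance count and instance type are placeholders.

  {
    "metadata": {
      "instance": 2
    },
    "cloud": {
      "computing": {
        "instance_type": "ecs.gn6i-c4g1.xlarge"
      }
    }
  }

If you deploy the service in a dedicated resource group, you specify the per-instance resources, such as cpu, memory, and gpu, in the metadata section instead of an instance type.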

VPC (optional)

In the VPC section, configure the VPC, vSwitch, and Security Group Name parameters to enable VPC direct connection for the EAS service deployed in the public resource group. For more information, see Configure network connectivity.

After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created elastic network interface (ENI). The EAS services can also access other cloud services that reside in the VPC.

Features (optional)

In the Features section, configure the parameters described in the following table.

Parameter

Description

Memory Caching

If you enable this feature, the model files of an EAS service are cached in a local directory to accelerate data reading and reduce latency. For more information, see Enable memory caching for a local directory.

Dedicated Gateway

You can configure a dedicated gateway to enhance access control and improve the security and efficiency of service access. For more information, see Use a dedicated gateway.

LLM Intelligent Router

Turn on LLM Intelligent Router and select an LLM Intelligent Router service that you deployed. If no LLM intelligent router is available, you can click Create LLM Intelligent Router to create an intelligent router. For more information, see Use LLM Intelligent Router to improve inference efficiency.

LLM Intelligent Router is a special EAS service that can be bound to an LLM inference service. When the LLM inference service has multiple instances, LLM Intelligent Router dynamically distributes requests based on the backend load. This ensures that the computing power and memory resources of the inference instances are evenly utilized, and significantly improves the resource efficiency of the cluster.

Health Check

You can configure health check for the service. For more information, see Configure the health check feature.

Shared Memory

Configure shared memory for the instance so that it can directly read from and write to the memory without data copy or transfer. Unit: GB.

Enable gRPC

Specifies whether to enable the Google Remote Procedure Call (gRPC) connection for the service gateway. Default value: false. Valid values:

  • false: disables the gRPC connection. In this case, HTTP requests are supported by default.

  • true: enables the gRPC connection.

Service Response Timeout Period

The timeout period of the server for each request. Default value: 5.

Rolling Update

  • Number of Instances Exceeding Expectation: the maximum number of additional instances that can be created for the service during a rolling update. You can set this parameter to a positive integer, which specifies the number of additional instances. You can also set this parameter to a percentage, such as 2%, which specifies the ratio of the number of additional instances to the original number of service instances. The default value is 2%. The higher the value, the faster the service is updated. For example, if you set the number of service instances to 100 and set this parameter to 20, 20 additional instances are immediately created when you update the service.

  • Maximum number of Unavailable Instances: the maximum number of service instances that become unavailable during a rolling update. During a rolling update, the system can release existing instances to free up resources for new instances. This prevents update failures caused by insufficient resources. If a dedicated resource group is used, this parameter is set to 1 by default. If the public resource group is used, this parameter is set to 0 by default. For example, if you set this parameter to N, N instances are immediately stopped when a service update starts.

    Note

    If idle resources are sufficient, you can set this parameter to 0. If you set this parameter to a large value, service stability may be affected. This is because a larger value results in a reduced number of available instances during a service update and heavier workloads for each instance. When specifying this parameter, you must consider service stability and the resources you require.

Graceful Shutdown

  • Graceful Shutdown Time: the maximum amount of time allowed for a graceful shutdown. Unit: seconds. Default value: 30. EAS services use the rolling update policy. Before an instance is released, it enters the Terminating state and continues to process the requests that it received during the period of time that you specify, while the system switches the traffic to other instances. The instance is released after the instance finishes processing the requests. Therefore, the duration of the graceful shutdown process must be within the value of this parameter. If the time required to process requests is long, you can increase the value of this parameter to ensure that all requests that are in progress can be processed when the system updates the service.

    Important

    If you set this parameter to a small value, service stability may be affected. If you set this parameter to a large value, the service update may be prolonged. We recommend that you use the default value unless you have special requirements.

  • Send SIGTERM: Valid values:

    • false: When a service instance enters the EXIT state, the system does not send the SIGTERM signal. This is the default value.

    • true: When a service instance enters the EXIT state, the system immediately sends the SIGTERM signal to the main process. After the signal is received, the main process in the service performs a custom graceful shutdown in the signal processing function. If the signal is not processed, the main process may exit immediately after receiving the signal, causing the graceful shutdown to fail.

Save Call Records

You can enable this feature to persistently save all service requests and responses to MaxCompute tables or Simple Log Service. Turn on Save Call Records and select a save method:

  • MaxCompute

    • MaxCompute Project: Select an existing project from the drop-down list. If no project is available, you can click Create MaxCompute Project to create a project. For more information, see Create a MaxCompute project in the MaxCompute console.

    • MaxCompute Table: Specify a name for the table. When you deploy the service, the system automatically creates a table in the MaxCompute project.

  • Simple Log Service

    • Simple Log Service Project: the project that is used to isolate and manage resources in Simple Log Service. Select an existing project. If no project is available, click Create Simple Log Service Project to create a project. For more information, see Manage a project.

    • Logstore: used to collect, store, and query log data in Simple Log Service. When you deploy the service, the system automatically creates a Logstore in the specified Simple Log Service project.

Task Mode

You can enable this feature to deploy an inference service as an elastic job service. For more information, see Overview.
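
Many of the preceding features are also expressed as items in the JSON configuration that is described in the next section. For example, the following fragment is a sketch of the rolling update settings, assuming that the max_surge and max_unavailable fields under metadata.rolling_strategy, as described in Parameters of model services, correspond to the Number of Instances Exceeding Expectation and Maximum number of Unavailable Instances parameters.

  {
    "metadata": {
      "rolling_strategy": {
        "max_surge": "2%",
        "max_unavailable": 1
      }
    }
  }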

Service Configuration

In the Service Configuration section, the configurations of the service are displayed in the code editor.

You can add configuration items that are not included in the preceding steps. For more information, see Parameters of model services.

You can use the EASCMD client to deploy the model based on the JSON configuration file. For more information, see Create a service.
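
The JSON in the code editor is the same configuration that you can save as a file, such as service.json, and pass to the EASCMD client with a command such as eascmd create service.json, as described in Create a service. A minimal file has the same shape as the earlier sketches; the service name, processor code, paths, and resource values below are the same kind of placeholders.

  {
    "name": "demo_service",
    "processor": "tensorflow_cpu_1.12",
    "model_path": "oss://examplebucket/models/model.tar.gz",
    "metadata": {
      "instance": 1,
      "cpu": 2,
      "memory": 4000
    }
  }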

Manage online model services in EAS

On the Inference Service tab of the Elastic Algorithm Service (EAS) page, you can view the deployed services, and stop, start, or delete the services.

Warning

If you stop or delete a model service, requests that rely on the model service fail. Proceed with caution.

  • View service details

    • Click the name of the service that you want to manage to go to the service details page. On the service details page, you can view the basic information, instances, and configurations of the service.

    • On the service details page, you can click different tabs to view information about service monitoring, logs, and deployment events.

  • Query container logs

    EAS implements log aggregation and filtering at the service instance level. If a service instance fails, you can troubleshoot the error based on the container logs. Perform the following steps:

    1. Click the name of the service that you want to manage to go to the service details page.

    2. In the Service Instance section, click Containers in the Actions column.

    3. In the Containers dialog box, click Logs in the Actions column.

  • Update service resource configurations

    On the service details page, click Modify Configuration in the Resource Information section.

  • Add a version for a deployed model service

    On the Inference Service tab of the Elastic Algorithm Service (EAS) page, find the desired service and click Update in the Actions column.

    Warning

    When you add a version for a model service, the service is temporarily interrupted. Consequently, the requests that rely on the service fail until the service recovers. Proceed with caution.

    After you update the service, click the version number in the Current Version column to view the Version Information or change the service version.

  • Scale resources

    On the Inference Service tab of the Elastic Algorithm Service (EAS) page, find the desired service and click Scale in the Actions column. In the Scale panel, specify the Instance Count parameter to adjust the instances that are used to run the model service.

  • Enable auto scaling

    You can configure auto scaling to enable the service to automatically adjust the resources that are used to run the online model service in EAS based on your business requirements. For more information, see Method 1: Manage the auto scaling feature in the PAI console.
