Elastic Algorithm Service (EAS) of Platform for AI (PAI) allows you to deploy trained models as inference services or AI-powered web applications. You can deploy models that you trained yourself or pre-trained models from open source communities. EAS provides multiple deployment methods for models that are obtained in different ways, including scenario-based deployment methods that you can use to quickly deploy a model as an online service in the PAI console. This topic describes how to deploy models and manage EAS online services in the PAI console.
Background information
You can deploy models and manage EAS online services in the console.
You can deploy a model by using one of the following methods:
Custom deployment: This method allows you to deploy models in a more flexible manner. You can deploy a model as an AI-powered web application or as an inference service by using images or processors.
Scenario-based model deployment: EAS provides simplified, scenario-specific deployment solutions for common models and frameworks, such as ModelScope, Hugging Face, Triton, TFServing, Stable Diffusion (for AI painting), and pre-trained large language models (LLMs).
You can manage deployed model services in the PAI console, such as viewing service details, updating service resource configurations, adding a version for a deployed model service, or scaling resources.
Procedure
Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Enter Elastic Algorithm Service (EAS).
On the Inference Service tab of the Elastic Algorithm Service (EAS) page, click Deploy Service. On the Deploy Service page, select a deployment method.
The following deployment methods are available:

Custom Model Deployment
- Custom Deployment: A more flexible deployment method. You can quickly deploy a model as an online inference service by using a processor, or by configuring a preset image and third-party code libraries, mounting models and code, and running commands. For more information, see the Configure parameters for custom deployment section of this topic.
- JSON Deployment: The model is deployed based on the content of a JSON configuration file. For more information, see Parameters of model services.

Scenario-based Model Deployment
- This method allows you to quickly deploy an AI painting service based on an open source Stable Diffusion (SD) web application and call the deployed service by using the web application or API operations. EAS isolates users and computing resources to support enterprise-level applications.
- This method allows you to quickly deploy an LLM as a web application that you can call by using the web page or API operations. You can use LangChain to integrate the application with your business data and build an enterprise knowledge base that supports intelligent dialogue and other automated services. You can also use the built-in inference acceleration provided by PAI-Blade to deploy the model in a simplified and cost-effective manner.
- This method allows you to deploy an intelligent dialogue system based on an LLM and the Retrieval-Augmented Generation (RAG) technique. The system is suitable for Q&A, summarization, and other natural language processing tasks that rely on custom knowledge bases.
- This method allows you to deploy web applications for AI video generation based on the ComfyUI and Stable Video Diffusion models. EAS helps you quickly implement AI-powered text-to-video generation for industries such as short video platforms and animation production.
- This method allows you to quickly deploy an open source ModelScope model and start model services.
- This method allows you to quickly deploy a model that is trained by using an AI framework, such as TensorRT, TensorFlow, PyTorch, or ONNX, as an online inference service by using Triton Inference Server.
- This method allows you to quickly deploy a model in the standard SavedModel format as an online service by using TensorFlow Serving.
After you configure the parameters, click Deploy. When the service status changes to Running, the service is deployed.
Configure parameters for custom deployment
Basic Information
Parameter | Description |
Service Name | Specify a service name as prompted. |
Group | A service group has a unified ingress. You can use service groups to perform canary releases, blue-green deployments, heterogeneous resources inference, and asynchronous inference. For more information, see Manage service groups. |
Environment Information
You can deploy a model by using an image or a processor.
Image-based Deployment: Select this deployment method if you want to quickly deploy AI inference services by mounting images, code, and models.
Processor-based Deployment: Select this deployment method if you want to deploy AI inference services by using models and processors, such as built-in processors or custom processors. For more information, see Deploy services by using built-in processors and Deploy services by using custom processors.
In complex model inference scenarios, such as AI-generated content (AIGC) and video processing, inference takes a long time to complete. We recommend that you turn on Asynchronous Services to implement the asynchronous inference service. For more information, see Deploy an asynchronous inference service.
Deploy a service or web application by using an image
Image-based deployment supports asynchronous services and allows you to enable web applications. If the image that you use is integrated with a web UI application, the system automatically starts the web server after you enable web applications. This helps you directly access the web UI page.
Parameter | Description |
Image Configuration | Valid values:
|
Model Settings | You can use one of the following methods to configure model files:
|
Command | The command that is run to start the model service in the image. You also need to enter the port number, which is the local HTTP port on which the model service listens after the image is deployed. For an illustration, see the minimal listener sketch after this table. Important You cannot specify port 8080 or 9090 because the EAS engine listens on ports 8080 and 9090. |
Code Build | You can use one of the following methods to configure the code:
|
Third-party Library Settings | Valid values:
|
Environment Variables | Specify Key and Value for the environment variable.
|
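The relationship between the Command and the port number can be illustrated with a minimal sketch. The following hypothetical entrypoint, app.py, is for illustration only: it assumes that the image contains Flask, listens on port 8000 (a local port other than the reserved ports 8080 and 9090), and exposes a /predict route. The actual command, framework, route, and port depend on your image.

```python
# app.py -- hypothetical entrypoint for a custom image (illustration only).
# The service must listen on a local HTTP port other than 8080 and 9090,
# because the EAS engine reserves those two ports.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)  # request body sent by the caller
    # Replace this echo with the inference logic of your model.
    return jsonify({"result": payload})

if __name__ == "__main__":
    # Listen on all interfaces so that the EAS engine can reach the container.
    app.run(host="0.0.0.0", port=8000)
```

With such an entrypoint, the command would be python app.py and the port number entered in the console would be 8000.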
Deploy a service by using processors
The following table describes the parameters if you set the Deployment Method parameter to Processor-based Deployment.
Parameter | Description |
Model Settings | Valid values:
|
Processor Type | The type of processor. You can select a built-in official processor or a custom processor based on your business requirements. For more information about built-in official processors, see Built-in processors. |
Model Type | This parameter is required only if you set the Processor Type parameter to EasyVision(CPU), EasyVision(GPU), EasyTransfer(CPU), EasyTransfer(GPU), EasyNLP, or EasyCV. The available model types vary based on the processor type. You can configure the Processor Type and Model Type parameters based on your business requirements. |
Processor Language | This parameter is available only if you set the Processor Type parameter to Custom Processor. Valid values: cpp, java, and python. If you select python, see the minimal processor sketch after this table for an illustration.
Processor Package | This parameter is available only if you set the Processor Type parameter to Custom Processor. Valid values:
|
Processor Main File | This parameter is available only if you set the Processor Type parameter to Custom Processor. This parameter specifies the main file of the processor package. |
Mount Configurations | The following mount modes are supported:
|
Environment Variables | Specify Key and Value for the environment variable.
|
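If you set the Processor Language parameter to python, the custom processor is typically implemented with the allspark SDK that is described in Deploy services by using custom processors. The following minimal sketch is for illustration only; verify the exact class and method names in that topic.

```python
# custom_processor.py -- minimal sketch of a custom Python processor.
# Based on the allspark SDK described in "Deploy services by using custom
# processors"; verify the exact interface in that topic.
import allspark


class MyProcessor(allspark.BaseProcessor):
    def initialize(self):
        # Runs once when the service starts: load the model here.
        self.model = {"w0": 100, "w1": 2}

    def process(self, data):
        # data is the raw request body; return (response, HTTP status code).
        x = int(data)
        y = self.model["w1"] * x + self.model["w0"]
        return str(y), 200


if __name__ == "__main__":
    # worker_threads controls the processing concurrency of the instance.
    runner = MyProcessor(worker_threads=10)
    runner.run()
```

Package a file such as this together with its dependencies, upload the package in the Processor Package parameter, and set Processor Main File to the file name, for example, custom_processor.py.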
Resource Deployment
In the Resource Deployment section, configure the parameters described in the following table.
Parameter | Description |
Resource Type | The type of the resource group in which you want to deploy the model. You can deploy the model by using the public resource group or a dedicated resource group that you have purchased. For more information, see Work with dedicated resource groups. Note If you run a small number of tasks and do not have high requirements for latency, we recommend that you use the public resource group. |
GPU Sharing | This parameter is available only if you set the Resource Type parameter to EAS Resource Group. For more information, see GPU sharing. Note
|
Instances | We recommend that you specify multiple service instances to prevent risks caused by single-instance deployment. If you set the Resource Type parameter to a dedicated EAS resource group, you must configure the GPUs, vCPUs, and Memory (MB) parameters for each service instance. |
Deployment Resources | This parameter is supported when you set the Resource Type parameter to Public Resources.
|
Elastic Resource Pool | This parameter is available only if you set the Resource Type parameter to EAS Resource Group. You can turn on Elastic Resource Pool and configure your resources based on the instructions in the Deployment Resources section. If you enable Elastic Resource Pool and the dedicated resource group that you use to deploy services is fully occupied, the system automatically adds pay-as-you-go instances to the public resource group during scale-outs. The added instances are billed as public resources. The instances in the public resource group are released first during scale-ins. For more information, see Elastic resource pool. |
Additional System Disk | This parameter is available if you set the Resource Type parameter to Public Resources, or if you set it to EAS Resource Group and configure an elastic resource pool. Configure additional system disk space for the EAS service. Unit: GB. Valid values: 0 to 2000. You have a free quota of 30 GB on the system disk. For example, if you specify 20 in the field, the available storage space is 50 GB. Additional system disks are billed based on their capacity and usage duration. For more information, see Billing of EAS. |
VPC (optional)
In the VPC section, configure the VPC (VPC), vSwitch, and Security Group Name parameters to enable VPC direct connection for the EAS service deployed in the public resource group. For more information, see Configure network connectivity.
After you enable the feature, the ECS instances that reside in the VPC can access EAS services deployed in the public resource group by using the created elastic network interface (ENI). The EAS services can also access other cloud services that reside in the VPC.
Features (optional)
In the Features section, configure the parameters described in the following table.
Parameter | Description |
Memory Caching | If you enable this feature, the model files in the local directory of an EAS service are cached in memory to accelerate data reading and reduce latency. For more information, see Enable memory caching for a local directory. |
Dedicated Gateway | You can configure a dedicated gateway to enhance access control and improve the security and efficiency of service access. For more information, see Use a dedicated gateway. |
LLM Intelligent Router | Turn on LLM Intelligent Router and select an LLM Intelligent Router service that you deployed. If no LLM Intelligent Router service is available, you can click Create LLM Intelligent Router to create one. For more information, see Use LLM Intelligent Router to improve inference efficiency. LLM Intelligent Router is a special EAS service that can be bound to an LLM inference service. When the LLM inference service has multiple instances, LLM Intelligent Router dynamically distributes requests based on the backend load. This ensures that the computing power and memory of the inference instances are evenly utilized and significantly improves the resource efficiency of the cluster. |
Health Check | You can configure health check for the service. For more information, see Configure the health check feature. |
Shared Memory | Configure shared memory for the instance to perform read and write operations on the memory without data copy or transfer. Unit: GB. |
Enable gRPC | Specifies whether to enable the Google Remote Procedure Call (gRPC) connection for the service gateway. Default value: false. Valid values:
|
Service Response Timeout Period | The timeout period of the server for each request. Unit: seconds. Default value: 5. |
Rolling Update |
|
Graceful Shutdown |
|
Save Call Records | You can enable this feature to persistently save all service requests and responses to MaxCompute tables or Simple Log Service. Turn on Save Call Records and select a save method:
|
Task Mode | You can enable this feature to deploy an inference service as an elastic job service. For more information, see Overview. |
Service Configuration
In the Service Configuration section, the configurations of the service are displayed in the code editor.
You can add configuration items that are not included in the preceding steps. For more information, see Parameters of model services.
You can use the EASCMD client to deploy the model based on the JSON configuration file. For more information, see Create a service.
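For reference, the following Python sketch generates a minimal JSON configuration file. The field names follow the Parameters of model services topic, and the service name, processor, model path, and resource values are placeholders for illustration only.

```python
# build_service_config.py -- illustration only. Field names follow the
# "Parameters of model services" topic; all values are placeholders.
import json

service_desc = {
    "name": "demo_service",                  # service name
    "processor": "tensorflow_cpu_1.15",      # example built-in processor code; see Built-in processors
    "model_path": "oss://examplebucket/models/savedmodel/",  # hypothetical model location
    "metadata": {
        "instance": 2,    # number of service instances
        "cpu": 2,         # vCPUs per instance
        "memory": 4000,   # memory per instance, in MB
    },
}

with open("service.json", "w") as f:
    json.dump(service_desc, f, indent=2)

# Pass service.json to the EASCMD client as described in "Create a service",
# or paste its content on the JSON Deployment page.
```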
Manage online model services in EAS
On the Inference Service tab of the Elastic Algorithm Service (EAS) page, you can view the deployed services, and stop, start, or delete the services.
If you stop or delete a model service, requests that rely on the model service fail. Proceed with caution.
View service details
Click the name of the service that you want to manage to go to the service details page. On the service details page, you can view the basic information, instances, and configurations of the service.
On the service details page, you can click different tabs to view information about service monitoring, logs, and deployment events.
Query container logs
EAS aggregates and filters logs at the service instance level. If a service instance fails, you can troubleshoot the failure based on the error messages in the container logs. Perform the following steps:
Click the name of the service that you want to manage to go to the service details page.
In the Service Instance section, click Containers in the Actions column.
In the Containers dialog box, click Logs in the Actions column.
Update service resource configurations
On the service details page, click Modify Configuration in the Resource Information section.
Add a version for a deployed model service
On the Inference Service tab of the Elastic Algorithm Service (EAS) page, find the desired service and click Update in the Actions column.
Warning: When you add a version for a model service, the service is temporarily interrupted. As a result, requests that rely on the service fail until the service recovers. Proceed with caution.
After you update the service, click the version number in the Current Version column to view the Version Information or change the service version.
Scale resources
On the Inference Service tab of the Elastic Algorithm Service (EAS) page, find the desired service and click Scale in the Actions column. In the Scale panel, specify the Instance Count parameter to adjust the number of instances that are used to run the model service.
Enable auto scaling
You can configure auto scaling so that EAS automatically adjusts the resources that are used to run the online model service based on your business requirements. For more information, see Method 1: Manage the auto scaling feature in the PAI console.
References
After you deploy a service, you can use Online Debugging to check whether the service runs as expected. For more information, see Debug a service online.
After you deploy a model by using the scenario-based deployment method, you can call the service to check the model performance. For more information, see EAS use cases. For a minimal invocation sketch, see the example after this list.
For more information about how to deploy model services in EAS, see Deploy a model service by using Machine Learning Designer or Deploy model services by using EASCMD.
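The following minimal sketch, for illustration only, shows how a deployed service is typically called over HTTP. The endpoint URL and token below are placeholders; obtain the actual values from the service details page. The request and response formats depend on the image or processor that you deployed.

```python
# call_service.py -- minimal invocation sketch (illustration only).
# Replace the placeholder endpoint and token with the values shown for
# your service in the console.
import requests

url = "http://<service-endpoint>/api/predict/demo_service"  # placeholder endpoint
token = "<service-token>"                                    # placeholder token

response = requests.post(
    url,
    headers={"Authorization": token},  # EAS authenticates requests with the service token
    data=b"example request body",      # format depends on the deployed image or processor
    timeout=10,
)
print(response.status_code, response.text)
```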