
Platform for AI: Deploy an online portrait service as a scalable job

Last Updated: Oct 31, 2024

In asynchronous inference scenarios, issues such as insufficient resource utilization and request interruption during scale-out may occur. To resolve these issues, Elastic Algorithm Service (EAS) of Platform for AI (PAI) provides the scalable job service, which uses an optimized service subscription mechanism. This topic describes how to deploy an online portrait service as a scalable job to perform inference.

Prerequisites

  • An Object Storage Service (OSS) bucket is created. For more information, see Create Buckets.

Deploy a scalable job for model inference

  1. Go to the EAS-Online Model Services page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS).

  2. Deploy a verification service.

    1. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

    2. On the Deploy Service page, configure the parameters. The following lists describe the key parameters. Use the default settings for other parameters. For more information, see Deploy a model service in the PAI console.

      • In the Model Service Information section, configure the following parameters:

        • Service Name: Specify a service name by following the on-screen instructions. Example: photog_check.

        • Deployment Method: Select Deploy Service by Using Image and turn on Asynchronous Inference Services.

        • Select Image: Select Image Address and specify the image address. Valid values:

          • Image address in the China (Beijing) region: registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub

          • Image address in the Singapore region: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub

        • Code Settings: Click Specify Code Settings, select Mount OSS Path, and configure the following parameters:

          • OSS bucket path: Select an OSS bucket path. Example: oss://examplebucket/.

          • Mount Path: In this example, /photog_oss is used.

        • Command to Run: Set the value to python app.py and set the port number to 7860.

      • In the Resource Deployment Information section, configure the following parameters:

        • Resource Group Type: Select Public Resource Group.

        • Resource Configuration Mode: Select General.

        • Resource Configuration: Click GPU and select an instance type whose name ends with -gu30. We recommend the ml.gu7i.c32m188.1-gu30 instance type.

        • Additional System Disk: Set the value to 120 GB.

      • In the Asynchronous Service section, configure the following parameters:

        • Resource Configuration for Asynchronous Queues: Select Public Resource Group.

        • Resources of Asynchronous Queues:

          • Minimum Instances: 1

          • CPU: 8 Cores

          • Memory: 64 GB

        • Maximum Data for A Single Input Request: Set the value to 20480 KB (20 MB) to ensure that storage space is sufficient for each request in the queue.

        • Maximum Data for A Single Output: Set the value to 20480 KB, which matches the sink.max_payload_size_kb setting in the configuration example.

      • In the VPC Settings section, select the virtual private cloud (VPC), vSwitch, and security group that you created.

      • In the Configuration Editor section, add the following options. For more information, see the complete configuration example.

        • metadata: Add the following options:

          "rpc": {
              "keepalive": 3600000,
              "worker_threads": 1
          }

          • keepalive: the maximum processing time of a single request, in milliseconds. Set the value to 3600000 (1 hour).

          • worker_threads: the number of threads that each EAS instance uses to concurrently process requests. The default value is 5, which means that the first five tasks in the queue are assigned to the same instance. To ensure that requests are processed in order, one at a time, we recommend that you set this option to 1.

        • queue: Add the "max_delivery": 1 option to prevent a failed request from being delivered again.

        Example of the complete configuration:

        {
            "metadata": {
                "name": "photog_check",
                "instance": 1,
                "rpc": {
                    "keepalive": 3600000,
                    "worker_threads": 1
                },
                "type": "Async"
            },
            "cloud": {
                "computing": {
                    "instance_type": "ml.gu7i.c32m188.1-gu30",
                    "instances": null
                },
                "networking": {
                    "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                    "security_group_id": "sg-2ze0kgiee55d0fn4****",
                    "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
                }
            },
            "features": {
                "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
            },
            "queue": {
                "cpu": 8,
                "max_delivery": 1,
                "min_replica": 1,
                "memory": 64000,
                "resource": "",
                "source": {
                    "max_payload_size_kb": 20480
                },
                "sink": {
                    "max_payload_size_kb": 20480
                }
            },
            "storage": [
                {
                    "oss": {
                        "path": "oss://examplebucket/",
                        "readOnly": false
                    },
                    "properties": {
                        "resource_type": "code"
                    },
                    "mount_path": "/photog_oss"
                }
            ],
            "containers": [
                {
                    "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:check.1.0.0.pub",
                    "script": "python app.py",
                    "port": 7860
                }
            ]
        }
    3. Click Deploy.

  3. Deploy a training service.

    1. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

    2. On the Deploy Service page, configure the parameters. The following lists describe the key parameters. Use the default settings for other parameters. For more information, see Deploy a model service in the PAI console.

      • In the Model Service Information section, configure the following parameters:

        • Service Name: Specify a service name by following the on-screen instructions. In this example, photog_train_pmml is used.

        • Deployment Method: Select Deploy Service by Using Image and turn on Asynchronous Inference Services.

        • Select Image: Select Image Address and specify the image address. Valid values:

          • Image address in the China (Beijing) region: registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub

          • Image address in the Singapore region: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub

        • Code Settings: Click Specify Code Settings, select Mount OSS Path, and configure the following parameters:

          • OSS bucket path: Select the OSS bucket path that you specified for the verification service. Example: oss://examplebucket/.

          • Mount Path: In this example, /photog_oss is used.

        • Command to Run: Set the value to python app.py and set the port number to 7860.

      • In the Resource Deployment Information section, configure the following parameters:

        • Resource Group Type: Select Public Resource Group.

        • Resource Configuration Mode: Select General.

        • Resource Configuration: Click GPU and select an instance type whose name ends with -gu30. We recommend the ml.gu7i.c32m188.1-gu30 instance type.

        • Additional System Disk: Set the value to 120 GB.

      • In the Asynchronous Service section, configure the following parameters:

        • Resource Configuration for Asynchronous Queues: Select Public Resource Group.

        • Resources of Asynchronous Queues:

          • Minimum Instances: 1

          • CPU: 8 Cores

          • Memory: 64 GB

        • Maximum Data for A Single Input Request: Set the value to 20480 KB (20 MB) to ensure that storage space is sufficient for each request in the queue.

        • Maximum Data for A Single Output: Set the value to 20480 KB, which matches the sink.max_payload_size_kb setting in the configuration example.

      • In the VPC Settings section, select the virtual private cloud (VPC), vSwitch, and security group that you created.

      • In the Configuration Editor section, add the following options. For more information, see the complete configuration example.

        • autoscaler: Optional. Configurations for automatic scaling of the service. For more information, see Auto scaling.

          "behavior": {
              "scaleDown": {
                  "stabilizationWindowSeconds": 60
              }
          },
          "max": 5,
          "min": 1,
          "strategies": {
              "queue[backlog]": 1
          }

          min and max specify the range of the instance count, queue[backlog] scales the service based on the number of requests that are pending in the queue, and stabilizationWindowSeconds specifies how long the service waits before it scales in.

        • metadata: Add the following options:

          "rpc": {
              "keepalive": 3600000,
              "worker_threads": 1
          }

          • keepalive: the maximum processing time of a single request, in milliseconds. Set the value to 3600000 (1 hour).

          • worker_threads: the number of threads that each EAS instance uses to concurrently process requests. The default value is 5, which means that the first five tasks in the queue are assigned to the same instance. To ensure that requests are processed in order, one at a time, we recommend that you set this option to 1.

        • queue: Add the "max_delivery": 1 option to prevent a failed request from being delivered again.

        Example of the complete configuration:

        {
            "autoscaler": {
                "behavior": {
                    "scaleDown": {
                        "stabilizationWindowSeconds": 60
                    }
                },
                "max": 5,
                "min": 1,
                "strategies": {
                    "queue[backlog]": 1
                }
            },
            "metadata": {
                "name": "photog_train_pmml",
                "instance": 1,
                "rpc": {
                    "keepalive": 3600000,
                    "worker_threads": 1
                },
                "type": "Async"
            },
            "cloud": {
                "computing": {
                    "instance_type": "ml.gu7i.c32m188.1-gu30",
                    "instances": null
                },
                "networking": {
                    "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                    "security_group_id": "sg-2ze0kgiee55d0fn4****",
                    "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
                }
            },
            "features": {
                "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
            },
            "queue": {
                "cpu": 8,
                "max_delivery": 1,
                "min_replica": 1,
                "memory": 64000,
                "resource": "",
                "source": {
                    "max_payload_size_kb": 20480
                },
                "sink": {
                    "max_payload_size_kb": 20480
                }
            },
            "storage": [
                {
                    "oss": {
                        "path": "oss://examplebucket/",
                        "readOnly": false
                    },
                    "properties": {
                        "resource_type": "code"
                    },
                    "mount_path": "/photog_oss"
                }
            ],
            "containers": [
                {
                    "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:train.1.0.0.pub",
                    "script": "python app.py",
                    "port": 7860
                }
            ]
        }
    3. Click Deploy.

  4. Deploy a prediction service.

    In this example, a prediction service is deployed as a scalable job. Perform the following steps:

    1. On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

    2. In the Configuration Editor section, click JSON Deployment and enter the configuration information.

      {
          "metadata": {
              "name": "photog_pre_pmml",
              "instance": 1,
              "rpc": {
                  "keepalive": 3600000,
                  "worker_threads": 1
              },
              "type": "ScalableJob"
          },
          "cloud": {
              "computing": {
                  "instance_type": "ecs.gn6v-c8g1.2xlarge",
                  "instances": null
              },
              "networking": {
                  "vswitch_id": "vsw-2ze4o9kww55051tf2****",
                  "security_group_id": "sg-2ze0kgiee55d0fn4****",
                  "vpc_id": "vpc-2ze5hl4ozjl4fo7q3****"
              }
          },
          "features": {
              "eas.aliyun.com/extra-ephemeral-storage": "120Gi"
          },
          "queue": {
              "cpu": 8,
              "max_delivery": 1,
              "min_replica": 1,
              "memory": 64000,
              "resource": "",
              "source": {
                  "max_payload_size_kb": 20480
              },
              "sink": {
                  "max_payload_size_kb": 20480
              }
          },
          "storage": [
              {
                  "oss": {
                      "path": "oss://examplebucket/",
                      "readOnly": false
                  },
                  "properties": {
                      "resource_type": "code"
                  },
                  "mount_path": "/photog_oss"
              }
          ],
          "containers": [
              {
                  "image": "registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub",
                  "env": [
                      {
                          "name": "URL",
                          "value": "http://127.0.0.1:8000"
                      },
                      {
                          "name": "AUTHORIZATION",
                          "value": "="
                      }
                  ],
                  "script": "python app.py",
                  "port": 7861
              },
              {
                  "image": "eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2",
                  "port": 8000,
                  "script": "./webui.sh --listen --port 8000 --skip-version-check --no-hashing --no-download-sd-model --skip-install --api --filebrowser --sd-dynamic-cache --data-dir /photog_oss/photog/webui/"
              }
          ]
      }

      The following list describes the key parameters. For more information about how to configure other parameters, see Parameters related to the service model.

      • metadata:

        • name: The service name, which must be unique within the region.

        • type: Set the value to ScalableJob to deploy the asynchronous inference service as a scalable job.

      • containers:

        • image: Specify the image addresses of the AI portrait prediction service and the web UI prediction service. In this example, the image addresses in the China (Beijing) region are used. Valid values:

          • Image addresses in the China (Beijing) region:

            • AI portrait prediction service: registry.cn-beijing.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub

            • Web UI prediction service: eas-registry-vpc.cn-beijing.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2

          • Image addresses in the Singapore region:

            • AI portrait prediction service: registry.ap-southeast-1.aliyuncs.com/mybigpai/photog_pub:infer.1.0.0.pub

            • Web UI prediction service: eas-registry-vpc.ap-southeast-1.cr.aliyuncs.com/pai-eas/stable-diffusion-webui:3.2

      • storage:

        • path: In this example, OSS mounting is used. Set the value to the OSS bucket path that you specified for the verification service. Example: oss://examplebucket/.

          Download and decompress the WebUI model file, and save it to the OSS bucket. In this example, the oss://examplebucket/photog_oss/webui path is used. For information about how to upload objects to an OSS bucket, see ossutil overview. For information about how to upload files to a File Storage NAS file system, see Mount a file system on a Linux ECS instance and Manage files. A Python upload sketch follows this list.

        • mount_path: Set the value to /photog_oss.
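
      If you prefer Python to ossutil, the following is a minimal sketch that uploads the decompressed model files by using the oss2 SDK. The AccessKey pair, endpoint, and local directory are assumptions that you must replace with your own values; the destination prefix follows the example path above.

      import os

      import oss2

      # Assumed values: replace with your own AccessKey pair, region endpoint,
      # bucket name, and the local directory that contains the decompressed files.
      auth = oss2.Auth('<AccessKeyId>', '<AccessKeySecret>')
      bucket = oss2.Bucket(auth, 'https://oss-cn-beijing.aliyuncs.com', 'examplebucket')

      local_dir = './webui'  # The decompressed WebUI model files.
      for root, _, files in os.walk(local_dir):
          for name in files:
              local_path = os.path.join(root, name)
              # Map the local file to an object key under the example OSS path.
              rel = os.path.relpath(local_path, local_dir).replace(os.sep, '/')
              bucket.put_object_from_file('photog_oss/webui/' + rel, local_path)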

    3. Click Deploy.

    After you deploy a scalable job, the system automatically creates a queue service and enables the auto scaling feature for the service.

Call the service

After you deploy the service, you can call the service to generate AI portraits.

When you call a service, you must set the taskType parameter to query to specify that the request is an inference request. For more information, see the “Call the service” section in the Overview topic. Sample code:

import json
from eas_prediction import QueueClient

# Create a client for the input queue, which receives input data.
input_queue = QueueClient('182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'photog_check')
input_queue.set_token('<token>')
input_queue.init()

datas = json.dumps(
    {
        'request_id': 12345,
        'images': ["xx.jpg", "xx.jpg"],  # A list of image URLs.
        'configure': {
            'face_reconize': True,  # Specify whether to verify that all images show the same person.
        }
    }
)
# Set the taskType parameter to query to mark the request as an inference request.
tags = {"taskType": "query"}
# datas is already a JSON string, so it can be passed to put() directly.
index, request_id = input_queue.put(datas, tags)
print(index, request_id)

# View details about the input queue.
attrs = input_queue.attributes()
print(attrs)
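
After a request is processed, the result is written to the output queue (sink) of the service. The following is a minimal sketch of subscribing to the output queue with the same eas_prediction client. The endpoint and token are the placeholders from the example above, and the photog_check/sink queue name is an assumption based on the default sink queue naming for asynchronous services.

from eas_prediction import QueueClient

# Create a client for the output queue (sink) of the service.
# 'photog_check/sink' is an assumed name based on the default sink queue naming.
sink_queue = QueueClient('182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'photog_check/sink')
sink_queue.set_token('<token>')
sink_queue.init()

# Subscribe starting from index 0 with a window of 5 uncommitted results.
watcher = sink_queue.watch(0, 5, auto_commit=False)
for result in watcher.run():
    print(result.index, result.data.decode('utf-8'))
    # Commit the index so that the processed result is removed from the queue.
    sink_queue.commit(result.index)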
