TensorFlow Serving is an inference serving engine for deep learning models. It lets you deploy TensorFlow models in the SavedModel format as online services and supports features such as model version management and rolling updates. This topic describes how to deploy a model service by using a TensorFlow Serving image.
Before you begin
Model files
To deploy a model service by using a TensorFlow Serving image, store your model files in an Object Storage Service (OSS) bucket with the following structure:
Version sub-directories: Each model directory must contain at least one version sub-directory. The name of a version sub-directory must be a number that indicates the model version. A larger number indicates a later model version.
Model files: Model files in the SavedModel format are stored within a version sub-directory. A model service automatically loads the model files from the sub-directory that corresponds to the latest version.
Take the following steps:
Create a model storage directory in an OSS bucket, for example, oss://examplebucket/models/tf_serving/. For more information, see Manage directories.
Upload the model files to the directory that you created in the previous step. You can use tf_serving.zip as a sample. The model storage directory is organized as follows:
tf_serving
├── modelA
│   └── 1
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index
├── modelB
│   ├── 1
│   │   └── ...
│   └── 2
│       └── ...
└── modelC
    ├── 1
    │   └── ...
    ├── 2
    │   └── ...
    └── 3
        └── ...
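For reference, the following sketch shows one way to export a Keras model in the SavedModel format into a numbered version sub-directory before you upload it to OSS. The model definition is only a placeholder; replace it with your own trained model.

import tensorflow as tf

# Placeholder model used only to illustrate the export layout;
# replace it with your own trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Export to a numbered version sub-directory, for example tf_serving/modelA/1,
# and then upload the directory to the OSS path created in the previous step.
tf.saved_model.save(model, "tf_serving/modelA/1")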
Model configuration file
A configuration file allows you to serve multiple models within a single service. If you deploy a single-model service, skip this step.
Create a configuration file based on the following instructions and upload it to OSS. The sample in the Model files section includes a model configuration file named model_config.pbtxt, which you can use or modify as needed. In this example, the model configuration file is uploaded to oss://examplebucket/models/tf_serving/.
The model configuration file, model_config.pbtxt, should contain the following:
model_config_list {
  config {
    name: 'modelA'
    base_path: '/models/modelA/'
    model_platform: 'tensorflow'
    model_version_policy {
      all: {}
    }
  }
  config {
    name: 'modelB'
    base_path: '/models/modelB/'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    version_labels {
      key: 'stable'
      value: 1
    }
    version_labels {
      key: 'canary'
      value: 2
    }
  }
  config {
    name: 'modelC'
    base_path: '/models/modelC/'
    model_platform: 'tensorflow'
    model_version_policy {
      latest {
        num_versions: 2
      }
    }
  }
}
The following table describes the key parameters:
Parameter | Required | Description
name | No | The name of the model. We recommend that you specify this parameter. Otherwise, you cannot call the service later.
base_path | Yes | The path of the model directory within the service instance, which is used to read model files. For example, if the mount path is /models, the base_path of modelA is /models/modelA/.
model_version_policy | No | The policy for loading model versions. In the preceding example, all: {} loads all versions, specific loads only the listed versions, and latest with num_versions: 2 loads the two latest versions. If you do not specify this parameter, only the latest version is loaded.
version_labels | No | Custom labels for model versions. Without version_labels, a model version can be requested only by its number: <service_url>/v1/models/<model_name>/versions/<version_num>:predict. If version_labels are set, you can use a label that points to a specific version number: <service_url>/v1/models/<model_name>/labels/<version_label>:predict. Note: By default, labels can be assigned only to model versions that are already loaded and serving. To assign labels to unloaded model versions, add --allow_version_labels_for_unavailable_models=true to the Command to Run.
Deploy the service
You can use one of the following methods to deploy a TensorFlow Serving model service.
Scenario-based deployment: suitable for basic deployment scenarios. You only need to configure a few parameters to deploy a TensorFlow Serving model service.
Custom deployment: suitable for model services that need to run in specific environments. You can configure settings based on your business requirements.
TensorFlow Serving model services support ports 8501 and 8500:
8501: launches an HTTP/REST server to receive HTTP requests.
8500: launches a Google Remote Procedure Call (gRPC) server to receive gRPC requests.
By default, scenario-based deployment uses port 8501 and the port cannot be changed. To use port 8500 for gRPC services, use custom deployment.
The example below deploys modelA as a single-model service.
Scenario-based deployment
Perform the following steps:
Log on to the PAI console. Select a region and a workspace. Then, click Enter Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section of the page that appears, click TensorFlow Serving Deployment.
On the TFServing deployment page, configure the key parameters described in the following table. For information about other parameters, see Deploy a model service in the PAI console.
Parameter
Description
Deployment Method
Supported deployment methods include:
Standard Model Deployment: This method is for deploying services that use a single model.
Configuration File Deployment: This approach is for deploying services that incorporate multiple models.
Model Settings
When you choose Standard Model Deployment as the Deployment Method, specify the OSS path that contains the model files.
When you choose Configuration File Deployment as the Deployment Method, configure the following parameters:
OSS: Choose the OSS path where the model files are stored.
Mount Path: Specifies the destination path within the service instance for accessing model files.
Configuration File: Select the OSS path for the model configuration file.
Example configurations:
Parameter | Single-model example (deploy modelA) | Multi-model example
Service Name | modela_scene | multi_scene
Deployment Method | Standard Model Deployment | Configuration File Deployment
Model Settings | OSS: oss://examplebucket/models/tf_serving/modelA/ | OSS: oss://examplebucket/models/tf_serving/; Mount Path: /models; Configuration File: oss://examplebucket/models/tf_serving/model_config.pbtxt
Click Deploy.
Custom deployment
Perform the following steps:
Log on to the PAI console. Select a region and a workspace. Then, click Enter Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.
On the Custom Deployment page, configure the key parameters described in the following table. For information about other parameters, see Deploy a model service in the PAI console.
Parameter
Description
Image Configuration
Select a tensorflow-serving image version from Alibaba Cloud Image. We recommend that you use the latest version.
Note: If the model service requires GPU resources, the image version must be in the x.xx.x-gpu format.
Model Settings
You can configure model files using multiple methods. This example uses OSS.
OSS: Choose the OSS path where the model files are stored.
Mount Path: The path within the service instance for reading model files.
Run Command
The startup parameters of tensorflow-serving. When you select the tensorflow-serving image, the command /usr/bin/tf_serving_entrypoint.sh is automatically filled in. Configure the following parameters:
Startup parameters for single-model deployment:
--model_name: The name of the model, which is used in the service request URL. Default value: model.
--model_base_path: The path of the model directory within the service instance. Default value: /models/model.
Startup parameters for multi-model deployment:
--model_config_file: Required. The path of the model configuration file.
--model_config_file_poll_wait_seconds: Optional. The interval at which the service checks for updates to the model configuration file, in seconds. For example, --model_config_file_poll_wait_seconds=30 makes the service check the file every 30 seconds.
Note: When a new configuration file is detected, only the changes in the new file are applied. For example, if Model A is removed from the new file and Model B is added, the service unloads Model A and loads Model B.
--allow_version_labels_for_unavailable_models: Optional. Default value: false. Set this parameter to true to assign custom labels to unloaded model versions. For example, --allow_version_labels_for_unavailable_models=true.
Example configurations:
Parameter | Single-model example (deploy modelA) | Multi-model example
Deployment Method | Image-based Deployment | Image-based Deployment
Image Configuration | Alibaba Cloud Image: tensorflow-serving:2.14.1 | Alibaba Cloud Image: tensorflow-serving:2.14.1
Model Settings | OSS: oss://examplebucket/models/tf_serving/; Mount Path: /models | OSS: oss://examplebucket/models/tf_serving/; Mount Path: /models
Run Command | /usr/bin/tf_serving_entrypoint.sh --model_name=modelA --model_base_path=/models/modelA | /usr/bin/tf_serving_entrypoint.sh --model_config_file=/models/model_config.pbtxt --model_config_file_poll_wait_seconds=30 --allow_version_labels_for_unavailable_models=true
The default port number is 8501. The model service launches an HTTP or REST server on port 8501 to receive HTTP requests. If you want the service to support gRPC requests, perform the following operations:
In Environment Information, change the Port Number to 8500.
In Service Configuration, Enable gRPC.
In Edit Service Configuration, add the following configuration:
"networking": { "path": "/" }
Click Deploy.
Call a model service
You can send HTTP or gRPC requests to a model service based on the port number that you configure when you deploy the model service. The example below sends requests to modelA.
Prepare test data
ModelA is an image classification model trained on the Fashion-MNIST dataset, which consists of 28x28 grayscale images. The model predicts the probability that a sample belongs to each of ten categories. In this example, the test data for modelA requests is [[[[1.0]] * 28] * 28].
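As a quick sketch, this nested list expression expands to a single 28x28x1 sample, which is wrapped in a batch and serialized as the request body:

import json

# One 28 x 28 x 1 grayscale sample filled with 1.0.
instance = [[[1.0]] * 28] * 28
payload = {
    "signature_name": "serving_default",
    "instances": [instance],   # a batch that contains a single sample
}
# The serialized payload is what the HTTP examples below send as the request body.
print(json.dumps(payload)[:80] + " ...")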
Sample requests:
HTTP request
The service is configured to listen on port 8501 for HTTP requests. Below is a summary of the HTTP request paths for both single-model and multi-model deployments:
Single-model deployment
Path format: <service_url>/v1/models/<model_name>:predict
Where:
For scenario-based deployment, <model_name> is fixed to model.
For custom deployment, <model_name> is the model name set in the Command to Run. Default value: model.
Multi-model deployment
The service accepts requests with or without a specified model version. The path formats are as follows:
Request without a version (the latest version is loaded automatically): <service_url>/v1/models/<model_name>:predict
Request with a specific model version: <service_url>/v1/models/<model_name>/versions/<version_num>:predict
Request with a version label (if version labels are configured): <service_url>/v1/models/<model_name>/labels/<version_label>:predict
Here, <model_name> is the model name set in the model configuration file.
The <service_url> is the endpoint of your service. To view it, go to the Elastic Algorithm Service (EAS) page, click Invocation Method in the Service Type column of the desired service. View the endpoint on the Public Endpoint tab. This URL is pre-filled when using the console for online debugging.
For example, the HTTP request path for a scenario-based single-model deployment of modelA is <service_url>/v1/models/model:predict.
The following examples show how to send requests by using the console and Python code:
Use the console
After the service is deployed, click Online Debugging in the Actions column. The <service_url> is pre-filled in the Request Parameter Online Tuning section. Append the path
/v1/models/model:predict
to the URL and enter the request data in the Body:{"signature_name": "serving_default", "instances": [[[[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]]]]}
After you set the parameters, click Send Request to view the prediction result.
Python code
Sample Python code:
from urllib import request
import json

# Replace the following values with your service endpoint and token.
# You can view the token on the Public Endpoint tab by clicking Invocation Method
# in the Service Type column of the service list.
service_url = '<service_url>'
token = '<test-token>'
# For scenario-based single-model deployment, use model. For other cases, refer to the path description above.
model_name = "model"
url = "{}/v1/models/{}:predict".format(service_url, model_name)

# Create an HTTP request.
req = request.Request(url, method="POST")
req.add_header('authorization', token)
data = {
    'signature_name': 'serving_default',
    'instances': [[[[1.0]] * 28] * 28]
}
# Send the request.
response = request.urlopen(req, data=json.dumps(data).encode('utf-8')).read()
# View the response.
response = json.loads(response)
print(response)
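If you deploy the multi-model service with the sample model_config.pbtxt, the same pattern can target a specific version or version label. The following sketch is illustrative only: it assumes that modelB accepts the same 28x28 input as modelA, which may not hold for your own models.

from urllib import request
import json

service_url = '<service_url>'   # same endpoint as in the previous example
token = '<test-token>'

# Request version 2 of modelB directly, or through the 'canary' label that
# the sample model_config.pbtxt maps to version 2.
url_by_version = service_url + "/v1/models/modelB/versions/2:predict"
url_by_label = service_url + "/v1/models/modelB/labels/canary:predict"

# The request body must match modelB's serving signature. The sample data from
# modelA is reused here purely for illustration.
data = {'signature_name': 'serving_default', 'instances': [[[[1.0]] * 28] * 28]}
body = json.dumps(data).encode('utf-8')

for url in (url_by_version, url_by_label):
    req = request.Request(url, method="POST")
    req.add_header('authorization', token)
    print(request.urlopen(req, data=body).read())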
gRPC request
For gRPC requests, deploy the service with port 8500 and the gRPC-related settings described in the Custom deployment section. Below is sample Python code for sending gRPC requests:
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# The endpoint of the service. For more information, see the description of the host parameter below.
host = "tf-serving-multi-grpc-test.166233998075****.cn-hangzhou.pai-eas.aliyuncs.com:80"
# Replace <test-token> with the token of the service. You can view the token on the Public Endpoint tab.
token = "<test-token>"
# The name of the model. For more information, see the description of the name parameter below.
name = "<model_name>"
signature_name = "serving_default"
# The model version that you want to use. You can specify only one model version in a request.
version = "<version_num>"

# Create a gRPC request. The version must resolve to an integer.
request = predict_pb2.PredictRequest()
request.model_spec.name = name
request.model_spec.signature_name = signature_name
request.model_spec.version.value = int(version)
request.inputs["keras_tensor"].CopyFrom(tf.make_tensor_proto([[[[1.0]] * 28] * 28]))

# Send the request.
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
metadata = (("authorization", token),)
response, _ = stub.Predict.with_call(request, metadata=metadata)
print(response)
The following table details the primary parameters:
Parameter | Description
host | The endpoint of the model service without the http:// prefix and with the :80 suffix. To obtain the endpoint, go to the Elastic Algorithm Service (EAS) page, find the model service, and click Invocation Method in the Service Type column. On the Public Endpoint tab, view the endpoint of the model service.
name | For single-model gRPC requests: in scenario-based deployments, set name to model; in custom deployments, set name to the model name specified in the Command to Run (default: model). For multi-model gRPC requests: set name to the model name defined in the model configuration file.
version | The model version that you want to use. You can specify only one model version in a request.
metadata | The token of the model service. You can view the token on the Public Endpoint tab.
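Continuing the preceding gRPC example, the response is a PredictResponse protocol buffer. The following sketch shows one way to read the returned tensor; the output key depends on your model's serving signature, so the key is looked up from the response rather than assumed.

import tensorflow as tf

# response comes from the gRPC sample above.
# Inspect the available output keys; the actual key name depends on the model's serving signature.
print(list(response.outputs.keys()))

output_key = list(response.outputs.keys())[0]
scores = tf.make_ndarray(response.outputs[output_key])
print(scores.shape)            # for the sample model: (1, 10) class probabilities
print(scores.argmax(axis=-1))  # index of the most likely category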
References
For information about how to deploy a model service by using a Triton Inference Server image, see Use a Triton Inference Server image to deploy a model service.
You can create a custom image to deploy a model service in EAS. For more information, see Deploy a model service by using a custom image.