TensorFlow Serving is an inference serving engine for deep learning models. It lets you deploy TensorFlow models in the SavedModel format as online services and supports features such as model version management and rolling updates. This topic describes how to deploy a model service by using a TensorFlow Serving image.
Before you begin
Model files
To deploy a model service by using a TensorFlow Serving image, store your model files in an Object Storage Service (OSS) bucket with the following structure:
Version sub-directories: Each model directory must contain at least one version sub-directory. The name of a version sub-directory must be a number that indicates the model version. A larger number indicates a later model version.
Model files: Model files in the SavedModel format are stored within a version sub-directory. A model service automatically loads the model files from the sub-directory that corresponds to the latest version.
Take the following steps:
Create a model storage directory in an OSS bucket, for example, oss://examplebucket/models/tf_serving/. For more information, see Manage directories.
Upload the model files to the directory that you created in the previous step. You can use tf_serving.zip as a sample. The model storage directory is organized as follows:
tf_serving
├── modelA
│   └── 1
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index
├── modelB
│   ├── 1
│   │   └── ...
│   └── 2
│       └── ...
└── modelC
    ├── 1
    │   └── ...
    ├── 2
    │   └── ...
    └── 3
        └── ...
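For reference, the following sketch shows one way to export a Keras model in the SavedModel format into a numbered version sub-directory before you upload it to OSS. The model definition is only a placeholder; replace it with your own trained model.

import tensorflow as tf

# Placeholder model used only to illustrate the export layout;
# replace it with your own trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Export to a numbered version sub-directory, for example tf_serving/modelA/1,
# and then upload the directory to the OSS path created in the previous step.
tf.saved_model.save(model, "tf_serving/modelA/1")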
Model configuration file
A configuration file allows you to serve multiple models within a single service. If you deploy a single-model service, skip this step.
Create a configuration file based on the following instructions and upload it to OSS. The sample in the Model files section includes a model configuration file named model_config.pbtxt, which you can use or modify as needed. In this example, the model configuration file is uploaded to oss://examplebucket/models/tf_serving/.
The model configuration file, model_config.pbtxt, should contain the following:
model_config_list {
  config {
    name: 'modelA'
    base_path: '/models/modelA/'
    model_platform: 'tensorflow'
    model_version_policy {
      all: {}
    }
  }
  config {
    name: 'modelB'
    base_path: '/models/modelB/'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    version_labels {
      key: 'stable'
      value: 1
    }
    version_labels {
      key: 'canary'
      value: 2
    }
  }
  config {
    name: 'modelC'
    base_path: '/models/modelC/'
    model_platform: 'tensorflow'
    model_version_policy {
      latest {
        num_versions: 2
      }
    }
  }
}
The following table describes the key parameters:
Parameter | Required | Description
name | No | The name of the model. We recommend that you specify this parameter. Otherwise, you cannot call the service later.
base_path | Yes | The path of the model directory within the service instance, which is used to read model files. For example, if the mount path is /models, the base_path of modelA is /models/modelA/.
model_version_policy | No | The policy for loading model versions. In the preceding example, all: {} loads all versions, specific loads only the listed versions, and latest with num_versions: 2 loads the two latest versions. If you do not specify this parameter, only the latest version is loaded.
version_labels | No | Custom labels for model versions. Without version_labels, a model version can be requested only by its number: <service_url>/v1/models/<model_name>/versions/<version_num>:predict. If version_labels are set, you can use a label that points to a specific version number: <service_url>/v1/models/<model_name>/labels/<version_label>:predict. Note: By default, labels can be assigned only to model versions that are already loaded and serving. To assign labels to unloaded model versions, add --allow_version_labels_for_unavailable_models=true to the Command to Run.
Deploy the service
You can use one of the following methods to deploy a TensorFlow Serving model service.
Scenario-based deployment: suitable for basic deployment scenarios. You only need to configure a few parameters to deploy a TensorFlow Serving model service.
Custom deployment: suitable for model services that need to run in specific environments. You can configure settings based on your business requirements.
TensorFlow Serving model services support ports 8501 and 8500:
8501: launches an HTTP/REST server to receive HTTP requests.
8500: launches a Google Remote Procedure Call (gRPC) server to receive gRPC requests.
By default, scenario-based deployment uses port 8501 and the port cannot be changed. To use port 8500 for gRPC services, use custom deployment.
The example below deploys modelA as a single-model service.
Scenario-based deployment
Perform the following steps:
Log on to the PAI console. Select a region and a workspace. Then, click Enter Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Scenario-based Model Deployment section of the page that appears, click TensorFlow Serving Deployment.
On the TFServing deployment page, configure the key parameters described in the following table. For information about other parameters, see Deploy a model service in the PAI console.
Parameter
Description
Deployment Method
Supported deployment methods include:
Standard Model Deployment: This method is for deploying services that use a single model.
Configuration File Deployment: This approach is for deploying services that incorporate multiple models.
Model Settings
When you choose Standard Model Deployment as the Deployment Method, specify the OSS path that contains the model files.
When you choose Configuration File Deployment as the Deployment Method, configure the following parameters:
OSS: Choose the OSS path where the model files are stored.
Mount Path: Specifies the destination path within the service instance for accessing model files.
Configuration File: Select the OSS path for the model configuration file.
Example configurations:
Parameter | Single-model example (deploy modelA) | Multi-model example
Service Name | modela_scene | multi_scene
Deployment Method | Standard Model Deployment | Configuration File Deployment
Model Settings | OSS: oss://examplebucket/models/tf_serving/modelA/ | OSS: oss://examplebucket/models/tf_serving/; Mount Path: /models; Configuration File: oss://examplebucket/models/tf_serving/model_config.pbtxt
Click Deploy.
Custom deployment
Perform the following steps:
Log on to the PAI console. Select a region and a workspace. Then, click Enter Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. In the Custom Model Deployment section, click Custom Deployment.
On the Custom Deployment page, configure the key parameters described in the following table. For information about other parameters, see Deploy a model service in the PAI console.
Parameter
Description
Image Configuration
Select a tensorflow-serving image version from Alibaba Cloud Image. We recommend that you use the latest version.
Note: If the model service requires GPU resources, the image version must be in the x.xx.x-gpu format.
Model Settings
You can configure model files using multiple methods. This example uses OSS.
OSS: Choose the OSS path where the model files are stored.
Mount Path: The path within the service instance for reading model files.
Run Command
The startup parameters of tensorflow-serving. When you select the tensorflow-serving image, the command /usr/bin/tf_serving_entrypoint.sh is automatically filled in. Configure the following parameters:
Startup parameters for single-model deployment:
--model_name: The name of the model, which is used in the service request URL. Default value: model.
--model_base_path: The path of the model directory within the service instance. Default value: /models/model.
Startup parameters for multi-model deployment:
--model_config_file: Required. The path of the model configuration file.
--model_config_file_poll_wait_seconds: Optional. The interval at which the service checks for updates to the model configuration file, in seconds. For example, --model_config_file_poll_wait_seconds=30 makes the service check the file every 30 seconds.
Note: When a new configuration file is detected, only the changes in the new file are applied. For example, if Model A is removed from the new file and Model B is added, the service unloads Model A and loads Model B.
--allow_version_labels_for_unavailable_models: Optional. Default value: false. Set this parameter to true to assign custom labels to unloaded model versions. For example, --allow_version_labels_for_unavailable_models=true.
Example configurations:
Parameter | Single-model example (deploy modelA) | Multi-model example
Deployment Method | Image-based Deployment | Image-based Deployment
Image Configuration | Alibaba Cloud Image: tensorflow-serving:2.14.1 | Alibaba Cloud Image: tensorflow-serving:2.14.1
Model Settings | OSS: oss://examplebucket/models/tf_serving/; Mount Path: /models | OSS: oss://examplebucket/models/tf_serving/; Mount Path: /models
Run Command | /usr/bin/tf_serving_entrypoint.sh --model_name=modelA --model_base_path=/models/modelA | /usr/bin/tf_serving_entrypoint.sh --model_config_file=/models/model_config.pbtxt --model_config_file_poll_wait_seconds=30 --allow_version_labels_for_unavailable_models=true
The default port number is 8501. The model service launches an HTTP or REST server on port 8501 to receive HTTP requests. If you want the service to support gRPC requests, perform the following operations:
In Environment Information, change the Port Number to 8500.
In Service Configuration, Enable gRPC.
In Edit Service Configuration, add the following configuration:
"networking": { "path": "/" }
Click Deploy.
Call a model service
You can send HTTP or gRPC requests to a model service based on the port number that you configure when you deploy the model service. The example below sends requests to modelA.
Prepare test data
ModelA is an image classification model trained on the Fashion-MNIST dataset, which consists of 28x28 grayscale images. The model predicts the probability that a sample belongs to each of ten categories. In this example, the test data for modelA requests is [[[[1.0]] * 28] * 28].
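As a quick sketch, this nested list expression expands to a single 28x28x1 sample, which is wrapped in a batch and serialized as the request body:

import json

# One 28 x 28 x 1 grayscale sample filled with 1.0.
instance = [[[1.0]] * 28] * 28
payload = {
    "signature_name": "serving_default",
    "instances": [instance],   # a batch that contains a single sample
}
# The serialized payload is what the HTTP examples below send as the request body.
print(json.dumps(payload)[:80] + " ...")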
Sample requests:
HTTP request
The service is configured to listen on port 8501 for HTTP requests. Below is a summary of the HTTP request paths for both single-model and multi-model deployments:
Single-model deployment
Path format: <service_url>/v1/models/<model_name>:predict
Where:
For scenario-based deployment, <model_name> is fixed to model.
For custom deployment, <model_name> is the model name set in the Command to Run. Default value: model.
Multi-model deployment
The service accepts requests with or without a specified model version. The path formats are as follows:
Request without a version (the latest version is loaded automatically): <service_url>/v1/models/<model_name>:predict
Request with a specific model version: <service_url>/v1/models/<model_name>/versions/<version_num>:predict
Request with a version label (if version labels are configured): <service_url>/v1/models/<model_name>/labels/<version_label>:predict
Here, <model_name> is the model name set in the model configuration file.
The <service_url> is the endpoint of your service. To view it, go to the Elastic Algorithm Service (EAS) page, click Invocation Method in the Service Type column of the desired service. View the endpoint on the Public Endpoint tab. This URL is pre-filled when using the console for online debugging.
For example, the HTTP request path for a scenario-based single-model deployment of modelA is <service_url>/v1/models/model:predict.
The following examples show how to send requests by using the console and Python code:
Use the console
After the service is deployed, click Online Debugging in the Actions column. The <service_url> is pre-filled in the Request Parameter Online Tuning section. Append the path
/v1/models/model:predict
to the URL and enter the request data in the Body:{"signature_name": "serving_default", "instances": [[[[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], 
[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]], [[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0]]]]}
After you set the parameters, click Send Request to view the prediction result.
Python code
Sample Python code:
from urllib import request
import json

# Replace the following values with your service endpoint and token.
# You can view the token on the Public Endpoint tab by clicking Invocation Method
# in the Service Type column of the service list.
service_url = '<service_url>'
token = '<test-token>'
# For scenario-based single-model deployment, use model. For other cases, refer to the path description above.
model_name = "model"
url = "{}/v1/models/{}:predict".format(service_url, model_name)

# Create an HTTP request.
req = request.Request(url, method="POST")
req.add_header('authorization', token)
data = {
    'signature_name': 'serving_default',
    'instances': [[[[1.0]] * 28] * 28]
}
# Send the request.
response = request.urlopen(req, data=json.dumps(data).encode('utf-8')).read()
# View the response.
response = json.loads(response)
print(response)
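If you deploy the multi-model service with the sample model_config.pbtxt, the same pattern can target a specific version or version label. The following sketch is illustrative only: it assumes that modelB accepts the same 28x28 input as modelA, which may not hold for your own models.

from urllib import request
import json

service_url = '<service_url>'   # same endpoint as in the previous example
token = '<test-token>'

# Request version 2 of modelB directly, or through the 'canary' label that
# the sample model_config.pbtxt maps to version 2.
url_by_version = service_url + "/v1/models/modelB/versions/2:predict"
url_by_label = service_url + "/v1/models/modelB/labels/canary:predict"

# The request body must match modelB's serving signature. The sample data from
# modelA is reused here purely for illustration.
data = {'signature_name': 'serving_default', 'instances': [[[[1.0]] * 28] * 28]}
body = json.dumps(data).encode('utf-8')

for url in (url_by_version, url_by_label):
    req = request.Request(url, method="POST")
    req.add_header('authorization', token)
    print(request.urlopen(req, data=body).read())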
gRPC request
For gRPC requests, deploy the service with port 8500 and the gRPC-related settings described in the Custom deployment section. Below is sample Python code for sending gRPC requests:
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# The endpoint of the service. For more information, see the description of the host parameter below.
host = "tf-serving-multi-grpc-test.166233998075****.cn-hangzhou.pai-eas.aliyuncs.com:80"
# Replace <test-token> with the token of the service. You can view the token on the Public Endpoint tab.
token = "<test-token>"
# The name of the model. For more information, see the description of the name parameter below.
name = "<model_name>"
signature_name = "serving_default"
# The model version that you want to use. You can specify only one model version in a request.
version = "<version_num>"

# Create a gRPC request. The version must resolve to an integer.
request = predict_pb2.PredictRequest()
request.model_spec.name = name
request.model_spec.signature_name = signature_name
request.model_spec.version.value = int(version)
request.inputs["keras_tensor"].CopyFrom(tf.make_tensor_proto([[[[1.0]] * 28] * 28]))

# Send the request.
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
metadata = (("authorization", token),)
response, _ = stub.Predict.with_call(request, metadata=metadata)
print(response)
The following table details the primary parameters:
Parameter | Description
host | The endpoint of the model service without the http:// prefix and with the :80 suffix. To obtain the endpoint, go to the Elastic Algorithm Service (EAS) page, find the model service, and click Invocation Method in the Service Type column. On the Public Endpoint tab, view the endpoint of the model service.
name | For single-model gRPC requests: in scenario-based deployments, set name to model; in custom deployments, set name to the model name specified in the Command to Run (default: model). For multi-model gRPC requests: set name to the model name defined in the model configuration file.
version | The model version that you want to use. You can specify only one model version in a request.
metadata | The token of the model service. You can view the token on the Public Endpoint tab.
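Continuing the preceding gRPC example, the response is a PredictResponse protocol buffer. The following sketch shows one way to read the returned tensor; the output key depends on your model's serving signature, so the key is looked up from the response rather than assumed.

import tensorflow as tf

# response comes from the gRPC sample above.
# Inspect the available output keys; the actual key name depends on the model's serving signature.
print(list(response.outputs.keys()))

output_key = list(response.outputs.keys())[0]
scores = tf.make_ndarray(response.outputs[output_key])
print(scores.shape)            # for the sample model: (1, 10) class probabilities
print(scores.argmax(axis=-1))  # index of the most likely category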
References
For information about how to deploy a model service by using a Triton Inference Server image, see Use a Triton Inference Server image to deploy a model service.
You can create a custom image to deploy a model service in EAS. For more information, see Deploy a model service by using a custom image.