
Platform for AI: TensorFlow processor

Last Updated: Apr 30, 2024

Elastic Algorithm Service (EAS) provides the built-in TensorFlow processor to deploy TensorFlow models in the standard SavedModel format as online services. This topic describes how to deploy and call a TensorFlow model service.

Background information

If your model is a Keras model or a checkpoint model, you must first convert it to the SavedModel format before you can deploy it. For more information, see Export TensorFlow models in the SavedModel format. Models optimized by Blade can be deployed directly.
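
For reference, the following is a minimal TensorFlow 2.x sketch of exporting a Keras model to the SavedModel format. The model and the export directory are placeholders; see Export TensorFlow models in the SavedModel format for the authoritative procedure for your model and TensorFlow version.

import tensorflow as tf

# A placeholder Keras model. Replace it with your own model or load an existing one.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,))
])

# Export the model to a SavedModel directory that can be packaged and deployed to EAS.
tf.saved_model.save(model, 'export/savedmodel/1')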

TensorFlow processor versions

The TensorFlow processor is available in multiple versions that support CPU and GPU devices. If you do not have specific requirements, we recommend that you use the latest version when you deploy services. Newer versions of TensorFlow provide better performance and are backward compatible with the features of earlier versions. The following table lists the processor name that corresponds to each TensorFlow version.

Processor name       | TensorFlow version | GPU support
---------------------|--------------------|------------
tensorflow_cpu_1.12  | TensorFlow 1.12    | No
tensorflow_cpu_1.14  | TensorFlow 1.14    | No
tensorflow_cpu_1.15  | TensorFlow 1.15    | No
tensorflow_cpu_2.3   | TensorFlow 2.3     | No
tensorflow_cpu_2.4   | TensorFlow 2.4     | No
tensorflow_cpu_2.7   | TensorFlow 2.7     | No
tensorflow_gpu_1.12  | TensorFlow 1.12    | Yes
tensorflow_gpu_1.14  | TensorFlow 1.14    | Yes
tensorflow_gpu_1.15  | TensorFlow 1.15    | Yes
tensorflow_gpu_2.4   | TensorFlow 2.4     | Yes
tensorflow_gpu_2.7   | TensorFlow 2.7     | Yes

Step 1: Deploy a model service

  1. Optional: Configure the warm-up request file.

    For some TensorFlow model services, the model files and parameters must be loaded into memory the first time the service is called. This process can take a long time, which causes the first few requests to respond more slowly than expected and may return 408 timeout errors or 450 errors. To ensure that the service does not experience jitter during rollovers, add the relevant parameters to warm up the model during service deployment. This way, service instances can receive traffic normally after the warm-up is complete. For more information, see Warm up model services.
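
    The warm-up request file is a serialized request that matches the inputs of your model. The following is a minimal sketch of how such a file might be generated with the PAI SDK for Python. It assumes an input named images with the shape [1, 784] and assumes that TFRequest.to_string() returns the serialized request; see Warm up model services for the authoritative procedure.

    from eas_prediction import TFRequest

    # Build a request that matches the model inputs (assumed here: images, [1, 784], float).
    req = TFRequest('predict_images')
    req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)

    # Write the serialized request to a file, then upload the file to OSS and reference it
    # through the warm_up_data_path parameter during deployment.
    with open('warm_up_test.bin', 'wb') as fw:
        fw.write(req.to_string())  # to_string() is assumed to return the serialized request bytes.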

  2. Deploy the service.

    When you use the EASCMD client to deploy a TensorFlow model service, you must set the processor parameter to the processor name. The following code block shows an example:

    {
      "name": "tf_serving_test",
      "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/model.tar.gz",
      "processor": "tensorflow_cpu_1.15",
      "warm_up_data_path":"oss://path/to/warm_up_test.bin", // The path of the warm-up request file in Object Storage Service (OSS).     
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "memory": 4000
      }
    }

    For more information about how to use the EASCMD client to deploy model services, see Deploy model services by using EASCMD or DSW.

    You can also deploy TensorFlow model services in the Machine Learning Platform for AI (PAI) console. For more information, see Model service deployment by using the PAI console.

  3. After you deploy the TensorFlow model service, you can obtain the public and virtual private cloud (VPC) endpoints of the service and the token that is used for service authentication. To do so, go to the Elastic Algorithm Service (EAS) page, find the service that you want to call, and then click Invocation Method in the Service Type column.

Step 2: Call the model service

Both the input and the output of a TensorFlow service are in the Protocol Buffers format rather than plain text. Because the online debugging feature in the PAI console supports only plain text, you cannot use this feature for TensorFlow services.

EAS provides PAI SDKs in multiple languages to package the request and response data of a service. The SDKs also implement direct connection and fault tolerance logic. We recommend that you use the SDKs to construct and send requests.

  1. Query the structure of the model.

    If you send an empty request to a model in the standard SavedModel format, the model structure information is returned in the JSON format.

    // Send an empty request. 
    $ curl 1828488879222***.cn-shanghai.pai-eas.aliyuncs.com/api/predict/mnist_saved_model_example -H 'Authorization: YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1***'
    
    // The following model structure information is returned. 
    {
        "inputs": [
            {
                "name": "images",
                "shape": [
                    -1,
                    784
                ],
                "type": "DT_FLOAT"
            }
        ],
        "outputs": [
            {
                "name": "scores",
                "shape": [
                    -1,
                    10
                ],
                "type": "DT_FLOAT"
            }
        ],
        "signature_name": "predict_images"
    }               
    Note

    The structure information of frozen pb models cannot be obtained.
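
    The same query can also be issued from Python instead of curl. The following is a minimal sketch that assumes the requests library is installed and reuses the example endpoint and token from the curl command above.

    import requests

    url = 'http://1828488879222***.cn-shanghai.pai-eas.aliyuncs.com/api/predict/mnist_saved_model_example'
    headers = {'Authorization': 'YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1***'}

    # An empty request returns the model structure information in the JSON format.
    resp = requests.get(url, headers=headers)
    print(resp.json())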

  2. Send an inference request.

    The following code block shows an example of how to use the PAI SDK for Python to send an inference request.

    #!/usr/bin/env python
    
    from eas_prediction import PredictClient
    from eas_prediction import TFRequest
    
    if __name__ == '__main__':
        # Create a client by specifying the service endpoint and the service name.
        client = PredictClient('http://1828488879222***.cn-shanghai.pai-eas.aliyuncs.com', 'mnist_saved_model_example')
        # Set the token that is used for service authentication.
        client.set_token('YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1****')
        client.init()
    
        # Construct a request for the predict_images signature and feed a 1 x 784
        # float tensor to the input named images.
        req = TFRequest('predict_images')
        req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
        # Send requests in a loop and print the responses.
        for x in range(0, 1000000):
            resp = client.predict(req)
            print(resp)

    For more information about parameter configurations, see SDK for Python.

You can also construct service requests on your own. For more information, see Request syntax.

Request syntax

You must ensure that the input and output data of the TensorFlow processor is in the Protocol Buffers format. When you use the PAI SDK to send a request, the SDK packages the request. You only need to create the request based on the functions provided by the SDK. If you want to create custom service call logic, you can generate the request code based on the following syntax. For more information, see Construct a request for a TensorFlow service.

syntax = "proto3";
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "PredictProtos";
enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;
  // Data types that all computation devices are expected to be
  // capable to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex.
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8.
  DT_QUINT8 = 12;    // Quantized uint8.
  DT_QINT32 = 13;    // Quantized int32.
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16.
  DT_QUINT16 = 16;   // Quantized uint16.
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex.
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types.
}
// Dimensions of an array.
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}
// Protocol buffer representing an array.
message ArrayProto {
  // Data Type.
  ArrayDataType dtype = 1;
  // Shape of the array.
  ArrayShape array_shape = 2;
  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];
  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];
  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];
  // DT_STRING.
  repeated bytes string_val = 6;
  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];
  // DT_BOOL.
  repeated bool bool_val = 8 [packed = true];
}
// PredictRequest specifies which TensorFlow model to run, as well as
// how inputs are mapped to tensors and how outputs are filtered before
// returning to user.
message PredictRequest {
  // A named signature to evaluate. If unspecified, the default signature
  // will be used.
  string signature_name = 1;
  // Input tensors.
  // Names of input tensor are alias names. The mapping from aliases to real
  // input tensor names is expected to be stored as named generic signature
  // under the key "inputs" in the model export.
  // Each alias listed in a generic signature named "inputs" should be provided
  // exactly once in order to run the prediction.
  map<string, ArrayProto> inputs = 2;
  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is expected to be stored as named generic signature under
  // the key "outputs" in the model export.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}
// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  map<string, ArrayProto> outputs = 1;
}
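
As an illustration of the syntax above, the following sketch constructs a request without the PAI SDK. It assumes that the definition has been saved as predict.proto and compiled with protoc --python_out=. predict.proto into a hypothetical predict_pb2 module, and it reuses the example endpoint, token, and model signature from the previous steps.

import requests
import predict_pb2  # Hypothetical module generated from the definition above.

# Build a PredictRequest for the predict_images signature with a 1 x 784 float input.
request = predict_pb2.PredictRequest()
request.signature_name = 'predict_images'
request.inputs['images'].dtype = predict_pb2.DT_FLOAT
request.inputs['images'].array_shape.dim.extend([1, 784])
request.inputs['images'].float_val.extend([1.0] * 784)

# Send the serialized request and parse the serialized PredictResponse.
resp = requests.post(
    'http://1828488879222***.cn-shanghai.pai-eas.aliyuncs.com/api/predict/mnist_saved_model_example',
    headers={'Authorization': 'YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1***'},
    data=request.SerializeToString())

response = predict_pb2.PredictResponse()
response.ParseFromString(resp.content)
print(response.outputs['scores'].float_val)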