Platform for AI: Built-in processors

Last Updated: Apr 30, 2024

A processor is a package of online prediction logic. Elastic Algorithm Service (EAS) of Platform for AI (PAI) provides built-in processors, which are commonly used to deploy models. The built-in processors can help you reduce the costs of developing the online prediction logic of models.

The following table describes the names and codes of the processors provided by EAS. If you use the EASCMD client to deploy a model, a processor code is required.

Processor name | Processor code, CPU edition | Processor code, GPU edition | Reference
--- | --- | --- | ---
PMML | pmml | None | PMML processor
TensorFlow1.12 | tensorflow_cpu_1.12 | tensorflow_gpu_1.12 | TensorFlow1.12 Processor
TensorFlow1.14 | tensorflow_cpu_1.14 | tensorflow_gpu_1.14 | TensorFlow1.14 Processor
TensorFlow1.15 | tensorflow_cpu_1.15 | tensorflow_gpu_1.15 | TensorFlow1.15 processor with a built-in optimization engine based on PAI-Blade of the agility edition
TensorFlow2.3 | tensorflow_cpu_2.3 | None | TensorFlow2.3 Processor
PyTorch1.6 | pytorch_cpu_1.6 | pytorch_gpu_1.6 | PyTorch1.6 processor with a built-in optimization engine based on PAI-Blade of the agility edition
Caffe | caffe_cpu | caffe_gpu | Caffe Processor
Parameter server algorithm | parameter_sever | None | PS processor
Alink | alink_pai_processor | None | None
xNN | xnn_cpu | None | None
EasyVision | easy_vision_cpu_tf1.12_torch151 | easy_vision_gpu_tf1.12_torch151 | EasyVision Processor
EasyTransfer | easytransfer_cpu | easytransfer_gpu | EasyTransfer Processor
EasyNLP | easynlp | easynlp | EasyNLP Processor
EasyCV | easycv | easycv | EasyCV Processor
Blade | blade_cpu | blade_cuda10.0_beta | None
MediaFlow | None | mediaflow | MediaFlow Processor
Triton | None | triton | Triton Processor

PMML processor

The built-in Predictive Model Markup Language (PMML) processor in EAS performs the following operations:

  • Loads a model service from a PMML file.

  • Processes requests that are sent to call the model service.

  • Uses the model to calculate the request results and returns the results to clients.

The PMML processor provides a default policy to fill in missing values. If the isMissing policy is not specified for the feature columns in the PMML file, the values in the following table are automatically used.

Data type | Default input value
--- | ---
BOOLEAN | false
DOUBLE | 0.0
FLOAT | 0.0
INT | 0
STRING | ""

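The default policy above can be illustrated with a short client-side sketch. This is not the processor's implementation; it only mirrors the table of defaults. The `fill_missing` function and the `schema` mapping (feature name to PMML data type) are hypothetical names introduced for this example.

```python
# Default values taken from the table above, keyed by PMML data type.
PMML_DEFAULTS = {
    "BOOLEAN": False,
    "DOUBLE": 0.0,
    "FLOAT": 0.0,
    "INT": 0,
    "STRING": "",
}


def fill_missing(record, schema):
    """Return a copy of record in which absent or None features get defaults.

    schema maps each feature name to its PMML data type (hypothetical layout).
    """
    filled = dict(record)
    for name, data_type in schema.items():
        if filled.get(name) is None:
            filled[name] = PMML_DEFAULTS[data_type]
    return filled


schema = {"age": "INT", "income": "DOUBLE", "is_member": "BOOLEAN", "city": "STRING"}
print(fill_missing({"age": 42, "income": None}, schema))
# {'age': 42, 'income': 0.0, 'is_member': False, 'city': ''}
```

Filling values explicitly on the client side can make it easier to see which inputs a prediction was actually based on.
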
You can deploy a model from a PMML file by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to PMML. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to pmml. Each CPU is allocated 4 GB of memory. One CPU and 4 GB of memory are considered one quota. Sample code:

    {
      "processor": "pmml",
      "generate_token": "true",
      "model_path": "http://xxxxx/lr.pmml",
      "name": "eas_lr_example",
      "metadata": {
        "instance": 1,
        "cpu": 1
      }
    }
  • Use Data Science Workshop (DSW) to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.

TensorFlow1.12 Processor

The TensorFlow1.12 processor of EAS can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. Before you can deploy the model, you must convert a Keras or Checkpoint model to a SavedModel model. For more information, see Export TensorFlow models in the SavedModel format.

Note

The general-purpose processor does not support custom TensorFlow operations.

You can deploy a TensorFlow model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to TensorFlow1.12. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.12 or tensorflow_gpu_1.12 based on the model resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. Sample code:

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.12",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.
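Because a processor code that does not match the resource type causes a deployment error, it can help to generate service.json from a script and check the combination first. The following is a minimal sketch, not an official tool: `build_service_config` is a hypothetical helper, and the rule it applies (a `_gpu_` processor code needs at least one GPU, a `_cpu_` code needs none) is an assumption made for illustration. The actual validation is performed by EAS.

```python
import json


def build_service_config(name, model_path, processor, cpu=1, gpu=0,
                         memory=2000, instance=1):
    """Build a service.json dictionary for a TensorFlow processor.

    The mismatch check is a local assumption for illustration only.
    """
    uses_gpu_processor = "_gpu_" in processor
    if uses_gpu_processor and gpu < 1:
        raise ValueError("a GPU processor code requires at least one GPU")
    if not uses_gpu_processor and gpu > 0:
        raise ValueError("a CPU processor code cannot be combined with GPUs")
    return {
        "name": name,
        "generate_token": "true",
        "model_path": model_path,
        "processor": processor,
        "metadata": {"instance": instance, "cpu": cpu, "gpu": gpu, "memory": memory},
    }


config = build_service_config(
    "tf_serving_test",
    "http://xxxxx/savedmodel_example.zip",
    "tensorflow_cpu_1.12",
)
print(json.dumps(config, indent=2))
```

The printed JSON has the same fields as the sample configuration above and can be saved as service.json.
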

TensorFlow1.14 Processor

The TensorFlow1.14 processor of EAS can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. Before you can deploy the model, you must convert a Keras or Checkpoint model to a SavedModel model. For more information, see Export TensorFlow models in the SavedModel format.

Note

The general-purpose processor does not support custom TensorFlow operations.

You can deploy a TensorFlow model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to TensorFlow1.14. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.14 or tensorflow_gpu_1.14 based on the model resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. Sample code:

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.14",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.

TensorFlow1.15 processor with a built-in optimization engine based on PAI-Blade of the agility edition

The TensorFlow1.15 processor of EAS can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. Before you can deploy the model, you must convert a Keras or Checkpoint model to a SavedModel model. For more information, see Export TensorFlow models in the SavedModel format.

Note
  • The general-purpose processor does not support custom TensorFlow operations.

  • The TensorFlow1.15 processor provides a built-in optimization engine based on PAI-Blade of the agility edition. You can use this processor to deploy TensorFlow models that are optimized by PAI-Blade of the agility edition.

You can deploy a TensorFlow model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to TensorFlow1.15. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_1.15 or tensorflow_gpu_1.15 based on the model resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. Sample code:

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_1.15",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW. For more information about the parameters in the service configuration file, see Create a service.

TensorFlow2.3 Processor

The TensorFlow2.3 processor of EAS can load TensorFlow models in the SavedModel or SessionBundle format. We recommend that you use the SavedModel format. Before you can deploy the model, you must convert a Keras or Checkpoint model to a SavedModel model. For more information, see Export TensorFlow models in the SavedModel format.

Note

The general-purpose processor does not support custom TensorFlow operations.

You can deploy a TensorFlow model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to TensorFlow2.3. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to tensorflow_cpu_2.3. Sample code:

    {
      "name": "tf_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/savedmodel_example.zip",
      "processor": "tensorflow_cpu_2.3",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.

PyTorch1.6 processor with a built-in optimization engine based on PAI-Blade of the agility edition

The PyTorch1.6 processor of EAS can load models in the TorchScript format. For more information, see TorchScript.

Note
  • The general-purpose processor does not support PyTorch extensions. You cannot use this processor to import or export models other than TorchScript models.

  • The PyTorch1.6 processor provides a built-in optimization engine based on PAI-Blade of the agility edition. You can use this processor to deploy PyTorch models that are optimized by PAI-Blade of the agility edition.

You can deploy a TorchScript model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to PyTorch1.6. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to pytorch_cpu_1.6 or pytorch_gpu_1.6 based on the model resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. Sample code:

    {
      "name": "pytorch_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/torchscript_model.pt",
      "processor": "pytorch_gpu_1.6",
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 1,
        "cuda": "10.0",
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW. For more information about the parameters in the service configuration file, see Create a service.

Caffe Processor

The Caffe processor of EAS can load deep learning models that are trained based on the Caffe framework. Because the Caffe framework is flexible, you must specify the names of the model file and the weight file in the model package when you deploy a Caffe model.

Note

The general-purpose processor does not support custom data layers.

You can deploy a Caffe model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to Caffe. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to caffe_cpu or caffe_gpu based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. Sample code:

    {
      "name": "caffe_serving_test",
      "generate_token": "true",
      "model_path": "http://xxxxx/caffe_model.zip",
      "processor": "caffe_cpu",
      "model_config": {
        "model": "deploy.prototxt",
        "weight": "bvlc_reference_caffenet.caffemodel"
      },
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "gpu": 0,
        "memory": 2000
      }
    }
  • Use DSW to deploy the model

    Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.

PS processor

The PS processor of EAS is developed based on PS algorithms. The processor can load models in the PS format.

The following section describes how to deploy a PS model as a service and how to send requests to the PS model service.

  • You can deploy a PS model by using one of the following methods:

    • Upload the model file in the console

      Set the Processor Type parameter to PS Algorithm. For more information, see Upload and deploy models in the console.

    • Use the EASCMD client to deploy the model

      In the service.json service configuration file, set the processor parameter to parameter_sever. Sample code:

      {
        "name":"ps_smart",
        "model_path": "oss://examplebucket/xlab_m_pai_ps_smart_b_1058272_v0.tar.gz",
        "processor": "parameter_sever",
        "metadata": {
          "region": "beijing",
          "cpu": 1,
          "instance": 1,
          "memory": 2048
        }
      }
    • Use DSW to deploy the model

      Modify the service.json service configuration file. This method is similar to the method of deploying models by using the EASCMD client. For more information, see Use DSW.

  • Request description

    You can send a single request or multiple requests at the same time to the PS model service. Both methods use the same request syntax: a JSON array of feature objects, in which each feature object has the same structure.

    • Sample syntax for sending a single request

      curl "http://eas.location/api/predict/ps_smart" -d '[
                  {
                      "f0": 1,
                      "f1": 0.2,
                      "f3": 0.5
                  }
      ]'
    • Sample syntax for sending multiple requests at the same time

      curl "http://eas.location/api/predict/ps_smart" -d '[
              {
                  "f0": 1,
                  "f1": 0.2,
                  "f3": 0.5
              },
              {
                  "f0": 1,
                  "f1": 0.2,
                  "f3": 0.5
              }
      ]'
    • Responses

      The two methods also have the same response syntax. The response is a JSON array of result objects that have the same structure and appear in the same order as the feature objects in the request.

      [
        {
          "label":"xxxx",
          "score" : 0.2,
          "details" : [{"k1":0.3}, {"k2":0.5}]
        },
        {
          "label":"xxxx",
          "score" : 0.2,
          "details" : [{"k1":0.3}, {"k2":0.5}]
        }
      ]
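Because the response objects follow the order of the request objects, a client can pair them by position. The following minimal sketch shows one way to do this. The helper names `build_request_body` and `pair_results` are hypothetical, and `sample_response` is illustrative text shaped like the sample response above, not real service output. No network call is made.

```python
import json


def build_request_body(feature_rows):
    """Serialize one or more feature objects as a JSON array."""
    return json.dumps(list(feature_rows))


def pair_results(feature_rows, response_text):
    """Pair each feature object with the result at the same position."""
    results = json.loads(response_text)
    if len(results) != len(feature_rows):
        raise ValueError("response length does not match request length")
    return list(zip(feature_rows, results))


rows = [
    {"f0": 1, "f1": 0.2, "f3": 0.5},
    {"f0": 1, "f1": 0.2, "f3": 0.5},
]
body = build_request_body(rows)

# Illustrative response text that follows the sample response format.
sample_response = (
    '[{"label": "xxxx", "score": 0.2, "details": [{"k1": 0.3}, {"k2": 0.5}]},'
    ' {"label": "xxxx", "score": 0.2, "details": [{"k1": 0.3}, {"k2": 0.5}]}]'
)
for features, result in pair_results(rows, sample_response):
    print(features, "->", result["label"], result["score"])
```

A single request uses the same helpers with a list that contains one feature object.
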

EasyTransfer Processor

The EasyTransfer processor of EAS can load TensorFlow-based deep learning natural language processing (NLP) models that are trained based on the EasyTransfer framework.

You can deploy an EasyTransfer model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to EasyTransfer. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to easytransfer_cpu or easytransfer_gpu based on the model resources. If the value of the processor parameter does not match the resource type, a deployment error occurs. Set the type field of the model_config parameter to the model type that you want to use. In the following examples, a text classification model is used. For more information about other parameters, see Create a service.

    • Deploy the model on a GPU node (the public resource group is used in this example)

      {
        "name": "et_app_demo",
        "metadata": {
          "instance": 1
        },
        "cloud": {
          "computing": {
            "instance_type": "ecs.gn6i-c4g1.xlarge"
          }
        },
        "model_path": "http://xxxxx/your_model.zip",
        "processor": "easytransfer_gpu",
        "model_config": {
          "type": "text_classify_bert"
        }
      }
    • Deploy the model on a CPU node

      {
        "name": "et_app_demo",
        "model_path": "http://xxxxx/your_model.zip",
        "processor": "easytransfer_cpu",
        "model_config": {
          "type":"text_classify_bert"
        },
        "metadata": {
          "instance": 1,
          "cpu": 1,
          "memory": 4000
        }
      }

    The following table lists the supported model types.

    Job type | Model type
    --- | ---
    Text matching | text_match_bert
    Text classification | text_classify_bert
    Sequence labeling | sequence_labeling_bert
    Text vectorization | vectorization_bert
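When service.json is generated by a script, the supported model types listed above can be kept in a lookup table so that a typo in the type field is caught before deployment. The following is a minimal sketch under that assumption. `easytransfer_model_config` is a hypothetical helper and is not part of EAS or EasyTransfer.

```python
# Job types and model types copied from the table above.
EASYTRANSFER_MODEL_TYPES = {
    "text matching": "text_match_bert",
    "text classification": "text_classify_bert",
    "sequence labeling": "sequence_labeling_bert",
    "text vectorization": "vectorization_bert",
}


def easytransfer_model_config(job_type):
    """Return the model_config section for a supported job type."""
    key = job_type.strip().lower()
    if key not in EASYTRANSFER_MODEL_TYPES:
        raise ValueError("unsupported job type: " + job_type)
    return {"type": EASYTRANSFER_MODEL_TYPES[key]}


print(easytransfer_model_config("Text classification"))
# {'type': 'text_classify_bert'}
```
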

EasyNLP Processor

The EasyNLP processor of EAS can load PyTorch-based deep learning NLP models that are trained based on the EasyNLP framework.

You can deploy an EasyNLP model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to EasyNLP. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to easynlp. Set the type field of the model_config parameter to the model type that you want to use. In the following example, a single-label text classification model is used. For more information about other parameters, see Create a service.

    {
      "name": "easynlp_app_demo",
      "metadata": {
        "instance": 1
      },
      "cloud": {
        "computing": {
          "instance_type": "ecs.gn6i-c4g1.xlarge"
        }
      },
      "model_config": {
        "app_name": "text_classify",
        "type": "text_classify"
      },
      "model_path": "http://xxxxx/your_model.tar.gz",
      "processor": "easynlp"
    }

    The following table lists the supported model types.

    Job type | Model type
    --- | ---
    Single-label text classification | text_classify
    Multi-label text classification | text_classify_multi
    Text matching | text_match
    Sequence labeling | sequence_labeling
    Text vectorization | vectorization
    Summary generation for Chinese text (GPU) | sequence_generation_zh
    Summary generation for English text (GPU) | sequence_generation_en
    Machine reading comprehension for Chinese text | machine_reading_comprehension_zh
    Machine reading comprehension for English text | machine_reading_comprehension_en
    WUKONG_CLIP (GPU) | wukong_clip
    CLIP (GPU) | clip

After the model service is deployed to EAS, go to the Elastic Algorithm Service (EAS) page, find the service and click Invocation Method in the Service Type column to obtain the service endpoint and token. The following sample code provides a sample Python request that is used to call the service:

import requests
# Replace the value with the endpoint of the service. 
url = '<eas-service-url>'
# Replace the value with the token that you obtained. 
token = '<eas-service-token>'
# Generate the prediction data. In the following example, text classification is used. 
request_body = {
    "first_sequence": "hello"
}
 
headers = {"Authorization": token}
resp = requests.post(url=url, headers=headers, json=request_body)
print(resp.content.decode())

EasyCV Processor

The EasyCV processor of EAS can load deep learning models that are trained based on the EasyCV framework.

You can deploy an EasyCV model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to EasyCV. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to easycv. Set the type field of the model_config parameter to the model type that you want to use. In the following example, an image classification model is used. For more information about other parameters, see Create a service.

    {
      "name": "easycv_classification_example",
      "processor": "easycv",
      "model_path": "oss://examplebucket/epoch_10_export.pt",
      "model_config": {"type":"TorchClassifier"},
      "metadata": {
        "instance": 1
      },
      "cloud": {
        "computing": {
          "instance_type": "ecs.gn5i-c4g1.xlarge"
        }
      }
    }

    The following table lists the supported model types.

    Job type | model_config
    --- | ---
    Image classification | {"type":"TorchClassifier"}
    Object detection | {"type":"DetectionPredictor"}
    Semantic segmentation | {"type":"SegmentationPredictor"}
    YOLOX | {"type":"YoloXPredictor"}
    Video classification | {"type":"VideoClassificationPredictor"}

After the model service is deployed to EAS, go to the Elastic Algorithm Service (EAS) page, find the service and click Invocation Method in the Service Type column to obtain the service endpoint and token. The following sample code provides a sample Python request that is used to call the service:

import requests
import base64
import json
resp = requests.get('http://exmaplebucket.oss-cn-zhangjiakou.aliyuncs.com/images/000000123213.jpg')
ENCODING = 'utf-8'
datas = json.dumps( {
            "image": base64.b64encode(resp.content).decode(ENCODING)
            })
# Replace the value with the token that you obtained. 
head = {
   "Authorization": "NTFmNDJlM2E4OTRjMzc3OWY0NzI3MTg5MzZmNGQ5Yj***"
}
for x in range(0, 10):
    # Replace the value with the endpoint of the service.
    resp = requests.post("http://150231884461***.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/test_easycv_classification_example", data=datas, headers=head)
    print(resp.text)

You must convert the images and video files to the Base64 format for transmission. Use image to specify image data and video to specify video data.
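The Base64 requirement can be wrapped in a small helper that accepts raw bytes and selects the image or video field. The following is a minimal sketch. `build_easycv_payload` is a hypothetical name, and the bytes used in the example are a stand-in for real file content that you would read from disk.

```python
import base64
import json


def build_easycv_payload(raw_bytes, kind="image"):
    """Return a JSON request body with Base64-encoded image or video data."""
    if kind not in ("image", "video"):
        raise ValueError("kind must be 'image' or 'video'")
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return json.dumps({kind: encoded})


# Stand-in bytes; in practice, read them with open(path, "rb").read().
payload = build_easycv_payload(b"abc", kind="image")
print(payload)
# {"image": "YWJj"}
```

The returned string can be passed as the request body in the same way as the datas variable in the sample request above.
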

EasyVision Processor

The EasyVision processor of EAS can load deep learning models that are trained based on the EasyVision framework.

You can deploy an EasyVision model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to EasyVision. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy the model

    In the service.json service configuration file, set the processor parameter to easy_vision_cpu_tf1.12_torch151 or easy_vision_gpu_tf1.12_torch151 based on the model resources. If the value of the processor parameter does not match the type of resource, a deployment error occurs. Set the type parameter in the model_config section to the type of the model that is trained. The following code block shows an example. For more information about other parameters, see Create a service.

    • Deploy the model on a GPU node

      {
        "name": "ev_app_demo",
        "processor": "easy_vision_gpu_tf1.12_torch151",
        "model_path": "oss://path/to/your/model",
        "model_config": "{\"type\":\"classifier\"}",
        "metadata": {
          "resource": "your_resource_name",
          "cuda": "9.0",
          "instance": 1,
          "memory": 4000,
          "gpu": 1,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }
    • Deploy the model on a CPU node

      {
        "name": "ev_app_cpu_demo",
        "processor": "easy_vision_cpu_tf1.12_torch151",
        "model_path": "oss://path/to/your/model",
        "model_config": "{\"type\":\"classifier\"}",
        "metadata": {
          "resource": "your_resource_name",
          "instance": 1,
          "memory": 4000,
          "gpu": 0,
          "cpu": 4,
          "rpc.worker_threads" : 5
        }
      }

MediaFlow Processor

The MediaFlow processor of EAS is a general-purpose orchestration engine that can analyze and process video, audio, and images.

You can deploy a MediaFlow model by using one of the following methods:

  • Upload the model file in the console

    Set the Processor Type parameter to MediaFlow. For more information, see Upload and deploy models in the console.

  • Use the EASCMD client to deploy models

    In the service.json service configuration file, set the processor parameter to mediaflow. If you use the MediaFlow processor to deploy models, you must configure the following parameters. For more information about other parameters, see Create a service.

    • graph_pool_size: the number of graph pools.

    • worker_threads: the number of worker threads.

    Sample code:

    • Deploy a model for video classification

      {
        "model_entry": "video_classification/video_classification_ext.js",
        "name": "video_classification",
        "model_path": "oss://path/to/your/model",
        "generate_token": "true",
        "processor": "mediaflow",
        "model_config": {
          "graph_pool_size": 8,
          "worker_threads": 16
        },
        "metadata": {
          "eas.handlers.disable_failure_handler": true,
          "resource": "your_resource_name",
          "rpc.worker_threads": 30,
          "rpc.enable_jemalloc": true,
          "rpc.keepalive": 500000,
          "cpu": 4,
          "instance": 1,
          "cuda": "9.0",
          "rpc.max_batch_size": 64,
          "memory": 10000,
          "gpu": 1
        }
      }
    • Deploy a model for automated speech recognition (ASR)

      {
        "model_entry": "asr/video_asr_ext.js",
        "name": "video_asr",
        "model_path": "oss://path/to/your/model",
        "generate_token": "true",
        "processor": "mediaflow",
        "model_config": {
          "graph_pool_size": 8,
          "worker_threads": 16
        },
        "metadata": {
          "eas.handlers.disable_failure_handler": true,
          "resource": "your_resource_name",
          "rpc.worker_threads": 30,
          "rpc.enable_jemalloc": true,
          "rpc.keepalive": 500000,
          "cpu": 4,
          "instance": 1,
          "cuda": "9.0",
          "rpc.max_batch_size": 64,
          "memory": 10000,
          "gpu": 1
        }
      }

    In the service.json service configuration file, the values of the model_entry, name, and model_path parameters for video classification and ASR vary. You must configure the parameters based on the purpose of the model.
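Because the two sample configurations differ only in model_entry, name, and model_path, they can be generated from one shared template. The following is a minimal sketch of that idea. `SHARED_CONFIG` and `mediaflow_config` are hypothetical names, and the values are copied from the samples above.

```python
import copy
import json

# Settings that are identical in both sample configurations.
SHARED_CONFIG = {
    "generate_token": "true",
    "processor": "mediaflow",
    "model_config": {"graph_pool_size": 8, "worker_threads": 16},
    "metadata": {
        "eas.handlers.disable_failure_handler": True,
        "resource": "your_resource_name",
        "rpc.worker_threads": 30,
        "rpc.enable_jemalloc": True,
        "rpc.keepalive": 500000,
        "cpu": 4,
        "instance": 1,
        "cuda": "9.0",
        "rpc.max_batch_size": 64,
        "memory": 10000,
        "gpu": 1,
    },
}


def mediaflow_config(name, model_entry, model_path):
    """Return a full configuration with the purpose-specific fields filled in."""
    config = copy.deepcopy(SHARED_CONFIG)  # Avoid sharing nested dictionaries.
    config.update({"name": name, "model_entry": model_entry, "model_path": model_path})
    return config


video_cls = mediaflow_config(
    "video_classification",
    "video_classification/video_classification_ext.js",
    "oss://path/to/your/model",
)
asr = mediaflow_config("video_asr", "asr/video_asr_ext.js", "oss://path/to/your/model")
print(json.dumps(asr, indent=2))
```

The deep copy keeps each generated configuration independent, so changing the metadata of one service does not affect the other.
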

Triton Processor

Triton Inference Server is a new-generation online service framework released by NVIDIA. Triton Inference Server simplifies the deployment and management of GPU-accelerated models and complies with the API standards of KFServing. Triton Inference Server provides the following features:

  • Supports multiple open source frameworks such as TensorFlow, PyTorch, ONNX Runtime, TensorRT, and custom framework backends.

  • Concurrently runs multiple models on one GPU to maximize GPU utilization.

  • Supports the HTTP and gRPC protocols and allows you to send requests in binary format to reduce the request size.

  • Supports the dynamic batching feature to improve service throughput.

EAS provides a built-in Triton processor.

Note
  • The Triton processor is available for public preview only in the China (Shanghai) region.

  • The models that are deployed by using the Triton processor must be stored in Object Storage Service (OSS). Therefore, you must activate OSS and upload model files to OSS before you can use the Triton processor to deploy models. For information about how to upload objects to OSS, see Upload objects.

The following section describes how to use the Triton processor to deploy a model as a service and call the service:

  • Use the Triton processor to deploy a model

    You can use the Triton processor to deploy models only by using the EASCMD client. For information about how to use the EASCMD client to deploy models, see Create a service. In the service.json service configuration file, set the processor parameter to triton. To ensure that the Triton processor can obtain model files from OSS, you must set the parameters related to OSS. The following sample code shows how to modify the service.json service configuration file:

    {
      "name": "triton_test",
      "processor": "triton",
      "processor_params": [
        "--model-repository=oss://triton-model-repo/models",
        "--allow-http=true"
      ],
      "metadata": {
        "instance": 1,
        "cpu": 4,
        "gpu": 1,
        "memory": 10000,
        "resource":"<your resource id>"
      }
    }

    The following table describes the parameters that are required if you use the Triton processor to deploy models. For more information about other parameters, see Run commands to use the EASCMD client.

    Parameter | Description
    --- | ---
    processor_params | The parameters that you want to pass to Triton Inference Server when the deployment starts. Parameters that are not supported are automatically filtered out by Triton Inference Server. Table 1 describes the parameters that can be passed to Triton Inference Server. The model-repository parameter is required. For more information about optional parameters, see main.cc.
    oss_endpoint | The endpoint of OSS. If you do not specify an endpoint, the system automatically uses the OSS service in the region where the EAS service is deployed. If you want to use an OSS service that is activated in another region, you must configure this parameter. For information about the valid values of this parameter, see Regions and endpoints.
    resource (in the metadata section) | The ID of the dedicated resource group that is used to deploy the model in EAS. If you want to deploy a model by using the Triton processor, the resources that are used must belong to a dedicated resource group in EAS. For information about how to create a dedicated resource group in EAS, see Work with dedicated resource groups.

    Table 1. Parameters that can be passed to Triton Inference Server

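    Parameter | Required | Description
    --- | --- | ---
    model-repository | Yes | The OSS path of the model. You must set the model-repository parameter to a subdirectory of an OSS bucket instead of the root directory of the OSS bucket. For example, you can set the parameter to oss://triton-model-repo/models. triton-model-repo is the name of the OSS bucket, and models is a subdirectory of the OSS bucket.
    log-verbose | No | For more information, see main.cc.
    log-info | No | For more information, see main.cc.
    log-warning | No | For more information, see main.cc.
    log-error | No | For more information, see main.cc.
    exit-on-error | No | For more information, see main.cc.
    strict-model-config | No | For more information, see main.cc.
    strict-readiness | No | For more information, see main.cc.
    allow-http | No | For more information, see main.cc.
    http-thread-count | No | For more information, see main.cc.
    pinned-memory-pool-byte-size | No | For more information, see main.cc.
    cuda-memory-pool-byte-size | No | For more information, see main.cc.
    min-supported-compute-capability | No | For more information, see main.cc.
    buffer-manager-thread-count | No | For more information, see main.cc.
    backend-config | No | For more information, see main.cc.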

  • Use the official Triton client to call the service that is deployed by using the Triton processor

    Before you use the Triton client for Python to call the deployed service, run the following commands to install the official Triton client:

    pip3 install nvidia-pyindex
    pip3 install tritonclient[all]

    Run the following command to download a test image to the current directory:

    wget http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/doc-assets/cat.png

    The following sample code shows how the Triton client for Python sends a request in binary format to the service that is deployed by using the Triton processor:

    import numpy as np
    import time
    from PIL import Image
    
    import tritonclient.http as httpclient
    from tritonclient.utils import InferenceServerException
    
    URL = "<service url>" # Replace <service url> with the endpoint of the deployed service. 
    HEADERS = {"Authorization": "<service token>"} # Replace <service token> with the token that is used to access the service. 
    input_img = httpclient.InferInput("input", [1, 299, 299, 3], "FP32")
    img = Image.open('./cat.png').resize((299, 299))
    img = np.asarray(img).astype('float32') / 255.0
    input_img.set_data_from_numpy(img.reshape([1, 299, 299, 3]), binary_data=True)
    
    output = httpclient.InferRequestedOutput(
        "InceptionV3/Predictions/Softmax", binary_data=True
    )
    triton_client = httpclient.InferenceServerClient(url=URL, verbose=False)
    
    start = time.time()
    for i in range(10):
        results = triton_client.infer(
            "inception_graphdef", inputs=[input_img], outputs=[output], headers=HEADERS
        )
        res_body = results.get_response()
        elapsed_ms = (time.time() - start) * 1000
        if i == 0:
            print("model name: ", res_body["model_name"])
            print("model version: ", res_body["model_version"])
            print("output name: ", res_body["outputs"][0]["name"])
            print("output shape: ", res_body["outputs"][0]["shape"])
        print("[{}] Avg rt(ms): {:.2f}".format(i, elapsed_ms))
        start = time.time()
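On the deployment side, the rule in Table 1 that model-repository must point to a subdirectory of an OSS bucket, not the bucket root, can be checked locally before service.json is written. The following is a minimal sketch under that assumption. `check_model_repository` and `triton_processor_params` are hypothetical helpers, and the check only inspects the path string. It does not contact OSS.

```python
from urllib.parse import urlparse


def check_model_repository(path):
    """Return (bucket, subdirectory) or raise if the path is not usable."""
    parsed = urlparse(path)
    if parsed.scheme != "oss" or not parsed.netloc:
        raise ValueError("model-repository must be an oss:// path")
    subdirectory = parsed.path.strip("/")
    if not subdirectory:
        raise ValueError("model-repository must be a subdirectory, not the bucket root")
    return parsed.netloc, subdirectory


def triton_processor_params(model_repository, **options):
    """Build the processor_params list; option names use underscores."""
    check_model_repository(model_repository)
    params = ["--model-repository=" + model_repository]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append("--{}={}".format(key.replace("_", "-"), value))
    return params


print(triton_processor_params("oss://triton-model-repo/models", allow_http=True))
# ['--model-repository=oss://triton-model-repo/models', '--allow-http=true']
```
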