PyTorch - Platform For AI

The built-in PyTorch Processor in EAS deploys models in PyTorch's standard TorchScript format as online services. This topic describes how to deploy and call a PyTorch model service.

PyTorch Processor versions

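The PyTorch Processor is available for several PyTorch versions, in both CPU and GPU variants. The following table lists the Processor name for each version.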

Processor name	PyTorch version	GPU support
pytorch_cpu_1.6	PyTorch 1.6	No
pytorch_cpu_1.7	PyTorch 1.7	No
pytorch_cpu_1.9	PyTorch 1.9	No
pytorch_cpu_1.10	PyTorch 1.10	No
pytorch_gpu_1.6	PyTorch 1.6	Yes
pytorch_gpu_1.7	PyTorch 1.7	Yes
pytorch_gpu_1.9	PyTorch 1.9	Yes
pytorch_gpu_1.10	PyTorch 1.10	Yes
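
The Processor names in the table follow a regular pattern: pytorch_, then cpu or gpu, then the PyTorch version. As a minimal sketch, a deployment script could derive the name from that pattern and reject versions the table does not list. The function name processor_name and the SUPPORTED_VERSIONS tuple are hypothetical helpers for illustration, not part of EAS or its SDK.

```python
# PyTorch versions copied from the table above.
SUPPORTED_VERSIONS = ("1.6", "1.7", "1.9", "1.10")


def processor_name(version: str, use_gpu: bool) -> str:
    """Return the Processor name for a PyTorch version and device type.

    Hypothetical helper: it only reproduces the naming pattern shown in
    the table and does not query EAS.
    """
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"no built-in Processor listed for PyTorch {version}")
    device = "gpu" if use_gpu else "cpu"
    return f"pytorch_{device}_{version}"


print(processor_name("1.10", use_gpu=True))
```

Passing the version as a string matters here: a float would turn 1.10 into 1.1.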

Step 1: Deploy the service

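When you deploy a PyTorch model service with the eascmd client, set the processor parameter to one of the Processor names listed above. The following is a sample service configuration file.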

{
  "name": "pytorch_resnet_example",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/resnet18.pt",
  "processor": "pytorch_cpu_1.6",
  "metadata": {
    "cpu": 1,
    "instance": 1,
    "memory": 1000
  }
}
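
If the configuration is produced by a script rather than written by hand, it can be built as a dictionary and serialized with the standard json module. The sketch below reproduces the sample configuration above. The function build_service_config and its default values are assumptions made for illustration; only the keys and values come from the sample.

```python
import json


def build_service_config(name, model_path, processor,
                         cpu=1, instance=1, memory=1000):
    """Build a service configuration with the same keys as the sample.

    Hypothetical helper: the defaults mirror the sample values and are
    not recommendations.
    """
    return {
        "name": name,
        "model_path": model_path,
        "processor": processor,
        "metadata": {"cpu": cpu, "instance": instance, "memory": memory},
    }


config = build_service_config(
    name="pytorch_resnet_example",
    model_path="http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/resnet18.pt",
    processor="pytorch_cpu_1.6",
)

# Serialize to JSON text; write this string to a file to use it with eascmd.
config_text = json.dumps(config, indent=2)
print(config_text)
```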

For details on deploying a service with the client tool, see Service deployment: EASCMD & DSW.

You can also deploy a PyTorch model service from the console. For details, see Service deployment: console.

Step 2: Call the service

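A PyTorch service uses ProtoBuf, not plain text, for its input and output. Online debugging currently supports only plain-text input and output, so you cannot use the online debugging feature of the console with this service.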

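EAS provides SDKs in several versions. They wrap the request and response data and include built-in mechanisms for direct connection and fault tolerance, so we recommend that you use an SDK to build and send requests. The following is a sample inference request.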

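#!/usr/bin/env python

from eas_prediction import PredictClient
from eas_prediction import TorchRequest

if __name__ == '__main__':
    # Endpoint and name of the deployed service.
    client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'pytorch_gpu_wl')
    client.init()

    req = TorchRequest()
    # Input 0: a float tensor of shape [1, 3, 224, 224], passed as a flat
    # list of 1 * 3 * 224 * 224 = 150528 values.
    req.add_feed(0, [1, 3, 224, 224], TorchRequest.DT_FLOAT, [1] * 150528)
    # req.add_fetch(0)
    # Send the same request 10 times and print the shape of output 0.
    for _ in range(10):
        resp = client.predict(req)
        print(resp.get_tensor_shape(0))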

For descriptions of the parameters and methods used in the code, see the Python SDK usage instructions.
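
In the sample request, add_feed receives the shape [1, 3, 224, 224] together with a flat list of 150528 values, which is exactly 1 × 3 × 224 × 224. A small check like the one below, run before building the request, can catch a mismatch between the shape and the data on the client side. The helper check_feed is hypothetical and not part of the SDK. The sketch assumes, as the sample suggests, that the content list must hold one value per tensor element.

```python
import math


def check_feed(shape, content):
    """Return the element count, or raise if content does not match shape.

    Hypothetical client-side check; the SDK may or may not validate this
    itself.
    """
    expected = math.prod(shape)  # 1 * 3 * 224 * 224 = 150528 for the sample
    if len(content) != expected:
        raise ValueError(
            f"shape {shape} needs {expected} values, got {len(content)}"
        )
    return expected


shape = [1, 3, 224, 224]
content = [1.0] * math.prod(shape)
n_values = check_feed(shape, content)
print(n_values)
```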

You can also build service requests yourself. For details, see Request format.

Request format

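The PyTorch Processor uses the ProtoBuf format for input and output. When you send requests with an SDK, the SDK wraps the request, so you only need to build it with the functions that the SDK provides. To build service requests yourself, generate the required code from the following pb definition. For details, see Constructing TensorFlow service requests.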

syntax = "proto3";

package pytorch.eas;
option cc_enable_arenas = true;

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set
  DT_INVALID = 0;

  // Data types that all computation devices are expected to be
  // capable to support
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
}

// Dimensions of an array
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Protocol buffer representing an array
message ArrayProto {
  // Data Type
  ArrayDataType dtype = 1;

  // Shape of the array.
  ArrayShape array_shape = 2;

  // DT_FLOAT
  repeated float float_val = 3 [packed = true];

  // DT_DOUBLE
  repeated double double_val = 4 [packed = true];

  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];

  // DT_STRING
  repeated bytes string_val = 6;

  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];
}

message PredictRequest {
  // Input tensors.
  repeated ArrayProto inputs = 1;

  // Output filter.
  repeated int32 output_filter = 2;
}

// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  repeated ArrayProto outputs = 1;
}
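
To make the layout of these messages concrete, the following pure-Python sketch serializes a PredictRequest with a single DT_FLOAT input by hand, following the standard protobuf wire format (varint field tags, length-delimited submessages, packed repeated scalars). It is an illustration only. In practice you would generate code from the definition above with protoc, or use the SDK. All helper names are hypothetical, only DT_FLOAT data and non-negative dimensions are handled, and sending the bytes to the service is not covered.

```python
import struct

DT_FLOAT = 1  # value from the ArrayDataType enum above


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_varint(number: int, value: int) -> bytes:
    """Encode a varint field (wire type 0)."""
    return varint(number << 3) + varint(value)


def field_bytes(number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2)."""
    return varint((number << 3) | 2) + varint(len(payload)) + payload


def encode_float_array(shape, values) -> bytes:
    """Serialize an ArrayProto that carries DT_FLOAT data."""
    # ArrayShape.dim = 1, packed varints
    array_shape = field_bytes(1, b"".join(varint(d) for d in shape))
    # float_val is packed little-endian float32
    floats = struct.pack(f"<{len(values)}f", *values)
    return (
        field_varint(1, DT_FLOAT)      # ArrayProto.dtype = 1
        + field_bytes(2, array_shape)  # ArrayProto.array_shape = 2
        + field_bytes(3, floats)       # ArrayProto.float_val = 3
    )


def encode_predict_request(arrays) -> bytes:
    """Serialize a PredictRequest from already encoded ArrayProto messages."""
    return b"".join(field_bytes(1, a) for a in arrays)  # PredictRequest.inputs = 1


request_body = encode_predict_request([encode_float_array([1, 2], [1.0, 2.0])])
print(request_body.hex())
```

The output_filter field is left out of the sketch. An empty repeated field is not written to the wire at all, so the request above is complete without it.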