Elastic Algorithm Service (EAS) of Machine Learning Platform for AI provides built-in PyTorch processors. You can use PyTorch processors to deploy models in the TorchScript format as online model services. This topic describes how to deploy and call PyTorch model services.
PyTorch processor versions
| Processor name | PyTorch version | Support for GPU acceleration |
| --- | --- | --- |
| pytorch_cpu_1.6 | PyTorch 1.6 | No |
| pytorch_cpu_1.7 | PyTorch 1.7 | No |
| pytorch_cpu_1.9 | PyTorch 1.9 | No |
| pytorch_cpu_1.10 | PyTorch 1.10 | No |
| pytorch_gpu_1.6 | PyTorch 1.6 | Yes |
| pytorch_gpu_1.7 | PyTorch 1.7 | Yes |
| pytorch_gpu_1.9 | PyTorch 1.9 | Yes |
| pytorch_gpu_1.10 | PyTorch 1.10 | Yes |
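These processors accept models saved in the TorchScript format. As a minimal sketch of producing such a file (the `SmallNet` module and the `model.pt` file name are placeholders for illustration, not part of this topic), you can trace a PyTorch model with `torch.jit.trace` and save the result:

```python
import torch

class SmallNet(torch.nn.Module):
    """Placeholder model; substitute your own trained model."""
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)
        self.fc = torch.nn.Linear(8, 10)

    def forward(self, x):
        x = self.conv(x)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.fc(x)

model = SmallNet().eval()

# Trace the model with a dummy input to produce a TorchScript module.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Save the TorchScript model as a .pt file.
traced.save("model.pt")
```

The saved `.pt` file is what you upload, for example to an OSS bucket, and reference in the `model_path` field of the service configuration.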
Step 1: Deploy a model service
The following sample service configuration file deploys a TorchScript model that is stored in Object Storage Service (OSS) by using the CPU-based PyTorch 1.6 processor:
```json
{
  "name": "pytorch_resnet_example",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/resnet18.pt",
  "processor": "pytorch_cpu_1.6",
  "metadata": {
    "cpu": 1,
    "instance": 1,
    "memory": 1000
  }
}
```
For more information about how to use the EASCMD client to deploy model services, see Deploy model services by using EASCMD or DSW. You can also use the console to deploy PyTorch model services. For more information, see Model service deployment by using the PAI console and Machine Learning Designer.
Step 2: Call the model service
Both the input and output of PyTorch model services are in the protocol buffer format rather than plaintext. Because the online debugging feature in the console supports only plaintext input and output, you cannot use it to call PyTorch model services. The following sample code calls a model service by using the EAS SDK for Python:
```python
#!/usr/bin/env python
from eas_prediction import PredictClient
from eas_prediction import TorchRequest

if __name__ == '__main__':
    # Replace the endpoint and service name with those of your service.
    client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'pytorch_gpu_wl')
    client.init()

    req = TorchRequest()
    # Input tensor 0: shape [1, 3, 224, 224], float data.
    req.add_feed(0, [1, 3, 224, 224], TorchRequest.DT_FLOAT, [1] * 150528)
    # Optional: specify which output tensors to return.
    # req.add_fetch(0)
    for x in range(0, 10):
        resp = client.predict(req)
        print(resp.get_tensor_shape(0))
```
For more information about the parameters in the sample request and how to call model services, see SDK for Python. You can also create custom service requests without using EAS SDKs. For more information, see Request syntax.
Request syntax
The requests and responses of PyTorch model services are defined by the following protocol buffer file:
```protobuf
syntax = "proto3";

package pytorch.eas;
option cc_enable_arenas = true;

enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;

  // Data types that all computation devices are expected to be
  // capable to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8
  DT_QUINT8 = 12;    // Quantized uint8
  DT_QINT32 = 13;    // Quantized int32
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits. Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16
  DT_QUINT16 = 16;   // Quantized uint16
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types
}

// Dimensions of an array.
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}

// Protocol buffer representing an array.
message ArrayProto {
  // Data type.
  ArrayDataType dtype = 1;

  // Shape of the array.
  ArrayShape array_shape = 2;

  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];

  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];

  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];

  // DT_STRING.
  repeated bytes string_val = 6;

  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];
}

message PredictRequest {
  // Input tensors.
  repeated ArrayProto inputs = 1;

  // Output filter.
  repeated int32 output_filter = 2;
}

// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  repeated ArrayProto outputs = 1;
}
```