A transformer is an InferenceService component that performs pre-processing and post-processing alongside model inference. The InferenceService communicates with the transformer over the REST protocol. A transformer converts raw input data into the format required by the model server, which enables end-to-end data processing and model inference.
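As a rough illustration, a custom transformer implemented with the KServe Python SDK looks similar to the following sketch. This is not the exact code packaged in the image used later in this topic; the image_transform helper, the argument handling, and the dependency on torchvision are assumptions made for illustration, based on KServe 0.10.

# Rough sketch of a custom image transformer, assuming the KServe 0.10 Python SDK.
# The image_transform helper and the argument handling are illustrative only.
import argparse
import base64
import io

from PIL import Image
from torchvision import transforms
import kserve


def image_transform(instance):
    # Decode the Base64 image and convert it into the tensor list that the predictor expects.
    raw = base64.b64decode(instance["image"]["b64"])
    image = Image.open(io.BytesIO(raw))
    return transforms.ToTensor()(image).numpy().tolist()


class ImageTransformer(kserve.Model):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host
        self.ready = True

    def preprocess(self, payload, headers=None):
        # Pre-processing: convert raw Base64 images into model-ready tensors.
        return {"instances": [image_transform(i) for i in payload["instances"]]}

    def postprocess(self, response, headers=None):
        # Post-processing: return the predictor response unchanged.
        return response


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", default="mnist")
    parser.add_argument("--predictor_host", default=None)
    args, _ = parser.parse_known_args()  # KServe injects --predictor_host into the container.
    kserve.ModelServer().start([ImageTransformer(args.model_name, args.predictor_host)])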
Prerequisites
Basic inference services run as expected in your environment. For more information, see Integrate KServe with ASM to implement inference services based on cloud-native AI models.
Different KServe versions may require different input data formats. In this example, KServe 0.10 is used. For more information, see Deploy Transformer with InferenceService.
Step 1: Create a transformer Docker image
Method 1: Build a transformer Docker image by using the custom_transformer.Dockerfile file in the python directory of the KServe repository on GitHub.
cd python
docker build -t <your-registry-url>/image-transformer:latest -f custom_transformer.Dockerfile .
docker push <your-registry-url>/image-transformer:latest
Method 2: Use an existing image.
asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10
Step 2: Use a REST predictor to deploy an InferenceService
By default, InferenceService uses TorchServe to serve PyTorch models, and the models are loaded from a model repository. In this example, the model repository contains an MNIST model.
Create a transformer-new.yaml file that contains the following content:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torchserve-transformer
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
  transformer:
    containers:
      - image: asm-registry.cn-hangzhou.cr.aliyuncs.com/asm/kserve-image-custom-transformer:0.10
        name: kserve-container
        command:
          - "python"
          - "-m"
          - "model"
        args:
          - --model_name
          - mnist
Run the following command to deploy InferenceService:
kubectl apply -f transformer-new.yaml
Step 3: Run a prediction
Prepare the request input payload.
Encode the content of the image to be predicted in Base64 and save it as an input.json file in the following format:
{ "instances":[ { "image":{ "b64": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC" } } ] }
Access the model service over an ingress gateway.
Run the following command to obtain the value of SERVICE_HOSTNAME:
SERVICE_NAME=torchserve-transformer
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
echo $SERVICE_HOSTNAME
Expected output:
torchserve-transformer.default.example.com
Run the following command to access the model service.
For more information about how to obtain the IP address of the ingress gateway, see Substep 1 (Obtain the IP address of the ingress gateway) of Step 3 in the "Use Istio resources to route traffic to different versions of a service" topic.
MODEL_NAME=mnist
INPUT_PATH=@./input.json
ASM_GATEWAY="XXXX" # Replace XXXX with the IP address of the ingress gateway.
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${ASM_GATEWAY}/v1/models/$MODEL_NAME:predict
Expected output:
> POST /v1/models/mnist:predict HTTP/1.1
> Host: torchserve-transformer.default.example.com
> User-Agent: curl/7.79.1
> Accept: */*
> Content-Length: 427
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 19
< content-type: application/json
< date: Mon, 13 Nov 2023 05:53:15 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 119
<
* Connection #0 to host xxxx left intact
{"predictions":[2]}%
The output indicates that the model service is accessed successfully.
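If you prefer to send the request from Python instead of curl, the following sketch sends an equivalent request by using the requests library. The gateway address and hostname are placeholders; replace them with the values obtained in the preceding steps.

# Sketch: the same prediction request as the curl command above, sent from Python.
import json

import requests

ASM_GATEWAY = "XXXX"  # Replace XXXX with the IP address of the ingress gateway.
SERVICE_HOSTNAME = "torchserve-transformer.default.example.com"
MODEL_NAME = "mnist"

with open("input.json") as f:
    payload = json.load(f)

response = requests.post(
    f"http://{ASM_GATEWAY}/v1/models/{MODEL_NAME}:predict",
    headers={"Host": SERVICE_HOSTNAME},  # Route the request through the ingress gateway.
    json=payload,
)
print(response.status_code, response.json())  # The expected output above shows {"predictions": [2]}.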