Model Service Mesh provides a scalable, high-performance infrastructure for managing, deploying, and scheduling multiple model services. It helps you better handle model deployment, version management, routing, and load balancing of inference requests. This topic describes the terms that are commonly used in Model Service Mesh and its common features.
What is Model Service Mesh?
Model Service Mesh is an architecture for deploying and managing machine learning model services in a distributed environment.
Model Service Mesh deploys models as scalable services, then uses the mesh to manage those services and route inference requests to them, which simplifies the management and O&M of model services. It orchestrates and scales model services, streamlining model deployment, scaling, and version management. It also provides core features such as load balancing, auto scaling, and fault recovery to ensure the high availability and reliability of model services: models are automatically scaled in or out based on the inference request load, and requests are balanced across model instances so that inference stays efficient.
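Service meshes of this kind typically run model services on Kubernetes. Assuming that is the case here, the auto scaling behavior can be sketched with a standard Kubernetes HorizontalPodAutoscaler. The deployment name `model-service`, the HPA name, and the thresholds below are illustrative assumptions, not values prescribed by Model Service Mesh.

```yaml
# Minimal sketch: scale a hypothetical model service deployment between
# 2 and 10 replicas based on average CPU utilization. All names and
# thresholds are illustrative assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-service        # hypothetical model service deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # add replicas when average CPU exceeds 70%
```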
In addition, Model Service Mesh provides advanced features, such as traffic splitting, A/B testing, and canary releases, that give you finer control over the traffic destined for model services. You can use these features to gradually shift traffic among model versions and roll back to a specific version when necessary. Model Service Mesh also supports dynamic routing, which routes each request to an appropriate model service based on request attributes such as the model type, data format, or other metadata.
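Both capabilities map naturally onto standard service mesh routing rules. The following is a minimal sketch using Istio-style resources, assuming two versions of a model service are deployed behind the host `model-service`; the host name, subsets, the `x-model-type` header, and the traffic weights are illustrative assumptions.

```yaml
# Minimal sketch of a canary release plus attribute-based dynamic routing,
# assuming Istio-style APIs. All names, headers, and weights are
# illustrative assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: model-service
spec:
  host: model-service
  subsets:
  - name: v1                    # stable model version
    labels:
      version: v1
  - name: v2                    # candidate model version
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: model-service
spec:
  hosts:
  - model-service
  http:
  # Dynamic routing: requests that declare a specific model type in a
  # header are sent directly to the v2 model service.
  - match:
    - headers:
        x-model-type:
          exact: image-classification
    route:
    - destination:
        host: model-service
        subset: v2
  # Canary release: all other traffic is split 90/10 between versions.
  - route:
    - destination:
        host: model-service
        subset: v1
      weight: 90
    - destination:
        host: model-service
        subset: v2
      weight: 10
```

Under this scheme, promoting the new version means gradually shifting weight from v1 to v2, and rolling back means returning the v2 weight to 0.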
Model Service Mesh allows developers to deploy, manage, and scale machine learning models more easily while providing high availability, resiliency, and flexibility to meet different business needs.