Machine Learning Designer supports batch prediction. You can use models to implement periodic batch prediction on datasets in business scenarios that do not require real-time results. This topic describes how to implement batch prediction on the Machine Learning Designer platform.
Implement batch prediction in the development environment
Designer provides a variety of prediction components to support different algorithms and scenarios. You can drag these components onto the canvas.
You can use the paired model training and prediction components displayed in the left-side component pane to train a model and then use the model to implement batch prediction.
If no prediction component is available for the algorithm that you want to use, you can use the general-purpose prediction component to implement batch prediction after you train the model.
Important: The general-purpose prediction component supports only OfflineModel models. It does not support Predictive Model Markup Language (PMML) models.
If an existing model is available, you can also use a component to import the model and prediction data. Then, connect a prediction component as the downstream node of the component to implement prediction and deployment.
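The choice between a paired prediction component and the general-purpose prediction component can be summarized as a small decision function. The following is only an illustrative sketch in plain Python, not a Designer API: the function name, the `paired` mapping, and the placeholder algorithm names are hypothetical and exist only to make the rules above explicit.

```python
from typing import Dict


def choose_prediction_component(
    algorithm: str, model_format: str, paired: Dict[str, str]
) -> str:
    """Return the kind of prediction component to place on the canvas.

    `paired` is a hypothetical mapping from an algorithm name to its paired
    prediction component; it is not read from Designer.
    """
    # Rule 1: prefer the prediction component paired with the training component.
    if algorithm in paired:
        return paired[algorithm]
    # Rule 2: otherwise fall back to the general-purpose prediction component,
    # which supports only OfflineModel models (PMML is not supported).
    if model_format == "OfflineModel":
        return "general-purpose prediction"
    raise ValueError(
        f"No prediction component available for a {model_format} model "
        f"trained with {algorithm}"
    )


# Placeholder catalog for illustration only.
PAIRED = {"algo_a": "algo_a prediction"}
print(choose_prediction_component("algo_a", "OfflineModel", PAIRED))
print(choose_prediction_component("algo_b", "OfflineModel", PAIRED))
```

In this sketch, a PMML model trained with an algorithm that has no paired prediction component raises an error, which mirrors the restriction stated in the Important note.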
Periodically schedule a batch prediction pipeline
After the batch prediction pipeline passes the test, you can submit the pipeline to DataWorks and schedule it periodically. For more information, see Use DataWorks tasks to schedule pipelines in Machine Learning Designer.
If your workspace is in DataWorks standard mode, the development environment and production environment maintain MaxCompute data separately. Therefore, before you periodically schedule a batch prediction pipeline, you must synchronize the model that is trained offline to the production environment. You can use one of the following methods to synchronize the model:
Use the Copy MaxCompute Offline Model and Read MaxCompute Offline Model components
Use the Copy MaxCompute Offline Model component to replicate the trained OfflineModel model to the production environment, and then use the Read MaxCompute Offline Model component in the periodically scheduled pipeline to read the model in the production environment.
The system writes MaxCompute data in the production environment when it replicates the model. Therefore, you must use the workspace administrator account or the production account to perform the replication. For more information, see Data access behaviors in and required access permissions on MaxCompute compute engine instances associated with workspaces in different modes.
Use the Model Export and Import MaxCompute Offline Model components (recommended)
Use the Model Export component to export the trained OfflineModel model to Object Storage Service (OSS), and then use the Import MaxCompute Offline Model component in the periodically scheduled pipeline to import the model from OSS.
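The two synchronization methods above can be laid out as an ordered checklist. The following is a minimal sketch in plain Python that only encodes the steps described in this topic. It does not call DataWorks, MaxCompute, or OSS. The function name and the `"copy"` and `"oss"` labels are hypothetical, and treating any mode other than standard as needing no synchronization is an assumption.

```python
from typing import List


def model_sync_steps(workspace_mode: str, method: str = "oss") -> List[str]:
    """List the components to use, in order, before scheduling the pipeline."""
    # Assumption: only standard mode separates development and production
    # MaxCompute data, so other modes need no synchronization step.
    if workspace_mode != "standard":
        return []
    if method == "copy":
        return [
            # Requires the workspace administrator account or production
            # account, because it writes data to the production environment.
            "Copy MaxCompute Offline Model",
            # Placed in the periodically scheduled pipeline.
            "Read MaxCompute Offline Model",
        ]
    if method == "oss":  # the recommended method in this topic
        return [
            # Exports the trained OfflineModel model to OSS.
            "Model Export",
            # Placed in the periodically scheduled pipeline.
            "Import MaxCompute Offline Model",
        ]
    raise ValueError(f"Unknown synchronization method: {method}")


for step in model_sync_steps("standard"):
    print(step)
```

The default is set to the OSS-based method because this topic marks it as recommended.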
References
If the offline prediction results meet your expectations, you can deploy the model to EAS as an online service. For more information, see Deploy a model as an online service.
Machine Learning Designer allows you to deploy a batch data processing pipeline to EAS as an online service after you package the pipeline as a model. For more information, see Deploy a pipeline as an online service.