Machine Learning Designer of Platform for AI (PAI) lets you use DataWorks tasks to schedule model training pipelines offline so that models are updated periodically. This topic describes how to use DataWorks tasks to periodically schedule Machine Learning Designer pipelines offline and automatically synchronize PAI models to Object Storage Service (OSS) during task scheduling.
Prerequisites
All nodes in the pipeline have run successfully.
DataWorks is activated and a workflow is created. For more information, see Create a workflow.
The workspace in which the workflow resides must be the same as the workspace of your Machine Learning Designer pipeline. Otherwise, you cannot set the Path parameter to the workflow when you create an offline scheduling task.
If the workspace in which the DataWorks workflow resides is in standard mode, MaxCompute data is isolated between the development and production environments. In this case, synchronize the model generated by offline training to the production environment before a periodic task is scheduled. For more information, see Periodically schedule a batch prediction pipeline.
Procedure
A PAI-Designer pipeline maps to Designer nodes in DataWorks at a ratio of 1:N. This means that you can create multiple Designer nodes in DataWorks based on the same PAI-Designer pipeline.
Log on to the PAI console, select a desired workspace, and then click Enter Visualized Modeling (Designer). On the page that appears, double-click a desired pipeline.
(Optional) Add the Model Export component if you need to synchronize a model in Machine Learning Designer to OSS during periodic task scheduling.
On the Pipeline Attributes tab, set Data Storage to the OSS path in which the model file is stored.
If you need to export a model file in the PMML format, click the desired model component, such as the Logistic Regression for Binary Classification component, and select Whether To Generate PMML on the Fields Setting tab of the component.
Note: Only specific model components support exporting model files in the PMML format. Skip this step for model components that do not support this feature.
Connect the model component to the downstream Model Export component. For more information, see Model export.
Use DataWorks tasks to schedule a Machine Learning Designer pipeline offline.
In the upper-left corner of the canvas, click Periodic Scheduling. In the dialog box that appears, click Create Scheduling Node. In the Create Node dialog box in DataWorks, specify a node name and click Confirm.
On the edit page of the node, select the pipeline that you created in Machine Learning Designer from the Select PAI Designer Experiment drop-down list.
If you want to modify the pipeline in Machine Learning Designer, click Edit in PAI Designer.
On the node edit tab, click the Properties tab in the right-side navigation pane. In the Properties panel, configure the scheduling properties for the node.
The Properties panel contains the General, Scheduling Parameter, Schedule, Resource Group, and Dependencies sections. You can specify a scheduling cycle in the Schedule section. DataWorks automatically runs the pipeline based on the scheduling cycle that you specify. For more information, see Configure scheduling properties.
Note: During scheduling in DataWorks, the system may occasionally report "Start Container timeout" errors. We recommend that you enable the Auto Rerun upon Failure feature when you configure the time properties. After this feature is enabled, the scheduling system automatically reruns failed pipelines, excluding pipelines that are stopped by the user, based on the specified number of reruns and rerun interval.
Click the save and commit icons in the toolbar and follow the on-screen instructions to save and commit the node.
Important: You must configure the Rerun and Parent Nodes parameters in the Properties panel before you commit the node.
If the workspace that you use is in standard mode, click Deploy in the upper part of the page after you commit a node. For more information, see Deploy nodes.
Click Operation Center in the upper part of the page to view the status and operational logs of the machine learning task.
You can also backfill data for the node and test the pipeline. For more information, see View and manage auto triggered tasks.
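The rerun behavior described in the note about "Start Container timeout" errors can be pictured with a small sketch. The following Python code is only an illustration of the policy (rerun failed runs up to a configured number of times, wait a configured interval between runs, and never rerun a run that the user stopped). It is not the DataWorks implementation, and every name in it (RunStatus, RerunPolicy, run_with_auto_rerun) is hypothetical.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class RunStatus(Enum):
    """Hypothetical outcomes of one scheduled pipeline run."""
    SUCCESS = "success"
    FAILED = "failed"
    STOPPED_BY_USER = "stopped_by_user"


@dataclass
class RerunPolicy:
    """Hypothetical stand-in for the rerun settings in the Properties panel."""
    max_reruns: int = 3              # number of reruns allowed after the first run
    interval_seconds: float = 120.0  # wait time between a failure and the next rerun


def run_with_auto_rerun(
    run_once: Callable[[], RunStatus],
    policy: RerunPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[RunStatus, int]:
    """Run a pipeline and rerun it on failure, following the policy.

    Returns the final status and the total number of runs performed.
    Runs that end in SUCCESS or STOPPED_BY_USER are never rerun.
    """
    attempts = 0
    while True:
        status = run_once()
        attempts += 1
        # Only failures are eligible for a rerun.
        if status is not RunStatus.FAILED:
            return status, attempts
        # The first run plus max_reruns reruns have been used up.
        if attempts > policy.max_reruns:
            return status, attempts
        sleep(policy.interval_seconds)
```

For example, with max_reruns set to 3, a pipeline that fails twice with a timeout and then succeeds finishes on its third run, while a pipeline that the user stops manually is run only once.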
References
For more information about model prediction and deployment, see Model prediction and deployment.
Machine Learning Designer allows you to use the Update EAS Service(Beta) component to update online model services. For more information, see Periodically update online model services.