DataWorks task scheduling is widely used in machine learning scenarios. It lets you run DataWorks tasks periodically to update your model and build a model training pipeline. This topic describes how to use DataWorks tasks to periodically schedule pipelines in Machine Learning Designer.
Prerequisites
A workflow is created. For more information, see Create a workflow.
The workspace in which the workflow resides must be the same as the workspace of your Machine Learning Designer pipeline. Otherwise, you cannot set the Path parameter to the workflow when you create an offline scheduling task.
Background information
After you run all nodes in a pipeline, you can deploy the pipeline to DataWorks and run it periodically.
Before you schedule the nodes, make sure that all nodes in the pipeline have been run and that DataWorks is activated. For more information, see Create a workspace.
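The conditions listed in the Prerequisites and Background information sections can be summarized as a simple checklist. The following Python sketch is for illustration only. The PipelineState class and the scheduling_blockers function are hypothetical names and are not part of any PAI or DataWorks API. The sketch assumes that you collect the workspace names, the activation status, and the node run statuses yourself, for example from the console.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Hypothetical snapshot of a pipeline (not a PAI or DataWorks object)."""
    pipeline_workspace: str
    workflow_workspace: str
    dataworks_activated: bool
    # Maps a node name to True if the node has been run.
    node_run_status: dict = field(default_factory=dict)


def scheduling_blockers(state):
    """Return the reasons why the pipeline is not ready for scheduling."""
    blockers = []
    # The workflow and the pipeline must reside in the same workspace.
    if state.pipeline_workspace != state.workflow_workspace:
        blockers.append("workflow and pipeline are in different workspaces")
    # DataWorks must be activated.
    if not state.dataworks_activated:
        blockers.append("DataWorks is not activated")
    # All nodes in the pipeline must have been run.
    not_run = sorted(name for name, ran in state.node_run_status.items() if not ran)
    if not_run:
        blockers.append("nodes not yet run: " + ", ".join(not_run))
    return blockers


if __name__ == "__main__":
    state = PipelineState(
        pipeline_workspace="ws_demo",
        workflow_workspace="ws_demo",
        dataworks_activated=True,
        node_run_status={"read_table": True, "train": False},
    )
    for reason in scheduling_blockers(state):
        print(reason)
```

An empty list means that none of the conditions described above blocks scheduling. The check does not replace the validation that the console performs.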
Procedure
Log on to the Machine Learning Platform for AI (PAI) console and go to the details page of the pipeline that you created in Machine Learning Designer.
In this topic, the Heart Disease Prediction pipeline is used as an example. For more information about how to create a pipeline and how to go to the configuration tab of the pipeline, see Predict heart disease.
On the pipeline details page, click Periodic Scheduling.
In the Deployment Scheduling dialog box, click OK to go to the DataStudio page of DataWorks.
Create a Machine Learning Designer node.
In the Create Node dialog box, set the Node Type parameter to PAI Designer and configure the Path parameter.
You can also move the pointer over Create on the DataStudio page and click Create Node.
Click Confirm.
On the edit page of the node, select the pipeline that you created in Machine Learning Designer from the Select PAI Designer node drop-down list.
If you want to modify the pipeline in Machine Learning Designer, click Go to PAI Designer to edit.
On the node edit tab, click the Properties tab in the right-side navigation pane. In the Properties panel, configure the scheduling properties for the node.
The Properties panel contains the General, Parameters, Schedule, Resource Group, and Dependencies sections. You can specify a scheduling cycle in the Schedule section. DataWorks automatically runs the pipeline based on the scheduling cycle that you specify. For more information, see Configure basic properties.
Click the save and commit icons in the toolbar and follow the on-screen instructions to save and commit the node.
Important: You must configure the Rerun and Parent Nodes parameters in the Properties panel before you commit the node.
If the workspace that you use is in standard mode, click Deploy in the upper-right corner after you commit a node. For more information, see Deploy nodes.
Click Operation Center in the upper-right corner to view the status and operational logs of the machine learning task.
You can also backfill data for the node and test the pipeline. For more information, see View and manage auto triggered nodes.
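To get a feel for what a scheduling cycle means in practice, the following Python sketch lists the next few run times for a pipeline that runs once a day at a fixed time. It only illustrates the concept of a daily cycle. It is not how DataWorks computes its schedule, and the function name next_daily_runs is hypothetical. The actual scheduling cycle is configured in the Schedule section of the Properties panel.

```python
from datetime import datetime, time, timedelta


def next_daily_runs(now, run_at, count):
    """Return the next `count` run times for a once-a-day cycle.

    now:    the current date and time
    run_at: the time of day at which the pipeline runs
    count:  the number of upcoming runs to list
    """
    first = datetime.combine(now.date(), run_at)
    # If the run time for today has already passed, start from tomorrow.
    if first <= now:
        first += timedelta(days=1)
    return [first + timedelta(days=i) for i in range(count)]


if __name__ == "__main__":
    # Example: list three upcoming runs for a cycle that starts at 02:00 each day.
    for run in next_daily_runs(datetime.now(), time(2, 0), 3):
        print(run.isoformat(sep=" ", timespec="minutes"))
```

After the node is deployed, you can compare a list like this with the instances shown in Operation Center to confirm that the node runs on the cycle that you intended.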