Platform For AI: Use DataWorks tasks to schedule pipelines in Machine Learning Designer

Last Updated: Nov 11, 2024

DataWorks task scheduling is widely used in machine learning scenarios. It lets you run DataWorks tasks periodically, for example to update a model on a regular basis as part of a model training pipeline. This topic describes how to use DataWorks tasks to periodically schedule pipelines in Machine Learning Designer.

Prerequisites

A workflow is created. For more information, see Create a workflow.

Important

The workspace in which the workflow resides must be the same as the workspace of your Machine Learning Designer pipeline. Otherwise, you cannot set the Path parameter to the workflow when you create an offline scheduling task.

Background information

  • After you run all nodes in a pipeline, you can deploy the pipeline to DataWorks to periodically run the pipeline.

    Note

    Before you schedule the nodes, make sure that all nodes in the pipeline have been run and that DataWorks is activated. For more information, see Create a workspace.

  • A PAI-Designer pipeline and the Designer nodes in DataWorks have a 1:N relationship: you can create multiple Designer nodes in DataWorks based on the same PAI-Designer pipeline.
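The two points above can be pictured with a small, purely illustrative Python model. The names used here (`Pipeline`, `SchedulingNode`, `ready_to_schedule`, and the workspace and node names) are hypothetical and are not part of any PAI or DataWorks SDK. The sketch only shows that one pipeline can back several DataWorks nodes, and that scheduling presupposes that every pipeline node has been run and that the workspaces match.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Hypothetical stand-in for a Machine Learning Designer pipeline."""
    name: str
    workspace: str
    # Maps each pipeline node name to whether it has been run.
    node_has_run: dict[str, bool] = field(default_factory=dict)

    def all_nodes_run(self) -> bool:
        # An empty pipeline is treated as "not run".
        return bool(self.node_has_run) and all(self.node_has_run.values())


@dataclass
class SchedulingNode:
    """Hypothetical stand-in for a Designer node created in DataWorks."""
    name: str
    workspace: str
    pipeline: Pipeline


def ready_to_schedule(node: SchedulingNode) -> list[str]:
    """Return the unmet preconditions; an empty list means none were found."""
    problems = []
    if node.workspace != node.pipeline.workspace:
        problems.append("workflow and pipeline are in different workspaces")
    if not node.pipeline.all_nodes_run():
        problems.append("not all pipeline nodes have been run")
    return problems


pipeline = Pipeline(
    name="Heart Disease Prediction",
    workspace="demo_workspace",
    node_has_run={"read_table": True, "train": True, "predict": True},
)

# 1:N relationship: several DataWorks nodes reference the same pipeline.
daily = SchedulingNode("heart_daily", "demo_workspace", pipeline)
weekly = SchedulingNode("heart_weekly", "other_workspace", pipeline)

print(ready_to_schedule(daily))
print(ready_to_schedule(weekly))
```

In this sketch, `daily` passes both checks, while `weekly` is flagged because its workspace differs from the pipeline's workspace.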

Procedure

  1. Log on to the Machine Learning Platform for AI (PAI) console and go to the details page of the pipeline that you created in Machine Learning Designer.

    In this topic, the Heart Disease Prediction pipeline is used as an example. For more information about how to create a pipeline and how to go to the configuration tab of the pipeline, see Predict heart disease.

  2. In the upper-left corner of the canvas, click Periodic Scheduling > New scheduling node to go to DataWorks. Specify a node name and click Confirm.

  3. On the edit page of the node, select the pipeline that you created in Machine Learning Designer from the Select PAI Designer node drop-down list.

    If you want to modify the pipeline in Machine Learning Designer, click Go to PAI Designer to edit.

  4. On the node edit tab, click the Properties tab in the right-side navigation pane. In the Properties panel, configure the scheduling properties for the node.

    The Properties panel contains the General, Parameters, Schedule, Resource Group, and Dependencies sections. You can specify a scheduling cycle in the Schedule section. DataWorks automatically runs the pipeline based on the scheduling cycle that you specify. For more information, see Configure basic properties.

  5. Click the Save and Commit icons in the toolbar and follow the on-screen instructions to save and commit the node.

    Important

    You must configure the Rerun and Parent Nodes parameters in the Properties panel before you commit the node.

    If the workspace that you use is in standard mode, click Deploy in the upper-right corner after you commit a node. For more information, see Deploy nodes.

  6. Click Operation Center in the upper-right corner to view the status and operational logs of the machine learning task.

    You can also backfill data for the node and test the pipeline. For more information, see View and manage auto triggered nodes.
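The Schedule section in step 4 defines a scheduling cycle. As a rough illustration of what a daily cycle means, the following sketch computes the next few trigger times for a fixed time of day. It uses only the Python standard library. The function name `next_daily_runs` and the 02:00 run time are hypothetical, and the sketch does not reflect how DataWorks itself computes or stores schedules.

```python
from datetime import datetime, time, timedelta


def next_daily_runs(now: datetime, run_at: time, count: int) -> list[datetime]:
    """Return the next `count` daily trigger times strictly after `now`."""
    first = datetime.combine(now.date(), run_at)
    if first <= now:
        # Today's slot has already passed, so start from tomorrow.
        first += timedelta(days=1)
    return [first + timedelta(days=i) for i in range(count)]


# Assumed example: the current time is 09:30 and the node runs daily at 02:00.
runs = next_daily_runs(datetime(2024, 11, 11, 9, 30), time(2, 0), 3)
for run in runs:
    print(run.isoformat())
```

Because 02:00 has already passed at the assumed current time, the first computed run falls on the following day.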