
DataWorks:Create and use a PAI DLC node

Last Updated:Jan 06, 2026

Deep Learning Containers (DLC), a service on PAI, runs distributed training tasks. DataWorks provides a PAI DLC node that lets you load a DLC task and configure its dependencies. This node enables you to schedule and run your DLC tasks periodically.

Prerequisites

  • You have granted DataWorks access to PAI.

    Go to the Authorization Page to grant the permissions. For more information about the permission, see AliyunServiceRoleForDataworksEngine. Only an Alibaba Cloud account or a RAM user with the AliyunDataWorksFullAccess permission can perform this one-click authorization.

  • You have created a workflow.

    In DataStudio, nodes must belong to a workflow. Therefore, you must create a workflow before you can create a node. For instructions, see Create a workflow.

Usage notes

  • Periodically scheduled PAI DLC nodes create a new DLC task on PAI with each run. This can result in many similarly named tasks that are difficult to distinguish. To prevent this, add date and time variables to your task name. Use scheduling parameters to assign values to these variables, ensuring unique task names. For more information, see Step 2: Develop a PAI DLC task.

  • DataWorks does not support running PAI DLC tasks on shared resource groups for scheduling.
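To keep task names unique across scheduled runs, you can embed a scheduling-parameter variable in the `--name` value. The following is a minimal sketch, assuming a scheduling parameter named `datetime` (for example, assigned `$[yyyymmddhh24miss]` on the Scheduling > Parameters tab); the variable name and the fallback value are illustrative only, and DataWorks performs the real substitution at run time:

```shell
# Sketch: compose a unique DLC task name from a scheduling parameter.
# The fallback value below is only for local illustration; in DataWorks,
# ${datetime} is replaced by the scheduling system before the node runs.
datetime="${datetime:-20260106120000}"       # e.g. the value of $[yyyymmddhh24miss]
task_name="my_dlc_task_${datetime}"          # hypothetical base name plus timestamp
echo "dlc submit xgboostjob --name=${task_name}"   # the command that would be issued
```

Because each run receives a different timestamp, the tasks created on PAI remain easy to tell apart in the task list.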

Note

The examples in this topic use the Singapore region. The user interface may vary in other regions.

Step 1: Create a PAI DLC node

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Right-click the target workflow and choose Create Node > Machine Learning > PAI DLC.

  3. In the Create Node dialog box, configure the Name parameter and click Confirm. Then, you can use the node to develop tasks and configure task scheduling properties.

Step 2: Develop a PAI DLC task

Develop the task code: a simple example

You can write the DLC task in the node editor in one of two ways:

  • Generate code from an existing DLC task.

    Search for and load an existing PAI DLC task. After you load the task, the editor generates the corresponding code based on the task's configuration in PAI. You can then edit this code as needed.

  • Write the DLC task code from scratch.

    Write the task code directly in the PAI DLC node editor in DataWorks.

Running the code in the PAI DLC node creates a new DLC task in PAI based on your configuration. The following code is an example:

dlc submit xgboostjob \   # Submits the DLC task.
    --name=wsytest_pai04_XGBoost \   # The task name. Using a variable or the name of the DataWorks node is recommended.
    --command='echo '\''${variable_name}'\'';' \   # The command to execute.
    --workspace_id=80593 \   # The ID of the workspace where the DLC task runs.
    --priority=1 \   # The task priority. Valid values: 1 to 9. A higher value indicates a higher priority.
    --workers=1 \    # The number of workers. Use a value greater than 1 for a distributed task.
    --worker_image=registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:2.3-cpu-py36-ubuntu18.04 \   # The worker image that provides the runtime environment for the DLC task.
    --worker_spec=ecs.g6.xlarge   # The specification for the worker nodes.

Use scheduling parameters in code

DataWorks provides scheduling parameters to dynamically pass variables to your code, which is useful for periodically scheduled tasks. Define variables in your code using the ${variable_name} format, then assign values on the Scheduling > Parameters tab. For more information about the supported formats for scheduling parameters, see Supported formats for scheduling parameters.

The following code shows an example:

--command='echo '\''${Variable}'\'';'   # You can assign a specific scheduling parameter to the variable.
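The substitution itself is performed by DataWorks before the task runs, not by the shell. The following sketch imitates that replacement locally so you can see the resulting command; the variable name `Variable` and the value `20260106` (what `$[yyyymmdd]` would yield on January 6, 2026) are illustrative assumptions:

```shell
# Sketch of the substitution DataWorks performs before the task runs.
Variable="20260106"                  # value a parameter such as $[yyyymmdd] would produce
command="echo '${Variable}';"        # the --command value after substitution
eval "$command"                      # prints: 20260106
```

At run time, the DLC task therefore receives the fully expanded command string, with each scheduling variable already replaced by its value for that scheduling cycle.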

Step 3: Configure task scheduling

To run the task periodically, click Scheduling in the right-side pane and configure its properties. For more information, see Overview.

Note

You must configure the rerun properties and dependencies for the node before you can submit it.

Step 4: Debug the task code

Debug the task to verify that it runs as expected.

  1. (Optional) Select a resource group for the run and assign values to custom parameters.

  2. Save and run the code.

    Click the Save icon in the toolbar to save your code, and then click the Run icon to run the task.

  3. (Optional) Perform a smoke test.

    To check how the scheduled task will execute, you can perform a smoke test in the development environment. You can run this test when you commit the node or anytime after. For more information, see Perform smoke testing.

Step 5: Submit and deploy the task

After you configure the node, you must submit and deploy it. After the node is deployed, it runs periodically according to its schedule.

  1. Click the Save icon in the toolbar to save the task.

  2. Click the Submit icon in the toolbar to submit the task.

    In the Submit dialog box, enter a Change description. You can also choose whether to perform a code review after the node is submitted.

    Note
    • You must configure the rerun properties and dependencies for the node before you can submit it.

    • Code reviews help ensure code quality and prevent errors from being deployed to the production environment. If you enable code review, a reviewer must approve the submitted code before it can be deployed. For more information, see Code review.

If you are using a workspace in standard mode, you must click Deploy in the upper-right corner of the node editor page after the node is submitted. This deploys the task to the production environment. For more information, see Deploy nodes.

Next steps

After the node is submitted and deployed, it runs periodically according to its schedule. To view the status of the scheduled task, go to Operation Center. For more information, see Manage scheduled tasks.