Platform for AI: Notification rule

Last Updated:Jan 13, 2025

You can create a notification rule for a Platform for AI (PAI) workspace to track and monitor the status of Deep Learning Containers (DLC) jobs. This topic describes how to configure a notification rule.

Configure a notification rule

  1. On the Workspace Details page, choose Configure Workspace > Configure Event Notification. Then, click Create Event Rule.

  2. In the Create Event Rule panel, configure the following parameters and click Submit.

    Parameter

    Description

    Rule Name

    Follow the on-screen instructions to specify a rule name.

    Event Type

    Select DLC Jobs for Event Type. Then, select one of the following:

    • Job Process

      • Enter Queue: The job enters the queued status.

      • Start Bidding: The job enters the bidding status.

      • Start Environment Preparation: The job enters the preparing environment status.

      • Start Run: The job enters the running status.

      • Job Failure: The job execution failed.

      • Job Completed (Succeeded or Failed): The job succeeded or failed.

    • Automatic Fault Tolerance: When a DLC job encounters an exception or error and performs automatic fault tolerance processing, a notification is sent.

    • Job Timeout: If you select this option, you must first set the timeout rule on the scheduling settings page of the workspace. For more information, see Configure a timeout rule.

      • Queue Timeout: The queue duration exceeds the specified maximum queue duration.

      • Environment Preparation Timeout: The environment preparation duration exceeds the specified maximum preparation duration.

      • Wait Timeout: The waiting duration from job creation to running exceeds the specified maximum waiting duration.

      • Run Timeout: The job running duration exceeds the specified maximum running duration, and the job is automatically stopped.

    • Other Events

      • Job Preempted: When an idle job or bidding job is preempted, a notification is sent.

      • Job Manually Stopped

      • Job Priority Modified

    Event Scope

    Valid values:

    • Created by Me: Only the DLC jobs you created.

    • In the current workspace: All DLC jobs in the current workspace.

    Event Target

    Notifications can be sent through DingTalk, voice call, text message, or email.

After you create a notification rule, the system automatically sends an alert to the designated contact whenever a job triggers the rule. When you receive an alert, we recommend that you go to the Deep Learning Containers (DLC) page to check whether your jobs are performing as expected. For further troubleshooting, refer to the monitoring status and logs of the jobs. For more information, see View training jobs.
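The way the Event Type and Event Scope parameters combine can be sketched in code. The following is a minimal illustration only, not the PAI API: the class names, field names, and the `rule_matches` function are hypothetical, and the sketch assumes that "Created by Me" refers to jobs created by the user who owns the rule.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    CREATED_BY_ME = "Created by Me"
    CURRENT_WORKSPACE = "In the current workspace"


@dataclass(frozen=True)
class NotificationRule:
    # Hypothetical model of the Create Event Rule panel fields.
    name: str
    event_types: frozenset  # for example {"Job Failure", "Run Timeout"}
    scope: Scope
    owner: str              # assumption: the user who created the rule
    targets: tuple          # for example ("email", "DingTalk")


@dataclass(frozen=True)
class JobEvent:
    job_id: str
    job_creator: str
    event_type: str


def rule_matches(rule: NotificationRule, event: JobEvent) -> bool:
    """Return True if the event should trigger a notification for this rule."""
    if event.event_type not in rule.event_types:
        return False
    if rule.scope is Scope.CREATED_BY_ME:
        # Only jobs created by the rule owner are in scope.
        return event.job_creator == rule.owner
    # Workspace scope: every DLC job in the workspace is in scope.
    return True
```

In this sketch, a rule scoped to Created by Me ignores a Job Failure event from another user's job, while the same rule scoped to the current workspace would match it.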

Configure a timeout rule

To configure a timeout rule for specific event types, follow these steps:

  1. Go to the Configure Workspace page and select the DataWorks Scheduling Settings tab. Then, configure the maximum running duration and maximum job wait time in the DLC section.

    Policy

    Description

    Resource Quota

    Configure the maximum waiting duration for jobs using specified resources. Valid values:

    • Public Resource Group

    • Resource Quota: Select a resource quota associated with this workspace.

    Timeout Rule Configuration

    Set the timeout duration for specified event types. Valid values:

    • Job Waiting Duration (Queue Duration + Environment Preparation Duration)

    • Queue Duration

    • Environment Preparation Duration

    To add multiple timeout rules, click Add.

  2. Click Save.

Then, go to the Configure Event Notification tab and create a notification rule for the corresponding Job Timeout events. Otherwise, no alerts are sent when a timeout occurs. For more information, see Configure a notification rule.
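The relationship between the timeout durations and the Job Timeout event types can be illustrated with a short sketch. This is a simplified model under stated assumptions, not PAI's implementation: `TimeoutLimits` and `timeout_events` are hypothetical names, durations are in seconds, a limit of `None` means no rule is configured, and the waiting duration is taken as queue duration plus environment preparation duration, as described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeoutLimits:
    # Hypothetical container for the configured maximum durations (seconds).
    max_queue_s: Optional[float] = None
    max_prepare_s: Optional[float] = None
    max_wait_s: Optional[float] = None
    max_run_s: Optional[float] = None


def timeout_events(queue_s: float, prepare_s: float, run_s: float,
                   limits: TimeoutLimits) -> List[str]:
    """Return the timeout event types that a job's durations would trigger."""
    events = []
    if limits.max_queue_s is not None and queue_s > limits.max_queue_s:
        events.append("Queue Timeout")
    if limits.max_prepare_s is not None and prepare_s > limits.max_prepare_s:
        events.append("Environment Preparation Timeout")
    # Waiting duration = queue duration + environment preparation duration.
    wait_s = queue_s + prepare_s
    if limits.max_wait_s is not None and wait_s > limits.max_wait_s:
        events.append("Wait Timeout")
    if limits.max_run_s is not None and run_s > limits.max_run_s:
        events.append("Run Timeout")
    return events


limits = TimeoutLimits(max_queue_s=600, max_wait_s=900, max_run_s=3600)
print(timeout_events(queue_s=700, prepare_s=300, run_s=100, limits=limits))
# prints ['Queue Timeout', 'Wait Timeout']
```

The comparison is strict, so a job whose duration equals the configured maximum does not trigger an event in this sketch.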