DataWorks: Intelligent diagnosis

Last Updated: Feb 27, 2025

The Intelligent Diagnosis feature performs end-to-end analysis of tasks. If a task does not run as expected, you can use this feature to quickly identify the problem.

Background information

The Intelligent Diagnosis feature can diagnose and analyze tasks across the following dimensions:

  • View Running Details: Checks whether the conditions for running a scheduled task are met: the upstream tasks have run successfully, the current task has reached its scheduled time, scheduling resources are available, and the current task is not already running. For more information, see Task Running Conditions.

  • View Basic Information: Enables you to view the key time points of the current task.

  • Affected Baseline: Enables you to view the list of baselines that include the current task in the monitoring scope and the baseline running status. For more information about intelligent baselines, see Intelligent Baseline Overview.

  • Historical Instance Running Status: Enables you to view the execution status of the current task over the past 15 days through visual charts and lists.

Limits

  • Only DataWorks Professional Edition or a more advanced edition supports the Intelligent Diagnosis feature. You can currently try the feature for free, but we recommend upgrading to Professional Edition to access more product capabilities. For more information about edition upgrades, see DataWorks Version Details.

  • The Intelligent Diagnosis feature supports the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Hong Kong (China), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), US (Virginia), and UAE (Dubai).

Access intelligent diagnosis

  1. Go to the Operation Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

  2. In Operation Center, go to the Intelligent Diagnosis page in either of the following ways.

    • Method 1: Access Intelligent Diagnosis through instances.

      • In the left-side navigation pane, choose Recurring Task Operations > Recurring Instance, click the Instance View tab, and then click Running Diagnosis in the Operation column of the instance to go to the Intelligent Diagnosis page.

      • In the left-side navigation pane, choose Recurring Task Operations > Recurring Instance, click the Instance View tab, and then click DAG Graph in the Operation column of the instance. In the DAG, right-click the instance and choose Running Diagnosis to go to the Intelligent Diagnosis page.

    • Method 2: Click Operations Assistant > Intelligent Diagnosis in the left-side navigation pane to go to the Intelligent Diagnosis page.

      Note

      Intelligent Diagnosis can locate a specific instance only by its instance ID.

View running details

DataWorks checks the status of upstream tasks, the scheduled time of the current task, the usage of scheduling resources, and the execution status of the current task based on the conditions required for running a task:

  • Upstream Dependency

    The Upstream Dependency page of Intelligent Diagnosis checks the running status of upstream tasks of the current task. If upstream tasks do not run successfully, the current task will be blocked. You can click Running Diagnosis in the Operation column of upstream tasks to identify the cause of upstream task failures.

    Note

    If upstream tasks have not run and the upstream dependency chain is deep, we recommend that you use the Upstream Analysis feature in the DAG panel to locate the key upstream tasks that block the current task. Then use the Intelligent Diagnosis feature to find out why those key tasks have not run.

  • Scheduled Check

    The Scheduled Check verifies whether the current task has reached its scheduled running time. This check runs only after the upstream dependency check passes.

    Note

    When configuring scheduling properties for a task in the data development module, you must specify the expected running time of the task in the scheduling environment. However, the actual running time of the task may be delayed due to issues such as upstream task failures.

  • Scheduling Resources

    The Scheduling Resources page of Intelligent Diagnosis displays the resource usage and, while the current task waits for resources, the list of tasks that occupy those resources. If the current task fails this check, the scheduling resources that it uses are insufficient. The task waits until the tasks occupying the scheduling resources finish and release them, and then starts to run. You can arrange the scheduling time of tasks based on resource usage trends to avoid peak hours.

    • Scheduling Resource Information: Shows the name of the scheduling resource group used by the current task, the number of tasks currently running on the resource group, and the number of tasks waiting to run on the resource group.

      Note

      We recommend that you use Serverless resource groups to alleviate resource shortages.

      If you use shared resource groups for scheduling, peak hours run from 00:00 to 09:00 every day. During this period, shared scheduling resources are tight and tasks may wait for resources.

    • Diagnosis Results: Shows the execution status of the current task.

    • Resource Usage Trends: If you use shared resource groups for scheduling, this section shows the resource utilization of the current scheduling resource group in each time period and the time that the current task spent waiting for resources.

  • Task Execution

    The Task Execution section presents the execution logs, details of associated data quality rules, and code details of the current task. For failed tasks, Intelligent Diagnosis provides suggestions based on log information to help you quickly identify the cause of task errors.

    • Logs: Shows the detailed execution process of the task. You can click the EMR Web UI address printed in the logs to open the web page of the corresponding EMR component.

    • DQC: If the task is associated with data quality rules, the rules are triggered when the task runs. You can view the detailed execution status of the rules here.

    • Code Details: Shows the code details of the current task.
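
The order of the four checks above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not DataWorks code: the `InstanceState` type, its fields, and both function names are hypothetical, and mapping the "not already running" condition to the Task Execution check is an assumption. The second helper treats the 00:00 to 09:00 peak window for shared resource groups as excluding 09:00 itself, which is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class InstanceState:
    # Hypothetical snapshot of one task instance; not a DataWorks API type.
    upstream_succeeded: bool
    scheduled_time: datetime
    free_resource_slots: int
    already_running: bool


def first_blocking_check(state: InstanceState, now: datetime) -> Optional[str]:
    """Return the name of the first check that blocks the run, or None."""
    if not state.upstream_succeeded:
        return "Upstream Dependency"
    # The scheduled-time check is evaluated only after upstream tasks succeed.
    if now < state.scheduled_time:
        return "Scheduled Check"
    if state.free_resource_slots <= 0:
        return "Scheduling Resources"
    # Assumption: an instance that is already running is reported here.
    if state.already_running:
        return "Task Execution"
    return None


def in_shared_peak_window(t: datetime) -> bool:
    """True if t falls in the daily 00:00-09:00 peak window (09:00 excluded)."""
    return time(0, 0) <= t.time() < time(9, 0)
```

For example, an instance whose upstream tasks succeeded and whose scheduled time has passed, but which has no free resource slots, would be reported as blocked at "Scheduling Resources". The peak-window helper could be used to flag scheduled times that are likely to wait for shared resources.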

View basic information

You can view the key time points and basic information of the current task on the Basic Information page. For more information about related properties, see Scheduling Configuration.

View affected baseline

You can view the list of baselines that include the current task in the monitoring scope and the baseline running status on the Affected Baseline page. For more information about intelligent baselines, see Intelligent Baseline Overview.

View historical instances

On the Historical Instances page, you can view the following information:

  • The metric trends of the current task: Visual charts show how the Running Time, Start Time, Waiting Time For Resources, and Completion Time of the current task changed over the past 15 days.

  • The historical instance running status of the current task: Provides running details of historical instances of the current task, including start time, end time, running duration, and waiting time for resources. You can click Running Diagnosis in the Operation column to navigate to the diagnosis details page of the corresponding instance.
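
To make the relationship between these metrics concrete, the following minimal sketch derives waiting time and running duration from per-instance timestamps and keeps only the last 15 days. The `InstanceRun` record and its field names are hypothetical and do not reflect a DataWorks schema or export format. The sketch also assumes that waiting time is the gap between the moment an instance starts waiting for resources and the moment it starts running.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class InstanceRun:
    # Hypothetical record; field names are illustrative only.
    wait_started: datetime  # when the instance began waiting for resources
    started: datetime       # when execution began
    finished: datetime      # when execution completed


def recent_metrics(
    runs: List[InstanceRun], now: datetime, days: int = 15
) -> List[Tuple[datetime, datetime, float, float]]:
    """Return (start, finish, wait_seconds, run_seconds) for recent runs."""
    cutoff = now - timedelta(days=days)
    rows = []
    for run in sorted(runs, key=lambda r: r.started):
        if run.started < cutoff:
            continue  # older than the 15-day window shown on the page
        wait = (run.started - run.wait_started).total_seconds()
        duration = (run.finished - run.started).total_seconds()
        rows.append((run.started, run.finished, wait, duration))
    return rows
```

Plotting each column of the returned rows against the start date would give charts comparable in spirit to the four trends listed above.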