DataWorks: Intelligent diagnosis

Last Updated: Nov 15, 2024

The intelligent diagnosis feature allows you to perform end-to-end diagnosis on task instances. If a task instance does not run as expected, you can use this feature to identify the problem.

Overview

You can use the intelligent diagnosis feature to diagnose and analyze task instances from the following dimensions:

  • Status of the current instance

    • Check the status of ancestor instances of the current instance: If an ancestor instance of the current instance fails to run, the current instance is blocked. The intelligent diagnosis feature can help you identify why the ancestor instance failed.

    • Check whether the scheduling time configured for the current instance has arrived.

      Note

      When you configure scheduling properties on the DataStudio page for the task that generates the current instance, you must specify the time at which the task is scheduled to run. However, the task may actually start later than its scheduling time, for example because an ancestor task of the current task fails.

    • Check the usage of scheduling resources: You can view the resource usage and the list of instances that occupy resources when the current instance is waiting for the resources.

    • View running details of the current instance: You can view the run logs of the current instance, details of associated data quality monitoring rules, code details of the task for which the current instance is generated, and suggestions on the current instance based on diagnosis results.

    Note
    • An instance can be scheduled to run only when all of the following conditions are met: its ancestor instances have run successfully, its scheduling time has arrived, scheduling resources are sufficient, and the instance has not yet been run. For more information, see What are the conditions that are required for a node to successfully run?

    • If some ancestor instances of the current instance are not run and dependencies between the current instance and its ancestor instances are complex, we recommend that you use the upstream analysis feature on the Upstream Analysis tab of the DAG page to identify the key ancestor instances that block the running of the current instance. Then, you can use the intelligent diagnosis feature to identify the reason why the ancestor instances are not run. This improves O&M efficiency.

  • Basic information: You can view the key points in time for the current instance.

  • Affected baseline: You can view the baseline that contains the task for which the current instance is generated within the monitoring scope and the status of the baseline. For more information about the intelligent baseline feature, see Overview.

  • Status of the historical instances of the current task: You can view the status of the historical instances of the current task within the last 15 days in chart mode or list mode.

Limits

  • Only users of DataWorks Professional Edition or a more advanced edition can use the intelligent diagnosis feature. If you use another edition, you can try the feature for free, but we recommend that you upgrade to DataWorks Professional Edition to use more features. For more information, see Differences among DataWorks editions.

  • The intelligent diagnosis feature is supported only in the following regions:

    China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), US (Virginia), and UAE (Dubai)

Go to the Intelligent Diagnosis page

  1. Go to the Operation Center page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

  2. On the Operation Center page, use one of the following methods to go to the Intelligent Diagnosis page:

    • Method 1: Go to the Intelligent Diagnosis page of an instance.

      • In the left-side navigation pane, choose Auto Triggered Node O&M > Auto Triggered Instances or Test Instances. Find the desired instance and click the icon displayed in the General column to go to the Intelligent Diagnosis page of the instance.

      • In the left-side navigation pane, choose Manually Triggered Node O&M > Manual Triggered Instances. Find the desired instance and click the icon displayed in the General column to go to the Intelligent Diagnosis page of the instance.

      • In the list of instances, find the desired instance and click Perform Diagnostics in the Actions column. If the current page is not displayed in list mode, click the arrow icon in the middle of the page to switch to list mode.

      • On the DAG page of the desired instance, right-click the instance and select Instance Diagnose. If the current page is not displayed in DAG mode, you can click DAG in the Actions column of the desired instance to open the DAG of the instance.

      • On the DAG page of the desired instance, click the instance. In the pane that appears in the lower-right corner, click Perform Diagnostics next to Node Status.

    • Method 2: In the left-side navigation pane, choose O&M Assistant > Intelligent Diagnosis.

      Note

      The intelligent diagnosis feature allows you to search for instances only by instance ID. You can obtain the instance ID on the instance details page.

Status of the current instance

On the Running Details tab, DataWorks checks the status of ancestor instances of the current instance, the scheduling time configured for the current instance, the usage of scheduling resources, and the status of the current instance in sequence based on the conditions required for running an instance.
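The check order described above can be sketched in Python. This is only an illustration under stated assumptions, not DataWorks code: the InstanceState class, its fields, the status strings, and the first_blocking_step function are hypothetical names introduced here, and the free-slot count stands in for whatever resource accounting the scheduler actually uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class InstanceState:
    """Hypothetical snapshot of an instance; not a DataWorks API object."""
    ancestor_statuses: List[str]  # assumed values: "SUCCESS", "FAILED", "NOT_RUN"
    scheduled_time: datetime      # scheduling time configured for the instance
    free_slots: int               # assumed count of free scheduling resource slots


def first_blocking_step(state: InstanceState, now: datetime) -> str:
    """Return the name of the first step that blocks the instance.

    The steps are evaluated in the same order as on the Running Details tab.
    If nothing blocks the instance, the Execution step is the one to inspect.
    """
    if any(status != "SUCCESS" for status in state.ancestor_statuses):
        return "Upstream Nodes"
    if now < state.scheduled_time:
        return "Timing Check"
    if state.free_slots <= 0:
        return "Resources"
    return "Execution"


# Usage: ancestors succeeded, but the scheduling time has not arrived yet.
state = InstanceState(["SUCCESS"], datetime(2024, 11, 15, 9, 0), free_slots=2)
print(first_blocking_step(state, now=datetime(2024, 11, 15, 8, 0)))  # prints: Timing Check
```

Each check runs only if the previous one passes, which mirrors the statement that the timing check is triggered only after the upstream dependency check succeeds.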

  • Upstream Nodes

    In the Upstream Nodes step on the Running Details tab of the Intelligent Diagnosis page, you can view the status of ancestor instances of the current instance. If an ancestor instance fails to be run, the current instance is blocked. You can click Instance Diagnose in the Operation column of the ancestor instance to identify the reason for the failure.

    Note

    If some ancestor instances of the current instance are not run and dependencies between the current instance and its ancestor instances are complex, we recommend that you use the upstream analysis feature on the Upstream Analysis tab of the DAG page to identify the key ancestor instances that block the running of the current instance. Then, you can use the intelligent diagnosis feature to identify the reason why the ancestor instances are not run. This improves O&M efficiency.

  • Timing Check

    In the Timing Check step, you can check whether the scheduling time configured for the current instance has arrived. This check is triggered only after the upstream dependency check succeeds.

  • Resources

    In the Resources step, you can view the resource usage. If the current instance fails the resource usage check, the scheduling resources required to run it are insufficient, and the instance enters the state of waiting for resources. The instance can start to run only after the instances that occupy the scheduling resources are complete and the resources are released. Based on the information in the Resources step, you can adjust the scheduling time of the current instance to avoid peak hours.

    The Resources step contains the following sections:

    • Scheduling resource information: Allows you to view the name of the resource group for scheduling that is used by the current instance, the number of instances that are running on the resource group for scheduling, and the number of instances that are waiting to be run on the resource group for scheduling.

      Note

      We recommend that you use serverless resource groups to ease scheduling resource constraints.

      The peak hours for DataWorks tasks are from 00:00 to 09:00 every day. If you use the shared resource group for scheduling during the peak hours, resources in the resource group may be insufficient, and tasks may wait for resources.

    • Diagnosis Results: Allows you to view the execution status of the current task.

    • Resource Usage Trends: Allows you to view the resource usage of the current resource group for scheduling within each time period and the time consumed by the current instance to wait for resources if you use shared resource groups for scheduling.

  • Execution

    In the Execution step, you can view the run logs of the current instance, details of associated data quality monitoring rules, and code details of the node for which the current instance is generated. For an instance that fails to run, the intelligent diagnosis feature provides diagnosis results and suggestions based on log information. This helps you identify the cause of the error.

    The Execution step contains the following tabs:

    • Log: Allows you to view the running details of the current instance.

    • DQC: Allows you to view details of the data quality monitoring rule. If you associate a data quality monitoring rule with the task for which the current instance is generated, the data quality monitoring rule is triggered after the task is run.

    • Code details: Allows you to view the code details of the task for which the current instance is generated.
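The peak-hour guidance in the Resources step can be turned into a quick check when you pick a scheduling time. The following is a minimal sketch: the in_peak_hours helper and the two constants are hypothetical names, and treating 09:00 itself as outside the peak window is an assumption, because the text gives only the range 00:00 to 09:00.

```python
from datetime import time

# Peak window stated in the Resources step: 00:00 to 09:00 every day.
PEAK_START = time(0, 0)
PEAK_END = time(9, 0)  # assumption: the end of the window is exclusive


def in_peak_hours(scheduled: time) -> bool:
    """Return True if the scheduling time falls within the daily peak window."""
    return PEAK_START <= scheduled < PEAK_END


# Usage: a task scheduled at 02:30 competes for shared resources during peak hours.
print(in_peak_hours(time(2, 30)))   # prints: True
print(in_peak_hours(time(10, 15)))  # prints: False
```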

General tab

On the General tab, you can view key points in time for the current instance and basic information about the current instance. For more information about scheduling properties that are configured for the node for which the current instance is generated, see Configure basic properties.

Impact baseline tab

On the Impact baseline tab, you can view the baseline that contains the task for which the current instance is generated within the monitoring scope and the status of the baseline. For more information about intelligent baselines, see Overview of intelligent baselines.

Historical instance tab

On the Historical instance tab, you can view the following information:

  • Charts that show trends of the following metrics for the current node within the last 15 days: Running time, Start run time, Time consumption of waiting for scheduling resources, and Completed At.

  • The Historical instance list, which shows running details of the instances generated for the current node over a historical period, including the time when an instance started to run, the time when the instance completed, the running duration, and the time spent waiting for resources. You can click Instance Diagnose in the Operation column of an instance to go to the diagnosis details page of the instance.
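The relationship between the timestamps and durations listed above can be sketched as follows. This is an illustration only: the HistoricalRun class and its field names are hypothetical, and the sketch assumes that the running duration is the completion time minus the start time and that the resource wait runs from the moment the instance begins waiting for resources until it starts to run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class HistoricalRun:
    """Hypothetical record of one historical instance; not a DataWorks API object."""
    wait_started_at: datetime  # assumed: when the instance began waiting for resources
    started_at: datetime       # when the instance started to run
    completed_at: datetime     # when the instance completed


def running_duration(run: HistoricalRun) -> timedelta:
    """Time between the start of the run and its completion."""
    return run.completed_at - run.started_at


def resource_wait(run: HistoricalRun) -> timedelta:
    """Time spent waiting for scheduling resources before the run started."""
    return run.started_at - run.wait_started_at


# Usage: an instance that waited 5 minutes for resources and ran for 12 minutes.
run = HistoricalRun(
    wait_started_at=datetime(2024, 11, 15, 0, 0),
    started_at=datetime(2024, 11, 15, 0, 5),
    completed_at=datetime(2024, 11, 15, 0, 17),
)
print(resource_wait(run))     # prints: 0:05:00
print(running_duration(run))  # prints: 0:12:00
```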
