You can use the Intelligent Diagnosis feature to perform end-to-end analysis of tasks. If a task does not run as expected, this feature helps you quickly identify the cause.
Background information
The Intelligent Diagnosis feature can diagnose and analyze tasks across the following dimensions:
- View Running Details: Checks whether the task meets the conditions that a scheduled task must meet to run. Its upstream tasks must have run successfully, its scheduled time must have been reached, scheduling resources must be available, and the task must not already be running. For more information, see Task Running Conditions.
- View Basic Information: Shows the key time points of the current task.
- Affected Baseline: Shows the baselines whose monitoring scope includes the current task, and the running status of each baseline. For more information about intelligent baselines, see Intelligent Baseline Overview.
- Historical Instance Running Status: Shows the execution status of the current task over the past 15 days in charts and lists.
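The four running conditions behave like a single predicate: an instance can start only when all of them hold. The following Python sketch is a minimal illustration of that idea. The `InstanceState` fields and the `blocking_conditions` helper are hypothetical names chosen for this example and are not part of any DataWorks API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class InstanceState:
    """Hypothetical snapshot of one scheduling instance (illustrative fields only)."""
    upstream_all_succeeded: bool
    scheduled_time: datetime
    resources_available: bool
    already_running: bool


def blocking_conditions(state: InstanceState, now: datetime) -> list[str]:
    """Return the running conditions that are not met; an empty list means the instance can run."""
    blockers = []
    if not state.upstream_all_succeeded:
        blockers.append("upstream tasks have not all succeeded")
    if now < state.scheduled_time:
        blockers.append("scheduled time not reached")
    if not state.resources_available:
        blockers.append("no scheduling resources available")
    if state.already_running:
        blockers.append("task is already running")
    return blockers


# Example: upstream tasks are done and the scheduled time has passed, but no resources are free.
state = InstanceState(True, datetime(2024, 1, 1, 2, 0), False, False)
print(blocking_conditions(state, datetime(2024, 1, 1, 3, 0)))  # ['no scheduling resources available']
```

Reporting every unmet condition, instead of only the first one, matches how the Running Details view below checks each condition separately.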
Limits
- The Intelligent Diagnosis feature is supported only in DataWorks Professional Edition and more advanced editions. You can currently try it for free, but we recommend that you upgrade to Professional Edition to access more product capabilities. For more information about version upgrades, see DataWorks Version Details.
- The Intelligent Diagnosis feature is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), Hong Kong (China), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), US (Silicon Valley), US (Virginia), and UAE (Dubai).
Access intelligent diagnosis
Go to the Operation Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.
After you enter Operation Center, you can go to the Intelligent Diagnosis page in either of the following ways:
- Method 1: Go to Intelligent Diagnosis from an instance, using either of the following entry points:
Find
in the left-side navigation pane, click the Instance View tab, and then click Running Diagnosis in the Operation column of the target instance to go to the Intelligent Diagnosis page.
Alternatively, in the left-side navigation pane, find
, click the Instance View tab, and then click DAG Graph in the Operation column of the target instance. In the DAG, right-click the instance and choose Running Diagnosis to go to the Intelligent Diagnosis page.
- Method 2: Click
in the left-side navigation pane to go to the Intelligent Diagnosis page.
Note: On this page, Intelligent Diagnosis can locate a specific instance only by its instance ID.
View running details
Based on the conditions that a task must meet to run, DataWorks checks the status of the upstream tasks, the scheduled time of the current task, the usage of scheduling resources, and the execution status of the current task:
- Upstream Dependency
The Upstream Dependency page checks the running status of the upstream tasks of the current task. If any upstream task has not run successfully, the current task is blocked. To find out why an upstream task failed, click Running Diagnosis in the Operation column of that upstream task.
Note: If upstream tasks have not run and the dependency chain is deep, we recommend that you use the Upstream Analysis feature in the DAG panel to quickly locate the key upstream tasks that block the current task. Then use Intelligent Diagnosis to find out why those key tasks have not run. This makes troubleshooting more efficient.
- Scheduled Check
The scheduled check verifies whether the current task has reached its scheduled running time. This check runs only after the upstream dependency check passes.
Note: When you configure the scheduling properties of a task in the data development module, you specify the time at which the task is expected to run in the scheduling environment. The actual running time can be later than this time, for example, if upstream tasks fail.
- Scheduling Resources
While the current task is waiting for resources, the Scheduling Resources page shows the resource usage and the list of tasks that occupy the resources. If the current task fails this check, the scheduling resources that it uses are insufficient. The task waits until the tasks that occupy the resources finish and release them, and then it starts running. You can use the resource usage trends to schedule tasks outside peak hours.
The Scheduling Resources page contains the following sections:
Scheduling Resource Information: Shows the name of the scheduling resource group used by the current task, the number of tasks currently running on the resource group, and the number of tasks waiting to run on it.
Note: We recommend that you use a Serverless resource group to alleviate resource shortages. If you use a shared resource group for scheduling, note that 00:00 to 09:00 every day is the DataWorks peak period. During this period, shared scheduling resources are in high demand and tasks may have to wait for them.
Diagnosis Results: Shows the execution status of the current task.
Resource Usage Trends: If you use a shared resource group for scheduling, shows the resource utilization of the current scheduling resource group in each time period and how long the current task waited for resources.
- Task Execution
The Task Execution section shows the execution logs, the associated data quality rules, and the code of the current task. For a failed task, Intelligent Diagnosis provides suggestions based on the logs to help you quickly identify the cause of the error.
The Task Execution section contains the following parts:
Logs: Shows the detailed execution process of the task. If the logs contain an EMR Web UI address, you can click it to open the web page of the corresponding EMR component.
DQC: If the task is associated with data quality rules, the rules are triggered when the task runs. You can view the detailed execution status of each rule here.
Code Details: Shows the code of the current task.
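To make the idea of "key upstream tasks" in the Upstream Dependency note more concrete, the following Python sketch walks an upstream dependency graph. It reports the unfinished tasks whose own upstream tasks have all succeeded. These tasks are where a chain of waiting tasks begins, so they are the first candidates for Running Diagnosis. This is an illustration under assumptions, not how DataWorks implements Upstream Analysis. The graph is a hypothetical dict that maps each task name to its upstream task names, and the set of succeeded tasks is supplied by hand.

```python
from collections import deque


def key_blockers(task: str, upstream: dict[str, list[str]], succeeded: set[str]) -> list[str]:
    """Return unfinished upstream tasks whose own upstream tasks have all succeeded.

    Walks upstream breadth-first from `task`. Succeeded tasks are skipped; an
    unfinished task with unfinished parents is not the root cause, so the walk
    continues to its parents instead.
    """
    seen: set[str] = set()
    blockers: list[str] = []
    queue = deque(upstream.get(task, []))
    while queue:
        node = queue.popleft()
        if node in seen or node in succeeded:
            continue
        seen.add(node)
        parents = upstream.get(node, [])
        if all(p in succeeded for p in parents):
            blockers.append(node)  # every prerequisite is done, so the problem is here
        else:
            queue.extend(parents)  # keep looking further upstream
    return sorted(blockers)


# Hypothetical dependency graph: report <- clean, dim; clean <- ods_a, ods_b; dim <- ods_b.
graph = {"report": ["clean", "dim"], "clean": ["ods_a", "ods_b"], "dim": ["ods_b"], "ods_a": [], "ods_b": []}
print(key_blockers("report", graph, {"ods_a"}))  # ['ods_b']
```

In this example, both `clean` and `dim` are waiting, but only `ods_b` needs diagnosis. After `ods_b` runs successfully, the other tasks can proceed.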
View basic information
You can view the key time points and basic information of the current task on the Basic Information page. For more information about related properties, see Scheduling Configuration.
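The key time points are related to each other. Under the running conditions described earlier, an instance cannot start before its scheduled time or before its last upstream task finishes. The following Python sketch computes this earliest possible start time and the delay that upstream tasks cause. This is a simplified model that assumes resource waits are zero. The function names are hypothetical and illustrative only.

```python
from datetime import datetime


def earliest_start(scheduled: datetime, upstream_finished: list[datetime]) -> datetime:
    # The instance must wait for BOTH its scheduled time and the last upstream
    # task to finish, so the earliest start is the latest of these times.
    # Waiting for scheduling resources is ignored in this simplified model.
    return max([scheduled, *upstream_finished])


def upstream_delay_minutes(scheduled: datetime, upstream_finished: list[datetime]) -> float:
    # Minutes by which upstream tasks push the start past the scheduled time (0.0 if not at all).
    return (earliest_start(scheduled, upstream_finished) - scheduled).total_seconds() / 60


scheduled = datetime(2024, 1, 1, 2, 0)
upstream = [datetime(2024, 1, 1, 1, 30), datetime(2024, 1, 1, 2, 45)]
print(earliest_start(scheduled, upstream))          # 2024-01-01 02:45:00
print(upstream_delay_minutes(scheduled, upstream))  # 45.0
```

If the actual start time shown on the page is later than this estimate, the remaining gap is likely caused by a condition that the model ignores, such as waiting for scheduling resources.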
View affected baseline
On the Affected Baseline page, you can view the baselines whose monitoring scope includes the current task, and the running status of each baseline. For more information about intelligent baselines, see Intelligent Baseline Overview.
View historical instances
On the Historical Instances page, you can view the following information:
- Metric trends of the current task: Charts show how the Running Time, Start Time, Waiting Time For Resources, and Completion Time of the current task changed over the past 15 days.
- Running status of historical instances: Shows the running details of the historical instances of the current task, including the start time, end time, running duration, and waiting time for resources. To go to the diagnosis details page of an instance, click Running Diagnosis in the Operation column.
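The historical metrics come from simple timestamp arithmetic: running duration is end time minus start time, aggregated over a 15-day window. The following Python sketch summarizes a list of instance records in this way. It is a minimal illustration. The `HistoricalInstance` record and its field names are hypothetical, and the sketch assumes that the waiting time for resources is already recorded for each instance rather than derived.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class HistoricalInstance:
    """Hypothetical record of one historical instance (field names are illustrative)."""
    start: datetime
    end: datetime
    wait_for_resources: timedelta


def recent_summary(history: list[HistoricalInstance], now: datetime, days: int = 15) -> dict:
    """Average running duration and resource wait, in minutes, over the last `days` days."""
    cutoff = now - timedelta(days=days)
    recent = [h for h in history if h.start >= cutoff]  # keep only instances inside the window
    if not recent:
        return {"instances": 0, "avg_run_min": 0.0, "avg_wait_min": 0.0}
    return {
        "instances": len(recent),
        "avg_run_min": mean((h.end - h.start).total_seconds() / 60 for h in recent),
        "avg_wait_min": mean(h.wait_for_resources.total_seconds() / 60 for h in recent),
    }
```

For example, with two instances in the window that ran for 30 and 60 minutes and waited 10 and 0 minutes for resources, the summary reports an average running time of 45.0 minutes and an average wait of 5.0 minutes. An older instance outside the 15-day window is ignored.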