You can use the intelligent baseline feature to detect an exception that prevents a task in a baseline from being completed on time. If an exception is detected, the system sends you an alert notification about the exception at the earliest opportunity. This ensures that important data is generated as expected in scenarios in which dependencies between tasks in the baseline are complex. This also helps you reduce configuration costs, prevent invalid alerts, and automatically monitor important tasks.
Scenarios
Manage the priorities of tasks.
When the number of tasks keeps growing but resources are limited, tasks compete for the resources. You can create a baseline and add important tasks to the baseline. Then, you can configure a high priority for the baseline to ensure that the system preferentially allocates resources to tasks in the baseline.
Calculate the estimated completion time of a task.
The running of a task is affected by resource supply and the status of ancestor tasks of the task. After you add a task that is scheduled to run every day or every hour to a baseline, DataWorks can calculate and display the estimated completion time of the task on the specified day or in the specified hour.
Ensure that a task finishes running before the committed point in time.
You can add a task to a baseline and configure a committed point in time for the baseline. The system sends you an alert notification if it predicts that the task in the baseline cannot finish running before the committed point in time, if an error occurs on an ancestor task of the task, or if an ancestor task slows down. This way, you can troubleshoot issues based on the alert at the earliest opportunity to ensure that the task can finish running before the committed point in time.
Terms
Baseline: You can create a baseline, add important tasks to the baseline, and configure a committed point in time for the baseline. This way, the system can calculate the estimated completion time for tasks in the baseline based on the status of the tasks. If the system determines that a task in the baseline cannot finish running before the committed point in time, the system sends you an alert notification.
Committed point in time: the point in time before which all tasks in a baseline finish running. DataWorks ensures that all tasks in a data application finish running before the committed point in time. If you want to reserve a certain amount of time for O&M personnel to handle exceptions that occur on tasks in a baseline, you can configure an alert margin threshold for the baseline. The system uses the time obtained by subtracting the alert margin threshold from the committed point in time as the alert time of the baseline. The alert time is also the estimated point in time before which all tasks in the baseline finish running.
Alert time: the time that is obtained by subtracting the alert margin threshold from the committed point in time.
Baseline task: a task that you add to a baseline.
Baseline instance: an instance that is generated by a task in a baseline. The system uses baseline instances to calculate the estimated completion time for a task in the baseline each time. The status of a baseline instance can be safe, alert, or overtime.
A baseline instance is in the safe state if the estimated completion time for the baseline instance is earlier than the alert time.
A baseline instance is in the alert state if the estimated completion time for the baseline instance is later than the alert time and earlier than the committed point in time.
A baseline instance is in the overtime state if the estimated completion time for the baseline instance is later than the committed point in time.
Key path: Multiple paths affect data generation of the current task in a baseline. The path that takes the longest time for task execution is considered the key path.
Event: an event that is generated if an error is reported for a task in a baseline or for an ancestor task of the task in the baseline, or if a task in the key path slows down. Events prevent tasks in the baseline from being completed on time.
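The relationship between the committed point in time, the alert margin threshold, the alert time, and the three baseline instance states can be sketched in a few lines of Python. This is only an illustration of the definitions above, not DataWorks code: the function names `alert_time` and `instance_state` are hypothetical, and the handling of exactly equal timestamps is an assumption because the definitions do not cover that case.

```python
from datetime import datetime, timedelta


def alert_time(committed: datetime, margin: timedelta) -> datetime:
    """Alert time = committed point in time minus the alert margin threshold."""
    return committed - margin


def instance_state(estimated: datetime, committed: datetime, margin: timedelta) -> str:
    """Classify a baseline instance as safe, alert, or overtime.

    Assumption: when two timestamps are exactly equal, the less severe
    state is chosen. The definitions do not specify this boundary.
    """
    if estimated > committed:
        return "overtime"
    if estimated > alert_time(committed, margin):
        return "alert"
    return "safe"


# Hypothetical values: committed at 08:00 with a 30-minute margin,
# so the alert time is 07:30.
committed = datetime(2024, 1, 1, 8, 0)
margin = timedelta(minutes=30)
for estimated in (datetime(2024, 1, 1, 7, 0),
                  datetime(2024, 1, 1, 7, 45),
                  datetime(2024, 1, 1, 8, 10)):
    print(estimated.time(), instance_state(estimated, committed, margin))
```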
Features
After you add tasks to a baseline, DataWorks preferentially allocates resources to the tasks in the baseline based on the priority of the baseline to ensure data generation of the tasks. In addition, DataWorks determines a monitoring scope for the tasks in the baseline based on the dependencies between the tasks. A baseline alert or an event alert is triggered based on the status of the tasks within the monitoring scope.
Determine the tasks that you want to monitor based on Task K in the baseline.
Ancestor tasks of Task K in the baseline: Tasks that affect the running of Task K will be monitored.
Descendant tasks of Task K in the baseline: Descendant tasks of Task K will not be monitored.
Key path: indicates the path in which the total time consumed to run all tasks along the path is the longest among all paths that affect the running of Task K.
Create a baseline.
Specify tasks that you want to add to Task K in the baseline.
Configure the priority for the baseline and configure alert rules.
A baseline alert or an event alert is triggered based on the running details of tasks that are monitored in the baseline.
You can perform the following operations on a baseline:
Create and manage a baseline
You can create and manage a baseline on the Baselines tab.
You can add important tasks to a baseline, configure basic information such as the committed point in time for the baseline, and configure alert rule parameters such as the notification method and alert contact for the baseline. The system monitors tasks and sends you alert notifications based on the baseline configurations.
You can also configure the priority of the baseline, which determines the priority of the tasks in the baseline. The higher the priority of a baseline, the higher the priority of the tasks in the baseline. If scheduling resources are insufficient, the system preferentially allocates the scheduling resources to tasks in the baseline for which you configure a high priority.
Note: The priority that you configure for the baseline is mapped to the priority of MaxCompute compute tasks if the following conditions are met:
The priority feature is enabled for the MaxCompute projects.
The MaxCompute projects use subscription computing resources.
The priority of a MaxCompute job is calculated based on the following formula: 9 - Priority of a baseline in DataWorks.
For more information about how to create and manage a baseline, see Manage baselines.
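The priority mapping in the preceding note is simple arithmetic. The following Python sketch applies the stated formula. The function name `maxcompute_priority` is hypothetical, and the sample baseline priorities are illustrative values only; this page does not list the valid priority range.

```python
def maxcompute_priority(baseline_priority: int) -> int:
    """Apply the documented mapping: 9 - priority of the baseline in DataWorks."""
    return 9 - baseline_priority


# Illustrative baseline priorities (assumed values, not an official list).
for p in (1, 5, 8):
    print(f"baseline priority {p} -> MaxCompute job priority {maxcompute_priority(p)}")
```

Because of the subtraction, a higher baseline priority in DataWorks maps to a smaller MaxCompute job priority value.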
Determine the monitoring scope
DataWorks determines the monitoring scope based on the dependencies between tasks in a baseline. All tasks that may affect data generation of the tasks in the baseline are monitored. For more information, see Core logic: monitoring scope.
Trigger an alert and send an alert notification
Baseline alert
An alert can be automatically triggered based on the alert rule parameters that you configure for a baseline, and the status of tasks that are within the monitoring scope. After an alert is triggered, DataWorks sends you an alert notification. If DataWorks predicts that tasks in a baseline cannot finish running before the committed point in time, DataWorks sends you an alert notification by using a notification method that you specify. For more information, see Core logic: baseline alert.
Event alert
After a monitoring scope is determined, when an error is reported for a task in a baseline or for an ancestor task of the task in the baseline, or when a task in the key path slows down, the related event is generated and DataWorks sends you an alert notification. You can view existing events on the Events tab. For more information, see Manage events.
Billing
Number of baseline instances: Tasks in a baseline that is enabled generate instances in the baseline. You are billed based on the number of baseline instances that are generated before 00:23:59 each day. For more information, see Baseline instances.
Numbers of alert text messages and alert phone calls: You are charged for text messages and phone calls that are generated when baseline alerts are triggered. For more information, see Billing of alert text messages and alert phone calls.
Limits
Only users of DataWorks Standard Edition or a more advanced edition can use the intelligent baseline feature. If your DataWorks service does not meet the requirements, you must upgrade it to Standard Edition or a more advanced edition first. For more information, see Differences among DataWorks editions.
Core logic: monitoring scope
After you create a baseline and add a task to the baseline, the intelligent baseline feature does not monitor all ancestor and descendant tasks of that task. The following content describes the monitoring scope for tasks in a baseline:
Ancestor tasks: The ancestor tasks that affect data generation of tasks in a baseline are monitored.
Descendant tasks: The descendant tasks of tasks in a baseline are not monitored. If an error is reported for a descendant task of a task in a baseline, or for a task that is in a different branch from the task in the baseline, the system does not send an alert notification.
As shown in the preceding figure, tasks A, B, C, D, E, and F are created in DataWorks. Tasks D and E are tasks in a baseline. Tasks A and B are ancestor tasks of Tasks D and E and affect data generation of Tasks D and E. Therefore, tasks A, B, D, and E are within the monitoring scope. If an exception occurs on a task within the monitoring scope or if a task within the monitoring scope slows down, the system can detect the issue. However, tasks C and F are not within the monitoring scope of the intelligent baseline.
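The monitoring scope amounts to collecting the baseline tasks and everything upstream of them. The following Python sketch shows one way to compute that set with a breadth-first walk over upstream dependencies. The function name `monitoring_scope` is hypothetical, and the dependency edges are assumed for illustration: the figure is not reproduced here, so the edges were chosen only to be consistent with the description of tasks A through F.

```python
from collections import deque


def monitoring_scope(upstream: dict, baseline_tasks: list) -> set:
    """Return the baseline tasks plus all of their ancestor tasks.

    `upstream` maps a task to the tasks it directly depends on.
    Descendants and unrelated branches are never visited.
    """
    scope = set(baseline_tasks)
    queue = deque(baseline_tasks)
    while queue:
        task = queue.popleft()
        for parent in upstream.get(task, ()):
            if parent not in scope:
                scope.add(parent)
                queue.append(parent)
    return scope


# Assumed edges (hypothetical): A -> B -> D -> E -> F, and A -> C.
upstream = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["D"], "F": ["E"]}
scope = monitoring_scope(upstream, ["D", "E"])
print(sorted(scope))  # ['A', 'B', 'D', 'E']
```

With these assumed edges, task C sits in a different branch and task F is a descendant of a baseline task, so neither is included in the result.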
Core logic: baseline alert
You can add important tasks to a baseline, and configure the committed point in time and alert margin threshold for the baseline.
DataWorks uses the time obtained by subtracting the alert margin threshold from the committed point in time as the alert time of the baseline. DataWorks uses baseline instances to calculate the latest completion time and latest start time for each task within the monitoring scope based on the alert time and the average running durations of tasks within the monitoring scope over a historical period of time.
If DataWorks predicts that the status of a task within the monitoring scope may prevent tasks in a baseline from being completed before the alert time, DataWorks sends you an alert notification.
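One plausible way to derive a latest completion time and latest start time for each monitored task is a backward pass from the alert time, as used in critical path scheduling. The following Python sketch illustrates that idea under stated assumptions. It is not the DataWorks algorithm, which this page does not describe in detail. The function name `latest_times`, the task graph, and the average durations are all hypothetical.

```python
from datetime import datetime, timedelta


def latest_times(downstream, avg_minutes, baseline_tasks, alert_time):
    """Return {task: (latest_start, latest_finish)} for monitored tasks.

    Assumptions: `downstream` and `avg_minutes` contain only tasks inside
    the monitoring scope, a baseline task must finish by the alert time,
    and any task must finish before the latest start of its children.
    """
    memo = {}

    def visit(task):
        if task in memo:
            return memo[task]
        candidates = []
        if task in baseline_tasks:
            candidates.append(alert_time)
        for child in downstream.get(task, ()):
            child_start, _ = visit(child)
            candidates.append(child_start)
        finish = min(candidates)
        start = finish - timedelta(minutes=avg_minutes[task])
        memo[task] = (start, finish)
        return memo[task]

    for task in avg_minutes:
        visit(task)
    return memo


# Hypothetical chain A -> B -> D -> E with D and E in the baseline.
downstream = {"A": ["B"], "B": ["D"], "D": ["E"]}
avg_minutes = {"A": 20, "B": 30, "D": 15, "E": 10}
times = latest_times(downstream, avg_minutes, {"D", "E"},
                     datetime(2024, 1, 1, 7, 30))
for task, (start, finish) in sorted(times.items()):
    print(task, start.time(), finish.time())
```

In this sketch, a task that has not started by its latest start time, or has not finished by its latest completion time, would be a candidate for a baseline alert.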
Core logic: event alert
After a monitoring scope is determined, the intelligent monitoring system generates an event and reports an alert when an exception occurs on a task within the monitoring scope. The alert is reported based on the analysis results of the event. An exception can be one of the following types:
Error: indicates that a task fails to run.
Slow: indicates that the running duration of a task is significantly longer than the average running duration of the task in the previous periods.
If a task slows down and then encounters an error, two events are generated.
You can go to the Events tab to view the details about an event.
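The two exception types can be illustrated with a small Python sketch that turns a task run into zero, one, or two events. The names `Run` and `detect_events` are hypothetical. The `slow_factor` threshold of 1.5 is an assumption, because this page says only that the running duration must be significantly longer than the historical average.

```python
from dataclasses import dataclass


@dataclass
class Run:
    task: str
    elapsed_minutes: float
    failed: bool


def detect_events(run: Run, avg_minutes: float, slow_factor: float = 1.5) -> list:
    """Return the events generated by one run of a monitored task.

    Assumption: "significantly longer" means longer than
    slow_factor * average duration. The real threshold is not documented here.
    """
    events = []
    if run.elapsed_minutes > slow_factor * avg_minutes:
        events.append((run.task, "slow"))
    if run.failed:
        events.append((run.task, "error"))
    return events


# A task that slows down and then fails generates two events.
print(detect_events(Run("B", elapsed_minutes=50, failed=True), avg_minutes=30))
```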
Core logic: key path and key instance
The dependencies between the tasks that you want to monitor in a baseline may be complex. DataWorks provides Gantt charts for you to quickly identify the key path and key instances that prevent the tasks in the baseline from generating data. Among all paths whose tasks affect data generation of the tasks that you want to monitor in the baseline, the key path is the one that takes the longest time to run.
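Finding the key path is a longest-path search over the upstream dependency graph of a monitored task. The following Python sketch shows a memoized version of that search. The function name `key_path`, the task graph, and the durations are hypothetical and serve only to illustrate the definition. How DataWorks computes the key path internally is not described on this page.

```python
def key_path(upstream: dict, duration: dict, target: str):
    """Return (total_duration, path) of the longest upstream path ending at target.

    `upstream` maps a task to the tasks it directly depends on, and
    `duration` maps a task to its running time in minutes.
    """
    memo = {}

    def longest(task):
        if task in memo:
            return memo[task]
        best_total, best_path = 0, []
        for parent in upstream.get(task, ()):
            total, path = longest(parent)
            # Keep the parent whose own longest path takes the most time.
            if not best_path or total > best_total:
                best_total, best_path = total, path
        memo[task] = (best_total + duration[task], best_path + [task])
        return memo[task]

    return longest(target)


# Hypothetical graph: A feeds B and C, and both feed the monitored Task K.
upstream = {"K": ["B", "C"], "B": ["A"], "C": ["A"]}
duration = {"A": 10, "B": 30, "C": 20, "K": 5}
total, path = key_path(upstream, duration, "K")
print(total, path)  # 45 ['A', 'B', 'K']
```

In this example, the path through B takes 45 minutes in total and the path through C takes 35 minutes, so A, B, K is the key path.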
Example
Scenario: The current time is 6:40. Task F is still running.
Baseline alert:
YYYY-MM-DD HH:mm:ss
Baseline alert; data timestamp: xx; alert margin threshold: -10 min...
Event alert:
YYYY-MM-DD HH:mm:ss
Event alert; data timestamp: XX; Task XX, status: delayed...
You can use a Gantt chart to view the key path of a task that you want to monitor. The following Gantt chart shows the key path of the tasks that are shown in the preceding figure and the point in time at which each exception occurs.