You can use custom alert rules to monitor the status of specified nodes or the resource usage of resource groups based on your business requirements. This helps you identify and handle exceptions at the earliest opportunity. This topic describes how to create a custom alert rule on the Rule Management page, and how to add a DingTalk chatbot and obtain its webhook URL.
Limits
Custom alert rules take effect only on auto triggered node instances. The results of test instances and data backfill instances that are generated for auto triggered nodes are not monitored.
Custom alert rules support the following alert notification methods: email, text message, phone call, DingTalk chatbot, and webhook URL. Take note of the following limits on the supported alert notification methods:
Phone call: Alert notifications can be sent by phone call only to mobile phone numbers in the Chinese mainland.
Webhook URL:
The webhook URL-based alerting feature is supported only in DataWorks Enterprise Edition.
The webhook URL-based alerting feature is supported in the following regions: China (Shanghai), China (Chengdu), China (Zhangjiakou), China (Beijing), China (Hangzhou), China (Shenzhen), China (Hong Kong), Germany (Frankfurt), and Singapore.
The WebHook notification method can send alert notifications only to WeCom or Lark. To send alert notifications to a DingTalk group, use the DingTalk Chatbot method.
Note: DataWorks supports the webhook URL-based alerting feature only for DingTalk, WeCom, and Lark. If you want to use a self-developed, webhook-based message sending service, configure it as described in Intelligent monitoring: Formats of alert messages sent by using a custom webhook. After the configuration is complete, submit a ticket to contact Alibaba Cloud DataWorks technical support for further processing.
You can configure trigger conditions such as Instances with Errors, Proportion of Instances with Errors, and Node Logs Contain Keywords only in DataWorks Professional Edition or a more advanced edition. For more information, see Differences among DataWorks editions. For information about how to activate DataWorks, see Purchase guide.
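The limits above combine edition, region, notification method, and trigger condition. The following sketch encodes them as a pre-flight check that you could run over a planned rule before you open the console. It does not call any DataWorks API. The names `PlannedRule`, `check_limits`, and `EDITION_RANK` are hypothetical, and the edition order (Basic < Standard < Professional < Enterprise) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Assumed edition order, for illustration only: Basic < Standard < Professional < Enterprise.
EDITION_RANK = {"Basic": 0, "Standard": 1, "Professional": 2, "Enterprise": 3}

# Regions that this topic lists for webhook URL-based alerting.
WEBHOOK_REGIONS = {
    "China (Shanghai)", "China (Chengdu)", "China (Zhangjiakou)",
    "China (Beijing)", "China (Hangzhou)", "China (Shenzhen)",
    "China (Hong Kong)", "Germany (Frankfurt)", "Singapore",
}

# Trigger conditions that require Professional Edition or a more advanced edition.
PROFESSIONAL_CONDITIONS = {
    "Instances with Errors",
    "Proportion of Instances with Errors",
    "Node Logs Contain Keywords",
}


@dataclass
class PlannedRule:
    """Hypothetical description of a rule that you plan to create."""
    edition: str
    region: str
    trigger_condition: str
    methods: Set[str] = field(default_factory=set)
    webhook_target: Optional[str] = None  # for example "WeCom" or "Lark"


def check_limits(rule: PlannedRule) -> List[str]:
    """Return the limits from this topic that the planned rule would violate."""
    problems = []
    rank = EDITION_RANK[rule.edition]
    if rule.trigger_condition in PROFESSIONAL_CONDITIONS and rank < EDITION_RANK["Professional"]:
        problems.append(f"{rule.trigger_condition} requires Professional Edition or higher")
    if "WebHook" in rule.methods:
        if rank < EDITION_RANK["Enterprise"]:
            problems.append("WebHook requires Enterprise Edition")
        if rule.region not in WEBHOOK_REGIONS:
            problems.append(f"WebHook is not supported in {rule.region}")
        if rule.webhook_target not in ("WeCom", "Lark"):
            problems.append("WebHook can send notifications only to WeCom or Lark")
    return problems
```

For example, a Standard Edition workspace in Singapore that plans to use Instances with Errors with a WebHook to Lark gets two findings: the trigger condition needs Professional Edition, and WebHook needs Enterprise Edition.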
Precautions
The following table describes the monitoring time ranges that correspond to different alert trigger conditions when you use custom alert rules to monitor auto triggered node instances.
Monitoring time range | Trigger condition | Description |
Data timestamp: the previous day (T) | | DataWorks monitors auto triggered node instances whose data timestamp is the previous day and whose scheduling time is the current day. If one of the trigger conditions is met, an alert is reported. |
Data timestamp: the previous day (T) and the day before the previous day (T-1) | | DataWorks monitors auto triggered node instances whose data timestamp is T and whose scheduling time is the current day, and instances whose data timestamp is T-1 and whose scheduling time is the previous day. If one of the trigger conditions is met, an alert is reported. |
Data timestamp: the previous day (T), the day before the previous day (T-1), and two days before the previous day (T-2) | | DataWorks monitors auto triggered node instances whose data timestamp is T and whose scheduling time is the current day, instances whose data timestamp is T-1 and whose scheduling time is the previous day, and instances whose data timestamp is T-2 and whose scheduling time is the day before the previous day. If one of the trigger conditions is met, an alert is reported. |
For an auto triggered node instance that is not within the required time range, an alert is not reported even if the instance meets a trigger condition. For more information about monitoring rules that correspond to different trigger conditions, see the Create a custom alert rule section in this topic.
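The three monitoring time ranges can be expressed as sets of (data timestamp, scheduling date) pairs. The following sketch, which assumes daily instances whose data timestamp is one day before their scheduling date, lists the pairs that a range covers and checks whether a given instance falls inside it. The function names and the `extra_days` parameter are hypothetical and are not part of DataWorks.

```python
from datetime import date, timedelta
from typing import List, Tuple


def monitored_instance_dates(today: date, extra_days: int) -> List[Tuple[date, date]]:
    """Return the (data timestamp, scheduling date) pairs in a monitoring time range.

    extra_days = 0 covers T, 1 covers T and T-1, 2 covers T, T-1, and T-2.
    For daily instances, the data timestamp is one day before the scheduling date.
    """
    if extra_days not in (0, 1, 2):
        raise ValueError("extra_days must be 0, 1, or 2")
    pairs = []
    for k in range(extra_days + 1):
        scheduling_date = today - timedelta(days=k)
        data_timestamp = scheduling_date - timedelta(days=1)
        pairs.append((data_timestamp, scheduling_date))
    return pairs


def is_monitored(today: date, extra_days: int, data_timestamp: date, scheduling_date: date) -> bool:
    """An instance outside the range is not reported even if it meets a trigger condition."""
    return (data_timestamp, scheduling_date) in monitored_instance_dates(today, extra_days)
```

On June 10, for example, an instance with data timestamp June 8 and scheduling date June 9 is outside the T-only range but inside the T and T-1 range.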
Go to the Rule Management page
Go to the Operation Center page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.
In the left-side navigation pane of the Operation Center page, choose .
Note: You can also go to the Auto Triggered Nodes page, select multiple nodes, and then choose Actions > Add Alert Rule at the bottom of the page to create a custom alert rule for the selected nodes. For more information, see View and manage auto triggered tasks.
Create a custom alert rule
On the Rule Management page, click Create Custom Rule and configure the parameters in the following sections based on your business requirements.
Configure parameters in the Basic Information section
Parameter | Description |
Rule Name | The name of the custom alert rule. |
Object Type | The type of object that you want to monitor. Valid values: Node, Baseline, Workspace, Workflow, Exclusive Resource Group for Scheduling, and Exclusive Resource Group for Data Integration. Note If this parameter is set to Baseline, you can monitor only the status of nodes that belong to a specified baseline. If you also want to monitor the status of ancestor nodes of the nodes that belong to the baseline, see Overview. |
Rule Object | The object that you want to monitor. To add an object, enter the name or ID of the object in the Rule Object field, select the object that appears, and then click Add. The types of objects that you can add, and the maximum number of objects, vary based on the value of the Object Type parameter. |
Add to Whitelist | The nodes that are within the monitoring scope but that you do not want to monitor. This parameter is required only if you set the Object Type parameter to Baseline, Workspace, or Workflow. To add a node to the whitelist, enter the name or ID of the node in the Add to Whitelist field and click Add. Note: You can add a maximum of 50 nodes to the whitelist. The nodes that you add to the whitelist are not monitored. |
Resource Group Name | The name of the exclusive resource group that you want to monitor. This parameter is required only if you set the Object Type parameter to Exclusive Resource Group for Scheduling or Exclusive Resource Group for Data Integration. |
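The constraints in the Basic Information table can be checked before you submit a rule. The following sketch mirrors the section as a hypothetical `BasicInfo` record and validates the dependencies described above: the whitelist applies only to Baseline, Workspace, or Workflow and holds at most 50 nodes, and Resource Group Name is needed for the two exclusive resource group types. All names are illustrative and do not correspond to a DataWorks API.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set

RESOURCE_GROUP_TYPES = {
    "Exclusive Resource Group for Scheduling",
    "Exclusive Resource Group for Data Integration",
}
OBJECT_TYPES = {"Node", "Baseline", "Workspace", "Workflow"} | RESOURCE_GROUP_TYPES
WHITELIST_TYPES = {"Baseline", "Workspace", "Workflow"}  # types that accept a whitelist
MAX_WHITELIST_NODES = 50


@dataclass
class BasicInfo:
    """Hypothetical mirror of the Basic Information section."""
    rule_name: str
    object_type: str
    rule_objects: List[str] = field(default_factory=list)
    whitelist: List[str] = field(default_factory=list)
    resource_group_name: Optional[str] = None


def validate_basic_info(info: BasicInfo) -> List[str]:
    """Return a list of problems; an empty list means the section looks consistent."""
    errors = []
    if not info.rule_name:
        errors.append("Rule Name is required")
    if info.object_type not in OBJECT_TYPES:
        errors.append(f"Unknown Object Type: {info.object_type}")
    if info.whitelist and info.object_type not in WHITELIST_TYPES:
        errors.append("Add to Whitelist applies only to Baseline, Workspace, or Workflow")
    if len(info.whitelist) > MAX_WHITELIST_NODES:
        errors.append("Add to Whitelist accepts at most 50 nodes")
    if info.object_type in RESOURCE_GROUP_TYPES and not info.resource_group_name:
        errors.append("Resource Group Name is required for exclusive resource groups")
    return errors


def effective_monitored(nodes_in_scope: Iterable[str], whitelist: Iterable[str]) -> Set[str]:
    """Whitelisted nodes are excluded from monitoring."""
    return set(nodes_in_scope) - set(whitelist)
```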
Configure parameters in the Trigger Condition section
For custom alert rules, a node that is in the frozen state is considered complete.
Object type | Trigger Condition | Description |
Node, Baseline, Workspace, or Workflow | Complete | Nodes are monitored from the time when they start to run. When the nodes are successfully run, an alert is reported. Note: For a node that is scheduled to run by hour, the node is considered complete only after it is successfully run in all cycles. |
Incomplete | Nodes are monitored from the time when they start to run. If the nodes are still running at a specified point in time, an alert is reported. Note: Alert rules with this trigger condition differ from the alert policies of the intelligent baseline feature. The intelligent baseline feature detects exceptions that prevent a node in a baseline from completing on time and sends you an alert notification at the earliest opportunity. For more information, see Overview. For a node that is scheduled to run by hour or minute, the system checks whether the node is complete at the specified point in time in all cycles on the current day. |
Error | Nodes are monitored from the time when they start to run. If an error occurs when the nodes are running, an alert is reported. Note: If an error occurs for a node instance, the icon is displayed in the General column on the Auto Triggered Instances page under Auto Triggered Node O&M in Operation Center. |
Instances with Errors | An alert is reported if the number of instances on which an error occurs on the current day reaches a specified threshold. The error can be a failed data quality check or a code execution failure. If the Object Type parameter is set to Workspace and the Trigger Condition parameter is set to Instances with Errors, you must specify a threshold. |
Proportion of Instances with Errors | An alert is reported if, on the current day, the proportion of instances on which an error occurs to all instances in the workspace reaches a specified threshold. If the Object Type parameter is set to Workspace and the Trigger Condition parameter is set to Proportion of Instances with Errors, you must specify a threshold. |
Node Logs Contain Keywords | An alert is reported if the run logs of nodes on the current day contain the specified keywords. If the Object Type parameter is set to Workspace and the Trigger Condition parameter is set to Node Logs Contain Keywords, you must specify keywords. |
Incomplete in Cycle | If nodes are still running at the end of a specified cycle, an alert is reported. In most cases, this trigger condition is configured for node instances that are scheduled to run by hour. If the Trigger Condition parameter is set to Incomplete in Cycle for a workflow, the system monitors the nodes in the workflow that are scheduled to run by day, hour, or minute based on the cycle number (N) that you specify. If a node has fewer than N instances, the system ignores the alerts reported for the node. For example, if you set the cycle number to 3, alerts for a node that is scheduled to run once a day are ignored, because the node generates only one instance per day. |
Timed Out | Nodes are monitored from the time when they start to run. If the nodes are still running after a specified period of time elapses, an alert is reported. In most cases, this trigger condition is used to monitor the run duration of a node. Note: If a monitored node fails to run and remains in the failed state after the specified period of time elapses, a timeout alert is also reported. |
Error Persisting After Automatic Rerun of Node | Nodes are monitored from the time when they start to run. If an error persists after the nodes are automatically rerun, an alert is reported. Note: If you want an alert to be reported each time an error occurs when a node is running, set the trigger condition to Error. |
Instance Generated | You can set the trigger condition to Instance Generated only when the Object Type parameter is set to Workspace. |
Fluctuation of Instance Count | You can set the trigger condition to Fluctuation of Instance Count only when the Object Type parameter is set to Workspace. Every day before 24:00, DataWorks generates the auto triggered node instances that run the next day. If the number of auto triggered node instances generated in your workspace fluctuates significantly compared with the historical average for the workspace, an alert is reported. |
Exclusive Resource Group for Scheduling or Exclusive Resource Group for Data Integration | Resource Group Usage | If the value of the Resource Group Usage parameter is greater than a specific percentage for a specific period of time, an alert is reported. Example: If the value of the Resource Group Usage parameter is greater than 50% for 15 minutes, an alert is reported. |
Number of Instances Waiting for Resources in Resource Group | If the value of the Number of Instances Waiting for Resources in Resource Group parameter is greater than a specific number for a specific period of time, an alert is reported. Example: If the value of the Number of Instances Waiting for Resources in Resource Group parameter is greater than 10 for 15 minutes, an alert is reported. |
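Several trigger conditions in the table above reduce to simple comparisons. The following sketch models four of them so that you can reason about thresholds before you set them: an error count that reaches a threshold, an error proportion that reaches a threshold, a metric that stays above a limit for a whole window (as in the example of usage above 50% for 15 minutes), and the Incomplete in Cycle rule that ignores nodes with fewer than N instances. This is an illustrative model, not the DataWorks evaluation engine; the function names and the one-sample-per-minute assumption are hypothetical.

```python
from typing import Sequence


def count_alert(error_instances: int, threshold: int) -> bool:
    """Instances with Errors: alert when today's error count reaches the threshold."""
    return error_instances >= threshold


def proportion_alert(error_instances: int, total_instances: int, threshold: float) -> bool:
    """Proportion of Instances with Errors: alert when errors / total reaches the threshold."""
    if total_instances == 0:
        return False  # no instances today, so there is no proportion to evaluate
    return error_instances / total_instances >= threshold


def sustained_above(samples: Sequence[float], limit: float, window: int) -> bool:
    """Resource Group Usage or waiting-instance count: alert when the value stays above
    `limit` for the whole window. Assumes one sample per minute, newest last."""
    if len(samples) < window:
        return False
    return all(value > limit for value in samples[-window:])


def incomplete_in_cycle_applies(instances_today: int, cycle_number: int) -> bool:
    """Incomplete in Cycle: alerts for nodes with fewer than N instances are ignored."""
    return instances_today >= cycle_number
```

With these definitions, 3 failed instances out of 20 trigger a 10% proportion rule, 15 consecutive minutes at 60% usage trigger a "greater than 50% for 15 minutes" rule, and a daily node is ignored by an Incomplete in Cycle rule with N set to 3.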
Configure parameters in the Alert Details section
Alert notification method | Alert contact | Description |
Mail, SMS, or Telephone | You can select Node Owner, Shift Schedule, or Others for Alert Contact. | |
DingTalk Chatbot or WebHook | You can specify members of the group. | |
Configure parameters in the Alerting Frequency Control section
Parameter | Description |
Maximum Alerts | The maximum number of times an alert can be reported. After this number is exceeded, the alert is no longer reported. |
Minimum Alert Interval | The minimum interval between two consecutive notifications of the same alert. |
Alerting Do-Not-Disturb Period | The system does not send alert notifications during the period of time that is specified by this parameter. For example, you set the Trigger Condition parameter to Timed Out, Error, or Incomplete for a node and set the Alerting Do-Not-Disturb Period parameter to the period of time from |
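The three frequency-control parameters interact. The following sketch is a hypothetical model of that interaction: given the times at which an alert condition fires, it drops notifications that fall in the do-not-disturb period, enforces the minimum interval, and stops after the maximum number of alerts. It assumes that suppressed notifications do not count toward Maximum Alerts and that a do-not-disturb period may cross midnight. DataWorks may apply these rules differently; the function names are illustrative.

```python
from datetime import datetime, time, timedelta
from typing import List, Tuple


def in_quiet_period(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end); handles periods that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def notifications_sent(candidates: List[datetime], max_alerts: int,
                       min_interval: timedelta, quiet: Tuple[time, time]) -> List[datetime]:
    """Return the times at which notifications would actually be sent."""
    sent: List[datetime] = []
    for ts in sorted(candidates):
        if len(sent) >= max_alerts:
            break  # Maximum Alerts reached
        if in_quiet_period(ts.time(), *quiet):
            continue  # Alerting Do-Not-Disturb Period
        if sent and ts - sent[-1] < min_interval:
            continue  # Minimum Alert Interval not yet elapsed
        sent.append(ts)
    return sent
```

For example, with a maximum of 3 alerts, a 30-minute interval, and a quiet period from 00:00 to 07:00, firings at 00:30, 01:00, 08:00, 08:10, 08:40, 09:20, and 10:00 produce notifications at 08:00, 08:40, and 09:20 only.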
Click OK to create the alert rule. On the Rule Management page, you can then click View Details, Disable, Enable, or Delete in the Actions column of a rule to perform the corresponding operation.
View Details: View basic information about the desired rule.
Enable or Disable: Enable or disable a rule. You can enable a rule to monitor the status of a node for which the rule is configured. You can view alert details on the Alert Management page. For more information, see View alert details.
Delete: Delete a rule.
Example scenario: Send alert notifications to a DingTalk group
Open the DingTalk group to which you want the system to send alert notifications and click the Group Settings icon in the upper-right corner.
In the Group Settings panel, click Group Assistant.
In the Group Assistant panel, click Add Robot.
In the ChatBot dialog box, click the icon.
In the Please choose which robot to add section, click Custom.
In the Robot details message, click Add.
In the Add Robot dialog box, configure the parameters.
Parameter | Description |
Chatbot name | The name of the custom chatbot. |
Add to Group | The DingTalk group to which the chatbot is added. This group cannot be changed. |
Custom Keywords | After you specify custom keywords, a message can be sent only if it contains at least one of the specified keywords. You must add DataWorks as a keyword. This keyword is case-sensitive. Note: You can specify a maximum of 10 keywords. |
Read the terms of service, select I have read and accepted <<DingTalk Custom Robot Service Terms of Service>>, and then click Finished.
After you complete the security settings, copy the webhook URL of the chatbot and click Finished.
Important: Keep the webhook URL confidential. Anyone who obtains the URL can send messages to the DingTalk group, which puts your business at risk.
Go to the Rule Management page and click Create Custom Rule. In the Create Custom Rule dialog box, set the Alert Notification Method parameter to DingTalk Chatbot, and paste the chatbot webhook URL that you copied from DingTalk in the Webhook URL column in the DingTalk Chatbot section.
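Before you rely on the chatbot for alerts, you may want to confirm that the webhook URL works and that your keyword setting accepts messages. The following sketch posts a test text message to the chatbot by using the Python standard library. It assumes DingTalk's custom-robot text message format (`msgtype`, `text.content`, and optional `at.atMobiles`); check the DingTalk custom robot documentation if your chatbot behaves differently. DataWorks sends the actual alert notifications itself, so this code is only a connectivity and keyword check. The environment variable `DINGTALK_WEBHOOK_URL` and the function names are hypothetical; reading the URL from the environment keeps it out of source code, in line with the confidentiality requirement above.

```python
import json
import os
import urllib.request
from typing import Dict, Iterable, Optional, Sequence

# Keywords configured in the Custom Keywords setting; matching is case-sensitive.
KEYWORDS = ("DataWorks",)


def contains_keyword(content: str, keywords: Sequence[str] = KEYWORDS) -> bool:
    """A message is accepted only if it contains at least one configured keyword."""
    return any(keyword in content for keyword in keywords)


def build_text_payload(content: str, at_mobiles: Optional[Iterable[str]] = None) -> Dict:
    """Build a DingTalk custom-robot text message."""
    if not contains_keyword(content):
        raise ValueError("the message must contain at least one configured keyword")
    payload = {"msgtype": "text", "text": {"content": content}}
    if at_mobiles:
        payload["at"] = {"atMobiles": list(at_mobiles), "isAtAll": False}
    return payload


def send_test_message(webhook_url: str, content: str) -> str:
    """POST the message to the chatbot webhook and return DingTalk's raw response body."""
    data = json.dumps(build_text_payload(content)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical environment variable; nothing is sent if it is not set.
    url = os.environ.get("DINGTALK_WEBHOOK_URL")
    if url:
        print(send_test_message(url, "DataWorks alert test"))
```

Because the keyword match is case-sensitive, a message such as "dataworks alert" is rejected locally by `build_text_payload`, which mirrors how the chatbot would refuse it.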