Application Real-time Monitoring Service (ARMS) allows you to create alert rules. If an alert is triggered, the system sends alert notifications to the specified contact group based on the notification methods specified in the alert rule. This way, you can handle the alert at the earliest opportunity.

Prerequisites

Note Alibaba Cloud no longer supports the alert feature of the old version. Use the new alert management feature to create alert rules.

Before you create an alert rule, make sure that the following requirements are met:
  • A monitoring task is created. For more information, see Create an application monitoring job.
  • Contacts are created and added to a contact group. Only contact groups can be set as the notification receiver of an alert.

Background information

By default, alert notifications are sent based on the following rules:

  • To prevent you from receiving a large number of alert notifications in a short period of time, the system sends only one message for repeated alerts within 24 hours.
  • If no repeated alerts are generated within 5 minutes, the system sends a recovery email to notify you that the alert is cleared.
  • After a recovery email is sent, the system resets the alert status. If this alert arises again, it is deemed a new one.
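
The default rules above can be read as a small state machine. The following Python sketch is only an illustration of that reading, not ARMS code: the `AlertState` class, the `on_check` function, and the idea of one call per periodic check are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

QUIET_SECONDS = 24 * 60 * 60   # repeated alerts: one message within 24 hours
RECOVERY_SECONDS = 5 * 60      # no repeated alert for 5 minutes: recovery email


@dataclass
class AlertState:
    """Hypothetical per-alert bookkeeping; not an ARMS data structure."""
    last_notified: Optional[float] = None    # when the last alert message was sent
    last_triggered: Optional[float] = None   # when the condition last matched


def on_check(state: AlertState, now: float, triggered: bool) -> Optional[str]:
    """Return 'alert', 'recovery', or None for one periodic check."""
    if triggered:
        state.last_triggered = now
        if state.last_notified is None or now - state.last_notified >= QUIET_SECONDS:
            state.last_notified = now
            return "alert"
        return None  # repeated alert inside the 24-hour window: suppressed
    if state.last_triggered is not None and now - state.last_triggered >= RECOVERY_SECONDS:
        # The alert is cleared: reset the state so that the next trigger is a new alert.
        state.last_notified = None
        state.last_triggered = None
        return "recovery"
    return None
```

Times are plain seconds so that the sketch can be exercised without a clock. A trigger at 0 returns "alert", a repeat at 60 is suppressed, and a quiet check at 360 returns "recovery" and resets the state.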

An alert widget is essentially a data display method for datasets. When an alert widget is created, a dataset is also created to store the underlying data of the alert widget.

Note New alerts take effect within 10 minutes. The alert check may require 1 to 3 minutes.

Create an application monitoring alert rule

To create an alert for an application monitoring task on Java Virtual Machine-Garbage Collection (JVM-GC) times in period-over-period comparison, perform the following steps:

  1. Log on to the ARMS console.
  2. In the left-side navigation pane, choose Alerts > Alert Policies.
  3. On the Alert Policies page, choose Create Alert Rule > Application Monitoring Alarm in the upper-right corner.
  4. In the Create Alarm dialog box, enter all required information and click Save.
    1. Specify Alarm Name. Example: Application Call Statistics.
    2. Select an application from the Application Site drop-down list. Select an application group from the Application Group drop-down list.
    3. Select a metric type from the Type drop-down list. For example, you can select Invocation_Statistic.
    4. Set the Dimension parameter to Traverse.
    5. Set Alarm Rules to Meet All of the Following Criteria.
    6. Specify a value for Last N Minutes to configure alert conditions. For example, if the average error rate in the last 5 minutes is greater than or equal to 100%, an alert is triggered.
      Note To add more alert rules, click the plus icon (+) next to Last N Minutes.
    7. Configure the Notification Mode parameter. For example, you can select Email.
    8. Specify the Notification Receiver parameter. In the Contact Groups list, click the name of a contact group. If the contact group appears in the Selected Groups list, the setting is successful.
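
To make the rule semantics concrete, the following Python sketch evaluates "Meet All of the Following Criteria" over the last N minutes. It is a simplified model under stated assumptions: one sample per minute, only "greater than or equal to" comparisons, and hypothetical names (`meets_all`, `samples`, `error_rate`) that do not come from ARMS.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# (metric name, aggregation over the window, threshold); all names are illustrative.
Criterion = Tuple[str, Callable[[Sequence[float]], float], float]


def meets_all(samples: Dict[str, List[float]], n_minutes: int,
              criteria: List[Criterion]) -> bool:
    """True if every criterion holds over the last n_minutes of per-minute samples."""
    for metric, aggregate, threshold in criteria:
        window = samples[metric][-n_minutes:]
        if not window or aggregate(window) < threshold:
            return False
    return True


# Per-minute error rate in percent, oldest first (made-up data).
samples = {"error_rate": [0.0, 20.0, 100.0, 100.0, 100.0, 100.0, 100.0]}
rule = [("error_rate", mean, 100.0)]
print(meets_all(samples, 5, rule))
```

Passing `sum`, `max`, or `min` instead of `mean` models the other aggregations, and adding more tuples to the list models additional alert rules that must all be met.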

Create a browser monitoring alert rule

To create a Page_Metric alert rule to monitor JS_Error_Rate and JS_Error_Count for a browser monitoring job, perform the following steps:

  1. In the left-side navigation pane, choose Alerts > Alert Policies.
  2. On the Alert Policies page, choose Create Alert Rule > Browser Monitoring Alert in the upper-right corner.
  3. In the Create Alarm dialog box, enter all required information and click Save.
    1. Enter an alert name. Example: Page_Metric.
    2. Select the monitoring job that you created from the Application Site drop-down list.
    3. Select the type of metric that you want to monitor from the Type drop-down list. For example, you can select Page_Metric.
    4. Set the Dimension parameter to Traverse.
    5. Configure an alert rule.
      1. Select Meet All of the Following Criteria.
      2. Edit the alert rule. For example, an alert is triggered when the value of N is 10 and the average value of JS_Error_Rate equals or exceeds 20.
      3. To add more alert rules, click the plus sign (+) on the right of the first alert rule. For example, an alert is triggered when the value of N is 10 and the sum of JS_Error_Count equals or exceeds 20.
    6. Set Notification Mode. For example, select SMS and Email.
    7. Set the Notification Receiver parameter. In the Contact Groups list, click the name of a contact group. If the contact group appears in the Selected Groups list, the setting is successful.

Create a Prometheus alert rule

To create an alert rule for a Prometheus monitoring job such as an alert on network receiving pressure, perform the following steps:

  1. Use one of the following methods to open the Create Alarm dialog box.
    • On the New DashBoard page of the Prometheus Grafana dashboard, click the icon to go to the ARMS Prometheus Create Alarm dialog box.
    • In the left-side navigation pane of the console, choose Alerts > Alert Policies. On the Alert Policies page, choose Create Alarm > Prometheus in the upper-right corner.
  2. In the Create Alarm dialog box, enter all required information and click Save.
    1. Enter an alert name. Example: Received_Bytes.
    2. Select the cluster of the Prometheus monitoring job.
    3. Set Type to grafana.
    4. Select the specific dashboard and chart.
    5. Configure an alert rule.
      1. Select Meet All of the Following Criteria.
      2. Edit the alert rule. For example, an alert is triggered when the value of N is 5 and the average value of Received_Bytes (MB) equals or exceeds 3.
        Note A Grafana chart may display data of Curve A, Curve B, and Curve C. You can select one of them to monitor.
      3. In the PromQL field, edit the existing PromQL statement or enter a new PromQL statement.
        Important An error may be reported if a PromQL statement contains a dollar sign ($). In this case, delete the label matcher that contains the dollar sign ($), including the equal sign (=) and the parameters on both sides of it. For example, change sum (rate (container_network_receive_bytes_total{instance=~"^$HostIp.*"}[1m])) to sum (rate (container_network_receive_bytes_total[1m])).
    6. Set Notification Mode. For example, select SMS.
    7. Set the Notification Receiver parameter. In the Contact Groups list, click the name of a contact group. If the contact group appears in the Selected Groups list, the setting is successful.
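
When a dashboard query contains many variables, removing the dollar-sign matchers by hand is error-prone. The following Python sketch automates the cleanup for label matchers only. The function name `strip_dollar_matchers` is hypothetical, the regular expressions are a simplification rather than a PromQL parser, and variables outside label selectors (for example in a range such as `[$interval]`) are not handled.

```python
import re

# One label selector: braces around unquoted characters and quoted strings.
_SELECTOR = re.compile(r'\{((?:[^{}"]|"(?:[^"\\]|\\.)*")*)\}')
# One label matcher inside a selector, for example instance=~"^$HostIp.*"
_MATCHER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\s*(?:=~|!~|!=|=)\s*"(?:[^"\\]|\\.)*"')


def strip_dollar_matchers(promql: str) -> str:
    """Drop label matchers whose text contains '$'; remove selectors left empty."""
    def clean(selector: re.Match) -> str:
        kept = [m for m in _MATCHER.findall(selector.group(1)) if "$" not in m]
        return "{" + ",".join(kept) + "}" if kept else ""
    return _SELECTOR.sub(clean, promql)


before = 'sum (rate (container_network_receive_bytes_total{instance=~"^$HostIp.*"}[1m]))'
print(strip_dollar_matchers(before))
# sum (rate (container_network_receive_bytes_total[1m]))
```

Matchers without a dollar sign are kept, so `up{job="api",instance=~"$host"}` becomes `up{job="api"}`. Review the result before you paste it into the PromQL field, because removing a matcher widens the set of series that the alert covers.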

Description of basic fields

The following table describes the basic fields in the Create Alarm dialog box.

Field | Meaning | Description
Application Site | The monitoring task that is created. | Select a value from the drop-down list.
Type | The type of the alert metric. | The metric types differ by alert type:
  • Application Monitoring Alarm: includes application entry calls, the statistics of application call types, database metrics, JVM monitoring, host monitoring, and abnormal API calls.
  • Browser Monitoring Alarm: includes page metrics, API metrics, custom metrics, and page API metrics.
Dimension | The dimension for the specified metric (dataset). | You can select None, =, or Traverse.
  • If you select None, the alert content shows the sum of all the values of this dimension.
  • If you select =, you must enter a specific value.
  • If you select Traverse, the alert content shows the dimension content that actually triggers the alert.
Last N Minutes | The system checks whether the data results in the last N minutes meet the trigger condition. | Valid values of N: 1 to 60.
Notification Mode | You can select Email, SMS, Ding Ding Robot, and Webhook. | You can select multiple notification methods. For information about how to configure a DingTalk chatbot alert, see Configure a DingTalk chatbot to send alert notifications.
Alarm Quiet Period | You can turn on or turn off Alarm Quiet Period. By default, the switch is turned on. |
  • Turn on Alarm Quiet Period: If data remains in the triggered state, the second alert notification is sent 24 hours after the first alert is triggered. If the data is recovered, the system sends a data recovery notification and clears the alert. If the data triggers the alert again, the system sends the alert notification again.
  • Turn off Alarm Quiet Period: If the alert is continuously triggered, the system sends the alert notification every minute.
Alarm Severity | Valid values: Warn, Error, and Fatal. | N/A
Notification Time | The time range during which alert notifications can be sent. | No alert notifications are sent outside of this time range, but alert events are recorded. For more information about alert event records, see Manage alerts.
Notification Content | The custom content of the alert notification. | You can edit the default template. In the template, the following four variables must be specified: $AlarmName, $AlarmFilter, $AlarmTime, and $AlarmContent. The rest of the content can be customized. Other variables are not supported.
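
The Notification Content requirement (all four variables present, no others) can be checked locally before you save a rule. The following Python sketch is an assumption-laden convenience, not part of ARMS: it relies on the fact that Python's `string.Template` uses the same `$Name` placeholder style, and the function names `template_variables` and `check_template` are hypothetical. Whether ARMS accepts the `${Name}` form is not stated in this topic.

```python
from string import Template

REQUIRED = {"AlarmName", "AlarmFilter", "AlarmTime", "AlarmContent"}


def template_variables(text: str) -> set:
    """Collect $Name and ${Name} placeholders by using string.Template's pattern."""
    names = set()
    for match in Template.pattern.finditer(text):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names


def check_template(text: str) -> list:
    """Return a list of problems; an empty list means the template passes."""
    found = template_variables(text)
    problems = [f"missing ${name}" for name in sorted(REQUIRED - found)]
    problems += [f"unsupported ${name}" for name in sorted(found - REQUIRED)]
    return problems


text = "Alert $AlarmName ($AlarmFilter) fired at $AlarmTime: $AlarmContent"
print(check_template(text))                         # []
print(check_template("$AlarmName at $AlarmTime on $Host"))
```

The second call reports the two missing variables and the unsupported `$Host` variable, which is the kind of mistake that is otherwise found only after the rule is saved.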

Description of complex general fields: period-over-period comparison

  • N-minute-on-N-minute comparison: Assume that β is the data (average, sum, maximum, or minimum) in the last N minutes, and α is the data in the N minutes before that, from 2N minutes ago to N minutes ago. The N-minute-on-N-minute comparison is the percentage increase or decrease of β compared with α.
  • N-minute-on-N-minute hourly comparison: Assume that β is the data (average, sum, maximum, or minimum) in the last N minutes, and α is the data in the same N-minute window one hour earlier. The N-minute-on-N-minute hourly comparison is the percentage increase or decrease of β compared with α.
  • N-minute-on-N-minute daily comparison: Assume that β is the data (average, sum, maximum, or minimum) in the last N minutes, and α is the data in the same N-minute window one day earlier. The N-minute-on-N-minute daily comparison is the percentage increase or decrease of β compared with α.
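
The comparisons above can be sketched in a few lines of Python. The formula (β − α) / α × 100 is an assumption about how "percentage increase or decrease" is computed, since this topic does not spell it out, and the function names and the one-sample-per-minute series are hypothetical.

```python
from typing import Optional, Sequence


def percent_change(beta: float, alpha: float) -> Optional[float]:
    """Percentage change of beta relative to alpha; None when alpha is 0."""
    if alpha == 0:
        return None
    return (beta - alpha) / alpha * 100.0


def n_on_n_minutes(series: Sequence[float], n: int, aggregate=sum) -> Optional[float]:
    """Compare the last n per-minute samples with the n samples before them."""
    if len(series) < 2 * n:
        return None  # not enough data to form both windows
    beta = aggregate(series[-n:])
    alpha = aggregate(series[-2 * n:-n])
    return percent_change(beta, alpha)


series = [10, 10, 10, 5, 5, 5]     # made-up per-minute values, oldest first
print(n_on_n_minutes(series, 3))   # -50.0
```

For the hourly and daily variants, only the choice of α changes: take the same N-minute window 60 or 1,440 minutes earlier and pass both aggregates to `percent_change`. A result of `None` corresponds to the case in which the comparison result does not exist.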

Description of complex general fields: Alarm Data Revision

You can set Alarm Data Revision to Fill 0, Fill 1, or Set Null (Won't Trigger). This feature allows you to fix data anomalies, such as no data, abnormal composite metrics, and abnormal period-over-period comparisons.

  • Fill 0: fixes the checked value to 0.
  • Fill 1: fixes the checked value to 1.
  • Set Null (Won't Trigger): does not trigger the alert.

Scenarios:

  • Anomaly 1: no data

    User A wants to use the alerting feature to monitor page views. When User A creates an alert rule in Browser Monitoring, User A specifies that an alert is triggered when N is 5, and the sum of page views is less than or equal to 10. If the page is not accessed, no data is reported, and no alert notification is sent. To resolve this issue, User A can select Fill 0 for the Alarm Data Revision Policy parameter. If no data is received, the system determines that zero data records are received. This meets the conditions specified in the alert rule, and an alert notification is sent.

  • Anomaly 2: abnormal period-over-period comparisons

    User C wants to use the alerting feature to monitor the CPU usage of a node. When User C creates an alert rule in Application Monitoring, User C specifies that an alert is triggered when N is 3, and the average CPU usage of the node decreases by 100% compared with that in the previous monitoring period. If the CPU fails in the last N minutes, the system cannot obtain β, which is used to calculate the period-over-period comparison result. In this case, the comparison result does not exist. No alert notification is sent. To resolve this issue, User C can select Fill 1 for the Alarm Data Revision Policy parameter. If β is not obtained, the system determines that the period-over-period comparison result is a decrease of 100%. This meets the conditions specified in the alert rule, and an alert notification is sent.
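
The effect of the three revision options can be illustrated with a short Python sketch. It models only the "no data" scenario of User A. The function names and the use of `None` for a missing value are assumptions made for the example, not ARMS behavior.

```python
from typing import Optional

FILL_0 = "Fill 0"
FILL_1 = "Fill 1"
SET_NULL = "Set Null (Won't Trigger)"


def revise(value: Optional[float], policy: str) -> Optional[float]:
    """Return the value that the rule checks; None means that the check is skipped."""
    if value is not None:
        return value      # real data is never rewritten
    if policy == FILL_0:
        return 0.0
    if policy == FILL_1:
        return 1.0
    return None           # Set Null: the alert is not triggered


def page_view_alert(total_views: Optional[float], policy: str) -> bool:
    """User A's rule: alert when the sum of page views is less than or equal to 10."""
    checked = revise(total_views, policy)
    return checked is not None and checked <= 10


print(page_view_alert(None, FILL_0))    # True: missing data is treated as 0 views
print(page_view_alert(None, SET_NULL))  # False: missing data never triggers
```

With Fill 0, a page that receives no traffic satisfies the "less than or equal to 10" condition, which is the outcome that User A wants. With Set Null, the same missing data is ignored.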

What to do next

You can query and delete alert records in alert management.