Managed Service for Prometheus: Create an alert rule for a Prometheus instance

Last Updated: Nov 14, 2024

Managed Service for Prometheus lets you specify a condition in an alert rule to monitor a metric. When the condition is met, an alert event is generated. You can configure a notification policy to send alert notifications by text message, email, phone call, DingTalk chatbot, WeCom chatbot, or webhook.

Prerequisites

A Prometheus instance is created in Managed Service for Prometheus. For more information, see the following topics:

Go to the Create Prometheus Alert Rule page

  1. Log on to the Managed Service for Prometheus console.

  2. In the left-side navigation pane, click View Alert Rules.

  3. On the Prometheus Alert Rules page, click Create Prometheus Alert Rule.

Use a preset metric to create an alert rule

ARMS provides various preset metrics. You can select a preset metric and configure an alert rule for the metric.

  1. On the Create Prometheus Alert Rule page, configure the parameters. The following table describes the parameters.

    Parameter

    Description

    Example

    Alert Rule Name

    Enter the name of the alert rule.

    Production cluster - container CPU utilization alert

    Check Type

    Select Static Threshold.

    Static Threshold

    Prometheus Instance

    Select the Prometheus instance.

    Production cluster

    Alert Contact Group

    Select an alert contact group.

    The alert contact groups that are supported by a Prometheus instance vary based on the type of the Prometheus instance.

    Kubernetes load

    Alert Metric

    Select a metric. Different alert contact groups provide different metrics.

    Container CPU utilization

    Alert Condition

    Specify the condition based on which alert events are generated.

    If the CPU utilization of the container is greater than 80%, an alert event is generated.

    Filter Conditions

    Specify the applicable scope of the alert rule. If a resource meets both the filter condition and the alert condition, an alert event is generated.

    The following types of filter conditions are supported:

    • Traverse: The alert rule applies to all resources in the current Prometheus instance. By default, Traverse is selected.

    • Equal: If you select this filter condition, you must enter a resource name. The alert rule applies only to the specified resource. You cannot specify multiple resources at the same time.

    • Not Equal: If you select this filter condition, you must enter a resource name. The alert rule applies to resources other than the specified resource. You cannot specify multiple resources at the same time.

    • Regex match: If you select this filter condition, you must enter a regular expression to match resource names. The alert rule applies to all resources that match the regular expression.

    • Regex not match: If you select this filter condition, you must enter a regular expression to match resource names. The alert rule applies to resources that do not match the regular expression.

    Note

    After you set the filter conditions, the Data Preview section appears.

    Traverse
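In PromQL terms, the five filter types line up with the standard label matchers (`=`, `!=`, `=~`, `!~`). The following Python snippet is a minimal sketch of that correspondence. It assumes that the console renders a filter as a matcher on a single label. The function name, the metric name, and the `pod` label are illustrative and are not taken from the console.

```python
# Standard PromQL label-matching operators for each filter type.
# That the console maps its filter types this way is an assumption.
MATCH_OPERATORS = {
    "Equal": "=",
    "Not Equal": "!=",
    "Regex match": "=~",
    "Regex not match": "!~",
}


def build_selector(metric, label, filter_type, value=""):
    """Return a PromQL selector for the metric with the filter applied."""
    if filter_type == "Traverse":
        # Traverse applies to every resource, so no matcher is added.
        return metric
    operator = MATCH_OPERATORS[filter_type]
    # Escape backslashes and double quotes so the value is a valid PromQL string.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{metric}{{{label}{operator}"{escaped}"}}'


# Illustrative metric and label names.
print(build_selector("container_cpu_usage_seconds_total", "pod", "Traverse"))
print(build_selector("container_cpu_usage_seconds_total", "pod", "Regex match", "web-.*"))
```

Comparing the output of such a helper with the statement shown in the Data Preview section is one way to check that a filter selects the resources you expect.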

    Data Preview

    The Data Preview section displays the PromQL statement that corresponds to the alert condition. The section also displays the values of the specified metric in a time series graph.

    By default, only the real-time values of one resource are displayed. You can specify filter conditions to view the metric values of different resources in different time ranges.

    Note
    • The threshold in the time series graph is represented by a red line. The part of the curve that meets the alert condition is displayed in dark red, and the part of the curve that does not meet the alert condition is displayed in blue.

    • You can move the pointer over the curve to view resource details at a specific point in time.

    • You can also select a time range on the graph to view only the curve for that range.

    None

    Duration

    • If the alert condition is met, an alert event is generated: If a data point reaches the threshold, an alert event is generated.

    • If the alert condition is continuously met for N minutes, an alert event is generated: An alert event is generated only if the duration for which the threshold is reached is greater than or equal to N minutes.

    1
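The two Duration options differ only in how long the condition must hold before an alert event is generated. The following Python snippet is a minimal sketch of that logic. It assumes the rule is checked at discrete points in time and that each check yields a met or not-met result. The function name and the data layout are hypothetical.

```python
def should_fire(evaluations, duration_minutes):
    """Decide whether an alert event is generated at the latest check.

    evaluations: list of (timestamp_seconds, condition_met), oldest first.
    duration_minutes: 0 models the first option (fire as soon as the
    condition is met); N models "continuously met for N minutes".
    """
    if not evaluations or not evaluations[-1][1]:
        return False
    latest = evaluations[-1][0]
    streak_start = latest
    # Walk backwards to find where the current unbroken streak began.
    for timestamp, met in reversed(evaluations):
        if not met:
            break
        streak_start = timestamp
    return latest - streak_start >= duration_minutes * 60


# Checks taken one minute apart; the condition is met from t=60 onwards.
checks = [(0, False), (60, True), (120, True), (180, True)]
print(should_fire(checks, 2))
print(should_fire(checks, 3))
```

In this sketch a single not-met check resets the streak, which is why a longer duration suppresses alerts on short spikes.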

    Alert Level

    Specify the alert level. Default value: Default. Valid values: Default, P4, P3, P2, and P1. Default indicates the lowest severity level, while P1 indicates the highest severity level.

    Default

    Alert Message

    Specify the alert message that you want to send to the end users. You can specify custom variables in the alert message based on the Go template syntax.

    Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / Container: {{$labels.container}} CPU utilization: {{$labels.metrics_params_opt_label_value}} {{$labels.metrics_params_value}}%. Current value: {{ printf "%.2f" $value }}%
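To see what such a message looks like once the variables are filled in, the following Python snippet gives a rough approximation of the expansion. It is not Go's template engine: it handles only the two placeholder forms that appear in the example, `{{$labels.NAME}}` and `{{ printf "FORMAT" $value }}`. The function name and the sample label values are hypothetical.

```python
import re

# Matches {{$labels.NAME}} with optional inner whitespace.
LABEL_PATTERN = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")
# Matches {{ printf "FORMAT" $value }}.
VALUE_PATTERN = re.compile(r'\{\{\s*printf\s+"([^"]+)"\s+\$value\s*\}\}')


def render_message(template, labels, value):
    """Expand the two placeholder forms used in the example message."""
    # In this sketch, a label that is missing renders as empty text.
    text = LABEL_PATTERN.sub(lambda m: labels.get(m.group(1), ""), template)
    # Apply the printf-style format string to the sample value.
    return VALUE_PATTERN.sub(lambda m: m.group(1) % value, text)


template = (
    "Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / "
    'Current value: {{ printf "%.2f" $value }}%'
)
print(render_message(template, {"namespace": "default", "pod_name": "web-0"}, 83.456))
```

Previewing a message this way helps catch a misspelled label name before the rule is saved, because the misspelled placeholder shows up as a blank in the output.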

    Alert Notification

    • Simple Mode: You need to set the Notification Objects, Notification Period, and Whether to Resend Notifications.

    • Standard Mode:

      • Do Not Specify Notification Policy: If you select this option, you can create a notification policy on the Notification Policy page after you create the alert rule. On the Notification Policy page, you can specify match rules and match conditions. For example, you can specify an alert rule name as the match condition. When the alert rule is triggered, an alert event is generated and an alert notification is sent to the contacts or contact groups that are specified in the notification policy. For more information, see Create and manage a notification policy.

      • You can also select a notification policy from the drop-down list. ARMS automatically adds a match rule to the selected notification policy and specifies the ID of the alert rule as the match condition. The name of the alert rule is displayed on the Notification Policy page. This way, the alert events that are generated based on the alert rule can be matched by the selected notification policy.

      Important

      After you select a notification policy, the alert events that are generated based on the alert rule can be matched by the notification policy and alerts can be generated. The alert events may also be matched by other notification policies that use fuzzy match, and alerts may be generated. One or more alert events can be matched by one or more notification policies.

    Do Not Specify Notification Policy
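The Important note above means that a single alert event can reach more than one notification policy. The following Python snippet is a minimal sketch of that behavior. It assumes each policy carries one match condition, and it models fuzzy match as a regular-expression search. The field names, the policy structure, and the sample values are all hypothetical and do not reflect the ARMS data model.

```python
import re


def matching_policies(event, policies):
    """Return the names of all policies whose condition matches the event."""
    matched = []
    for policy in policies:
        field_value = event.get(policy["field"], "")
        if policy["mode"] == "exact":
            hit = field_value == policy["pattern"]
        else:
            # "fuzzy" is modelled here as a regular-expression search.
            hit = re.search(policy["pattern"], field_value) is not None
        if hit:
            matched.append(policy["name"])
    return matched


event = {
    "rule_id": "123",
    "rule_name": "Production cluster - container CPU utilization alert",
}
policies = [
    # Policy selected in the alert rule: matched by the rule ID.
    {"name": "selected", "field": "rule_id", "mode": "exact", "pattern": "123"},
    # Unrelated policies that match on the rule name.
    {"name": "all-cpu", "field": "rule_name", "mode": "fuzzy", "pattern": "CPU"},
    {"name": "all-disk", "field": "rule_name", "mode": "fuzzy", "pattern": "disk"},
]
print(matching_policies(event, policies))
```

Here the event is matched by both the selected policy and the broader CPU policy, so recipients of both would be notified.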

    Advanced Settings

    Alert Check Cycle

    The alert rule is evaluated every N minutes to check whether the alert conditions are met. Default value: 1. Minimum value: 1.

    1

    Check after the data is complete

    • Yes

    • No

    Yes

    Tags

    Specify tags for the alert rule. The specified tags can be used to match notification policies.

    None

    Annotations

    Specify annotations for the alert rule.

    None

  2. Click Save. On the Prometheus Alert Rules page, check the status of the alert rule.

    If Automatic Interruption appears in the Status column, modify the alert rule as prompted and click Start in the Actions column. In the message that appears, click OK. If the issue persists after you apply the preceding solution, contact technical support (DingTalk ID: d9j_rg9e4062f).

    An alert rule may be automatically interrupted due to the following reasons:

    • The number of results queried by the alert rule exceeds 1,500.

    • No notification object is configured.

    • The Prometheus instance is uninstalled or unavailable.
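For the first reason in the list above, you can estimate in advance how many results a query returns. The following Python snippet is a minimal sketch that counts the series in the JSON response of a standard Prometheus instant query. It assumes that the number of results corresponds to the number of series returned by the rule's query. The function names are hypothetical.

```python
import json

RESULT_LIMIT = 1500  # limit stated in the list above


def result_count(response_body):
    """Count the series in a Prometheus instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    return len(payload["data"]["result"])


def exceeds_limit(response_body, limit=RESULT_LIMIT):
    """Return True if the query returns more series than the limit."""
    return result_count(response_body) > limit


# Hand-written sample response with three series.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"pod": f"web-{i}"}, "value": [0, "1"]} for i in range(3)],
    },
})
print(result_count(sample), exceeds_limit(sample))
```

If the count is close to the limit, narrowing the rule with a filter condition or aggregating over fewer labels reduces the number of series.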

Use a custom PromQL statement to create an alert rule

To monitor a metric other than the preset metrics, you can use a custom PromQL statement to create an alert rule.

  1. On the Create Prometheus Alert Rule page, configure the parameters. The following table describes the parameters.

    Parameter

    Description

    Example

    Alert Rule Name

    Enter the name of the alert rule.

    Pod CPU utilization exceeds 8%

    Check Type

    Select Custom PromQL.

    Custom PromQL

    Prometheus Instance

    Select the Prometheus instance.

    None

    Reference Alert Contact Group

    Select an alert contact group.

    The alert contact groups that are supported by a Prometheus instance vary based on the type of the Prometheus instance.

    Kubernetes load

    Reference Metrics

    Optional. The Reference Metrics drop-down list displays common metrics. After you select a metric, the PromQL statement of the metric is displayed in the Custom PromQL Statements field. You can modify the statement based on your business requirements.

    The values in the Reference Metrics drop-down list vary based on the type of the Prometheus instance.

    Pod disk usage alert

    Custom PromQL Statements

    Enter a PromQL statement.

    max(container_fs_usage_bytes{pod!="", namespace!="arms-prom",namespace!="monitoring"}) by (pod_name, namespace, device)/max(container_fs_limit_bytes{pod!=""}) by (pod_name,namespace, device) * 100 > 90
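Before you save a custom statement, it can help to run it against the instance and inspect the result. The following Python snippet is a minimal sketch that builds a request URL for the standard Prometheus instant-query endpoint (`/api/v1/query`). The base URL is a placeholder, and any authentication that your instance requires is not shown. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the HTTP API URL of your instance.
BASE_URL = "https://prometheus.example.com"

# The disk-utilization expression from the example above.
EXPR = (
    'max(container_fs_usage_bytes{pod!="", namespace!="arms-prom",namespace!="monitoring"}) '
    "by (pod_name, namespace, device)"
    '/max(container_fs_limit_bytes{pod!=""}) by (pod_name,namespace, device) * 100 > 90'
)


def instant_query_url(base_url, expr):
    """Build the URL for the standard Prometheus instant-query endpoint."""
    # urlencode percent-encodes the braces, quotes, and operators in the expression.
    query_string = urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


print(instant_query_url(BASE_URL, EXPR))
```

Because the expression ends with `> 90`, the query returns only the series that currently exceed the threshold. An empty result means that no alert event would be generated at that moment.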

    Data Preview

    The Data Preview section displays the PromQL statement that corresponds to the alert condition. The section also displays the values of the specified metric in a time series graph.

    By default, only the real-time values of one resource are displayed. You can specify filter conditions to view the metric values of different resources in different time ranges.

    Note
    • You can move the pointer over the curve to view resource details at a specific point in time.

    • You can also select a time range on the graph to view only the curve for that range.

    None

    Duration

    • If the alert condition is met, an alert event is generated: If a data point reaches the threshold, an alert event is generated.

    • If the alert condition is continuously met for N minutes, an alert event is generated: An alert event is generated only if the duration for which the threshold is reached is greater than or equal to N minutes.

    1

    Alert Level

    Specify the alert level. Default value: Default. Valid values: Default, P4, P3, P2, and P1. Default indicates the lowest severity level, while P1 indicates the highest severity level.

    Default

    Alert Message

    Specify the alert message that you want to send to the end users. You can specify custom variables in the alert message based on the Go template syntax.

    Namespace: {{$labels.namespace}} / Pod: {{$labels.pod_name}} / The utilization of the {{$labels.device}} disk exceeds 90%. Current value: {{ printf "%.2f" $value }}%

    Alert Notification

    • Simple Mode: You need to set the Notification Objects, Notification Period, and Whether to Resend Notifications.

    • Standard Mode:

      • Do Not Specify Notification Policy: If you select this option, you can create a notification policy on the Notification Policy page after you create the alert rule. On the Notification Policy page, you can specify match rules and match conditions. For example, you can specify an alert rule name as the match condition. When the alert rule is triggered, an alert event is generated and an alert notification is sent to the contacts or contact groups that are specified in the notification policy. For more information, see Create and manage a notification policy.

      • You can also select a notification policy from the drop-down list. ARMS automatically adds a match rule to the selected notification policy and specifies the ID of the alert rule as the match condition. The name of the alert rule is displayed on the Notification Policy page. This way, the alert events that are generated based on the alert rule can be matched by the selected notification policy.

      Important

      After you select a notification policy, the alert events that are generated based on the alert rule can be matched by the notification policy and alerts can be generated. The alert events may also be matched by other notification policies that use fuzzy match, and alerts may be generated. One or more alert events can be matched by one or more notification policies.

    Do Not Specify Notification Policy

    Advanced Settings

    Alert Check Cycle

    The alert rule is evaluated every N minutes to check whether the alert conditions are met. Default value: 1. Minimum value: 1.

    1

    Check after the data is complete

    • Yes

    • No

    Yes

    Tags

    Specify tags for the alert rule. The specified tags can be used to match notification policies.

    None

    Annotations

    Specify annotations for the alert rule.

    None

  2. Click Save. On the Prometheus Alert Rules page, check the status of the alert rule.

    If Automatic Interruption appears in the Status column, modify the alert rule as prompted and click Start in the Actions column. In the message that appears, click OK. If the issue persists after you apply the preceding solution, contact technical support (DingTalk ID: d9j_rg9e4062f).

    An alert rule may be automatically interrupted due to the following reasons:

    • The number of results queried by the alert rule exceeds 1,500.

    • No notification object is configured.

    • The Prometheus instance is uninstalled or unavailable.
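If you are used to rule files in open-source Prometheus, the console parameters map loosely onto the fields of an alerting rule: `alert`, `expr`, `for`, `labels`, and `annotations`. The following Python snippet is a minimal sketch of that mapping. The correspondence is an assumption made for illustration, not a description of how the console stores rules. The `severity` label name, the `message` annotation key, and the function name are hypothetical.

```python
import json


def to_rule(name, expr, duration_minutes, level, message, tags=None):
    """Arrange console parameters in the shape of an open-source alerting rule."""
    # Alert Level and Tags both become labels in this sketch.
    labels = {"severity": level}
    labels.update(tags or {})
    return {
        "alert": name,                        # Alert Rule Name
        "expr": expr,                         # Custom PromQL Statements
        "for": f"{duration_minutes}m",        # Duration
        "labels": labels,
        "annotations": {"message": message},  # Alert Message
    }


rule = to_rule(
    name="Pod disk usage alert",
    expr="(disk utilization expression) > 90",  # placeholder, not valid PromQL
    duration_minutes=1,
    level="Default",
    message='Current value: {{ printf "%.2f" $value }}%',
)
print(json.dumps(rule, indent=2))
```

Thinking of a rule in this shape makes it easier to compare a console rule with one maintained in a self-managed Prometheus setup.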

Manage an alert rule

  • For alert rules created on the View Alert Rules page in the Managed Service for Prometheus console, including static threshold and custom PromQL rules, you can edit, delete, copy, start, and stop them, and view historical alert events.

  • For alert rules generated in the console of other Alibaba Cloud services, you can view the historical alert events and go back to the alert rule list of the cloud services.