Ingest CloudWatch alerts

Last updated: Jan 14, 2025

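Amazon CloudWatch is a service that monitors AWS resources and the applications that run on AWS in real time. CloudWatch can send alert messages through Amazon SNS. To forward CloudWatch alerts to Log Service, configure the URL of the Log Service open alert endpoint in Amazon SNS. The Log Service alerting system then handles denoising, notification, and other processing for the ingested alerts.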

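Prerequisites

An open alert application that uses the CloudWatch protocol has been created. For more information, see Configure the open alert endpoint.

Configure CloudWatch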

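  1. Log on to the AWS Management Console.
  2. Create an SNS topic.
    Configure the following required parameters in the Amazon SNS console. For more information, see Creating an Amazon SNS topic.
    Parameter | Description
    Type | The type of the topic. Select Standard.
    Name | The name of the topic.
  3. Subscribe to the SNS topic.
    Configure the following required parameters in the Amazon SNS console. For more information, see Subscribing to an Amazon SNS topic.
    Parameter | Description
    Topic ARN | The ARN of the topic that you created in step 2.
    Protocol | The protocol. Select HTTP.
    Endpoint | The endpoint information (the full URL) that is generated after you create the open alert service and application in Log Service. For more information about how to obtain it, see Obtain endpoint information.
    Enable raw message delivery | Select the Enable raw message delivery check box.
    After you complete the configuration, the subscription is in the Pending confirmation state. Amazon SNS then sends a subscription confirmation message to Log Service, and Log Service automatically visits the confirmation link in that message. If the visit succeeds, the subscription changes to the Confirmed state, which indicates that the subscription is successful.
    Note: If the subscription is not confirmed, select the subscription and click Request confirmation to resend the confirmation message. If the subscription still fails, check the error logs in the alert troubleshooting center of Log Service.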
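  4. Select the alarm that you want to ingest into Log Service and add notifications.
    On the edit page of the alarm in the CloudWatch console, add two notifications as described below. For more information, see To edit an alarm.
    • Alarm state trigger: Select the alarm state that triggers the notification.
      • For one notification, set Alarm state trigger to In alarm or Insufficient data. A notification is sent when the alarm enters that state.
      • For the other notification, set Alarm state trigger to OK. A recovery notification is sent when the alarm recovers.
    • Select an SNS topic: Select Select an existing SNS topic.
    • Send a notification to…: Select the topic that you created in step 2.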
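
If you prefer to script steps 3 and 4 instead of using the console, the following minimal sketch shows how the same settings might be expressed as arguments for boto3. It assumes boto3 is your tool of choice, which the steps above do not require. The block only builds argument dictionaries and never calls AWS. The helper names, the topic ARN, and the endpoint URL are hypothetical placeholders.

```python
# Sketch only: builds keyword arguments for boto3 calls without contacting AWS.
# All ARNs and URLs passed in are placeholders supplied by the caller.


def build_subscribe_kwargs(topic_arn: str, endpoint_url: str) -> dict:
    """Arguments for the boto3 SNS client's subscribe(), mirroring step 3."""
    return {
        "TopicArn": topic_arn,         # ARN of the topic created in step 2
        "Protocol": "http",            # the Protocol selected in step 3
        "Endpoint": endpoint_url,      # full URL obtained from Log Service
        # SNS subscription attributes are passed as strings.
        "Attributes": {"RawMessageDelivery": "true"},
        "ReturnSubscriptionArn": True,
    }


def build_alarm_action_kwargs(topic_arn: str,
                              include_insufficient_data: bool = True) -> dict:
    """Action lists for the CloudWatch client's put_metric_alarm(), mirroring step 4.

    These keys must be merged into the alarm's full definition, because
    put_metric_alarm() replaces the whole alarm configuration.
    """
    kwargs = {
        "AlarmActions": [topic_arn],   # notify when the alarm is In alarm
        "OKActions": [topic_arn],      # notify when the alarm returns to OK
    }
    if include_insufficient_data:
        kwargs["InsufficientDataActions"] = [topic_arn]
    return kwargs


if __name__ == "__main__":
    topic = "arn:aws:sns:us-east-2:123456:my-topic"       # placeholder ARN
    endpoint = "http://example.invalid/your-endpoint"     # placeholder URL
    print(build_subscribe_kwargs(topic, endpoint))
    print(build_alarm_action_kwargs(topic))
```

With boto3 installed and credentials configured, the dictionaries could be passed as `boto3.client("sns").subscribe(**kwargs)` and merged into a `put_metric_alarm` call.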

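CloudWatch alert messages

CloudWatch alerts are either static threshold alerts or anomaly detection alerts. The two kinds of alert message differ in the value of the Trigger field. For more information, see CloudWatch::Alarm properties.
  • In a static threshold alert message, the Trigger field contains fields such as MetricName and Dimensions.
  • In an anomaly detection alert message, the Trigger field contains fields such as Metrics. The value of Metrics is a list of metric data queries.
  • Static threshold alert message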
    {
        "AlarmName": "test-alert",
        "AlarmDescription": "this is a test alert",
        "AWSAccountId": "123456",
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [1.0 (04/08/21 03:06:00)] was greater than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition).",
        "StateChangeTime": "2021-08-04T03:10:10.215+0000",
        "Region": "US East (Ohio)",
        "AlarmArn": "arn:aws:cloudwatch:us-east-2:123456:alarm:test-alert",
        "OldStateValue": "OK",
        "Trigger":
        {
            "MetricName": "NumberOfMessagesPublished",
            "Namespace": "AWS/SNS",
            "StatisticType": "Statistic",
            "Statistic": "SUM",
            "Unit": null,
            "Dimensions":
            [
                {
                    "value": "my-topic",
                    "name": "TopicName"
                }
            ],
            "Period": 60,
            "EvaluationPeriods": 1,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "Threshold": 1.0,
            "TreatMissingData": "- TreatMissingData:                    missing",
            "EvaluateLowSampleCountPercentile": ""
        }
    }
  • Anomaly detection alert message
    {
        "AlarmName": "cpu alrm",
        "AlarmDescription": "this is a cpu alarm",
        "AWSAccountId": "123456",
        "NewStateValue": "INSUFFICIENT_DATA",
        "NewStateReason": "Threshold Crossed: no datapoints were received for 2 periods and 2 missing datapoints were treated as [Breaching].",
        "StateChangeTime": "2021-08-05T08:38:47.104+0000",
        "Region": "US East (Ohio)",
        "AlarmArn": "arn:aws:cloudwatch:us-east-2:123456:alarm:cpu alrm",
        "OldStateValue": "OK",
        "Trigger":
        {
            "Period": 60,
            "EvaluationPeriods": 2,
            "ComparisonOperator": "GreaterThanUpperThreshold",
            "ThresholdMetricId": "ad1",
            "TreatMissingData": "- TreatMissingData:                    breaching",
            "EvaluateLowSampleCountPercentile": "",
            "Metrics":
            [
                {
                    "Id": "m1",
                    "MetricStat":
                    {
                        "Metric":
                        {
                            "Dimensions":
                            [
                                {
                                    "value": "i-1a2b3c4d",
                                    "name": "InstanceId"
                                }
                            ],
                            "MetricName": "CPUUtilization",
                            "Namespace": "AWS/EC2"
                        },
                        "Period": 60,
                        "Stat": "Average"
                    },
                    "ReturnData": true
                },
                {
                    "Expression": "ANOMALY_DETECTION_BAND(m1, 0.1)",
                    "Id": "ad1",
                    "Label": "CPUUtilization (expected)",
                    "ReturnData": true
                }
            ]
        }
    }
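
The difference between the two message shapes can be sketched in a few lines of Python. This is an illustration, not the Log Service implementation: it assumes that the presence of a Metrics list inside Trigger is enough to tell the two kinds apart, and the function name is hypothetical. The returned strings are the two alert type names that this page uses for the mapped alerts.

```python
import json


def cloudwatch_alert_type(message: dict) -> str:
    """Classify a CloudWatch alert message by the shape of its Trigger field."""
    trigger = message.get("Trigger") or {}
    # Assumption: anomaly detection messages carry a "Metrics" list,
    # static threshold messages carry "MetricName" and "Dimensions" instead.
    if isinstance(trigger.get("Metrics"), list):
        return "AnomalyDetection"
    return "StaticThreshold"


# Trimmed-down versions of the two sample messages above.
static_msg = json.loads("""
{"AlarmName": "test-alert",
 "Trigger": {"MetricName": "NumberOfMessagesPublished",
             "Dimensions": [{"value": "my-topic", "name": "TopicName"}]}}
""")
anomaly_msg = json.loads("""
{"AlarmName": "cpu alrm",
 "Trigger": {"ThresholdMetricId": "ad1",
             "Metrics": [{"Id": "m1"}, {"Id": "ad1"}]}}
""")

print(cloudwatch_alert_type(static_msg))
print(cloudwatch_alert_type(anomaly_msg))
```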

Alert message mapping

After a CloudWatch alert is ingested into Log Service, it is mapped to a Log Service alert. Examples:

  • Static threshold alert message
    {
        "aliuid": "aliuid1",
        "alert_instance_id": "{automatically generated}",
        "alert_id": "CloudWatch_test-alert",
        "alert_type": "sls_pub",
        "alert_name": "test-alert",
        "region": "{region of the alert center project}",
        "project": "{project to which the alert center belongs}",
        "project_id": 0,
        "next_eval_interval": 60,
        "alert_time": 1628046610,
        "fire_time": 1628046610,
        "fire_results": null,
        "fire_results_count": 0,
        "resolve_time": 0,
        "status": "firing",
        "results": null,
        "labels":
        {
            "TopicName": "my-topic",
            "__comparison_operator__": "GreaterThanOrEqualToThreshold",
            "__statistic__": "SUM",
            "__statistic_type__": "Statistic",
            "__threshold__": "1",
            "metric_name": "NumberOfMessagesPublished"
        },
        "annotations":
        {
            "__alarm_arn__": "arn:aws:cloudwatch:us-east-2:123456:alarm:test-alert",
            "__aws_accountId__": "123456",
            "__aws_region__": "US East (Ohio)",
            "__cloud_watch_alert_type__": "StaticThreshold",
            "__config_app__": "sls_pub_alert",
            "__pub_alert_app__": "{open alert application ID}",
            "__pub_alert_protocol__": "cloud_watch",
            "__pub_alert_region__": "{region of the endpoint that receives the alert message}",
            "__pub_alert_service__": "{open alert service ID}",
            "desc": "this is a test alert",
            "title": "Threshold Crossed: 1 out of the last 1 datapoints [1.0 (04/08/21 03:06:00)] was greater than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition)."
        },
        "severity": 10,
        "policy":
        {
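            "alert_policy_id": "{ID of the alert policy configured in the open alert application}",
            "action_policy_id": "{ID of the action policy configured in the open alert application}",
            "use_default": false,
            "repeat_interval": "{repeat interval configured in the open alert application}"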
        },
        "template": null,
        "drill_down_query": "https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#alarmsV2:alarm/test-alert"
    }
  • Anomaly detection alert message
    {
        "aliuid": "aliuid1",
        "alert_instance_id": "{automatically generated}",
        "alert_id": "CloudWatch_cpu alrm",
        "alert_type": "sls_pub",
        "alert_name": "cpu alrm",
        "region": "{region of the alert center project}",
        "project": "{project to which the alert center belongs}",
        "project_id": 0,
        "next_eval_interval": 120,
        "alert_time": 1628152727,
        "fire_time": 1628152727,
        "fire_results": null,
        "fire_results_count": 0,
        "resolve_time": 0,
        "status": "firing",
        "results": null,
        "labels":
        {
            "__comparison_operator__": "GreaterThanUpperThreshold",
            "__threshold_metricId__": "ad1"
        },
        "annotations":
        {
            "__alarm_arn__": "arn:aws:cloudwatch:us-east-2:123456:alarm:cpu alrm",
            "__aws_accountId__": "123456",
            "__aws_region__": "US East (Ohio)",
            "__cloud_watch_alert_type__": "AnomalyDetection",
            "__config_app__": "sls_pub_alert",
            "__pub_alert_app__": "{open alert application ID}",
            "__pub_alert_protocol__": "cloud_watch",
            "__pub_alert_region__": "{region of the endpoint that receives the alert message}",
            "__pub_alert_service__": "{open alert service ID}",
            "desc": "this is a cpu alarm",
            "title": "Threshold Crossed: no datapoints were received for 2 periods and 2 missing datapoints were treated as [Breaching]."
        },
        "severity": 8,
        "policy":
        {
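            "alert_policy_id": "{ID of the alert policy configured in the open alert application}",
            "action_policy_id": "{ID of the action policy configured in the open alert application}",
            "use_default": false,
            "repeat_interval": "{repeat interval configured in the open alert application}"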
        },
        "template": null,
        "drill_down_query": "https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#alarmsV2:alarm/cpu%20alrm"
    }
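
The derived values in the samples above (alert_time, next_eval_interval, and drill_down_query) can be reproduced from the raw CloudWatch messages. The sketch below is a hedged illustration and not Log Service code. The function names are hypothetical, and the URL pattern and the way the region is read from the ARN are inferred from the two samples rather than from a published specification.

```python
from datetime import datetime
from urllib.parse import quote


def to_epoch_seconds(state_change_time: str) -> int:
    """Convert a StateChangeTime string to whole seconds since the Unix epoch."""
    parsed = datetime.strptime(state_change_time, "%Y-%m-%dT%H:%M:%S.%f%z")
    return int(parsed.timestamp())


def eval_interval_seconds(trigger: dict) -> int:
    """Product of Period and EvaluationPeriods from the Trigger field."""
    return trigger["Period"] * trigger["EvaluationPeriods"]


def console_url(alarm_arn: str, alarm_name: str) -> str:
    """Build a CloudWatch console link for the alarm."""
    # Assumption: the region is the fourth colon-separated part of the ARN,
    # and the alarm name is percent-encoded, as in the "cpu%20alrm" sample.
    region = alarm_arn.split(":")[3]
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#alarmsV2:alarm/{quote(alarm_name)}"
    )


print(to_epoch_seconds("2021-08-04T03:10:10.215+0000"))               # 1628046610
print(eval_interval_seconds({"Period": 60, "EvaluationPeriods": 2}))  # 120
print(console_url("arn:aws:cloudwatch:us-east-2:123456:alarm:cpu alrm", "cpu alrm"))
```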

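The following table describes how the fields of a Log Service alert map to the fields of a CloudWatch alert message:

Log Service field | CloudWatch field | Description
aliuid | - | The ID of the Alibaba Cloud account that owns the open alert application used to ingest the alert.
alert_id | - | The ID of the alert monitoring rule. The value is CloudWatch_{$alert_name}, where {$alert_name} is the name of the alert monitoring rule.
alert_type | - | The alert type. The value is fixed at sls_pub.
alert_name | AlarmName | The name of the alert monitoring rule.
status | NewStateValue | The alert status.
  • If NewStateValue in the CloudWatch alert message is ALARM or INSUFFICIENT_DATA, status is firing.
  • If NewStateValue in the CloudWatch alert message is OK, status is resolved.
next_eval_interval | Period, EvaluationPeriods | The alert evaluation interval, which is the product of the Period and EvaluationPeriods values in the CloudWatch alert message.
alert_time | StateChangeTime | The time when the alert was triggered.
fire_time | StateChangeTime | The time when the alert was first triggered.
resolve_time | StateChangeTime | The time when the alert was resolved.
  • If status is firing, resolve_time is 0.
  • If status is resolved, resolve_time is the value of StateChangeTime in the CloudWatch alert message.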
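labels | - | The label information.
  • Static threshold alert message
    • The following fields and their values are added to labels under new names:
      • ComparisonOperator is renamed to __comparison_operator__
      • MetricName is renamed to __metric_name__
      • StatisticType is renamed to __statistic_type__
      • Statistic is renamed to __statistic__
      • Threshold is renamed to __threshold__
    • For each entry in Dimensions, the value of name becomes a label key and the value of value becomes the label value.
  • Anomaly detection alert message
    The following fields and their values are added to labels under new names:
    • ComparisonOperator is renamed to __comparison_operator__
    • ThresholdMetricId is renamed to __threshold_metricId__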
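annotations | - | The annotation information. The following fields are added to the annotations field in Log Service:
  • desc: the description of the alert, taken from the AlarmDescription field of the CloudWatch alert message, as in the samples above.
  • title: the title of the alert, taken from the NewStateReason field of the CloudWatch alert message, as in the samples above.
  • __cloud_watch_alert_type__: the CloudWatch alert type.
    • For a static threshold alert, the value is StaticThreshold.
    • For an anomaly detection alert, the value is AnomalyDetection.
  • All fields outside the Trigger field that are not otherwise used are also added to annotations.

    These fields are renamed: the name is converted to lowercase and wrapped in double underscores (__). If a name consists of several words, the words are separated by an underscore (_). For example, AlarmArn is renamed to __alarm_arn__.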

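severity | NewStateValue | The alert severity.
  • If NewStateValue in the CloudWatch alert message is ALARM, severity is 10 (critical).
  • If NewStateValue in the CloudWatch alert message is INSUFFICIENT_DATA, severity is 8 (high).
  • If NewStateValue in the CloudWatch alert message is OK, severity is determined by the value of OldStateValue in the CloudWatch alert message.
policy | - | The alert policy that you configured in the open alert application. For more information, see Policy structure.
project | - | The project to which the alert center belongs. For more information, see Project.
drill_down_query | - | The URL of the corresponding CloudWatch alarm.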
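
The status, severity, and labels rules in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not the Log Service implementation. All function and constant names are hypothetical. The code assumes that a recovered alert takes the severity that matches its OldStateValue, and that numeric label values are rendered without a trailing ".0", as the "1" in the sample suggests. Label keys follow the table.

```python
# Severity values listed in the table for each CloudWatch state.
SEVERITY_BY_STATE = {"ALARM": 10, "INSUFFICIENT_DATA": 8}

# CloudWatch Trigger field -> label key, as listed in the table.
STATIC_LABEL_KEYS = {
    "ComparisonOperator": "__comparison_operator__",
    "MetricName": "__metric_name__",
    "StatisticType": "__statistic_type__",
    "Statistic": "__statistic__",
    "Threshold": "__threshold__",
}
ANOMALY_LABEL_KEYS = {
    "ComparisonOperator": "__comparison_operator__",
    "ThresholdMetricId": "__threshold_metricId__",
}


def map_status(new_state: str) -> str:
    """OK maps to resolved; ALARM and INSUFFICIENT_DATA map to firing."""
    return "resolved" if new_state == "OK" else "firing"


def map_severity(new_state: str, old_state: str):
    """Return 10, 8, or None when the state has no listed severity."""
    # Assumption: for OK, reuse the severity of the state being recovered from.
    state = old_state if new_state == "OK" else new_state
    return SEVERITY_BY_STATE.get(state)


def _as_label(value) -> str:
    # Assumption: floats such as 1.0 are rendered as "1", as in the sample.
    return format(value, "g") if isinstance(value, float) else str(value)


def build_labels(trigger: dict) -> dict:
    """Build the labels field from a CloudWatch Trigger object."""
    is_anomaly = "Metrics" in trigger
    keys = ANOMALY_LABEL_KEYS if is_anomaly else STATIC_LABEL_KEYS
    labels = {
        new: _as_label(trigger[old])
        for old, new in keys.items()
        if trigger.get(old) is not None
    }
    if not is_anomaly:
        # Each dimension name becomes a label key.
        for dim in trigger.get("Dimensions", []):
            labels[dim["name"]] = dim["value"]
    return labels


trigger = {
    "MetricName": "NumberOfMessagesPublished",
    "StatisticType": "Statistic",
    "Statistic": "SUM",
    "Dimensions": [{"value": "my-topic", "name": "TopicName"}],
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "Threshold": 1.0,
}
print(map_status("ALARM"), map_severity("ALARM", "OK"))
print(build_labels(trigger))
```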