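AWS CloudWatch is a service that monitors AWS resources and the applications that run on AWS in real time. CloudWatch can send alarm messages through AWS SNS. After you configure the URL of the Log Service alert ingestion endpoint in AWS SNS, CloudWatch alarm messages are sent to Log Service, and the Log Service alerting system handles alert denoising, notification, and other processing.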
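Prerequisites
An alert ingestion application whose protocol is CloudWatch has been created. For more information, see Configure the external endpoint for alert ingestion.
CloudWatch configuration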
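- Log on to the AWS Management Console.
- Create an SNS topic. Configure the following required parameters in the Amazon SNS console. For more information, see Creating an Amazon SNS topic.

  | Parameter | Description |
  | --- | --- |
  | Type | The type of the topic. Select Standard. |
  | Name | The name of the topic. |

- Subscribe to the SNS topic. Configure the following required parameters in the Amazon SNS console. For more information, see Subscribing to an Amazon SNS topic.

  | Parameter | Description |
  | --- | --- |
  | Topic ARN | The ARN of the topic that you created in the previous step. |
  | Protocol | The protocol. Select HTTP. |
  | Endpoint | The full URL of the endpoint that is generated after you create the alert ingestion service and application in Log Service. For more information about how to obtain the URL, see Obtain the endpoint information. |
  | Enable raw message delivery | Select the Enable raw message delivery check box. |

  After the configuration is complete, the subscription is in the Pending confirmation state. AWS SNS then sends a subscription confirmation message to Log Service, and Log Service automatically visits the confirmation link in that message. After the visit succeeds, the subscription changes to the Confirmed state, which indicates that the subscription is successful.

  Note: If the subscription is not confirmed, select the subscription and click Request confirmation to send another subscription confirmation message. If the subscription still fails, check the error logs in the alert troubleshooting center of Log Service.

- Select the alarm that you want to ingest into Log Service and add notifications. You need to add two notifications on the edit page of the target alarm in the CloudWatch console. For more information, see To edit an alarm.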
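The subscription step above can also be scripted instead of clicked through in the console. The following is a minimal sketch, not part of the original procedure: it assumes the boto3 package and AWS credentials are available, and the function names, topic ARN, and endpoint URL are hypothetical placeholders. It builds the same settings as the table above (HTTP protocol, raw message delivery enabled).

```python
from typing import Any, Dict


def build_subscribe_request(topic_arn: str, endpoint_url: str) -> Dict[str, Any]:
    """Build keyword arguments for the SNS Subscribe call described above."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "http",  # the steps above select HTTP
        "Endpoint": endpoint_url,  # full URL generated by Log Service
        # SNS expects attribute values as strings
        "Attributes": {"RawMessageDelivery": "true"},
        "ReturnSubscriptionArn": True,
    }


def subscribe(topic_arn: str, endpoint_url: str) -> str:
    """Send the request with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    sns = boto3.client("sns")
    response = sns.subscribe(**build_subscribe_request(topic_arn, endpoint_url))
    return response["SubscriptionArn"]


# Hypothetical values for illustration only.
request = build_subscribe_request(
    "arn:aws:sns:us-east-2:123456:my-topic",
    "http://example.invalid/alert-ingestion-endpoint",
)
print(request["Attributes"])
```

The subscription still starts in the Pending confirmation state; confirmation happens as described above when Log Service visits the confirmation link.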

CloudWatch alarm messages
CloudWatch alarms are classified into static threshold alarms and anomaly detection alarms. The value of the Trigger field differs between static threshold alarm messages and anomaly detection alarm messages. For more information, see the CloudWatch::Alarm property descriptions.
- In a static threshold alarm message, the value of the Trigger field contains fields such as MetricName and Dimensions.
- In an anomaly detection alarm message, the value of the Trigger field contains fields such as Metrics. The value of the Metrics field is a list of metric data queries.

Sample messages:
- Static threshold alarm message

      {
        "AlarmName": "test-alert",
        "AlarmDescription": "this is a test alert",
        "AWSAccountId": "123456",
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [1.0 (04/08/21 03:06:00)] was greater than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition).",
        "StateChangeTime": "2021-08-04T03:10:10.215+0000",
        "Region": "US East (Ohio)",
        "AlarmArn": "arn:aws:cloudwatch:us-east-2:123456:alarm:test-alert",
        "OldStateValue": "OK",
        "Trigger": {
          "MetricName": "NumberOfMessagesPublished",
          "Namespace": "AWS/SNS",
          "StatisticType": "Statistic",
          "Statistic": "SUM",
          "Unit": null,
          "Dimensions": [ { "value": "my-topic", "name": "TopicName" } ],
          "Period": 60,
          "EvaluationPeriods": 1,
          "ComparisonOperator": "GreaterThanOrEqualToThreshold",
          "Threshold": 1.0,
          "TreatMissingData": "- TreatMissingData: missing",
          "EvaluateLowSampleCountPercentile": ""
        }
      }

- Anomaly detection alarm message

      {
        "AlarmName": "cpu alrm",
        "AlarmDescription": "this is a cpu alarm",
        "AWSAccountId": "123456",
        "NewStateValue": "INSUFFICIENT_DATA",
        "NewStateReason": "Threshold Crossed: no datapoints were received for 2 periods and 2 missing datapoints were treated as [Breaching].",
        "StateChangeTime": "2021-08-05T08:38:47.104+0000",
        "Region": "US East (Ohio)",
        "AlarmArn": "arn:aws:cloudwatch:us-east-2:123456:alarm:cpu alrm",
        "OldStateValue": "OK",
        "Trigger": {
          "Period": 60,
          "EvaluationPeriods": 2,
          "ComparisonOperator": "GreaterThanUpperThreshold",
          "ThresholdMetricId": "ad1",
          "TreatMissingData": "- TreatMissingData: breaching",
          "EvaluateLowSampleCountPercentile": "",
          "Metrics": [
            {
              "Id": "m1",
              "MetricStat": {
                "Metric": {
                  "Dimensions": [ { "value": "i-1a2b3c4d", "name": "InstanceId" } ],
                  "MetricName": "CPUUtilization",
                  "Namespace": "AWS/EC2"
                },
                "Period": 60,
                "Stat": "Average"
              },
              "ReturnData": true
            },
            {
              "Expression": "ANOMALY_DETECTION_BAND(m1, 0.1)",
              "Id": "ad1",
              "Label": "CPUUtilization (expected)",
              "ReturnData": true
            }
          ]
        }
      }
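The difference in the Trigger field can be used to tell the two message types apart. The sketch below is an illustration under assumptions, not the logic that Log Service actually runs: the function name `classify_alarm` is hypothetical, and the check relies only on the rule stated above (Metrics for anomaly detection, MetricName for static thresholds). The sample messages are trimmed to the fields that the function reads.

```python
import json
from typing import Any, Dict


def classify_alarm(message: Dict[str, Any]) -> str:
    """Guess the alarm type from the shape of the Trigger field."""
    trigger = message.get("Trigger") or {}
    if "Metrics" in trigger:
        # Anomaly detection messages carry a list of metric data queries.
        return "AnomalyDetection"
    if "MetricName" in trigger:
        # Static threshold messages name a single metric directly.
        return "StaticThreshold"
    return "Unknown"


static_msg = json.loads(
    '{"AlarmName": "test-alert", "Trigger": {"MetricName": "NumberOfMessagesPublished", "Dimensions": []}}'
)
anomaly_msg = json.loads(
    '{"AlarmName": "cpu alrm", "Trigger": {"ThresholdMetricId": "ad1", "Metrics": [{"Id": "m1"}]}}'
)

print(classify_alarm(static_msg))
print(classify_alarm(anomaly_msg))
```

This is a simplification: it inspects only the two fields named in this section and does not validate the rest of the message.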
Alert message mapping
After a CloudWatch alarm is ingested into Log Service, it is mapped to a Log Service alert. Examples:
- Static threshold alarm message

      {
        "aliuid": "aliuid1",
        "alert_instance_id": "{automatically generated}",
        "alert_id": "CloudWatch_test-alert",
        "alert_type": "sls_pub",
        "alert_name": "test-alert",
        "region": "{region of the project to which the alert center belongs}",
        "project": "{project to which the alert center belongs}",
        "project_id": 0,
        "next_eval_interval": 60,
        "alert_time": 1628046610,
        "fire_time": 1628046610,
        "fire_results": null,
        "fire_results_count": 0,
        "resolve_time": 0,
        "status": "firing",
        "results": null,
        "labels": {
          "TopicName": "my-topic",
          "__comparison_operator__": "GreaterThanOrEqualToThreshold",
          "__statistic__": "SUM",
          "__statistic_type__": "Statistic",
          "__threshold__": "1",
          "metric_name": "NumberOfMessagesPublished"
        },
        "annotations": {
          "__alarm_arn__": "arn:aws:cloudwatch:us-east-2:123456:alarm:test-alert",
          "__aws_accountId__": "123456",
          "__aws_region__": "US East (Ohio)",
          "__cloud_watch_alert_type__": "StaticThreshold",
          "__config_app__": "sls_pub_alert",
          "__pub_alert_app__": "{alert ingestion application ID}",
          "__pub_alert_protocol__": "cloud_watch",
          "__pub_alert_region__": "{region of the endpoint that receives the alert message}",
          "__pub_alert_service__": "{alert ingestion service ID}",
          "desc": "this is a test alert",
          "title": "Threshold Crossed: 1 out of the last 1 datapoints [1.0 (04/08/21 03:06:00)] was greater than or equal to the threshold (1.0) (minimum 1 datapoint for OK -> ALARM transition)."
        },
        "severity": 10,
        "policy": {
          "alert_policy_id": "{alert policy ID configured in the alert ingestion application}",
          "action_policy_id": "{action policy ID configured in the alert ingestion application}",
          "use_default": false,
          "repeat_interval": "{repeat interval configured in the alert ingestion application}"
        },
        "template": null,
        "drill_down_query": "https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#alarmsV2:alarm/test-alert"
      }

- Anomaly detection alarm message

      {
        "aliuid": "aliuid1",
        "alert_instance_id": "{automatically generated}",
        "alert_id": "CloudWatch_cpu alrm",
        "alert_type": "sls_pub",
        "alert_name": "cpu alrm",
        "region": "{region of the project to which the alert center belongs}",
        "project": "{project to which the alert center belongs}",
        "project_id": 0,
        "next_eval_interval": 120,
        "alert_time": 1628152727,
        "fire_time": 1628152727,
        "fire_results": null,
        "fire_results_count": 0,
        "resolve_time": 0,
        "status": "firing",
        "results": null,
        "labels": {
          "__comparison_operator__": "GreaterThanUpperThreshold",
          "__threshold_metricId__": "ad1"
        },
        "annotations": {
          "__alarm_arn__": "arn:aws:cloudwatch:us-east-2:123456:alarm:cpu alrm",
          "__aws_accountId__": "123456",
          "__aws_region__": "US East (Ohio)",
          "__cloud_watch_alert_type__": "AnomalyDetection",
          "__config_app__": "sls_pub_alert",
          "__pub_alert_app__": "{alert ingestion application ID}",
          "__pub_alert_protocol__": "cloud_watch",
          "__pub_alert_region__": "{region of the endpoint that receives the alert message}",
          "__pub_alert_service__": "{alert ingestion service ID}",
          "desc": "this is a cpu alarm",
          "title": "Threshold Crossed: no datapoints were received for 2 periods and 2 missing datapoints were treated as [Breaching]."
        },
        "severity": 8,
        "policy": {
          "alert_policy_id": "{alert policy ID configured in the alert ingestion application}",
          "action_policy_id": "{action policy ID configured in the alert ingestion application}",
          "use_default": false,
          "repeat_interval": "{repeat interval configured in the alert ingestion application}"
        },
        "template": null,
        "drill_down_query": "https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#alarmsV2:alarm/cpu%20alrm"
      }
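In both samples, the drill_down_query value looks like a CloudWatch console link built from the region code and alarm name in the alarm ARN. The following sketch reproduces that pattern. The URL template is inferred from the two samples only and is an assumption, not a documented rule, and the function name `drill_down_url` is hypothetical.

```python
from urllib.parse import quote


def drill_down_url(alarm_arn: str) -> str:
    """Rebuild the console URL seen in the samples from an alarm ARN."""
    # ARN layout: arn:aws:cloudwatch:<region>:<account>:alarm:<alarm name>
    # maxsplit=6 keeps any further colons inside the alarm name.
    parts = alarm_arn.split(":", 6)
    region, alarm_name = parts[3], parts[6]
    # Percent-encode the name so that spaces become %20, as in the sample.
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#alarmsV2:alarm/{quote(alarm_name)}"
    )


print(drill_down_url("arn:aws:cloudwatch:us-east-2:123456:alarm:cpu alrm"))
```

For the two sample ARNs, this yields the same strings as the drill_down_query values shown in the samples.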
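The following table describes the mapping between the fields of a Log Service alert and the fields of a CloudWatch alarm message:
| Log Service field | CloudWatch field | Description |
| --- | --- | --- |
| aliuid | None | The ID of the Alibaba Cloud account to which the alert ingestion application belongs. |
| alert_id | None | The ID of the alert monitoring rule. The value is CloudWatch_{$alert_name}, where {$alert_name} is the name of the alert monitoring rule. |
| alert_type | None | The alert type. The value is fixed as sls_pub. |
| alert_name | AlarmName | The name of the alert monitoring rule. |
| status | NewStateValue | The alert status. |
| next_eval_interval | Period, EvaluationPeriods | The alert evaluation interval, which is the product of the Period value and the EvaluationPeriods value in the CloudWatch alarm message. |
| alert_time | StateChangeTime | The time when the alert is triggered. |
| fire_time | StateChangeTime | The time when the alert is first triggered. |
| resolve_time | StateChangeTime | The time when the alert is resolved. |
| labels | None | The label information. |
| annotations | None | The annotation information. Log Service adds extra fields to the annotations field. |
| severity | NewStateValue | The alert severity. |
| policy | None | The alert policy that you configured in the alert ingestion application. For more information, see Policy structure. |
| project | None | The project to which the alert center belongs. For more information, see Project. |
| drill_down_query | None | The URL of the corresponding CloudWatch alarm. |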
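Several rows of the table can be checked against the sample messages with a few lines of code. The sketch below covers only the fields whose rules are stated explicitly (alert_id, alert_type, alert_name, next_eval_interval) plus alert_time. Treating alert_time as StateChangeTime converted to Unix seconds is an inference from the samples, and the function name `map_basic_fields` is hypothetical.

```python
from datetime import datetime
from typing import Any, Dict


def map_basic_fields(message: Dict[str, Any]) -> Dict[str, Any]:
    """Map a CloudWatch alarm message to a subset of Log Service alert fields."""
    trigger = message["Trigger"]
    # Example input: "2021-08-04T03:10:10.215+0000"
    changed = datetime.strptime(message["StateChangeTime"], "%Y-%m-%dT%H:%M:%S.%f%z")
    return {
        "alert_id": f"CloudWatch_{message['AlarmName']}",
        "alert_type": "sls_pub",
        "alert_name": message["AlarmName"],
        # Product of Period and EvaluationPeriods, as described in the table.
        "next_eval_interval": trigger["Period"] * trigger["EvaluationPeriods"],
        # Assumption: Unix timestamp in seconds, fractional part dropped.
        "alert_time": int(changed.timestamp()),
    }


sample = {
    "AlarmName": "test-alert",
    "StateChangeTime": "2021-08-04T03:10:10.215+0000",
    "Trigger": {"Period": 60, "EvaluationPeriods": 1},
}
print(map_basic_fields(sample))
```

Fields such as status, severity, labels, and annotations are left out because this section does not spell out their mapping rules.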