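Purpose
After an execution is created from a template that contains an alarm trigger, the execution starts in the Waiting state. When the metric configured in the alarm trigger reaches the alert threshold, the execution switches to the Running state and immediately runs the subsequent tasks defined in the template. These tasks are typically operations that clear the alert automatically. For example, when the CPU utilization of an ECS instance exceeds 90%, an alert is triggered and the instance is restarted automatically.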
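The lifecycle described above can be pictured with a small simulation. This is only an illustrative sketch: the `Execution` class, the status strings, and the sample readings are hypothetical and are not part of OOS. It assumes a plain "greater than" comparison against a threshold of 90.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """Hypothetical stand-in for an execution created from an alarm-trigger template."""
    threshold: float
    status: str = "Waiting"  # illustrative status name, not an official API value
    log: list = field(default_factory=list)

    def on_metric(self, instance_id: str, cpu_percent: float) -> None:
        # Only a waiting execution reacts; once running, the trigger has already fired.
        if self.status == "Waiting" and cpu_percent > self.threshold:
            self.status = "Running"
            # Stand-in for the follow-up tasks defined after the trigger.
            self.log.append(f"reboot {instance_id}")


execution = Execution(threshold=90)
for reading in (42.0, 88.5, 93.2):
    execution.on_metric("i-abc12345zxcv", reading)

print(execution.status, execution.log)
```

The first two readings leave the execution waiting; the third crosses the threshold, so the follow-up action is recorded once.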
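Important
An alarm trigger can monitor two categories of metrics: metrics collected by the pre-installed CloudMonitor plug-in and metrics built into ECS. For how to tell them apart, see the metric descriptions. To monitor a metric collected by the CloudMonitor plug-in, first install the plug-in on the instance to be monitored; otherwise the alert cannot be triggered. To install the plug-in, select the instance to be monitored under Host Monitoring in the CloudMonitor console and click the install button.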
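Limits
Triggers are subject to the following limits:
A template can contain only one trigger action.
The trigger action must be the first task defined in Tasks of the template.
A nested template (child template) cannot contain a trigger action.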
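The three limits above can be checked mechanically once a template has been parsed into a dictionary. The following is a minimal sketch, not an OOS feature: the function name `check_trigger_limits` and its `is_child` flag are hypothetical, and it only looks for the `ACS::AlarmTrigger` action described on this page.

```python
TRIGGER_ACTION = "ACS::AlarmTrigger"


def check_trigger_limits(template: dict, is_child: bool = False) -> list:
    """Return the violated limits for a parsed template (empty list if none)."""
    tasks = template.get("Tasks", [])
    positions = [
        index for index, task in enumerate(tasks)
        if task.get("Action") == TRIGGER_ACTION
    ]
    problems = []
    if len(positions) > 1:
        problems.append("more than one trigger action")
    if positions and positions[0] != 0:
        problems.append("trigger is not the first task")
    if is_child and positions:
        problems.append("trigger inside a nested template")
    return problems


ok = {"Tasks": [
    {"Name": "alarmTrigger", "Action": "ACS::AlarmTrigger"},
    {"Name": "RebootInstance", "Action": "ACS::ECS::RebootInstance"},
]}
bad = {"Tasks": list(reversed(ok["Tasks"]))}

print(check_trigger_limits(ok))
print(check_trigger_limits(bad))
```

The `is_child` flag has to be supplied by the caller, because a template on its own does not record whether it is being used as a nested template.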
Syntax
YAML format
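Tasks:
  - Name: taskName1 # Task name.
    Action: 'ACS::AlarmTrigger'
    Properties:
      # Required. Data namespace of the product, for example ECS.
      # Valid values can be obtained by calling DescribeMetricMetaList.
      Namespace: 'acs_ecs_dashboard'
      # Required. Metric name, for example the total CPU percentage currently consumed.
      # Valid values can be obtained by calling DescribeMetricMetaList.
      MetricName: 'cpu_total'
      # Statistical method for the alert. For example, Average is the mean over a time period.
      # Valid values can be obtained by calling DescribeMetricMetaList.
      Statistics: 'Average'
      # Required. Threshold comparison operator. Valid values:
      #   GreaterThanOrEqualToThreshold: greater than or equal to the threshold
      #   GreaterThanThreshold: greater than the threshold
      #   LessThanOrEqualToThreshold: less than or equal to the threshold
      #   LessThanThreshold: less than the threshold
      #   NotEqualToThreshold: not equal to the threshold
      #   GreaterThanYesterday: increase compared with the same time yesterday
      #   LessThanYesterday: decrease compared with the same time yesterday
      #   GreaterThanLastWeek: increase compared with the same time last week
      #   LessThanLastWeek: decrease compared with the same time last week
      #   GreaterThanLastPeriod: increase compared with the previous period
      #   LessThanLastPeriod: decrease compared with the previous period
      ComparisonOperator: 'GreaterThanThreshold'
      # Alert threshold, for example 90% total CPU utilization.
      Threshold: '90'
      # Required. Resources to monitor.
      #   All resources under the account: [{"resource":"_ALL"}]
      #   A specific instance: [{"instanceId":"i-bp123467zxcvb"}]
      #   A disk partition on an instance: [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}]
      #   Several disk partitions on an instance:
      #     [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}]
      Resources: '[{"resource":"_ALL"}]'
      Times: 1 # Alert repetition count.
      Interval: 60 # Detection interval of the alert rule, in seconds. Defaults to the minimum frequency of the metric, 60s.
      # Channel silence period, in seconds. Defaults to 86400 seconds (1 day).
      # While the monitoring data keeps exceeding the alert threshold, only one alert notification is sent per silence period.
      SilenceTime: 3600
    Outputs:
      paraName1:
        Type: String
        # .key selects the value of a key in the JSON message body shown below.
        # For example, .instanceId returns "i-abc12345zxcv".
        ValueSelector: .key
JSON message body of the event sent when the alert is triggered:
{
  "curLevel": "INFO",
  "Minimum": "34.00",
  "Maximum": "95.00",
  "instanceId": "i-abc12345zxcv",
  "Average": "85.00",
  "ruleName": "alarmtrigger-1390000****-exec-2130c0c073fa487098d3",
  "userId": "1390000****",
  "timestamp": "1598349720000",
  "executionId": "exec-2130c0c073fa487098d3",
  "sourceAliUid": "1390000****"
}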
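To make the `ValueSelector` behavior concrete, the snippet below resolves a selector against the sample message body above. It is a rough sketch under stated assumptions: the helper `select` is hypothetical, handles only plain dotted keys such as `.instanceId`, and is not the selector engine that OOS itself uses.

```python
import json

# Sample alarm message body, copied from the syntax description.
ALARM_MESSAGE = """
{
  "curLevel": "INFO",
  "Minimum": "34.00",
  "Maximum": "95.00",
  "instanceId": "i-abc12345zxcv",
  "Average": "85.00",
  "ruleName": "alarmtrigger-1390000****-exec-2130c0c073fa487098d3",
  "userId": "1390000****",
  "timestamp": "1598349720000",
  "executionId": "exec-2130c0c073fa487098d3",
  "sourceAliUid": "1390000****"
}
"""


def select(message: dict, selector: str):
    """Resolve a simple dotted selector (for example '.instanceId') against a dict."""
    value = message
    for key in (part for part in selector.split(".") if part):
        value = value[key]
    return value


message = json.loads(ALARM_MESSAGE)
print(select(message, ".instanceId"))  # i-abc12345zxcv
print(select(message, ".Average"))     # 85.00
```

Note that the numeric fields in the message are strings, so an output declared with `Type: String` receives `"85.00"` rather than a number.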
JSON format (see the comments in the YAML example for descriptions)
{
  "Tasks": [
    {
      "Name": "taskName1",
      "Action": "ACS::AlarmTrigger",
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "90",
        "Resources": "[{\"resource\":\"_ALL\"}]",
        "Times": 1,
        "Interval": 60,
        "SilenceTime": 3600
      },
      "Outputs": {
        "paraName1": {
          "Type": "String",
          "ValueSelector": ".key"
        }
      }
    }
  ]
}
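Because `Resources` is a string that itself contains JSON, the inner quotes must be escaped when the template is written in JSON format. One way to avoid hand-escaping is to generate the value, as in this minimal sketch; the helper name `resources_value` is hypothetical, and the instance IDs are the placeholder values from the comments above.

```python
import json


def resources_value(entries=None) -> str:
    """Serialize resource selectors into the string form used by the Resources property."""
    if not entries:
        entries = [{"resource": "_ALL"}]  # all resources under the account
    # Compact separators reproduce the spacing used in the syntax description.
    return json.dumps(entries, separators=(",", ":"))


all_resources = resources_value()
two_partitions = resources_value([
    {"instanceId": "i-bp123467zxcvb", "device": "/dev/vda1"},
    {"instanceId": "i-bp123467zxcvb", "device": "/dev/vdb1"},
])

# Embedding the string in a JSON document escapes the inner quotes automatically.
print(json.dumps({"Resources": all_resources}))
print(two_partitions)
```

The first `print` produces the same escaped form, `"[{\"resource\":\"_ALL\"}]"`, that appears in the JSON syntax above.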
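Example
In this example, an ECS instance is restarted automatically if its total CPU utilization exceeds the threshold within a 1-minute period.
YAML format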
FormatVersion: OOS-2019-06-01
Description:
  en: Reboot ECS instance with specified tag when its CPU utilization exceeded threshold.The selected instance must already have the Cloud Monitor agent installed.
  zh-cn: 按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟。所選執行個體必須已安裝CloudMonitorAgent。
  name-en: ACS-ECS-RebootInstanceAtHighCpuByTags
  name-zh-cn: 按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟
  categories:
    - alarm-trigger
Parameters:
  tags:
    Type: Json
    Description:
      en: The tags to select ECS instances.
      zh-cn: 執行個體的標籤。
    AssociationProperty: Tags
  threshold:
    Type: Number
    Description:
      en: The CPU utilization threshold.
      zh-cn: CPU利用率閾值。
  silenceTime:
    Type: Number
    Description:
      en: The silence time of alarm (seconds).
      zh-cn: 警示通道沉默周期(秒)。
    Default: 60
  OOSAssumeRole:
    Description:
      en: The RAM role to be assumed by OOS.
      zh-cn: OOS扮演的RAM角色。
    Type: String
    Default: OOSServiceRole
RamRole: '{{ OOSAssumeRole }}'
Tasks:
  - Name: alarmTrigger
    Action: 'ACS::AlarmTrigger'
    Description:
      en: Set the CPU utilization alarm for ECS instance.
      zh-cn: 對ECS執行個體的CPU使用率進行監控。
    Properties:
      Namespace: acs_ecs_dashboard
      MetricName: cpu_total
      Statistics: Average
      ComparisonOperator: GreaterThanThreshold
      Threshold: '{{threshold}}'
      Times: 1
      SilenceTime: '{{ silenceTime }}'
      Period: 60
      Interval: 60
    Outputs:
      InstanceId:
        Type: String
        ValueSelector: .instanceId
  - Name: CheckForInstances
    Action: 'ACS::CheckFor'
    Description:
      en: Check ECS instance has specified tag.
      zh-cn: 檢查ECS執行個體有指定的tag。
    OnError: 'ACS::END'
    Properties:
      Service: ECS
      API: DescribeInstances
      Parameters:
        Tags: '{{ tags }}'
        InstanceIds: '["{{ alarmTrigger.instanceId }}"]'
      PropertySelector: TotalCount
      DesiredValues:
        - 1
  - Name: RebootInstance
    Action: 'ACS::ECS::RebootInstance'
    Description:
      en: Restarts the ECS instances.
      zh-cn: 重啟執行個體。
    Properties:
      instanceId: '{{ alarmTrigger.instanceId }}'
JSON format
{
  "FormatVersion": "OOS-2019-06-01",
  "Description": {
    "en": "Reboot ECS instance with specified tag when its CPU utilization exceeded threshold.The selected instance must already have the Cloud Monitor agent installed.",
    "zh-cn": "按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟。所選執行個體必須已安裝CloudMonitorAgent。",
    "name-en": "ACS-ECS-RebootInstanceAtHighCpuByTags",
    "name-zh-cn": "按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟",
    "categories": [
      "alarm-trigger"
    ]
  },
  "Parameters": {
    "tags": {
      "Type": "Json",
      "Description": {
        "en": "The tags to select ECS instances.",
        "zh-cn": "執行個體的標籤。"
      },
      "AssociationProperty": "Tags"
    },
    "threshold": {
      "Type": "Number",
      "Description": {
        "en": "The CPU utilization threshold.",
        "zh-cn": "CPU利用率閾值。"
      }
    },
    "silenceTime": {
      "Type": "Number",
      "Description": {
        "en": "The silence time of alarm (seconds).",
        "zh-cn": "警示通道沉默周期(秒)。"
      },
      "Default": 60
    },
    "OOSAssumeRole": {
      "Description": {
        "en": "The RAM role to be assumed by OOS.",
        "zh-cn": "OOS扮演的RAM角色。"
      },
      "Type": "String",
      "Default": "OOSServiceRole"
    }
  },
  "RamRole": "{{ OOSAssumeRole }}",
  "Tasks": [
    {
      "Name": "alarmTrigger",
      "Action": "ACS::AlarmTrigger",
      "Description": {
        "en": "Set the CPU utilization alarm for ECS instance.",
        "zh-cn": "對ECS執行個體的CPU使用率進行監控。"
      },
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "{{threshold}}",
        "Times": 1,
        "SilenceTime": "{{ silenceTime }}",
        "Period": 60,
        "Interval": 60
      },
      "Outputs": {
        "InstanceId": {
          "Type": "String",
          "ValueSelector": ".instanceId"
        }
      }
    },
    {
      "Name": "CheckForInstances",
      "Action": "ACS::CheckFor",
      "Description": {
        "en": "Check ECS instance has specified tag.",
        "zh-cn": "檢查ECS執行個體有指定的tag。"
      },
      "OnError": "ACS::END",
      "Properties": {
        "Service": "ECS",
        "API": "DescribeInstances",
        "Parameters": {
          "Tags": "{{ tags }}",
          "InstanceIds": "[\"{{ alarmTrigger.instanceId }}\"]"
        },
        "PropertySelector": "TotalCount",
        "DesiredValues": [
          1
        ]
      }
    },
    {
      "Name": "RebootInstance",
      "Action": "ACS::ECS::RebootInstance",
      "Description": {
        "en": "Restarts the ECS instances.",
        "zh-cn": "重啟執行個體。"
      },
      "Properties": {
        "instanceId": "{{ alarmTrigger.instanceId }}"
      }
    }
  ]
}