CloudOps Orchestration Service: Alarm trigger ACS::AlarmTrigger

Last updated: Jun 30, 2024

Purpose

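When an execution is created from a template that contains an alarm trigger, the execution starts in the Waiting state. When the metric configured in the alarm trigger reaches its alarm threshold, the execution switches to the Running state and immediately runs the subsequent tasks defined in the template. These tasks are typically operations that clear the alarm automatically. For example, when the CPU utilization of an ECS instance exceeds 90%, an alarm is triggered and the instance is restarted automatically.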
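
The Waiting-to-Running transition described above can be sketched in a few lines of Python. This is an illustration only, not OOS code: the function name `next_status` and the status strings are assumptions, and only the five plain comparison operators from the syntax section are modeled (the operators that compare against yesterday, last week, or the previous period need historical data and are omitted).

```python
import operator

# Plain threshold comparisons named in the ACS::AlarmTrigger syntax.
# The history-based operators (Yesterday / LastWeek / LastPeriod) are not modeled.
COMPARATORS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanOrEqualToThreshold": operator.le,
    "LessThanThreshold": operator.lt,
    "NotEqualToThreshold": operator.ne,
}


def next_status(status: str, value: float, comparison: str, threshold: float) -> str:
    """Return the execution status after one metric sample (hypothetical helper)."""
    if status != "Waiting":
        return status  # the trigger only acts on an execution that is still waiting
    breached = COMPARATORS[comparison](value, threshold)
    return "Running" if breached else "Waiting"
```

With the 90% CPU example, `next_status("Waiting", 95.0, "GreaterThanThreshold", 90.0)` moves the execution to Running, while a sample of exactly 90.0 leaves it Waiting because the comparison is strict.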

Important

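An alarm trigger can monitor two types of metrics: metrics collected by the pre-installed CloudMonitor agent (plug-in) and metrics that ECS provides natively. For how to tell them apart, see the metric descriptions. To monitor a metric collected by the CloudMonitor agent, first install the agent on the instance to be monitored; otherwise the alarm cannot be triggered. To install the agent, go to Host Monitoring in the CloudMonitor console, select the instance to be monitored, and click Install.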

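Limits

Triggers have the following limits:

  • A template can contain only one trigger action.

  • The trigger action must be defined as the first task in the Tasks section of the template.

  • A nested template (child template) cannot contain a trigger action.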
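
The limits above lend themselves to a quick pre-flight check before a template is submitted. The following is a minimal sketch, assuming the template has already been parsed into a Python dict. The helper `check_trigger_limits` and its `is_nested` flag are hypothetical and not part of OOS, and only `ACS::AlarmTrigger` is treated as a trigger action here.

```python
TRIGGER_ACTION = "ACS::AlarmTrigger"


def check_trigger_limits(template: dict, is_nested: bool = False) -> list:
    """Return the violated limits; an empty list means the template passes."""
    tasks = template.get("Tasks", [])
    # Indexes of every task that uses the trigger action.
    positions = [i for i, task in enumerate(tasks) if task.get("Action") == TRIGGER_ACTION]
    problems = []
    if len(positions) > 1:
        problems.append("more than one trigger action")
    if positions and positions[0] != 0:
        problems.append("trigger action is not the first task")
    if positions and is_nested:
        problems.append("trigger action in a nested template")
    return problems
```

A template whose first task is the trigger and whose remaining tasks are ordinary actions returns an empty list. Pass `is_nested=True` when checking a child template.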

Syntax

  • YAML format

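    Tasks:
    - Name: taskName1 # Task name.
      Action: 'ACS::AlarmTrigger'
      Properties:
          Namespace: 'acs_ecs_dashboard'  # Required. Data namespace of the product, for example ECS. Valid values can be obtained by calling the DescribeMetricMetaList operation.
          MetricName: 'cpu_total'  # Required. Metric name, for example the total CPU utilization in percent. Valid values can be obtained by calling the DescribeMetricMetaList operation.
          Statistics: 'Average' # Statistical method for the alarm. For example, Average takes the mean over a time period. Valid values can be obtained by calling the DescribeMetricMetaList operation.
          # Required. Threshold comparison operator. Valid values:
          #   GreaterThanOrEqualToThreshold (>=), GreaterThanThreshold (>), LessThanOrEqualToThreshold (<=), LessThanThreshold (<), NotEqualToThreshold (!=)
          #   GreaterThanYesterday / LessThanYesterday: higher / lower than at the same time yesterday
          #   GreaterThanLastWeek / LessThanLastWeek: higher / lower than at the same time last week
          #   GreaterThanLastPeriod / LessThanLastPeriod: higher / lower than in the previous period
          ComparisonOperator: 'GreaterThanThreshold'
          Threshold: '90' # Alarm threshold, for example 90% total CPU utilization.
          # Required. Resources to raise alarms for, as a JSON string:
          #   all resources under the account: [{"resource":"_ALL"}]
          #   a specific instance: [{"instanceId":"i-bp123467zxcvb"}]
          #   one disk partition on an instance: [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"}]
          #   several disk partitions on an instance: [{"instanceId":"i-bp123467zxcvb","device":"/dev/vda1"},{"instanceId":"i-bp123467zxcvb","device":"/dev/vdb1"}]
          Resources: '[{"resource":"_ALL"}]'
          Times: 1 # Alarm repeat count.
          Interval: 60 # Detection period of the alarm rule, in seconds. Defaults to the minimum frequency of the metric, 60s.
          SilenceTime: 3600   # Channel silence period, in seconds. Default: 86400 (1 day). While monitoring data stays beyond the alarm rule threshold, only one alarm notification is sent per silence period.
      Outputs:
       paraName1:
           Type: String
           # .key reads the value of one key from the JSON message body of the alarm event.
           # For example, .instanceId returns "i-abc12345zxcv" for the sample message body below.
           ValueSelector: .key
           # Sample message body of an alarm-triggered event:
           # { "curLevel": "INFO", "Minimum": "34.00", "Maximum": "95.00", "instanceId": "i-abc12345zxcv", "Average": "85.00", "ruleName": "alarmtrigger-1390000****-exec-2130c0c073fa487098d3", "userId": "1390000****", "timestamp": "1598349720000", "executionId": "exec-2130c0c073fa487098d3", "sourceAliUid": "1390000****" }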
  • JSON format (see the YAML comments for descriptions)

    {
      "Tasks": [
        {
          "Name": "taskName1",
          "Action": "ACS::AlarmTrigger",
          "Properties": {
            "Namespace": "acs_ecs_dashboard",
            "MetricName": "cpu_total",
            "Statistics": "Average",
            "ComparisonOperator": "GreaterThanThreshold",
            "Threshold": "90",
            "Resources": "[{\"resource\":\"_ALL\"}]",
            "Times": 1,
            "Interval": 60,
            "SilenceTime": 3600
          },
          "Outputs": {
            "paraName1": {
              "Type": "String",
              "ValueSelector": ".key"
            }
          }
        }
      ]
    }
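
Two details of the syntax above are easy to get wrong: Resources takes a JSON document encoded as a string, and ValueSelector reads a key from the JSON message body of the alarm event. The Python sketch below illustrates both under stated assumptions. `build_resources` and `select_key` are hypothetical helpers, and `select_key` handles only the single-key `.key` form shown in the YAML comments, not any richer selector syntax that OOS may accept.

```python
import json


def build_resources(*targets: dict) -> str:
    """Serialize resource selectors into the compact string form shown for Resources."""
    return json.dumps(list(targets), separators=(",", ":"))


def select_key(message: str, selector: str) -> str:
    """Mimic a simple '.key' ValueSelector against the alarm message body."""
    if not selector.startswith("."):
        raise ValueError("only simple '.key' selectors are handled in this sketch")
    return json.loads(message)[selector[1:]]


# Alarm message body, trimmed from the sample in the YAML comments.
ALARM_MESSAGE = '{"curLevel": "INFO", "instanceId": "i-abc12345zxcv", "Average": "85.00"}'

all_resources = build_resources({"resource": "_ALL"})
two_partitions = build_resources(
    {"instanceId": "i-bp123467zxcvb", "device": "/dev/vda1"},
    {"instanceId": "i-bp123467zxcvb", "device": "/dev/vdb1"},
)
instance_id = select_key(ALARM_MESSAGE, ".instanceId")
```

Building the string with `json.dumps` rather than by hand avoids quoting mistakes, which matter most in the JSON form of the template where the inner quotes must be escaped.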

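Example

If the total CPU utilization of a monitored ECS instance exceeds the threshold within a 1-minute period, the instance is restarted automatically.

  • YAML format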

    FormatVersion: OOS-2019-06-01
    Description:
      en: Reboot ECS instance with specified tag when its CPU utilization exceeded threshold. The selected instance must already have the Cloud Monitor agent installed.
      zh-cn: 按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟。所選執行個體必須已安裝CloudMonitorAgent。
      name-en: ACS-ECS-RebootInstanceAtHighCpuByTags
      name-zh-cn: 按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟
      categories:
        - alarm-trigger
    Parameters:
      tags:
        Type: Json
        Description:
          en: The tags to select ECS instances.
          zh-cn: 執行個體的標籤。
        AssociationProperty: Tags
      threshold:
        Type: Number
        Description:
          en: The CPU utilization threshold.
          zh-cn: CPU利用率閾值。
      silenceTime:
        Type: Number
        Description:
          en: The silence time of alarm (seconds).
          zh-cn: 警示通道沉默周期(秒)。
        Default: 60
      OOSAssumeRole:
        Description:
          en: The RAM role to be assumed by OOS.
          zh-cn: OOS扮演的RAM角色。
        Type: String
        Default: OOSServiceRole
    RamRole: '{{ OOSAssumeRole }}'
    Tasks:
      - Name: alarmTrigger
        Action: 'ACS::AlarmTrigger'
        Description:
          en: Set the CPU utilization alarm for ECS instance.
          zh-cn: 對ECS執行個體的CPU使用率進行監控。
        Properties:
          Namespace: acs_ecs_dashboard
          MetricName: cpu_total
          Statistics: Average
          ComparisonOperator: GreaterThanThreshold
          Threshold: '{{threshold}}'
          Times: 1
          SilenceTime: '{{ silenceTime }}'
          Period: 60
          Interval: 60
        Outputs:
          instanceId:  # Named to match the alarmTrigger.instanceId references in the later tasks.
            Type: String
            ValueSelector: .instanceId
      - Name: CheckForInstances
        Action: 'ACS::CheckFor'
        Description:
          en: Check ECS instance has specified tag.
          zh-cn: 檢查ECS執行個體有指定的tag。
        OnError: 'ACS::END'
        Properties:
          Service: ECS
          API: DescribeInstances
          Parameters:
            Tags: '{{ tags }}'
            InstanceIds: '["{{ alarmTrigger.instanceId }}"]'
          PropertySelector: TotalCount
          DesiredValues:
            - 1
      - Name: RebootInstance
        Action: 'ACS::ECS::RebootInstance'
        Description:
          en: Restarts the ECS instances.
          zh-cn: 重啟執行個體。
        Properties:
          instanceId: '{{ alarmTrigger.instanceId }}'
                                            

  • JSON format

{
  "FormatVersion": "OOS-2019-06-01",
  "Description": {
    "en": "Reboot ECS instance with specified tag when its CPU utilization exceeded threshold. The selected instance must already have the Cloud Monitor agent installed.",
    "zh-cn": "按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟。所選執行個體必須已安裝CloudMonitorAgent。",
    "name-en": "ACS-ECS-RebootInstanceAtHighCpuByTags",
    "name-zh-cn": "按tag在ECS執行個體CPU利用率超過閾值時執行執行個體重啟",
    "categories": [
      "alarm-trigger"
    ]
  },
  "Parameters": {
    "tags": {
      "Type": "Json",
      "Description": {
        "en": "The tags to select ECS instances.",
        "zh-cn": "執行個體的標籤。"
      },
      "AssociationProperty": "Tags"
    },
    "threshold": {
      "Type": "Number",
      "Description": {
        "en": "The CPU utilization threshold.",
        "zh-cn": "CPU利用率閾值。"
      }
    },
    "silenceTime": {
      "Type": "Number",
      "Description": {
        "en": "The silence time of alarm (seconds).",
        "zh-cn": "警示通道沉默周期(秒)。"
      },
      "Default": 60
    },
    "OOSAssumeRole": {
      "Description": {
        "en": "The RAM role to be assumed by OOS.",
        "zh-cn": "OOS扮演的RAM角色。"
      },
      "Type": "String",
      "Default": "OOSServiceRole"
    }
  },
  "RamRole": "{{ OOSAssumeRole }}",
  "Tasks": [
    {
      "Name": "alarmTrigger",
      "Action": "ACS::AlarmTrigger",
      "Description": {
        "en": "Set the CPU utilization alarm for ECS instance.",
        "zh-cn": "對ECS執行個體的CPU使用率進行監控。"
      },
      "Properties": {
        "Namespace": "acs_ecs_dashboard",
        "MetricName": "cpu_total",
        "Statistics": "Average",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": "{{threshold}}",
        "Times": 1,
        "SilenceTime": "{{ silenceTime }}",
        "Period": 60,
        "Interval": 60
      },
      "Outputs": {
        "instanceId": {
          "Type": "String",
          "ValueSelector": ".instanceId"
        }
      }
    },
    {
      "Name": "CheckForInstances",
      "Action": "ACS::CheckFor",
      "Description": {
        "en": "Check ECS instance has specified tag.",
        "zh-cn": "檢查ECS執行個體有指定的tag。"
      },
      "OnError": "ACS::END",
      "Properties": {
        "Service": "ECS",
        "API": "DescribeInstances",
        "Parameters": {
          "Tags": "{{ tags }}",
          "InstanceIds": "[\"{{ alarmTrigger.instanceId }}\"]"
        },
        "PropertySelector": "TotalCount",
        "DesiredValues": [
          1
        ]
      }
    },
    {
      "Name": "RebootInstance",
      "Action": "ACS::ECS::RebootInstance",
      "Description": {
        "en": "Restarts the ECS instances.",
        "zh-cn": "重啟執行個體。"
      },
      "Properties": {
        "instanceId": "{{ alarmTrigger.instanceId }}"
      }
    }
  ]
}
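
The control flow of the sample template can be traced with a small in-memory simulation. This is a sketch for reasoning about the three tasks, not a client for any Alibaba Cloud API: `handle_alarm` is a hypothetical function, instances are modeled as a dict that maps an instance ID to a flat tag dict, and `OnError: 'ACS::END'` is read here as "end the execution when the check fails".

```python
def handle_alarm(message: dict, instances: dict, required_tags: dict) -> str:
    """Walk the three tasks of the sample template against in-memory data."""
    # Task 1 (alarmTrigger): the output value is read from the alarm message.
    instance_id = message["instanceId"]

    # Task 2 (CheckForInstances): count instances that match both the ID and the tags.
    tags = instances.get(instance_id, {})
    tags_match = all(tags.get(key) == value for key, value in required_tags.items())
    total_count = 1 if instance_id in instances and tags_match else 0
    if total_count != 1:
        return "ended"  # the check failed, so the remaining task is skipped

    # Task 3 (RebootInstance): a real template would call the ECS action here.
    return f"reboot {instance_id}"
```

For example, an alarm for `i-abc12345zxcv` leads to a reboot only if that instance carries every tag passed in `required_tags`. An alarm for an untagged or unknown instance ends the execution without rebooting anything.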