CloudOps Orchestration Service (OOS) supports O&M tasks that are triggered by threshold-based alerts on cloud service metrics. An alert O&M task executes the specified template when the specified metric of a monitored cloud service reaches the threshold. The task keeps running and listens for the specified alert until you cancel it. For example, you can configure an alert O&M task to automatically clear the log directory when disk usage exceeds 80%.
For more information about the supported metrics, see Major metrics of Alibaba Cloud services.
To create an alert O&M task, perform the following steps:
Configure an alert rule.
Select the template to be executed.
Configure the parameters for executing the template.
Configure an alert rule
Parameter | Required | Description |
Product type | Yes | The service to be monitored. Select a service from the drop-down list. |
Rule description | Yes | The rule for triggering the alert based on the threshold. |
Trigger silence cycle | No | The period during which the alert is triggered only once, even if the metric value exceeds the threshold several times in a row. Default value: 24 hours. |
Effective From | No | The time period during which the alert rule is effective. By default, the alert rule takes effect all day. |
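The effect of the Trigger silence cycle parameter can be illustrated with a minimal sketch. The SilenceGate class and its method names are hypothetical and are not part of OOS. The sketch assumes that the silence period starts when an alert fires and that a new alert is allowed once the full period has elapsed. The actual service may treat the boundary or cleared alerts differently.

```python
from datetime import datetime, timedelta
from typing import Optional


class SilenceGate:
    """Hypothetical helper: lets one alert through, then suppresses repeats."""

    def __init__(self, silence: timedelta = timedelta(hours=24)) -> None:
        self.silence = silence  # mirrors the 24-hour default described above
        self.last_fired: Optional[datetime] = None

    def should_fire(self, now: datetime) -> bool:
        # Fire if nothing has fired yet or the silence period has elapsed.
        if self.last_fired is None or now - self.last_fired >= self.silence:
            self.last_fired = now
            return True
        return False


gate = SilenceGate()
start = datetime(2024, 1, 1, 0, 0)
# The threshold is breached at hours 0, 1, 23, 24, and 30 after `start`.
results = [gate.should_fire(start + timedelta(hours=h)) for h in (0, 1, 23, 24, 30)]
print(results)  # [True, False, False, True, False]
```

In this sketch, the breaches at hours 1 and 23 are suppressed because they fall inside the silence period that starts at hour 0. The breach at hour 30 is suppressed by the period that starts at hour 24.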
A threshold-triggered alert rule contains the following information:
Metric name
Aggregation period of monitoring data
Number of aggregation periods
Statistics collection method
Comparison operator
Threshold
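The way these six items combine into a rule can be sketched as follows. This is an illustration only, not the evaluation logic of the monitoring service. The ThresholdRule class, the metric name disk_usage_percent, and the assumption that every one of the most recent aggregation periods must violate the threshold are all hypothetical.

```python
import operator
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Statistics collection methods, named after the keys in alert messages.
STATISTICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "Average": mean,
    "Maximum": max,
    "Minimum": min,
}
COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class ThresholdRule:
    metric_name: str      # metric name
    period_seconds: int   # aggregation period (describes how samples are grouped)
    periods: int          # number of aggregation periods
    statistic: str        # statistics collection method
    comparator: str       # comparison operator
    threshold: float      # threshold

    def is_breached(self, windows: List[Sequence[float]]) -> bool:
        """Return True if each of the last `periods` windows violates the threshold."""
        if len(windows) < self.periods:
            return False
        aggregate = STATISTICS[self.statistic]
        compare = COMPARATORS[self.comparator]
        recent = windows[-self.periods:]
        return all(compare(aggregate(w), self.threshold) for w in recent)


# Hypothetical rule: average disk usage above 80% for 3 consecutive 60-second periods.
rule = ThresholdRule("disk_usage_percent", 60, 3, "Average", ">", 80.0)
breached = rule.is_breached([[70, 75], [82, 85], [81, 90], [88, 92]])
not_breached = rule.is_breached([[82, 85], [60, 70], [88, 92]])
print(breached, not_breached)  # True False
```

Each inner list stands for the samples collected in one aggregation period. The second call returns False because the middle period averages 65, which breaks the run of consecutive violations.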
Select the template to be executed
Select the template to be executed when the alert is generated.
Configure the parameters for executing the template
You can set the Template Parameters parameter to Extract Value from Message Body or Fixed Value. If you select Fixed Value, the template is executed with the parameter values that you set. If you select Extract Value from Message Body, you can use JSONPath-style expressions to extract values from alert message bodies.
To extract a value from an alert message body, use an expression in the $.Parameter name format, where Parameter name is the name of a field in the message. For example, the following content indicates an alert message for the Host.cpu.total metric of an Elastic Compute Service (ECS) instance:
{
"Average": 50.15,
"Maximum": 50.75,
"Minimum": 49.75,
"curLevel": "INFO",
"instanceId": "i-bp1gn7od******qh5r12",
"ruleName": "alarmtrigger-130920******0047-exec-de81413d******71b537",
"timestamp": 1575970560000,
"userId": "130920******0047"
}
To obtain the ID of the instance for which the alert is triggered, use the following expression: $.instanceId.
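The lookup performed by a $.Parameter name expression can be mimicked locally with a minimal sketch. The extract function and the template parameter names instanceId and level are hypothetical. The sketch handles only top-level fields, as in the sample message, and does not reproduce how OOS evaluates expressions internally.

```python
import json
from typing import Any, Dict

# The sample alert message shown above, with masked values kept as-is.
ALERT_MESSAGE = """
{
  "Average": 50.15,
  "Maximum": 50.75,
  "Minimum": 49.75,
  "curLevel": "INFO",
  "instanceId": "i-bp1gn7od******qh5r12",
  "ruleName": "alarmtrigger-130920******0047-exec-de81413d******71b537",
  "timestamp": 1575970560000,
  "userId": "130920******0047"
}
"""


def extract(expression: str, message: Dict[str, Any]) -> Any:
    """Resolve a `$.fieldName` expression against a parsed alert message."""
    if not expression.startswith("$."):
        raise ValueError(f"unsupported expression: {expression!r}")
    # Strip the leading "$." and look up the top-level field.
    return message[expression[2:]]


message = json.loads(ALERT_MESSAGE)
# Hypothetical mapping of template parameter names to extraction expressions.
template_parameters = {"instanceId": "$.instanceId", "level": "$.curLevel"}
resolved = {name: extract(expr, message) for name, expr in template_parameters.items()}
print(resolved)  # {'instanceId': 'i-bp1gn7od******qh5r12', 'level': 'INFO'}
```

A field that is absent from the message raises KeyError in this sketch, which makes a mistyped parameter name visible immediately.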
The following table describes the parameters that can be extracted from alert message bodies.
Expression | Description | Example |
$.timestamp | The timestamp when the alert was triggered. Unit: milliseconds. | 1575970560000 |
$.curLevel | The level of the alert. OK indicates that the alert has been cleared. | INFO |
$.userId | The ID of the Alibaba Cloud account. | 130920******0047 |
$.dimensionFieldName | The dimension of the metric. Replace dimensionFieldName in the expression with the parameter name of the metric dimension. For example, the CPU utilization of ECS instances is monitored based on the instance ID. You can use the $.instanceId expression to obtain the instance ID. | N/A |
The following figure shows an example of extracting values from an alert message body.
You can also set fixed parameter values for executing the template. The method is similar to that for regular templates.