To avoid missing important notifications, we recommend that you configure alert rules for key monitoring metrics. When performance metrics such as the CPU utilization or memory usage of your Tair instance become abnormal, or when a master-replica switchover is triggered for the instance, CloudMonitor promptly sends you an alert.
Background information
CloudMonitor is a service that monitors Alibaba Cloud resources and Internet applications. It offers an all-in-one, enterprise-grade monitoring solution that is ready to use out of the box. For more information, see What is CloudMonitor? You can create alert rules and specify the metrics that they monitor. When an alert rule for a metric is triggered, CloudMonitor generates an alert and sends it to the alert contacts in an alert contact group.
Before you can add an alert contact to an alert contact group, the alert contact and the alert contact group must exist. For more information, see Create an alert contact or alert contact group.
Procedure
Log on to the Tair console and go to the Instances page. In the top navigation bar, select the region in which the instance that you want to manage resides. Then, find the instance and click the instance ID.
In the left-side navigation pane, click Alarm Settings.
On the Alarm Settings page, view metrics of the current instance.
You can also click Alarm Settings in the upper-right corner to go to the CloudMonitor console, where you can add or manage alert rules. The following configuration methods are available:
Create an alert rule: When the value of a metric exceeds the specified threshold, the system sends an alert. For example, if the CPU utilization of an instance exceeds the threshold of 90%, the system sends an alert. This alerting mechanism enables you to stay informed about the health and performance of your resources and respond to exceptions in a timely manner.
In most cases, workloads are sensitive to fluctuations in the CPU utilization, memory usage, and network traffic of Tair instances. We recommend that you specify alert thresholds for key metrics. The following metrics and thresholds are provided for your reference:
CPU utilization: greater than 60%.
Memory usage: greater than 80%.
Inbound bandwidth usage and outbound bandwidth usage: greater than 80%.
For more information about the monitoring metrics supported by CloudMonitor, see Appendix 1: Metrics.
Subscribe to event notifications: If a Tair instance fails, performs a master-replica switchover, or runs a proactive O&M task such as an instance migration, the system sends an alert. This allows you to resolve issues in a timely manner. Alerts are triggered by events such as InstanceMaintenance (proactive O&M) and instance exceptions.
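The reference thresholds listed above for CPU utilization, memory usage, and bandwidth usage can be illustrated with a minimal Python sketch. It only shows the "greater than threshold" comparison; it does not call CloudMonitor. The metric key names, the `breached_metrics` helper, and the sample values are hypothetical and chosen for illustration.

```python
# Reference thresholds from this topic, in percent.
# The key names are hypothetical, not CloudMonitor metric identifiers.
THRESHOLDS = {
    "cpu_utilization": 60.0,
    "memory_usage": 80.0,
    "inbound_bandwidth_usage": 80.0,
    "outbound_bandwidth_usage": 80.0,
}


def breached_metrics(sample: dict[str, float]) -> list[str]:
    """Return the metrics whose value is strictly greater than its threshold.

    Metrics missing from the sample are treated as not breached.
    """
    return [
        name
        for name, limit in THRESHOLDS.items()
        if name in sample and sample[name] > limit
    ]


# Hypothetical sample values for one instance.
sample = {
    "cpu_utilization": 72.5,
    "memory_usage": 80.0,  # equal to the threshold, so not breached
    "inbound_bandwidth_usage": 35.0,
    "outbound_bandwidth_usage": 91.0,
}

print(breached_metrics(sample))
```

In this sketch a value that equals the threshold does not count as a breach, which matches the "greater than" wording of the reference thresholds.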
FAQ
What does the Blocked Clients metric in the alert settings mean?
The Node/Blocked Clients metric, which is available when you create an alert rule, indicates the number of client connections that are blocked because they are running blocking commands on the Tair instance. Blocking commands include BRPOP, BLPOP, BZPOPMIN, BZPOPMAX, and XREAD.
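To cross-check this number by hand, one option is to read the `blocked_clients` field from the output of the Redis-compatible `INFO clients` command. The following sketch parses that text. It assumes that the instance returns the standard Redis `INFO` format; this topic does not state whether the CloudMonitor metric is derived from that field. The `parse_blocked_clients` helper and the sample text are hypothetical.

```python
def parse_blocked_clients(info_text: str) -> int:
    """Extract the blocked_clients value from INFO-style "key:value" text."""
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("blocked_clients:"):
            return int(line.split(":", 1)[1])
    raise ValueError("blocked_clients field not found")


# Hypothetical excerpt of an `INFO clients` reply (assumed standard Redis format).
sample_info = "# Clients\r\nconnected_clients:12\r\nblocked_clients:3\r\n"

print(parse_blocked_clients(sample_info))
```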