Alibaba Cloud Elasticsearch monitors your clusters and lets you customize alert thresholds for them. When an alert is triggered, the system notifies you. To keep your Elasticsearch cluster stable, we recommend that you configure monitoring and alerting for it. The system can then monitor items such as cluster status and disk usage in real time, and you can check alert notifications and take measures promptly. This topic describes how to configure alerting for an Elasticsearch cluster, including the initiative alert feature and custom alert rules.
Enable the initiative alert feature
The initiative alert feature is provided by CloudMonitor and is disabled by default. After this feature is enabled, alert rules are created to detect errors, such as abnormal cluster status, high disk usage (greater than 75%), and high JVM heap memory usage (greater than 85%). These rules apply to all Elasticsearch clusters within your Alibaba Cloud account.
Log on to the Alibaba Cloud Elasticsearch console.
In the left-side navigation pane, click Elasticsearch Clusters.
On the Elasticsearch Clusters page, click Initiative Alert.
In the Initiative Alert dialog box, click Enable Now.
Note: If the Disable Now button is displayed in the dialog box, the initiative alert feature is already enabled. In this case, you do not need to perform the following steps.
On the Initiative Alert page of the CloudMonitor console, turn on the Initiative Alert switch for Elasticsearch.
(Optional) Go to the Elasticsearch console and check whether the initiative alert feature is enabled.
On the Elasticsearch Clusters page, find your cluster and click its ID.
In the left-side navigation pane of the page that appears, choose .
In the upper-right corner of the Basic Monitoring tab, view the value of Initiative Alert.
If the value of Initiative Alert is Enabled, the initiative alert feature is enabled.
Configure custom alert rules in CloudMonitor
Log on to the CloudMonitor console.
In the left-side navigation pane, choose .
On the Alert Rules page, click Create Alert Rule.
In the Create Alert Rule panel, configure an alert rule.
In this example, an alert rule is created to monitor the NodeDiskUtilization, ClusterStatus, and NodeHeapMemoryUtilization metrics. The following table describes some parameters for configuring the alert rule. For parameters that are not provided in the following table, default values are used. For more information about the involved parameters, see Create an alert rule.
Parameter | Description |
Product | Select Elasticsearch. |
Resource Range | Select Instances. |
Associated Resources | Select the cluster that you want to monitor. |
Rule Description | Click Add Rule and select a metric type. In the Configure Rule Description panel, specify a rule name in the Alert Rule field and configure the following parameters: |
| Metric Type: Select Combined Metrics. |
| Alert Level: Select Warning(Warn). |
| Multi-metric Alert Condition: Choose , select Value, select >=, and then specify 2.0. Choose , select Average, select >=, and then specify 75. Choose , select Average, select >=, and then specify 85. |
| Relationship Between Metrics: Select Generate alerts if one of the conditions is met. |
| Alert Threshold Triggers: Select 3 Consecutive Cycles (1 Cycle = 1 Minutes). |
| You can also select Single Metric for Metric Type to configure an alert rule only for disk usage. For more information, see Example of configuring an alert rule for disk usage. |
Alert Contact Group | Select the alert contact group that you created. For information about how to create an alert contact group, see Create an alert contact or alert contact group. |
Note: You can also click Advanced Settings and enter a URL that can be accessed over the Internet in the Alert Callback field. This way, CloudMonitor can push alert notifications to the URL through a POST request. Only HTTP requests are supported. For more information, see Use the alert callback feature to send notifications about threshold-triggered alerts.
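The receiving side of the alert callback can be sketched with Python's standard library. This is a minimal illustration, not CloudMonitor's reference implementation: this topic does not describe the format of the POST body, so the sketch assumes it is either JSON or form-encoded and prints whatever fields arrive. The names parse_callback_body, AlertCallbackHandler, and serve, as well as port 8080, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_callback_body(body: bytes, content_type: str = "") -> dict:
    """Decode a callback body into a flat dict.

    Assumption: the body is JSON or form-encoded. The exact format and
    field names are not given in this topic, so both cases are handled.
    """
    text = body.decode("utf-8", errors="replace")
    if "json" in content_type.lower():
        data = json.loads(text)
        return data if isinstance(data, dict) else {"payload": data}
    # parse_qs returns a list per field; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(text).items()}


class AlertCallbackHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that logs each alert notification it receives."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        fields = parse_callback_body(body, self.headers.get("Content-Type", ""))
        print("alert callback received:", fields)
        # Reply with 200 so the sender sees the push as delivered.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for callbacks over plain HTTP (port 8080 is an assumption)."""
    HTTPServer(("0.0.0.0", port), AlertCallbackHandler).serve_forever()
```

Calling serve() starts the listener. The URL entered in the Alert Callback field would then point at this host and port, which must be reachable over the Internet.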
You can configure alert rules for the metrics of your Elasticsearch cluster based on the instructions in the following table. For more information about the metrics, see Metrics and exception handling suggestions.
Metric | Description |
ClusterStatus(value) | Required. This metric checks the status of your cluster. Green indicates that your cluster is in a normal state. Yellow or red indicates that your cluster is in an abnormal state. The value for the cluster state green is 0.00, that for the cluster state yellow is 1.00, and that for the cluster state red is 2.00. Reference these values and specify a suitable threshold for the ClusterStatus(value) metric. |
NodeDiskUtilization(%) | Required. Set the threshold to a value that is less than 75%. The upper limit is 80%. |
NodeHeapMemoryUtilization(%) | Required. Set the threshold to a value that is less than 85%. The upper limit is 90%. |
NodeCPUUtilization(%) | Optional. Set the threshold to a value that is less than or equal to 95%. |
NodeLoad_1m(value) | Optional. Set the threshold to a value that is 80% of the number of vCPUs for each node. |
ClusterQueryQPS(Count/Second) | Optional. Set the threshold based on the actual test result. |
ClusterIndexQPS(Count/Second) | Optional. Set the threshold based on the actual test result. |
NodeStatsFullGcCollectionCount(Count) | Optional. If the value of this metric is not 0, an error occurs on your cluster. |
NodeStatsExceptionLogCount(Count) | Optional. If the value of this metric is not 0, an error occurs on your cluster. |
ClusterAutoSnapshotLatestStatus(value) | Optional. If the value of this metric is -1 or 0, your cluster is normal. If the value of this metric is 2, an error occurs on your cluster. |
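Some of the suggestions in the table are small calculations rather than fixed numbers. The following sketch shows how they might be worked out before you enter them in the console. The function names are hypothetical, and the code only restates the values given in the table.

```python
# Numeric values reported by the ClusterStatus(value) metric, per the table above.
CLUSTER_STATUS_VALUES = {"green": 0.0, "yellow": 1.0, "red": 2.0}


def cluster_status_threshold(alert_from: str) -> float:
    """Threshold for a ">=" rule that should fire from the given state upward."""
    return CLUSTER_STATUS_VALUES[alert_from]


def node_load_1m_threshold(vcpus_per_node: int) -> float:
    """NodeLoad_1m threshold: 80% of the number of vCPUs for each node."""
    return 0.8 * vcpus_per_node


def snapshot_status(value: int) -> str:
    """Interpret ClusterAutoSnapshotLatestStatus(value).

    The table covers only -1, 0, and 2; any other value is reported
    as "unspecified" rather than guessed.
    """
    if value in (-1, 0):
        return "normal"
    if value == 2:
        return "error"
    return "unspecified"
```

For example, node_load_1m_threshold(4) gives a threshold of about 3.2 for nodes with 4 vCPUs, and cluster_status_threshold("yellow") gives 1.0 for a rule that alerts on both yellow and red.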
Click Confirm.
The system then starts to monitor your cluster. If the system detects an exception on the metrics that are configured in the alert rule, the specified alert contact can receive an alert notification based on the notification method configured in the alert rule.
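To make the combined-metric rule in this example concrete, the following sketch models how such a rule could be evaluated. It is an illustration under stated assumptions, not CloudMonitor's actual logic: it assumes that the thresholds 2.0, 75, and 85 belong to ClusterStatus, NodeDiskUtilization, and NodeHeapMemoryUtilization respectively, that each sample is one 1-minute cycle, and that a cycle counts as breached when any one condition is met. The names Condition, breached, and should_alert are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Condition:
    metric: str
    threshold: float  # the condition is met when value >= threshold


# Assumed mapping of the example thresholds to the three metrics.
CONDITIONS = [
    Condition("ClusterStatus", 2.0),
    Condition("NodeDiskUtilization", 75.0),
    Condition("NodeHeapMemoryUtilization", 85.0),
]


def breached(sample: Dict[str, float]) -> bool:
    """True if at least one condition is met in this cycle."""
    return any(
        sample.get(c.metric, float("-inf")) >= c.threshold for c in CONDITIONS
    )


def should_alert(samples: List[Dict[str, float]], cycles: int = 3) -> bool:
    """True if the most recent `cycles` samples are all breached."""
    if len(samples) < cycles:
        return False
    return all(breached(s) for s in samples[-cycles:])


# Usage: one healthy cycle followed by three cycles of high disk usage.
healthy = {
    "ClusterStatus": 0.0,
    "NodeDiskUtilization": 50.0,
    "NodeHeapMemoryUtilization": 60.0,
}
disk_high = dict(healthy, NodeDiskUtilization=80.0)
print(should_alert([healthy, disk_high, disk_high, disk_high]))
```

Requiring several consecutive breached cycles filters out short spikes, at the cost of a delay of a few minutes before the first notification.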
Example of configuring an alert rule for disk usage
You can configure an alert rule for the disk usage of nodes in an Elasticsearch cluster in the CloudMonitor console. This way, you are notified of abnormal disk usage and can troubleshoot related issues promptly.
For more information, see Configure custom alert rules in CloudMonitor. The following table provides the related parameter configurations in this example.
Parameter | Description |
Alert Rule | Set the value to Disk Usage Alerting. |
Metric Type | Select Single Metric. |
Metric | Choose . |
Threshold and Alert Level | |
Chart Preview | The chart in which the monitoring data of the selected metric is displayed. |