After logs are collected to Simple Log Service, you can use the alerting system of Simple Log Service to configure alerts based on log keywords.
Background information
Logs record how a system runs and the exceptions that it encounters, such as warnings, errors, panics in Go, and java.lang.StackOverflowError in Java. Logs also record business status, such as payment failures. Keyword-based log retrieval, monitoring, and alerting are therefore common requirements: you search logs for a keyword and configure alerts based on the matches so that you can identify issues early. Simple Log Service provides an O&M-free alerting solution that features high performance and flexible configurations to help you configure alerts based on log keywords.
Case 1: Specify keywords to trigger alerts
This case provides an example of how to configure a query statement and an alert monitoring rule that triggers an alert when a specified keyword appears in logs.
Query statement
Set the time range to 15 Minutes(Relative) and execute the following statement to query the logs that include the ERROR keyword. For more information, see Query and analyze logs.
ERROR
Query and analysis result
The following query and analysis result shows that the ERROR keyword appears once within the last 15 minutes.
Alert monitoring rule
You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
Set the Trigger Condition parameter to Data is returned. An alert is triggered when the ERROR keyword appears in logs.
Set the Description field in the Add Annotation parameter to ${logging} and Alert Template to SLS built-in content template. This way, an alert notification includes the content of the logging field in a raw log.
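The rule above can be pictured with a minimal Python sketch. It assumes a plain substring match on a logging field and a hypothetical in-memory list of logs; it does not reproduce how Simple Log Service tokenizes or indexes logs, and the names matching_logs and should_alert are illustrative only.

```python
from datetime import datetime, timedelta

def matching_logs(logs, keyword, now, window):
    """Return logs inside [now - window, now] whose 'logging' field contains keyword."""
    start = now - window
    return [
        entry for entry in logs
        if start <= entry["time"] <= now and keyword in entry["logging"]
    ]

def should_alert(rows):
    """Mimic the 'Data is returned' condition: any returned row triggers an alert."""
    return len(rows) > 0

# Hypothetical sample data: only the first entry is both recent and an ERROR.
now = datetime(2024, 1, 1, 12, 0, 0)
logs = [
    {"time": now - timedelta(minutes=5), "logging": "ERROR payment failed"},
    {"time": now - timedelta(minutes=20), "logging": "ERROR timeout"},
    {"time": now - timedelta(minutes=1), "logging": "INFO ok"},
]

rows = matching_logs(logs, "ERROR", now, timedelta(minutes=15))
if should_alert(rows):
    # The annotation ${logging} corresponds to the raw field of each matching log.
    for row in rows:
        print(row["logging"])
```

With this sample data, one log falls inside the 15-minute window and contains the keyword, so the sketch prints its logging field.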
Alert notification
After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the ERROR keyword appears in logs. You can click View Details to view the log for which an alert is generated to identify root causes.
Case 2: Configure alerts based on the number of times that a keyword appears in logs
This case provides an example of how to configure a query statement and an alert monitoring rule that triggers an alert when the number of times that a keyword appears in logs exceeds a specified threshold within a specified time range.
Query statement
Set the time range to 1 Hour(Relative) and execute the following statement to query the number of times that the ERROR keyword appears in logs within an hour. For more information, see Query and analyze logs.
ERROR | SELECT count(*) AS cnt
Query and analysis result
The following query and analysis result shows that the ERROR keyword appears 11 times within the last hour.
Alert monitoring rule
You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
Set the Trigger Condition parameter to data matches the expression, cnt > 5. An alert is triggered when the number of times that the ERROR keyword appears in logs exceeds 5 within an hour.
Set the Description field in the Add Annotation parameter to "${cnt} times that the ERROR keyword appears within an hour" and set Alert Template to SLS built-in content template. This way, an alert notification displays the number of times that the ERROR keyword appears within the last hour.
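As an illustration of how the trigger condition and the annotation fit together, the following Python sketch evaluates cnt > 5 against a query result row and fills the ${cnt} placeholder. Python's string.Template happens to use the same ${name} syntax, but it is only a stand-in here; the template engine that Simple Log Service uses is not described in this topic, and the function names are hypothetical.

```python
from string import Template

def evaluate(row, threshold=5):
    """Mimic the trigger expression cnt > 5 for a single result row."""
    return row["cnt"] > threshold

def render(description, row):
    """Fill ${...} placeholders in the annotation with values from the row."""
    return Template(description).substitute(row)

# Hypothetical result row, matching the sample result of 11 occurrences.
row = {"cnt": 11}
description = "${cnt} times that the ERROR keyword appears within an hour"

fired = evaluate(row)
message = render(description, row)
if fired:
    print(message)
```

A row with cnt equal to 5 would not trigger an alert, because the expression is a strict comparison.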
Alert notification
After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the number of times that the ERROR keyword appears in logs exceeds 5 within the last hour. You can click View Details to view the log for which an alert is generated to identify root causes.
Case 3: Configure alerts by comparing the number of times that a keyword appears within a specific time range on a specified day and the day before
A keyword may appear in a periodic pattern, such as a daily cycle in which it is more likely to appear during the daytime than at night. In this case, an absolute value such as the number of times that the keyword appears may not be a suitable indicator of system health. You can use the interval-valued comparison and periodicity-valued comparison functions to compare the number of times that a keyword appears in logs within a specific time range on one day with the number for the same time range on a different day, and configure alerts based on the resulting growth rate.
Query statement
Set the time range to 1 Hour(Relative) and execute the following statement to calculate the growth rate of the number of times that the ERROR keyword appears in logs within the last hour relative to the number of times that the keyword appeared within the same time range the day before. For more information, see Query and analyze logs. For more information about the compare function, see Interval-valued comparison functions and periodicity-valued comparison functions.
ERROR | SELECT diff[1] AS today, diff[2] AS yesterday, round((diff[3]-1) * 100, 2) AS ratio FROM ( SELECT compare(cnt, 86400) AS diff FROM ( SELECT COUNT(*) AS cnt FROM log ) )
Query and analysis result
The following query and analysis result shows that the ERROR keyword appears 11 times within the last hour and 6 times within the same time range the day before. The growth rate is 83.33%.
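The growth rate follows directly from the arithmetic in the query. The Python sketch below reproduces it, assuming that compare returns the current value, the value from 86400 seconds earlier, and the ratio of the two, in that order, as the today, yesterday, and ratio aliases in the query imply. The function name growth_ratio is hypothetical, and returning None for a zero baseline is a choice of this sketch, not documented behavior.

```python
def growth_ratio(today, yesterday):
    """Return the growth rate in percent, rounded to 2 decimal places."""
    if yesterday == 0:
        # Assumption of this sketch: no baseline means no ratio can be computed.
        return None
    # Mirrors compare(cnt, 86400): [current, previous, current / previous].
    diff = [today, yesterday, today / yesterday]
    # SQL arrays are 1-indexed, so diff[3] in the query is diff[2] here.
    return round((diff[2] - 1) * 100, 2)

ratio = growth_ratio(11, 6)
print(ratio)  # 83.33
```

Because 83.33 is greater than 10, a rule with the condition ratio > 10 would trigger an alert for this result.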
Alert monitoring rule
You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
Set the Trigger Condition parameter to data matches the expression, ratio > 10. An alert is triggered when the number of times that the ERROR keyword appears in logs within the last hour grows by more than 10% compared with the same time range the day before.
Set the Description field in the Add Annotation parameter to "${today} times that the keyword ERROR appears in logs within the last hour, ${yesterday} times that the ERROR keyword appeared in logs within the same time range the day before, and the growth rate is ${ratio}%" and set Alert Template to SLS built-in content template. This way, an alert notification displays the number of times that the ERROR keyword appears in logs within the last hour, the number of times that the ERROR keyword appeared in logs within the same time range the day before, and the growth rate.
Alert notification
After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the number of times that the ERROR keyword appears in logs within the last hour grows by more than 10% compared with the same time range the day before. You can click View Details to view the log for which an alert is generated to identify root causes.
Case 4: Configure alerts for anomalies based on machine learning algorithms
The preceding cases describe the common scenarios for keyword-based alert configurations. However, in special scenarios, you need to use Simple Log Service machine learning algorithms to configure alerts. For example, the number of times that a keyword appears in a day does not frequently fluctuate, but the number may sharply increase or decrease at a specific point in time. To identify the change at the earliest opportunity, you can perform time series forecasting and anomaly detection based on Simple Log Service machine learning algorithms. For more information about machine learning algorithms, see Machine learning functions.
Query statement
Set the time range to 4 Hours(Relative) and execute the following statement to detect anomalies in the number of times that the ERROR keyword appears in logs within the last 4 hours. The inner subquery counts the matching logs per 30-second interval, and the ts_predicate_simple function is applied to the resulting time series. For more information, see Query and analyze logs. For more information about the ts_predicate_simple function, see ts_predicate_simple.
ERROR | SELECT ts_predicate_simple(stamp, value, 6) FROM ( SELECT __time__ - __time__ % 30 AS stamp, count(1) AS value FROM log GROUP BY stamp ORDER BY stamp )
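The inner subquery builds the time series by rounding each timestamp down to a multiple of 30 with __time__ - __time__ % 30 and counting the logs per bucket. A rough Python equivalent is shown below. It uses small hypothetical integers in place of real Unix timestamps, and bucket_counts is an illustrative name, not part of Simple Log Service.

```python
from collections import Counter

def bucket_counts(timestamps, width=30):
    """Count timestamps per bucket, where a bucket starts at ts - ts % width."""
    counts = Counter(ts - ts % width for ts in timestamps)
    # Sorting by bucket start mirrors ORDER BY stamp.
    return sorted(counts.items())

# Hypothetical timestamps of logs that contain the ERROR keyword.
series = bucket_counts([0, 10, 29, 30, 65, 89, 90])
print(series)  # [(0, 3), (30, 1), (60, 2), (90, 1)]
```

Each pair corresponds to one (stamp, value) row that the outer query passes to ts_predicate_simple.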
Query and analysis result
The following query and analysis result shows that the src, predict, upper, lower, and anomaly_prob columns are returned. If the value of anomaly_prob for a data entry is greater than 0, an anomaly is detected at that point. The total number of anomalies is equal to the number of data entries for which the value of anomaly_prob is greater than 0. You can configure alerts based on this number.
The query and analysis result can be displayed in a time series chart. This way, you can easily identify abrupt changes. Each small red circle in the following time series chart represents an anomaly. The chart shows that 15 anomalies are detected within the specified time range.
Alert monitoring rule
You can create an alert monitoring rule based on the obtained query and analysis result. For more information, see Create an alert monitoring rule for logs. You need to take note of the following parameters:
Set the Trigger Condition parameter to the query result contains, >, 5, anomaly_prob > 0. An alert is triggered when the number of times that anomalies are detected exceeds 5 within the last 4 hours.
Set the Description field in the Add Annotation parameter to "the number of times that anomalies are detected exceeds 5" and set Alert Template to SLS built-in content template. This way, an alert notification indicates that more than 5 anomalies are detected within the last 4 hours.
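The trigger condition above amounts to counting the rows for which anomaly_prob is greater than 0 and comparing that count with 5. A minimal Python sketch of that logic follows. The rows are hypothetical stand-ins for the query result, and count_anomalies and should_alert are illustrative names only.

```python
def count_anomalies(rows):
    """Count result rows that are flagged as anomalies (anomaly_prob > 0)."""
    return sum(1 for row in rows if row["anomaly_prob"] > 0)

def should_alert(rows, threshold=5):
    """Mimic the condition: more than `threshold` rows match anomaly_prob > 0."""
    return count_anomalies(rows) > threshold

# Hypothetical anomaly probabilities for nine consecutive data points.
probs = [0, 0.9, 0, 0.7, 1.0, 0.8, 0.95, 0.6, 0]
rows = [{"anomaly_prob": p} for p in probs]

anomalies = count_anomalies(rows)
print(anomalies, should_alert(rows))
```

In this sample, six rows have a nonzero anomaly_prob, so the count exceeds the threshold of 5 and the rule would fire.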
Alert notification
After the alert monitoring rule is created, you can receive an alert notification in the specified DingTalk group when the number of times that anomalies are detected exceeds 5 within the last 4 hours. You can click View Details to view the log for which an alert is generated to identify root causes.