Simple Log Service supports associated monitoring and no-data alerts. This topic explains how to configure associated monitoring and no-data alerts.
Monitoring timeliness
Alerting and Monitoring System Principles
The alerting and monitoring system executes the configured query statement at regular intervals based on the Check Frequency. The results are used as parameters for alert conditions. If the conditions are met, an alert is triggered.
Issue Analysis
Data Indexing Latency: There is a delay between data being written to Simple Log Service and when it is available for querying. This delay can result in missed data.
For instance, suppose the Check Frequency is a fixed interval of 1 minute, the query range is the previous minute, and the alert executes at 12:03:30. The query time range is then [12:02:30, 12:03:30). Logs written at 12:03:29 may not yet be queryable when the query runs at 12:03:30, so they can be missed.
Query Inaccuracy: Logs with different timestamps within the same minute may be indexed incorrectly due to the indexing method of Simple Log Service.
For example, if logs are written at 12:02:50 with timestamps such as 12:02:20 and 12:02:50, the index may incorrectly assign these logs to the 12:02:20 timestamp, making them unqueryable in the time range [12:02:30, 12:03:30).
Optimization Suggestions
For Accuracy: Use the following settings if you require high accuracy for alerting and want to avoid repeated alerts or false negatives.
Data Indexing Latency: Set the relative Start Time and End Time of the Query Interval to slightly earlier than the current time, such as 70 seconds to 10 seconds ago (relative). This allows for a buffer to account for indexing delays.
Query Inaccuracy: Choose whole time points such as 1 minute, 5 minutes, or 1 hour for the Query Interval and set the check frequency to match, ensuring more accurate results.
For Real-Time Performance: Use the following settings if you require prompt alerting and can tolerate duplicate alerts.
Data Indexing Latency: When performing Query And Analysis, move the relative Start Time of the Query Interval to an earlier point, such as 70 seconds ago (relative).
Query Inaccuracy: Ensure the Query Interval during Query Statistics includes the previous minute, such as 90 seconds (relative), and set the check frequency to 1 minute.
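The effect of the buffered window in the accuracy suggestion can be checked with a little arithmetic. The following is a minimal sketch, not part of Simple Log Service: the helper names `query_window` and `in_window` and the calendar date are hypothetical, and the windows are treated as half-open intervals as in the example above.

```python
from datetime import datetime, timedelta

def query_window(run_at, start_ago_s, end_ago_s):
    """Return the half-open window [start, end) for a check that runs at run_at."""
    if start_ago_s <= end_ago_s:
        raise ValueError("the start offset must lie further in the past than the end offset")
    return (run_at - timedelta(seconds=start_ago_s),
            run_at - timedelta(seconds=end_ago_s))

def in_window(ts, window):
    """True if ts falls inside the half-open window."""
    start, end = window
    return start <= ts < end

# Hypothetical date; only the time of day comes from the example above.
run_at = datetime(2024, 1, 1, 12, 3, 30)
late_log = datetime(2024, 1, 1, 12, 3, 29)

naive = query_window(run_at, 60, 0)      # [12:02:30, 12:03:30)
buffered = query_window(run_at, 70, 10)  # [12:02:20, 12:03:20)
next_buffered = query_window(run_at + timedelta(minutes=1), 70, 10)

print(in_window(late_log, buffered))       # False: left for the next check
print(in_window(late_log, next_buffered))  # True: indexed for over 50 seconds by then
```

With a 1-minute check frequency, consecutive buffered windows are adjacent and do not overlap, so each log is evaluated once, after it has had at least 10 seconds to become queryable.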
Associate multiple query analysis results
The Simple Log Service alerting and monitoring system processes query and analysis results as sets and enables the monitoring of multiple sets in combination, as illustrated below.
Simple Log Service can monitor up to three combined sets.
By default, set operations utilize only the first 1,000 rows from the query and analysis results. However, if there are three query and analysis operations and the Set Operation is not configured with a No Merge option, only the first 100 rows from each operation are considered.
When three sets are involved, the system first conducts a set operation on the initial two sets, then combines those results with the third set. For instance:
Set A LEFT JOIN Set B LEFT JOIN Set C: The system first LEFT JOINs Set A with Set B, then LEFT JOINs those results with Set C.
Set A JOIN Set B INNER JOIN Set C: The system first JOINs Set A with Set B, then INNER JOINs those results with Set C.
Set A LEFT EXCLUDE JOIN Set B No Merge Set C: The system performs a LEFT EXCLUDE JOIN between Set A and Set B, while Set C is not included in the final query and analysis results.
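The left-to-right evaluation order and the row limits described above can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: `combine`, `row_limit`, and the `apply_op` callback are hypothetical names, and the sketch assumes the 100-row limit applies only when neither of the two operations is No Merge.

```python
NO_MERGE = "NO MERGE"

def row_limit(num_sets, operations):
    """Rows kept from each result set, following the limits stated above."""
    if num_sets == 3 and NO_MERGE not in operations:
        return 100
    return 1000

def combine(result_sets, operations, apply_op):
    """Apply the operations left to right: (A op1 B) op2 C."""
    limit = row_limit(len(result_sets), operations)
    trimmed = [rows[:limit] for rows in result_sets]
    combined = trimmed[0]
    for op, rows in zip(operations, trimmed[1:]):
        if op == NO_MERGE:
            continue  # the right-hand set is not merged into the results
        combined = apply_op(op, combined, rows)
    return combined

def trace(op, left, right):
    """Toy operation that records the evaluation order as a string."""
    return [f"({left[0]} {op} {right[0]})"]

print(combine([["A"], ["B"], ["C"]], ["LEFT JOIN", "LEFT JOIN"], trace))
# ['((A LEFT JOIN B) LEFT JOIN C)']
```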
The Set Operation feature offers nine configurations, detailed as follows:
Set operation | Diagram | Description |
No merge | | The two sets are not associated. Set A is used as query and analysis results. Set B is used as the reference source for alert template variables. |
Cartesian product | None | Arbitrary data from Set A combines with arbitrary data from Set B. In most cases, this set operation is used to filter data for further evaluation. |
Join | | Data in Set B is added to Set A and aligned by field. |
Inner join | | Only data that exists in Set B is retained in Set A. Set B is the whitelist of Set A. |
Left join | | Partial data from Set B is supplemented to Set A. Set B is the dimension table of Set A. |
Right join | | Partial data from Set A is supplemented to Set B. Set A is the dimension table of Set B. |
Full join | | Set A and Set B complement each other. |
Left exclude join | | Data that exists in Set B is deleted from Set A. Set B is the blacklist of Set A. |
Right exclude join | | Data that exists in Set A is deleted from Set B. Set A is the blacklist of Set B. |
No merge
Use scenario
NGINX access logs are monitored for errors. An alert is triggered and a notification is sent if the count of 5XX status code errors exceeds 500 within a 15-minute window. The notification includes the host details where the alert originated.
Configuration
Results
Results of Query Statement 0
This statement counts the 5XX status code errors occurring within a 15-minute interval.
cnt
1234
Results of Query Statement 1
This statement identifies the top 5 hosts by the number of 5XX status code errors within a 15-minute interval, along with the error count for each host.
host | pv |
host1 | 60 |
host2 | 55 |
host3 | 47 |
host4 | 45 |
host5 | 30 |
Results of the set operation
Choosing Set Operation as No Merge will yield Query Statement 0's results as the set operation outcome.
Cartesian product
Example 1
Use scenario:
Monitoring is set up for OSS and SLB access logs. Queries are made for 4XX errors in OSS and 5XX errors in SLB within a 15-minute window. An alert is triggered if the combined error count reaches 1,000.
Configuration:
Results:
Results of Query Statement 0
This statement counts the 4XX errors in OSS logs within a 15-minute span.
pv
890
Results of Query Statement 1
This statement counts the 5XX errors in SLB logs within a 15-minute span.
pv
567
Results of the set operation
Choosing Set Operation as Cartesian Product will present the following results:
$0.pv | $1.pv |
890 | 567 |
Additional examples
Results of Query Statement 0
a | b |
a1 | b1 |
a2 | b2 |
a5 | b5 |
Results of Query Statement 1
a | c |
a1 | c1 |
a3 | c3 |
Results of the set operation
Choosing Set Operation as Cartesian Product will present the following results:
$0.a | b | $1.a | c |
a1 | b1 | a1 | c1 |
a1 | b1 | a3 | c3 |
a2 | b2 | a1 | c1 |
a2 | b2 | a3 | c3 |
a5 | b5 | a1 | c1 |
a5 | b5 | a3 | c3 |
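The Cartesian product results above can be reproduced with a short sketch. It is an illustration only: rows are modeled as Python dicts, the function name `cartesian` is hypothetical, every row in a set is assumed to have the same fields, and the `$0.`/`$1.` prefixing rule for same-named fields is inferred from the example tables.

```python
from itertools import product

def cartesian(set0, set1):
    """Pair every row of set0 with every row of set1."""
    shared = set(set0[0]) & set(set1[0]) if set0 and set1 else set()

    def rename(row, idx):
        # Fields that exist in both sets are prefixed with $0. or $1.
        return {(f"${idx}.{k}" if k in shared else k): v for k, v in row.items()}

    return [{**rename(r0, 0), **rename(r1, 1)} for r0, r1 in product(set0, set1)]

set0 = [{"a": "a1", "b": "b1"}, {"a": "a2", "b": "b2"}, {"a": "a5", "b": "b5"}]
set1 = [{"a": "a1", "c": "c1"}, {"a": "a3", "c": "c3"}]
rows = cartesian(set0, set1)
print(len(rows))  # 6
print(rows[1])    # {'$0.a': 'a1', 'b': 'b1', '$1.a': 'a3', 'c': 'c3'}

# Example 1: the combined count can then be evaluated on the single merged row.
merged = cartesian([{"pv": 890}], [{"pv": 567}])[0]
print(merged["$0.pv"] + merged["$1.pv"] >= 1000)  # True
```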
Join
Example 1
Use scenario
Two Logstores in the China (Beijing) and China (Shanghai) regions store NGINX access logs. An alert is triggered if more than 10 hosts in both Logstores have over 30 errors with a 5XX status code within a 15-minute window.
Configuration
Results
Results of Query Statement 0
This statement counts the hosts with over 30 errors of 5XX status within a 15-minute period, including the error count per host.
host | pv |
host1 | 60 |
host2 | 55 |
host3 | 47 |
host4 | 45 |
host5 | 31 |
Results of Query Statement 1
This statement also counts the hosts with over 30 errors of 5XX status within a 15-minute period, including the error count per host.
host | pv |
hosta | 70 |
hostb | 45 |
hostc | 44 |
hostd | 42 |
Results of the set operation
Choosing Set Operation as Join will present the following results:
host | pv |
host1 | 60 |
host2 | 55 |
host3 | 47 |
host4 | 45 |
host5 | 31 |
hosta | 70 |
hostb | 45 |
hostc | 44 |
hostd | 42 |
Additional examples
If the fields in the two query results do not match completely, the non-matching fields are left empty (None) after the Join operation.
Results of Query Statement 0
a | b |
a1 | b1 |
a2 | b2 |
Results of Query Statement 1
b | c |
b1 | c1 |
b2 | c2 |
Results of the set operation
a | b | c |
a1 | b1 | None |
a2 | b2 | None |
None | b1 | c1 |
None | b2 | c2 |
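The field alignment in the Join result above can be sketched as follows. This is a minimal illustration, assuming rows are modeled as Python dicts; the function name `join` is hypothetical and the sketch covers only the two-set case.

```python
def join(set0, set1):
    """Append the rows of set1 to set0 and align all rows on the union of fields."""
    rows = set0 + set1
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)  # keep first-seen field order
    # A field that a row does not have is filled with None.
    return [{name: row.get(name) for name in fields} for row in rows]

set0 = [{"a": "a1", "b": "b1"}, {"a": "a2", "b": "b2"}]
set1 = [{"b": "b1", "c": "c1"}, {"b": "b2", "c": "c2"}]
for row in join(set0, set1):
    print(row)
# {'a': 'a1', 'b': 'b1', 'c': None}
# {'a': 'a2', 'b': 'b2', 'c': None}
# {'a': None, 'b': 'b1', 'c': 'c1'}
# {'a': None, 'b': 'b2', 'c': 'c2'}
```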
For three specified query statements, the system performs a set operation on the first two results. Then, it combines the outcome with the third statement's results.
Results of Query Statement 0
a | b |
a1 | b1 |
a2 | b2 |
Results of Query Statement 1
a | b |
a1 | b11 |
a2 | b22 |
a3 | b33 |
Results of the set operation on Query Statements 0 and 1
Choosing Set Operation as Inner Join, with condition $0.a == $1.a, will present the following results:
a | $0.b | $1.b |
a1 | b1 | b11 |
a2 | b2 | b22 |
Results of Query Statement 2
a | b |
a3 | b333 |
a4 | b444 |
Results of the set operation
Choosing Set Operation as Join will present the following results:
Note: Field b in Query Statement 2's results is aligned with field $0.b.
a | $0.b | $1.b |
a1 | b1 | b11 |
a2 | b2 | b22 |
a3 | b333 | None |
a4 | b444 | None |
Inner join
Example 1
Use scenario
Monitoring is set for the number of 5XX errors in specific buckets. An alert is triggered if the error count exceeds 1,000 within a 15-minute window. Resource data must be added to maintain the bucket whitelist for this requirement.
Configuration
Results
Results of Query Statement 0
This query identifies buckets with over 1,000 errors of 5XX status within a 15-minute window.
bucket | pv |
bucket_01 | 1600 |
bucket_02 | 1550 |
bucket_03 | 1470 |
bucket_04 | 1450 |
Results of Query Statement 1
The table below contains the resource data for the buckets.
bucket | desc |
bucket_03 | for dev team |
bucket_04 | for test team |
bucket_05 | for service team |
bucket_06 | for support team |
Results of the set operation
Choosing Set Operation as Inner Join, with condition $0.bucket == $1.bucket, will present the following results:
bucket | pv | desc |
bucket_03 | 1470 | for dev team |
bucket_04 | 1450 | for test team |
Example 2
Use scenario
Two Logstores in the China (Beijing) and China (Shanghai) regions store NGINX access logs. An alert is triggered if both Logstores record 5XX errors and the Beijing Logstore's error count exceeds Shanghai's within a 15-minute window.
Configuration
Results
Results of Query Statement 0
This statement counts clients with over 30 errors of 5XX status in the Beijing Logstore within a 15-minute window, including the error count per client.
client_ip | pv |
192.0.2.4 | 60 |
192.0.2.5 | 55 |
192.0.2.6 | 47 |
192.0.2.7 | 45 |
192.0.2.8 | 31 |
Results of Query Statement 1
This statement counts clients with over 30 errors of 5XX status in the Shanghai Logstore within a 15-minute window, including the error count per client.
client_ip | pv |
192.0.2.5 | 70 |
192.0.2.6 | 45 |
192.0.2.7 | 44 |
192.0.2.8 | 42 |
192.0.2.9 | 42 |
Results of the set operation
Choosing Set Operation as Inner Join, with conditions $0.client_ip == $1.client_ip and $0.pv > $1.pv, will present the following results:
client_ip | pv |
192.0.2.6 | 47 |
192.0.2.7 | 45 |
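The two-condition Inner Join in Example 2 can be checked with a small sketch. It is an illustration only: the function name `inner_join_where` is hypothetical, rows are modeled as Python dicts, and the join condition is expressed as a Python predicate rather than in the console's condition syntax.

```python
def inner_join_where(set0, set1, condition):
    """Return every (row0, row1) pair for which condition(row0, row1) holds."""
    return [(r0, r1) for r0 in set0 for r1 in set1 if condition(r0, r1)]

beijing = [
    {"client_ip": "192.0.2.4", "pv": 60},
    {"client_ip": "192.0.2.5", "pv": 55},
    {"client_ip": "192.0.2.6", "pv": 47},
    {"client_ip": "192.0.2.7", "pv": 45},
    {"client_ip": "192.0.2.8", "pv": 31},
]
shanghai = [
    {"client_ip": "192.0.2.5", "pv": 70},
    {"client_ip": "192.0.2.6", "pv": 45},
    {"client_ip": "192.0.2.7", "pv": 44},
    {"client_ip": "192.0.2.8", "pv": 42},
    {"client_ip": "192.0.2.9", "pv": 42},
]

# $0.client_ip == $1.client_ip and $0.pv > $1.pv
pairs = inner_join_where(
    beijing, shanghai,
    lambda r0, r1: r0["client_ip"] == r1["client_ip"] and r0["pv"] > r1["pv"],
)
matched = [(r0["client_ip"], r0["pv"]) for r0, _ in pairs]
print(matched)  # [('192.0.2.6', 47), ('192.0.2.7', 45)]
```

Only the clients present in both result sets whose Beijing count is higher than their Shanghai count survive, which matches the table above.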
Additional examples
When fields in Query Statement 0 and Query Statement 1 have the same name but are not associated, they are automatically prefixed with $0 and $1 in the specified set operation results.
Results of Query Statement 0
a | b | c | d |
a1 | b1 | c1 | d1 |
a2 | b2 | c2 | d2 |
a3 | b3 | c3 | d3 |
Results of Query Statement 1
a | b | c |
a1 | b11 | c11 |
a2 | b22 | c22 |
Results of the set operation
Choosing Set Operation as Inner Join, with condition $0.a == $1.a, will present the following results:
a | $0.b | $0.c | d | $1.b | $1.c |
a1 | b1 | c1 | d1 | b11 | c11 |
a2 | b2 | c2 | d2 | b22 | c22 |
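The prefixing rule for same-named fields can be sketched as follows. This is a minimal illustration, assuming a single equality condition on one key field and rows modeled as Python dicts with identical fields within a set; the function name `inner_join` is hypothetical.

```python
def inner_join(set0, set1, key):
    """Keep only row pairs whose key values are equal ($0.key == $1.key)."""
    shared = (set(set0[0]) & set(set1[0])) - {key} if set0 and set1 else set()

    def rename(row, idx):
        # Same-named fields that are not the join key get a $0. or $1. prefix.
        return {(f"${idx}.{k}" if k in shared else k): v
                for k, v in row.items() if k != key}

    out = []
    for r0 in set0:
        for r1 in set1:
            if r0[key] == r1[key]:
                out.append({key: r0[key], **rename(r0, 0), **rename(r1, 1)})
    return out

set0 = [
    {"a": "a1", "b": "b1", "c": "c1", "d": "d1"},
    {"a": "a2", "b": "b2", "c": "c2", "d": "d2"},
    {"a": "a3", "b": "b3", "c": "c3", "d": "d3"},
]
set1 = [
    {"a": "a1", "b": "b11", "c": "c11"},
    {"a": "a2", "b": "b22", "c": "c22"},
]
result = inner_join(set0, set1, "a")
print(list(result[0]))  # ['a', '$0.b', '$0.c', 'd', '$1.b', '$1.c']
print(len(result))      # 2
```

When the two sets share no field other than the key, as in the bucket whitelist example, no prefix is added and the fields of the second set are simply appended.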
Left join
Results of Query Statement 0
a | b |
a1 | b1 |
a2 | b2 |
a3 | b3 |
Results of Query Statement 1
a | b | c |
a1 | b11 | c1 |
a2 | b22 | c2 |
Results of the set operation
Choosing Set Operation as Left Join, with condition $0.a == $1.a, will present the following results:
a | $0.b | $1.b | c |
a1 | b1 | b11 | c1 |
a2 | b2 | b22 | c2 |
a3 | b3 | None | None |
Right join
Results of Query Statement 0
a | b | c |
a1 | b11 | c1 |
a2 | b22 | c2 |
Results of Query Statement 1
a | b |
a1 | b1 |
a2 | b2 |
a3 | b3 |
Results of the set operation
Choosing Set Operation as Right Join, with condition $0.a == $1.a, will present the following results:
a | $0.b | c | $1.b |
a1 | b11 | c1 | b1 |
a2 | b22 | c2 | b2 |
a3 | None | None | b3 |
Full join
Results of Query Statement 0
a | b | c |
a1 | b1 | c1 |
a2 | b2 | c2 |
a5 | b5 | c3 |
Results of Query Statement 1
a | b | d |
a1 | b11 | d1 |
a2 | b22 | d2 |
a3 | b33 | d3 |
Results of the set operation
Choosing Set Operation as Full Join, with condition $0.a == $1.a, will present the following results:
a | $0.b | c | $1.b | d |
a1 | b1 | c1 | b11 | d1 |
a2 | b2 | c2 | b22 | d2 |
a5 | b5 | c3 | None | None |
a3 | None | None | b33 | d3 |
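The Full Join result above can be reproduced with the following sketch, which pads the side that has no match with None. It is an illustration under assumptions: the function name `full_join` is hypothetical, rows are Python dicts with identical fields within a set, and the condition is a single equality on one key field. Keeping only the rows built from Set 0 gives the Left Join behavior, and keeping only the rows built from Set 1 gives the Right Join behavior.

```python
def full_join(set0, set1, key):
    """Join on $0.key == $1.key and keep unmatched rows from both sides."""
    fields0 = [f for f in (set0[0] if set0 else {}) if f != key]
    fields1 = [f for f in (set1[0] if set1 else {}) if f != key]
    shared = set(fields0) & set(fields1)

    def build(row0, row1):
        # Either side may be None when it has no matching row.
        row = {key: (row0 or row1)[key]}
        for f in fields0:
            row[f"$0.{f}" if f in shared else f] = row0[f] if row0 else None
        for f in fields1:
            row[f"$1.{f}" if f in shared else f] = row1[f] if row1 else None
        return row

    out, matched1 = [], set()
    for row0 in set0:
        hits = [i for i, row1 in enumerate(set1) if row1[key] == row0[key]]
        matched1.update(hits)
        if hits:
            out.extend(build(row0, set1[i]) for i in hits)
        else:
            out.append(build(row0, None))
    # Rows of set1 that matched nothing are appended last.
    out.extend(build(None, row1) for i, row1 in enumerate(set1) if i not in matched1)
    return out

set0 = [{"a": "a1", "b": "b1", "c": "c1"}, {"a": "a2", "b": "b2", "c": "c2"},
        {"a": "a5", "b": "b5", "c": "c3"}]
set1 = [{"a": "a1", "b": "b11", "d": "d1"}, {"a": "a2", "b": "b22", "d": "d2"},
        {"a": "a3", "b": "b33", "d": "d3"}]
result = full_join(set0, set1, "a")
print(result[2])  # {'a': 'a5', '$0.b': 'b5', 'c': 'c3', '$1.b': None, 'd': None}
print(result[3])  # {'a': 'a3', '$0.b': None, 'c': None, '$1.b': 'b33', 'd': 'd3'}
```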
Left exclude join
Use scenario
Monitoring is set for the number of 5XX errors in unspecified buckets. An alert is triggered if the error count exceeds 1,000 within a 15-minute window. Resource data must be added to maintain the bucket blacklist for this requirement.
Configuration
Results
Results of Query Statement 0
This query identifies buckets with over 1,000 errors of 5XX status within a 15-minute window.
bucket | pv |
bucket_01 | 60 |
bucket_02 | 55 |
bucket_03 | 47 |
bucket_04 | 45 |
Results of Query Statement 1
The table below contains the resource data for the buckets.
bucket | desc |
bucket_03 | for dev team |
bucket_04 | for test team |
Results of the set operation
Choosing Set Operation as Left Exclude Join, with condition $0.bucket == $1.bucket, will present the following results:
bucket | pv |
bucket_01 | 60 |
bucket_02 | 55 |
Right exclude join
Use scenario
Monitoring is set for the number of 5XX errors in unspecified buckets. An alert is triggered if the error count exceeds 1,000 within a 15-minute window. Resource data must be added to maintain the bucket blacklist for this requirement.
Configuration
Results
Results of Query Statement 0
The table below contains the resource data for the buckets.
bucket | desc |
bucket_03 | for dev team |
bucket_04 | for test team |
Results of Query Statement 1
This query identifies buckets with over 1,000 errors of 5XX status within a 15-minute window.
bucket | pv |
bucket_01 | 60 |
bucket_02 | 55 |
bucket_03 | 47 |
bucket_04 | 45 |
Results of the set operation
Choosing Set Operation as Right Exclude Join, with condition $0.bucket == $1.bucket, will present the following results:
bucket | pv |
bucket_01 | 60 |
bucket_02 | 55 |
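Both exclude joins amount to filtering one set by the keys found in the other. The following is a minimal sketch of that idea, assuming a single equality condition on one key field and rows modeled as Python dicts; the function names are hypothetical.

```python
def left_exclude_join(set0, set1, key):
    """Keep the rows of set0 whose key does not appear in set1 (set1 is the blacklist)."""
    excluded = {row[key] for row in set1}
    return [row for row in set0 if row[key] not in excluded]

def right_exclude_join(set0, set1, key):
    """Mirror image: keep the rows of set1 whose key does not appear in set0."""
    return left_exclude_join(set1, set0, key)

errors = [{"bucket": "bucket_01", "pv": 60}, {"bucket": "bucket_02", "pv": 55},
          {"bucket": "bucket_03", "pv": 47}, {"bucket": "bucket_04", "pv": 45}]
blacklist = [{"bucket": "bucket_03", "desc": "for dev team"},
             {"bucket": "bucket_04", "desc": "for test team"}]

print(left_exclude_join(errors, blacklist, "bucket"))
# [{'bucket': 'bucket_01', 'pv': 60}, {'bucket': 'bucket_02', 'pv': 55}]
print(right_exclude_join(blacklist, errors, "bucket"))
# [{'bucket': 'bucket_01', 'pv': 60}, {'bucket': 'bucket_02', 'pv': 55}]
```

The two calls return the same rows because the only difference between the examples above is which query statement holds the blacklist.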
No data alert
If data loss occurs during collection, Simple Log Service may not receive data, which can go unnoticed. To address this, the no-data alert feature sends notifications when data is missing. For instance, you can set up an alert rule to monitor CPU metrics for each host. An alert is triggered and a notification is sent if:
The CPU utilization of a host exceeds 95%.
No data is returned for the query and analysis operation.
Configure the alert rule as follows:
Query Statistics: Query the CPU utilization. For example:
* | select promql_query_range('cpu_util') from metrics limit 1000
Trigger Condition: Data Matches, value>95, Severity: Medium
A medium-level alert is triggered if the value in the results exceeds 95.
Threshold Of Continuous Triggers: An alert is generated when the number of consecutive triggers reaches this threshold.
No Data Alert: Enable the No Data Alert feature and set the severity and annotations.
When No Data Alert is enabled, an alert is triggered if the number of consecutive checks that return no data exceeds the threshold specified in the Threshold Of Continuous Triggers parameter. Specify a separate severity and annotations for no-data alerts.
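The interaction between the trigger condition, the consecutive-trigger threshold, and the no-data alert can be sketched as a small state machine. This is an illustration, not the service's implementation: the class name `AlertState` and the returned labels are hypothetical, and the sketch assumes an alert fires as soon as the consecutive count reaches the threshold, a boundary the text above does not state consistently.

```python
class AlertState:
    """Tracks consecutive condition hits and consecutive empty results."""

    def __init__(self, threshold, limit=95):
        self.threshold = threshold  # Threshold Of Continuous Triggers
        self.limit = limit          # trigger condition: value > 95
        self.consecutive_hits = 0
        self.consecutive_no_data = 0

    def check(self, values):
        """Evaluate one check; return an alert label or None."""
        if not values:
            self.consecutive_hits = 0
            self.consecutive_no_data += 1
            # Assumption: fire when the count reaches the threshold.
            return "no_data" if self.consecutive_no_data >= self.threshold else None
        self.consecutive_no_data = 0
        if any(v > self.limit for v in values):
            self.consecutive_hits += 1
            return "medium" if self.consecutive_hits >= self.threshold else None
        self.consecutive_hits = 0
        return None

state = AlertState(threshold=2)
checks = [[97.0], [96.0], [], [], [50.0]]
print([state.check(values) for values in checks])
# [None, 'medium', None, 'no_data', None]
```

A result that contains data resets the no-data counter, and an empty result resets the condition counter, so the two kinds of alert are counted independently.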
The configuration is shown in the figure below.