This topic describes how to optimize the parameters of decomposition-based time series anomaly detection algorithms, including the ostl-esd, istl-esd, and istl-nsigma algorithms.
Background information
Decomposition-based time series anomaly detection algorithms are applicable to periodic data, such as queries per second (QPS) data. For example, you can use these algorithms to analyze time series data that has peaks or valleys at fixed daily or weekly intervals.
When you use decomposition-based time series anomaly detection algorithms to analyze raw data, data points are decomposed into the following three terms: trend, season, and residual. Then, the esd algorithm is used to detect anomalies in the residual terms decomposed from the raw data. The esd algorithm calculates an anomaly score for the residual term of each data point and compares the score with the threshold that is determined by the esd.alpha input parameter. If the anomaly score of a data point is larger than the threshold, the data point is abnormal. Otherwise, the data point is normal.
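The decompose-then-score flow described above can be illustrated with a minimal Python sketch. It is not the ostl-esd or istl-esd implementation: it assumes a naive additive model (constant trend, per-phase seasonal means) and a plain standardized residual as the score. The function names `decompose` and `flag_anomalies` are hypothetical.

```python
from statistics import mean, pstdev

def decompose(values, period):
    """Split a series into trend, seasonal, and residual terms (naive additive model)."""
    trend = mean(values)  # assumption: a constant trend equal to the overall mean
    detrended = [v - trend for v in values]
    # Seasonal term: average of the detrended values that share a phase in the cycle.
    phase_means = [mean(detrended[p::period]) for p in range(period)]
    seasonal = [phase_means[i % period] for i in range(len(values))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual

def flag_anomalies(values, period, threshold):
    """Return the indexes whose standardized residual exceeds the threshold."""
    _, _, residual = decompose(values, period)
    spread = pstdev(residual) or 1.0  # avoid division by zero on a perfectly periodic series
    scores = [abs(r) / spread for r in residual]
    return [i for i, score in enumerate(scores) if score > threshold]

# Hypothetical data: a cycle of 4 points repeated 6 times, with one injected spike.
series = [10, 20, 30, 20] * 6
series[13] += 50
print(flag_anomalies(series, period=4, threshold=3.0))
```

Only the spike stands out in the residual term, because the repeating pattern is absorbed by the seasonal term before scoring.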
Scenarios
The number of detected abnormal data points is too large or too small
Optimization: Set the esd.alpha parameter to a larger value to increase the sensitivity of the algorithm. In this case, more abnormal data points are detected. If you set the esd.alpha parameter to a smaller value, fewer abnormal data points are detected. The following statements provide examples of how to adjust the sensitivity of the ostl-esd, istl-esd, and istl-nsigma algorithms:
// Optimize the parameters of the ostl-esd algorithm.
SELECT xx, anomaly_detect(mean_duration, 'ostl-esd', 'periods[0]=24, esd.alpha=0.2') as res FROM xxx SAMPLE BY 0
// Optimize the parameters of the istl-esd algorithm.
SELECT xx, anomaly_detect(mean_duration, 'istl-esd', 'frequency=1h, periods[0]=24h, esd.alpha=0.2') as res FROM xxx SAMPLE BY 0
// Optimize the parameters of the istl-nsigma algorithm.
SELECT xx, anomaly_detect(mean_duration, 'istl-nsigma', 'periods[0]=24, nsigma.n=2') as res FROM xxx SAMPLE BY 0
Parameters:
esd.alpha: The sensitivity of the anomaly detection algorithm. Valid values: (0,1).
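To see why a larger esd.alpha flags more points, the following Python sketch maps an alpha value to a threshold and counts how many scores exceed it. The mapping is an assumption made for illustration only (a one-sided standard-normal critical value); the threshold rule that the service actually uses is not described in this topic. The scores and the helper names `threshold_for` and `count_flagged` are hypothetical.

```python
from statistics import NormalDist

def threshold_for(alpha):
    """Assumed stand-in rule: one-sided standard-normal critical value for alpha."""
    return NormalDist().inv_cdf(1.0 - alpha)

def count_flagged(scores, alpha):
    """Count the scores that exceed the threshold derived from alpha."""
    limit = threshold_for(alpha)
    return sum(1 for score in scores if score > limit)

# Hypothetical anomaly scores for six data points.
scores = [0.0, 0.69, 3.01, 1.71, 1.25, 0.0]

for alpha in (0.01, 0.05, 0.2):
    print(alpha, round(threshold_for(alpha), 3), count_flagged(scores, alpha))
```

Under this assumed rule, the threshold falls as alpha grows, so the number of flagged points can only stay the same or increase.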
Precisely adjust the esd.alpha parameter
You can precisely adjust the esd.alpha parameter to obtain more accurate detection results.
Optimization: Add the verbose=true and esd.verbose=true options (nsigma.verbose=true for the istl-nsigma algorithm) to the anomaly detection statement to enable the verbose mode. The following statements provide examples of how to enable the verbose mode, followed by a sample result:
// Optimize the parameters of the ostl-esd algorithm.
SELECT xx, anomaly_detect(mean_duration, 'ostl-esd', 'periods[0]=24, verbose=true, esd.verbose=true') as res FROM xxx SAMPLE BY 0
// Optimize the parameters of the istl-esd algorithm.
SELECT xx, anomaly_detect(mean_duration, 'istl-esd', 'frequency=1h, periods[0]=24h, verbose=true, esd.verbose=true') as res FROM xxx SAMPLE BY 0
// Optimize the parameters of the istl-nsigma algorithm.
SELECT xx, anomaly_detect(mean_duration, 'istl-nsigma', 'periods[0]=24, verbose=true, nsigma.verbose=true') as res FROM xxx SAMPLE BY 0
+----------------------------+---------------+-------------+---------------------+---------------------+-----------------------+
| time | mean_duration | ... | res$anomalyScore | res$threshold | res$detectedDirection |
+----------------------------+---------------+-------------+---------------------+---------------------+-----------------------+
| 2022-04-11T13:00:00+08:00 | 0 | ... | 0 | 1.6447834844273468 | NONE |
| 2022-04-11T14:00:00+08:00 | 0 | ... | 0 | 1.6447834844273468 | NONE |
| 2022-04-11T15:00:00+08:00 | 0 | ... | 0 | 1.6447834844273468 | NONE |
| 2022-04-11T16:00:00+08:00 | 0 | ... | 0 | 1.6447834844273468 | NONE |
| 2022-04-11T17:00:00+08:00 | 3136.3 | ... | 0.6917962785972575 | 1.6447834844273468 | NONE |
|* 2022-04-11T18:00:00+08:00 | 13622.6 | ... | 3.0136653345953954 | 1.6447834844273468 | UP |
|* 2022-04-11T19:00:00+08:00 | 8651.6 | ... | 1.7122438285577357 | 1.6447834844273468 | UP |
| 2022-04-11T20:00:00+08:00 | 6735.46 | ... | 1.252994967798293 | 1.6447834844273468 | NONE |
| 2022-04-11T21:00:00+08:00 | 1496.683 | ... | 0 | 1.6447834844273468 | NONE |
| 2022-04-11T22:00:00+08:00 | 1691.3175 | ... | 0 | 1.6447834844273468 | NONE |
+----------------------------+---------------+-------------+---------------------+---------------------+-----------------------+
According to the result, the data points whose mean_duration values are 13622.6 and 8651.6 are determined to be abnormal. These rows are marked with an asterisk (*). The res$anomalyScore and res$threshold columns show that the anomaly scores of both data points exceed the threshold.
If the data point whose mean_duration value is 8651.6 should not be treated as abnormal in your business, you can fine-tune the sensitivity of the algorithm by adjusting the esd.alpha parameter. For more information about how to adjust the esd.alpha parameter, see Step 2 in the "Adjust the detection results of specific data points" section of Optimize the parameters of statistical time series anomaly detection algorithms.
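The verbose output already shows how far the threshold needs to move. The following Python sketch uses the scores and threshold from the sample result and relies only on the rule stated earlier: a data point is abnormal when its score is larger than the threshold. It does not compute the esd.alpha value that produces a given threshold, because that mapping is not described in this topic. The helper name `threshold_window` is hypothetical.

```python
def threshold_window(rows, target, threshold):
    """Return (low, high) such that any threshold t with low <= t < high
    un-flags `target` while every higher-scoring alert stays flagged."""
    scores = dict(rows)
    flagged = {ts: s for ts, s in scores.items() if s > threshold}
    if target not in flagged:
        raise ValueError(f"{target} is not flagged at threshold {threshold}")
    low = flagged[target]
    higher = [s for s in flagged.values() if s > low]
    high = min(higher) if higher else float("inf")
    return low, high

# (time, res$anomalyScore) pairs copied from the sample result.
rows = [
    ("2022-04-11T17:00:00+08:00", 0.6917962785972575),
    ("2022-04-11T18:00:00+08:00", 3.0136653345953954),
    ("2022-04-11T19:00:00+08:00", 1.7122438285577357),
    ("2022-04-11T20:00:00+08:00", 1.252994967798293),
]
low, high = threshold_window(rows, "2022-04-11T19:00:00+08:00", 1.6447834844273468)
print(f"target threshold range: [{low:.4f}, {high:.4f})")
```

Any esd.alpha value that moves res$threshold into this range keeps the 18:00 alert and drops the 19:00 alert. Rerun the verbose query after each change to check the new threshold.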
Normal data within cycles is determined abnormal
Data may vary with cycles. Some data points that slightly change across cycles may be determined abnormal.
- For the ostl-esd algorithm:
- Set esd.alpha to a smaller value to decrease the sensitivity of the algorithm. The following statement provides an example of how to set the esd.alpha parameter:
SELECT xx, anomaly_detect(mean_duration, 'ostl-esd', 'periods[0]=24, esd.alpha=0.2') as res FROM xxx SAMPLE BY 0
- (Optional) Set alphaStl to a smaller value. The following statement provides an example:
SELECT xx, anomaly_detect(mean_duration, 'ostl-esd', 'periods[0]=24, alphaStl=0.25') as res FROM xxx SAMPLE BY 0
Note If the expected result is returned after you adjust the esd.alpha parameter, you can skip this step.
Parameters:
alphaStl: The degree to which the algorithm relies on data in past cycles for anomaly detection. Valid values: (0,1). If you set alphaStl to a larger value, the algorithm relies more heavily on data in past cycles. By default, the value of alphaStl is 0.35 in the ostl-esd algorithm.
- For the istl-esd algorithm: The istl-esd algorithm automatically adjusts the degree to which data in past cycles is used in anomaly detection. You only need to set esd.alpha to a smaller value to decrease the sensitivity of the algorithm. For more information about how to configure the esd.alpha parameter, see the steps for the ostl-esd algorithm.
Unexpected anomalies are returned when the cycle length significantly changes
Optimization
- For the ostl-esd algorithm: Reset the model and specify the new cycle length, such as 24. Then, restrict detection to the data points at and after the point at which the cycle length changes, as the WHERE time >= condition does in the following example. The following statement provides an example of how to optimize the parameters of the ostl-esd algorithm:
SELECT xx, anomaly_detect(mean_duration, 'ostl-esd', 'periods[0]=24, reset_state=true') as res FROM xxx WHERE time >= xxxx SAMPLE BY 0
Important After you reset the model, data points in the first four cycles are used to warm up the model. Data points in these cycles are not returned as anomalies.
- For the istl-esd algorithm: Reset the model and then specify that the algorithm detects anomalies from the data point from which the cycle length changes. The istl-esd algorithm can automatically identify the cycle length of data. You can also manually specify the cycle length of the data. The following statement provides an example on how to manually set the cycle length to 12 hours:
SELECT xx, anomaly_detect(mean_duration, 'istl-esd', 'frequency=1h, periods[0]=12h, reset_state=true') as res FROM xxx WHERE time >= xxxx SAMPLE BY 1h
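Because the first four cycles after a reset only warm up the model, it can help to estimate when anomalies can be reported again. The following Python sketch does this arithmetic for the ostl-esd example with periods[0]=24. The hourly sampling interval and the reset time are assumptions made for illustration, and the helper name `first_reportable_time` is hypothetical.

```python
from datetime import datetime, timedelta

WARMUP_CYCLES = 4  # from the Important note: the first four cycles only warm up the model

def first_reportable_time(reset_time, points_per_cycle, sample_interval):
    """Return the earliest timestamp that falls after the warm-up cycles."""
    warmup_points = WARMUP_CYCLES * points_per_cycle
    return reset_time + warmup_points * sample_interval

# Assumptions: the model is reset at this time and the data arrives once per hour.
reset_time = datetime(2022, 4, 11, 0, 0)
ready_at = first_reportable_time(reset_time, points_per_cycle=24, sample_interval=timedelta(hours=1))
print(ready_at.isoformat())
```

With a 24-point cycle at one point per hour, the warm-up covers 96 data points, so anomalies in the first four days after the reset are not returned.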