Simple Log Service provides the anomaly detection feature to identify anomalies in service systems and the root causes of the anomalies. The anomaly detection feature can automatically identify abnormal metric changes based on the current pattern of a metric and machine learning. You can use multivariate pattern identification functions to identify metric patterns. This topic describes the multivariate pattern identification functions.
Functions
Function | Syntax | Description | Data type of the return value |
| Identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix. | varchar | |
| Merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function. | varchar | |
normalize_vector(varchar summary, array(double) x_vector) | Normalizes a new sample vector specified by the | array(double) | |
standardize_vector(varchar summary, array(double) x_vector) | Standardizes a new sample vector specified by the | array(double) | |
mah_distance(varchar summary, array(double) x_vector) | Calculates the Mahalanobis distance for a new sample vector specified by the | double | |
standard_distance(varchar summary, double metric_value, int element_index) | Calculates the standardized distance for a metric specified by the | double | |
| Calculates the Mahalanobis distance for a new sample vector specified by the If you specify a value for the | array(double) |
summarize function
The summarize function identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix.
varchar summarize(array(array(double)) data_samples)
Or
varchar summarize(array(array(double)) data_samples, array(double) weights)
Parameter | Description |
| A two-dimensional array. The array can be used as a two-dimensional table. Each column specifies a variable. Each row specifies the variable values of a sample. |
| Optional. The sample weight. The value is a one-dimensional array in the same length as one of the one-dimensional arrays that constitute the two-dimensional array specified by the |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ) select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group
Query and analysis results
entity_group
statistical_summary
A
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }
Parameters in the query and analysis results
Parameter
Description
sampleCount
The number of samples.
vectorSize
The length of the vector.
means
The average values of all vector components.
stdDevs
The standard deviation values of all vector components.
variances
The variance values of all vector components.
mins
The minimum values of all vector components.
maxs
The maximum values of all vector components.
covariance
The covariance matrix of all vector components.
correlations
The correlation coefficient matrix of all vector components.
sums
The sum of all vector components.
weightSum
The sum of all sample weights.
sumProducts
The intermediate results that are used to merge statistical patterns.
isSummarized
Indicates whether a normal multivariate pattern is returned. Valid values:
true
false
merge_summary function
The merge_summary function merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function.
varchar merge_summary(varchar summary1, varchar summary2)
Or
varchar merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)
Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The weight of the multivariate pattern specified by the summary1 parameter. |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The weight of the multivariate pattern specified by the summary2 parameter. |
Example
Query statement
* | with data_table_01 as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features ), summaries_01 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_01 group by entity_group ), data_table_02 as ( select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries_02 as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table_02 group by entity_group ) select s1.entity_group, merge_summary(s1.statistical_summary, s2.statistical_summary) as statistical_summary from summaries_01 as s1 join summaries_02 as s2 on s1.entity_group = s2.entity_group
Query and analysis results
The
statistical_summary
parameter indicates the multivariate pattern that is obtained after the merge operation.entity_group
statistical_summary
2
{ "sampleCount": 8, "vectorSize": 4, "means": [ 11.5, 12.5, 9.25, 0.0 ], "stdDevs": [ 6.87386354243376, 6.87386354243376, 7.361215932167728, 0.0 ], "variances": [ 47.25, 47.25, 54.1875, 0.0 ], "mins": [ 1.0, 2.0, 1.0, 0.0 ], "maxs": [ 22.0, 23.0, 21.0, 0.0 ], "covariance": [ [ 47.25, 47.25, 19.125, 0.0 ], [ 47.25, 47.25, 19.125, 0.0 ], [ 19.125, 19.125, 54.1875, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "correlations": [ [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 1.0, 1.0, 0.37796447300922725, 0.0 ], [ 0.37796447300922725, 0.37796447300922725, 1.0, 0.0 ], [ 0.0, 0.0, 0.0, 1.0 ] ], "sums": [ 92.0, 100.0, 74.0, 0.0 ], "weightSum": 8.0, "sumProducts": [ [ 1436.0, 1528.0, 1004.0, 0.0 ], [ 1528.0, 1628.0, 1078.0, 0.0 ], [ 1004.0, 1078.0, 1118.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0 ] ], "isSummarized": true }
Parameters in the query and analysis results
Parameter
Description
sampleCount
The number of samples.
vectorSize
The length of the vector.
means
The average values of all vector components.
stdDevs
The standard deviation values of all vector components.
variances
The variance values of all vector components.
mins
The minimum values of all vector components.
maxs
The maximum values of all vector components.
covariance
The covariance matrix of all vector components.
correlations
The correlation coefficient matrix of all vector components.
sums
The sum of all vector components.
weightSum
The sum of all sample weights.
sumProducts
The intermediate results that are used to merge statistical patterns.
isSummarized
Indicates whether a normal multivariate pattern is returned. Valid values:
true
false
normalize_vector function
The normalize_vector function normalizes a new sample vector specified by the x_vector
parameter based on a pattern specified by the summary parameter. Make sure that each component of the vector is mapped to the [0, 1] interval. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.
array(double) normalize_vector(varchar summary, array(double) x_vector)
Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, normalize_vector(t2.statistical_summary, t1.features) as normalized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group
Query and analysis results
The
normalized_features
parameter indicates the normalization results of the sample vector specified by thex_vector
parameter.entity_id
entity_group
normalized_features
2
A
[0.14285714285714286,0.14285714285714286,0.25,0.5]
4
A
[0.42857142857142857,0.42857142857142857,0.0,0.5]
3
A
[0.2857142857142857,0.2857142857142857,0.4,0.5]
...
...
...
standardize_vector function
The standardize_vector function standardizes a new sample vector specified by the x_vector
parameter based on a pattern specified by the summary parameter. Make sure that the components of the vector have a mean of 0 and a standard deviation of 1. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.
array(double) standardize_vector(varchar summary, array(double) x_vector)
Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standardize_vector(t2.statistical_summary, t1.features) as standardized_features from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group
Query and analysis results
The
standardized_features
parameter indicates the standardization results of the sample vector specified by thex_vector
parameter.entity_id
entity_group
standardized_features
2
A
[-1.0910894511799619,-1.0910894511799619,-0.4415031470273609,0.0]
4
A
[-0.21821789023599237,-0.21821789023599237,-1.1207387578386854,0.0]
3
A
[-0.6546536707079771,-0.6546536707079771,-0.03396178054056622,0.0]
...
...
...
mah_distance function
The mah_distance function calculates the Mahalanobis distance for a new sample vector specified by the x_vector
parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. You can use the Mahalanobis distance to handle the scale differences between different variables in an effective manner. The Mahalanobis distance measures the distance between the sample vector specified by the x_vector
parameter after the vector is standardized and the centroid of the vector. If the Mahalanobis distance is 1, the distance between the sample vector and the centroid of the vector is equal to the average distance between all vectors and the centroid of the vectors.
double mah_distance(varchar summary, array(double) x_vector)
Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, mah_distance(t2.statistical_summary, t1.features) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group
Query and analysis results
The
std_distance
parameter indicates the Mahalanobis distance of the sample vector specified by thex_vector
parameter.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
standard_distance function
The standard_distance function calculates the standardized distance for a new sample vector specified by the metric_value
parameter based on a pattern sepcified by the summary
parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. summarize function The standardized distance is similar to the Mahalanobis distance. The Mahalanobis distance measures the standardized distance between a vector that consists of multiple metrics and the centroid of the vector. The standardized distance measures the distance between a metric of a vector and the centroid of the vector. The element_index
parameter specifies the index of the metric. The index starts from 0. The metric_value
parameter specifies the value of the metric.
double standard_distance(varchar summary, double metric_value, int element_index)
Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The metric. |
| The index of the metric in the multivariate pattern specified by the |
Example
Query statement
* | with data_table as ( select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features ), summaries as ( select entity_group, summarize(array_agg(features)) as statistical_summary from data_table group by entity_group ) select t1.entity_id, t1.entity_group, standard_distance(t2.statistical_summary, 30, 1) as std_distance from data_table as t1 join summaries as t2 on t1.entity_group = t2.entity_group
Query and analysis results
The
std_distance
parameter indicates the standardized distance of the metric specified by themetric_value
parameter.entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...
anomaly_level function
The anomaly_level function calculates the Mahalanobis distance for a new sample vector specified by the x_vector
parameter based on a pattern specified by the summary parameter and rounds down each distance value to obtain different levels of abnormal probabilities. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. If the return value is 0.1, the probability that the new sample vector is abnormal is 10%. This value indicates first-level anomalies. If the return value is 0.01, the probability that the new sample vector is abnormal is 1%. This value indicates second-level anomalies. If the return value is 0.001, the probability that the new sample vector is abnormal is 0.1%. This value indicates third-level anomalies. If the return value is 0.0001, the probability that the new sample vector is abnormal is 0.01%. This value indicates fourth-level anomalies.An increase in the level of anomalies indicates a decrease in the abnormal probability of the new sample vector and an increase in the suspiciability of the vector. In most cases, you can configure a threshold to filter anomalies based on the anomaly detection results. For example, you can configure a threshold to retain only fourth-level anomalies and higher.
If you specify a value for the element_index
parameter, the function returns a probability only for the metric with the specified index of the specified vector. If you do not configure the element_index parameter, the function returns a probability for each metric of the specified vector.
double anomaly_level(varchar summary, array(double) x_vector)
Or
double anomaly_level(varchar summary, array(double) x_vector, int element_index)
Parameter | Description |
| A multivariate pattern that is returned by the summarize function. For more information, see summarize function. |
| The new sample data. |
| Optional. The index. |
Example
Query statement
* | with dummy as ( select sequence(1, 1000) as seq_data, count(*) as record_count from log ), sample_data as ( select 'G1' as group_id, s.seq_num, -- Generate 1,000 two-dimensional random vectors that are distributed around the (100, 5000) range. The standard deviation values of two components are 20 and 500. inverse_normal_cdf(100, 20, random()) as x1, inverse_normal_cdf(5000, 500, rand()) as x2 from dummy, unnest(seq_data) as s(seq_num) ), data_summary as ( select group_id, summarize(array_agg(array[x1, x2])) as metric_summary from sample_data group by group_id ), new_data as ( select 'G1' as group_id, 1001 as object_id, 100.0 as x1, 5000.0 as x2 union all select 'G1' as group_id, 1002 as object_id, 118.0 as x1, 5450.0 as x2 union all select 'G1' as group_id, 1003 as object_id, 138.0 as x1, 5950.0 as x2 union all select 'G1' as group_id, 1004 as object_id, 158.0 as x1, 6450.0 as x2 union all select 'G1' as group_id, 1005 as object_id, 178.0 as x1, 6950.0 as x2 union all select 'G1' as group_id, 1006 as object_id, 198.0 as x1, 7450.0 as x2 union all select 'G1' as group_id, 1007 as object_id, 318.0 as x1, 10000.0 as x2 ) select n.group_id, json_extract(s.metric_summary, '$.means') as metric_vector_mean, json_extract(s.metric_summary, '$.covariance') as metric_covariance, n.object_id, n.x1, n.x2, anomaly_level(s.metric_summary, array[x1, x2]) as anomaly_level from data_summary as s join new_data as n on s.group_id = n.group_id order by n.group_id, n.object_id limit 100000
Query and analysis results
The
anomaly_level
parameter indicates the abnormal probability of the sample vector specified by thex_vector
parameter.group_id
object_id
anomaly_level
G1
1007
13.0
G1
1006
5.0
G1
1005
4.0
...
...
...