Multivariate pattern identification functions

Simple Log Service provides the anomaly detection feature to identify anomalies in service systems and the root causes of the anomalies. The anomaly detection feature can automatically identify abnormal metric changes based on the current pattern of a metric and machine learning. You can use multivariate pattern identification functions to identify metric patterns. This topic describes the multivariate pattern identification functions.

Functions

Function	Syntax	Description	Data type of the return value

Function	Syntax	Description	Data type of the return value
summarize function	summarize(array(array(double)) data_samples) summarize(array(array(double)) data_samples, array(double) weights)	Identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix.	varchar
merge_summary function	merge_summary(varchar summary1, varchar summary2) merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)	Merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function.	varchar
normalize_vector function	normalize_vector(varchar summary, array(double) x_vector)	Normalizes a new sample vector specified by the `x_vector` parameter based on a pattern specified by the summary parameter. Make sure that each component of the vector is mapped to the [0, 1] interval. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.	array(double)
standardize_vector function	standardize_vector(varchar summary, array(double) x_vector)	Standardizes a new sample vector specified by the `x_vector` parameter based on a pattern specified by the summary parameter. Make sure that the components of the vector have a mean of 0 and a standard deviation of 1. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.	array(double)
mah_distance function	mah_distance(varchar summary, array(double) x_vector)	Calculates the Mahalanobis distance for a new sample vector specified by the `x_vector` parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. You can use the Mahalanobis distance to handle the scale differences between different variables in an effective manner. The Mahalanobis distance measures the distance between the sample vector specified by the `x_vector` parameter after the vector is standardized and the centroid of the vector. If the Mahalanobis distance is 1, the distance between the sample vector and the centroid of the vector is equal to the average distance between all vectors and the centroid of the vectors.	double
standard_distance function	standard_distance(varchar summary, double metric_value, int element_index)	Calculates the standardized distance for a metric specified by the `metric_value` parameter based on a pattern specified by the `summary` parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. The standardized distance is similar to the Mahalanobis distance. The Mahalanobis distance measures the standardized distance between a vector that consists of multiple metrics and the centroid of the vector. The standardized distance measures the distance between a metric of a vector and the centroid of the vector. The `element_index` parameter specifies the index of the metric. The `metric_value` parameter specifies the value of the metric.	double
anomaly_level function	anomaly_level(varchar summary, array(double) x_vector) anomaly_level(varchar summary, array(double) x_vector, int element_index)	Calculates the Mahalanobis distance for a new sample vector specified by the `x_vector` parameter based on a pattern specified by the summary parameter and rounds down each distance value to obtain different levels of abnormal probabilities. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. If the return value is 0.1, the probability that the new sample vector is abnormal is 10%. This value indicates first-level anomalies. If the return value is 0.01, the probability that the new sample vector is abnormal is 1%. This value indicates second-level anomalies. If the return value is 0.001, the probability that the new sample vector is abnormal is 0.1%. This value indicates third-level anomalies. If the return value is 0.0001, the probability that the new sample vector is abnormal is 0.01%. This value indicates fourth-level anomalies.An increase in the level of anomalies indicates a decrease in the abnormal probability of the new sample vector and an increase in the suspiciability of the vector. In most cases, you can configure a threshold to filter anomalies based on the anomaly detection results. For example, you can configure a threshold to retain only fourth-level anomalies and higher. If you specify a value for the `element_index` parameter, the function returns a probability only for the metric with the specified index of the specified vector. If you do not configure the element_index parameter, the function returns a probability for each metric of the specified vector.	array(double)

summarize function

The summarize function identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix.

varchar summarize(array(array(double)) data_samples)

varchar summarize(array(array(double)) data_samples, array(double) weights)

Parameter	Description

Parameter	Description
`data_samples`	A two-dimensional array. The array can be used as a two-dimensional table. Each column specifies a variable. Each row specifies the variable values of a sample.
`weights`	Optional. The sample weight. The value is a one-dimensional array in the same length as one of the one-dimensional arrays that constitute the two-dimensional array specified by the `data_samples` parameter. If you do not specify a value for this parameter, the same weight is used for all samples.

Example

Query statement

* | with data_table as
(
    select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
    select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
    select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
    select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
    select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
    select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
    select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
    select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
)
    select entity_group,
        summarize(array_agg(features)) as statistical_summary
    from data_table
    group by entity_group

Query and analysis results

entity_group

statistical_summary

{
  "sampleCount": 8,
  "vectorSize": 4,
  "means": [
    11.5,
    12.5,
    9.25,
    0.0
  ],
  "stdDevs": [
    6.87386354243376,
    6.87386354243376,
    7.361215932167728,
    0.0
  ],
  "variances": [
    47.25,
    47.25,
    54.1875,
    0.0
  ],
  "mins": [
    1.0,
    2.0,
    1.0,
    0.0
  ],
  "maxs": [
    22.0,
    23.0,
    21.0,
    0.0
  ],
  "covariance": [
    [
      47.25,
      47.25,
      19.125,
      0.0
    ],
    [
      47.25,
      47.25,
      19.125,
      0.0
    ],
    [
      19.125,
      19.125,
      54.1875,
      0.0
    ],
    [
      0.0,
      0.0,
      0.0,
      0.0
    ]
  ],
  "correlations": [
    [
      1.0,
      1.0,
      0.37796447300922725,
      0.0
    ],
    [
      1.0,
      1.0,
      0.37796447300922725,
      0.0
    ],
    [
      0.37796447300922725,
      0.37796447300922725,
      1.0,
      0.0
    ],
    [
      0.0,
      0.0,
      0.0,
      1.0
    ]
  ],
  "sums": [
    92.0,
    100.0,
    74.0,
    0.0
  ],
  "weightSum": 8.0,
  "sumProducts": [
    [
      1436.0,
      1528.0,
      1004.0,
      0.0
    ],
    [
      1528.0,
      1628.0,
      1078.0,
      0.0
    ],
    [
      1004.0,
      1078.0,
      1118.0,
      0.0
    ],
    [
      0.0,
      0.0,
      0.0,
      0.0
    ]
  ],
  "isSummarized": true
}

Parameters in the query and analysis results

Parameter	Description

Parameter	Description
`sampleCount`	The number of samples.
`vectorSize`	The length of the vector.
`means`	The average values of all vector components.
`stdDevs`	The standard deviation values of all vector components.
`variances`	The variance values of all vector components.
`mins`	The minimum values of all vector components.
`maxs`	The maximum values of all vector components.
`covariance`	The covariance matrix of all vector components.
`correlations`	The correlation coefficient matrix of all vector components.
`sums`	The sum of all vector components.
`weightSum`	The sum of all sample weights.
`sumProducts`	The intermediate results that are used to merge statistical patterns.
`isSummarized`	Indicates whether a normal multivariate pattern is returned. Valid values: true false

merge_summary function

The merge_summary function merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function.

varchar merge_summary(varchar summary1, varchar summary2)

varchar merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)

Parameter	Description

Parameter	Description
`summary1`	A multivariate pattern that is returned by the summarize function. For more information, see summarize function.
`weight1`	The weight of the multivariate pattern specified by the summary1 parameter.
`summary2`	A multivariate pattern that is returned by the summarize function. For more information, see summarize function.
`weight2`	The weight of the multivariate pattern specified by the summary2 parameter.

Example

Query statement

* | with data_table_01 as
(
    select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
    select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
    select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
    select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features
),

summaries_01 as
(
    select entity_group,
        summarize(array_agg(features)) as statistical_summary
    from data_table_01
    group by entity_group
),

data_table_02 as
(
    select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
    select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
    select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
    select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
),

summaries_02 as
(
    select entity_group,
        summarize(array_agg(features)) as statistical_summary
    from data_table_02
    group by entity_group
)
select s1.entity_group,
    merge_summary(s1.statistical_summary, s2.statistical_summary) as statistical_summary
from summaries_01 as s1
    join summaries_02 as s2
    on s1.entity_group = s2.entity_group

Query and analysis results

The statistical_summary parameter indicates the multivariate pattern that is obtained after the merge operation.

entity_group	statistical_summary

entity_group

statistical_summary

{
  "sampleCount": 8,
  "vectorSize": 4,
  "means": [
    11.5,
    12.5,
    9.25,
    0.0
  ],
  "stdDevs": [
    6.87386354243376,
    6.87386354243376,
    7.361215932167728,
    0.0
  ],
  "variances": [
    47.25,
    47.25,
    54.1875,
    0.0
  ],
  "mins": [
    1.0,
    2.0,
    1.0,
    0.0
  ],
  "maxs": [
    22.0,
    23.0,
    21.0,
    0.0
  ],
  "covariance": [
    [
      47.25,
      47.25,
      19.125,
      0.0
    ],
    [
      47.25,
      47.25,
      19.125,
      0.0
    ],
    [
      19.125,
      19.125,
      54.1875,
      0.0
    ],
    [
      0.0,
      0.0,
      0.0,
      0.0
    ]
  ],
  "correlations": [
    [
      1.0,
      1.0,
      0.37796447300922725,
      0.0
    ],
    [
      1.0,
      1.0,
      0.37796447300922725,
      0.0
    ],
    [
      0.37796447300922725,
      0.37796447300922725,
      1.0,
      0.0
    ],
    [
      0.0,
      0.0,
      0.0,
      1.0
    ]
  ],
  "sums": [
    92.0,
    100.0,
    74.0,
    0.0
  ],
  "weightSum": 8.0,
  "sumProducts": [
    [
      1436.0,
      1528.0,
      1004.0,
      0.0
    ],
    [
      1528.0,
      1628.0,
      1078.0,
      0.0
    ],
    [
      1004.0,
      1078.0,
      1118.0,
      0.0
    ],
    [
      0.0,
      0.0,
      0.0,
      0.0
    ]
  ],
  "isSummarized": true
}

Parameters in the query and analysis results

Parameter	Description

Parameter	Description
`sampleCount`	The number of samples.
`vectorSize`	The length of the vector.
`means`	The average values of all vector components.
`stdDevs`	The standard deviation values of all vector components.
`variances`	The variance values of all vector components.
`mins`	The minimum values of all vector components.
`maxs`	The maximum values of all vector components.
`covariance`	The covariance matrix of all vector components.
`correlations`	The correlation coefficient matrix of all vector components.
`sums`	The sum of all vector components.
`weightSum`	The sum of all sample weights.
`sumProducts`	The intermediate results that are used to merge statistical patterns.
`isSummarized`	Indicates whether a normal multivariate pattern is returned. Valid values: true false

normalize_vector function

The normalize_vector function normalizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that each component of the vector is mapped to the [0, 1] interval. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.

array(double) normalize_vector(varchar summary, array(double) x_vector)

Parameter	Description

Parameter	Description
`summary`	A multivariate pattern that is returned by the summarize function. For more information, see summarize function.
`x_vector`	The new sample data.

Example

Query statement

* | with data_table as
(
    select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
    select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
    select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
    select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
    select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
    select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
    select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
    select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
),

summaries as
(
    select entity_group,
        summarize(array_agg(features)) as statistical_summary
    from data_table
    group by entity_group
)

select t1.entity_id,
    t1.entity_group,
    normalize_vector(t2.statistical_summary, t1.features) as normalized_features
from data_table as t1
    join summaries as t2
    on t1.entity_group = t2.entity_group

Query and analysis results

The normalized_features parameter indicates the normalization results of the sample vector specified by the x_vector parameter.

entity_id	entity_group	normalized_features

entity_id	entity_group	normalized_features
2	A	[0.14285714285714286,0.14285714285714286,0.25,0.5]
4	A	[0.42857142857142857,0.42857142857142857,0.0,0.5]
3	A	[0.2857142857142857,0.2857142857142857,0.4,0.5]
...	...	...

standardize_vector function

The standardize_vector function standardizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that the components of the vector have a mean of 0 and a standard deviation of 1. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.

array(double) standardize_vector(varchar summary, array(double) x_vector)

Parameter	Description

Parameter	Description
`summary`	A multivariate pattern that is returned by the summarize function. For more information, see summarize function.
`x_vector`	The new sample data.

Example

Query statement

* | with data_table as
(
    select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
    select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
    select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
    select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
    select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
    select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
    select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
    select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
),

summaries as
(
    select entity_group,
        summarize(array_agg(features)) as statistical_summary
    from data_table
    group by entity_group
)

select t1.entity_id,
    t1.entity_group,
    standardize_vector(t2.statistical_summary, t1.features) as standardized_features
from data_table as t1
    join summaries as t2
    on t1.entity_group = t2.entity_group

Query and analysis results

The standardized_features parameter indicates the standardization results of the sample vector specified by the x_vector parameter.

entity_id	entity_group	standardized_features

entity_id	entity_group	standardized_features
2	A	[-1.0910894511799619,-1.0910894511799619,-0.4415031470273609,0.0]
4	A	[-0.21821789023599237,-0.21821789023599237,-1.1207387578386854,0.0]
3	A	[-0.6546536707079771,-0.6546536707079771,-0.03396178054056622,0.0]
...	...	...

mah_distance function

The mah_distance function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. You can use the Mahalanobis distance to handle the scale differences between different variables in an effective manner. The Mahalanobis distance measures the distance between the sample vector specified by the x_vector parameter after the vector is standardized and the centroid of the vector. If the Mahalanobis distance is 1, the distance between the sample vector and the centroid of the vector is equal to the average distance between all vectors and the centroid of the vectors.

double mah_distance(varchar summary, array(double) x_vector)

Parameter	Description

Parameter	Description
`summary`	A multivariate pattern that is returned by the summarize function. For more information, see summarize function.
`x_vector`	The new sample data.

Example

Query statement

* | with data_table as
(
    select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
    select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
    select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
    select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
    select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
    select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
    select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
    select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
),

summaries as
(
    select entity_group,
        summarize(array_agg(features)) as statistical_summary
    from data_table
    group by entity_group
)

select t1.entity_id,
    t1.entity_group,
    mah_distance(t2.statistical_summary, t1.features) as std_distance
from data_table as t1
    join summaries as t2
    on t1.entity_group = t2.entity_group

Query and analysis results
The std_distance parameter indicates the Mahalanobis distance of the sample vector specified by the x_vector parameter.
entity_id
entity_group
std_distance
entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...

standard_distance function

The standard_distance function calculates the standardized distance for a new sample vector specified by the metric_value parameter based on a pattern sepcified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. summarize function The standardized distance is similar to the Mahalanobis distance. The Mahalanobis distance measures the standardized distance between a vector that consists of multiple metrics and the centroid of the vector. The standardized distance measures the distance between a metric of a vector and the centroid of the vector. The element_index parameter specifies the index of the metric. The index starts from 0. The metric_value parameter specifies the value of the metric.

double standard_distance(varchar summary, double metric_value, int element_index)

Parameter	Description

Parameter	Description
`summary`	A multivariate pattern that is returned by the summarize function. For more information, see summarize function.
`metric_value`	The metric.
`element_index`	The index of the metric in the multivariate pattern specified by the `summary` parameter. The index starts from 0.

Example

Query statement

* | with data_table as
(
    select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
    select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
    select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
    select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
    select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
    select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
    select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
    select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
),

summaries as
(
    select entity_group,
        summarize(array_agg(features)) as statistical_summary
    from data_table
    group by entity_group
)

select t1.entity_id,
    t1.entity_group,
    standard_distance(t2.statistical_summary, 30, 1) as std_distance
from data_table as t1
    join summaries as t2
    on t1.entity_group = t2.entity_group

Query and analysis results
The std_distance parameter indicates the standardized distance of the metric specified by the metric_value parameter.
entity_id
entity_group
std_distance
entity_id
entity_group
std_distance
8
A
2.386927730244857
7
A
1.6809080087793125
1
A
1.5554594371997328
...
...
...

anomaly_level function

The anomaly_level function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter and rounds down each distance value to obtain different levels of abnormal probabilities. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. If the return value is 0.1, the probability that the new sample vector is abnormal is 10%. This value indicates first-level anomalies. If the return value is 0.01, the probability that the new sample vector is abnormal is 1%. This value indicates second-level anomalies. If the return value is 0.001, the probability that the new sample vector is abnormal is 0.1%. This value indicates third-level anomalies. If the return value is 0.0001, the probability that the new sample vector is abnormal is 0.01%. This value indicates fourth-level anomalies.An increase in the level of anomalies indicates a decrease in the abnormal probability of the new sample vector and an increase in the suspiciability of the vector. In most cases, you can configure a threshold to filter anomalies based on the anomaly detection results. For example, you can configure a threshold to retain only fourth-level anomalies and higher.

If you specify a value for the element_index parameter, the function returns a probability only for the metric with the specified index of the specified vector. If you do not configure the element_index parameter, the function returns a probability for each metric of the specified vector.

double anomaly_level(varchar summary, array(double) x_vector)

double anomaly_level(varchar summary, array(double) x_vector, int element_index)

Parameter	Description

Parameter	Description
`summary`	A multivariate pattern that is returned by the summarize function. For more information, see summarize function.
`x_vector`	The new sample data.
`element_index`	Optional. The index.

Example

Query statement

* |
with dummy as
(
    select sequence(1, 1000) as seq_data,
        count(*) as record_count from log
),

sample_data as
(
    select 'G1' as group_id,
        s.seq_num,
         -- Generate 1,000 two-dimensional random vectors that are distributed around the (100, 5000) range. The standard deviation values of two components are 20 and 500.
        inverse_normal_cdf(100, 20, random()) as x1,
        inverse_normal_cdf(5000, 500, rand()) as x2
    from dummy,
        unnest(seq_data) as s(seq_num)
),

data_summary as
(
    select group_id,
        summarize(array_agg(array[x1, x2])) as metric_summary
    from sample_data
    group by group_id
),

new_data as
(
    select 'G1' as group_id, 1001 as object_id, 100.0 as x1, 5000.0 as x2
    union all select 'G1' as group_id, 1002 as object_id, 118.0 as x1, 5450.0 as x2
    union all select 'G1' as group_id, 1003 as object_id, 138.0 as x1, 5950.0 as x2
    union all select 'G1' as group_id, 1004 as object_id, 158.0 as x1, 6450.0 as x2
    union all select 'G1' as group_id, 1005 as object_id, 178.0 as x1, 6950.0 as x2
    union all select 'G1' as group_id, 1006 as object_id, 198.0 as x1, 7450.0 as x2
    union all select 'G1' as group_id, 1007 as object_id, 318.0 as x1, 10000.0 as x2
)

select n.group_id,
    json_extract(s.metric_summary, '$.means') as metric_vector_mean,
    json_extract(s.metric_summary, '$.covariance') as metric_covariance,
    n.object_id,
    n.x1,
    n.x2,
    anomaly_level(s.metric_summary, array[x1, x2]) as anomaly_level
from data_summary as s
    join new_data as n
        on s.group_id = n.group_id
order by n.group_id, n.object_id
limit 100000

Query and analysis results
The anomaly_level parameter indicates the abnormal probability of the sample vector specified by the x_vector parameter.
group_id
object_id
anomaly_level
group_id
object_id
anomaly_level
G1
1007
13.0
G1
1006
5.0
G1
1005
4.0
...
...
...

Functions

summarize function

Example

merge_summary function

Example

normalize_vector function

Example

standardize_vector function

Example

mah_distance function

Example

standard_distance function

Example

anomaly_level function

Example

Sales Support

Technical Support

Connect & Report Abuse

group_id	object_id	anomaly_level
G1	1007	13.0
G1	1006	5.0
G1	1005	4.0
...	...	...

About Alibaba Cloud

Our Global Network

Quick Start

Global Offices

Olympic Games Paris 2024 New

Stade Roland Garros – Glitz from the Past New

Place de la Concorde – “Breaking” the Barriers New

Vaires-sur-Marne Nautical Stadium – Sports with Sustainability New

International Broadcast Center – Images, Sounds, and Data that Captivate Billions New

Customer Success Stories New

Trust Center

Security & Compliance Center

Cloud Compliance Resources

Security Compliance FAQs

Product & Feature Update New

Cloud Forward

Press Room

Alibaba Cloud e-Magazine New

Alibaba Cloud in Analyst Research

Notice

Go Global Service New

Go Global Alliance with Alibaba Cloud

Asia Accelerator Hot

Information Compliance

China Gateway - MLPS 2.0 Compliance New

China Gateway - Networking

China Gateway - Global Application Acceleration New

China Gateway - Security

China Gateway - Data Security New

ICP Support Hot

China Gateway - Omnichannel Data Mid-End New

China Gateway - Organizational Data Mid-End New

China Gateway - Business Mid-End New

China Gateway - AI Service for Conversational Chatbots New

China Gateway - Online Education

China Gateway - Domain Registration

Work at Alibaba Cloud

Experienced Professionals

Students and Graduates

Free Trial

Pricing

Promo Center

Price Reduction

Pay Less and Deploy More

FinOps

Elastic Compute Service (ECS)

Simple Application Server (SAS)

Elastic GPU Service

Elastic Desktop Service (EDS)

Object Storage Service (OSS)

Cloud Enterprise Network (CEN)

Web Application Firewall (WAF)

Domain Names

Container Compute Service (ACS)

Secure Access Service Edge (SASE)

Intelligent Media Services(IMS)

Edge Security Acceleration (ESA)(Original DCDN)

Intelligent Media Management

DingTalk Enterprise

YiDA

Alibaba Cloud Model Studio

Apsara Prime - For Easy Cloud Product Selection

Alibaba Cloud ECS - Cater All Your Cloud Hosting Needs

1TB CDN—Get Free 1 TB Outbound Traffic Plan Now

Security—Under Attack? Get Free Security Support

Short Message Service - Free Testing is Available

Elastic Compute Service (ECS) Hot

CloudBox

Compute Nest

Dedicated Host Hot

ECS Bare Metal Instance

Elastic GPU Service Featured

Simple Application Server (SAS) Hot

Auto Scaling

Cloud Phone Beta

Elastic Desktop Service (EDS) Featured

Batch Compute

Elastic High Performance Computing (E-HPC)

Super Computing Cluster (SCC)

Function Compute (FC)