All Products
Search
Document Center

Simple Log Service:Multivariate pattern identification functions

Last Updated:Oct 31, 2024

Simple Log Service provides the anomaly detection feature to identify anomalies in service systems and the root causes of the anomalies. The anomaly detection feature can automatically identify abnormal metric changes based on the current pattern of a metric and machine learning. You can use multivariate pattern identification functions to identify metric patterns. This topic describes the multivariate pattern identification functions.

Functions

Function

Syntax

Description

Data type of the return value

summarize function

  • summarize(array(array(double)) data_samples)

  • summarize(array(array(double)) data_samples, array(double) weights)

Identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix.

varchar

merge_summary function

  • merge_summary(varchar summary1, varchar summary2)

  • merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)

Merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function.

varchar

normalize_vector function

normalize_vector(varchar summary, array(double) x_vector)

Normalizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that each component of the vector is mapped to the [0, 1] interval. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.

array(double)

standardize_vector function

standardize_vector(varchar summary, array(double) x_vector)

Standardizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that the components of the vector have a mean of 0 and a standard deviation of 1. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.

array(double)

mah_distance function

mah_distance(varchar summary, array(double) x_vector)

Calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. You can use the Mahalanobis distance to handle the scale differences between different variables in an effective manner. The Mahalanobis distance measures the distance between the sample vector specified by the x_vector parameter after the vector is standardized and the centroid of the vector. If the Mahalanobis distance is 1, the distance between the sample vector and the centroid of the vector is equal to the average distance between all vectors and the centroid of the vectors.

double

standard_distance function

standard_distance(varchar summary, double metric_value, int element_index)

Calculates the standardized distance for a metric specified by the metric_value parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. The standardized distance is similar to the Mahalanobis distance. The Mahalanobis distance measures the standardized distance between a vector that consists of multiple metrics and the centroid of the vector. The standardized distance measures the distance between a metric of a vector and the centroid of the vector. The element_index parameter specifies the index of the metric. The metric_value parameter specifies the value of the metric.

double

anomaly_level function

  • anomaly_level(varchar summary, array(double) x_vector)

  • anomaly_level(varchar summary, array(double) x_vector, int element_index)

Calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter and rounds down each distance value to obtain different levels of abnormal probabilities. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. If the return value is 0.1, the probability that the new sample vector is abnormal is 10%. This value indicates first-level anomalies. If the return value is 0.01, the probability that the new sample vector is abnormal is 1%. This value indicates second-level anomalies. If the return value is 0.001, the probability that the new sample vector is abnormal is 0.1%. This value indicates third-level anomalies. If the return value is 0.0001, the probability that the new sample vector is abnormal is 0.01%. This value indicates fourth-level anomalies.An increase in the level of anomalies indicates a decrease in the abnormal probability of the new sample vector and an increase in the suspiciability of the vector. In most cases, you can configure a threshold to filter anomalies based on the anomaly detection results. For example, you can configure a threshold to retain only fourth-level anomalies and higher.

If you specify a value for the element_index parameter, the function returns a probability only for the metric with the specified index of the specified vector. If you do not configure the element_index parameter, the function returns a probability for each metric of the specified vector.

array(double)

summarize function

The summarize function identifies and returns a multivariate pattern based on the specified samples and sample weights. The sample weights are optional. Statistical patterns cover a variety of statistics and joint statistics, such as a mean, standard deviation, and covariance matrix.

varchar summarize(array(array(double)) data_samples)

Or

varchar summarize(array(array(double)) data_samples, array(double) weights)

Parameter

Description

data_samples

A two-dimensional array. The array can be used as a two-dimensional table. Each column specifies a variable. Each row specifies the variable values of a sample.

weights

Optional. The sample weight. The value is a one-dimensional array in the same length as one of the one-dimensional arrays that constitute the two-dimensional array specified by the data_samples parameter. If you do not specify a value for this parameter, the same weight is used for all samples.

Example

  • Query statement

    * | with data_table as
    (
        select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
        select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
        select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
        select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
        select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
        select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
        select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
        select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
    )
        select entity_group,
            summarize(array_agg(features)) as statistical_summary
        from data_table
        group by entity_group
  • Query and analysis results

    entity_group

    statistical_summary

    A

    {
      "sampleCount": 8,
      "vectorSize": 4,
      "means": [
        11.5,
        12.5,
        9.25,
        0.0
      ],
      "stdDevs": [
        6.87386354243376,
        6.87386354243376,
        7.361215932167728,
        0.0
      ],
      "variances": [
        47.25,
        47.25,
        54.1875,
        0.0
      ],
      "mins": [
        1.0,
        2.0,
        1.0,
        0.0
      ],
      "maxs": [
        22.0,
        23.0,
        21.0,
        0.0
      ],
      "covariance": [
        [
          47.25,
          47.25,
          19.125,
          0.0
        ],
        [
          47.25,
          47.25,
          19.125,
          0.0
        ],
        [
          19.125,
          19.125,
          54.1875,
          0.0
        ],
        [
          0.0,
          0.0,
          0.0,
          0.0
        ]
      ],
      "correlations": [
        [
          1.0,
          1.0,
          0.37796447300922725,
          0.0
        ],
        [
          1.0,
          1.0,
          0.37796447300922725,
          0.0
        ],
        [
          0.37796447300922725,
          0.37796447300922725,
          1.0,
          0.0
        ],
        [
          0.0,
          0.0,
          0.0,
          1.0
        ]
      ],
      "sums": [
        92.0,
        100.0,
        74.0,
        0.0
      ],
      "weightSum": 8.0,
      "sumProducts": [
        [
          1436.0,
          1528.0,
          1004.0,
          0.0
        ],
        [
          1528.0,
          1628.0,
          1078.0,
          0.0
        ],
        [
          1004.0,
          1078.0,
          1118.0,
          0.0
        ],
        [
          0.0,
          0.0,
          0.0,
          0.0
        ]
      ],
      "isSummarized": true
    }
    

    Parameters in the query and analysis results

    Parameter

    Description

    sampleCount

    The number of samples.

    vectorSize

    The length of the vector.

    means

    The average values of all vector components.

    stdDevs

    The standard deviation values of all vector components.

    variances

    The variance values of all vector components.

    mins

    The minimum values of all vector components.

    maxs

    The maximum values of all vector components.

    covariance

    The covariance matrix of all vector components.

    correlations

    The correlation coefficient matrix of all vector components.

    sums

    The sum of all vector components.

    weightSum

    The sum of all sample weights.

    sumProducts

    The intermediate results that are used to merge statistical patterns.

    isSummarized

    Indicates whether a normal multivariate pattern is returned. Valid values:

    • true

    • false

merge_summary function

The merge_summary function merges multivariate patterns that are returned by the summarize function. The multivariate patterns can be patterns that are obtained by learning from the same dataset in different stages or patterns that are obtained by learning from two independent datasets. For more information, see summarize function.

varchar merge_summary(varchar summary1, varchar summary2)

Or

varchar merge_summary(varchar summary1, double weight1, varchar summary2, double weight2)

Parameter

Description

summary1

A multivariate pattern that is returned by the summarize function. For more information, see summarize function.

weight1

The weight of the multivariate pattern specified by the summary1 parameter.

summary2

A multivariate pattern that is returned by the summarize function. For more information, see summarize function.

weight2

The weight of the multivariate pattern specified by the summary2 parameter.

Example

  • Query statement

    * | with data_table_01 as
    (
        select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
        select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
        select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
        select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features
    ),
    
    summaries_01 as
    (
        select entity_group,
            summarize(array_agg(features)) as statistical_summary
        from data_table_01
        group by entity_group
    ),
    
    data_table_02 as
    (
        select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
        select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
        select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
        select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
    ),
    
    summaries_02 as
    (
        select entity_group,
            summarize(array_agg(features)) as statistical_summary
        from data_table_02
        group by entity_group
    )
    select s1.entity_group,
        merge_summary(s1.statistical_summary, s2.statistical_summary) as statistical_summary
    from summaries_01 as s1
        join summaries_02 as s2
        on s1.entity_group = s2.entity_group
  • Query and analysis results

    The statistical_summary parameter indicates the multivariate pattern that is obtained after the merge operation.

    entity_group

    statistical_summary

    2

    {
      "sampleCount": 8,
      "vectorSize": 4,
      "means": [
        11.5,
        12.5,
        9.25,
        0.0
      ],
      "stdDevs": [
        6.87386354243376,
        6.87386354243376,
        7.361215932167728,
        0.0
      ],
      "variances": [
        47.25,
        47.25,
        54.1875,
        0.0
      ],
      "mins": [
        1.0,
        2.0,
        1.0,
        0.0
      ],
      "maxs": [
        22.0,
        23.0,
        21.0,
        0.0
      ],
      "covariance": [
        [
          47.25,
          47.25,
          19.125,
          0.0
        ],
        [
          47.25,
          47.25,
          19.125,
          0.0
        ],
        [
          19.125,
          19.125,
          54.1875,
          0.0
        ],
        [
          0.0,
          0.0,
          0.0,
          0.0
        ]
      ],
      "correlations": [
        [
          1.0,
          1.0,
          0.37796447300922725,
          0.0
        ],
        [
          1.0,
          1.0,
          0.37796447300922725,
          0.0
        ],
        [
          0.37796447300922725,
          0.37796447300922725,
          1.0,
          0.0
        ],
        [
          0.0,
          0.0,
          0.0,
          1.0
        ]
      ],
      "sums": [
        92.0,
        100.0,
        74.0,
        0.0
      ],
      "weightSum": 8.0,
      "sumProducts": [
        [
          1436.0,
          1528.0,
          1004.0,
          0.0
        ],
        [
          1528.0,
          1628.0,
          1078.0,
          0.0
        ],
        [
          1004.0,
          1078.0,
          1118.0,
          0.0
        ],
        [
          0.0,
          0.0,
          0.0,
          0.0
        ]
      ],
      "isSummarized": true
    }

    Parameters in the query and analysis results

    Parameter

    Description

    sampleCount

    The number of samples.

    vectorSize

    The length of the vector.

    means

    The average values of all vector components.

    stdDevs

    The standard deviation values of all vector components.

    variances

    The variance values of all vector components.

    mins

    The minimum values of all vector components.

    maxs

    The maximum values of all vector components.

    covariance

    The covariance matrix of all vector components.

    correlations

    The correlation coefficient matrix of all vector components.

    sums

    The sum of all vector components.

    weightSum

    The sum of all sample weights.

    sumProducts

    The intermediate results that are used to merge statistical patterns.

    isSummarized

    Indicates whether a normal multivariate pattern is returned. Valid values:

    • true

    • false

normalize_vector function

The normalize_vector function normalizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that each component of the vector is mapped to the [0, 1] interval. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.

array(double) normalize_vector(varchar summary, array(double) x_vector)

Parameter

Description

summary

A multivariate pattern that is returned by the summarize function. For more information, see summarize function.

x_vector

The new sample data.

Example

  • Query statement

    * | with data_table as
    (
        select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
        select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
        select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
        select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
        select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
        select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
        select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
        select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
    ),
    
    summaries as
    (
        select entity_group,
            summarize(array_agg(features)) as statistical_summary
        from data_table
        group by entity_group
    )
    
    select t1.entity_id,
        t1.entity_group,
        normalize_vector(t2.statistical_summary, t1.features) as normalized_features
    from data_table as t1
        join summaries as t2
        on t1.entity_group = t2.entity_group
  • Query and analysis results

    The normalized_features parameter indicates the normalization results of the sample vector specified by the x_vector parameter.

    entity_id

    entity_group

    normalized_features

    2

    A

    [0.14285714285714286,0.14285714285714286,0.25,0.5]

    4

    A

    [0.42857142857142857,0.42857142857142857,0.0,0.5]

    3

    A

    [0.2857142857142857,0.2857142857142857,0.4,0.5]

    ...

    ...

    ...

standardize_vector function

The standardize_vector function standardizes a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. Make sure that the components of the vector have a mean of 0 and a standard deviation of 1. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function.

array(double) standardize_vector(varchar summary, array(double) x_vector)

Parameter

Description

summary

A multivariate pattern that is returned by the summarize function. For more information, see summarize function.

x_vector

The new sample data.

Example

  • Query statement

    * | with data_table as
    (
        select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
        select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
        select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
        select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
        select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
        select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
        select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
        select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
    ),
    
    summaries as
    (
        select entity_group,
            summarize(array_agg(features)) as statistical_summary
        from data_table
        group by entity_group
    )
    
    select t1.entity_id,
        t1.entity_group,
        standardize_vector(t2.statistical_summary, t1.features) as standardized_features
    from data_table as t1
        join summaries as t2
        on t1.entity_group = t2.entity_group
  • Query and analysis results

    The standardized_features parameter indicates the standardization results of the sample vector specified by the x_vector parameter.

    entity_id

    entity_group

    standardized_features

    2

    A

    [-1.0910894511799619,-1.0910894511799619,-0.4415031470273609,0.0]

    4

    A

    [-0.21821789023599237,-0.21821789023599237,-1.1207387578386854,0.0]

    3

    A

    [-0.6546536707079771,-0.6546536707079771,-0.03396178054056622,0.0]

    ...

    ...

    ...

mah_distance function

The mah_distance function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. You can use the Mahalanobis distance to handle the scale differences between different variables in an effective manner. The Mahalanobis distance measures the distance between the sample vector specified by the x_vector parameter after the vector is standardized and the centroid of the vector. If the Mahalanobis distance is 1, the distance between the sample vector and the centroid of the vector is equal to the average distance between all vectors and the centroid of the vectors.

double mah_distance(varchar summary, array(double) x_vector)

Parameter

Description

summary

A multivariate pattern that is returned by the summarize function. For more information, see summarize function.

x_vector

The new sample data.

Example

  • Query statement

    * | with data_table as
    (
        select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
        select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
        select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
        select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
        select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
        select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
        select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
        select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
    ),
    
    summaries as
    (
        select entity_group,
            summarize(array_agg(features)) as statistical_summary
        from data_table
        group by entity_group
    )
    
    select t1.entity_id,
        t1.entity_group,
        mah_distance(t2.statistical_summary, t1.features) as std_distance
    from data_table as t1
        join summaries as t2
        on t1.entity_group = t2.entity_group
  • Query and analysis results

    The std_distance parameter indicates the Mahalanobis distance of the sample vector specified by the x_vector parameter.

    entity_id

    entity_group

    std_distance

    8

    A

    2.386927730244857

    7

    A

    1.6809080087793125

    1

    A

    1.5554594371997328

    ...

    ...

    ...

standard_distance function

The standard_distance function calculates the standardized distance for a new sample vector specified by the metric_value parameter based on a pattern sepcified by the summary parameter. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. summarize function The standardized distance is similar to the Mahalanobis distance. The Mahalanobis distance measures the standardized distance between a vector that consists of multiple metrics and the centroid of the vector. The standardized distance measures the distance between a metric of a vector and the centroid of the vector. The element_index parameter specifies the index of the metric. The index starts from 0. The metric_value parameter specifies the value of the metric.

double standard_distance(varchar summary, double metric_value, int element_index)

Parameter

Description

summary

A multivariate pattern that is returned by the summarize function. For more information, see summarize function.

metric_value

The metric.

element_index

The index of the metric in the multivariate pattern specified by the summary parameter. The index starts from 0.

Example

  • Query statement

    * | with data_table as
    (
        select 1 as entity_id, 'A' as entity_group, cast(array[1, 2, 3, 0] as array(double)) as features union all
        select 2 as entity_id, 'A' as entity_group, cast(array[4, 5, 6, 0] as array(double)) as features union all
        select 3 as entity_id, 'A' as entity_group, cast(array[7, 8, 9, 0] as array(double)) as features union all
        select 4 as entity_id, 'A' as entity_group, cast(array[10, 11, 1, 0] as array(double)) as features union all
        select 5 as entity_id, 'A' as entity_group, cast(array[13, 14, 15, 0] as array(double)) as features union all
        select 6 as entity_id, 'A' as entity_group, cast(array[16, 17, 18, 0] as array(double)) as features union all
        select 7 as entity_id, 'A' as entity_group, cast(array[19, 20, 21, 0] as array(double)) as features union all
        select 8 as entity_id, 'A' as entity_group, cast(array[22, 23, 1, 0] as array(double)) as features
    ),
    
    summaries as
    (
        select entity_group,
            summarize(array_agg(features)) as statistical_summary
        from data_table
        group by entity_group
    )
    
    select t1.entity_id,
        t1.entity_group,
        standard_distance(t2.statistical_summary, 30, 1) as std_distance
    from data_table as t1
        join summaries as t2
        on t1.entity_group = t2.entity_group
    
  • Query and analysis results

    The std_distance parameter indicates the standardized distance of the metric specified by the metric_value parameter.

    entity_id

    entity_group

    std_distance

    8

    A

    2.386927730244857

    7

    A

    1.6809080087793125

    1

    A

    1.5554594371997328

    ...

    ...

    ...

anomaly_level function

The anomaly_level function calculates the Mahalanobis distance for a new sample vector specified by the x_vector parameter based on a pattern specified by the summary parameter and rounds down each distance value to obtain different levels of abnormal probabilities. You can set the summary parameter to a pattern that is returned by the summarize function. For more information, see summarize function. If the return value is 0.1, the probability that the new sample vector is abnormal is 10%. This value indicates first-level anomalies. If the return value is 0.01, the probability that the new sample vector is abnormal is 1%. This value indicates second-level anomalies. If the return value is 0.001, the probability that the new sample vector is abnormal is 0.1%. This value indicates third-level anomalies. If the return value is 0.0001, the probability that the new sample vector is abnormal is 0.01%. This value indicates fourth-level anomalies.An increase in the level of anomalies indicates a decrease in the abnormal probability of the new sample vector and an increase in the suspiciability of the vector. In most cases, you can configure a threshold to filter anomalies based on the anomaly detection results. For example, you can configure a threshold to retain only fourth-level anomalies and higher.

If you specify a value for the element_index parameter, the function returns a probability only for the metric with the specified index of the specified vector. If you do not configure the element_index parameter, the function returns a probability for each metric of the specified vector.

double anomaly_level(varchar summary, array(double) x_vector)

Or

double anomaly_level(varchar summary, array(double) x_vector, int element_index)

Parameter

Description

summary

A multivariate pattern that is returned by the summarize function. For more information, see summarize function.

x_vector

The new sample data.

element_index

Optional. The index.

Example

  • Query statement

    * |
    with dummy as
    (
        select sequence(1, 1000) as seq_data,
            count(*) as record_count from log
    ),
    
    sample_data as
    (
        select 'G1' as group_id,
            s.seq_num,
             -- Generate 1,000 two-dimensional random vectors that are distributed around the (100, 5000) range. The standard deviation values of two components are 20 and 500.
            inverse_normal_cdf(100, 20, random()) as x1,
            inverse_normal_cdf(5000, 500, rand()) as x2
        from dummy,
            unnest(seq_data) as s(seq_num)
    ),
    
    data_summary as
    (
        select group_id,
            summarize(array_agg(array[x1, x2])) as metric_summary
        from sample_data
        group by group_id
    ),
    
    new_data as
    (
        select 'G1' as group_id, 1001 as object_id, 100.0 as x1, 5000.0 as x2
        union all select 'G1' as group_id, 1002 as object_id, 118.0 as x1, 5450.0 as x2
        union all select 'G1' as group_id, 1003 as object_id, 138.0 as x1, 5950.0 as x2
        union all select 'G1' as group_id, 1004 as object_id, 158.0 as x1, 6450.0 as x2
        union all select 'G1' as group_id, 1005 as object_id, 178.0 as x1, 6950.0 as x2
        union all select 'G1' as group_id, 1006 as object_id, 198.0 as x1, 7450.0 as x2
        union all select 'G1' as group_id, 1007 as object_id, 318.0 as x1, 10000.0 as x2
    )
    
    select n.group_id,
        json_extract(s.metric_summary, '$.means') as metric_vector_mean,
        json_extract(s.metric_summary, '$.covariance') as metric_covariance,
        n.object_id,
        n.x1,
        n.x2,
        anomaly_level(s.metric_summary, array[x1, x2]) as anomaly_level
    from data_summary as s
        join new_data as n
            on s.group_id = n.group_id
    order by n.group_id, n.object_id
    limit 100000
    
  • Query and analysis results

    The anomaly_level parameter indicates the abnormal probability of the sample vector specified by the x_vector parameter.

    group_id

    object_id

    anomaly_level

    G1

    1007

    13.0

    G1

    1006

    5.0

    G1

    1005

    4.0

    ...

    ...

    ...