All Products
Search
Document Center

Simple Log Service:Pattern mining (tabular data drill-down analysis) function

Last Updated:Nov 10, 2025

The pattern mining function analyzes samples with multiple attributes based on specific conditions. It finds distinct patterns that affect data partitioning, helping you quickly identify the causes of differences between the conditions.

diff_patterns

This function compares two datasets with the same structure to identify patterns of differing discrete attributes (dimensions). It is designed to compare templates (frequent itemsets) in tabular data.

Syntax

diff_patterns($TABLE, $HEADER, $SPLIT_COLUMN, $TEST_VAL, $CONTROL_VAL, $METRIC_COLUMN, $WEIGHT_COLUMN, $PARAM)

Parameters

Important

When you use the pattern difference statistics function, ensure that the $SPLIT_COLUMN contains rows for both $TEST_VAL and $CONTROL_VAL. The function cannot perform a comparison if data for the test group or the control group is missing.

Parameter

Data type

Required

Description

$TABLE

row<array<T>, array<E>, ..., array<F>>

Yes

The input table for the diff_patterns function. Each column is a dimension column for drill-down analysis.

$HEADER

array<varchar>

Yes

The column names. These names correspond to the columns in $TABLE. The number of header names must match the number of table columns.

$SPLIT_COLUMN

varchar

Yes

The name of the target column used to distinguish between groups. The tag category currently supports only the string type.

$TEST_VAL

varchar

Yes

The value in $SPLIT_COLUMN that corresponds to the test group.

Ensure that $TEST_VAL exists in $SPLIT_COLUMN. Otherwise, the two groups cannot be distinguished. Typically, $TEST_VAL corresponds to an abnormal value.

$CONTROL_VAL

varchar

Yes

The value in $SPLIT_COLUMN that corresponds to the control group.

Ensure that $CONTROL_VAL exists in $SPLIT_COLUMN. Otherwise, the two groups cannot be distinguished. $CONTROL_VAL corresponds to a normal value.

$METRIC_COLUMN

double/int

No

The metric column. The default value is '', which means no metric column is specified.

If you want to analyze the impact of dimension combinations on a specific metric in addition to the data groups, specify a metric column. For example, if a 'qps' column records your metric, the algorithm can calculate the differences in this metric for various dimension combinations.

$WEIGHT_COLUMN

varchar

No

The weight column. The default value is '' , which means no weight column is specified.

If this parameter is not set, all rows are assumed to have the same weight. If a weight column is set, all calculations consider the weights. For example, a row with a weight of 2 is treated as if it appeared twice compared to rows with a weight of 1.

$PARAM

varchar

No

For more information, see param parameter details.

param parameter details

Parameter

Description

Type

Required

Default

Valid values

minimum_support_fraction

The minimum support of an output pattern in the test group. For example, if a pattern appears with a frequency of 0.1 in the test group, its support is 0.1. This parameter controls the number of output patterns by adjusting the sensitivity.

double

No

0.05

(0, 1)

allow_sample

Specifies whether to allow sampling for calculations.

bool

No

true

[true, false]

agg_op

A pattern filters multiple rows of data. The function then calculates the aggregate value of these rows in the METRIC_COLUMN. This parameter specifies the aggregation operation to use. If METRIC_COLUMN is not specified, the default value is 1 for the test group and 0 for the control group.

string

No

"avg"

["sum", "avg", "max", "min", "count", "p95"]

shapley_value_min_fraction

The minimum Shapley value fraction for a condition. This parameter controls the number of conditions in a pattern.

double

No

0.1

(0, 1)

Examples

  • Query analysis:

    The `set session enable_remote_functions=true` flag is required because this feature is in public preview. You must add this flag manually. This requirement will be removed in future versions.
    * | set session enable_remote_functions=true ;
    with t0 as (select  JSON_EXTRACT_SCALAR(entity, '$.platform') AS platform,  JSON_EXTRACT_SCALAR(entity, '$.region') AS region, cast(value as double) as value, if((value > 100), 'true', 'false') as anomaly_label from log), 
    t1 as ( select array_agg(platform) as platform, array_agg(region) as region, array_agg(anomaly_label) as anomaly_label, array_agg(value) as value from t0),
    t2 as (select row(platform, region, anomaly_label, value) as table_row from t1),
    t3 as (select diff_patterns(table_row, ARRAY['platform', 'region', 'anomaly_label', 'value'],'anomaly_label', 'true', 'false', 'value') as ret from t2)
    select * from t3
  • Outputs:

    [["\"platform\"='Amazon Prime'","\"platform\"='Console'","\"platform\"='VR'","\"platform\"='Mobile'","\"platform\"='PC'","\"platform\"='Hulu'","\"platform\"='Netflix' AND \"region\"='Asia'","\"platform\"='YouTube'","\"platform\"='Disney+' AND \"region\"='South America'"],[18720,56921,44516,47520,44640,8640,4320,15840,4320],[0,4999,7324,5760,4320,0,0,1440,0],[0.05248902272841978,0.15960083668399478,0.12481844742405635,0.13324136538752713,0.1251661311216164,0.024225702797732206,0.012112851398866103,0.04441378846250904,0.012112851398866103],[0.0,0.0032988991351195829,0.004833194091941553,0.003801092022062172,0.0028508190165466289,0.0,0.0,0.000950273005515543,0.0],[0.05248902272841978,0.1563019375488752,0.1199852533321148,0.12944027336546497,0.12231531210506977,0.024225702797732206,0.012112851398866103,0.0434635154569935,0.012112851398866103],[608.6946996258738,552.6152340898742,501.3011076154948,525.0147121416072,580.8151715760738,409.282718625793,631.6115865793741,603.7699053925991,526.960199868643],[0.0,58.46518308219622,55.396758003518488,49.12926473747681,44.68946999270535,0.0,0.0,20.76262974302821,0.0],null]

Return value details

Parameter

Type

Description

Example

$RET.patterns

array<varchar>

The table templates, which are frequent itemsets.

["\"platform\"='Amazon Prime'","\"platform\"='Console'","\"platform\"='VR'","\"platform\"='Mobile'","\"platform\"='PC'","\"platform\"='Hulu'","\"platform\"='Netflix' AND \"region\"='Asia'","\"platform\"='YouTube'","\"platform\"='Disney+' AND \"region\"='South America'"]

$RET.test_supports

array<bigint>

The frequency of the corresponding template in the test group.

[18720,56921,44516,47520,44640,8640,4320,15840,4320],[0,4999,7324,5760,4320,0,0,1440,0]

$RET.control_supports

array<bigint>

The frequency of the corresponding template in the control group.

By comparison, the results that appear earlier in the diff_patterns output have a much higher frequency in the test group (abnormal group) than in the control group (normal group).

0,4999,7324,5760,4320,0,0,1440,0]

$RET.test_ratio

array<double>

The ratio of the corresponding template in the test group (label=1) to the total test dataset.

[0.05248902272841978,0.15960083668399478,0.12481844742405635,0.13324136538752713,0.1251661311216164,0.024225702797732206,0.012112851398866103,0.04441378846250904,0.012112851398866103]

$RET.control_ratio

array<double>

The ratio of the corresponding template in the control group (label=0) to the total validation set.

[0.0,0.0032988991351195829,0.004833194091941553,0.003801092022062172,0.0028508190165466289,0.0,0.0,0.000950273005515543,0.0]

$RET.difference

array<double>

The difference between the test group data and the control group data.

[0.05248902272841978,0.1563019375488752,0.1199852533321148,0.12944027336546497,0.12231531210506977,0.024225702797732206,0.012112851398866103,0.0434635154569935,0.012112851398866103]

$RET.test_average_metric

array<double>

The average value of METRIC_COLUMN for data that matches the pattern. If METRIC_COLUMN is '', this return value is not meaningful.

[608.6946996258738,552.6152340898742,501.3011076154948,525.0147121416072,580.8151715760738,409.282718625793,631.6115865793741,603.7699053925991,526.960199868643]

$RET.control_average_metric

array<double>

The average value of METRIC_COLUMN for data that does not match the pattern. If METRIC_COLUMN is '', this return value is not meaningful.

[0.0,58.46518308219622,55.396758003518488,49.12926473747681,44.68946999270535,0.0,0.0,20.76262974302821,0.0]

$RET.error_msg

varchar/null

The error message. If no error occurs, the return value is null.

null