Regression models supported by Simple Log Service SQL functions

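Regression models can be used for data analysis, prediction, automated monitoring, and anomaly detection. In complex system management, you can use regression models together with thresholds and alert rules to identify issues faster and more accurately and to keep the system stable. This topic describes the syntax of the regression analysis functions and provides examples of how to use them.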

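Background information

The following formula is used: y = a1 × x1 + a2 × x2 + b + noise.

Parameter	Description
`x1`	A column of data collected by Simple Log Service.
`x2`	A column of data collected by Simple Log Service.
`noise`	A random variable.
`y`	The calculated result.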

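Based on the provided x1, x2, and y values and the optional weight data, the regression analysis functions identify the values of a1, a2, and b, and then return the calculated result. a1, a2, and b are coefficients.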
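To make the background formula concrete, the following Python sketch fits a1, a2, and b by ordinary least squares using the normal equations. It only illustrates the underlying math. It is not the implementation used by Simple Log Service, and the helper names `solve_linear_system` and `fit_linear` and the noise-free data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(x_samples, y_samples):
    """Return [a1, ..., ak, b] that minimizes the squared error."""
    rows = [list(x) + [1.0] for x in x_samples]  # trailing 1.0 models b
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, y_samples)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data generated from y = 2*x1 + 3*x2 + 5.
xs = [[1, 5], [2, 2], [3, 8], [4, 6], [5, 2]]
ys = [2 * x1 + 3 * x2 + 5 for x1, x2 in xs]
a1, a2, b = fit_linear(xs, ys)
print(a1, a2, b)  # close to 2, 3, and 5
```

With noisy data such as the sample logs, the recovered coefficients are estimates rather than exact values.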

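The sample logs for the regression analysis functions contain six indexed fields: group_id, observation_id, time_offset, x1, x2, and y. For more information, see "Create indexes".

The following code shows the sample logs.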

{"group_id":"A","observation_id":"S001","time_offset":"0","x1":"1","x2":"5","y":"23.91700530543459"}
{"group_id":"A","observation_id":"S002","time_offset":"-1","x1":"2","x2":"2","y":"6.931858878794941"}
{"group_id":"A","observation_id":"S003","time_offset":"-2","x1":"3","x2":"8","y":"16.17603801639615"}
{"group_id":"A","observation_id":"S004","time_offset":"-3","x1":"4","x2":"6","y":"24.97127625789946"}
{"group_id":"A","observation_id":"S005","time_offset":"-4","x1":"5","x2":"2","y":"11.933292736756384"}
{"group_id":"A","observation_id":"S006","time_offset":"-5","x1":"6","x2":"8","y":"21.034262717019995"}
{"group_id":"A","observation_id":"S007","time_offset":"-6","x1":"7","x2":"1","y":"25.966770392099868"}
{"group_id":"A","observation_id":"S008","time_offset":"-7","x1":"8","x2":"7","y":"16.93019469603219"}
{"group_id":"A","observation_id":"S009","time_offset":"-8","x1":"9","x2":"2","y":"19.967258015889847"}
{"group_id":"A","observation_id":"S010","time_offset":"-9","x1":"10","x2":"3","y":"27.0277513207651"}

Functions

Function	Syntax	Description	Return value data type
linear_model function	linear_model(array(array(double)) x_samples, array(double) y_samples) or linear_model(array(array(double)) x_samples, array(double) y_samples, array(double) weights)	Returns a regression model in JSON format. This function is a scalar function. The input consists of samples aggregated by using the array_agg function, plus optional weights.	varchar
linear_model_predict function	linear_model_predict(varchar model_in_json, array(double) x_sample)	Predicts data based on an existing regression model and the specified independent variables.	double
recent_regression function	recent_regression(double y, array(double) x_array, double sample_time, double cur_batch_begin_period, double cur_batch_end_period, double time_unit, double unit_damping_weight)	Updates the parameters and state variables of a regression model in online mode based on recently collected data. The weights in the regression model are adjusted based on sample age: the importance of a sample decays exponentially as the sample gets older.	varchar
merge_recent_regression function	merge_recent_regression(varchar model_1_json, varchar model_2_json)	Merges the parameters and state variables of the regression models returned by two calls to the recent_regression function. The result is the same as the parameters and state variables of a new regression model trained on the two datasets combined.	varchar
recent_regression_predict function	recent_regression_predict(varchar model_json, array(double) x_sample)	Predicts data based on an adaptive regression model.	double

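Regression models with sample weights

You can specify sample weights for a regression model. The weights can depend on time or on the dependent variable. If the sample weights decay as samples age, the regression model focuses more on the most recent data and adapts to changes in the system. If the sample weights are the reciprocal of the absolute value of the dependent variable, the regression model can minimize the relative error.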
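The effect of sample weights can be seen in a small weighted least squares sketch. The Python code below fits a single-variable line with and without weights equal to the reciprocal of the absolute value of y. The function name `fit_weighted_line` and the data are hypothetical, and the code illustrates the idea rather than the behavior of the linear_model function itself.

```python
def fit_weighted_line(xs, ys, ws):
    """Fit y = a*x + b by minimizing sum(w * (y - a*x - b)**2)."""
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    a = sxy / sxx
    return a, y_mean - a * x_mean


# Hypothetical data: the last sample has a much larger y than the others.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 40.0]

uniform = fit_weighted_line(xs, ys, [1.0] * len(xs))
inverse_y = fit_weighted_line(xs, ys, [1.0 / abs(y) for y in ys])
print("uniform weights:", uniform)
print("1/|y| weights:  ", inverse_y)
```

With uniform weights the large sample dominates the fit. With 1/|y| weights the small-magnitude samples count for more, so the fitted slope is much smaller.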

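linear_model function

The linear_model function returns a regression model in JSON format. This function is a scalar function. The input consists of samples aggregated by using the array_agg function, plus optional weights. For more information, see "array_agg function".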

varchar linear_model(array(array(double)) x_samples, array(double) y_samples)

or

varchar linear_model(array(array(double)) x_samples, array(double) y_samples, array(double) weights)

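Parameter	Description
`x_samples`	A data matrix composed of multiple independent variable samples. Each row represents one observation of the independent variables.
`y_samples`	A vector composed of the dependent variable samples.
`weights`	Optional. If you leave this parameter empty, all samples are given the same weight.

Example

Query statement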

 * |   select group_id,
        linear_model(
            array_agg(array[x1, x2]),
            array_agg(y)
        ) as model
    from log
    group by group_id


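Query and analysis results

The coefficients parameter in the query and analysis results indicates the coefficients of the linear regression model trained on the input data.

For data prediction, the return value of the linear_model function is used as the input parameter of the linear_model_predict function.

group_id	model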

{
  "coefficients": [
    0.8350068912618618,
    -0.741283054726383,
    19.17405856472653
  ],
  "isBuilt": true,
  "isBuildSuccessful": true,
  "sampleCount": 10,
  "xCount": 2,
  "wSum": 10.0,
  "ySumSquare": 3930.0,
  "ySum": 188.0,
  "xXSumProducts": [
    [
      385.0,
      367.0
    ],
    [
      367.0,
      475.0
    ]
  ],
  "xYSumProducts": [
    1104.0,
    1239.0
  ],
  "xSums": [
    55.0,
    67.0
  ],
  "xMeans": [
    5.5,
    6.7
  ],
  "xStdDevs": [
    2.8722813232690143,
    1.6155494421403511
  ],
  "xVariances": [
    8.25,
    2.6099999999999994
  ],
  "yMean": 18.8,
  "yStdDev": 6.289674077406551,
  "yVariance": 39.559999999999945,
  "xCorrelations": [
    [
      1.0,
      -0.03232540919176149
    ],
    [
      -0.03232540919176149,
      1.0
    ]
  ],
  "xYCorrelations": [
    0.3874743195572169,
    -0.202730375711539
  ],
  "regularized": true,
  "regularWeight": 1.0E-6
}

linear_model_predict function

The linear_model_predict function predicts data based on the specified regression model and an input variable sample.

double linear_model_predict(varchar model_in_json, array(double) x_sample)

Parameter	Description
`model_in_json`	The regression model returned by the linear_model function. For more information, see "linear_model function".
`x_sample`	The new independent variable sample.
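Conceptually, a prediction from a linear model is the weighted sum of the independent variables plus an intercept. The following Python sketch shows this with a hypothetical model. It assumes that the coefficients array in the model JSON lists the slopes in the order of x_sample, followed by the intercept as the last element. That layout is inferred from the sample outputs in this topic and is not confirmed here.

```python
import json


def predict(model_in_json, x_sample):
    """Evaluate a1*x1 + ... + ak*xk + b from a model JSON string."""
    coefficients = json.loads(model_in_json)["coefficients"]
    *slopes, intercept = coefficients  # assumed layout: slopes, then intercept
    if len(slopes) != len(x_sample):
        raise ValueError("x_sample length does not match the model")
    return sum(a * x for a, x in zip(slopes, x_sample)) + intercept


# Hypothetical model for y = 2*x1 + 3*x2 + 5.
model = json.dumps({"coefficients": [2.0, 3.0, 5.0]})
print(predict(model, [1.0, 5.0]))  # 2*1 + 3*5 + 5 = 22.0
```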

Example

Query statement

* | with group_models as
(
    select group_id,
        linear_model(
            array_agg(array[x1, x2]),
            array_agg(y)
        ) as model
    from log
    group by group_id
)

select d.group_id,
    d.observation_id,
    d.y,
    linear_model_predict(m.model, array[d.x1, d.x2]) as predicted_y
from group_models as m
    join log as d
    on m.group_id = d.group_id

Query and analysis results

The value of predicted_y is calculated from the independent variables.

group_id	observation_id	y	predicted_y
A	S001	23.91700530543459	15.68867910570816
A	S002	6.931858878794941	15.352330987812993
...	...	...	...

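Online adaptive regression algorithm

When the online adaptive regression algorithm receives new data, it incrementally updates the regression model with that data. When processing large amounts of data, the algorithm offers more efficient computation and more cost-effective storage than batch algorithms. It is well suited to continuous profiling. After a sample is processed, the algorithm discards it, which makes the approach more practical and convenient.

As the algorithm incrementally computes the statistical features and the regression model, it automatically and exponentially decays the influence of historical samples on those features. As a result, the most recent samples keep a high weight, and the regression model can adapt to changes in the system.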
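The incremental, decayed statistics described above can be sketched in Python as follows. The sketch assumes that each sample is weighted by the decay base raised to its age, measured from the end of the batch window in time units. That weighting is an assumption, but it reproduces the wSum value and the first xSums value in the recent_regression sample output in this topic (window -4 to 0, decay base 0.999). The function name `decayed_sums` is hypothetical.

```python
def decayed_sums(samples, end_time, time_unit, damping):
    """Accumulate the decayed weight sum and the decayed sum of x, one sample at a time."""
    w_sum = 0.0
    x_sum = 0.0
    for sample_time, x in samples:
        age = (end_time - sample_time) / time_unit  # 0 for the newest sample
        w = damping ** age                          # assumed weighting rule
        w_sum += w
        x_sum += w * x
        # The raw sample is no longer needed after this point.
    return w_sum, x_sum


# (time_offset, x1) pairs of the five newest sample logs.
samples = [(0, 1), (-1, 2), (-2, 3), (-3, 4), (-4, 5)]
w_sum, x1_sum = decayed_sums(samples, end_time=0, time_unit=1, damping=0.999)
print(w_sum)   # approximately 4.990009995001
print(x1_sum)  # approximately 14.960044976005
```

Only running sums are kept, which is why the raw samples can be discarded after they are processed.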

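recent_regression function

The recent_regression function updates the parameters and state variables of a regression model in online mode based on recently collected data. The weights in the regression model are adjusted based on sample age: the importance of a sample decays exponentially as the sample gets older.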

varchar recent_regression(double y, array(double) x_array, double sample_time, double cur_batch_begin_period, double cur_batch_end_period, double time_unit, double unit_damping_weight)

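Parameter	Description
`y`	The dependent variable sample.
`x_array`	The array of independent variable samples.
`sample_time`	The time of the data in the sample row. The value is converted to a number.
`cur_batch_begin_period`	The start time of the time range of the data used for model training.
`cur_batch_end_period`	The end time of the time range of the data used for model training. The time range is a closed interval, expressed as `[cur_batch_begin_period, cur_batch_end_period]`.
`time_unit`	The time interval. The unit of the time interval is the same as the unit of the value specified by the `sample_time` parameter.
`unit_damping_weight`	The exponential decay base. When you configure this parameter together with the time_unit parameter, sample weights vary with time: each time the sample age increases by one time_unit, the sample weight decays by the fixed factor specified by unit_damping_weight. You can configure the parameter so that sample weights decay exponentially based on a half-life. For example, with a half-life of one day, data at the latest point in time has a weight of 1, data from one day earlier has a weight of 0.5, data from two days earlier has a weight of 0.25, and data from three days earlier has a weight of 0.125. The value of unit_damping_weight is calculated by using the following formula: unit_damping_weight = 2 ^ -(sample time interval / half-life).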
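The half-life formula for unit_damping_weight can be checked with a few lines of Python. The function names are hypothetical, and the sketch assumes that the weight of a sample equals unit_damping_weight raised to the number of time units that have elapsed since the latest point in time.

```python
def unit_damping_weight(time_unit, half_life):
    """Decay base per time unit: 2 ^ -(time_unit / half_life)."""
    return 2.0 ** (-(time_unit / half_life))


def sample_weight(age, time_unit, damping):
    """Weight of a sample that is `age` old (same unit as time_unit)."""
    return damping ** (age / time_unit)


# Time unit of one day and half-life of one day, as in the example above.
damping = unit_damping_weight(time_unit=1.0, half_life=1.0)
weights = [sample_weight(age, 1.0, damping) for age in (0, 1, 2, 3)]
print(damping)  # 0.5
print(weights)  # [1.0, 0.5, 0.25, 0.125]

# A longer half-life gives a decay base closer to 1, so old data fades more slowly.
weekly = unit_damping_weight(time_unit=1.0, half_life=7.0)
print(weekly)
```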

Example

Query statement

  * | select group_id,
        recent_regression(
          y, array[x1, x2, 1.0], -- The dependent and independent variable samples.
          time_offset, -- The point in time of the sample.
          -4,          -- The start time of the time range for the current sample batch.
          0,           -- The end time of the time range for the current sample batch.
          1,           -- The time interval.
          0.999        -- The exponential decay base.
        ) as reg_model
    from log
    where time_offset >= -4 and time_offset <= 0
    group by group_id


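Query and analysis results

The coefficients parameter in the query and analysis results indicates the coefficients of the linear regression model trained on the input data.

For data prediction, the return value of the recent_regression function is used as the input parameter of the recent_regression_predict function.

group_id	reg_model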

{
  "sampleCount": 5,
  "xCount": 3,
  "timeUnit": 1.0,
  "beginTimePeriod": -4.0,
  "endTimePeriod": 0.0,
  "unitDampingWeight": 0.999,
  "wSum": 4.990009995001,
  "ySumSquare": 1644.6974283836598,
  "ySum": 83.76770287757991,
  "xXSumSquares": [
    [
      54.830206884025,
      70.82220388003,
      14.960044976005001
    ],
    [
      70.82220388003,
      173.70327985603598,
      25.955043976006
    ],
    [
      14.960044976005001,
      25.955043976006,
      4.990009995001
    ]
  ],
  "xYSumProducts": [
    245.21187055562675,
    402.5070758759011,
    83.76770287757991
  ],
  "xSums": [
    14.960044976005001,
    25.955043976006,
    4.990009995001
  ],
  "xMeans": [
    2.997999000200801,
    5.201401199999158,
    1.0
  ],
  "xStdDevs": [
    1.4142126422148122,
    2.7848935986573244,
    0.0
  ],
  "xVariances": [
    1.9999973974002003,
    7.755632355842543,
    0.0
  ],
  "yMean": 16.78708118049834,
  "yStdDev": 6.913170639821401,
  "yVariance": 47.79192829528864,
  "xCorrelations": [
    [
      1.0,
      -0.35572473794248516,
      0.0
    ],
    [
      -0.35572473794248516,
      1.0,
      0.0
    ],
    [
      0.0,
      0.0,
      1.0
    ]
  ],
  "xYCorrelations": [
    -0.12142097167729436,
    -0.34560624507434407,
    0.0
  ],
  "coefficients": [
    -1.3675797278475395,
    -1.104969989478544,
    0.0,
    26.634476066516903
  ],
  "isBuilt": true,
  "isBuildSuccessful": true
}

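merge_recent_regression function

The merge_recent_regression function merges the parameters and state variables of the regression models returned by two calls to the recent_regression function. The result is the same as the parameters and state variables of a new regression model trained on the two datasets combined.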

varchar merge_recent_regression(varchar model_1_json, varchar model_2_json)

Parameter	Description
`model_1_json`	A return value of the recent_regression function. For more information, see "recent_regression function".
`model_2_json`	A return value of the recent_regression function. For more information, see "recent_regression function".
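One way to see why two models can be merged without retraining is that decayed sums are additive once the older batch is decayed further by the gap between the two window end times. The Python sketch below applies this idea to the weight sum only. The merge rule and the function name `merge_decayed` are assumptions for illustration, but with the windows used in this topic (-4 to 0 and -9 to -5, decay base 0.999) the rule reproduces the wSum, sampleCount, beginTimePeriod, and endTimePeriod values of the merged sample output.

```python
def merge_decayed(newer, older, damping, time_unit):
    """Merge decayed statistics of two batches; `newer` must end later."""
    gap = (newer["endTimePeriod"] - older["endTimePeriod"]) / time_unit
    scale = damping ** gap  # extra decay applied to the older batch
    return {
        "beginTimePeriod": min(newer["beginTimePeriod"], older["beginTimePeriod"]),
        "endTimePeriod": newer["endTimePeriod"],
        "sampleCount": newer["sampleCount"] + older["sampleCount"],
        "wSum": newer["wSum"] + scale * older["wSum"],
    }


damping = 0.999
# Each batch holds five samples aged 0..4 time units relative to its own end time.
batch_w_sum = sum(damping ** k for k in range(5))
model1 = {"beginTimePeriod": -4.0, "endTimePeriod": 0.0,
          "sampleCount": 5, "wSum": batch_w_sum}
model2 = {"beginTimePeriod": -9.0, "endTimePeriod": -5.0,
          "sampleCount": 5, "wSum": batch_w_sum}

merged = merge_decayed(model1, model2, damping, time_unit=1.0)
print(merged["wSum"])  # approximately 9.955119790251791
```

The other sums in the model (such as ySum and xSums) would be combined with the same scale factor under this assumption.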

Example

Query statement

* | with model1 as
(
    select group_id,
        recent_regression(
          y, array[x1, x2, 1.0], -- The dependent and independent variable samples.
          time_offset, -- The point in time of the sample.
          -4,          -- The start time of the time range for the current sample batch.
          0,           -- The end time of the time range for the current sample batch.
          1,           -- The time interval.
          0.999        -- The exponential decay base.
        ) as reg_model
    from log
    where time_offset >= -4 and time_offset <= 0
    group by group_id
),

model2 as
(
    select group_id,
        recent_regression(y, array[x1, x2, 1.0], time_offset, -9, -5, 1, 0.999) as reg_model
    from log
    where time_offset >= -9 and time_offset <= -5
    group by group_id
)

select m1.group_id,
    merge_recent_regression(m1.reg_model, m2.reg_model) as reg_model
from model1 as m1
    join model2 as m2
        on m1.group_id = m2.group_id

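Query and analysis results

The coefficients parameter in the query and analysis results indicates the coefficients of the linear regression model trained on the input data.

For data prediction, the return value of the merge_recent_regression function is used as the input parameter of the recent_regression_predict function.

group_id	reg_model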

{
  "sampleCount": 10,
  "xCount": 3,
  "timeUnit": 1.0,
  "beginTimePeriod": -9.0,
  "endTimePeriod": 0.0,
  "unitDampingWeight": 0.999,
  "wSum": 9.955119790251791,
  "ySumSquare": 4159.2626495224,
  "ySum": 193.9139516502596,
  "xXSumSquares": [
    [
      382.3684973894312,
      268.46629177582946,
      54.67098815430803
    ],
    [
      268.46629177582946,
      358.44803436913094,
      51.78255011892536
    ],
    [
      54.67098815430803,
      51.78255011892536,
      9.955119790251791
    ]
  ],
  "xYSumProducts": [
    1132.090921413269,
    919.4071924317548,
    193.9139516502596
  ],
  "xSums": [
    54.67098815430803,
    51.78255011892536,
    9.955119790251791
  ],
  "xMeans": [
    5.4917458861562585,
    5.201599901352432,
    1.0
  ],
  "xStdDevs": [
    2.8722740635191735,
    2.991614845817865,
    0.0
  ],
  "xVariances": [
    8.249958295964944,
    8.949759385717847,
    0.0
  ],
  "yMean": 19.478816502051856,
  "yStdDev": 6.1949232381571,
  "yVariance": 38.37707392665885,
  "xCorrelations": [
    [
      1.0,
      -0.1859947674356197,
      0.0
    ],
    [
      -0.1859947674356197,
      1.0,
      0.0
    ],
    [
      0.0,
      0.0,
      1.0
    ]
  ],
  "xYCorrelations": [
    0.3791693893070564,
    -0.4837793996174176,
    0.0
  ],
  "coefficients": [
    0.6460732812209116,
    -0.8864195347835274,
    0.0,
    20.541545982438304
  ],
  "isBuilt": true,
  "isBuildSuccessful": true
}

recent_regression_predict function

The recent_regression_predict function predicts data based on an adaptive regression model.

double recent_regression_predict(varchar model_json, array(double) x_sample)

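Parameter	Description
`model_json`	The return value of the recent_regression or merge_recent_regression function.
`x_sample`	The independent variable sample used to calculate the predicted value.

Example

Query statement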

* | with model1 as
(
    select group_id,
        recent_regression(
          y, array[x1, x2, 1.0], -- The dependent and independent variable samples.
          time_offset, -- The point in time of the sample.
          -4,          -- The start time of the time range for the current sample batch.
          0,           -- The end time of the time range for the current sample batch.
          1,           -- The time interval.
          0.999        -- The exponential decay base.
        ) as reg_model
    from log
    where time_offset >= -4 and time_offset <= 0
    group by group_id
),

model2 as
(
    select group_id,
        recent_regression(y, array[x1, x2, 1.0], time_offset, -9, -5, 1, 0.999) as reg_model
    from log
    where time_offset >= -9 and time_offset <= -5
    group by group_id
),

model as
(
    select m1.group_id,
        merge_recent_regression(m1.reg_model, m2.reg_model) as reg_model
    from model1 as m1
        join model2 as m2
            on m1.group_id = m2.group_id
),

new_data as
(
    select 'A' as group_id, 1 as obs_id, 3.0 as x1, 5.0 as x2, 1.0 as x3 union all
    select 'A' as group_id, 2 as obs_id, 7.0 as x1, 8.0 as x2, 1.0 as x3
)

select m.group_id,
    n.obs_id,
    recent_regression_predict(m.reg_model, array[n.x1, n.x2, 1.0]) as predicted_value
from model as m
    join new_data as n
        on m.group_id = n.group_id
order by m.group_id, n.obs_id

Query and analysis results

The value of predicted_value indicates the predicted value.

group_id	obs_id	predicted_value
A	1	17.489274877305804
A	2	22.3233353394362
