Time series analysis functions - Simple Log Service - Alibaba Cloud Documentation Center

Time series analysis functions are used to process metrics that are generated in Internet service systems or business operations. The metrics can be used to forecast future trends and identify anomalies during operations. Simple Log Service provides time series analysis functions. This topic describes the syntax of time series analysis functions. This topic also provides examples on how to use the functions.

Background information

The following figure shows the indexes of fields in the sample log that is used for the time series analysis functions. For more information, see Create indexes.

The following code shows the sample log:

{"metric_id":1,"metric_value":90.0,"time_period":"2024-01-01 00:00:00"}
{"metric_id":1,"metric_value":92.0,"time_period":"2024-01-01 00:01:00"}
{"metric_id":1,"metric_value":93.0,"time_period":"2024-01-01 00:02:00"}
{"metric_id":2,"metric_value":85.0,"time_period":"2024-01-01 00:00:00"}
{"metric_id":2,"metric_value":87.0,"time_period":"2024-01-01 00:01:00"}
{"metric_id":2,"metric_value":89.0,"time_period":"2024-01-01 00:02:00"}
{"metric_id":3,"metric_value":96.0,"time_period":"2024-01-01 00:00:00"}
{"metric_id":3,"metric_value":97.0,"time_period":"2024-01-01 00:01:00"}
{"metric_id":3,"metric_value":98.0,"time_period":"2024-01-01 00:02:00"}

Functions

Function	Syntax	Description	Data type of the return value

Function	Syntax	Description	Data type of the return value
ts_forecast function	ts_forecast(array(timestamp) time_periods, array(double) metric_values, timestamp data_begin_time, timestamp data_end_time, timestamp prediction_end_time, varchar time_unit, varchar options)	Forecasts the futuristic metric changes in a specific time range based on the time series that you specify and returns the results in the JSON format.	varchar
ts_extract_features or ts_extract_features_in_json function	ts_extract_features(array(double) period_unix_time_array, array(double) metric_value_array)	Extracts the features of the time series that you specify. The features can be used to identify machine models.	array(double)
ts_extract_features or ts_extract_features_in_json function	ts_extract_features_in_json(array(double) period_unix_time_array, array(double) metric_value_array)		varchar
ts_fill_missing or ts_impute function	ts_fill_missing(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option) ts_impute(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option) ts_fill_nan(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option)	Fills data at the missing points in time.	array(array(double))
ts_analyze_cycles function	ts_analyze_cycles(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option)	Analyzes the approximate cycles of the time series that you specify.	varchar

ts_forecast function

The ts_forecast function forecasts the futuristic metric changes in a specific time range based on the time series that you specify and returns the results in the JSON format.

varchar ts_forecast(array(varchar) time_periods, array(double) metric_values, varchar data_begin_time, varchar data_end_time, timestamp prediction_end_time, varchar time_unit, varchar options)

Parameter	Description

Parameter	Description
`time_periods`	An array of time values in the time series. Supported data types of the parameter: The timestamp type. The varchar type. Example: `2024-01-01 10:00:00`. The double type. You must specify a UNIX time value of the double type.
`metric_values`	An array of metric values in the time series. The function focuses on data points with non-zero metric values. You must specify a time range to fill missing data.
`data_begin_time`	The start time of the time range for the time series. Supported data types of the parameter: The timestamp type. The varchar type. Example: `2024-01-01 10:00:00`. The double type. You must specify a UNIX time value of the double type.
`data_end_time`	The end time of the time range for the time series. Supported data types of the parameter: The timestamp type. The varchar type. Example: `2024-01-01 10:00:00`. The double type. You must specify a UNIX time value of the double type. Note The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the `data_begin_time` parameter is included, but the end time specified by the `data_end_time` parameter is not included.
`prediction_end_time`	The end time of forecasting. The end time is not included. Supported data types of the parameter: The timestamp type. The double type. You must specify a UNIX time value of the double type.
`time_unit`	The time interval of the time series. Examples: `10 second`, `1 minute`, `1 hour`, `1 day`, `1 week`, `1 month`, and `1 year`.
`options`	The data filling method and the settings on whether to export historical data. This parameter is optional. Data filling methods: value mean median min max local-interpolation Whether to export historical data: true false Sample settings: fillnan=value:0,outputhist=false: uses 0 to fill missing data and does not export historical data. fillnan=mean,outputhist=false: uses the average value to fill missing data and does not export historical data. fillnan=local-interpolation,outputhist=false: uses the local interpolation method to fill missing data and does not export historical data.

Examples

Query statement

* | select metric_id,
    ts_forecast(
      array_agg(time_period),  -- time_period array
      array_agg(metric_value), -- metric_value array
      '2024-01-01 00:00:00',   -- sample start time (inclusive)
      '2024-01-01 00:05:00',   -- sample end time (exclusive)
      '2024-01-01 01:00:00',   -- prediction end time (exclusive)
      '1 miute',               -- time unit
      'avg'                
    ) as forecast_outcome
    from log
where time_period >= '2024-01-01 00:00:00'
    and time_period < '2024-01-01 00:06:00'
group by metric_id

Query and analysis results

{
   "predictions":[
      {
         "timePeriod":1704067200,
         "prediction":59.2251832913215,
         "actualValue":"NaN",
         "upperBoundBy3StdDev":84.66650365159649,
         "lowerBoundBy3StdDev":33.78386293104651,
         "isForecast":true
      },
      {
         "timePeriod":1704067260,
         "prediction":57.62225491132578,
         "actualValue":"NaN",
         "upperBoundBy3StdDev":83.06357527160077,
         "lowerBoundBy3StdDev":32.18093455105079,
         "isForecast":true
      },
      ...
   ],
   "isHistoryIncluded":false,
   "errorSummary":{
      "meanAbsError":6.359932652686714,
      "meanAbsPercentageError":0.08384313053350512,
      "rootMeanSquareError":8.480440120091664,
      "standardDeviation":8.480440120091664,
      "meanError":-0.31236100941915534,
      "sampleSize":10080,
      "targetMean":97.94062488445213,
      "targetStdDev":11.186867641715377,
      "rsquare":0.92532833455449703,
      "correlation":0.961939880946048
   }
}

Parameters in the query and analysis results

Parameter	Description

Parameter	Description
`predictions`	The forecasting results. The value is an array that consists of multiple structs. Each struct indicates the forecasting results at a specific point in time. The `data_end_time` and `prediction_end_time` parameters specify the time range of forecasting, which is a left-open, right-closed interval. If you specify `outputhist=true` for the options parameter, the array that is returned includes historical data. In this case, the time range of the output data is `[data_begin_time, prediction_end_time)`. The following list describes the parameters in each struct: `timePeriod`: the point in time of forecasting. `prediction`: the forecast value of the metric. `actualValue`: the actual value of the metric. For the time range of [data_end_time, prediction_end_time), the value of the actualValue parameter is always NaN because no actual values are generated for the metric within the time range. If you specify `outputhist=true` for the options parameter, the array that is returned for the `predictions` parameter includes historical data within the time range of [data_begin_time, data_end_time). In this case, the value of the `actualValue` parameter is an actual value of the metric in the historical data. `upperBoundBy3StdDev` and `lowerBoundBy3StdDev`: the upper limit and lower limit of the forecast interval, which is within the three standard deviations of forecast errors. The training for the forecasting is centered on the value of the prediction parameter. In most cases, a three standard deviation interval represents a 99% confidence interval. The actual values collected in the future have a 99% probability of falling within the forecast interval. You can also calculate a custom confidence interval based on the values of the prediction and errorSummary.standardDeviation parameters. For example, you can use a two standard deviation interval. However, if an interval is narrow, the confidence level decreases. A two standard deviation interval represents only a 95% confidence interval. `isForecast`: specifies whether the time of the data point is futuristic or historical. The function compares the time with the value of the `data_end_time` parameter to determine whether the time is futuristic or historical. If the time is less than the value of the `data_end_time` parameter, the function returns false, which indicates that the time is historical. Otherwise, the function returns true, which indicates that the time is futuristic.
`isHistoryIncluded`	Specifies whether the value of the `predictions` parameter includes historical data. Valid values: true false
`errorSummary`	The summary of forecast error statistics that are collected when the forecast model is tested based on historical data. The forecast model is tested based on historical data. In metric forecasting, differences between forecast values and actual values result in forecast errors. You can evaluate the overall accuracy of forecasting by performing forecasting on historical data and collecting related statistics. `meanAbsError`: the mean absolute error (MAE) of the forecast errors. `meanAbsPercentageError`: the mean absolute percentage error (MAPE) of the forecast errors. `rootMeanSquareError`: the mean squared error (MSE) of the forecast errors. `standardDeviation`: the standard deviation of the forecast errors. The value indicates how much the data points tend to deviate from the mean in the forecasting. `meanError`: the average of the forecast errors. `sampleSize`: the number of historical data samples. `targetMean`: the average of the historical data samples. `targetStdDev`: the standard deviation of the historical data samples. `rsquare`: the R-squared value. The value is a metric used to measure the forecasting accuracy of a regression model that is tested based on historical data. The value must fall within the range of `(0,1)`. A value closer to 1 indicates higher forecasting accuracy. `correlation`: the square root of the value of the `rsquare` parameter.

ts_extract_features or ts_extract_features_in_json function

The ts_extract_features or ts_extract_features_in_json function extracts the features of the time series that you specify. The features can be used to identify machine models.

array(double) ts_extract_features(array(double) period_unix_time_array, array(double) metric_value_array)

varchar ts_extract_features_in_json(array(double) period_unix_time_array, array(double) metric_value_array)

Parameter	Description

Parameter	Description
`period_unix_time_array`	An array of UNIX time values in the time series.
`metric_value_array`	An array of metric values in the time series. Note The `period_unix_time_array` and `metric_value_array` parameters are used to describe the same time series.

Examples

Example 1

Query statement

* | select metric_id,
    ts_extract_features(
      array_agg(to_unixtime(time_period)),
      array_agg(metric_value)) as feature_values
      group by metric_id

Query and analysis results
The value of the feature_values parameter is an array of generated feature values.
metric_id
feature_values
metric_id
feature_values
1
[0.123456, ...]

Example 2

Query statement

* | select metric_id,
    ts_extract_features_in_json(
      array_agg(to_unixtime(time_period)),
      array_agg(metric_value)) as feature_values
      group by metric_id

Query and analysis results

names: an array of feature vector names.
values: an array of feature vector values.

metric_id	feature_values

metric_id	feature_values
1	{"values":[1.0,0.0,0.0,0.0,645.0,41685.0,1.0,1.0,0.0,64.5,64.5,10.0,2.8722813232690143],"names":["value__variance_larger_than_standard_deviation","value__has_duplicate_max","value__has_duplicate_min","value__has_duplicate","value__sum_values","value__abs_energy","value__mean_abs_change","value__mean_change"]

ts_fill_missing or ts_impute function

The ts_fill_missing or ts_impute function fills data at the missing points in time.

array(array(double)) ts_fill_missing(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  double end_unixtime, varchar time_unit, varchar fill_option)

array(array(double)) ts_impute(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  double end_unixtime, varchar time_unit, varchar fill_option)

array(array(double)) ts_fill_nan(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  double end_unixtime, varchar time_unit, varchar fill_option)

Parameter	Description

Parameter	Description
`unixtime_periods`	An array of time values in the time series. The array contains UNIX time values of the double type.
`metric_values`	An array of metric values in the time series. The value is of the double type.
`begin_unixtime`	The start time of the time range for the time series. The parameter can be a UNIX time value of the double type.
`end_unixtime`	The end time of the time range for the time series. Note The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the `begin_unixtime` parameter is included, but the end time specified by the `end_unixtime` parameter is not included.
`time_unit`	The time interval of the time series. Examples: `10 second`, `1 minute`, `1 hour`, `1 day`, `1 week`, `1 month`, and `1 year`.
fill_option	The data filling method and the settings on whether to export historical data. This parameter is optional. Data filling methods: value mean median min max local-interpolation Whether to export historical data: true false Sample settings: fillnan=value:0,outputhist=false: uses 0 to fill missing data and does not export historical data. fillnan=mean,outputhist=false: uses the average value to fill missing data and does not export historical data. fillnan=local-interpolation,outputhist=false: uses the local interpolation method to fill missing data and does not export historical data.

Examples

Query statement

* | select metric_id,
        ts_fill_missing(
            array_agg(to_unixtime(time_period)), -- time periods
            array_agg(metric_value),             -- metric values
            to_unixtime(timestamp '2024-01-01 00:00:00'), -- series begin time
            to_unixtime(timestamp '2024-01-01 00:05:00'), -- series end time
            '1 minute',                          -- series time unit
            'value=0'                 -- fill the missing values with 0
        ) as imputed_time_series
        from log
    where time_period >= '2024-01-01 00:00:00'
      and time_period < '2024-01-01 00:05:00'
    group by metric_id

Query and analysis results

metric_id	imputed_time_series

metric_id	imputed_time_series
1	[[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[60.0,61.0,62.0,63.0,64.0]]
2	[[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[70.0,71.0,72.0,73.0,74.0]]
3	[[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[80.0,81.0,82.0,83.0,84.0]]

ts_analyze_cycles function

The ts_analyze_cycles function analyzes the approximate cycles of the time series that you specify.

varchar ts_analyze_cycles(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  end_unixtime, varchar time_unit, varchar fill_option)

Parameter	Description

Parameter	Description
`unixtime_periods`	An array of time values in the time series. The array contains UNIX time values of the double type.
`metric_values`	An array of metric values in the time series. The value is of the double type.
`begin_unixtime`	The start time of the time range for the time series. The parameter can be a UNIX time value of the double type.
`end_unixtime`	The end time of the time range for the time series. Note The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the `begin_unixtime` parameter is included, but the end time specified by the `end_unixtime` parameter is not included.
`time_unit`	The time interval of the time series. Examples: `10 second`, `1 minute`, `1 hour`, `1 day`, `1 week`, `1 month`, and `1 year`.
fill_option	The data filling method and the settings on whether to export historical data. Data filling methods: value mean median min max local-interpolation Whether to export historical data: true false Sample settings: fillnan=value:0,outputhist=false: uses 0 to fill missing data and does not export historical data. fillnan=mean,outputhist=false: uses the average value to fill missing data and does not export historical data. fillnan=local-interpolation,outputhist=false: uses the local interpolation method to fill missing data and does not export historical data.

Examples

Query statement

* | select metric_id,
        ts_analyze_cycles(
            array_agg(to_unixtime(time_period)), -- time periods
            array_agg(metric_value),             -- metric values
            to_unixtime(timestamp '2024-01-01 00:00:00'), -- series begin time
            to_unixtime(timestamp '2024-01-01 00:05:00'), -- series end time
            '1 minute',                          -- series time unit
            'value=0'                 -- fill the missing values with 0
        ) as imputed_time_series
        from log
    where time_period >= '2024-01-01 00:00:00'
      and time_period < '2024-01-01 00:05:00'
    group by metric_id

Query and analysis results

metric_id	cycle_analysis

metric_id

cycle_analysis

{"cycleSize":66,"cycleCorrelation":0.9859745299608902,"frequencyWeight":2.8240516697103124E7,"topCycles":[{"cycleSize":66,"cycleCorrelation":0.9859745299608902,"frequencyWeight":2.8240516697103124E7},{"cycleSize":72,"cycleCorrelation":0.9750729127628865,"frequencyWeight":9538762.646349236},{"cycleSize":22,"cycleCorrelation":0.9628902341901314,"frequencyWeight":1.5834018306832194E7},{"cycleSize":33,"cycleCorrelation":0.9518752073734051,"frequencyWeight":1.1270983504364485E7}]}

The following table describes the parameters in the query and analysis results.

Parameter	Description

Parameter	Description
`cycleSize`	The length of the most approximate cycle.
`cycleCorrelation`	The correlation of the most approximate cycle.
`frequencyWeight`	The spectral density of the cycle after the power spectrum analysis of the time series is performed.
`topCycles`	The data of cycles that are involved in the multi-cycle analysis of the time series. Up to first five cycles can be returned. Each cycle contains the `cycleSize`, `cycleCorrelation`, and `frequencyWeight` fields.

metric_id	feature_values
1	[0.123456, ...]