All Products
Search
Document Center

Simple Log Service:Time series analysis functions

Last Updated:Oct 20, 2024

Time series analysis functions are used to process metrics that are generated in Internet service systems or business operations. The metrics can be used to forecast future trends and identify anomalies during operations. Simple Log Service provides time series analysis functions. This topic describes the syntax of time series analysis functions. This topic also provides examples on how to use the functions.

Background information

  • The following figure shows the indexes of fields in the sample log that is used for the time series analysis functions. For more information, see Create indexes.

    image

  • The following code shows the sample log:

    {"metric_id":1,"metric_value":90.0,"time_period":"2024-01-01 00:00:00"}
    {"metric_id":1,"metric_value":92.0,"time_period":"2024-01-01 00:01:00"}
    {"metric_id":1,"metric_value":93.0,"time_period":"2024-01-01 00:02:00"}
    {"metric_id":2,"metric_value":85.0,"time_period":"2024-01-01 00:00:00"}
    {"metric_id":2,"metric_value":87.0,"time_period":"2024-01-01 00:01:00"}
    {"metric_id":2,"metric_value":89.0,"time_period":"2024-01-01 00:02:00"}
    {"metric_id":3,"metric_value":96.0,"time_period":"2024-01-01 00:00:00"}
    {"metric_id":3,"metric_value":97.0,"time_period":"2024-01-01 00:01:00"}
    {"metric_id":3,"metric_value":98.0,"time_period":"2024-01-01 00:02:00"}

Functions

Function

Syntax

Description

Data type of the return value

ts_forecast function

ts_forecast(array(timestamp) time_periods, array(double) metric_values, timestamp data_begin_time, timestamp data_end_time, timestamp prediction_end_time, varchar time_unit, varchar options)

Forecasts the futuristic metric changes in a specific time range based on the time series that you specify and returns the results in the JSON format.

varchar

ts_extract_features or ts_extract_features_in_json function

ts_extract_features(array(double) period_unix_time_array, array(double) metric_value_array)

Extracts the features of the time series that you specify. The features can be used to identify machine models.

array(double)

ts_extract_features_in_json(array(double) period_unix_time_array, array(double) metric_value_array)

varchar

ts_fill_missing or ts_impute function

  • ts_fill_missing(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option)

  • ts_impute(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option)

  • ts_fill_nan(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option)

Fills data at the missing points in time.

array(array(double))

ts_analyze_cycles function

ts_analyze_cycles(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option)

Analyzes the approximate cycles of the time series that you specify.

varchar

ts_forecast function

The ts_forecast function forecasts the futuristic metric changes in a specific time range based on the time series that you specify and returns the results in the JSON format.

varchar ts_forecast(array(varchar) time_periods, array(double) metric_values, varchar data_begin_time, varchar data_end_time, timestamp prediction_end_time, varchar time_unit, varchar options)

Parameter

Description

time_periods

An array of time values in the time series.

Supported data types of the parameter:

  • The timestamp type.

  • The varchar type. Example: 2024-01-01 10:00:00.

  • The double type. You must specify a UNIX time value of the double type.

metric_values

An array of metric values in the time series. The function focuses on data points with non-zero metric values. You must specify a time range to fill missing data.

data_begin_time

The start time of the time range for the time series.

Supported data types of the parameter:

  • The timestamp type.

  • The varchar type. Example: 2024-01-01 10:00:00.

  • The double type. You must specify a UNIX time value of the double type.

data_end_time

The end time of the time range for the time series.

Supported data types of the parameter:

  • The timestamp type.

  • The varchar type. Example: 2024-01-01 10:00:00.

  • The double type. You must specify a UNIX time value of the double type.

Note

The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the data_begin_time parameter is included, but the end time specified by the data_end_time parameter is not included.

prediction_end_time

The end time of forecasting. The end time is not included.

Supported data types of the parameter:

  1. The timestamp type.

  2. The double type. You must specify a UNIX time value of the double type.

time_unit

The time interval of the time series. Examples: 10 second, 1 minute, 1 hour, 1 day, 1 week, 1 month, and 1 year.

options

The data filling method and the settings on whether to export historical data. This parameter is optional.

  • Data filling methods:

    • value

    • mean

    • median

    • min

    • max

    • local-interpolation

  • Whether to export historical data:

    • true

    • false

  • Sample settings:

    • fillnan=value:0,outputhist=false: uses 0 to fill missing data and does not export historical data.

    • fillnan=mean,outputhist=false: uses the average value to fill missing data and does not export historical data.

    • fillnan=local-interpolation,outputhist=false: uses the local interpolation method to fill missing data and does not export historical data.

Examples

  • Query statement

    * | select metric_id,
        ts_forecast(
          array_agg(time_period),  -- time_period array
          array_agg(metric_value), -- metric_value array
          '2024-01-01 00:00:00',   -- sample start time (inclusive)
          '2024-01-01 00:05:00',   -- sample end time (exclusive)
          '2024-01-01 01:00:00',   -- prediction end time (exclusive)
          '1 miute',               -- time unit
          'avg'                
        ) as forecast_outcome
        from log
    where time_period >= '2024-01-01 00:00:00'
        and time_period < '2024-01-01 00:06:00'
    group by metric_id
  • Query and analysis results

    {
       "predictions":[
          {
             "timePeriod":1704067200,
             "prediction":59.2251832913215,
             "actualValue":"NaN",
             "upperBoundBy3StdDev":84.66650365159649,
             "lowerBoundBy3StdDev":33.78386293104651,
             "isForecast":true
          },
          {
             "timePeriod":1704067260,
             "prediction":57.62225491132578,
             "actualValue":"NaN",
             "upperBoundBy3StdDev":83.06357527160077,
             "lowerBoundBy3StdDev":32.18093455105079,
             "isForecast":true
          },
          ...
       ],
       "isHistoryIncluded":false,
       "errorSummary":{
          "meanAbsError":6.359932652686714,
          "meanAbsPercentageError":0.08384313053350512,
          "rootMeanSquareError":8.480440120091664,
          "standardDeviation":8.480440120091664,
          "meanError":-0.31236100941915534,
          "sampleSize":10080,
          "targetMean":97.94062488445213,
          "targetStdDev":11.186867641715377,
          "rsquare":0.92532833455449703,
          "correlation":0.961939880946048
       }
    }

    Parameters in the query and analysis results

    Parameter

    Description

    predictions

    The forecasting results. The value is an array that consists of multiple structs. Each struct indicates the forecasting results at a specific point in time. The data_end_time and prediction_end_time parameters specify the time range of forecasting, which is a left-open, right-closed interval. If you specify outputhist=true for the options parameter, the array that is returned includes historical data. In this case, the time range of the output data is [data_begin_time, prediction_end_time).

    The following list describes the parameters in each struct:

    • timePeriod: the point in time of forecasting.

    • prediction: the forecast value of the metric.

    • actualValue: the actual value of the metric. For the time range of [data_end_time, prediction_end_time), the value of the actualValue parameter is always NaN because no actual values are generated for the metric within the time range. If you specify outputhist=true for the options parameter, the array that is returned for the predictions parameter includes historical data within the time range of [data_begin_time, data_end_time). In this case, the value of the actualValue parameter is an actual value of the metric in the historical data.

    • upperBoundBy3StdDev and lowerBoundBy3StdDev: the upper limit and lower limit of the forecast interval, which is within the three standard deviations of forecast errors. The training for the forecasting is centered on the value of the prediction parameter. In most cases, a three standard deviation interval represents a 99% confidence interval. The actual values collected in the future have a 99% probability of falling within the forecast interval. You can also calculate a custom confidence interval based on the values of the prediction and errorSummary.standardDeviation parameters. For example, you can use a two standard deviation interval. However, if an interval is narrow, the confidence level decreases. A two standard deviation interval represents only a 95% confidence interval.

    • isForecast: specifies whether the time of the data point is futuristic or historical. The function compares the time with the value of the data_end_time parameter to determine whether the time is futuristic or historical. If the time is less than the value of the data_end_time parameter, the function returns false, which indicates that the time is historical. Otherwise, the function returns true, which indicates that the time is futuristic.

    isHistoryIncluded

    Specifies whether the value of the predictions parameter includes historical data. Valid values:

    • true

    • false

    errorSummary

    The summary of forecast error statistics that are collected when the forecast model is tested based on historical data. The forecast model is tested based on historical data. In metric forecasting, differences between forecast values and actual values result in forecast errors. You can evaluate the overall accuracy of forecasting by performing forecasting on historical data and collecting related statistics.

    • meanAbsError: the mean absolute error (MAE) of the forecast errors.

    • meanAbsPercentageError: the mean absolute percentage error (MAPE) of the forecast errors.

    • rootMeanSquareError: the mean squared error (MSE) of the forecast errors.

    • standardDeviation: the standard deviation of the forecast errors. The value indicates how much the data points tend to deviate from the mean in the forecasting.

    • meanError: the average of the forecast errors.

    • sampleSize: the number of historical data samples.

    • targetMean: the average of the historical data samples.

    • targetStdDev: the standard deviation of the historical data samples.

    • rsquare: the R-squared value. The value is a metric used to measure the forecasting accuracy of a regression model that is tested based on historical data. The value must fall within the range of (0,1). A value closer to 1 indicates higher forecasting accuracy.

    • correlation: the square root of the value of the rsquare parameter.

ts_extract_features or ts_extract_features_in_json function

The ts_extract_features or ts_extract_features_in_json function extracts the features of the time series that you specify. The features can be used to identify machine models.

array(double) ts_extract_features(array(double) period_unix_time_array, array(double) metric_value_array)

Or

varchar ts_extract_features_in_json(array(double) period_unix_time_array, array(double) metric_value_array)

Parameter

Description

period_unix_time_array

An array of UNIX time values in the time series.

metric_value_array

An array of metric values in the time series.

Note

The period_unix_time_array and metric_value_array parameters are used to describe the same time series.

Examples

Example 1

  • Query statement

    * | select metric_id,
        ts_extract_features(
          array_agg(to_unixtime(time_period)),
          array_agg(metric_value)) as feature_values
          group by metric_id
  • Query and analysis results

    The value of the feature_values parameter is an array of generated feature values.

    metric_id

    feature_values

    1

    [0.123456, ...]

Example 2

  • Query statement

    * | select metric_id,
        ts_extract_features_in_json(
          array_agg(to_unixtime(time_period)),
          array_agg(metric_value)) as feature_values
          group by metric_id
  • Query and analysis results

    • names: an array of feature vector names.

    • values: an array of feature vector values.

    metric_id

    feature_values

    1

    {"values":[1.0,0.0,0.0,0.0,645.0,41685.0,1.0,1.0,0.0,64.5,64.5,10.0,2.8722813232690143],"names":["value__variance_larger_than_standard_deviation","value__has_duplicate_max","value__has_duplicate_min","value__has_duplicate","value__sum_values","value__abs_energy","value__mean_abs_change","value__mean_change"]

ts_fill_missing or ts_impute function

The ts_fill_missing or ts_impute function fills data at the missing points in time.

array(array(double)) ts_fill_missing(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  double end_unixtime, varchar time_unit, varchar fill_option)

Or

array(array(double)) ts_impute(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  double end_unixtime, varchar time_unit, varchar fill_option)

Or

array(array(double)) ts_fill_nan(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  double end_unixtime, varchar time_unit, varchar fill_option)

Parameter

Description

unixtime_periods

An array of time values in the time series. The array contains UNIX time values of the double type.

metric_values

An array of metric values in the time series. The value is of the double type.

begin_unixtime

The start time of the time range for the time series. The parameter can be a UNIX time value of the double type.

end_unixtime

The end time of the time range for the time series.

Note

The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the begin_unixtime parameter is included, but the end time specified by the end_unixtime parameter is not included.

time_unit

The time interval of the time series. Examples: 10 second, 1 minute, 1 hour, 1 day, 1 week, 1 month, and 1 year.

fill_option

The data filling method and the settings on whether to export historical data. This parameter is optional.

  • Data filling methods:

    • value

    • mean

    • median

    • min

    • max

    • local-interpolation

  • Whether to export historical data:

    • true

    • false

  • Sample settings:

    • fillnan=value:0,outputhist=false: uses 0 to fill missing data and does not export historical data.

    • fillnan=mean,outputhist=false: uses the average value to fill missing data and does not export historical data.

    • fillnan=local-interpolation,outputhist=false: uses the local interpolation method to fill missing data and does not export historical data.

Examples

  • Query statement

    * | select metric_id,
            ts_fill_missing(
                array_agg(to_unixtime(time_period)), -- time periods
                array_agg(metric_value),             -- metric values
                to_unixtime(timestamp '2024-01-01 00:00:00'), -- series begin time
                to_unixtime(timestamp '2024-01-01 00:05:00'), -- series end time
                '1 minute',                          -- series time unit
                'value=0'                 -- fill the missing values with 0
            ) as imputed_time_series
            from log
        where time_period >= '2024-01-01 00:00:00'
          and time_period < '2024-01-01 00:05:00'
        group by metric_id
  • Query and analysis results

    metric_id

    imputed_time_series

    1

    [[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[60.0,61.0,62.0,63.0,64.0]]

    2

    [[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[70.0,71.0,72.0,73.0,74.0]]

    3

    [[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[80.0,81.0,82.0,83.0,84.0]]

ts_analyze_cycles function

The ts_analyze_cycles function analyzes the approximate cycles of the time series that you specify.

varchar ts_analyze_cycles(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime,  end_unixtime, varchar time_unit, varchar fill_option)

Parameter

Description

unixtime_periods

An array of time values in the time series. The array contains UNIX time values of the double type.

metric_values

An array of metric values in the time series. The value is of the double type.

begin_unixtime

The start time of the time range for the time series. The parameter can be a UNIX time value of the double type.

end_unixtime

The end time of the time range for the time series.

Note

The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the begin_unixtime parameter is included, but the end time specified by the end_unixtime parameter is not included.

time_unit

The time interval of the time series. Examples: 10 second, 1 minute, 1 hour, 1 day, 1 week, 1 month, and 1 year.

fill_option

The data filling method and the settings on whether to export historical data.

  • Data filling methods:

    • value

    • mean

    • median

    • min

    • max

    • local-interpolation

  • Whether to export historical data:

    • true

    • false

  • Sample settings:

    • fillnan=value:0,outputhist=false: uses 0 to fill missing data and does not export historical data.

    • fillnan=mean,outputhist=false: uses the average value to fill missing data and does not export historical data.

    • fillnan=local-interpolation,outputhist=false: uses the local interpolation method to fill missing data and does not export historical data.

Examples

  • Query statement

    * | select metric_id,
            ts_analyze_cycles(
                array_agg(to_unixtime(time_period)), -- time periods
                array_agg(metric_value),             -- metric values
                to_unixtime(timestamp '2024-01-01 00:00:00'), -- series begin time
                to_unixtime(timestamp '2024-01-01 00:05:00'), -- series end time
                '1 minute',                          -- series time unit
                'value=0'                 -- fill the missing values with 0
            ) as imputed_time_series
            from log
        where time_period >= '2024-01-01 00:00:00'
          and time_period < '2024-01-01 00:05:00'
        group by metric_id
  • Query and analysis results

    metric_id

    cycle_analysis

    1

    {"cycleSize":66,"cycleCorrelation":0.9859745299608902,"frequencyWeight":2.8240516697103124E7,"topCycles":[{"cycleSize":66,"cycleCorrelation":0.9859745299608902,"frequencyWeight":2.8240516697103124E7},{"cycleSize":72,"cycleCorrelation":0.9750729127628865,"frequencyWeight":9538762.646349236},{"cycleSize":22,"cycleCorrelation":0.9628902341901314,"frequencyWeight":1.5834018306832194E7},{"cycleSize":33,"cycleCorrelation":0.9518752073734051,"frequencyWeight":1.1270983504364485E7}]}

    The following table describes the parameters in the query and analysis results.

    Parameter

    Description

    cycleSize

    The length of the most approximate cycle.

    cycleCorrelation

    The correlation of the most approximate cycle.

    frequencyWeight

    The spectral density of the cycle after the power spectrum analysis of the time series is performed.

    topCycles

    The data of cycles that are involved in the multi-cycle analysis of the time series. Up to first five cycles can be returned. Each cycle contains the cycleSize, cycleCorrelation, and frequencyWeight fields.