Time series analysis functions are used to process metrics that are generated in Internet service systems or business operations. The metrics can be used to forecast future trends and identify anomalies during operations. Simple Log Service provides time series analysis functions. This topic describes the syntax of time series analysis functions. This topic also provides examples on how to use the functions.
Background information
The following figure shows the indexes of fields in the sample log that is used for the time series analysis functions. For more information, see Create indexes.
The following code shows the sample log:
{"metric_id":1,"metric_value":90.0,"time_period":"2024-01-01 00:00:00"} {"metric_id":1,"metric_value":92.0,"time_period":"2024-01-01 00:01:00"} {"metric_id":1,"metric_value":93.0,"time_period":"2024-01-01 00:02:00"} {"metric_id":2,"metric_value":85.0,"time_period":"2024-01-01 00:00:00"} {"metric_id":2,"metric_value":87.0,"time_period":"2024-01-01 00:01:00"} {"metric_id":2,"metric_value":89.0,"time_period":"2024-01-01 00:02:00"} {"metric_id":3,"metric_value":96.0,"time_period":"2024-01-01 00:00:00"} {"metric_id":3,"metric_value":97.0,"time_period":"2024-01-01 00:01:00"} {"metric_id":3,"metric_value":98.0,"time_period":"2024-01-01 00:02:00"}
Functions
Function | Syntax | Description | Data type of the return value |
ts_forecast(array(timestamp) time_periods, array(double) metric_values, timestamp data_begin_time, timestamp data_end_time, timestamp prediction_end_time, varchar time_unit, varchar options) | Forecasts the futuristic metric changes in a specific time range based on the time series that you specify and returns the results in the JSON format. | varchar | |
ts_extract_features(array(double) period_unix_time_array, array(double) metric_value_array) | Extracts the features of the time series that you specify. The features can be used to identify machine models. | array(double) | |
ts_extract_features_in_json(array(double) period_unix_time_array, array(double) metric_value_array) | varchar | ||
| Fills data at the missing points in time. | array(array(double)) | |
ts_analyze_cycles(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option) | Analyzes the approximate cycles of the time series that you specify. | varchar |
ts_forecast function
The ts_forecast function forecasts the futuristic metric changes in a specific time range based on the time series that you specify and returns the results in the JSON format.
varchar ts_forecast(array(varchar) time_periods, array(double) metric_values, varchar data_begin_time, varchar data_end_time, timestamp prediction_end_time, varchar time_unit, varchar options)
Parameter | Description |
| An array of time values in the time series. Supported data types of the parameter:
|
| An array of metric values in the time series. The function focuses on data points with non-zero metric values. You must specify a time range to fill missing data. |
| The start time of the time range for the time series. Supported data types of the parameter:
|
| The end time of the time range for the time series. Supported data types of the parameter:
Note The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the |
| The end time of forecasting. The end time is not included. Supported data types of the parameter:
|
| The time interval of the time series. Examples: |
| The data filling method and the settings on whether to export historical data. This parameter is optional.
|
Examples
Query statement
* | select metric_id, ts_forecast( array_agg(time_period), -- time_period array array_agg(metric_value), -- metric_value array '2024-01-01 00:00:00', -- sample start time (inclusive) '2024-01-01 00:05:00', -- sample end time (exclusive) '2024-01-01 01:00:00', -- prediction end time (exclusive) '1 miute', -- time unit 'avg' ) as forecast_outcome from log where time_period >= '2024-01-01 00:00:00' and time_period < '2024-01-01 00:06:00' group by metric_id
Query and analysis results
{ "predictions":[ { "timePeriod":1704067200, "prediction":59.2251832913215, "actualValue":"NaN", "upperBoundBy3StdDev":84.66650365159649, "lowerBoundBy3StdDev":33.78386293104651, "isForecast":true }, { "timePeriod":1704067260, "prediction":57.62225491132578, "actualValue":"NaN", "upperBoundBy3StdDev":83.06357527160077, "lowerBoundBy3StdDev":32.18093455105079, "isForecast":true }, ... ], "isHistoryIncluded":false, "errorSummary":{ "meanAbsError":6.359932652686714, "meanAbsPercentageError":0.08384313053350512, "rootMeanSquareError":8.480440120091664, "standardDeviation":8.480440120091664, "meanError":-0.31236100941915534, "sampleSize":10080, "targetMean":97.94062488445213, "targetStdDev":11.186867641715377, "rsquare":0.92532833455449703, "correlation":0.961939880946048 } }
Parameters in the query and analysis results
Parameter
Description
predictions
The forecasting results. The value is an array that consists of multiple structs. Each struct indicates the forecasting results at a specific point in time. The
data_end_time
andprediction_end_time
parameters specify the time range of forecasting, which is a left-open, right-closed interval. If you specifyoutputhist=true
for the options parameter, the array that is returned includes historical data. In this case, the time range of the output data is[data_begin_time, prediction_end_time)
.The following list describes the parameters in each struct:
timePeriod
: the point in time of forecasting.prediction
: the forecast value of the metric.actualValue
: the actual value of the metric. For the time range of [data_end_time, prediction_end_time), the value of the actualValue parameter is always NaN because no actual values are generated for the metric within the time range. If you specifyoutputhist=true
for the options parameter, the array that is returned for thepredictions
parameter includes historical data within the time range of [data_begin_time, data_end_time). In this case, the value of theactualValue
parameter is an actual value of the metric in the historical data.upperBoundBy3StdDev
andlowerBoundBy3StdDev
: the upper limit and lower limit of the forecast interval, which is within the three standard deviations of forecast errors. The training for the forecasting is centered on the value of the prediction parameter. In most cases, a three standard deviation interval represents a 99% confidence interval. The actual values collected in the future have a 99% probability of falling within the forecast interval. You can also calculate a custom confidence interval based on the values of the prediction and errorSummary.standardDeviation parameters. For example, you can use a two standard deviation interval. However, if an interval is narrow, the confidence level decreases. A two standard deviation interval represents only a 95% confidence interval.isForecast
: specifies whether the time of the data point is futuristic or historical. The function compares the time with the value of thedata_end_time
parameter to determine whether the time is futuristic or historical. If the time is less than the value of thedata_end_time
parameter, the function returns false, which indicates that the time is historical. Otherwise, the function returns true, which indicates that the time is futuristic.
isHistoryIncluded
Specifies whether the value of the
predictions
parameter includes historical data. Valid values:true
false
errorSummary
The summary of forecast error statistics that are collected when the forecast model is tested based on historical data. The forecast model is tested based on historical data. In metric forecasting, differences between forecast values and actual values result in forecast errors. You can evaluate the overall accuracy of forecasting by performing forecasting on historical data and collecting related statistics.
meanAbsError
: the mean absolute error (MAE) of the forecast errors.meanAbsPercentageError
: the mean absolute percentage error (MAPE) of the forecast errors.rootMeanSquareError
: the mean squared error (MSE) of the forecast errors.standardDeviation
: the standard deviation of the forecast errors. The value indicates how much the data points tend to deviate from the mean in the forecasting.meanError
: the average of the forecast errors.sampleSize
: the number of historical data samples.targetMean
: the average of the historical data samples.targetStdDev
: the standard deviation of the historical data samples.rsquare
: the R-squared value. The value is a metric used to measure the forecasting accuracy of a regression model that is tested based on historical data. The value must fall within the range of(0,1)
. A value closer to 1 indicates higher forecasting accuracy.correlation
: the square root of the value of thersquare
parameter.
ts_extract_features or ts_extract_features_in_json function
The ts_extract_features or ts_extract_features_in_json function extracts the features of the time series that you specify. The features can be used to identify machine models.
array(double) ts_extract_features(array(double) period_unix_time_array, array(double) metric_value_array)
Or
varchar ts_extract_features_in_json(array(double) period_unix_time_array, array(double) metric_value_array)
Parameter | Description |
| An array of UNIX time values in the time series. |
| An array of metric values in the time series. Note The |
Examples
Example 1
Query statement
* | select metric_id, ts_extract_features( array_agg(to_unixtime(time_period)), array_agg(metric_value)) as feature_values group by metric_id
Query and analysis results
The value of the
feature_values
parameter is an array of generated feature values.metric_id
feature_values
1
[0.123456, ...]
Example 2
Query statement
* | select metric_id, ts_extract_features_in_json( array_agg(to_unixtime(time_period)), array_agg(metric_value)) as feature_values group by metric_id
Query and analysis results
names
: an array of feature vector names.values
: an array of feature vector values.
metric_id
feature_values
1
{"values":[1.0,0.0,0.0,0.0,645.0,41685.0,1.0,1.0,0.0,64.5,64.5,10.0,2.8722813232690143],"names":["value__variance_larger_than_standard_deviation","value__has_duplicate_max","value__has_duplicate_min","value__has_duplicate","value__sum_values","value__abs_energy","value__mean_abs_change","value__mean_change"]
ts_fill_missing or ts_impute function
The ts_fill_missing or ts_impute function fills data at the missing points in time.
array(array(double)) ts_fill_missing(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, double end_unixtime, varchar time_unit, varchar fill_option)
Or
array(array(double)) ts_impute(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, double end_unixtime, varchar time_unit, varchar fill_option)
Or
array(array(double)) ts_fill_nan(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, double end_unixtime, varchar time_unit, varchar fill_option)
Parameter | Description |
| An array of time values in the time series. The array contains UNIX time values of the double type. |
| An array of metric values in the time series. The value is of the double type. |
| The start time of the time range for the time series. The parameter can be a UNIX time value of the double type. |
| The end time of the time range for the time series. Note The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the |
| The time interval of the time series. Examples: |
fill_option | The data filling method and the settings on whether to export historical data. This parameter is optional.
|
Examples
Query statement
* | select metric_id, ts_fill_missing( array_agg(to_unixtime(time_period)), -- time periods array_agg(metric_value), -- metric values to_unixtime(timestamp '2024-01-01 00:00:00'), -- series begin time to_unixtime(timestamp '2024-01-01 00:05:00'), -- series end time '1 minute', -- series time unit 'value=0' -- fill the missing values with 0 ) as imputed_time_series from log where time_period >= '2024-01-01 00:00:00' and time_period < '2024-01-01 00:05:00' group by metric_id
Query and analysis results
metric_id
imputed_time_series
1
[[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[60.0,61.0,62.0,63.0,64.0]]
2
[[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[70.0,71.0,72.0,73.0,74.0]]
3
[[1704038400.0,1704038460.0,1704038520.0,1704038580.0,1704038640.0],[80.0,81.0,82.0,83.0,84.0]]
ts_analyze_cycles function
The ts_analyze_cycles function analyzes the approximate cycles of the time series that you specify.
varchar ts_analyze_cycles(array(double) unixtime_periods, array(double) metric_values, double begin_unixtime, end_unixtime, varchar time_unit, varchar fill_option)
Parameter | Description |
| An array of time values in the time series. The array contains UNIX time values of the double type. |
| An array of metric values in the time series. The value is of the double type. |
| The start time of the time range for the time series. The parameter can be a UNIX time value of the double type. |
| The end time of the time range for the time series. Note The time range that you specify for the time series is a left-closed, right-open interval. The start time specified by the |
| The time interval of the time series. Examples: |
fill_option | The data filling method and the settings on whether to export historical data.
|
Examples
Query statement
* | select metric_id, ts_analyze_cycles( array_agg(to_unixtime(time_period)), -- time periods array_agg(metric_value), -- metric values to_unixtime(timestamp '2024-01-01 00:00:00'), -- series begin time to_unixtime(timestamp '2024-01-01 00:05:00'), -- series end time '1 minute', -- series time unit 'value=0' -- fill the missing values with 0 ) as imputed_time_series from log where time_period >= '2024-01-01 00:00:00' and time_period < '2024-01-01 00:05:00' group by metric_id
Query and analysis results
metric_id
cycle_analysis
1
{"cycleSize":66,"cycleCorrelation":0.9859745299608902,"frequencyWeight":2.8240516697103124E7,"topCycles":[{"cycleSize":66,"cycleCorrelation":0.9859745299608902,"frequencyWeight":2.8240516697103124E7},{"cycleSize":72,"cycleCorrelation":0.9750729127628865,"frequencyWeight":9538762.646349236},{"cycleSize":22,"cycleCorrelation":0.9628902341901314,"frequencyWeight":1.5834018306832194E7},{"cycleSize":33,"cycleCorrelation":0.9518752073734051,"frequencyWeight":1.1270983504364485E7}]}
The following table describes the parameters in the query and analysis results.
Parameter
Description
cycleSize
The length of the most approximate cycle.
cycleCorrelation
The correlation of the most approximate cycle.
frequencyWeight
The spectral density of the cycle after the power spectrum analysis of the time series is performed.
topCycles
The data of cycles that are involved in the multi-cycle analysis of the time series. Up to first five cycles can be returned. Each cycle contains the
cycleSize
,cycleCorrelation
, andfrequencyWeight
fields.