Machine learning syntax and functions - Simple Log Service

Simple Log Service provides a machine learning feature that supports multiple algorithms and calling methods. You can use analytic statements and machine learning functions to call these algorithms and analyze the characteristics of one or more fields within a period of time. Simple Log Service offers various time series analysis algorithms that you can call to solve problems related to time series data. For example, you can predict time series, detect time series anomalies, decompose time series, and cluster multiple time series. The algorithms are also compatible with standard SQL functions, which simplifies their use and makes troubleshooting more efficient.

Features

Supports various smoothing operations on a single time series.
Supports algorithms for prediction, anomaly detection, change point detection, inflection point detection, and multi-period estimation on a single time series.
Supports decomposition operations on a single time series.
Supports various clustering algorithms for multiple time series.
Supports multi-field pattern mining (based on sequences of numeric data or text).

Limits

When you use the machine learning feature of Simple Log Service, note the following limits:

The specified time series data must be sampled based on the same interval.
The specified time series data cannot contain data that is repeatedly sampled from the same point in time.

The processing capacity cannot exceed the maximum capacity. The following table describes the limits.

Item	Limit
Capacity of time series data processing	Data can be collected from a maximum of 150,000 consecutive points in time. If the data volume exceeds the processing capacity, you must aggregate the data or reduce the number of sampled points.
Capacity of the density-based clustering algorithm	A maximum of 5,000 time series curves can be clustered at a time. Each curve cannot contain more than 1,440 points in time.
Capacity of the hierarchical clustering algorithm	A maximum of 2,000 time series curves can be clustered at a time. Each curve cannot contain more than 1,440 points in time.
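
The limits above can be checked on the client side before a query is run. The following Python code is a minimal sketch under stated assumptions, not part of Simple Log Service: the function names (check_series, check_cluster_input, aggregate) are hypothetical, timestamps are assumed to be integer seconds, and only the numeric limits are taken from the table.

```python
from typing import Dict, List, Sequence, Tuple

# Limits copied from the table above.
MAX_POINTS = 150_000
MAX_POINTS_PER_CURVE = 1_440
DENSITY_MAX_CURVES = 5_000
HIERARCHICAL_MAX_CURVES = 2_000


def check_series(timestamps: Sequence[int]) -> List[str]:
    """Return the limit violations found in one time series (empty if none)."""
    problems = []
    ts = sorted(timestamps)
    if len(ts) > MAX_POINTS:
        problems.append(f"too many points: {len(ts)} > {MAX_POINTS}")
    if len(set(ts)) != len(ts):
        problems.append("duplicate timestamps")
    # Collect the distinct gaps between neighbors, ignoring duplicates.
    gaps = {b - a for a, b in zip(ts, ts[1:]) if b != a}
    if len(gaps) > 1:
        problems.append("uneven sampling interval")
    return problems


def check_cluster_input(curves: Sequence[Sequence[float]], max_curves: int) -> bool:
    """True if the number of curves and the length of each curve fit the limits."""
    return len(curves) <= max_curves and all(
        len(curve) <= MAX_POINTS_PER_CURVE for curve in curves
    )


def aggregate(points: Sequence[Tuple[int, float]], bucket: int) -> List[Tuple[int, float]]:
    """Average values into fixed-width buckets to reduce the number of points.

    Buckets with no input points are not filled in, so the result can still
    contain gaps if the input had long stretches of missing data.
    """
    sums: Dict[int, List[float]] = {}
    for t, v in points:
        start = t - t % bucket  # start of the bucket that contains t
        sums.setdefault(start, []).append(v)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(sums.items())]
```

For example, check_series([0, 60, 60, 120]) reports a duplicate timestamp, and aggregate(points, 60) turns data sampled every 30 seconds into one averaged point per minute.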

Machine learning functions

Category		Function	Description
Time series	Smoothing function	ts_smooth_simple	Uses the Holt-Winters algorithm to smooth time series data.
		ts_smooth_fir	Uses the finite impulse response (FIR) filter to smooth time series data.
		ts_smooth_iir	Uses the infinite impulse response (IIR) filter to smooth time series data.
	Multi-period estimation function	ts_period_detect	Estimates the periods present in time series data.
	Change point detection function	ts_cp_detect	Detects the intervals in which data has different statistical features. The interval endpoints are change points.
	Change point detection function	ts_breakout_detect	Detects the points in time at which data experiences dramatic changes.
	Maximum value detection function	ts_find_peaks	Detects the local maximum value of time series data in a specified window.
	Prediction and anomaly detection function	ts_predicate_simple	Uses default parameters to model time series data, predict time series data, and detect anomalies.
		ts_predicate_ar	Uses an autoregressive (AR) model to model time series data, predict time series data, and detect anomalies.
		ts_predicate_arma	Uses an autoregressive moving average (ARMA) model to model time series data, predict time series data, and detect anomalies.
		ts_predicate_arima	Uses an autoregressive integrated moving average (ARIMA) model to model time series data, predict time series data, and detect anomalies.
		ts_regression_predict	Predicts the long-run trend for a single periodic time series.
	Sequence decomposition function	ts_decompose	Uses the Seasonal and Trend decomposition using Loess (STL) algorithm to decompose time series data.
	Time series clustering function	ts_density_cluster	Uses a density-based clustering method to cluster multiple time series.
		ts_hierarchical_cluster	Uses a hierarchical clustering method to cluster multiple time series.
		ts_similar_instance	Queries time series curves that are similar to a specified time series curve.
	Kernel density estimation function	kernel_density_estimation	Uses a smooth peak function to fit the observed data points. In this way, the function approximates the real probability distribution curve.
	Time series padding function	series_padding	Pads data points that are missing in a time series.
	Anomaly comparison function	anomaly_compare	Compares the degree of difference of an observed object in two periods of time.
Pattern mining	Frequent pattern statistical function	pattern_stat	Mines representative combinations of attributes from the given multi-attribute field samples to identify frequent patterns.
	Differential pattern statistical function	pattern_diff	Identifies the pattern that causes differences between two collections in specified conditions.
	Root cause analysis function	rca_kpi_search	Analyzes the subdimension attributes that cause anomalies in a monitoring metric.
	Correlation analysis functions	ts_association_analysis	Identifies the metrics that are correlated with a specified metric among multiple observed metrics in the system.
	Correlation analysis functions	ts_similar	Identifies the metrics that are correlated with specified time series data among multiple observed metrics in the system.
	Request URL classification function	url_classify	Classifies a request URL and attaches a tag to the URL. The function also provides the regular expression that defines the pattern of the tag.
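
To give a feel for what FIR smoothing, as named in the ts_smooth_fir row, does to a series, the following Python code is a minimal sketch of a causal FIR filter. It is an illustration only and not the implementation of ts_smooth_fir: the function name fir_smooth, the weights argument, and the edge handling (renormalizing over the samples that exist) are assumptions, because this section does not describe the function's parameters.

```python
from typing import List, Sequence


def fir_smooth(values: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum_k weights[k] * values[n - k].

    Assumes positive weights. At the start of the series, where fewer than
    len(weights) samples exist, the sum is renormalized over the weights
    that were actually used.
    """
    out = []
    for n in range(len(values)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(weights):
            if n - k >= 0:  # skip samples before the start of the series
                acc += w * values[n - k]
                norm += w
        out.append(acc / norm)
    return out


# Two-point moving average: each output is the mean of the current
# and the previous sample.
smoothed = fir_smooth([0.0, 2.0, 4.0], [0.5, 0.5])
```

Equal weights give a moving average. Unequal weights that favor recent samples make the output follow the latest data more closely.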