Alibaba Cloud Log Service (SLS) provides a series of DevOps and AIOps tools covering methods such as anomaly detection, time series clustering, and time series prediction. To make these capabilities easy to use, we have integrated the algorithms into SQL functions, so you can apply them with minimal configuration. This article introduces best practices for time series clustering and correlation analysis.
This article covers several Log Service functions for time series similarity analysis, including time series clustering and similarity calculation. The scenarios they address come down to two aspects: time series clustering (by shape and by value) and determination of time series similarity.
For time series clustering, SLS provides the following two functions. For details, see the SLS documentation.
ts_density_cluster
ts_hierarchical_cluster
The first function clusters curves by shape, and its core algorithm is density-based clustering (DBSCAN). The second function clusters curves by the similarity of their raw values, placing more emphasis on factors such as the Euclidean distance between curves, and its core algorithm is hierarchical clustering. For more information about how these algorithms work, see my previous articles or search online. The following section describes how to use these functions in SLS.
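The difference between clustering by shape and clustering by value can be sketched in a few lines of pure Python. This is not the SLS implementation. It assumes that "shape" is captured by z-normalizing each curve (zero mean, unit variance) before measuring Euclidean distance, and the `eps` and `min_pts` values are arbitrary choices for the hypothetical curves a to d.

```python
import math


def z_normalize(series):
    """Rescale a curve to zero mean and unit variance so only its shape remains."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a flat curve has no shape beyond its level
    return [(x - mean) / std for x in series]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def region_query(i):
        # All points (including i itself) within eps of point i
        return [j for j in range(len(points)) if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise unless a later cluster reaches it
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region_query(j)
            if len(neighbors) >= min_pts:
                queue.extend(neighbors)  # core point: keep expanding
    return labels


# Hypothetical curves
curves = {
    "a": [1, 2, 3, 4, 5],
    "b": [10, 20, 30, 40, 50],  # same rising shape as "a", ten times the level
    "c": [5, 4, 3, 2, 1],       # falling, same level as "a"
    "d": [50, 40, 30, 20, 10],  # falling, same level as "b"
}
names = list(curves)

by_shape = dbscan([z_normalize(curves[n]) for n in names], eps=0.5, min_pts=2)
by_value = dbscan([curves[n] for n in names], eps=10, min_pts=2)
print(dict(zip(names, by_shape)))  # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
print(dict(zip(names, by_value)))  # {'a': 0, 'b': -1, 'c': 0, 'd': -1}
```

After z-normalization, a and b (same rising shape, tenfold difference in level) share a cluster. Clustering the raw values instead groups a with c, the two low-valued curves, even though one rises and the other falls. This is closer to the value-based view that ts_hierarchical_cluster takes.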
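query01: the identifier of each curve.

```sql
* | select DISTINCT index_name, machine, region from log
```

query02: the number of distinct curves.

```sql
* | select count(1) as num from (select DISTINCT index_name, machine, region from log)
```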
query03: all load curves, each identified by region#machine#index_name.

```sql
* and index_name : load |
select
  __time__,
  value,
  concat(region, '#', machine, '#', index_name) as ins
from log
order by __time__
limit 10000
```
query04: the average load per minute for each region.

```sql
* and index_name : load |
select
  date_trunc('minute', __time__) as time,
  region,
  avg(value) as value
from log
group by time, region
order by time
limit 1000
```
Executing query01 and query02 shows the identifier of each curve in the current Logstore and the number of distinct curves, which is 1,300 in this example. A flow diagram is a natural way to observe these curves, but drawing all of them in one chart (query03) consumes considerable browser resources, and even if the chart renders, it is very hard to gain any insight from it. query04 aggregates the curves by region so that only a few are drawn; compare its visual effect with that of query03.
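To make the dimension reduction concrete, the following pure-Python sketch mimics what query01, query02, and query04 compute, using a handful of hypothetical rows (the regions, machines, timestamps, and values are made up). It only illustrates the logic; the real queries run inside SLS.

```python
from collections import defaultdict

# Hypothetical sample rows: (__time__ in Unix seconds, region, machine, index_name, value)
rows = [
    (1600000020, "cn-beijing", "192.168.7.254:9100", "load", 1.0),
    (1600000050, "cn-beijing", "192.168.7.254:9100", "load", 2.0),
    (1600000020, "cn-beijing", "192.168.7.10:9100", "load", 3.0),
    (1600000020, "cn-hangzhou", "192.168.8.1:9100", "load", 4.0),
]


def curve_id(region, machine, index_name):
    # Mirrors concat(region, '#', machine, '#', index_name) in query03
    return f"{region}#{machine}#{index_name}"


# query01 / query02: the distinct curves and how many there are
curve_ids = {curve_id(region, machine, index) for _, region, machine, index, _ in rows}
print(len(curve_ids))  # 3

# query04: average per (minute, region), so all machines in a region collapse into one curve
totals = defaultdict(lambda: [0.0, 0])
for ts, region, _, _, value in rows:
    minute = ts - ts % 60  # same bucketing as date_trunc('minute', __time__)
    totals[(minute, region)][0] += value
    totals[(minute, region)][1] += 1
per_region = {key: s / n for key, (s, n) in sorted(totals.items())}
print(per_region)  # {(1600000020, 'cn-beijing'): 2.0, (1600000020, 'cn-hangzhou'): 4.0}
```

Three machine curves become two region curves here; at the scale of 1,300 curves, the same averaging is what makes the chart readable, at the cost of hiding differences between machines in the same region.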
This raises a question: can we cluster the curves, grouping similar ones together, to reduce the number of dimensions we need to visualize and analyze?
The following SQL statements cluster the curves quickly. In this example, we choose machine load as the metric because we want to know how load varies across machines. We use the ts_hierarchical_cluster function, which produces a facet chart. To make the chart easier to read, save it to a dashboard.
```sql
* and index_name : load |
select
  ts_hierarchical_cluster(time, value, ins)
from
  (
    select
      __time__ as time,
      value,
      concat(region, '#', machine, '#', index_name) as ins
    from log
  )
```
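Because ts_hierarchical_cluster groups curves by the distance between their raw values, its behavior can be pictured with a minimal agglomerative (bottom-up) clustering sketch in pure Python. The average-linkage rule, the Euclidean distance, and the stopping threshold are assumptions chosen for illustration, not documented details of the SLS function. Machine names m1 to m4 and their load values are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, curves):
    """Mean pairwise Euclidean distance between the members of two clusters."""
    total = sum(euclidean(curves[i], curves[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_cluster(curves, threshold):
    """Agglomerative clustering: start with one cluster per curve and repeatedly
    merge the closest pair until that pair is farther apart than threshold."""
    clusters = [[i] for i in range(len(curves))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], curves)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical load curves for four machines
load = {
    "m1": [1.0, 1.2, 1.1, 1.3],
    "m2": [1.1, 1.0, 1.2, 1.2],
    "m3": [5.0, 5.2, 4.9, 5.1],
    "m4": [5.1, 5.0, 5.0, 5.3],
}
names = list(load)
clusters = hierarchical_cluster([load[n] for n in names], threshold=1.0)
groups = [[names[i] for i in c] for c in clusters]
print(groups)  # [['m1', 'm2'], ['m3', 'm4']]
```

Each curve starts as its own cluster, and the two closest clusters are merged until the nearest remaining pair is farther apart than the threshold. m1 and m2 end up together because their load values are close, and so do m3 and m4; each resulting group could then be drawn as one facet.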
The following SQL statements find the machines whose metric curves are most similar to that of aysls-pub-cn-beijing-k8s#192.168.7.254:9100#load, and a flow diagram visualizes the result. The similarity function, ts_similar_instance, supports three similarity measures: shape, manhattan, and euclidean.
```sql
* and index_name : load |
select
  cast(cast(ts_value as double) as bigint) as ts_value,
  cast(ds_value as double) as ds_value,
  name
from
  (
    select
      tt[1][1] as name,
      tt[2] as ts,
      tt[3] as ds
    from
      (
        select
          ts_similar_instance(
            time, value, ins, 'aysls-pub-cn-beijing-k8s#192.168.7.254:9100#load',
            10,
            'euclidean'
          ) as res
        from
          (
            select
              __time__ as time,
              value,
              concat(region, '#', machine, '#', index_name) as ins
            from log
          )
      ),
      unnest(res) as t(tt)
  ),
  unnest(ts, ds) as t(ts_value, ds_value)
order by ts_value
limit 10000
```

The ts and ds arrays are unnested in a single unnest clause so that each timestamp stays paired with its value; unnesting them in two separate clauses would cross join the arrays and pair every timestamp with every value.
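The ranking that ts_similar_instance performs can be pictured with the pure-Python sketch below, which returns the k curves closest to a reference curve under each of the three measures. It rests on two assumptions: k plays the role of the 10 in the query, and "shape" is approximated as the Euclidean distance between z-normalized curves. SLS may define both differently. The curve names and values are hypothetical.

```python
import math


def z_normalize(series):
    """Zero mean, unit variance; a flat curve maps to all zeros."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [0.0] * n if std == 0 else [(x - mean) / std for x in series]


def distance(a, b, measure):
    if measure == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    if measure == "euclidean":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if measure == "shape":
        # Assumption: ignore level and scale by comparing z-normalized curves
        return distance(z_normalize(a), z_normalize(b), "euclidean")
    raise ValueError(f"unknown measure: {measure}")


def similar_instances(curves, target, k, measure):
    """Return the names of the k curves closest to curves[target]."""
    ref = curves[target]
    ranked = sorted(
        (distance(ref, series, measure), name)
        for name, series in curves.items()
        if name != target
    )
    return [name for _, name in ranked[:k]]


# Hypothetical curves
curves = {
    "ref": [1, 2, 3, 4],
    "scaled": [10, 20, 30, 40],  # same shape, ten times the level
    "close": [1, 2, 3, 5],       # nearly the same values
    "flat": [3, 3, 3, 3],
}
for measure in ("shape", "euclidean", "manhattan"):
    print(measure, similar_instances(curves, "ref", 2, measure))
# shape ['scaled', 'close']
# euclidean ['close', 'flat']
# manhattan ['close', 'flat']
```

Under shape, the curve that is ten times larger ranks first because only its form matters; under euclidean and manhattan, the curve with nearly identical values wins. Manhattan sums absolute differences, so a single large deviation weighs less than under Euclidean distance, which squares each difference.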