Alibaba Cloud Lindorm is a cloud-native multi-modal database provided by Alibaba Cloud. Its time series engine includes Lindorm ML, which enables easy and quick time series machine learning tasks such as anomaly detection, predictive maintenance, clustering, exception detection, and SVM using only SQL.
In this tutorial, I will explain how to use Lindorm ML to automatically and regularly detect exceptions in time series data by creating continuous queries.
For example, let's assume that there are temperature and humidity sensors for IoT, and the related data is uploaded to Lindorm every second. Typically, temperature and humidity do not experience sudden fluctuations. However, extremely high or low values would be considered as exceptions in the data.
Deploy ECS and Lindorm in the same region/zone and use a script on ECS instead of temperature and humidity sensors to write data to Lindorm every second. Then, use SQL on Lindorm to try timing exception detection with Lindorm ML.
As Lindorm ML only provides a time series engine, we need to create a Lindorm instance that comes with a time series engine.
Once completed, the new instance will be created with the Creating status.
Lindorm has specific requirements, such as the minimum number of nodes needed, depending on the type of engine. As a time series engine, you must verify that the number of nodes meets the requirements. If there are insufficient nodes, the "Buy Now" button will be disabled and an error message will appear.
Once the instance status changes to Running, you can perform several configurations, such as changing the maintenance time, instance name, and updating the whitelist.
General configuration items are detailed on the instance page. By clicking the Modify button next to the corresponding item, you can modify the settings.
Similar to other engines, you can apply for a public endpoint by clicking the relevant button based on the network requirements.
The instance comes with a whitelist, and it only accepts connections from the whitelisted IP addresses for security. If you don't add an IP address to the whitelist, an error may occur when connecting.
Since we are sending the request from an ECS under the same Region/Zone/VPN, we only need to add the intranet IP address to the whitelist, and there is no need to apply for a public endpoint.
We will use an ECS instance under the same VPC as the work environment. Connect to the pre-prepared ECS instance and perform several configurations.
Lindorm CLI connects to the Lindorm time series engine and uses Lindorm ML to detect data anomalies. When you download and extract the Lindorm CLI installer, you will get an executable tool named "lindorm-cli." The author is using Linux to build it on ECS, but if you are using a different operating system, please refer to Lindorm CLI.
wget -O lindorm-cli-linux-latest.tar.gz https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-linux-latest.tar.gz?spm=a2c63.p38356.0.0.338d5a2egBGtdx&file=lindorm-cli-linux-latest.tar.gz
tar -xvf lindorm-cli-linux-latest.tar.gz
We generate and send dummy data for temperature and humidity sensors using Java code built on ECS. To run the Java file on the ECS instance, you need to prepare the Java environment
sudo apt-get update
sudo apt-get install openjdk-8-jdk
java -version
Connect to the Lindorm instance using Lindorm CLI and create the necessary database and table. Although you can leave this step for the Java code, it is better to separate the data generation process by moving the creation step here since you may execute the Java code multiple times.
Check the connection information on the database connection page of the console.
Execute the command to connect to the Lindorm instance.
./lindorm-cli -url jdbc:lindorm:tsdb:url=http://ld-xxx-proxy-tsdb.lindorm.rds.aliyuncs.com:8242 -username root -password root
Create the database and table using SQL. You can also use a custom table structure, but you need to modify the relevant information in the Java code later.
create database demo;
use demo;
create table demo.sensor (device_id varchar tag,region varchar tag,time bigint,temperature double,humidity double,primary key(device_id));
Lindorm ML is a Machine Learning and Deep Learning technology for databasesd developed by DAMO Academy.
This time, we will perform data anomaly detection. The detection algorithm learns the data in a time-series manner and detects any exceptions if present. The supported detection algorithms are esd, nsigma, ttest, Online STL with T-test, and Online STL with ESD. As a simple demo using Lindorm ML, we will use esd in the following code, which is suitable for single data points or spike-type anomalies.
In this scenario, the sensors record temperature and humidity data in the environment and send the data to the Lindorm instance every second. To generate data anomalies, you need to decrease the probability of generating high or low dummy data in the middle.
The connection to the time series engine, data generation, and transmission are performed in the Java Native SDK on an hourly basis. There is a sample source code on the Help page that you can use as a reference.
Add the following to the pom.xml file of your Maven project. The dependency section contains the necessary packages for the Lindorm Java Native SDK. The build plugin section describes the tool used to build the executable package. The section needs to be updated according to your project configuration.
<dependencies>
<dependency>
<groupId>com.aliyun.lindorm</groupId>
<artifactId>lindorm-tsdb-client</artifactId>
<version>1.0.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>xxxxxxx</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.2</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.17</version>
</plugin>
</plugins>
</build>
The following is the same code. Please update it with your own connection information from the console.
package org.lindorm.demo;
import com.aliyun.lindorm.tsdb.client.ClientOptions;
import com.aliyun.lindorm.tsdb.client.LindormTSDBClient;
import com.aliyun.lindorm.tsdb.client.LindormTSDBFactory;
import com.aliyun.lindorm.tsdb.client.exception.LindormTSDBException;
import com.aliyun.lindorm.tsdb.client.model.Record;
import com.aliyun.lindorm.tsdb.client.model.WriteResult;
import com.aliyun.lindorm.tsdb.client.utils.ExceptionUtils;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CompletableFuture;
public class MainAPP {
public static void main(String[] args) {
// 1. Create a Lindorm client based on the connection URL
String url = "http://ld-xxxxx-proxy-tsdb.lindorm.rds.aliyuncs.com:8242";
// LindormTSDBClient - Thread-safe, reusable, no need to create and destroy frequently
ClientOptions options = ClientOptions.newBuilder(url).build();
LindormTSDBClient lindormTSDBClient = LindormTSDBFactory.connect(options);
// 2. Create a database named "demo" and a table named "sensor."
// Use the following statement to perform this operation.
// Since this function may be executed multiple times, use the SQL interface of the HTTP API.
/*lindormTSDBClient.execute("CREATE DATABASE demo");
lindormTSDBClient.execute("demo", "CREATE TABLE sensor (device_id VARCHAR TAG,region VARCHAR TAG,time BIGINT,temperature DOUBLE,humidity DOUBLE,PRIMARY KEY(device_id))");*/
// 3. Insert the table with the generated data.
long currentTime = System.currentTimeMillis();
SimpleDateFormat format=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
for(int j = 0; j < 5; j++) {
// Prepare an hour's worth of data.
int numRecords = 3600;
List<Record> records = new ArrayList<>(numRecords);
currentTime += 3600 * 1000;
for (int i = 0; i < numRecords; i++) {
Record record = Record
.table("sensor")
.time(currentTime + i * 1000)
.tag("device_id", "demo")
.tag("region", "tokyo-jp")
.addField("temperature", generateRandomValue(15.0))
.addField("humidity", generateRandomValue(50.0))
.build();
records.add(record);
}
System.out.println("Insert data from: " + format.format(new Date(currentTime)));
CompletableFuture<WriteResult> future = lindormTSDBClient.write("demo", records);
// Process asynchronous write results.
future.whenComplete((r, ex) -> {
// Process write failures.
if (ex != null) {
System.out.println("Failed to write.");
Throwable throwable = ExceptionUtils.getRootCause(ex);
if (throwable instanceof LindormTSDBException) {
LindormTSDBException e = (LindormTSDBException) throwable;
System.out.println("LindormTSDBException occurred. This means that the request was delivered to Lindorm TSDB"
+ " but it was rejected as an error response for some reason.");
System.out.println("Error Code: " + e.getCode());
System.out.println("SQL State: " + e.getSqlstate());
System.out.println("Error Message: " + e.getMessage());
} else {
throwable.printStackTrace();
}
} else {
System.out.println("Write successfully.");
}
});
// We have a simple synchronization wait like the following example
System.out.println(future.join());
}
lindormTSDBClient.shutdown();
}
private static double generateRandomValue(double baseValue){
Random random = new Random();
int flag = random.nextInt(100);
double results = 0.0;
if(flag > 5){
results = baseValue + random.nextDouble()*(random.nextInt() % 5);
}else {
results = baseValue + random.nextDouble()*(random.nextInt() % 5) + 100;
}
return Double.parseDouble(String.format("%.1f",results));
}
}
Build an executable package using the mvn package
command.
When you upload the package to the ECS instance and run it, dummy data is generated.
Check the generated dummy data on the Lindorm instance.
We will detect data anomalies using the pre-defined anomaly_detect
function. anomaly_detect
is a function that performs anomaly detection using supervised machine learning, which is included in Lindorm ML. You need to input the field name to be detected, algorithm, and the behavior when detected as arguments to the anomaly_detect
function. As mentioned earlier, we will use the esd algorithm here. To manually find exceptions, we combine the deterrent results with dummy data, as shown in the image below. The general temperature is around 15 degrees, but the exception data is 114.1 degrees.
select device_id, region, time, anomaly_detect(temperature, 'esd', 'adhoc_state=true') as detect_result from sensor where device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 15:01:00' sample by 0;
select device_id, region, time, temperature from sensor where device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 15:01:00';
The dummy data is generated based on the execution time, so please change the time zone in the SQL statement according to the situation.
Continuous queries are time-series queries that are automatically and regularly executed within the time-series engine.
In continuous detection queries, the same algorithm and parameters of the data anomaly detection function are used. Therefore, the queries share the same exception detection status. The second detection query is executed based on the exception detection status returned from the first query, resulting in more accurate results.
Next, we will update the Java code to send dummy data every 10 seconds. To perform continuous queries, it is desirable to generate continuous dummy data. When you update the sample Java code like below, dummy data will be sent every 10 seconds.
package org.lindorm.demo;
import com.aliyun.lindorm.tsdb.client.ClientOptions;
import com.aliyun.lindorm.tsdb.client.LindormTSDBClient;
import com.aliyun.lindorm.tsdb.client.LindormTSDBFactory;
import com.aliyun.lindorm.tsdb.client.exception.LindormTSDBException;
import com.aliyun.lindorm.tsdb.client.model.Record;
import com.aliyun.lindorm.tsdb.client.model.WriteResult;
import com.aliyun.lindorm.tsdb.client.utils.ExceptionUtils;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CompletableFuture;
public class MainAPPInterval {
public static void main(String[] args) throws InterruptedException {
// 1. Creating a Lindorm client based on the connection URL
String url = "http://ld-xxxxx-proxy-tsdb.lindorm.rds.aliyuncs.com:8242";
// LindormTSDBClient - Thread-safe, reusable, no need to create and destroy frequently.
ClientOptions options = ClientOptions.newBuilder(url).build();
LindormTSDBClient lindormTSDBClient = LindormTSDBFactory.connect(options);
// 2. Create a database named "demo" and a table named "sensor."
// Use the following statement to perform this operation.
// Since this function may be executed multiple times, use the SQL interface of the HTTP API.
/*lindormTSDBClient.execute("CREATE DATABASE demo");
lindormTSDBClient.execute("demo", "CREATE TABLE sensor (device_id VARCHAR TAG,region VARCHAR TAG,time BIGINT,temperature DOUBLE,humidity DOUBLE,PRIMARY KEY(device_id))");*/
// 3. Insert the table with the generated data.
SimpleDateFormat format=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
for(int j = 0; j < 2 * 3600; j++) {
// Prepare data
int numRecords = 10;
List<Record> records = new ArrayList<>(numRecords);
long currentTime = System.currentTimeMillis();
for (int i = 0; i < numRecords; i++) {
Record record = Record
.table("sensor")
.time(currentTime + i * 1000)
.tag("device_id", "demo")
.tag("region", "tokyo-jp")
.addField("temperature", generateRandomValue(15.0))
.addField("humidity", generateRandomValue(50.0))
.build();
records.add(record);
}
System.out.println("Insert data from: " + format.format(new Date(currentTime)));
CompletableFuture<WriteResult> future = lindormTSDBClient.write("demo", records);
// Process asynchronous write results.
future.whenComplete((r, ex) -> {
// Process write failures.
if (ex != null) {
System.out.println("Failed to write.");
Throwable throwable = ExceptionUtils.getRootCause(ex);
if (throwable instanceof LindormTSDBException) {
LindormTSDBException e = (LindormTSDBException) throwable;
System.out.println("LindormTSDBException occurred. This means that the request was delivered to Lindorm TSDB"
+ " but it was rejected as an error response for some reason.");
System.out.println("Error Code: " + e.getCode());
System.out.println("SQL State: " + e.getSqlstate());
System.out.println("Error Message: " + e.getMessage());
} else {
throwable.printStackTrace();
}
} else {
System.out.println("Write successfully.");
}
});
// We have a simple synchronization wait like the following example
System.out.println(future.join());
Thread.sleep(10000);
}
lindormTSDBClient.shutdown();
}
private static double generateRandomValue(double baseValue){
Random random = new Random();
int flag = random.nextInt(100);
double results = 0.0;
if(flag > 5){
results = baseValue + random.nextDouble()*(random.nextInt() % 5);
}else {
results = baseValue + random.nextDouble()*(random.nextInt() % 5) + 100;
}
return Double.parseDouble(String.format("%.1f",results));
}
}
Build an executable package and run it on an ECS instance as before.
To continuously save query results, you need a new table. Create a new result table along with the query.
CREATE TABLE demo.anomaly_points(
device_id varchar tag,
region varchar tag,
time bigint,
anomaly_result boolean,
PRIMARY KEY(device_id)
);
CREATE continuous query demo.cq_detector WITH(
interval = '1m'
) AS
INSERT INTO demo.anomaly_points(
device_id,
region,
time,
anomaly_result
)
SELECT
device_id,
region,
time,
anomaly_detect(temperature, 'esd') AS anomaly_result
FROM
demo.sensor
WHERE
device_id = 'demo' sample BY
0;
show continuous queries;
In this example, we detect exceptions from the previous minute's data in the dummy data table every minute using the esd algorithm.
You can display the data difference with the following command.
select count(*) from demo.anomaly_points;
select max(time) from demo.anomaly_points;
You can see that the number of table data and the maximum time are updated every minute with continuous queries.
After the verification is completed, you can delete the continuous query.
drop continuous query demo.cq_detector;
Pre-defined detection functions accept parameters such as maxAnomalyRatio
and warmupCount
. You can find related information in the help documentation.
SELECT
device_id,
region,
time,
anomaly_detect(temperature, 'esd', 'lenHistoryWindow=30,maxAnomalyRatio=0.1') AS detect_result
FROM
sensor
WHERE
device_id IN('demo')
AND time >= '2022-11-16 15:00:00'
AND time < '2022-11-16 00:01:00' SAMPLE BY
0;
Different input parameters will yield different detection results.
However, if you input the wrong parameters, several errors may occur. For example, the lenHistoryWindow
of the esd algorithm must not be less than 20. If you set it to 2, a NullPointerException will occur instead of the query result.
lindorm:demo> SELECT device_id, region, time, anomaly_detect(temperature, 'esd', 'lenHistoryWindow=2,maxAnomalyRatio=0.8') AS detect_result FROM sensor WHERE device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 00:01:00' SAMPLE BY 0;
ERROR 9000 (HY000): Server internal error; Please try again, or follow the error message to troubleshoot the problem.
Caused by: java.lang.NullPointerException
at com.alibaba.lindorm.tsdb.tsql.connector.LindormTSDBConnector.runDownsampleQuery(LindormTSDBConnector.java:141)
at com.alibaba.lindorm.tsdb.tsql.connector.downsample.DownsampleExecutorImpl.getDataRows(DownsampleExecutorImpl.java:49)
at com.alibaba.lindorm.tsdb.tsql.simple.SimpleExecutorCommand.lambda$execute$0(SimpleExecutorCommand.java:88)
at com.alibaba.lindorm.tsdb.tsql.jdbc.LindormTSQLMetaImpl$LindormTSQLSignature$1.enumerator(LindormTSQLMetaImpl.java:430)
at org.apache.calcite.linq4j.AbstractEnumerable.iterator(AbstractEnumerable.java:33)
at org.apache.calcite.avatica.MetaImpl.createCursor(MetaImpl.java:90)
at org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:186)
at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:666)
This error, ERROR 8012(42000), occurs because the detected field is in the select statement with the detection function. In other words, the field value and detection result could not be displayed together. To solve this issue, remove the detected field from the select
statement.
lindorm:demo> select device_id, region, time, temperature, anomaly_detect(temperature, 'esd', 'adhoc_state=true') as detect_result from sensor where device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 15:01:00' sample by 0;
ERROR 8012 (42000): Unsupported operation; Field aggregator must be specified in downsample query: temperature
This article showcases an example of exception detection using SQL with Lindorm ML.
Lindorm is a Multi-Modal Database that enables fast storage of time series data and early detection of anomalies and threats. This makes it possible to store time series data, such as IoT data, in a time series engine and JSON data, such as metrics, in a wide-column wide table engine. It also allows for quick retrieval of relevant JSON information related to abnormalities in time series data. The ability to perform supervised and unsupervised machine learning within the database without the need for separate product services for machine learning, ETL, and data transfer is a significant advantage of the Lindorm service.
In-database machine learning:
https://www.alibabacloud.com/help/en/lindorm/latest/in-database-machine-learning
Use Lindorm ML for time series forecasting:
https://www.alibabacloud.com/help/en/lindorm/latest/time-series-prediction
Use Lindorm ML for time series anomaly detection:
https://www.alibabacloud.com/help/en/lindorm/latest/time-series-anomaly-detections
This article is a translated piece of work from SoftBank: https://www.softbank.jp/biz/blog/cloud-technology/articles/202301/lindorm-ml-to-detect-anomalies/
Using Let's Encrypt to Enable HTTPS for a Streamlit Web Service
9 posts | 0 followers
FollowAlibaba Cloud Native Community - August 30, 2022
weibin - October 17, 2019
Alibaba Cloud Security - December 12, 2019
Yuriy Yuzifovich - November 8, 2022
ApsaraDB - July 8, 2020
chuan - February 27, 2020
9 posts | 0 followers
FollowA platform that provides enterprise-level data modeling services based on machine learning algorithms to quickly meet your needs for data-driven operations.
Learn MoreA cost-effective online time series database service that offers high availability and auto scaling features
Learn MoreTSDB is a stable, reliable, and cost-effective online high-performance time series database service.
Learn MoreThis technology can be used to predict the spread of COVID-19 and help decision makers evaluate the impact of various prevention and control measures on the development of the epidemic.
Learn MoreMore Posts by Hironobu Ohara
Dikky Ryan Pratama June 27, 2023 at 12:48 am
awesome!