Simple Log Service: Overview of Intelligent Anomaly Analysis

Last Updated: Sep 03, 2024

The Intelligent Anomaly Analysis application is a hosted, highly available, and scalable service. It provides the following capabilities: intelligent inspection, text analysis, and root cause diagnosis. This topic describes the architecture, benefits, scenarios, terms, limits, and billing of the application.

Important

Only the users on a specified whitelist can use root cause diagnosis. You can submit a ticket to apply to be added to the whitelist.

Architecture

The Intelligent Anomaly Analysis application focuses on core elements such as metrics, program logs, and service relationships in O&M scenarios. The application generates anomalous events by using methods such as machine learning, and performs association analysis on time series data and events based on service topologies. This reduces the O&M complexity for enterprises and improves service quality. The following figure shows the architecture of Intelligent Anomaly Analysis.

[Figure: Architecture of Intelligent Anomaly Analysis]

The architecture includes the following functional components:

  • Logstores: Simple Log Service provides Logstores to store log data. You can use the SQL-92 syntax to query and analyze log data. For more information, see Log analysis overview.

  • Metricstores: Simple Log Service provides Metricstores to store time series data. You can use the SQL-92 or PromQL syntax to analyze time series data. For more information, see Overview of query and analysis on metric data.

  • Machine learning algorithms: Simple Log Service provides a series of algorithms for time series data and text that are deeply integrated with specific scenarios. The algorithms generate anomaly data. For more information, see Intelligent inspection algorithms and Text analysis algorithms.

  • Alert monitoring: Simple Log Service generates alerts for anomaly inspection results. For more information, see Introduction to the alerting feature.

Benefits

  • Supports intelligent inspection based on a large number of entity metrics. You can inspect different anomalies by performing simple configurations, without having to maintain alert monitoring rules.

  • Intelligently analyzes unstructured log data that is in the text format and mines the log data to automatically detect abnormal patterns.

  • Allows you to evaluate the inspection results that are generated by algorithms. This helps improve model training and learning.

  • Provides 99.9% availability for alerting, which is powered by the high availability and data reliability of Simple Log Service.

  • Improves user experience by deeply integrating the alerting feature.

Scenarios

We recommend that you use the Intelligent Anomaly Analysis application in the following scenarios:

  • A large number of objects need to be observed in multiple dimensions.

  • No thresholds are specified for the observed objects, but the types of metrics still require attention.

  • A large number of service rules need to be formulated for observed objects.

  • Unstructured text logs need to be mined for patterns.

  • A clear service topology exists in trace scenarios.

  • A custom service topology exists.

Terms

Term

Description

time series

During the configuration of an inspection job for time series, standard time series must be provided for algorithms. Each time series includes UNIX time-stamped metric values that are recorded at equally spaced periods of time.
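The equal-spacing requirement can be checked before a series is handed to an inspection job. The following is a minimal sketch, assuming the series is available as (UNIX timestamp, value) pairs. The function name is_equally_spaced and the sample data are hypothetical and are not part of Simple Log Service.

```python
from typing import Sequence, Tuple


def is_equally_spaced(points: Sequence[Tuple[int, float]]) -> bool:
    """Return True if the timestamps increase by one constant, positive step."""
    timestamps = [ts for ts, _ in points]
    if len(timestamps) < 2:
        return True  # zero or one point has no gaps to check
    step = timestamps[1] - timestamps[0]
    if step <= 0:
        return False  # duplicated or out-of-order timestamps
    return all(b - a == step for a, b in zip(timestamps, timestamps[1:]))


# Hypothetical per-minute samples: (UNIX timestamp in seconds, metric value).
regular = [(1700000000, 0.42), (1700000060, 0.40), (1700000120, 0.45)]
gapped = [(1700000000, 0.42), (1700000060, 0.40), (1700000180, 0.45)]
```

In this sketch, regular passes the check and gapped fails it because one minute is missing between the last two points.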

entity

An entity is an observed object in an intelligent inspection job.

For example, anomaly detection is performed on a service that runs on a machine, and the entity description is "192.0.2.0": Machine IP address, "80": Service port. In this example, you can uniquely identify the entity by using the machine IP address and service port.

golden metric

A golden metric accurately describes the quality of a service or the stability of an entity. Examples:

  • If you want to describe the request quality of a domain name, you can use the following golden metrics: average response latency per minute, number of requests per minute, number of failed requests per minute, and volume of write traffic per minute.

  • If you want to describe the status of a machine, you can use the following golden metrics: CPU utilization in user mode per minute, CPU utilization in kernel mode per minute, size of resident memory per minute, number of disk I/Os per minute, and average system load per minute.

  • If you want to describe the status of an Object Storage Service (OSS) bucket, you can use the following golden metrics: number of write operations in the bucket per minute, number of read operations in the bucket per minute, and volume of write traffic in the bucket per minute.

anomaly type

Intelligent Anomaly Analysis provides seven built-in anomaly types, which are commonly used. The anomaly types can be used for filtering. For more information, see Anomaly types in intelligent inspection and Anomaly types in text analysis.

normalization method

The normalization method is used to simplify calculation. The method converts a dimensional expression into a dimensionless expression, which is equivalent to a scalar. This improves the performance of anomaly detection.
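The source does not name the normalization methods that the application offers. As one common illustration, the following sketch applies z-score normalization, which subtracts the mean and divides by the standard deviation so that the result carries no unit. The function name z_score and the sample latencies are assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_score(values: Sequence[float]) -> List[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread; map every point to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical latency samples in milliseconds.
latency_ms = [100.0, 110.0, 90.0, 100.0]
normalized = z_score(latency_ms)
```

After normalization, metrics that were measured in different units (for example, milliseconds and request counts) can be compared on the same scale.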

filtering method

The filtering method removes the signal components in a specified frequency band that are not wanted. The method is commonly used to suppress noise and prevent interference. Filtering can smooth curves. This helps improve the performance of anomaly detection.
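The source does not state which filters the application uses. A trailing moving average is one simple low-pass filter that shows the smoothing effect. The following is a sketch under that assumption. The function name moving_average, the window size, and the sample values are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving average; early points average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Hypothetical metric with a single spike at index 3.
noisy = [10.0, 12.0, 8.0, 30.0, 9.0, 11.0]
smooth = moving_average(noisy, window=3)
```

The spike in noisy is spread over the following points in smooth, so the curve is flatter. A wider window smooths more but also delays and dampens real changes.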

evaluate

You can evaluate the results of intelligent inspection to report your feedback on intelligent inspection. The Intelligent Anomaly Analysis application can receive your feedback.

false positive

During time series inspection, an algorithm model detects anomalies and notifies you of the anomalies by using alert notification methods. If a reported anomaly is not what you expect, it is a false positive. You can evaluate the anomaly and report your feedback to the Intelligent Anomaly Analysis application. The application performs machine learning based on your feedback.

false negative

During time series inspection, if an algorithm model reports no anomaly where you expected one, the result is a false negative. You can evaluate the inspection result of each data point and report your feedback.

pattern extraction

This method extracts patterns from text objects by using analysis, distillation, and induction. A pattern can describe a class of similar text.

clustering

In the clustering process, a set of physical or abstract objects are divided into multiple classes that consist of similar objects. A cluster is generated after clustering. A cluster is a set of data objects that are similar to each other but are different from the objects in other clusters.

unsupervised

Unsupervised learning uses unlabeled training samples to solve pattern recognition problems.

supervised

Supervised learning refers to machine learning tasks that train functions or models from labeled training datasets.

log constant

In most cases, logs are generated by logging or print statements in programs. For example, the log connect mysql server, latency 212ms may be generated by the statement logging.info("connect mysql server, latency %dms", latency) when the value of latency is 212. Log constants are the parts of the output that never change. In this example, the log constant is connect mysql server, latency ms.

log variable

In most cases, logs are generated by logging or print statements in programs. For example, the log connect mysql server, latency 212ms may be generated by the statement logging.info("connect mysql server, latency %dms", latency) when the value of latency is 212. Log variables are the parts of the output that change from one log to the next. In this example, the log variable is 212.

log template

A log template consists of log constants and the wildcard characters for log variables. A log template is in the text format.

For example, the log template for the log connect mysql server, latency 212ms is connect mysql server, latency *ms. The asterisk (*) is used to replace the numeric variable 212.

You can use different wildcard characters based on the types of log variables. For example, you can use NUM to represent numeric variables. In this example, the log template is connect mysql server, latency NUMms.
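The relationship between a log and its template can be sketched with a regular expression. The example below is deliberately simplified: it treats every number as a log variable, whereas real pattern extraction has to learn which tokens vary. The function name to_template is hypothetical and is not the algorithm that Simple Log Service uses.

```python
import re

# Matches integers and simple decimals such as 212 or 3.5.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_template(log_line: str, wildcard: str = "*") -> str:
    """Replace every numeric token in the log with the given wildcard."""
    return _NUMBER.sub(wildcard, log_line)


line = "connect mysql server, latency 212ms"
print(to_template(line))         # connect mysql server, latency *ms
print(to_template(line, "NUM"))  # connect mysql server, latency NUMms
```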

log category

A log category is represented by a log template. If a log matches a log template, the log belongs to the category that is represented by the template.
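Matching a log against a template can be illustrated by turning the template into a regular expression. This is a sketch under the assumption that each wildcard stands for one or more arbitrary characters. The function names template_to_regex and belongs_to are hypothetical.

```python
import re


def template_to_regex(template: str, wildcard: str = "*") -> re.Pattern:
    """Escape the log constants and let each wildcard match any non-empty text."""
    constants = [re.escape(part) for part in template.split(wildcard)]
    return re.compile("(.+?)".join(constants))


def belongs_to(log_line: str, template: str, wildcard: str = "*") -> bool:
    """Return True if the whole log line matches the template."""
    return template_to_regex(template, wildcard).fullmatch(log_line) is not None


template = "connect mysql server, latency *ms"
print(belongs_to("connect mysql server, latency 212ms", template))  # True
print(belongs_to("disconnect mysql server", template))              # False
```

Logs that differ only in their log variables, such as a latency of 212 or 35, match the same template and therefore fall into the same category.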

Limits

Job type: Intelligent inspection

  • Scale of inspection entities: A single job supports up to 10,000 inspection entities. If you require a larger scale, submit a ticket.

  • Granularity of inspection time series: The curve of a single entity must be equally spaced and continuous. In SQL scenarios, the minimum supported granularity is one minute. If you require a finer granularity, submit a ticket.

  • Notification of anomaly inspection results: You can evaluate only the anomalies that are included in the notifications from DingTalk chatbots. If you require a different notification method, submit a ticket.

Job type: Text analysis

  • Scale of text fields: A single job supports up to five text fields.

  • Scale of general field templates: A single job supports up to six general field templates.

Billing

Intelligent Anomaly Analysis is in public preview and does not generate fees.