Simple Log Service provides the text analysis feature to automatically and intelligently detect anomalies in the text of large volumes of logs. This topic describes the background information, features, terms, scheduling and running scenarios, and usage notes of the text analysis feature.
Background information
When a service runs, a large number of logs are generated. The logs include system logs and business logs. Logs are widely used in system monitoring and troubleshooting. You can use traditional log analysis methods to filter logs by matching log levels and keywords and then analyze the logs. For example, you can monitor the content and number of Error logs. You can also monitor logs that contain keywords such as Failed and Unsuccessfully. If you use traditional methods to monitor and analyze logs in a distributed environment in which microservices are deployed, you may encounter the following challenges:
Terabytes or even petabytes of logs are generated per day, and manual analysis is labor-intensive.
In a distributed environment in which microservices are deployed, Warning logs or Error logs do not necessarily indicate system exceptions. Warning logs or Error logs may be generated due to system scaling, updates, or iterations. Professional knowledge is required during manual analysis to identify anomalies in logs.
To address the preceding challenges, automated and intelligent log analysis and troubleshooting are required. Addressing these challenges unlocks the value of logs and reduces the labor cost of log anomaly analysis. Automated and intelligent log analysis has the following characteristics:
Processes a large number of logs in an efficient manner.
Identifies anomalies in logs or narrows down the scope of logs used for troubleshooting.
Allows you to configure parameters for text analysis based on your business requirements.
Simple Log Service provides the text analysis feature, which has the preceding characteristics, to integrate and analyze the text in logs based on the log anomaly analysis algorithm. When you enable the text analysis feature, you only need to configure monitored objects and the parameters of the log anomaly analysis algorithm. The algorithm then automatically identifies anomalies in logs so that you can focus on the log content that requires your attention.
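For illustration only, the following sketch shows what such a configuration might contain. The key names and values below are hypothetical placeholders and do not correspond to exact console or API field names.

```python
# Hypothetical sketch of a text analysis job configuration.
# Key names are illustrative only; configure the actual job in the
# Simple Log Service console.
text_analysis_job = {
    "source_logstore": "my-app-logstore",        # Logstore to monitor (placeholder name)
    "monitored_fields": ["content", "message"],  # text fields to analyze
    "algorithm": {
        "init_time_windows": 24,                 # windows used to initialize the model
        "schedule_interval_minutes": 5,          # how often a new instance is created
    },
    "result_logstore": "internal-ml-log",        # analysis results are written here
}
```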
Features
The text analysis feature allows you to pull text from logs by using consumer groups. You do not need to configure indexes. A text analysis job pulls data at regular intervals based on the scheduling rules that you specify and pushes the data to the text analysis model. The text analysis model writes the analysis result to the internal-ml-log Logstore and displays the analysis result on a dashboard. This way, you can view the analysis result.
You can configure monitored objects: specify the log fields that you want to analyze, configure the parameters of the log anomaly analysis algorithm based on your business requirements, and then start a text analysis job. The values of the specified fields must be of the text data type.
The system analyzes data at regular intervals. The log anomaly analysis algorithm is used to analyze data in each time window.
The system provides and displays the analysis result. The analysis result is exported to a specified Logstore, and a dashboard is generated to display the analysis result.
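For reference, the following sketch shows one way to read the analysis result back from the internal-ml-log Logstore by using the aliyun-log-python-sdk. It is a minimal sketch: the endpoint, credentials, and project name are placeholders, it assumes you have query access to the project, and the exact result fields depend on your job.

```python
import time

from aliyun.log import LogClient  # assumes aliyun-log-python-sdk is installed

# Placeholders: replace with your region endpoint, credentials, and project.
client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "<access_key_id>", "<access_key_secret>")

now = int(time.time())
# Read the analysis results that the text analysis model wrote in the last hour.
resp = client.get_log(
    project="<your-project>",
    logstore="internal-ml-log",
    from_time=now - 3600,
    to_time=now,
    query="*",
)
for log in resp.get_logs():
    print(log.get_time(), log.get_contents())
```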
Terms
The following table introduces the terms that are related to the text analysis feature of Simple Log Service.
| Term | Description |
| --- | --- |
| job | A text analysis job includes data features and algorithm model parameters. |
| instance | A text analysis job creates a text analysis instance based on the configuration of the job. The instance pulls data at regular intervals, runs the algorithm model, and then distributes the analysis result based on the configuration of the job. |
| instance ID | The unique identifier of an instance. |
| creation time | The point in time at which an instance is created. In most cases, an instance is created for a text analysis job based on the scheduling rules of the job. If historical data needs to be processed, or if the delay caused by the timeout of the previous instance needs to be offset, an instance is created immediately. |
| start time | The point in time at which an instance starts to run. If the job to which an instance belongs is retried, the start time is the most recent time at which the instance started to run. |
| end time | The point in time at which an instance stops running. If the job to which an instance belongs is retried, the end time is the most recent time at which the instance stopped running. |
| status | The state of an instance at a specific point in time. Valid values include STARTING, SUCCEEDED, and FAILED. |
| algorithm configuration | The parameters that you configure for the log anomaly analysis algorithm in a text analysis job. |
| analysis event | The result that the log anomaly analysis algorithm generates for the analyzed logs. Analysis events are written to the destination Logstore and displayed on the dashboard. |
Scheduling and running
Each job can create one or more instances, but only one instance of a job can run at a time, regardless of whether the instance runs on schedule or is retried after an anomaly. Multiple instances of the same job cannot run concurrently. The following list describes common scheduling and running scenarios:
Scenario 1: A text analysis job starts at the current point in time. If you start a job at the current point in time, the algorithm model cannot pull historical data. The job trains the algorithm model in the initialization time windows that you configure and suppresses anomalies in the analysis result while the algorithm model is being initialized. After the initialization time windows elapse, the algorithm model dynamically adapts to new data and continues to be updated.
Scenario 2: The scheduling rules are modified. If you modify the scheduling rules of a job, the job re-creates an instance based on the new rules. The algorithm model records the point in time before which all historical data has been analyzed and continues to analyze new data from that point.
Scenario 3: A failed instance is retried. If an instance fails to run due to an issue such as insufficient permissions, an unavailable source or destination Logstore, or an invalid configuration, Simple Log Service can automatically retry the instance. If an instance is stuck in the STARTING state, the instance may have failed to be configured. In this case, Simple Log Service generates an error log and sends the log to the internal-etl-log Logstore. You can check the configuration of the instance and then retry the instance. After the instance is scheduled and run, Simple Log Service changes the status of the instance to SUCCEEDED or FAILED based on the result.
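For reference, the following sketch shows one way to pull recent error logs from the internal-etl-log Logstore by using the aliyun-log-python-sdk so that you can inspect why an instance is stuck. The endpoint, credentials, project name, and query keyword are placeholders.

```python
import time

from aliyun.log import LogClient  # assumes aliyun-log-python-sdk is installed

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "<access_key_id>", "<access_key_secret>")
now = int(time.time())

# Pull the last 24 hours of error logs that were written for scheduled jobs;
# narrow the query with your job name or another keyword if needed.
resp = client.get_log(
    project="<your-project>",
    logstore="internal-etl-log",
    from_time=now - 86400,
    to_time=now,
    query="error",  # placeholder keyword
)
for log in resp.get_logs():
    print(log.get_contents())
```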
Usage notes
To improve the efficiency of text analysis, we recommend that you specify monitored objects based on your business requirements.
Specify the text fields that you want to analyze in logs. If you specify a large number of redundant fields, the analysis effect may be compromised, and the analysis speed may decrease.
Track how the time series data of the monitored objects changes to evaluate its stability and periodicity and to anticipate potential anomalies. This helps you configure appropriate parameters for the algorithm.
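For example, the following sketch uses an analytic query of this shape to count logs per one-minute bucket over the last day, which gives a quick view of the stability and periodicity of the data before you tune the algorithm parameters. It assumes that an index with statistics enabled exists on the source Logstore; the endpoint, credentials, project, and Logstore names are placeholders.

```python
import time

from aliyun.log import LogClient  # assumes aliyun-log-python-sdk is installed

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "<access_key_id>", "<access_key_secret>")
now = int(time.time())

# Count logs per one-minute bucket over the last day to check the stability
# and periodicity of the data that the monitored objects produce.
query = ("* | SELECT __time__ - __time__ % 60 AS t, count(*) AS cnt "
         "GROUP BY t ORDER BY t LIMIT 1440")
resp = client.get_log(
    project="<your-project>",
    logstore="<your-source-logstore>",
    from_time=now - 86400,
    to_time=now,
    query=query,
)
for row in resp.get_logs():
    print(row.get_contents())  # for example: {'t': ..., 'cnt': ...}
```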