Use service logs to monitor Simple Log Service

If you want to monitor and perform O&M operations on Simple Log Service, you can query and analyze service logs. For example, you can use service logs to monitor the status and exceptions of Logtail and view the latency logs of consumer groups and the operation logs of resources.

Background information

When you enable the service log feature, you can select log types, including detailed logs, important logs, and job operational logs. For more information, see Log types.

Detailed logs: If you enable the service log feature for detailed logs, a Logstore named internal-operation_log is created in the project you selected, and a dashboard is created in the current project.
Important logs: If you enable the service log feature for important logs, a Logstore named internal-diagnostic_log is created in the project you selected. The Logstore is used to store the consumption delay logs of consumer groups and Logtail heartbeat logs.
Job operational logs: If you enable the service log feature for job operational logs, a Logstore named internal-diagnostic_log is created in the project you selected. The Logstore is used to store the logs of data import, Scheduled SQL, and data shipping jobs.

Prerequisites

The service log feature is enabled. For more information, see Use the service log feature.

Monitor the heartbeat status of Logtail

Query the status logs of Logtail

On the query and analysis page of the internal-diagnostic_log Logstore, execute the following statement to query the status logs of Logtail. For more information, see Query and analyze logs.
```
__topic__: logtail_status
```

View the status logs of Logtail. For more information about the fields in the log, see Logtail status logs. Sample log:

{
    "os_detail": "Windows Server 2012 R2",
    "__time__": 1645164875,
    "__topic__": "logtail_status",
    "memory": "25",
    "os": "Windows",
    "__source__": "log_service",
    "ip": "203.**.**.110",
    "cpu": "0.010405",
    "project": "aliyun-test-project",
    "version": "1.0.0.22",
    "uuid": "bf00****688b0",
    "hostname": "iZ1****Z",
    "instance_id": "5897****4735",
    "__pack_meta__": "0|MTYzNjM1Mzk5NDExMTcxOTQzNw==|1|0",
    "user_defined_id": "",
    "user": "SYSTEM",
    "detail_metric": "{\n\t\"config_count\" : \"1\",\n\t\"config_get_last_time\" : \"2022-02-18 14:14:23\",\n\t\"config_prefer_real_ip\" : \"false\"...
    "status": "ok"
}
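
The numeric fields in the sample log are strings, and `__time__` is a UNIX timestamp in seconds. The following Python sketch shows one possible way to convert these values after a status log has been exported as a dictionary. The helper name `normalize_status_log` is hypothetical and is not part of any Simple Log Service SDK, and the sketch assumes that `cpu` and `memory` are plain decimal strings as in the sample.

```python
from datetime import datetime, timezone


def normalize_status_log(log: dict) -> dict:
    """Return a copy of a Logtail status log with typed values.

    Hypothetical helper for illustration; not part of any SDK.
    """
    typed = dict(log)
    # Assumption: cpu and memory are decimal strings, as in the sample log.
    typed["cpu"] = float(log["cpu"])
    typed["memory"] = float(log["memory"])
    # __time__ is expressed in seconds since the UNIX epoch.
    typed["time_utc"] = datetime.fromtimestamp(int(log["__time__"]), tz=timezone.utc)
    return typed


# A trimmed copy of the sample log above.
sample = {
    "__time__": 1645164875,
    "__topic__": "logtail_status",
    "ip": "203.**.**.110",
    "cpu": "0.010405",
    "memory": "25",
    "status": "ok",
}

typed = normalize_status_log(sample)
print(typed["time_utc"].isoformat(), typed["cpu"], typed["memory"])
```

Typed values make it easier to apply thresholds, for example to flag servers whose Logtail CPU usage exceeds a limit that you choose.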

Monitor the heartbeat status of Logtail

Count the number of normal Logtail heartbeat connections: On the query and analysis page of the internal-diagnostic_log Logstore, execute the following query statement to count the servers that report a Logtail heartbeat, and then configure an alert rule based on the result. For more information, see Query and analyze logs.
Query statement:
```
__topic__: logtail_status | SELECT COUNT(DISTINCT ip) as ip_count
```
Configure an alert rule: If the number of normal heartbeat connections returned in the query and analysis results is less than the number of servers in all machine groups that are bound to Logtail, an alert is triggered. In this example, 100 is specified for the Trigger Condition parameter, which indicates that the total number of servers is 100. For more information about how to configure alert rules, see Configure an alert rule in Simple Log Service.
Check for servers that have abnormal heartbeat status: If an alert is triggered, one or more servers in a machine group do not have a heartbeat connection. You can view the status of a machine group in the Simple Log Service console. For more information about how to troubleshoot errors that cause abnormal heartbeat status, see How do I troubleshoot an error that is related to a Logtail machine group in a host environment?
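
The alert logic described in this section can be sketched in Python as follows. This is a minimal illustration that runs on status logs already loaded as dictionaries. The function name `find_missing_heartbeats`, the example IP addresses, and the `expected` server list are assumptions made for this sketch. In practice, the expected list would come from your own machine group inventory.

```python
def find_missing_heartbeats(status_logs, expected_ips):
    """Return the expected IPs that have no Logtail status log, sorted.

    Hypothetical helper for illustration; not part of any SDK.
    """
    seen = {
        log["ip"]
        for log in status_logs
        if log.get("__topic__") == "logtail_status"
    }
    return sorted(set(expected_ips) - seen)


# Example data (documentation-range IP addresses, not real servers).
logs = [
    {"__topic__": "logtail_status", "ip": "192.0.2.10", "status": "ok"},
    {"__topic__": "logtail_status", "ip": "192.0.2.10", "status": "ok"},
    {"__topic__": "logtail_status", "ip": "192.0.2.11", "status": "ok"},
]
expected = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

# Equivalent of COUNT(DISTINCT ip) in the query statement.
ip_count = len({log["ip"] for log in logs})
# Alert when fewer servers report a heartbeat than expected.
should_alert = ip_count < len(expected)
missing = find_missing_heartbeats(logs, expected)
print(ip_count, should_alert, missing)
```

Comparing against a known list of IP addresses, rather than only a count, also identifies which servers stopped reporting.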

Monitor Logtail exceptions

On the query and analysis page of the internal-diagnostic_log Logstore, execute the __topic__: logtail_alarm query statement to query the alert logs of Logtail. For more information, see Query and analyze logs and Logtail alert logs. The alert logs help you identify Logtail exceptions early so that you can modify your Logtail configuration to ensure log integrity. For example, you can execute the following statement to count exceptions by exception type. The statement does not group results by time, so the counts cover the time range that you select for the query, such as the last 15 minutes:

__topic__: logtail_alarm | select sum(alarm_count) as errorCount, alarm_type GROUP BY alarm_type
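
To make the aggregation concrete, the following Python sketch applies the same grouping to alert logs that are already loaded as dictionaries. The function name `count_alarms_by_type` is hypothetical, and the `TYPE_A` and `TYPE_B` values are placeholders rather than real Logtail alarm types. Only the `alarm_type` and `alarm_count` fields used in the statement above are assumed.

```python
from collections import Counter


def count_alarms_by_type(alarm_logs):
    """Sum alarm_count per alarm_type, like the GROUP BY statement above.

    Hypothetical helper for illustration; not part of any SDK.
    """
    totals = Counter()
    for log in alarm_logs:
        # Field values arrive as strings, so convert before summing.
        totals[log["alarm_type"]] += int(log["alarm_count"])
    return dict(totals)


# Placeholder records; real alarm types are listed in Logtail alert logs.
alarm_logs = [
    {"__topic__": "logtail_alarm", "alarm_type": "TYPE_A", "alarm_count": "3"},
    {"__topic__": "logtail_alarm", "alarm_type": "TYPE_B", "alarm_count": "1"},
    {"__topic__": "logtail_alarm", "alarm_type": "TYPE_A", "alarm_count": "2"},
]

error_counts = count_alarms_by_type(alarm_logs)
print(error_counts)
```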

Query the latency logs of a consumer group

Sample log

When you use a consumer group to consume data, you can monitor the consumption progress based on the latency logs of the consumer group. If the latency is high, you can promptly adjust the number of consumers to improve the consumption speed. For more information, see Use consumer groups to consume data. The following code provides a sample latency log of a consumer group. For more information about the fields in the log, see Latency logs of a consumer group.

{
    "__time__": 1645166007,
    "consumer_group": "consumerGroupX",
    "__topic__": "consumergroup_log",
    "__pack_meta__": "1|MTYzNjM1Mzk5NDExMTg5NjU2Mg==|3|0",
    "__source__": "log_service",
    "project": "aliyun-test-project",
    "fallbehind": "9518678",
    "shard": "1",
    "logstore": "nginx-moni"
}

Query the latency logs of a consumer group

On the query and analysis page of the internal-diagnostic_log Logstore, execute the __topic__: consumergroup_log statement to query the latency logs of a consumer group. For more information, see Query and analyze logs. For example, you can execute the following statement to query the consumption latency of a consumer group named consumerGroupX:

__topic__: consumergroup_log and consumer_group: consumerGroupX | SELECT max_by(fallbehind, __time__) as fallbehind
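
In this statement, max_by(fallbehind, __time__) returns the fallbehind value of the log that has the largest __time__ value, which is the most recent latency reported in the query time range. The statement does not group by shard, so it returns a single value across all shards. The following Python sketch reproduces that selection on latency logs loaded as dictionaries. The function name `latest_fallbehind` and the second and third sample records are assumptions made for illustration.

```python
def latest_fallbehind(latency_logs, consumer_group):
    """Return fallbehind from the newest log of a consumer group, or None.

    Hypothetical helper that mirrors max_by(fallbehind, __time__).
    """
    matching = [
        log
        for log in latency_logs
        if log.get("__topic__") == "consumergroup_log"
        and log.get("consumer_group") == consumer_group
    ]
    if not matching:
        return None
    # Pick the record with the largest timestamp, then read its latency.
    newest = max(matching, key=lambda log: int(log["__time__"]))
    return int(newest["fallbehind"])


latency_logs = [
    # Older record (made up for this example).
    {"__time__": 1645165947, "__topic__": "consumergroup_log",
     "consumer_group": "consumerGroupX", "fallbehind": "9518618", "shard": "1"},
    # Record taken from the sample log above.
    {"__time__": 1645166007, "__topic__": "consumergroup_log",
     "consumer_group": "consumerGroupX", "fallbehind": "9518678", "shard": "1"},
    # Record of another, made-up consumer group.
    {"__time__": 1645166010, "__topic__": "consumergroup_log",
     "consumer_group": "otherGroup", "fallbehind": "12", "shard": "0"},
]

print(latest_fallbehind(latency_logs, "consumerGroupX"))
```

If you need the latency of each shard, a variant that groups the matching logs by the `shard` field before selecting the newest record would be a natural extension.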

Query the operation logs of all resources in a project

Sample log

Logs of create, modify, update, and delete operations on resources, and of data read and write operations, in a project are stored in a Logstore named internal-operation_log. The logs cover the requests sent by all clients, such as the console, consumer groups, and SDKs. The following code provides a sample operation log. For more information about the fields in the log, see Detailed logs.

{
    "NetOutFlow": "1",
    "InvokerUid": "1418****2562",
    "CallerType": "Sts",
    "InFlow": "0",
    "SourceIP": "203.**.**.220",
    "__pack_meta__": "0|MTYzNjM1Mzk5MzY1NDYwODQzMg==|2|1",
    "RoleSessionName": "STS-ETL-WORKER",
    "APIVersion": "0.6.0",
    "UserAgent": "log-python-sdk-v-0.6.46, sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0), linux-consumergroup-etl-2bd3fdfdd63595d56b1ac24393bf5991",
    "InputLines": "0",
    "Status": "200",
    "__time__": 1645167812,
    "__topic__": "operation_log",
    "NetInflow": "0",
    "RequestId": "620F44C456458F67F72160A0",
    "LogStore": "nginx-moni",
    "__source__": "log_service",
    "Method": "PullData",
    "ClientIP": "203.**.**.330",
    "Latency": "2191",
    "Role": "aliyunlogetlrole",
    "NetworkOut": "0",
    "Project": "aliyun-test-project",
    "AccessKeyId": "STS.NUE****1hm",
    "Shard": "0"
}

The following table describes the types of user information.

Type	Field
Alibaba Cloud account	InvokerUid: the ID of the Alibaba Cloud account; CallerType: Parent
RAM user	InvokerUid: the ID of the Resource Access Management (RAM) user; CallerType: Subuser
Sts	InvokerUid: the ID of the Alibaba Cloud account; CallerType: Sts; RoleSessionName: the name of the session
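
As an illustration of how the table can be applied, the following Python sketch maps an operation log to a readable description of the caller. The function name `describe_caller` and the wording of the returned strings are assumptions made for this example. Only the `CallerType`, `InvokerUid`, and `RoleSessionName` fields from the table are used.

```python
def describe_caller(log):
    """Describe who sent a request, based on the user information fields.

    Hypothetical helper for illustration; not part of any SDK.
    """
    caller_type = log.get("CallerType")
    uid = log.get("InvokerUid")
    if caller_type == "Parent":
        return f"Alibaba Cloud account {uid}"
    if caller_type == "Subuser":
        return f"RAM user {uid}"
    if caller_type == "Sts":
        session = log.get("RoleSessionName")
        return f"STS session {session} of Alibaba Cloud account {uid}"
    # Any value not listed in the table is reported as is.
    return f"unknown caller type {caller_type!r}"


# Fields taken from the sample operation log above.
sample_log = {
    "InvokerUid": "1418****2562",
    "CallerType": "Sts",
    "RoleSessionName": "STS-ETL-WORKER",
}

print(describe_caller(sample_log))
```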

Query the number of failed requests

On the query and analysis page of the internal-operation_log Logstore, execute the following statement to query the number of failed requests. In this example, a request whose HTTP status code is greater than 200 is considered a failed request. For more information, see Query and analyze logs.

Status > 200 | select count(*) as pv
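
The same filter can be sketched in Python for operation logs that are already loaded as dictionaries. The function name `count_failed_requests` and the sample records are assumptions made for illustration. Note that under the rule used in this example, a successful code other than 200, such as 201, would also be counted as a failed request, so you may want to adjust the threshold to match the status codes that your requests return.

```python
def count_failed_requests(operation_logs, threshold=200):
    """Count logs whose Status is greater than the threshold.

    Hypothetical helper that mirrors: Status > 200 | select count(*) as pv
    """
    # Status is stored as a string in the sample log, so convert it first.
    return sum(1 for log in operation_logs if int(log["Status"]) > threshold)


# Made-up records that contain only the fields needed here.
operation_logs = [
    {"Method": "PullData", "Status": "200"},
    {"Method": "PullData", "Status": "404"},
    {"Method": "PostLogStoreLogs", "Status": "500"},
    {"Method": "PullData", "Status": "201"},
]

pv = count_failed_requests(operation_logs)
print(pv)
```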
