Monitoring and logging help you ensure the availability, performance, and health of your services. You can enable monitoring to collect metrics. Alibaba Cloud provides a wide variety of monitoring and auditing services, such as Network Intelligence Service (NIS), CloudMonitor, and Cloud Config. These services can monitor resource usage and service performance in real time, generate alerts, and notify you of anomalies.
NIS
NIS is an intelligent, self-service platform that helps you design, deploy, and maintain networks, which improves your work efficiency. NIS provides statistics that can help you design your network and troubleshoot network issues.
Cloud Enterprise Network (CEN) is integrated with NIS, which can diagnose transit routers, analyze inter-region and intra-region network traffic, and probe data transmission paths to help you maintain service availability.
Instance diagnostics
CEN supports the instance diagnostic feature, which checks the configurations and health status of Enterprise Edition transit routers and provides troubleshooting solutions. For more information, see Diagnose a transit router.
Traffic analysis
The traffic analysis feature can be used to monitor real-time and historical network traffic, and generate visualized time series charts in the NIS console based on analysis results. You can troubleshoot issues based on the traffic data and collected metrics.
This feature also allows you to monitor and analyze inter-region and intra-region network traffic of CEN.
Inter-region traffic analysis: You can use this feature to analyze inbound and outbound traffic that passes through Enterprise Edition transit routers across regions. The traffic data is displayed in the form of 5-tuples. For more information, see Work with the Internet traffic analysis function.
Intra-region traffic analysis: You can use this feature to analyze inbound and outbound traffic that passes through Enterprise Edition transit routers when the transit routers are connected to virtual private clouds (VPCs) in the same region. For more information, see Work with the Internet traffic analysis function.
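To illustrate what 5-tuple traffic data can look like when you analyze it yourself, the following minimal Python sketch sums bytes per 5-tuple and returns the largest flows. The `FiveTuple` type, the `top_flows` function, and the sample records are hypothetical and are not the NIS data format or API.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical 5-tuple key; field names are assumptions, not NIS fields."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def top_flows(records, n=3):
    """Sum bytes per 5-tuple and return the n flows with the most bytes."""
    totals = defaultdict(int)
    for key, byte_count in records:
        totals[key] += byte_count
    # Largest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample data, not real NIS output.
SAMPLE = [
    (FiveTuple("10.0.0.1", "172.16.0.5", 44321, 443, "TCP"), 1200),
    (FiveTuple("10.0.0.2", "172.16.0.5", 51000, 53, "UDP"), 300),
    (FiveTuple("10.0.0.1", "172.16.0.5", 44321, 443, "TCP"), 800),
]

if __name__ == "__main__":
    for flow, total in top_flows(SAMPLE):
        print(flow, total)
```

Grouping by the full 5-tuple keeps flows between the same hosts on different ports separate, which is usually what you want when you look for the source of a traffic spike.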
Reachability analyzer
If you use CEN to connect networks, you can use the reachability analyzer to test the connectivity between network resources. This feature helps you improve service availability. For more information, see Work with the reachability analyzer.
Alibaba Cloud resource healthiness updates
We recommend that you keep track of the health status of your Alibaba Cloud resources so that you can handle exceptions at the earliest opportunity. For more information, visit Alibaba Cloud Resource Healthiness Updates.
On the Alibaba Cloud Resource Healthiness Updates page, you can check the health status of every service in each region, and find the methods to subscribe to Really Simple Syndication (RSS) feeds about service exceptions.
CloudMonitor
CloudMonitor is integrated with basic Alibaba Cloud services and is free of charge. CloudMonitor can monitor system events of CEN and collect CEN metrics in real time. You can determine whether workloads are running as expected based on the system events and metrics that are collected by CloudMonitor. In addition, you can create alert rules for system events and monitoring metrics so that you can be notified of anomalies at the earliest opportunity.
System event monitoring
CloudMonitor supports system event monitoring, which can automatically record service errors and O&M events. It also supports queries and auditing of service-related system events that indicate the service status. After you classify resources into different application groups, service-related system events are automatically associated with the resources in application groups. This helps you check various monitoring information in one place and efficiently analyze and troubleshoot issues if business exceptions occur.
CloudMonitor also supports event alerting. You can create alert rules with different event priorities, enable CloudMonitor to send you notifications through emails and DingTalk messages, or configure callback URLs. These automatic O&M measures ensure that you are notified of high-severity events.
For more information about the CEN system events supported by CloudMonitor and how to create alert rules for CEN system events, see Monitor route usage.
Metric monitoring
CloudMonitor can automatically collect the metrics of cloud resources within your Alibaba Cloud account. You can view the monitoring charts of each cloud service. You can also create alert rules to monitor resources. If an alert is triggered based on the alert rules, CloudMonitor sends an alert notification to you. This way, you are notified of the status of your resources at the earliest opportunity.
CEN provides metrics of different resources. For more information about the CEN metrics and how to create alert rules, see the following topics:
The preceding topics describe how to create alert rules in the CEN console. For more information about how to create alert rules in the CloudMonitor console, see Create an alert rule.
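The alert rule behavior described above can be pictured with a small sketch. The following Python code is only an illustration of threshold-based alerting under stated assumptions: the `AlertRule` class, its fields, and the metric name are hypothetical and do not represent the CloudMonitor API or its rule schema.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical alert rule; not the CloudMonitor rule schema."""
    metric: str
    threshold: float
    consecutive_periods: int  # number of most recent samples that must all breach


def should_alert(rule, samples):
    """Return True if the last `consecutive_periods` samples all exceed the threshold."""
    if rule.consecutive_periods < 1:
        raise ValueError("consecutive_periods must be at least 1")
    if len(samples) < rule.consecutive_periods:
        # Not enough data points yet to evaluate the rule.
        return False
    recent = samples[-rule.consecutive_periods:]
    return all(value > rule.threshold for value in recent)


if __name__ == "__main__":
    # Assumed metric name, for illustration only.
    rule = AlertRule("bandwidth_usage_percent", threshold=80.0, consecutive_periods=3)
    print(should_alert(rule, [50, 85, 90, 95]))
```

Requiring several consecutive breaches instead of a single one is a common way to avoid notifications for short spikes.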
References
Dashboards
You can customize monitoring dashboards to collect specified metrics. For more information, see Manage the monitoring charts of a custom dashboard.
Alert blacklists
Alert blacklists allow you to block notifications of specified metrics. For more information, see Manage blacklist policies.
By default, Alibaba Cloud accounts have full permissions on resources, and Resource Access Management (RAM) users do not have permissions on resources. If a RAM user needs to view monitoring data, the Alibaba Cloud account must grant the required permissions to the RAM user. For more information about CloudMonitor permissions, see Grant permissions to a RAM user.
Log Service
Log Service is a cloud-native observation and analysis platform that provides large-scale, low-cost, and real-time services for logs, metrics, and traces. Log Service allows you to collect, process, query, analyze, visualize, consume, and deliver data. You can configure alerts in the Log Service console. Log Service helps enterprises improve their digital capabilities in terms of R&D, O&M, and data security. For more information, see What is Log Service?
CEN is integrated with Log Service. You can use Log Service by enabling the flow log feature in the CEN console. Log Service can be used to monitor, audit, analyze, and process traffic information about CEN resources. For example, you can analyze bandwidth usage, troubleshoot network errors, reduce data transfer costs, and analyze traffic anomalies based on the traffic information captured by flow logs.
Flow logs
Flow logs can be used to capture information about inter-region network traffic that flows between transit routers and virtual border routers (VBRs). You can set the capture window of a flow log to 1 minute or 10 minutes. Within each capture window, the flow log first aggregates the captured network information, and then delivers the aggregated information to Log Service as log entries.
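The aggregate-then-deliver behavior can be sketched as follows. This Python example is a simplified model, not the flow log implementation: the record keys (`ts`, `flow`, `bytes`, `packets`) and the `aggregate_flows` function are assumptions made for illustration, and the real flow log fields are listed in the flow log documentation.

```python
from collections import defaultdict


def aggregate_flows(records, window_seconds=60):
    """Group raw records into capture windows and sum bytes and packets per flow.

    `records` is a list of dicts with hypothetical keys:
    ts (epoch seconds), flow (a hashable 5-tuple), bytes, packets.
    """
    if window_seconds not in (60, 600):
        raise ValueError("capture window must be 60 or 600 seconds")
    buckets = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for rec in records:
        # Align the timestamp to the start of its capture window.
        window_start = rec["ts"] - rec["ts"] % window_seconds
        key = (window_start, rec["flow"])
        buckets[key]["bytes"] += rec["bytes"]
        buckets[key]["packets"] += rec["packets"]
    # One aggregated entry per (window, flow), ordered by window start, then flow.
    return [
        {"window_start": start, "flow": flow, **totals}
        for (start, flow), totals in sorted(buckets.items(), key=lambda kv: kv[0])
    ]


if __name__ == "__main__":
    flow = ("10.0.0.1", "10.1.0.1", 1234, 443, "TCP")
    raw = [
        {"ts": 5, "flow": flow, "bytes": 100, "packets": 1},
        {"ts": 50, "flow": flow, "bytes": 200, "packets": 2},
        {"ts": 65, "flow": flow, "bytes": 50, "packets": 1},
    ]
    for entry in aggregate_flows(raw, window_seconds=60):
        print(entry)
```

In this model, a 10-minute window produces fewer and coarser log entries than a 1-minute window for the same traffic.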
For more information about the fields supported by flow logs, see What is a flow log?
For more information about how to configure flow logs for inter-region connections and VBR connections, see Configure a flow log.
After you create a flow log, the information about network traffic over inter-region connections and VBR connections is stored in Logstores of Log Service. By default, the maximum retention period is 180 days, which can be modified as needed. For more information, see Modify the configurations of a Logstore.
For more information about how to query and analyze log data in the Log Service console, see Log search overview and Log analysis overview.
Flow logs are supported by Enterprise Edition transit routers only in some regions. For more information, see Limits.
Log queries and analysis may incur fees. For more information, see Billing overview.
By default, Alibaba Cloud accounts have full permissions on resources, and RAM users do not have permissions on resources. If a RAM user needs to manage Log Service, the Alibaba Cloud account must grant the required permissions to the RAM user. For more information, see Create a RAM user and authorize the RAM user to access Simple Log Service.
Cloud resource configuration auditing
Cloud Config is an auditing service that can trace and audit resource configurations. It monitors the compliance status of cloud resources to make sure that your infrastructure complies with laws and regulations.
CEN is integrated with Cloud Config, which is free of charge. Cloud Config supports only some Alibaba Cloud services. Some of your resources may not be on the resource list. For more information about the CEN resources supported by Cloud Config, see Supported Cloud Services.
Cloud Config can audit the operations performed by your Alibaba Cloud account and all RAM users created by your Alibaba Cloud account. By default, configuration changes are recorded every 10 minutes.
You can view operations performed on CEN resources in the Cloud Config console. For more information, see View the resource list.
Cloud Config can deliver resource configuration changes and compliance violation events, including those of CEN resources, to specified Logstores of Log Service, in which you can query and analyze the log data. This helps you verify that CEN complies with laws and regulations. For more information, see Deliver resource data to a Simple Log Service Logstore.
Cloud resource operation auditing
ActionTrail is a service that can monitor and record the operations performed by Alibaba Cloud accounts, such as console operations, OpenAPI calls, and developer tool operations. ActionTrail records these actions as events. You can download these events and deliver them to Log Service or Object Storage Service (OSS). Then, you can perform operations such as behavior analysis, security analysis, resource change tracking, and compliance auditing based on the events.
CEN is integrated with ActionTrail. For more information about the CEN events that can be audited by ActionTrail, see Auditable events of CEN. For more information about the data recorded by ActionTrail, see Management event structure. For more information about how to query events, see Get started with the event query feature.
By default, ActionTrail tracks and retains events from the last 90 days. If you need to retain events for a longer period of time, create a trail to deliver events to Log Service or OSS. For more information, see Getting Started.
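As a quick way to reason about the 90-day window, the following Python sketch checks whether an event time still falls inside the default retention period. The function name and the exact boundary handling are assumptions for illustration; this code does not call ActionTrail.

```python
from datetime import datetime, timedelta, timezone

# Default retention period stated in this topic.
DEFAULT_RETENTION = timedelta(days=90)


def within_default_retention(event_time, now, retention=DEFAULT_RETENTION):
    """Return True if event_time is no older than the retention period.

    Treating an event that is exactly 90 days old as still retained is an
    assumption made for this sketch.
    """
    return now - event_time <= retention


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(within_default_retention(now - timedelta(days=30), now))
    print(within_default_retention(now - timedelta(days=120), now))
```

Events that fall outside this window are the ones for which a trail to Log Service or OSS needs to have been created in advance.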
After you create a trail to deliver events to a Log Service Logstore or an OSS bucket, you can query or analyze the events in the Log Service or OSS console. For more information, see Query events in the Log Service or OSS console.
If you need to trace a historical event, submit a ticket to request the required permissions.