Build a Full-stack Monitoring System
ARMS provides full-stack performance monitoring, alerting, and end-to-end tracing analysis. It monitors and analyzes user behavior and page performance in client environments such as browsers, mini programs, and mobile apps to improve user experience. It also monitors service calls, database queries, and system loads in distributed or microservices architectures and in container-based or serverless deployment environments. With end-to-end tracing analysis, ARMS implements comprehensive monitoring and optimization for application performance.
End-to-End Multi-scenario Coverage
Covers a wide range of monitoring scenarios such as network quality, web applications, mini programs, backend applications, containers, cloud services, and infrastructure.
Centralized Display and Analysis
Builds a centralized O&M and monitoring dashboard to provide multiple models for analyzing the root causes of bottlenecks.
End-to-End Tracing Analysis
Supports end-to-end tracing based on full sampling, which provides a complete basis for troubleshooting.
Centralized Alert Management
Builds a centralized alert management system for AI-enabled alert management and emergency coordination.
Open Source Support
Supports open source standards such as OpenTelemetry, Prometheus, and Grafana.
High Availability and Cost-efficiency
Provides low-consumption agents and a high-availability platform, and supports centralized billing by GB. This reduces monitoring costs.
Features
Browser Monitoring
Provides performance monitoring for web applications, mini programs, and mobile apps.
Application Overview
Provides an out-of-the-box application overview dashboard based on Grafana. The dashboard displays key metrics such as the number of sessions, page views (PVs) and unique visitors (UVs), access speed, JavaScript error rate, and crash rate of frontend applications in real time.
Application Details Monitoring
Provides monitoring dashboards for data exploration, session tracing, page or resource loading, and API request details.
Application Diagnostics
Provides JavaScript error diagnostics and crash or application not responding (ANR) analysis.
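The overview metrics listed above (sessions, PVs, UVs, JavaScript error rate) can be illustrated with a small calculation. This is a minimal sketch and not how ARMS computes them: the `PageView` record, its field names, and the definition of the error rate as the share of page views that raised a JavaScript error are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """One page-view event. All field names are hypothetical."""
    user_id: str
    session_id: str
    page: str
    js_error: bool  # True if the page view raised a JavaScript error


def summarize(events: list[PageView]) -> dict[str, float]:
    """Compute PV, UV, session count, and JS error rate for a batch of events."""
    pv = len(events)                                 # every event is one page view
    uv = len({e.user_id for e in events})            # distinct users
    sessions = len({e.session_id for e in events})   # distinct sessions
    errors = sum(e.js_error for e in events)
    # Assumed definition: errored page views divided by all page views.
    js_error_rate = errors / pv if pv else 0.0
    return {"pv": pv, "uv": uv, "sessions": sessions, "js_error_rate": js_error_rate}


if __name__ == "__main__":
    sample = [
        PageView("u1", "s1", "/", False),
        PageView("u1", "s1", "/cart", True),
        PageView("u2", "s2", "/", False),
    ]
    print(summarize(sample))
```

Counting distinct `user_id` values is what separates UVs from PVs: one user who loads three pages adds three page views but only one visitor.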
Synthetic Monitoring
Simulates the real users of multiple Internet service providers (ISPs) in different regions to monitor websites and APIs.
Multiple Monitoring Types and Nodes
Supports monitoring types such as Elastic Compute Service (ECS) instances, PCs, and mobile clients. The monitoring network has more than 200,000 user nodes worldwide, more than 500 monitoring points across IDCs owned by more than 400 ISPs, and hundreds of thousands of registered members, so the monitoring range can adapt to your business types and scale.
Network Quality Diagnostics
Uses tracing analysis to automatically discover the distributed traces associated with monitoring requests in different monitoring scenarios, so that you can accurately identify the root causes of failed and slow requests.
Application Monitoring
Provides performance monitoring and tracing analysis for Java applications.
Application Details Monitoring
Provides monitoring dashboards for Java virtual machines (JVMs), thread pools, pods, servers, and SQL calls.
Trace Data Analysis
Allows you to analyze the stored full trace data in real time based on filter conditions or aggregation dimensions, which supports custom diagnostics in various scenarios.
Application Diagnostics
Provides real-time diagnostics, Arthas diagnostics, exception analysis, and log analysis.
eBPF-based Application Monitoring
Provides eBPF-based non-intrusive multi-language application performance monitoring for Kubernetes clusters.
Application Overview
Displays all identified and integrated application services and their calls in a panoramic topology.
Application Details Monitoring
Provides dashboards and analysis for application topologies, dependent services, instance monitoring, trace data, and events.
Intelligent Alerting
Builds a centralized alert management system by aggregating multiple alert sources.
Alert Overview
Displays key alert metrics, an overview of alert statistics, and typical emergency response metrics so that you can monitor the health of your business in real time.
Alert Integration
Provides a wide range of components for integrating with cloud service providers and mainstream monitoring systems, and supports multiple notification methods such as text messages, phone calls, DingTalk, emails, and Fetion, as well as multiple collaboration tools such as Aone, Jira, and PagerDuty.
Alert Collaboration
Supports multiple alert policies, such as notification, escalation, silence, and suppression policies. You can use GUI-based event processing flows to orchestrate procedures and process reported alert events, which lets you tailor event handling to different scenarios. ARMS also supports full lifecycle management for alerts in an instant messaging (IM) tool or the console.
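The four policy types named above can be sketched as a single decision function. This is an illustrative sketch only: the `Alert` fields, the `decide` function, the 30-minute escalation threshold, and the order in which the policies are checked are all assumptions, not ARMS behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A fired alert. All field names are hypothetical."""
    name: str
    service: str
    severity: str            # "critical" or "warning"
    age_minutes: int         # minutes since the alert fired
    acknowledged: bool = False


def decide(alert: Alert, active_alerts: list[Alert],
           silenced_services: set[str],
           escalate_after_minutes: int = 30) -> str:
    """Return the action for one alert: silenced, suppressed, escalate, or notify."""
    # Silence: an operator has muted everything for this service.
    if alert.service in silenced_services:
        return "silenced"
    # Suppression: hide a warning while a critical alert is active on the same service.
    if alert.severity == "warning" and any(
        other.service == alert.service and other.severity == "critical"
        for other in active_alerts
    ):
        return "suppressed"
    # Escalation: nobody acknowledged the alert within the threshold.
    if not alert.acknowledged and alert.age_minutes >= escalate_after_minutes:
        return "escalate"
    # Notification: the default action.
    return "notify"


if __name__ == "__main__":
    critical = Alert("db-down", "orders", "critical", 5)
    warning = Alert("slow-sql", "orders", "warning", 5)
    print(decide(warning, [critical, warning], set()))
```

Checking silence first and suppression second is a design choice in this sketch: it prevents a muted or redundant alert from ever reaching the escalation step.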
Sub-services
ARMS - Application Monitoring
Provides end-to-end tracing analysis and code-level real-time performance monitoring.
Features
- Monitors the health status of applications in real time.
- Sorts service dependencies.
- Reduces latency and eliminates failures.
ARMS - Browser Monitoring
Applies to different clients such as web applications, websites, and mini programs.
Features
- Applies to iOS apps, Android apps, web applications, and mini programs.
- Supports session statistics and exception tracing.
- Associates API requests with backend services.
ARMS - Synthetic Monitoring
Simulates the real users of multiple ISPs in different regions to monitor websites and APIs.
Features
- Provides hundreds of thousands of monitoring points worldwide.
- Supports different monitoring types such as ECS instances, PCs, and mobile clients.
- Integrates tracing analysis and alert management.
Scenarios
Scenarios and Requirements
Provides performance monitoring and user experience analysis for web applications, websites, mini programs, and mobile apps.
Benefits
- Routine Inspection
Simulates the users of multiple ISPs in different regions based on ECS instances, PCs, and mobile clients, and monitors web applications, websites, and APIs to identify network quality fluctuation and website or API unavailability at the earliest opportunity.
- User Experience Analysis
Analyzes the key performance metrics of applications during network request initiation, page loading, and resource loading, and traces stack details for exceptions that affect user experience, such as application crashes, Application Not Responding (ANR) errors, and stuttering. This helps you determine the scope of impact of these exceptions and improve user experience and application performance.
- Tracing Analysis
Associates API requests with backend services, and allows you to analyze the traces between frontend requests and backend services to identify the performance bottlenecks of network requests.
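One common way to associate a frontend API request with backend traces is to send a trace context header with the request. The sketch below builds and parses a W3C Trace Context `traceparent` header, the default propagation format in OpenTelemetry. Whether ARMS uses this exact header is an assumption; the source states only that ARMS supports OpenTelemetry. The helper names `new_traceparent` and `parse_traceparent` are hypothetical.

```python
import re
import secrets
from typing import Optional

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Create a traceparent value for an outgoing frontend request."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)     # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"  # lowest bit is the sampled flag
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are invalid in the W3C specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


if __name__ == "__main__":
    header = new_traceparent()
    print(header)
    print(parse_traceparent(header))
```

A backend service that reads the same `trace_id` from the incoming header can attach its spans to the trace that the browser or app started, which is what makes the frontend-to-backend analysis possible.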
Scenarios and Requirements
Provides performance monitoring and tracing analysis for multi-language, distributed, and microservices applications. Multiple programming languages such as Java, PHP, and Node.js are supported.
Benefits
- Multiple Programming Languages and Access Methods
Provides multiple access methods for different deployment environments such as ECS instances, serverless architectures, and containers, and supports multiple programming languages including Java, PHP, and Node.js.
- Global Topology
Displays the health status of applications, services, and servers, displays the upstream and downstream dependencies of applications, and allows you to quickly identify the services that caused failures, applications affected by the failures, and associated servers.
- Application Details
Monitors JVMs, thread pools, servers, and pods to identify service exceptions at the earliest opportunity.
- Application Diagnostics
Provides capabilities such as real-time diagnostics, exception analysis, log analysis, and Arthas diagnostics to quickly identify root causes.
- End-to-End Tracing Analysis
Allows you to analyze the stored full trace data in real time based on filter conditions or aggregation dimensions, which supports custom diagnostics in various scenarios.
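Analyzing trace data by filter conditions and aggregation dimensions can be sketched as a filter followed by a group-by. This is a minimal illustration and not the ARMS query interface: the `Span` record, its fields, and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass(frozen=True)
class Span:
    """A stored span. All field names are hypothetical."""
    trace_id: str
    service: str
    operation: str
    duration_ms: float
    error: bool


def aggregate(spans: Iterable[Span],
              where: Callable[[Span], bool],
              by: Callable[[Span], Hashable]) -> dict:
    """Keep spans that satisfy `where`, then group them by the `by` dimension."""
    groups: dict = defaultdict(list)
    for span in spans:
        if where(span):                      # filter condition
            groups[by(span)].append(span)    # aggregation dimension
    return {
        key: {
            "count": len(items),
            "errors": sum(s.error for s in items),
            "avg_ms": sum(s.duration_ms for s in items) / len(items),
            "max_ms": max(s.duration_ms for s in items),
        }
        for key, items in groups.items()
    }


if __name__ == "__main__":
    spans = [
        Span("t1", "cart", "GET /cart", 120.0, False),
        Span("t2", "cart", "GET /cart", 480.0, True),
        Span("t3", "pay", "POST /pay", 900.0, False),
    ]
    # Slow spans (100 ms or longer), grouped by service.
    print(aggregate(spans, where=lambda s: s.duration_ms >= 100, by=lambda s: s.service))
```

Passing the filter and the dimension as functions keeps the sketch generic: the same helper can group failed spans by operation or slow spans by service without any change.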
Scenarios and Requirements
Monitors the metrics for multi-cloud container clusters, cloud services, and self-managed services in a centralized manner.
Benefits
- Full-stack Metric Monitoring
Monitors metrics across multiple clouds and clusters, including metrics for cloud services, the system layer, and the self-managed application component layer, as well as custom business metrics.
- Centralized Access to Cloud Services
Allows you to access various cloud services. ARMS provides data source configurations and preset dashboards for cloud services to display monitoring data in a centralized manner. ARMS also provides Prometheus Grafana dashboards for mainstream cloud services, such as Container Service for Kubernetes (ACK) and ApsaraMQ for Kafka, to help O&M teams perform finer-grained metric monitoring.
Scenarios and Requirements
Manages multiple data sources in a centralized manner to display end-to-end monitoring data.
Benefits
- Centralized Access to Multiple Data Sources
Integrates various data sources such as Alibaba Cloud services, SQL databases, time series databases, logs, traces, and enterprise applications. You can use plug-ins to integrate data sources with ease. ARMS also allows you to use virtual private cloud (VPC) data channels to access data across clouds, regions, and VPCs.
- Preset Visualized Plug-ins and Dashboard Templates
Provides approximately 100 chart and table components and dozens of dashboard templates to help you display and analyze different types of data in different scenarios.
Scenarios and Requirements
Manages multiple alert sources in a centralized manner for cross-platform and cross-team emergency collaboration.
Benefits
- Integration of Alert Sources and Notification Methods
Provides a wide range of components for integrating with Alibaba Cloud Simple Log Service, Prometheus, ARMS, and mainstream open source monitoring systems, and supports multiple notification methods such as text messages, phone calls, DingTalk, emails, and Fetion, as well as multiple collaboration systems such as Aone, Jira, and PagerDuty.
- Centralized Management
Supports multiple alert policies, such as notification, escalation, silence, and suppression policies, and allows you to define event matching rules to accurately identify alert events. This way, alert notification policies of the same type can be configured in a centralized manner. You can also use GUI-based event processing flows to orchestrate simple procedures and process the alert events reported by an alert source, which lets you tailor event handling to different scenarios.
- Alert Event Statistics and Analysis
Allows you to analyze the generated alert event data in real time based on filter conditions, which supports custom analysis and diagnostics in various scenarios.
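The event matching rules mentioned under Centralized Management can be pictured as label matchers that route each alert event to a notification policy. The sketch below is an assumption-heavy illustration: the `MatchRule` type, the label keys, the policy names, and the first-match-wins ordering are hypothetical and do not describe the ARMS rule syntax.

```python
from dataclasses import dataclass


@dataclass
class MatchRule:
    """Routes events whose labels equal every entry in `equals` (hypothetical)."""
    policy: str              # name of the notification policy to apply
    equals: dict[str, str]   # label name -> required value


def route(event: dict[str, str], rules: list[MatchRule],
          default: str = "default-policy") -> str:
    """Return the policy of the first rule that matches the event's labels."""
    for rule in rules:
        if all(event.get(key) == value for key, value in rule.equals.items()):
            return rule.policy
    # No rule matched: fall back to a catch-all policy.
    return default


if __name__ == "__main__":
    rules = [
        MatchRule("dba-oncall", {"source": "prometheus", "team": "db"}),
        MatchRule("sre-oncall", {"severity": "critical"}),
    ]
    event = {"source": "prometheus", "team": "db", "severity": "critical"}
    print(route(event, rules))
```

Because the first matching rule wins in this sketch, more specific rules belong before broader ones. Events from different alert sources that carry the same labels end up under the same policy, which is the centralization effect described above.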