
Microservices Engine: Monitoring Center

Last Updated: Mar 11, 2026

When your Nacos engine handles service registration, configuration distribution, and push notifications at scale, you need real-time visibility into performance bottlenecks, capacity limits, and infrastructure health. The Monitoring Center in Microservices Engine (MSE) provides unified dashboards that track these metrics across eight categories, so you can detect anomalies and resolve issues before they affect your services.

Prerequisites

Before you begin, make sure that you have:

  • Created a Nacos engine instance in MSE.

  • Upgraded the engine to Professional Edition if you want to use the Grafana dashboard. For more information, see Upgrade a Nacos version.

Choose a dashboard

MSE provides two monitoring dashboards. The Grafana dashboard is recommended because it covers significantly more metrics.

| Dashboard | Metrics coverage | Default time range | Recommended for |
| --- | --- | --- | --- |
| Grafana dashboard | 8 metric categories across dedicated tabs | Last 15 minutes | All users on Professional Edition |
| Legacy dashboard | 3 basic metrics (service count, provider count, write RT) | Last 30 minutes | Users who have not yet upgraded |

If your engine still uses the legacy dashboard, upgrade to the Grafana dashboard for full observability.

Enable the Grafana dashboard

Basic Edition engines

The Grafana dashboard is automatically enabled after you upgrade your engine to Professional Edition. For more information, see Upgrade a Nacos version.

Professional Edition engines (version 2.0.3 or earlier)

If your engine runs version 2.0.3 or earlier, enable the Grafana dashboard manually:

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Registry > Instances.

  3. Click the name of the target instance.

  4. In the left-side navigation pane, click Observation Analysis.

  5. Click Upgrade Monitoring Dashboard and follow the on-screen instructions.


After the upgrade completes, the Monitoring Center page becomes available.

Use the Grafana dashboard

The Grafana dashboard organizes metrics into purpose-built tabs. Start with the Overview tab to spot anomalies, then drill into specific tabs to investigate root causes.

| Investigation goal | Start here |
| --- | --- |
| Quick health check | Overview and Top N Monitoring |
| Service registry issues | Registry Monitoring and Push monitoring |
| Configuration issues | Configuration center monitoring |
| Infrastructure problems | JVM Monitoring and Resource Monitoring |
| Connection issues | Number of connections monitoring |

Open the Monitoring Center

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Registry > Instances.

  3. On the Instances page, click the name of the target instance.

  4. In the left-side navigation pane, click Monitoring Center.

Dashboard controls

  • Time range: The default monitoring window is the last 15 minutes. To change it, click the time range selector in the upper-right corner and choose a preset or custom range.

  • Data granularity: Hover over any point on a chart to see per-node metric values at that moment, accurate to the minute.

  • Refresh: Click the Refresh icon in the upper-right corner to reload the current data.

Overview tab

The Overview tab provides a high-level summary of your engine's health. Use it as your first stop during routine checks and incident triage.

Overview section

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Number of nodes | Total engine nodes in the cluster | A sudden drop indicates a node failure. Verify that the count matches your expected cluster size. |
| Number of configurations | Total configuration entries managed by the engine | Unexpected changes may indicate unauthorized configuration updates. |
| Number of Service Providers | Total registered service provider instances | A sudden drop suggests provider instances are deregistering, which may point to deployment issues or network failures. |
| Queries per second | Read request throughput (QPS) | Spikes beyond your baseline may indicate a traffic surge. A drop to zero may indicate engine unavailability. |
| Operations per second | Write request throughput (TPS) | A sustained spike may indicate a batch update or a runaway client. |
| Number of connections | Active client connections to the engine | Compare against baseline. A sudden drop may indicate network partitioning. |

Usage level section

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Configuration number using water level | Configuration count as a percentage of the engine's capacity | Values approaching 100% mean the engine is nearing its capacity limit. Scale up the engine specification before saturation. |
| Service Provider Water Level | Service provider count as a percentage of the engine's capacity | Same as above. Plan capacity upgrades when usage is consistently high. |
| Connection using water level | Connection count as a percentage of the engine's capacity | Same as above. High connection usage can cause new clients to fail to connect. |
Note

The Eureka client supports only short-lived connections, so connection counts are not reported for Eureka-based applications.
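All three water-level metrics share the same interpretation: usage as a percentage of the engine's capacity. If you reproduce them in your own alerting tooling, a minimal sketch might look like the following (the function names and the 70%/90% thresholds are illustrative assumptions, not MSE defaults):

```python
# Sketch: turn a raw count and an engine capacity limit into the same
# "water level" percentage the dashboard shows, then classify it into
# alert tiers. Thresholds are illustrative -- tune them to your capacity plan.

def water_level(current: int, capacity: int) -> float:
    """Usage as a percentage of engine capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * current / capacity

def alert_tier(pct: float, warn: float = 70.0, critical: float = 90.0) -> str:
    """Map a water-level percentage to an alert tier."""
    if pct >= critical:
        return "critical"  # scale up the engine specification before saturation
    if pct >= warn:
        return "warn"      # plan a capacity upgrade
    return "ok"

print(alert_tier(water_level(8_500, 10_000)))  # warn (85.0%)
```

The same two functions apply unchanged to the configuration, provider, and connection water levels, since all three are defined as count over capacity.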

Registry Monitoring tab

The Registry Monitoring tab tracks service registration and discovery performance. Use it to diagnose slow service discovery, registration failures, or capacity issues.

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Number of Services | Total registered services | A sudden drop indicates services are deregistering unexpectedly. |
| Number of Service Providers | Total service provider instances | Compare against your expected deployment size. A mismatch indicates registration failures. |
| Number of service subscribers | Total service subscriber instances | A rapid increase may indicate a subscriber storm from a misconfigured client. |
| Registration Center TPS | Write transactions per second for registration operations | A sustained spike combined with rising write RT signals resource contention. |
| Registration Center QPS | Read queries per second for discovery operations | Spikes here correlate with increased service discovery requests from new deployments or scaling events. |
| Registration Center Write RT | Average response time for write operations | Rising write latency warrants investigation. Check the Resource Monitoring and JVM Monitoring tabs to identify the bottleneck. |
| Registration Center Read RT | Average response time for read operations | Same as write RT. Rising read latency may indicate increased load or garbage collection pressure. |
Note

Nacos 2.0.4 and later include four built-in services for address discovery that use the Diamond protocol (Application Configuration Management). The service count and provider count displayed here include these built-in services, so each is 4 higher than your actual count.

Note

The Eureka client does not support service subscription and uses polling queries instead. Service subscriber counts are not reported for Eureka-based applications.
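When reconciling the dashboard's counts against your expected deployment size, for example in a verification script, you need to account for the four built-in services described in the first note above. A small sketch, assuming Nacos 2.0.4 or later (the function name is illustrative):

```python
# Sketch: strip the four built-in address-discovery services that Nacos
# 2.0.4+ includes in the displayed service and provider totals.

BUILT_IN_SERVICES = 4  # added by Nacos 2.0.4 and later

def actual_counts(displayed_services: int, displayed_providers: int) -> tuple[int, int]:
    """Return (service count, provider count) excluding built-in services."""
    return (displayed_services - BUILT_IN_SERVICES,
            displayed_providers - BUILT_IN_SERVICES)

print(actual_counts(54, 204))  # (50, 200)
```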

Configuration center monitoring tab

The Configuration center monitoring tab tracks configuration management performance. Use it to investigate slow configuration pushes, listener accumulation, or write bottlenecks.

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Number of configurations | Total configuration entries | A sudden change may indicate a batch import or accidental deletion. |
| Configure the number of listeners | Total configuration listeners across all entries | A spike typically corresponds to a batch deployment rollout. Sustained high listener counts increase push overhead. |
| Configuration Center TPS | Write transactions per second for configuration changes | A spike combined with rising write RT signals the engine is under write pressure. |
| Configuration Center QPS | Read queries per second for configuration lookups | High QPS may indicate clients are polling too aggressively rather than using push-based updates. |
| Configuration Center Write RT | Average response time for configuration writes | Healthy values are in the low millisecond range. Rising values indicate resource contention. |
| Configuration Center Read RT | Average response time for configuration reads | Same as write RT. |
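One way to quantify the "clients polling too aggressively" signal is to normalize read QPS by the listener count: with push-based updates, steady-state reads per listener should stay low. A rough heuristic sketch (the 0.5 reads/s threshold is an assumption, not an MSE recommendation):

```python
# Sketch: flag configuration-center read traffic that looks like client-side
# polling rather than push-based updates. Threshold is illustrative.

def reads_per_listener(qps: float, listeners: int) -> float:
    """Average read rate attributable to each configuration listener."""
    if listeners <= 0:
        return float("inf") if qps > 0 else 0.0
    return qps / listeners

def looks_like_polling(qps: float, listeners: int, threshold: float = 0.5) -> bool:
    return reads_per_listener(qps, listeners) > threshold

print(looks_like_polling(qps=1200.0, listeners=400))  # True: 3 reads/s per listener
print(looks_like_polling(qps=20.0, listeners=400))    # False
```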

Push monitoring tab

The Push monitoring tab tracks how effectively the engine pushes service change notifications to subscribers. A healthy push pipeline is critical for service discovery responsiveness.

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Service Push Success Rate | Percentage of push notifications delivered successfully | A rate below 100% warrants immediate investigation. Check the Number of connections monitoring tab for connectivity issues and verify that subscribers are reachable. |
| Time-consuming service push | Average latency per push notification | Rising latency may indicate network congestion or overloaded subscriber clients. |
| Service Push TPS | Push notifications sent per second | Correlate with deployment or scaling events. A sustained spike without a corresponding event may indicate a push storm. |
| Service Empty Proportion | Percentage of pushes with empty service lists | A non-zero proportion may indicate services are deregistering unexpectedly. Check the Registry Monitoring tab to confirm provider counts. |
Note

The Eureka client uses polling queries rather than push-based notifications, so push metrics are not available for Eureka-based applications.
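If you export these metrics to your own alerting system, the success-rate check is straightforward to reproduce. A sketch under the assumption that you track raw success and total counters per window (names are illustrative; per the table above, any rate below 100% is actionable):

```python
# Sketch: derive the push success rate from raw success/total counters and
# flag anything below 100% for investigation.

def push_success_rate(successes: int, total: int) -> float:
    """Percentage of push notifications delivered successfully."""
    if total == 0:
        return 100.0  # no pushes attempted in the window; treat as healthy
    return 100.0 * successes / total

def needs_investigation(successes: int, total: int) -> bool:
    return push_success_rate(successes, total) < 100.0

print(push_success_rate(998, 1000))    # 99.8
print(needs_investigation(998, 1000))  # True
```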

Number of connections monitoring tab

The Number of connections monitoring tab tracks client connectivity to the engine. Use it to diagnose connection drops, version inconsistencies, and network issues.

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Number of client versions | Distribution of Nacos client versions connected to the engine | Multiple old versions may indicate inconsistent deployments. Standardize client versions to avoid compatibility issues. |
| Number of Long Links | Active persistent (long) connections between clients and the engine | A drop may indicate network issues or client-side failures. Cross-reference with the Resource Monitoring tab to check for network traffic anomalies. |
Note

The Eureka client supports only short-lived connections, so connection metrics are not reported for Eureka-based applications.
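"Sudden drop" is relative to your own baseline, so an automated check needs a reference window. A minimal sketch, assuming you sample the connection count periodically (the 30% tolerance is an illustrative assumption):

```python
# Sketch: detect a sudden drop in long-connection count against a rolling
# baseline of recent samples. Window size and tolerance are illustrative.
from statistics import mean

def sudden_drop(recent_samples: list[int], current: int,
                tolerance: float = 0.30) -> bool:
    """True if `current` falls more than `tolerance` below the baseline mean."""
    baseline = mean(recent_samples)
    return current < baseline * (1.0 - tolerance)

print(sudden_drop([1200, 1180, 1210, 1195], 800))   # True: possible partition
print(sudden_drop([1200, 1180, 1210, 1195], 1190))  # False: normal jitter
```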

JVM Monitoring tab

The JVM Monitoring tab exposes garbage collection (GC) and memory metrics for the engine's Java Virtual Machine (JVM). Use it to diagnose latency spikes caused by GC pressure or memory exhaustion.

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Young GC Time | Total time spent on young generation garbage collection | A sustained increase correlates with higher object allocation rates. |
| Young GC Times | Number of young generation GC events | Frequent young GC is normal under load, but a sudden increase may indicate a memory leak or traffic spike. |
| Full GC time | Total time spent on full garbage collection | Any full GC causes a stop-the-world pause. Frequent full GC events directly increase response times. |
| Full GC Times | Number of full GC events | Same as Full GC time. If this metric is consistently elevated, consider upgrading the engine specification. |
| Heap Memory Usage | Percentage of heap memory in use | High heap memory usage increases GC frequency and response times. Consider upgrading the engine specification if usage remains high. |
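A useful way to combine the GC time metrics is as a fraction of wall-clock time spent collecting: if GC overhead in a sampling window is high, it is a plausible cause of a response-time spike. A sketch (the 5% budget is an illustrative assumption, not an MSE recommendation):

```python
# Sketch: estimate what fraction of a sampling window the JVM spent in GC,
# to decide whether GC pressure explains a latency spike. Budget is illustrative.

def gc_overhead_pct(young_gc_ms: float, full_gc_ms: float, window_ms: float) -> float:
    """GC time as a percentage of the wall-clock sampling window."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return 100.0 * (young_gc_ms + full_gc_ms) / window_ms

def gc_pressure(young_gc_ms: float, full_gc_ms: float, window_ms: float,
                budget_pct: float = 5.0) -> bool:
    return gc_overhead_pct(young_gc_ms, full_gc_ms, window_ms) > budget_pct

# 60 s window: 1.2 s young GC + 3 s full GC -> 7% overhead, over budget
print(gc_pressure(1200.0, 3000.0, 60_000.0))  # True
```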

Resource Monitoring tab

The Resource Monitoring tab provides infrastructure-level metrics for the engine nodes. Use it to determine whether performance issues are caused by resource constraints.

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Inlet flow | Inbound network traffic | A sudden spike may point to a traffic surge. |
| Outlet flow | Outbound network traffic | Correlate with push TPS. High outbound traffic with low push success rate may indicate network saturation. |
| Memory Usage | System memory utilization | Sustained high usage indicates the engine may need a specification upgrade. |
| CPU Usage | CPU utilization | Sustained high usage indicates the engine may need a specification upgrade. |
| Number of nodes | Current node count in the cluster | A drop indicates a node failure. |
| Load Indicator | System load average | A high load average relative to the number of CPU cores indicates the engine may be overloaded. |
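The Load Indicator is only meaningful relative to the node's core count: a load average persistently above the number of cores means runnable tasks are queuing for CPU. A sketch of that comparison (the 1.0-per-core threshold is the conventional rule of thumb, stated here as an assumption):

```python
# Sketch: interpret a load average relative to CPU core count. A value above
# 1.0 per core suggests tasks are queuing for CPU. Threshold is illustrative.

def load_per_core(load_avg: float, cores: int) -> float:
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores

def likely_overloaded(load_avg: float, cores: int, threshold: float = 1.0) -> bool:
    return load_per_core(load_avg, cores) > threshold

print(likely_overloaded(6.4, 4))  # True: 1.6 runnable tasks per core
print(likely_overloaded(2.0, 4))  # False
```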

Top N Monitoring tab

The Top N Monitoring tab highlights the most active services and configurations. Use it to identify hotspots that consume disproportionate resources.

Service Top N Dashboard

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Number of service providers TopN | Services with the most provider instances | If a single service dominates, evaluate whether it should be split into smaller services. |
| Number of service subscribers TopN | Services with the most subscriber instances | Excessive subscribers on one service increase push overhead. |
| IP Push Failure Times TopN | Client IPs with the most push notification failures | Recurring IPs may indicate specific clients with network or configuration problems. |

Configure TopN Dashboard

| Metric | What it measures | What to look for |
| --- | --- | --- |
| Number of Configuration Changes TopN | Configurations with the most frequent changes | Frequent changes to a single configuration may indicate a misconfigured automation pipeline. |
| Configure the number of listeners TopN | Configurations with the most listeners | Configurations with excessive listeners increase push overhead. Consider splitting the configuration if possible. |

Advanced features

Managed Service for Grafana integration

For advanced observability, click Using Grafana Expert Edition in the upper-right corner to open the Managed Service for Grafana console. This provides multi-tenant Grafana dashboards with additional visualization and alerting options.

Embed monitoring pages

To embed a specific monitoring tab in an external dashboard or portal, click Open in New Window XX (where XX is the tab name) in the upper-right corner. This opens the tab in a standalone page with a shareable URL.

For example, on the Registry Monitoring tab, click Open in New Window Registry Monitoring to open the registry monitoring view in a separate browser tab.

Use the legacy dashboard

If the Grafana dashboard is not enabled, the legacy dashboard provides a limited set of metrics. To access the full set of monitoring capabilities, upgrade to the Grafana dashboard.

  1. Log on to the MSE console and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Microservices Registry > Instances.

  3. On the Instances page, click the name of the target instance.

  4. In the left-side navigation pane, click Observation Analysis.

  5. Click the Monitoring tab. The following metrics are displayed:

    | Metric | What it measures |
    | --- | --- |
    | Number of services | Total registered services |
    | Number of service providers | Total service provider instances |
    | Average response time (RT) of the service write interface (ms) | Average write latency in milliseconds |

Legacy dashboard controls:

  • Time range: The default monitoring window is the last 30 minutes. Preset options include Last 30 minutes, Last 1 hour, Last 6 hours, and Last 24 hours. Custom time ranges are also supported.

  • Node filtering: Monitoring data for each of the engine's three nodes is displayed in a different color. Click a node name in the chart legend to show or hide that node's data. At least one node must remain visible.

  • Data granularity: Hover over any point on a chart to see metric values of the three nodes at that moment, accurate to the minute.

  • Refresh: Click the Refresh icon in the upper-right corner to reload the current data.