How to troubleshoot MSE Nacos instance connection failures

When an application fails to connect to a Microservices Engine (MSE) Nacos instance, the root cause is typically a network issue, a misconfigured endpoint, or a client-server version mismatch. This topic walks you through each scenario with targeted diagnostic steps.

Symptoms

A failed connection attempt produces one of the following error messages:

Client not connected,currentstatus:STARTING
Client not connected,currentstatus:UNHEALTHY
no available server, currentServerAddr: xxxxx
Connection refused

Quick diagnostics

Before you investigate error-specific fixes, verify basic network connectivity between your client and the MSE Nacos instance.

Port architecture

MSE Nacos requires two ports. Make sure both are open in your security group and firewall rules.

Port	Protocol	Purpose	Offset from main port
8848	HTTP	Client HTTP requests (service registration, configuration)	0 (main port)
9848	gRPC	Client gRPC requests (Nacos 2.x and later)	+1000
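
The offset column can be expressed as a small helper. This is a minimal sketch that assumes the +1000 offset from the table also holds when the main port is not 8848; the function names are illustrative and not part of any Nacos SDK.

```python
GRPC_PORT_OFFSET = 1000  # "Offset from main port" value in the table above


def grpc_port(main_port=8848):
    """Return the client gRPC port derived from the main HTTP port."""
    if not 0 < main_port <= 65535 - GRPC_PORT_OFFSET:
        raise ValueError(f"main port out of range: {main_port}")
    return main_port + GRPC_PORT_OFFSET


def required_ports(main_port=8848):
    """Return every port that must be open for a Nacos 2.x client."""
    return [main_port, grpc_port(main_port)]
```

For the default main port, required_ports() yields 8848 and 9848, the two ports to allow in security group and firewall rules.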

Test connectivity

Run the following commands from the client machine to verify that both ports are reachable:

ping ${mse.nacos.host}
telnet ${mse.nacos.host} 8848
telnet ${mse.nacos.host} 9848
curl "http://${mse.nacos.host}:8848/nacos/v1/ns/service/list?pageNo=1&pageSize=10"

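If ping fails, the hostname cannot be resolved, the host is unreachable, or ICMP traffic is blocked along the path. Check your DNS settings and network configuration, and use the telnet results to confirm whether the ports themselves are reachable.
If telnet to port 8848 succeeds but telnet to port 9848 fails, a firewall or security group rule is likely blocking the gRPC port.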
If the curl request returns a valid JSON response, HTTP connectivity to the Nacos server is working.
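
On client machines where telnet is not installed, the same TCP reachability check can be approximated with the Python standard library. This is a minimal sketch, not an official MSE tool; the function names and the 3-second timeout are assumptions chosen for illustration.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_nacos_ports(host, ports=(8848, 9848)):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: port_is_reachable(host, port) for port in ports}
```

Call check_nacos_ports with the MSE Nacos endpoint as the host. A result in which 8848 is True and 9848 is False matches the blocked gRPC port case described above.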

Verify the endpoint configuration

Confirm that your application points to the correct MSE Nacos endpoint and port. The endpoint is typically set in one of the following locations:

Spring Boot / Spring Cloud Alibaba (application.yml shown; in application.properties, set the equivalent spring.cloud.nacos.server-addr key):

spring:
  cloud:
    nacos:
      server-addr: <mse-nacos-endpoint>:8848

Dubbo (dubbo.properties or XML configuration):

dubbo.registry.address=nacos://<mse-nacos-endpoint>:8848

JVM arguments:

-Dnacos.server-addr=<mse-nacos-endpoint>:8848

Environment variables:

export NACOS_SERVER_ADDR=<mse-nacos-endpoint>:8848

Replace <mse-nacos-endpoint> with the actual endpoint of your MSE Nacos instance. You can find this value on the instance details page in the MSE console.
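
A configured value can be sanity-checked before the application starts. The sketch below assumes the value is a comma-separated list of host:port entries with an optional scheme such as nacos://, and that a missing port falls back to 8848; IPv6 literals are not handled. The function names and the nacos.example.internal hostname are hypothetical.

```python
DEFAULT_PORT = 8848  # assumed fallback when no port is given


def parse_server_addr(value, default_port=DEFAULT_PORT):
    """Split a server-addr value into a list of (host, port) pairs."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Drop an optional scheme such as nacos:// or http://
        if "://" in item:
            item = item.split("://", 1)[1]
        host, sep, port = item.partition(":")
        endpoints.append((host, int(port) if sep else default_port))
    return endpoints


def find_problems(value):
    """Return human-readable warnings for suspicious server-addr values."""
    if "<" in value or ">" in value:
        return ["placeholder not replaced"]
    problems = []
    for host, _port in parse_server_addr(value):
        if host in ("localhost", "127.0.0.1"):
            problems.append(f"{host} is a local address, not the MSE Nacos endpoint")
    return problems
```

For example, find_problems("nacos.example.internal:8848") returns an empty list, while a value that still contains the angle-bracket placeholder is flagged.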

Note

If you deployed your application through Enterprise Distributed Application Service (EDAS) or Serverless App Engine (SAE), turn on Use Configured Registration Center in the deployment settings.

Troubleshoot by error message

`Client not connected,currentstatus:STARTING`

Cause: The Nacos client failed to establish a gRPC connection with the server. This typically occurs when a Nacos 2.x client connects to a Basic Edition MSE instance, which does not support gRPC. Nacos 2.x clients require the Professional Edition.

Resolution:

Confirm that the endpoint and port are correct. See the "Verify the endpoint configuration" section.
Check your MSE instance edition:
- Nacos 2.x clients require the Professional Edition.
- If your instance is a Basic Edition, upgrade to the latest Professional Edition. For instructions, see Upgrade a Nacos version.
If the edition is correct, test gRPC port connectivity:
```
   telnet ${mse.nacos.host} 9848
```
If this fails, check your security group rules and firewall settings to make sure port 9848 (gRPC) is open.

`Client not connected,currentstatus:UNHEALTHY`

Cause: The client established a connection but lost it and cannot reconnect. Common causes include server-side resource exhaustion, such as high CPU load, frequent full garbage collections (GCs), or exceeded connection limits.

Resolution:

Check the server resource metrics. See the "Check server-side metrics" section for detailed steps.
If CPU load or memory usage is near or above 100%, upgrade the instance. For instructions, see Change instance specifications.
If metrics look normal, verify that the number of persistent connections has not exceeded the limit for your specification. See Estimate instance capabilities for limits.

`no available server, currentServerAddr: xxxxx`

Cause: The client cannot reach any server at the configured address. This typically points to a network issue or a misconfigured endpoint.

Resolution:

Run the connectivity tests in the "Test connectivity" section.
Compare the currentServerAddr value in the error message against the actual MSE Nacos endpoint. If they differ, locate and correct the incorrect value in your configuration files, environment variables, or JVM arguments.
If a virtual private network (VPN) is in use, verify that the VPN connection is active and routes traffic to the MSE Nacos instance correctly. Temporarily disable the VPN to test direct connectivity.
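
The address comparison in the second step can be scripted when many log lines need review. This is a rough sketch that assumes the address follows the currentServerAddr label and may carry an http:// prefix; the exact log layout can differ between client versions, and the function names are illustrative.

```python
import re

# Matches the address that follows "currentServerAddr", with or without a space
# before the colon.
_ADDR = re.compile(r"currentServerAddr\s*:\s*([^\s,;]+)")


def current_server_addr(log_line):
    """Extract host:port from a 'no available server' log line, or None."""
    match = _ADDR.search(log_line)
    if match is None:
        return None
    # Strip an optional scheme and any trailing slash.
    return match.group(1).split("://", 1)[-1].rstrip("/")


def matches_endpoint(log_line, expected):
    """Return True if the logged address equals the expected host:port."""
    addr = current_server_addr(log_line)
    return addr is not None and addr.lower() == expected.lower()
```

Pass the host:port shown on the instance details page as expected. A False result points to a stale or incorrect value in a configuration file, environment variable, or JVM argument.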

`Connection refused`

Cause: The client tried to connect to an endpoint that is not listening. The IP address in the error message reveals the actual target.

For example, Connection refused: /127.0.0.1:9848 indicates the client is connecting to localhost instead of the MSE Nacos instance.

Resolution:

Note the IP address or hostname in the error message:
- 127.0.0.1 or localhost -- the application is using a local address instead of the MSE Nacos endpoint.
- An unexpected IP address -- the endpoint is misconfigured.
Locate the incorrect address in the following places, then replace it with the correct MSE Nacos endpoint:
- Application configuration files (application.properties, application.yml, dubbo.properties)
- Environment variables on the client machine
- JVM startup arguments (-Dnacos.server-addr=...)
- Hardcoded values in the application source code
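
The first step, reading the target address out of the error message, can be sketched as follows. The pattern assumes the message has the form shown in the example above (optionally with a hostname before the slash); the function names are illustrative.

```python
import ipaddress
import re

# Accepts "Connection refused: /127.0.0.1:9848" and
# "Connection refused: somehost/10.0.0.5:9848".
_TARGET = re.compile(r"Connection refused:?\s*(?:[^\s/]*/)?([^\s/:]+):(\d+)")


def refused_target(message):
    """Return (host, port) from a 'Connection refused' message, or None."""
    match = _TARGET.search(message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_local(host):
    """Return True if the host is localhost or a loopback IP address."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it cannot be classified as loopback here.
        return False
```

If is_local returns True for the extracted host, the application is using a local address; any other unexpected host indicates a misconfigured endpoint.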

Check server-side metrics

If connectivity and configuration are correct but the problem persists, inspect the MSE Nacos instance metrics in the MSE console:

Go to the Monitoring Center page. For details, see Monitor engines.
On the Overview tab, check whether Queries per second or Operations per second exceeds the transactions per second (TPS) limit for your specification. See Estimate instance capabilities for TPS limits by specification.
On the Number of connections monitoring tab, check whether Number of long links (the number of persistent connections) exceeds the connection limit. See Estimate instance capabilities for connection limits by specification.
On the jvm Monitoring tab, check whether full GCs occur frequently. If No data is displayed, no full GCs have occurred.
On the Resource monitoring tab, check:
- Whether inbound traffic or outbound traffic exceeds your instance bandwidth. This is particularly relevant for instances with the Internet network type.
- Whether memory usage or CPU load is near or above 100%. Values at or above 100% trigger throttling.
If any metric exceeds its limit, upgrade the instance. For instructions, see Change instance specifications.
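
The checklist above amounts to comparing each observed metric with the limit for your specification. The sketch below shows that comparison under stated assumptions: the metric names and all numbers are hypothetical placeholders, the real limits come from Estimate instance capabilities, and the 90% threshold for "near" is an arbitrary choice.

```python
def review_metrics(observed, limits, near_ratio=0.9):
    """Classify each metric as 'exceeded' or 'near' relative to its limit.

    Metrics comfortably below their limit are omitted from the result.
    """
    findings = {}
    for name, limit in limits.items():
        value = observed.get(name)
        if value is None:
            continue  # metric not collected; nothing to compare
        if value >= limit:
            findings[name] = "exceeded"
        elif value >= near_ratio * limit:
            findings[name] = "near"
    return findings


# Hypothetical readings copied by hand from the monitoring tabs.
observed = {"tps": 1200, "connections": 300, "cpu_percent": 95}
# Hypothetical limits; look up the real ones for your specification.
limits = {"tps": 1000, "connections": 1000, "cpu_percent": 100}
findings = review_metrics(observed, limits)
```

Any entry marked exceeded is a candidate reason to upgrade the instance; entries marked near are worth watching before they start to trigger throttling.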