This topic explains how to address the issue of being unable to connect to a Linux Elastic Compute Service (ECS) instance.
Emergency Logon to Linux Instances: In an emergency that requires immediate access to a Linux instance for O&M operations, you can initially log on to the instance using VNC. For more information, see Connecting to Instances Using VNC.
Cause
There are many potential causes for SSH remote logon failures, including the PAM framework, security group settings, and SSH configurations. You can diagnose and resolve the issue according to your specific circumstances.
No specific error message
Use the self-service troubleshooting tool to identify issues
Begin by using the self-service troubleshooting tool to identify issues with your instance. Then, follow the provided guidance to resolve the issue. The steps for using the self-service troubleshooting tool are as follows:
- Log on to the ECS console.
- In the left-side navigation pane, click Troubleshooting.
- In the top navigation bar, select the region and resource group to which the resource belongs.
- On the Instance Troubleshooting tab, select Instance Cannot Connect or Start Abnormally and follow the prompts to enter the details of the instance you want to check. The configuration items are described below:
  - Specific Issue: The options are described in the following table.
    Issue | Description
    Workbench Unable to Connect via Private Network | (Recommended) When using the Workbench tool, you cannot connect to the instance through its private IP address.
    Workbench Unable to Connect via Public Network | (Recommended) When using the Workbench tool, you cannot connect to the instance through its public IP address.
    SSH Connection Failure | (Recommended) You cannot connect to the instance using a third-party SSH tool.
    Remote Connection to Instance Unavailable | Troubleshoot the issue that prevents the instance from being connected remotely.
  - Configuration of the Instance to Be Checked: If you select Workbench Unable to Connect via Private Network, Workbench Unable to Connect via Public Network, or SSH Connection Failure, you must complete the options in the following table.
    Configuration item | Description | Example
    VPC | Select the VPC where the instance is located. | vpc-bp1******
    Requester VPC | Set the IP address of the host from which you initiate the SSH connection. If you do not know the IP address of your local machine, you can visit https://cip.cc/ to obtain it. Note: When you select Workbench Unable to Connect via Private Network or Workbench Unable to Connect via Public Network, this information is automatically populated and does not need to be modified. | 47.***.***.***
    Destination Instance | Select the instance to be connected remotely, which is the instance to be checked. | i-******
    Target Port | The SSH remote connection port of the destination instance (default is 22). | 22
- Click Start Troubleshooting and wait for the system to finish the diagnosis. Then, follow the on-screen instructions to resolve the problem.
Manually troubleshoot issues
If you do not receive an error message from the system when a remote connection fails, you can manually troubleshoot the issue by following these steps:
Step 1: Use Alibaba Cloud Workbench to test remote logon
Use Alibaba Cloud Workbench to connect to the instance. If an exception occurs during remote logon, Workbench will return a specific error message and solution. The test steps are as follows:
- Log on to the ECS console.
- In the left-side navigation pane, choose .
- In the top navigation bar, select the region and resource group to which the resource belongs.
- On the instance list page, locate the instance you want to connect to and click Remote connection in the Actions column.
- In the Remote connection dialog box, click Workbench and then click Sign in now.
- Verify whether you can connect to the instance.
  Workbench automatically populates the basic information required for logon to the target instance. Confirm that the information is correct, enter the username and authentication information, and proceed based on the following results. For more information on connecting to a Linux instance using Workbench, see Connecting to a Linux Instance Using Workbench.
  - If you still cannot log on, Workbench returns an error message and a solution. Follow the system prompts to resolve the issue, and then test the remote logon with Workbench again. For common exceptions when using Workbench, see Issues Connecting to Instances Using VNC.
  - If you can log on normally using Workbench, the SSH service on the target instance is functioning properly, which rules out an SSH server-side issue. Continue to Step 2: Check the network for further troubleshooting.
Step 2: Check the network
If you cannot connect to a Linux instance, check the network connectivity of the instance.
- Attempt to connect to the instance from computers within different CIDR blocks or from different network providers to determine if the issue lies with the local network or the server side.
  - If the issue is related to your local network or your internet service provider, contact your IT department or service provider.
  - If the issue is with a network interface card (NIC) driver, reinstall the driver.
- Run the ping command from your local client to test the network connectivity to the instance.
  - If a network exception occurs, capture packets for analysis. For more information, see How to Capture Packets for Network Exception Analysis.
  - If packet loss occurs or the instance is unreachable, use tools such as tracert or mtr to perform link tests and identify the root cause. For more information, see Using the MTR Tool to Analyze Network Links.
  - If intermittent packet loss occurs and the network connectivity of the ECS instance remains unstable, the instance may be infected with a virus. For more information, see Testing Intermittent Packet Loss of an ECS Instance's IP Address Using the Ping Command.
  - If ping is not disabled in the kernel of the Linux instance and you cannot ping the instance from your local client, the firewall within the instance may be configured to block traffic from the client. For more information, see Solutions for Ping Failure in ECS Instances with Ping Enabled in the Linux Kernel.
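The ping check in this step can be scripted. The following is a minimal sketch, assuming the summary line printed by the Linux ping command ("N packets transmitted, M received, X% packet loss"). The function name packet_loss_percent and the address 203.0.113.10 are placeholders chosen for illustration and are not part of any Alibaba Cloud tool.

```shell
# Extract the packet-loss percentage from the summary line that ping prints.
# Reads ping output on stdin and prints the number without the percent sign.
# Assumes a summary such as:
#   "4 packets transmitted, 3 received, 25% packet loss, time 3004ms"
packet_loss_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Example usage (replace 203.0.113.10 with the public IP address of your instance):
#   ping -c 4 203.0.113.10 | packet_loss_percent
#   mtr --report --report-cycles 10 203.0.113.10   # per-hop loss, if mtr is installed
```

A non-zero result suggests moving on to the link tests described above. No output at all usually means that ping printed no summary line, for example because the host name could not be resolved.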
Step 3: Check ports and security groups
Ensure that the necessary connection ports are open in the security groups associated with the instance.
- Log on to the ECS console.
- In the left-side navigation pane, choose .
- In the top navigation bar, select the region and resource group to which the resource belongs.
- On the Instance List page, click the ID of the instance.
- On the Security Groups tab, click Manage Rules in the Operation column of the security group.
- On the Security Group Rule page, add an inbound security group rule by using one of the following methods. For more information, see adding security group rules.
  - Method 1: Quickly Add a Security Group Rule
    - Authorization Policy: Allow
    - Port Range: SSH (22)
    - Authorization Object: Set this to your local IP address. You can visit https://cip.cc/ to retrieve your local IP address.
  - Method 2: Manually Add a Security Group Rule
    - Authorization Policy: Allow
    - Priority: 1 (the highest priority for security rules, with smaller numbers indicating higher priority)
    - Protocol Type: Custom (TCP)
    - Port Range: SSH (22)
    - Authorization Object: Set this to your local IP address. You can visit https://cip.cc/ to retrieve your local IP address.
- Run the following command to verify that the port is reachable.
  telnet [$IP] [$Port]
  Note:
  - [$IP] represents the IP address of the Linux instance.
  - [$Port] represents the SSH port number of the Linux instance.
  For example, if you run the telnet 192.168.0.1 22 command, the expected result is similar to the following:
  Trying 192.168.0.1 ...
  Connected to 192.168.0.1.
  Escape character is '^]'
  If the port test fails, refer to Port Availability Detection when the Ping Command Works but the Port is Unavailable for troubleshooting.
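If telnet is not installed on your local client, the same port test can be approximated in the shell. This is a sketch under stated assumptions: it relies on the /dev/tcp pseudo-device provided by bash and on the coreutils timeout command, and the function name check_port is hypothetical.

```shell
# Check whether a TCP port accepts connections, similar to "telnet <ip> <port>".
# Returns 0 if the connection succeeds, 1 if it fails or times out,
# and 2 if the port argument is not a number between 1 and 65535.
# Assumes that bash (for /dev/tcp) and the coreutils "timeout" command are available.
check_port() {
    host=$1
    port=$2
    case "$port" in
        ''|*[!0-9]*|??????*) return 2 ;;   # empty, non-numeric, or too long
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        return 2
    fi
    # Open a TCP connection inside bash and give up after 5 seconds.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        return 0
    fi
    return 1
}

# Example usage:
#   check_port 192.168.0.1 22 && echo "port 22 is reachable"
```

A return value of 1 does not say why the connection failed. The security group rules, the firewall inside the instance, and the SSH service itself all remain candidates.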
Step 4: Check CPU load, bandwidth, and memory usage
If you are unable to connect to a Linux instance, it may be due to high CPU load, insufficient public bandwidth, or low memory.
- Examine the CPU load on the instance and take action based on the results.
  - If the CPU load is high:
    When applications hosted on the instance frequently perform disk read/write operations, initiate numerous network requests, or generate compute-intensive workloads, a high CPU load is expected. Upgrading the instance type can alleviate resource bottlenecks. For more information, see Overview of Upgrade and Downgrade Methods.
    Note: For additional guidance on addressing high CPU load, see Query and Case Analysis of CPU Load in Linux.
  - If the CPU load is not high, proceed to the next step.
- Address insufficient public bandwidth issues.
  If you are unable to connect to a Linux instance, it may be due to insufficient public bandwidth. To troubleshoot, follow these steps:
  - Log on to the ECS console.
  - In the left-side navigation pane, choose .
  - In the top navigation bar, select the region and resource group to which the resource belongs.
  - On the Instance List page, click the ID of the instance. Then, in the Configuration Information section, check the Public Bandwidth details.
    If the bandwidth is 0 Mbps, no public bandwidth was allocated when the instance was created. You can upgrade the bandwidth to resolve the issue. For more information, see Modifying the Peak Public Bandwidth.
- Investigate insufficient memory issues.
  If the desktop environment does not display correctly for the Linux instance or the instance exits unexpectedly after you connect, the instance may have low memory. Check the memory usage of the instance as follows:
  - Log on to the Linux instance using VNC.
    For detailed instructions, see Logging on to a Linux Instance Using Password Authentication.
  - Monitor the memory usage. If the instance has insufficient memory, consider upgrading the instance type to address the resource bottleneck. For more information, see Overview of Upgrade and Downgrade Methods.
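Once you are logged on through VNC, the CPU and memory checks in this step can be run from the shell. The sketch below reads the standard Linux /proc/meminfo and /proc/loadavg files. The function names are hypothetical, and the "load average greater than the number of cores" threshold is a common rule of thumb assumed here, not a value taken from the ECS documentation.

```shell
# Print the percentage of memory that is still available, based on the
# MemTotal and MemAvailable fields of a meminfo-style file.
# $1 = file to read (defaults to /proc/meminfo).
mem_available_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    ' "${1:-/proc/meminfo}"
}

# Succeed (exit status 0) when the load average exceeds the number of CPU cores.
# $1 = load average, $2 = number of cores (assumed threshold, see the text above).
is_load_high() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load + 0 > cores + 0) }'
}

# Example usage on the instance:
#   mem_available_percent
#   is_load_high "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "CPU load is high"
```

A low available-memory percentage or a persistently high load points to the instance type upgrade described above.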
Specific error message exists
Error messages typically appear when a connection failure occurs. These messages can help identify and resolve the underlying issue.
PAM security framework
The PAM framework in Linux enforces access control policies, such as account and logon policies, by loading appropriate security modules. Invalid configurations or triggered policies can result in SSH logon failures. To troubleshoot, refer to the error message and consider the following scenarios:
Linux instance system environment configuration
System environment issues in a Linux instance, such as virus infections, invalid account configurations, and environment misconfigurations, can also lead to SSH logon failures. Use the error message to guide your troubleshooting, considering these examples:
- If the SSH service fails to start with the error "main process exited, code=exited", follow the provided link for troubleshooting steps.
- System exceptions may arise following an SSH logon to a Linux instance because of Ulimit constraints
- Error When Logging into a Linux ECS Instance via SSH Command
- SSH remote connection issues may arise when the SELinux service is enabled on a Linux instance
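For the SELinux and Ulimit cases, it can help to first see how the instance is currently configured. The following sketch assumes the conventional /etc/selinux/config file with a SELINUX= line. The function name selinux_configured_mode is hypothetical.

```shell
# Print the SELinux mode set in a config file (enforcing, permissive, or disabled).
# $1 = file to read (defaults to /etc/selinux/config, an assumed standard location).
selinux_configured_mode() {
    sed -n 's/^SELINUX=\([A-Za-z]*\).*/\1/p' "${1:-/etc/selinux/config}" | head -n 1
}

# Example usage on the instance (for example, in a VNC session):
#   selinux_configured_mode   # mode that is applied at the next boot
#   getenforce                # mode that is active right now, if SELinux tools are installed
#   ulimit -n                 # open-file limit of the current shell
```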
SSH service and parameter configuration
The default SSH service configuration file is /etc/ssh/sshd_config. Invalid parameter settings or enabled features and policies in this file can prevent successful SSH logon. Address these issues by referring to the error message and the following cases:
- If you encounter the error message "User root not allowed because not listed in" when using the SSH command to log on to a Linux instance, it indicates a permissions issue.
- If you attempt to log on to a Linux instance as the root user via SSH and receive the error message "Permission denied, please try again", it indicates an issue with your access permissions.
- If you encounter the error message "Too many authentication failures for root" when attempting to log on to an instance via SSH, refer to this troubleshooting guide.
- Starting the SSH service results in the error "error while loading shared libraries"
- If you encounter the error "fatal: Cannot bind any address" when starting the SSH service on a Linux ECS instance, refer to this troubleshooting guide.
- If the SSH service fails to start with the error message "Bad configuration options", refer to the linked document for troubleshooting steps.
- SSH logon or data transmission speed may be slow due to the enabling of UseDNS in SSH configuration
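To see which value a directive such as UseDNS has in the configuration file, a small helper like the following can be used. It is a simplified sketch: the function name sshd_option is hypothetical, and the helper reads only the first uncommented occurrence in a single file, ignoring Match blocks and included files.

```shell
# Print the value of the first uncommented occurrence of a directive.
# $1 = directive name (case-insensitive), $2 = file (defaults to /etc/ssh/sshd_config).
# Simplification: Match blocks and Include files are not interpreted.
sshd_option() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }                       # skip comment lines
        tolower($1) == tolower(key) {
            $1 = ""                               # drop the directive name
            sub(/^[ \t]+/, "")                    # trim the leading separator
            print
            exit
        }
    ' "${2:-/etc/ssh/sshd_config}"
}

# Example usage on the instance:
#   sshd_option UseDNS
#   sshd_option PermitRootLogin
#   sshd -t    # asks sshd to validate the configuration file (run as root)
```

No output means the directive is not set in that file, in which case the built-in default of the installed OpenSSH version applies.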
SSH service directory or file configuration
The SSH service verifies directory and file permissions and groups at runtime for security. Incorrect permissions can disrupt the SSH service and prevent client logon. Resolve these issues by referring to the error message and the following scenarios:
- If you encounter the error message "No supported key exchange algorithms" when using the SSH command to log on to a Linux instance, refer to this document for troubleshooting steps.
- If the SSH service startup fails with the error message "must be owned by root and not group or world-writable," it indicates a permissions issue.
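The ownership and permission condition named in the second error message can be checked for the path that the message reports. The following sketch parses the output of ls -ld. The function names are hypothetical, and the path to check is whatever appears in your own error message.

```shell
# Succeed (exit status 0) when a path is writable by its group or by others.
# $1 = path to check. The mode string printed by ls looks like "drwxr-xr-x":
# character 6 is the group write bit and character 9 is the world write bit.
is_group_or_world_writable() {
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        ?????w????|????????w?) return 0 ;;
    esac
    return 1
}

# Print the owner of a path as reported by ls.
path_owner() {
    ls -ld "$1" | awk '{ print $3 }'
}

# Example usage, with <path> taken from the error message:
#   path_owner <path>
#   is_group_or_world_writable <path> && echo "permissions are too open"
```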
SSH service key configuration
SSH relies on asymmetric key encryption for secure data exchange. The client and server must validate keys to ensure message integrity and encryption. If you encounter key-related issues, refer to the following case:
If you encounter the error message "Host key verification failed" when attempting to log on to an ECS instance via SSH, refer to this troubleshooting guide.
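One common cause of this message is a stale entry in the known_hosts file on the client, for example after the operating system of the instance was reinstalled and new host keys were generated. Under that assumption, and only when you are sure the key change is legitimate, the old entry can be removed with ssh-keygen -R. The wrapper name forget_host_key below is hypothetical.

```shell
# Remove every stored host key for a host from a known_hosts file.
# $1 = host name or IP address, $2 = known_hosts file (defaults to ~/.ssh/known_hosts).
# Use this only when the host key is known to have changed for a legitimate reason;
# an unexpected key change can also indicate a man-in-the-middle attack.
forget_host_key() {
    ssh-keygen -R "$1" -f "${2:-$HOME/.ssh/known_hosts}"
}

# Example usage on the client (replace 203.0.113.10 with the IP address of your instance):
#   forget_host_key 203.0.113.10
#   ssh root@203.0.113.10    # ssh then asks you to confirm the new host key
```

ssh-keygen keeps a backup of the previous file with the .old suffix, so the removal can be undone if needed.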