This topic describes how to troubleshoot failures to connect to a Linux Elastic Compute Service (ECS) instance.
Causes
An SSH connection failure can be caused by various factors, such as the Pluggable Authentication Module (PAM) framework, security group settings, and SSH configurations. Identify which of the following scenarios applies to you and perform the corresponding operations.
You want to log on to the Linux instance
If you want to log on to the Linux instance to which you cannot connect, perform the following steps to check the status of the instance and then send a command to the Linux instance by using Cloud Assistant or log on to the instance by using Virtual Network Computing (VNC):
Step 1: Check the status of the instance
Check the status of the instance regardless of the cause of the connection failure. An instance can provide external services only if the instance is in the Running state. Perform the following steps:
Log on to the ECS console.
In the left-side navigation pane, choose .
In the top navigation bar, select the region and resource group to which the resource belongs.
Step 2: Log on to the instance by using VNC
If Cloud Assistant is unavailable or does not meet your business requirements, you can use VNC to log on to the instance.
Log on to the ECS console.
In the left-side navigation pane, choose .
In the top navigation bar, select the region and resource group to which the resource belongs.
On the Instances page, find the instance to which you want to connect and click Connect in the Actions column.
In the Remote connection dialog box, click Show Other Logon Methods. Then, click Sign in now in the VNC section.
Log on to the operating system of the instance.
Enter a username, such as root or ecs-user, and press the Enter key.
Enter the password that corresponds to the username and press the Enter key.
Note: The characters of the password are hidden as you enter the password for the Linux instance. Make sure that you enter the correct password.
Step 3: Use Cloud Assistant to send a command to the Linux instance
Use Cloud Assistant provided by Alibaba Cloud to send a command to the Linux instance. Perform the following steps:
Log on to the ECS console.
In the left-side navigation pane, choose .
In the top navigation bar, select the region and resource group to which the resource belongs.
Enter a command and click Run to run the command on the Linux instance without the need to log on to the instance.
For more information about Cloud Assistant, see Overview.
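For example, a script along the following lines could be entered in the command box to collect basic SSH diagnostics without logging on to the instance. This is a minimal sketch rather than an official Cloud Assistant command: the function name ssh_diag is hypothetical, and the script assumes a systemd-based distribution that provides the ss utility and installs the SSH daemon at /usr/sbin/sshd. The service unit is usually named sshd on CentOS and Alibaba Cloud Linux and ssh on Debian and Ubuntu, so both names are queried.

```shell
#!/bin/bash
# Hypothetical helper: print basic SSH diagnostics so that the output can be
# read on the Cloud Assistant execution results page.
ssh_diag() {
  echo "== SSH service state =="
  if command -v systemctl >/dev/null 2>&1; then
    # The unit is "sshd" on CentOS/Alibaba Cloud Linux and "ssh" on Debian/Ubuntu.
    for unit in sshd ssh; do
      echo "$unit: $(systemctl is-active "$unit" 2>/dev/null)"
    done
  else
    echo "systemctl not found"
  fi

  echo "== Listening TCP sockets (process names are shown only to root) =="
  if command -v ss >/dev/null 2>&1; then
    ss -tlnp 2>/dev/null | grep -E 'sshd|:22[[:space:]]' || echo "no sshd listener found"
  else
    echo "ss not found"
  fi

  echo "== sshd_config syntax check =="
  if [ -x /usr/sbin/sshd ]; then
    # sshd -t prints nothing when the configuration and host keys are valid.
    if /usr/sbin/sshd -t 2>&1; then
      echo "syntax OK"
    fi
  else
    echo "/usr/sbin/sshd not found"
  fi
}

ssh_diag
```

Cloud Assistant typically runs commands as root on Linux, which the script needs to show process names in the ss output and to read the host keys during the sshd -t check.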
No error message is returned
If no error message is returned when you cannot connect to a Linux instance that is in the Running state, perform the following steps to troubleshoot the issue:
Step 1: Use Alibaba Cloud Workbench to connect to the instance
Use Workbench to connect to the instance. If you cannot connect to the instance by using Workbench, Workbench returns an error message and the corresponding solution. Perform the following steps:
Log on to the ECS console.
In the left-side navigation pane, choose .
In the top navigation bar, select the region and resource group to which the resource belongs.
On the Instances page, find the Linux instance to which you want to connect. In the Actions column, click Remote connection.
In the Remote connection dialog box, click Sign in now in the Workbench section.
Check whether you can connect to the instance.
In the Instance Login dialog box, the basic information about the instance is automatically populated by Workbench. Make sure that the basic information is correct. Then, enter a username and authentication information for the instance. Perform actions based on whether you can use Workbench to connect to the instance. For information about how to use Workbench to connect to a Linux instance, see Connect to a Linux instance by using a password or key.
If you cannot use Workbench to connect to the instance, Workbench returns an error message and the corresponding solution. Resolve the issue based on the error message and the solution. After you troubleshoot the issue, use Workbench to connect to the instance. For information about the causes of and solutions to issues that may occur when you connect to instances by using Workbench, see the Use Workbench to connect to a Linux instance section of the "Issues that occur when VNC or Workbench is used to connect to an instance" topic.
If you can use Workbench to connect to the instance, the SSH service on the instance works as expected and the issue is likely outside the SSH service. In this case, proceed to Step 2: Check network connectivity.
Step 2: Check network connectivity
If you cannot connect to a Linux instance, check the network connectivity of the instance.
Connect to the instance from computers on different CIDR blocks or carrier networks to determine whether the issue lies in your on-premises network or on the server side.
If the issue is related to your on-premises network or your carrier, contact your on-premises IT personnel or your carrier.
If an exception occurs on the network interface controller (NIC) driver, reinstall the driver.
Run the ping command on your on-premises client to test the network connectivity of the instance.
If a network exception occurs, capture packets to analyze the cause of the exception. For more information, see Capture packets when a network exception occurs.
If ping packets are dropped or the instance cannot be pinged, use the tracert tool (traceroute on Linux or macOS) or the MTR tool to test the network path and identify the cause of the issue. For more information, see Test network paths when packet loss or connection failures occur after the ping command is run.
If intermittent packet loss occurs, the network connection of the instance is unstable, and the instance may be infected with viruses. For information about how to troubleshoot the issue, see Use the ping command to test intermittent packet loss of the IP address of an ECS instance.
If ping is enabled in the kernel of the instance but you cannot ping the instance from your on-premises client, the firewall in the instance may be configured to deny traffic from the client.
For information about how to troubleshoot the issue, see Solution to Linux ECS instances that do not disallow ping but cannot ping.
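The checks in this step can be scripted roughly as follows. This is a sketch under stated assumptions: the function names check_path and icmp_status are hypothetical, check_path runs on a Linux or macOS client and falls back to traceroute when MTR is not installed, and icmp_status runs on the instance (for example, through VNC or Cloud Assistant) and only inspects iptables rules.

```shell
#!/bin/bash
# Hypothetical helpers for Step 2.

# Client side: send ICMP echo requests, then trace the route to the instance.
check_path() {
  local ip="$1"
  ping -c 4 "$ip"
  if command -v mtr >/dev/null 2>&1; then
    # Report mode: 10 probes per hop, numeric output, then print a summary.
    mtr -n --report --report-cycles 10 "$ip"
  else
    traceroute -n "$ip"
  fi
}

# Instance side: report whether the kernel ignores ICMP echo requests and,
# when run as root, list iptables INPUT rules that mention ICMP.
icmp_status() {
  local flag
  flag="$(cat /proc/sys/net/ipv4/icmp_echo_ignore_all 2>/dev/null)"
  case "$flag" in
    0) echo "kernel: ICMP echo replies enabled" ;;
    1) echo "kernel: ICMP echo requests ignored (net.ipv4.icmp_echo_ignore_all=1)" ;;
    *) echo "kernel: ICMP echo setting unknown" ;;
  esac
  if [ "$(id -u)" -eq 0 ] && command -v iptables >/dev/null 2>&1; then
    iptables -S INPUT 2>/dev/null | grep -i icmp || echo "iptables: no INPUT rules mention ICMP"
  fi
}
```

For example, run check_path 203.0.113.10 on the client, replacing 203.0.113.10 with the public IP address of the instance. If icmp_status reports that the kernel ignores ICMP echo requests, ping failures are expected even when SSH is reachable.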
Step 3: Check the ports and security groups of the instance
Check whether the required connection ports are open in the security groups of the instance.
Log on to the ECS console.
In the left-side navigation pane, choose .
In the top navigation bar, select the region and resource group to which the resource belongs.
On the Instances page, click the ID of the instance.
Click the Security Groups tab, find a security group, and then click Manage Rules in the Actions column.
On the Security Group Details tab, use one of the following methods to add a security group rule. For more information, see Add a security group rule.
Method 1: Use the Quick Add feature to add a security group rule
Action: Allow.
Port Range: SSH (22).
Authorization Object: 0.0.0.0/0, which specifies all IP addresses.
Method 2: Manually add a security group rule
Action: Allow.
Priority: 1, which specifies the highest priority. A smaller number specifies a higher priority.
Protocol Type: Custom TCP.
Port Range: SSH (22).
Authorization Object: 0.0.0.0/0, which specifies all IP addresses. You can also specify authorization objects based on your business requirements.
Run the following command on your on-premises client to check whether the SSH port of the instance is reachable:
telnet [$IP] [$Port]
Note: [$IP] specifies the IP address of the instance.
[$Port] specifies the SSH port number of the instance.
Sample command:
telnet 192.168.0.1 22
The following command output indicates that the SSH port is open on the instance:
Trying 192.168.0.1...
Connected to 192.168.0.1.
Escape character is '^]'.
If the SSH port is closed on the instance, perform the operations in What do I do if I cannot ping the public IP address of an ECS instance? to troubleshoot the issue.
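If telnet is not installed on the client, bash can open the TCP connection itself through its /dev/tcp redirection. The following minimal sketch wraps that in a hypothetical port_open function. It assumes bash and the timeout utility from GNU coreutils, and it cannot distinguish a closed port from packets that are silently dropped, for example by a security group rule.

```shell
#!/bin/bash
# Hypothetical helper: test whether a TCP port accepts connections, using
# bash's built-in /dev/tcp redirection (no telnet required).
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # The redirection opens a TCP connection; timeout keeps it from hanging
  # when packets are dropped instead of refused.
  if timeout "$secs" bash -c ': < "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port closed or filtered"
    return 1
  fi
}
```

For example, port_open 192.168.0.1 22 prints 192.168.0.1:22 open when the connection succeeds. Where netcat is available, nc -zv -w 3 192.168.0.1 22 performs a similar check.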
Step 4: Check the CPU load, bandwidth usage, and memory usage of the instance
If you cannot connect to a Linux instance, the instance may have high CPU load, insufficient public bandwidth, or insufficient memory.
Check the CPU load on the instance and perform operations based on the check result.
If the CPU load is high, upgrade the instance type.
If the applications that are hosted on the instance perform large numbers of read/write operations on disks, initiate large numbers of network requests, or generate compute-intensive workloads, the CPU load on the instance becomes high. In this case, we recommend that you upgrade the instance type to resolve resource bottleneck issues. For more information, see Overview of instance configuration changes.
Note: For information about how to resolve high CPU load, see Query and case analysis Linux CPU load.
If the CPU load is not high, proceed to the next step.
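On the instance (for example, through VNC or Cloud Assistant), the CPU load can be checked with a few standard commands. The following minimal sketch uses a hypothetical cpu_load_report function that compares the 1-minute load average with the number of online CPUs and lists the processes that use the most CPU. The per-core ratio is only a rule of thumb: on Linux the load average also counts tasks waiting on disk I/O, so a high value can reflect I/O pressure rather than CPU saturation.

```shell
#!/bin/bash
# Hypothetical helper: compare the 1-minute load average with the number of
# online CPUs and list the top CPU consumers (requires procps ps).
cpu_load_report() {
  local load1 cores
  load1="$(cut -d ' ' -f 1 /proc/loadavg)"
  cores="$(nproc)"
  # A per-core ratio persistently above 1 suggests that tasks are queuing.
  awk -v l="$load1" -v c="$cores" 'BEGIN {
    printf "load1=%s cores=%d per_core=%.2f\n", l, c, l / c
  }'
  ps -eo pid,comm,%cpu --sort=-%cpu | head -n 6
}
```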
Troubleshoot an insufficient public bandwidth issue.
If you cannot connect to a Linux instance, the instance may have insufficient public bandwidth. To troubleshoot the issue, perform the following steps:
Log on to the ECS console.
In the left-side navigation pane, choose .
In the top navigation bar, select the region and resource group to which the resource belongs.
On the Instances page, click the ID of the instance. In the Configuration Information section, view the value of Internet Bandwidth.
If the value is 0 Mbps, the instance does not have public bandwidth. To allocate public bandwidth to the instance, modify its public bandwidth configuration. For more information, see the Modify the maximum public bandwidth section in the "Overview of instance configuration changes" topic.
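The console value shows only the configured maximum. To estimate whether the instance is currently saturating its bandwidth, the byte counters in /proc/net/dev can be sampled twice, as in the following minimal sketch with the hypothetical functions nic_bytes and nic_rate. It assumes that the interface you pass (often eth0 on VPC-type instances) carries the public traffic together with private network traffic, so the result is an upper bound to compare against the Internet Bandwidth value and the monitoring data in the console.

```shell
#!/bin/bash
# Hypothetical helpers: estimate the receive and transmit rates of a network
# interface from the byte counters in /proc/net/dev.

# Print the cumulative RX and TX byte counters of one interface.
nic_bytes() {
  awk -v i="$1" '{
    sub(/^ +/, "")               # interface names are right-aligned
    split($0, parts, ":")        # "eth0: <rx fields> <tx fields>"
    if (parts[1] == i) { split(parts[2], f, " "); print f[1], f[9] }
  }' /proc/net/dev
}

# Sample the counters twice and print the average rates in Mbit/s.
nic_rate() {
  local iface="$1" secs="${2:-5}" r1 t1 r2 t2
  read -r r1 t1 < <(nic_bytes "$iface") || { echo "interface not found: $iface"; return 1; }
  sleep "$secs"
  read -r r2 t2 < <(nic_bytes "$iface")
  awk -v i="$iface" -v s="$secs" -v dr="$((r2 - r1))" -v dt="$((t2 - t1))" 'BEGIN {
    printf "%s rx=%.2f Mbit/s tx=%.2f Mbit/s\n", i, dr * 8 / s / 1e6, dt * 8 / s / 1e6
  }'
}
```

For example, nic_rate eth0 5 averages the rates over five seconds.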
Troubleshoot an insufficient memory issue.
If the desktop of the Linux instance is not displayed as expected and the session exits without an error message after you connect to the instance, the instance may have insufficient memory. In this case, check the memory usage of the instance. Perform the following steps:
Log on to the instance by using VNC.
For more information, see Connect to an instance by using VNC.
Check the memory usage. If the instance memory is insufficient, we recommend that you upgrade the instance to an instance type that has a larger memory size. For more information, see Overview of instance configuration changes.
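After you log on through VNC, memory usage can be read with free -m or directly from /proc/meminfo. The following minimal sketch, built around a hypothetical mem_report function, prints total and available memory and shows recent out-of-memory (OOM) killer messages from the kernel log. It assumes a kernel that reports MemAvailable (Linux 3.14 and later) and root privileges for dmesg on distributions that restrict kernel log access.

```shell
#!/bin/bash
# Hypothetical helper: print total and available memory from /proc/meminfo
# and look for out-of-memory (OOM) killer events in the kernel log.
mem_report() {
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
       END {
         if (a == "") { print "MemAvailable not reported by this kernel"; exit }
         printf "MemTotal=%d MiB MemAvailable=%d MiB (%.1f%% available)\n", t / 1024, a / 1024, a * 100 / t
       }' /proc/meminfo
  # The OOM killer logs "Out of memory" when it terminates a process.
  dmesg 2>/dev/null | grep -i "out of memory" | tail -n 5
}
```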
An error message is returned
In most cases, an error message is returned when you cannot connect to an instance. You can identify and resolve the issue based on the error message.
PAM framework
The PAM framework in Linux can load the required security modules and implement access control policies, such as account policies and logon policies. If configurations are invalid or relevant policies are triggered, SSH logon may fail. If SSH logon fails, troubleshoot the issue based on the returned error message. For more information, see the following topics:
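Whatever the error message, the authentication log on the instance usually records why PAM or sshd rejected the logon. The following minimal sketch, with the hypothetical functions auth_log_tail and faillock_status, shows recent sshd and PAM messages and the failed-logon counters kept by pam_faillock or pam_tally2. It assumes root privileges and the default log locations, which differ between distributions.

```shell
#!/bin/bash
# Hypothetical helpers: show recent sshd/PAM messages and failed-logon
# counters that can explain an SSH rejection. Run as root on the instance.
auth_log_tail() {
  local log
  # CentOS and Alibaba Cloud Linux log to /var/log/secure;
  # Debian and Ubuntu log to /var/log/auth.log.
  for log in /var/log/secure /var/log/auth.log; do
    if [ -r "$log" ]; then
      grep -E 'sshd|pam' "$log" | tail -n "${1:-20}"
      return 0
    fi
  done
  echo "no readable auth log found; try: journalctl -u sshd"
}

faillock_status() {
  local user="${1:-root}"
  # pam_faillock (RHEL 8 and later) and pam_tally2 (older releases) can lock
  # an account after repeated failed logons.
  if command -v faillock >/dev/null 2>&1; then
    faillock --user "$user"
  elif command -v pam_tally2 >/dev/null 2>&1; then
    pam_tally2 --user "$user"
  else
    echo "neither faillock nor pam_tally2 is installed"
  fi
}
```

If an account is locked after repeated failed logons, faillock --user root --reset (or pam_tally2 --user root --reset on older releases) clears the counter.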
System environment of the Linux instance
Exceptions in the system environment of a Linux instance, such as virus infections, invalid account configurations, and invalid environment configurations, may also cause SSH logon to fail. If SSH logon fails, troubleshoot the issue based on the returned error message. For more information, see the following topics:
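Two account settings that commonly block SSH logon are a non-interactive login shell, such as /sbin/nologin, and an expired password or account. The following minimal sketch with a hypothetical check_account function prints both for one user. It assumes getent and, for the expiry details, root privileges and the chage utility from the shadow password suite.

```shell
#!/bin/bash
# Hypothetical helper: show the login shell and password-aging status of an
# account. A shell such as /sbin/nologin or an expired password can block SSH logon.
check_account() {
  local user="${1:-root}" entry
  entry="$(getent passwd "$user")" || { echo "no such user: $user"; return 1; }
  # Field 7 of a passwd entry is the login shell.
  echo "shell=$(cut -d: -f7 <<<"$entry")"
  if command -v chage >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    chage -l "$user"
  else
    echo "run as root with chage installed to see password expiry"
  fi
}
```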
SSH service and parameter settings
The default configuration file of the SSH service is /etc/ssh/sshd_config. If the parameter settings in the configuration file are invalid, or if relevant features or policies are enabled in the configuration file, SSH logon may fail. If SSH logon fails, troubleshoot the issue based on the returned error message. For more information, see the following topics:
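Before and after editing /etc/ssh/sshd_config, sshd itself can validate the file. The following minimal sketch with a hypothetical sshd_config_check function runs sshd -t, which checks the configuration and host keys and prints nothing on success, and then sshd -T, which prints the effective global settings, filtered to options that often block logons. It assumes root privileges and the sshd binary at /usr/sbin/sshd.

```shell
#!/bin/bash
# Hypothetical helper: validate sshd_config and print the effective values of
# settings that commonly block logons. Requires root on the instance.
sshd_config_check() {
  local sshd=/usr/sbin/sshd
  [ -x "$sshd" ] || { echo "sshd not found at $sshd"; return 1; }
  # -t checks the configuration file and host keys; silent on success.
  "$sshd" -t || return 1
  # -T prints the effective global configuration, including defaults,
  # with option names in lowercase.
  "$sshd" -T 2>/dev/null | grep -Ei '^(port|listenaddress|permitrootlogin|passwordauthentication|pubkeyauthentication|allowusers|denyusers|allowgroups|denygroups|usepam) '
}
```

After a change passes sshd -t, restart the service (for example, systemctl restart sshd) and keep the current session open until a new SSH logon succeeds.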
SSH service-related directories or files
The SSH service checks the permissions and ownership of relevant directories and files at runtime to ensure security. Improper permissions on these directories or files may prevent the SSH service from running as expected and cause logon failures on clients. If SSH logon fails, troubleshoot the issue based on the returned error message. For more information, see the following topics:
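With StrictModes enabled (the OpenSSH default), sshd rejects key-based logons when the user's home directory, ~/.ssh, or authorized_keys is writable by the group or by other users. The following minimal sketch with a hypothetical check_ssh_perms function flags such paths. It assumes GNU coreutils stat and checks only mode bits, not ownership.

```shell
#!/bin/bash
# Hypothetical helper: report SSH-related paths that are writable by group
# or others, which sshd rejects for key-based logons when StrictModes is on.
check_ssh_perms() {
  local home="$1" problems=0 path mode
  for path in "$home" "$home/.ssh" "$home/.ssh/authorized_keys"; do
    if [ ! -e "$path" ]; then
      echo "not found: $path"
      continue
    fi
    mode="$(stat -c '%a' "$path")"
    # 022 masks the group-write and other-write bits.
    if (( 8#$mode & 8#022 )); then
      echo "group/world-writable ($mode): $path"
      problems=$((problems + 1))
    fi
  done
  if [ "$problems" -eq 0 ]; then
    echo "OK: no group/world-writable SSH paths under $home"
  fi
  return "$problems"
}
```

For example, run check_ssh_perms /root or check_ssh_perms /home/ecs-user. Reported paths can usually be fixed with chmod go-w on the home directory, chmod 700 on ~/.ssh, and chmod 600 on ~/.ssh/authorized_keys.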
SSH key configurations
SSH uses asymmetric cryptography to authenticate the server and the client and to negotiate session keys, and then encrypts session data with symmetric ciphers. The client and the server exchange and validate keys to ensure message integrity and encryption. If SSH logon fails, troubleshoot the issue based on the returned error message. For more information, see the following topic:
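A frequent key problem is that the private key on the client does not match any entry in authorized_keys on the instance. The following minimal sketch, with a hypothetical key_in_authorized_keys function, derives the public key from a private key file with ssh-keygen -y and searches for its base64 blob in an authorized_keys file copied from the instance (or read on the instance through VNC). It assumes OpenSSH's ssh-keygen and an unencrypted key; for an encrypted key, ssh-keygen prompts for the passphrase.

```shell
#!/bin/bash
# Hypothetical helper: check whether the public half of a private key file
# appears in an authorized_keys file.
key_in_authorized_keys() {
  local key="$1" authfile="$2" blob
  # ssh-keygen -y derives the public key; field 2 is the base64 key blob,
  # which also matches entries that carry options or comments.
  blob="$(ssh-keygen -y -f "$key" | awk '{print $2}')"
  [ -n "$blob" ] || { echo "cannot read key: $key"; return 2; }
  if grep -qF -- "$blob" "$authfile"; then
    echo "key found in $authfile"
  else
    echo "key NOT found in $authfile"
    return 1
  fi
}
```

ssh and ssh-keygen refuse private key files that other users can read, so run chmod 400 on a downloaded .pem key first. If the client reports that the remote host identification has changed, for example after the system disk is replaced, ssh-keygen -R followed by the instance IP address removes the stale entry from known_hosts.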