Elastic Compute Service: Troubleshooting methods for connection issues to Linux instances

Last Updated: Feb 18, 2025

This topic explains how to address the issue of being unable to connect to a Linux Elastic Compute Service (ECS) instance.

Important

Emergency Logon to Linux Instances: In an emergency that requires immediate access to a Linux instance for O&M operations, you can initially log on to the instance using VNC. For more information, see Connecting to Instances Using VNC.

Cause

There are many potential causes for SSH remote logon failures, including the PAM framework, security group settings, and SSH configurations. You can diagnose and resolve the issue according to your specific circumstances.

No specific error message

Use the self-service troubleshooting tool to identify issues

Use the self-service troubleshooting tool to diagnose the instance first, and then follow the guidance it provides to resolve the issue. To use the tool, perform the following steps:

  1. Log on to the ECS console.

  2. In the left-side navigation pane, click Troubleshooting.

  3. In the top navigation bar, select the region and resource group to which the resource belongs.

  4. Under the Instance Troubleshooting tab, select Instance Cannot Connect or Start Abnormally and follow the prompts to enter the details of the instance you want to check. The configuration items are described below:

    • Specific Issue: Select the issue that matches your situation.

      | Issue | Description |
      | --- | --- |
      | Workbench Unable to Connect via Private Network | (Recommended) When using the Workbench tool, you cannot connect to the instance through its private IP address. |
      | Workbench Unable to Connect via Public Network | (Recommended) When using the Workbench tool, you cannot connect to the instance through its public IP address. |
      | SSH Connection Failure | (Recommended) You cannot connect to the instance using a third-party SSH tool. |
      | Remote Connection to Instance Unavailable | Troubleshoot the issue that prevents the instance from being connected remotely. |

    • Configuration of the Instance to Be Checked: If you select Workbench Unable to Connect via Private Network, Workbench Unable to Connect via Public Network, or SSH Connection Failure, you must also configure the following items.

      | Configuration item | Description | Example |
      | --- | --- | --- |
      | VPC | Select the VPC where the instance is located. | vpc-bp1****** |
      | Requester VPC | Set the IP address of the host from which you initiate the SSH connection. See the note after this table. | 47.***.***.*** |
      | Destination Instance | Select the instance to be connected remotely, which is the instance to be checked. | i-****** |
      | Target Port | The SSH remote connection port of the destination instance (default is 22). | 22 |

      Note
      • If you select Workbench Unable to Connect via Private Network or Workbench Unable to Connect via Public Network, the Requester VPC value is automatically populated and does not need to be modified.

      • If you do not know the IP address of your local machine, you can visit https://cip.cc/ to obtain it.

  5. Click Start Troubleshooting, wait for the system to diagnose the issue, and once the diagnosis is finished, follow the on-screen instructions to resolve the problem.

Manually troubleshoot issues

If you do not receive an error message from the system when a remote connection fails, you can manually troubleshoot the issue by following these steps:

Step 1: Use Alibaba Cloud Workbench to test remote logon

Use Alibaba Cloud Workbench to connect to the instance. If an exception occurs during remote logon, Workbench will return a specific error message and solution. The test steps are as follows:

  1. Log on to the ECS console.

  2. In the left-side navigation pane, choose Instances & Images > Instances.

  3. In the top navigation bar, select the region and resource group to which the resource belongs.

  4. On the Instance List page, locate the instance you want to connect to and click Remote connection in the Actions column.

  5. In the pop-up Remote connection dialog box, click Workbench and then click Sign in now.

  6. Verify whether you can connect to the instance.

    Workbench automatically populates the basic information required for logon to the target instance. Confirm that the information is correct, enter the username and authentication information, and proceed based on the following results. For more information on connecting to a Linux instance using Workbench, see Connecting to a Linux Instance Using Workbench.

    • If you still cannot log on, Workbench returns an error message and a solution. Follow the system prompts to resolve the issue, and then test the remote logon with Workbench again. For common exceptions that occur when using Workbench, see Issues Connecting to Instances Using VNC.

    • If you can log on using Workbench, the SSH service on the target instance is functioning properly, which rules out an SSH server-side issue. Continue to Step 2: Check the network for further troubleshooting.

Step 2: Check the network

If you cannot connect to a Linux instance, check the network connectivity of the instance.

  1. Attempt to connect to the instance from computers within different CIDR blocks or from different network providers to determine if the issue lies with the local network or the server side.

    • If the issue is related to your local network or your internet service provider, contact your IT department or service provider.

    • If the issue is with a network interface card (NIC) driver, reinstall the driver.

  2. Run the ping command from your local client to test the network connectivity to the instance.
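The ping test can be scripted when you need to repeat it from several clients. The following Python sketch wraps the system ping command and reports only whether a reply was received. The function names are illustrative and not part of any Alibaba Cloud tool. A missing reply does not prove the instance is down, because ICMP traffic may simply not be allowed by the security group rules.

```python
import platform
import subprocess


def build_ping_command(host, count=4):
    """Return the ping argument list for the current operating system."""
    # Windows uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def responds_to_ping(host, count=4, timeout_seconds=30):
    """Return True if ping exits with status 0, False otherwise."""
    try:
        completed = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ping is not installed, or it did not finish in time.
        return False
    return completed.returncode == 0
```

For example, calling responds_to_ping("192.0.2.10") from your local client returns True or False, where 192.0.2.10 is a placeholder for the public IP address of your instance.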

Step 3: Check ports and security groups

Ensure that the necessary connection ports are open in the security groups associated with the instance.

  1. Log on to the ECS console.

  2. In the left-side navigation pane, choose Instances & Images > Instances.

  3. In the top navigation bar, select the region and resource group to which the resource belongs.

  4. On the Instance List page, click the corresponding instance ID.

  5. Under the Security Groups tab, click Manage Rules in the Operation column of the security group.

  6. On the Security Group Rule page, add an inbound security group rule using one of the following methods. For more information, see Adding Security Group Rules.

    • Method 1: Quickly Add a Security Group Rule

      • Authorization Policy: Allow

      • Port Range: SSH (22)

      • Authorization Object: Set this to your local IP address. You can visit https://cip.cc/ to retrieve your local IP address.

    • Method 2: Manually Add a Security Group Rule

      • Authorization Policy: Allow

      • Priority: 1 (the highest priority for security rules, with smaller numbers indicating higher priority)

      • Protocol Type: Custom (TCP)

      • Port Range: SSH (22)

      • Authorization Object: Set this to your local IP address. You can visit https://cip.cc/ to retrieve your local IP address.

  7. Execute the following command to verify that the port is functioning as expected.

    telnet [$IP] [$Port]
    Note
    • [$IP] represents the IP address of the Linux instance.

    • [$Port] represents the SSH port number of the Linux instance.

    For example, if you run the telnet 192.168.0.1 22 command and the port is reachable, the output is similar to the following:

    Trying 192.168.0.1 ...
    Connected to 192.168.0.1.
    Escape character is '^]'.

    If the port test fails, refer to Port Availability Detection when the Ping Command Works but the Port is Unavailable for troubleshooting.
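If telnet is not installed on your client, the same TCP connection test can be approximated with a short script. The Python sketch below is an illustration only: the function name and return values are assumptions made for this example. As a general rule of thumb, a refused connection suggests that nothing is listening on the port, while a timeout suggests that packets are being dropped somewhere on the path, for example by a security group rule or a firewall.

```python
import socket


def probe_tcp_port(host, port, timeout_seconds=5.0):
    """Try a TCP connection and return 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no service accepted the connection.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer arrived before the timeout expired.
        return "timeout"
    except OSError:
        # Any other failure, such as an unreachable network or a DNS error.
        return "error"
```

For example, probe_tcp_port("192.168.0.1", 22) corresponds to the telnet 192.168.0.1 22 test shown above.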

Step 4: Check CPU load, bandwidth, and memory usage

If you are unable to connect to a Linux instance, it may be due to high CPU load, insufficient public bandwidth, or low memory.

  1. Examine the CPU load on the instance and take action based on the results.

    • If the CPU load is high:

      High CPU load is expected when applications on the instance perform frequent disk reads and writes, issue many network requests, or run compute-intensive workloads. Upgrading the instance type can relieve the resource bottleneck. For more information, see Overview of Upgrade and Downgrade Methods.

      Note

      For additional guidance on addressing high CPU load, see Query and Case Analysis of CPU Load in Linux.

    • If the CPU load is not high, proceed to the next step.

  2. Address insufficient public bandwidth issues.

    Insufficient public bandwidth can also prevent connections. Check the public bandwidth of the instance as follows:

    1. Log on to the ECS console.

    2. In the left-side navigation pane, choose Instances & Images > Instances.

    3. In the top navigation bar, select the region and resource group to which the resource belongs.

    4. On the Instance List page, click the corresponding instance ID. Then, in the Configuration Information section, check the Public Bandwidth details.

      If the bandwidth is 0 Mbps, it indicates that no public bandwidth was allocated when the instance was created. You can upgrade the bandwidth to resolve the issue. For more information, see Modifying the Peak Public Bandwidth.

  3. Investigate insufficient memory issues.

    If the desktop environment does not display correctly for the Linux instance or the instance exits unexpectedly after you connect, the instance may have low memory. Check the memory usage of the instance as follows:

    1. Log on to the Linux instance using VNC.

      For detailed instructions, see Logging on to a Linux Instance Using Password Authentication.

    2. Monitor the memory usage. If the instance has insufficient memory, consider upgrading the instance type to address the resource bottleneck. For more information, see Overview of Upgrade and Downgrade Methods.
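Once you are logged on through VNC, the CPU load and memory checks in this step can be read from the /proc file system. The following Python sketch is one possible approach, not a tool described in this topic. It assumes a Linux kernel that exposes MemAvailable in /proc/meminfo, and all function names are illustrative. As a rough rule of thumb (not a threshold from this topic), a load per core that stays well above 1.0, or a very low percentage of available memory, points to a resource bottleneck.

```python
import os


def load_per_core(loadavg_text, cpu_count):
    """Return the 1-minute load average divided by the number of CPU cores."""
    one_minute_load = float(loadavg_text.split()[0])
    return one_minute_load / cpu_count


def available_memory_percent(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal from /proc/meminfo text."""
    values_kib = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values_kib[name.strip()] = int(fields[0])
    return 100.0 * values_kib["MemAvailable"] / values_kib["MemTotal"]


def read_text(path):
    """Return the full contents of a small text file."""
    with open(path) as handle:
        return handle.read()


def summarize_instance_load():
    """Read /proc on the instance and return (load per core, available memory %)."""
    load = load_per_core(read_text("/proc/loadavg"), os.cpu_count() or 1)
    memory = available_memory_percent(read_text("/proc/meminfo"))
    return load, memory
```

Run summarize_instance_load() on the instance itself, not on your local client.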

Specific error message exists

Error messages typically appear when a connection failure occurs. These messages can help identify and resolve the underlying issue.

PAM security framework

The PAM framework in Linux enforces access control policies, such as account and logon policies, by loading the appropriate security modules. Invalid configurations or triggered policies can result in SSH logon failures. To troubleshoot, start from the error message returned at logon.

Linux instance system environment configuration

System environment issues in a Linux instance, such as virus infections, invalid account configurations, and environment misconfigurations, can also lead to SSH logon failures. Use the error message to guide your troubleshooting.

SSH service and parameter configuration

The default SSH service configuration file is /etc/ssh/sshd_config. Invalid parameter settings, or features and policies enabled in this file, can prevent successful SSH logon. Use the error message to identify which setting is involved.
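When reviewing /etc/ssh/sshd_config, it helps to list the values that actually take effect. The Python sketch below is a simplified illustration under stated assumptions: it handles only whitespace-separated keyword and value pairs, does not follow Include directives, and stops at the first Match block because settings inside it are conditional. It relies on the documented sshd_config behavior that, unless noted otherwise, the first value obtained for a keyword is the one used. The function name is hypothetical.

```python
def first_values(config_text, keywords):
    """Return {keyword: value} using the first occurrence of each keyword.

    Parsing stops at the first Match block, because settings inside it
    apply only when the Match conditions are met.
    """
    wanted = {keyword.lower(): keyword for keyword in keywords}
    found = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0].lower()  # keywords are case-insensitive
        if key == "match":
            break
        if key in wanted and wanted[key] not in found:
            found[wanted[key]] = parts[1].strip() if len(parts) > 1 else ""
    return found
```

For example, passing the file contents together with ["Port", "PermitRootLogin", "PasswordAuthentication"] shows which of these are set explicitly. On the instance itself, sshd -t checks the configuration file for validity and sshd -T prints the effective configuration, which is more reliable than this sketch.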

SSH service directory or file configuration

The SSH service verifies directory and file permissions and groups at runtime for security. Incorrect permissions can disrupt the SSH service and prevent client logon. Resolve these issues by referring to the error message and the following scenarios:

  • If you encounter the error message "No supported key exchange algorithms" when using the SSH command to log on to a Linux instance, refer to this document for troubleshooting steps.

  • If the SSH service fails to start with the error message "must be owned by root and not group or world-writable," the directory or file named in the message has incorrect ownership or permissions.
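The condition named in the "must be owned by root and not group or world-writable" message can be checked directly against the path reported in the error. The Python sketch below is an illustration with hypothetical function names. It only reports problems and does not change ownership or permissions.

```python
import os
import stat


def ownership_problems(owner_uid, mode):
    """Return the reasons an owner and mode would fail a root-only check."""
    problems = []
    if owner_uid != 0:
        problems.append("not owned by root")
    if mode & stat.S_IWGRP:
        problems.append("group-writable")
    if mode & stat.S_IWOTH:
        problems.append("world-writable")
    return problems


def check_path(path):
    """Apply ownership_problems to a real path on the instance."""
    info = os.stat(path)
    return ownership_problems(info.st_uid, info.st_mode)
```

Call check_path with the path that appears in the error message. An empty list means the path already satisfies the condition.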

SSH service key configuration

SSH relies on asymmetric key encryption for secure data exchange. The client and server must validate keys to ensure message integrity and encryption. If you encounter key-related issues, refer to the following case:

If you encounter the error message "Host key verification failed" when attempting to log on to an ECS instance via SSH, refer to this troubleshooting guide.
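A "Host key verification failed" error usually involves an entry for the instance in the client's known_hosts file that no longer matches the key the server presents. The Python sketch below locates such entries so you can inspect them. It is a simplified illustration with a hypothetical function name: it matches only plain-text host names and addresses, and cannot match hashed entries (those starting with |1|) or wildcard patterns.

```python
def matching_known_hosts_lines(known_hosts_text, host, port=22):
    """Return 1-based line numbers of plain-text entries that name the host."""
    # Entries for non-default ports are stored as [host]:port.
    target = host if port == 22 else "[{}]:{}".format(host, port)
    matches = []
    for number, raw_line in enumerate(known_hosts_text.splitlines(), start=1):
        fields = raw_line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        # Lines may begin with a marker such as @cert-authority or @revoked.
        has_marker = fields[0].startswith("@") and len(fields) > 1
        hosts_field = fields[1] if has_marker else fields[0]
        if target in hosts_field.split(","):
            matches.append(number)
    return matches
```

If the key change is expected, for example because the instance was reinstalled, the standard OpenSSH command ssh-keygen -R followed by the host name or IP address removes the stale entries, including hashed ones. If the change is not expected, investigate before removing anything.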