How to test network paths when packet loss or connection failures occur after the ping command is run

If packet loss occurs or a network connection cannot be established when you run the ping command to test the network connectivity between a client and a server or Server Load Balancer (SLB) instance, you can use testing tools to test the network path and diagnose the issue. This topic describes how testing tools work, how to use the tools, how to test a network path, and how to analyze the test results.

Procedure for testing a network path

The following figure describes the procedure for testing a network path. (Figure: network path test flowchart)

Steps in the flowchart:

Obtain the public IP address of your local network
Access an IP address search website such as IP Search by using your local network to obtain the public IP address of your local network.
Test the network path from the client to the server by using MTR and the ping command
Use My Traceroute (MTR) and the ping command to test the network path from the client to the server.
- Ping test: Repeatedly ping the domain name or IP address of the server from the client. We recommend that you send at least 100 packets to ping the server, and record the test results.
- MTR test: Run an MTR test on the domain name or IP address of the server and record the test results. Use WinMTR if the client runs Windows, or the mtr command if the client runs Linux.
Test the network path from the server to the client by using MTR and the ping command
Log on to the server and test the network path from the server to the client by using MTR and the ping command.
- Ping test: Repeatedly ping the IP address of the client from the server. We recommend that you send at least 100 packets to ping the client, and record the test results.
- MTR test: Run an MTR test on the IP address of the client and record the test results. Use WinMTR if the server runs Windows, or the mtr command if the server runs Linux.
Analyze the test results
Analyze the test results. For more information, see Use MTR and analyze the MTR test results. After the problematic node is detected, visit an IP address search website such as IP Search to view the carrier and network to which the node belongs and solve the issue.
- If the problematic node resides in the local network of the client, check the local network to troubleshoot the issue.
- If the problematic node belongs to a carrier, contact the carrier or Alibaba Cloud after-sales technical support.
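The ping and MTR tests in the steps above can be sketched as a small shell helper for a Linux client or server. This is only an illustration: the function name run_path_test and the DRY_RUN variable are hypothetical, and the sketch assumes the Linux ping command and an mtr build that supports the -r, -n, and -c options. Run it once on the client against the server, and once on the server against the public IP address of the client.

```shell
# Hypothetical helper: run the ping test and the MTR test for one direction
# of the path. With DRY_RUN=1 the commands are only printed, not executed.
#   $1: domain name or IP address of the remote end
#   $2: number of probes (defaults to 100, the minimum recommended above)
run_path_test() {
    target=$1
    count=${2:-100}
    for cmd in "ping -c $count $target" "mtr -r -n -c $count $target"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Word splitting of $cmd is intentional: it holds a full command line.
            $cmd
        fi
    done
}
```

For example, DRY_RUN=1 run_path_test 192.0.2.10 prints the two commands without sending any probes. To record the results, redirect the output of a real run to a file.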

Test the network path

Linux operating system

MTR is a network diagnostic tool that combines traceroute and ping to check the health of a network path. MTR continuously probes all nodes on the path and reports the results. Because the probing is continuous, transient fluctuations on a node do not skew the test results. To obtain more accurate results, we recommend that you use MTR.

MTR (recommended)

Install MTR

sudo yum install mtr

Use MTR

mtr command syntax:

mtr [-hvrctglspni46] [--help] [--version] [--report] [--report-cycles=COUNT] [--curses] [--gtk] [--raw] [--split] [--no-dns] [--address interface] [--psize=bytes/-s bytes] [--interval=SECONDS] HOSTNAME [PACKETSIZE]

The following table describes common optional parameters. You can run the man mtr command to view the description of more parameters.

Optional parameter	Description
`-r` or `--report`	Puts MTR into report mode.
`-p` or `--split`	Configures MTR to display output in a format that is suitable for a split-user interface.
`-s` or `--psize`	Specifies the size of each ping packet.
`-n` or `--no-dns`	Configures MTR to display numeric IP addresses and not to resolve the host names.
`-a` or `--address`	Specifies the source IP address for outgoing packets. Note: This parameter is useful when a host has multiple IP addresses.
`-4`	Uses IPv4 only.
`-6`	Uses IPv6 only.

While the mtr command is running in interactive mode, you can also press keys to quickly switch modes. The following table describes the keys.

Parameter	Description
`?` or `h`	Displays help information.
`d`	Switches the display mode.
`n`	Enables or disables DNS resolution.
`u`	Switches between Internet Control Message Protocol (ICMP) packets and User Datagram Protocol (UDP) packets for probing.

Sample mtr command output

For example, if the mtr <Destination IP address> command is run, the following command output is displayed.

The following table describes the parameters in the command output based on the default configurations.

Parameter	Description
Host	The IP address or domain name of the node. You can press the `n` key to switch the display mode.
Loss%	The packet loss rate of the node.
Snt	The number of packets that are sent. Default value: 10. The value can be specified by using the `-c` parameter.
Last	The latency of the last probe.
Avg	The average latency of all probes.
Best	The lowest latency of all probes.
Wrst	The highest latency of all probes.
StDev	The standard deviation of the latency on the node. A larger value indicates a larger difference between the response times for data packets on the node.
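The columns described above can also be read programmatically. The following sketch filters an mtr report for hops whose Loss% is not zero. The function name lossy_hops is hypothetical, and the sketch assumes the report layout printed by recent mtr versions in report mode, in which each hop line starts with a token such as 1.|-- followed by the host and the Loss% column. Older versions may format hop lines differently.

```shell
# Hypothetical helper: read an mtr report on stdin and print the hop number,
# host, and Loss% of every hop whose packet loss rate is not zero.
lossy_hops() {
    awk '$1 ~ /^[0-9]+\.\|--$/ {
        hop = $1; sub(/\..*/, "", hop)   # "3.|--" -> "3"
        loss = $3; sub(/%/, "", loss)    # "20.0%" -> "20.0"
        if (loss + 0 > 0) print hop, $2, $3
    }'
}

# Possible usage (sends probes, so it is left commented out):
# mtr -r -n -c 100 <Destination IP address> | lossy_hops
```

A nonzero Loss% on an intermediate hop is not necessarily a fault, so treat the output only as a list of hops to examine more closely.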

traceroute

Install traceroute

sudo yum install traceroute

Use traceroute

Sample traceroute command syntax:

traceroute [-I] [ -m Max_ttl ] [ -n ] [ -p Port ] [ -q Nqueries ] [ -r ] [ -s SRC_Addr ] [ -t TypeOfService ] [ -f flow ] [ -v ] [ -w WaitTime ] Host [ PacketSize ]

The following table describes common optional parameters. You can run the man traceroute command to view the description of more parameters.

Optional parameter	Description
-d	Enables socket-level debugging.
-f	Sets the first time-to-live (TTL) of the packet.
-F	Disables fragmentation.
-g	Specifies one or more source route gateways. Up to 8 gateways can be specified.
-i	Uses the specified network interface controller (NIC) to send packets when the host has multiple NICs.
-I	Uses ICMP packets instead of UDP packets for probing.
-m	Sets the maximum TTL of the packet.
-n	Uses IP addresses instead of hostnames to prevent reverse Domain Name System (DNS) lookups.
-p	Sets a port for the UDP communication protocol.
-r	Bypasses the normal routing tables and sends packets directly to a host on an attached network.
-s	Sets the source IP address of the local host from which packets are sent.
-t	Sets the type-of-service (TOS) in probes.
-v	Displays execution results in detail.
-w	Sets the timeout period for each reply.
-x	Toggles IP checksums.

Sample traceroute command output

For example, if you run the traceroute -I <Destination IP address> command, the following command output is displayed.

Windows operating system

WinMTR is a graphical MTR application for Windows with a simplified feature set. WinMTR supports only some of the MTR parameters. By default, WinMTR sends ICMP packets for probing, and this setting cannot be changed. Compared with tracert, WinMTR is less affected by fluctuations on individual nodes and provides more accurate test results. We recommend that you use WinMTR whenever possible.

WinMTR (recommended)

Install and use WinMTR

The downloaded WinMTR package does not need to be installed. Decompress the package, and then start WinMTR.

Download WinMTR from the WinMTR official website.
Decompress the WinMTR package and double-click WinMTR.exe to start WinMTR.

In the WinMTR window, enter the IP address or domain name of the destination server in the Host field.

Important

The specified IP address or domain name cannot contain spaces.

You can configure the other parameters. The following table describes the parameters.

Parameter	Description
Copy Text to clipboard	Copies the test results to the clipboard as text.
Copy HTML to clipboard	Copies the test results in the HTML format to the clipboard.
Export TEXT	Exports the test results to a specified file in the text format.
Export HTML	Exports the test results to a specified file in the HTML format.
Options	The optional parameters. Interval (sec): the interval between probes, in seconds (default value: 1). Ping size (bytes): the size of each packet used for ping probes, in bytes (default value: 64). Max hosts in LRU list: the maximum number of hosts on the least recently used (LRU) list (default value: 128). Resolve names: specifies whether to display the domain names of nodes instead of their IP addresses.

Click Start to perform a test.
After the test is started, Start changes to Stop and WinMTR displays the test results.
Wait for WinMTR to run for a period of time and click Stop to stop the test.

Sample test results that are returned by WinMTR

For example, if the domain name of the destination server is used in WinMTR to perform a test, the following test results are displayed.

(Figure: test in progress)

The following table describes the parameters in the test results based on the default configurations.

Parameter	Description
Hostname	The IP address or domain name of the node.
Nr	The number of the node.
Loss%	The packet loss rate of the node.
Sent	The number of packets that are sent.
Recv	The number of packets that are received.
Best	The lowest latency of all probes.
Avg	The average latency of all probes.
Worst	The highest latency of all probes.
Last	The latency of the last probe.
StDev	The standard deviation of the latency on the node. A larger value indicates a larger difference between the response times for data packets on the node.

tracert

tracert is a network diagnostic command-line program that comes with Windows and traces the path that IP packets take when they are sent to an address.

Use tracert

Run the tracert command in PowerShell or Command Prompt.

tracert [-d] [-h maximum_hops] [-j host-list] [-w timeout] [-R] [-S srcaddr] [-4] [-6] target_name

Parameter	Description
-d	Prevents tracert from resolving IP addresses to hostnames. This means that DNS reverse lookup is disabled.
-h	Specifies the maximum number of hops to search for the destination IP address.
-j	Specifies a loose source route along the host-list.
-w	Specifies the amount of time in milliseconds to wait for each reply.
-R	Traces the round-trip path (IPv6-only).
-S	Specifies the source address to use (IPv6-only).
-4	Forces using IPv4.
-6	Forces using IPv6.
target_name	Specifies the destination host name or IP address.

Sample tracert command output

For example, if you run the tracert -d <Destination IP address> command, the following command output is displayed.

C:\> tracert -d 192.168.0.1
Trace a route to server 192.168.0.1, over a maximum of 30 hops.
1 The request timed out. 
2 9 ms 3 ms 12 ms 192.168.XXX.XXX
3 4 ms 9 ms 2 ms 10.20.XXX.XXX
4 9 ms 2 ms 1 ms 10.35.XXX.XXX
5 11 ms 211.24.X.XX
6 3 ms 2 ms 2 ms 200.12.XXX.XXX
7 2 ms 2 ms 1 ms 42.28.XXX.XXX
8 32 ms 4 ms 3 ms 42.25.2XX.2XX
9 The request timed out. 
10 3 ms 2 ms 2 ms 192.168.0.1

The tracing operation is complete.

Description of network path test results

Because the mtr command provides more accurate results, this section analyzes the test results that are returned by the mtr command. The following figure shows a sample mtr command output.

Networks

In most cases, the network path from a client to a destination server passes through the following networks:

Local network of the client
The local network of the client consists of a LAN and the networks of local carriers, and comprises two or three nodes, as shown in section A in the preceding figure. If an exception occurs on a node in the LAN, check the LAN and troubleshoot the exception. If an exception occurs in the network of a local carrier, report the exception to the local carrier.
Carrier networks
The network path passes through the backbone networks of multiple carriers that comprise multiple nodes, as shown in section B in the preceding figure. If an exception occurs on a node in carrier networks, you can query the IP address of the node to identify the carrier to which the IP address belongs, and then contact the carrier or Alibaba Cloud after-sales technical support for troubleshooting.
Local network of the destination server
The destination server resides in the network of a carrier, which comprises the two or three nodes before the IP address of the destination server on the path, as shown in section C in the preceding figure. If an exception occurs on a node in the local network of the destination server, report the exception to the carrier to which that network belongs.

Path load balancing

If load balancing is applied to some intermediate nodes on the network path, as shown in section D in the preceding figure, the mtr command numbers and probes only the start node and the end node and collects MTR data only for these two nodes. For the other nodes, the command output shows only the IP address or domain name of each node.

Avg and StDev

Due to factors such as link jitter, the Wrst and Best values of a node may differ greatly. Avg indicates the average latency of all probes since the MTR test started and reflects the network quality of a node. StDev indicates how widely the latencies of individual packets on the node are spread: the larger the value, the more the latencies differ from one another. You can therefore use StDev to determine whether the Avg value is representative of the network quality of a node. For example, if the StDev value is large, the latency of individual packets is unpredictable. Some packets may have a low latency, such as 25 ms, and others a high latency, such as 350 ms, while the resulting Avg value still falls in the normal range. In this scenario, the Avg value does not reflect the actual network quality.

We recommend that you analyze the Avg value and StDev value based on the following items:

- If the StDev value of a node is large, check the Best and Wrst values of the node to determine whether a network issue occurred on the node.
- If the StDev value of a node is not large, use the Avg value of the node to determine whether the node has a network issue.
Note
No absolute threshold defines whether a StDev value is large. Judge it relative to the other latency values of the node. For example, if the Avg value is 30 ms and the StDev value is 25 ms, the StDev value is large. If the Avg value is 325 ms and the StDev value is 25 ms, the StDev value is not large.
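One way to make this relative judgment repeatable is to compare the ratio of StDev to Avg with a cutoff. The following sketch does that. The function name stdev_class and the default cutoff of 0.5 are assumptions chosen for illustration and are not defined by MTR or by this topic. Adjust the cutoff to suit your own network.

```shell
# Hypothetical helper: classify StDev relative to Avg.
#   $1: Avg value in ms
#   $2: StDev value in ms
#   $3: ratio above which StDev counts as large (assumed default: 0.5)
stdev_class() {
    awk -v avg="$1" -v sd="$2" -v limit="${3:-0.5}" 'BEGIN {
        if (avg + 0 <= 0) { print "unknown"; exit }
        if (sd / avg > limit) print "large"; else print "not large"
    }'
}
```

With the values from the note, stdev_class 30 25 prints large and stdev_class 325 25 prints not large.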

Loss%

If the packet loss rate (Loss%) of a node is not zero, a network exception may have occurred at this hop. Packet loss on a node has one of the following causes:

- The carrier limits the ICMP transmission rate of the node for security or performance reasons.
- An exception occurred on the node.

To identify the cause, check whether packet loss also occurred on the subsequent nodes:

- If no packet loss occurred on the subsequent nodes, the packet loss on the node is caused by the ICMP throttling policy of the carrier, as shown in the second hop in the preceding figure. You can ignore this packet loss.
- If packet loss occurred on all subsequent nodes, a network exception occurred on the node and caused the packet loss, as shown in the fifth hop in the preceding figure.
- If packet loss occurred only on some of the subsequent nodes, the node is subject to both ICMP throttling and a network exception. In this case, if packet loss persists on the node and the subsequent nodes but the packet loss rate differs from hop to hop, use the packet loss rate of the last several hops as the reference. In the preceding figure, packet loss occurred at the fifth, sixth, and seventh hops, and the packet loss rate of the seventh hop is 40%. The packet loss rate of the seventh hop is used as the reference.
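The three cases above can be expressed as a small decision helper. The following sketch is an illustration only: the function name loss_cause and its output strings are hypothetical, and the caller is assumed to pass the Loss% values of all hops in path order as plain numbers.

```shell
# Hypothetical helper: classify the packet loss at one hop.
#   $1:   1-based number of the hop to examine
#   $2..: Loss% of every hop on the path, in order (numbers without "%")
loss_cause() {
    hop=$1
    shift
    printf '%s\n' "$*" | awk -v hop="$hop" '{
        if (hop < 1 || hop > NF) { print "hop out of range"; exit }
        if ($hop + 0 == 0)       { print "no loss"; exit }
        if (hop == NF)           { print "last hop"; exit }
        lossy = 0                # number of later hops that also lose packets
        for (i = hop + 1; i <= NF; i++) if ($i + 0 > 0) lossy++
        if (lossy == 0)             print "icmp throttling"
        else if (lossy == NF - hop) print "node exception"
        else                        print "throttling and exception, reference loss " $NF "%"
    }'
}
```

For example, loss_cause 2 0 50 0 0 0 prints icmp throttling because no later hop loses packets, and loss_cause 5 0 0 0 0 10 10 10 prints node exception because every later hop does. Loss at the last hop is reported separately because no subsequent nodes exist to compare against.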

Latency

Latency spike at hops
If latency spikes after a hop, a network exception may have occurred on the node at that hop. In the preceding figure, latency spiked after the fifth hop, which suggests a network exception on the node at that hop. However, high latency alone does not prove that an exception occurred on a node: in the preceding figure, the test data still reached the destination host despite the spike. The high latency may also be introduced on the response path. We recommend that you perform a reverse path test to analyze the issue.
Latency increase caused by ICMP throttling
ICMP throttling on a node may also cause a latency spike on the node, but it does not affect the latency on the subsequent nodes. In the preceding figure, the packet loss rate at the third hop reaches 100% and latency spikes at that hop, but the latency on the subsequent nodes immediately returns to the normal level. You can therefore determine that the latency spike and the packet loss on the node are caused by ICMP throttling.
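The difference between the two latency patterns above is whether the spike carries over to the later hops. The following sketch checks this for a list of Avg values. The function name spike_kind, its output strings, and the idea of a fixed margin in milliseconds are assumptions made for illustration. Choose a margin that fits the normal latency of your path.

```shell
# Hypothetical helper: decide whether a latency spike at one hop carries
# over to the later hops.
#   $1:   1-based number of the hop to examine
#   $2:   margin in ms that counts as a spike (an assumed, tunable value)
#   $3..: Avg latency in ms of every hop on the path, in order
spike_kind() {
    hop=$1
    margin=$2
    shift 2
    printf '%s\n' "$*" | awk -v hop="$hop" -v margin="$margin" '{
        if (hop < 2 || hop >= NF) { print "not enough hops"; exit }
        if ($hop - $(hop - 1) <= margin) { print "no spike"; exit }
        # The spike is isolated only if every later hop is back below it
        # by more than the margin.
        for (i = hop + 1; i <= NF; i++)
            if ($hop - $i <= margin) { print "persistent"; exit }
        print "isolated"
    }'
}
```

For example, spike_kind 3 50 1 5 300 12 15 20 prints isolated, which matches the ICMP throttling pattern, and spike_kind 5 50 1 5 8 10 250 255 260 prints persistent, in which case a reverse path test is the next step.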

Common scenarios for path exceptions

Note

This section describes common scenarios in which exceptions occur on network paths. In the examples, the mtr command is used. The test results that are returned vary based on the operating system and test tool.

Misconfigured network of the destination host

In this example, all packets are lost at the end of data transmission, as shown in the preceding figure. ICMP may be disabled in the security policies of the destination server, such as firewall policies and iptables policies. As a result, the destination host cannot send responses, and packets cannot reach the destination IP address. You must check the security policies of the destination server.

ICMP throttling

In this example, all packets are lost at the end of data transmission, as shown in the preceding figure. ICMP may be disabled or throttled by policies such as firewall policies or iptables policies of the destination server, or by throttling policies of carriers. As a result, the destination host cannot send responses, and packets appear not to reach the destination IP address. You must check the security policies of the destination server or perform a reverse path test to analyze the issue.

Loop

In this example, a routing loop occurred after packets passed through the fifth hop, and packets could not reach the destination server, as shown in the preceding figure. In most cases, the routing loop is caused by an exception in the route configuration of a carrier node. You must contact the carrier to resolve the issue.
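A routing loop shows up in the test output as hop addresses that repeat. The following sketch looks for the first repeated address in a list of hops. The function name first_repeated_hop is hypothetical, and the input is assumed to be one hop address per line in path order, for example the host column extracted from an mtr report. Placeholders for hops that did not respond (??? or *) are skipped.

```shell
# Hypothetical helper: read one hop address per line on stdin and print the
# first address that appears more than once. The exit status is 0 if a
# repeated address is found and 1 otherwise.
first_repeated_hop() {
    awk 'NF && $1 != "???" && $1 != "*" && seen[$1]++ { print $1; found = 1; exit }
         END { exit !found }'
}
```

A repeated address is only an indication of a loop. Confirm it against the full test output before you contact the carrier.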

Node interruption

In this example, no responses are returned for packets after they pass through the fourth hop. In most cases, this issue occurs because the node at that hop is interrupted. We recommend that you perform a reverse path test to confirm the issue, and then contact the carrier to which the node belongs.
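To locate the point after which the path stops responding, you can look for the last hop whose packet loss rate is below 100%. The following sketch does this for a list of Loss% values. The function name last_responding_hop is hypothetical, and the caller is assumed to pass the Loss% values of all hops in path order as plain numbers.

```shell
# Hypothetical helper: print the number of the last hop whose Loss% is
# below 100, or 0 if no hop responded at all.
#   $1..: Loss% of every hop on the path, in order (numbers without "%")
last_responding_hop() {
    printf '%s\n' "$*" | awk '{
        last = 0
        for (i = 1; i <= NF; i++) if ($i + 0 < 100) last = i
        print last
    }'
}
```

For example, last_responding_hop 0 0 0 0 100 100 100 prints 4, which corresponds to the pattern in this scenario. If the result equals the total number of hops, the destination responded.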