Path MTU Discovery (PMTUD) is a feature provided by the TCP/IP stack to dynamically discover the path maximum transmission unit (PMTU) between two hosts. The PMTU of a network path is the smallest MTU of any link along the path, and therefore the largest packet size that does not require fragmentation anywhere on the path. If a router receives a packet that is not allowed to be fragmented and is larger than the MTU of the next link on the path, the router drops the packet and sends an Internet Control Message Protocol (ICMP) error message that instructs the sending host to reduce the packet size. The sending host then uses smaller packets, which avoids packet fragmentation during transmission. PMTUD can significantly improve network performance and prevents the packet drops that occur when packets are too large.
Impacts of PMTUD on network performance
PMTUD dynamically discovers the PMTU on a network path based on the interactions between the sending host and the network devices and limits the size of packets transmitted along the network path to the PMTU. This process helps improve network efficiency and reliability and reduces potential issues caused by packet fragmentation. PMTUD provides the following benefits:
Avoidance of IP fragmentation: PMTUD ensures that the size of each packet transmitted along a network path is not greater than the MTU of any hop on the path. This avoids IP fragmentation. IP fragmentation places additional processing burdens on network devices and may cause packet loss and increase latency. For information about how to avoid IP fragmentation on TCP connections, see the Avoid IP fragmentation on TCP connections section of this topic.
Improved network efficiency: PMTUD helps determine the optimal packet size to reduce network congestion and improve data transmission efficiency. Larger packet sizes can reduce header overhead and increase throughput, as long as the packet sizes do not exceed the maximum packet size supported by all devices on the network path.
Reduced packet loss and latency: IP fragmentation breaks a packet into a number of fragments that are reassembled by the receiving host. If a single fragment is lost, the original packet cannot be reassembled and the entire packet must be retransmitted. PMTUD helps avoid IP fragmentation to reduce packet loss and latency.
Automatic adjustment of packet sizes: PMTUD allows a host that sends a packet to dynamically resize the packet based on the PMTU between the host and the receiving host without the need to learn about the network configuration beforehand.
Improved network robustness: PMTUD allows networks to adapt to various link types and configurations, which improves network robustness and flexibility.
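The cost of fragmentation described in the preceding list can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function names ipv4_fragment_sizes and packet_loss_probability are hypothetical, a 20-byte IPv4 header without options is assumed, and fragment losses are assumed to be independent of each other.

```python
def ipv4_fragment_sizes(packet_len, mtu, header_len=20):
    """Return the total length of each fragment of an IPv4 packet.

    Every fragment carries its own IP header, and the data in every
    fragment except the last must be a multiple of 8 bytes.
    """
    if packet_len <= mtu:
        return [packet_len]  # fits on the link, no fragmentation
    per_fragment = (mtu - header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU is too small to carry any data")
    remaining = packet_len - header_len
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(header_len + chunk)
        remaining -= chunk
    return sizes


def packet_loss_probability(fragment_loss, fragment_count):
    """Probability that a packet is lost when any one fragment is lost."""
    return 1 - (1 - fragment_loss) ** fragment_count


sizes = ipv4_fragment_sizes(4000, 1500)
print(sizes)                                         # [1500, 1500, 1040]
print(sum(sizes) - 4000)                             # 40 extra header bytes
print(round(packet_loss_probability(0.01, len(sizes)), 4))  # 0.0297
```

In this example, a 4000-byte packet that crosses a link with an MTU of 1500 bytes becomes three fragments, and a 1% loss rate per fragment turns into a loss rate of roughly 3% for the whole packet.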
Components and systems that support PMTUD
Operating systems: Many modern operating systems, including but not limited to Linux and Windows operating systems, support PMTUD. In most cases, Alibaba Cloud Elastic Compute Service (ECS) instances support PMTUD.
Network devices: Some network devices, such as routers and switches, have PMTUD support built in and take part in PMTUD when packets pass through them. Not all network devices support PMTUD. If a network device that does not support PMTUD receives an IP packet that is larger than the MTU of the device and has the Don't Fragment (DF) flag set to 1, the ICMP message that the device returns to the sending host may not include the MTU of the device.
Middleware or libraries: In specific programming environments, specialized libraries or middleware may be provided to perform PMTUD.
Applications: Specific network applications may implement their own PMTUD logic to dynamically resize packets when the applications send the packets.
Network protocols: PMTUD applies to protocols that run over IPv4 and IPv6, including but not limited to TCP, UDP, and ICMP.
Cloud service providers: Alibaba Cloud network forwarding components perform PMTUD in compliance with RFC standards to ensure network connectivity. For more information, see the How PMTUD works section of this topic.
How PMTUD works
The DF flag in IP packets plays an important role in PMTUD. The DF flag can be set to 1 to indicate that IP packets must not be fragmented. PMTUD relies on ICMP messages to dynamically discover the PMTU for the network path between two hosts and limit the packet size on the network path to the PMTU. This avoids packet fragmentation during transmission. PMTUD involves the following steps:
The DF flag is configured on a sending host. When a host sends an IP packet, the host sets the DF flag in the IP header of the packet to 1, which specifies that the packet is not allowed to be fragmented on the network path.
The MTU limit is hit. When the packet in which the DF flag is set to 1 arrives at a network device, the network device drops the packet if the packet size exceeds the MTU of the network device or the link.
In IPv4, network devices can determine whether to perform PMTUD based on the DF flag. If a packet in which the DF flag is set to 0 arrives at a network device that supports fragmentation and the packet size exceeds the MTU of the network device, the network device may fragment the packet instead of dropping the packet.
Note: Specific forwarding components in Alibaba Cloud do not support fragmentation, such as edge gateways in cross-domain or cross-region communication scenarios. The forwarding components drop oversized packets, instead of fragmenting them, and then send ICMP error messages even if the DF flag in the packets is set to 0.
In IPv6, routers never fragment packets in transit. The IPv6 header does not contain a DF flag, and all IPv6 packets are treated as if the DF flag were set to 1. A host that sends IPv6 packets larger than the minimum IPv6 link MTU of 1280 bytes therefore relies on PMTUD.
An ICMP error message is sent. If the network device that drops the packet supports PMTUD, the network device sends an ICMP error message to notify the sending host of the MTU issue. The ICMP error message includes the MTU of the network device, which is the PMTU between the sending host and the network device.
In IPv4, the network device sends the following ICMP error message to the sending host: Destination Unreachable: Fragmentation Needed and Don't Fragment was Set (Type 3, Code 4).
In IPv6, the network device sends the following ICMP error message to the sending host: ICMPv6 Packet Too Big (PTB) (Type 2, Code 0).
PMTUD is performed and the ICMP error message is processed. After the sending host receives the ICMP error message, the host parses out and caches the PMTU from the message. Then, the following operations are performed based on the PMTU:
By default, the operating system kernel of the sending host fragments the packet based on the PMTU and resends the packet fragments.
An application resizes the packet and processes the ICMP error message. When an application that supports PMTUD receives the ICMP error message, the application resizes the packet based on the MTU included in the ICMP error message and then resends the resized packet. This approach achieves better network performance by avoiding fragmentation, but requires modifications to applications in most cases.
Note: For TCP connections, the outcome of PMTUD may affect the maximum segment size (MSS) at the TCP layer. The TCP layer adjusts the MSS based on the PMTU discovered by PMTUD and uses the new MSS to send subsequent packets, which avoids fragmentation of TCP segments. For more information, see the Dynamically adapt MSS to PMTU during data transmission section of this topic.
The PMTU is cached. The sending host generates a route cache entry in the route table of its operating system, which includes the PMTU for the destination IP address for which the packet is destined. The sending host limits the size of each subsequent packet destined for the same IP address to the PMTU to ensure that the packet is not fragmented on the network path.
The PMTU is updated on a regular basis. The PMTU for a network path is not permanent. When the network path changes or the route cache entry in the operating system of the sending host ages out, the host performs PMTUD again and updates the PMTU for the network path.
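The preceding steps can be sketched as a small simulation. The following Python code is a simplified model, not a real network stack: the path is an assumed list of link MTUs, every packet is treated as having the DF flag set to 1, and the helper names (build_frag_needed, parse_frag_needed, forward, discover_pmtu, pmtu_cache) are hypothetical. The only protocol detail taken from the standard is the layout of the first 8 bytes of the IPv4 ICMP Type 3, Code 4 message, in which the last 2 bytes carry the next-hop MTU.

```python
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: Destination Unreachable
ICMP_FRAG_NEEDED = 4    # ICMP code: Fragmentation Needed and DF was Set


def build_frag_needed(next_hop_mtu):
    """Build the 8-byte ICMP header (checksum left at 0 for brevity)."""
    return struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                       0, 0, next_hop_mtu)


def parse_frag_needed(message):
    """Return the next-hop MTU, or None if this is another ICMP message."""
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", message[:8])
    if (icmp_type, code) != (ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED):
        return None
    return mtu


def forward(packet_size, link_mtus):
    """Send a DF=1 packet along the path.

    Returns None if the packet is delivered, or the ICMP error produced
    by the first hop whose MTU is smaller than the packet.
    """
    for mtu in link_mtus:
        if packet_size > mtu:
            return build_frag_needed(mtu)
    return None


pmtu_cache = {}  # destination -> cached PMTU (stands in for the route cache)


def discover_pmtu(destination, local_mtu, link_mtus):
    """Shrink the packet size until a packet reaches the destination."""
    size = pmtu_cache.get(destination, local_mtu)
    probes = 0
    while True:
        probes += 1
        reply = forward(size, link_mtus)
        if reply is None:
            pmtu_cache[destination] = size
            return size, probes
        size = parse_frag_needed(reply)


print(discover_pmtu("198.51.100.7", 1500, [1500, 1400, 1450, 1300]))
# (1300, 3)
```

In this run, the first packet is dropped at the 1400-byte link and the second at the 1300-byte link, so the third packet is the first one delivered. A later call for the same destination starts from the cached value of 1300 bytes.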
Use PMTUD
Enable PMTUD
Make sure that all devices in a network support PMTUD and PMTUD is enabled for the devices. For example, you can enable PMTUD for a Linux device by writing a zero to the /proc/sys/net/ipv4/ip_no_pmtu_disc file. In most cases, Alibaba Cloud ECS instances support PMTUD.
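As a minimal sketch of how the Linux setting mentioned above could be checked from a script, the following Python function reads the file and reports whether it contains a zero. The function name pmtud_enabled is hypothetical, the sketch applies to IPv4 on Linux only, and any non-zero value is simply reported as not enabled.

```python
from pathlib import Path

# Path taken from the text above; only present on Linux.
NO_PMTU_DISC = "/proc/sys/net/ipv4/ip_no_pmtu_disc"


def pmtud_enabled(path=NO_PMTU_DISC):
    """Return True if the file contains 0, which means PMTUD is enabled."""
    value = int(Path(path).read_text().strip())
    return value == 0
```

On a Linux host, calling pmtud_enabled() without arguments reads the live kernel setting. Writing a zero to the same file requires root privileges.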
Old kernel versions or specific types of operating systems may not support PMTUD. If the devices on a network path run a kernel version or operating system that does not support PMTUD, the devices cannot properly process ICMP messages during the PMTUD process. As a result, PMTUD fails to dynamically discover the PMTU for the network path.
If a network device on a network path does not support PMTUD, you may need to manually determine the PMTU for the network path. For example, you can run the ping command with different packet sizes to discover the PMTU. After you discover the PMTU for the network path, you can adjust the packet sizes on the sending host based on the PMTU or change the MTU of the network device. For example, you can change the MTU of a network interface controller (NIC).
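A manual search with ping can be automated with a binary search over the payload size. The following Python sketch makes several assumptions: the function names find_pmtu and ping_probe are hypothetical, the path is IPv4 (20-byte IP header plus 8-byte ICMP header), and ping_probe relies on the Linux iputils ping options -M do (set the DF flag), -s (payload size), -c (probe count), and -W (timeout in seconds). A failed probe is interpreted as "packet too large", although it can also be caused by unrelated packet loss.

```python
import subprocess

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def ping_probe(host, payload):
    """Send one DF=1 echo request; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_pmtu(probe, min_mtu=68, max_mtu=1500):
    """Binary-search the largest packet size that passes the probe.

    probe(payload) must return True if a packet with that ICMP payload
    size gets through. Returns the PMTU in bytes, or None if even the
    smallest packet fails.
    """
    low = min_mtu - ICMP_OVERHEAD
    high = max_mtu - ICMP_OVERHEAD
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid          # mid fits, search larger sizes
        else:
            high = mid - 1     # mid is too large, search smaller sizes
    return low + ICMP_OVERHEAD
```

A possible invocation is find_pmtu(lambda size: ping_probe("192.0.2.10", size)), where 192.0.2.10 is a placeholder address. Passing the probe as a parameter keeps the search logic independent of the ping implementation.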
Make sure that the sending host can receive ICMP error messages
Check and configure firewalls and other security devices to allow required ICMP messages to pass through. In specific network environments, firewalls or other security devices may filter out ICMP messages, especially the following ICMP messages: Destination Unreachable: Fragmentation Needed and Don't Fragment was Set (Type 3, Code 4). As a result, the PMTUD process may be blocked.
Configure security groups to allow ICMP traffic. ECS instances can receive ICMP negotiation packets sent by different forwarding components during the PMTUD process only if the security groups to which the instances belong allow ICMP traffic. For more information, see the Security group rules for controlling access to ECS instances by using specific protocols section of the "Security groups for different use cases" topic.
Check whether network traffic reaches the throttling threshold. If network traffic reaches the throttling threshold, ICMP messages may be dropped. To check the traffic load:
For Linux instances, perform the operations described in How do I query and analyze the network traffic loads of a Linux instance?
For Windows instances, use the Task Manager to check the network usage.
Modify applications to respond to ICMP messages
Modify applications to respond to ICMP error messages during the PMTUD process and reduce packet sizes based on PMTUs.
If an application is not adapted to PMTUD or the application does not adapt its packet size to the discovered PMTU after the application receives an ICMP error message during the PMTUD process, the operating system kernel may fragment packets sent by the application, which may result in packet transmission failure.
To resolve the issue, you can specify a smaller MSS in the application to adapt to the PMTU.
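As a hedged illustration of an application that adapts its packet size to the PMTU, the following Python sketch sends data over a connected UDP socket on Linux. It assumes the Linux socket options IP_MTU_DISCOVER, IP_PMTUDISC_DO, and IP_MTU; the numeric fallbacks are the values from the Linux header linux/in.h and are used only if the Python socket module does not expose the names. The function names are hypothetical, and splitting the data into several datagrams is an assumption about the application protocol, which must be able to reassemble them.

```python
import errno
import socket

# Linux-specific socket options (fallback values from <linux/in.h>).
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)

IPV4_HEADER = 20
UDP_HEADER = 8


def max_udp_payload(pmtu):
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return pmtu - IPV4_HEADER - UDP_HEADER


def split_for_pmtu(data, pmtu):
    """Split application data into chunks that each fit the PMTU."""
    size = max_udp_payload(pmtu)
    if size <= 0:
        raise ValueError("PMTU is too small for a UDP datagram")
    return [data[i:i + size] for i in range(0, len(data), size)]


def send_datagrams(sock, data):
    """Send data on a connected UDP socket without IP fragmentation.

    The kernel rejects a datagram larger than the PMTU it currently
    knows with EMSGSIZE; the data is then split and sent again.
    Datagrams that were dropped in transit before the kernel learned
    the PMTU are not resent by this sketch.
    """
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    pending = [data]
    sent = 0
    while pending:
        chunk = pending.pop(0)
        try:
            sock.send(chunk)
            sent += 1
        except OSError as exc:
            if exc.errno != errno.EMSGSIZE:
                raise
            pmtu = sock.getsockopt(socket.IPPROTO_IP, IP_MTU)
            if len(chunk) <= max_udp_payload(pmtu):
                raise  # already small enough, so the error has another cause
            pending = split_for_pmtu(chunk, pmtu) + pending
    return sent
```

With a PMTU of 1500 bytes, split_for_pmtu divides 3000 bytes of data into chunks of 1472, 1472, and 56 bytes.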
Avoid IP fragmentation on TCP connections
TCP is a connection-oriented, reliable transport layer protocol designed to ensure the integrity and order of data. IP fragmentation may lead to fragment loss or reassembly errors and affect the reliability of TCP transmission. In most cases, MSS negotiation and PMTUD help TCP connections avoid IP fragmentation and improve network performance and reliability.
Avoid IP fragmentation by using the MSS negotiation mechanism during connection establishment
MSS is a TCP parameter that specifies the largest amount of data that can be transmitted in a TCP segment, excluding the TCP header.
MSS negotiation happens in the TCP/IP stack and does not require direct intervention from users or applications. MSS negotiation ensures that the size of data segments is acceptable to the network environments on the sending host and the receiving host, which avoids fragmentation caused by overly large packets or inefficiency caused by overly small packets.
The MSS is negotiated during the TCP three-way handshake when a TCP connection is established between two hosts. The sending host sends the MSS option in a SYN packet to the receiving host. The value of the MSS option is calculated by subtracting the IP header length and the TCP header length from the MTU of the sending host. For example, an MTU of 1500 bytes with a 20-byte IPv4 header and a 20-byte TCP header yields an MSS of 1460 bytes. After the receiving host receives the SYN packet, the receiving host determines its own MSS value based on its MTU and returns the value in a SYN-ACK packet to the sending host. Each host then sends segments that are no larger than the MSS announced by its peer, so the connection effectively uses the smaller of the two values. This eliminates the inefficiency and retransmission that IP fragmentation at either host would cause.
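The handshake arithmetic can be written out as a short Python sketch. It assumes IPv4 and TCP headers of 20 bytes each without options, and the function names advertised_mss and effective_mss are hypothetical.

```python
IPV4_HEADER = 20  # bytes, without IP options
TCP_HEADER = 20   # bytes, without TCP options


def advertised_mss(mtu):
    """MSS that a host announces in its SYN or SYN-ACK packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def effective_mss(client_mtu, server_mtu):
    """Segment size that the connection ends up using in both directions."""
    return min(advertised_mss(client_mtu), advertised_mss(server_mtu))


print(advertised_mss(1500))       # 1460
print(effective_mss(1500, 9000))  # 1460
```

In the second call, one host uses jumbo frames with an MTU of 9000 bytes, but the connection is still limited to 1460 bytes per segment by the host with the smaller MTU.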
Dynamically adapt MSS to PMTU during data transmission
MSS negotiation takes only the MTUs of the two hosts into account. It does not guarantee that the MTU of every network device, such as a router, along the network path is large enough to carry a full-sized segment together with its IP and TCP headers. If the MTU of a network device on the network path is smaller than the size of a TCP packet, IP fragmentation may still occur even if the MSS is negotiated. In addition, connectionless protocols, such as UDP and ICMP, do not have an MSS negotiation mechanism. In these cases, PMTUD is crucial. PMTUD allows both ends of a network path to dynamically discover the PMTU for the network path. Applications can be adapted to respond to ICMP error messages and adjust the MSS based on the discovered PMTU to avoid fragmentation.
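The gap between the negotiated MSS and the PMTU can be illustrated with the following Python sketch. It assumes 20-byte IPv4 and TCP headers, and the function names are hypothetical. The clamp_socket_mss helper uses the standard TCP_MAXSEG socket option and is meant to be called before the socket connects. On common operating systems, the TCP stack usually lowers the MSS by itself after it receives the ICMP error message, so an explicit clamp like this is mainly useful when ICMP messages are known to be filtered on the path.

```python
import socket

HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def segment_fits(mss, pmtu):
    """True if a full-sized segment plus headers fits within the PMTU."""
    return mss + HEADERS <= pmtu


def adapt_mss(negotiated_mss, pmtu):
    """Lower the MSS if the discovered PMTU cannot carry a full segment."""
    return min(negotiated_mss, pmtu - HEADERS)


def clamp_socket_mss(sock, pmtu):
    """Limit the MSS of a TCP socket that is not yet connected."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, pmtu - HEADERS)


print(segment_fits(1460, 1400))  # False
print(adapt_mss(1460, 1400))     # 1360
print(adapt_mss(1460, 1500))     # 1460
```

In this example, both hosts negotiate an MSS of 1460 bytes, but a link with an MTU of 1400 bytes on the path cannot carry the resulting 1500-byte packets. After PMTUD reports a PMTU of 1400 bytes, the MSS drops to 1360 bytes.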