Re: TCP MSS, MTU and PMTU Discovery

Diego Sousa · 02-05-2019

Hi,

I have some doubts about these three topics. In fact, I came across Cisco's explanatory article at the following URL: https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html
During the investigation I understood that the MTU is the maximum packet size that can pass through a particular medium, that MSS is used in the transport layer to avoid fragmentation in the hosts that participate in the communication, and that the PMTUD is used to discover the smallest MTU between two hosts that have established some TCP or UDP communication, but my question is, at what point is the PMTUD actually used? Are the machines enabled by default?

Vishnu Vardhan S · 02-05-2019

Hi! PMTU stands for path MTU. As the name suggests, path MTU discovery is used to find the largest packet size that can be sent across the network without being fragmented, so that the path is used as efficiently as possible.

It works like the following.

If a host sends a packet with the DF (Don't Fragment) bit set, that packet must not be fragmented. If it reaches a router whose outgoing interface has a smaller MTU, the router drops it and sends an ICMP "Fragmentation needed and DF set" message back to the original source. The source then reduces the packet size and retransmits. This can happen several times, until the packet is small enough for every link on the path.

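The problem here is that, if a router doesn't include its next-hop MTU in that message (older routers don't), the original source never learns the exact MTU of the path; it only learns that its packet was too big and has to guess a smaller size.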
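To make the difference concrete, here is a small simulation of the source-side loop, contrasting routers that report their next-hop MTU with older routers that don't. It is a sketch under simplifying assumptions: the path is a fixed list of link MTUs, nothing is lost except oversized packets, and `FALLBACK_SIZES` is a hypothetical list of sizes to step down through, not any particular stack's table.

```python
from typing import List, Optional, Tuple

# Hypothetical sizes a source might try when the ICMP message carries no MTU
# (illustrative values only).
FALLBACK_SIZES = (1492, 1006, 576, 296, 68)


def send_with_df(size: int, link_mtus: List[int], report_mtu: bool) -> Optional[int]:
    """Send one DF-set packet of `size` bytes across links with the given MTUs.

    Returns None if it is delivered. Otherwise the first too-small link drops it
    and "sends" a fragmentation-needed message: the value returned is that link's
    MTU if the router reports it, or 0 for an older router that does not.
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu if report_mtu else 0
    return None


def discover_pmtu(link_mtus: List[int], report_mtu: bool) -> Tuple[int, int]:
    """Return (size that finally got through, number of packets sent)."""
    size = link_mtus[0]  # start at the first-hop (local interface) MTU
    sent = 0
    while True:
        sent += 1
        feedback = send_with_df(size, link_mtus, report_mtu)
        if feedback is None:
            return size, sent
        if feedback:
            size = feedback  # router told us the exact next-hop MTU
        else:
            # No MTU in the message: guess the next smaller candidate size.
            size = next(s for s in FALLBACK_SIZES if s < size)


print(discover_pmtu([1500, 800, 1200], report_mtu=True))   # (800, 2)
print(discover_pmtu([1500, 800, 1200], report_mtu=False))  # (576, 4)
```

With MTU reporting, the source converges on the real 800-byte bottleneck after one drop; without it, the source needs more attempts and settles on a smaller size than the path could actually carry.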

If you are curious to learn more, see RFC 2923 (TCP Problems with Path MTU Discovery):

https://www.ietf.org/rfc/rfc2923.txt?number=2923


Deepak Kumar · 02-05-2019

Hi,

I'd like to quote a blog post here:

"When a host needs to transmit data out an interface, it references the interface's Maximum Transmission Unit (MTU) to determine how much data it can put into each packet. Ethernet interfaces, for example, have a default MTU of 1500 bytes, not including the Ethernet header or trailer. This means a host needing to send a TCP data stream would typically use the first 20 of these 1500 bytes for the IP header, the next 20 for the TCP header, and as much of the remaining 1460 bytes as necessary for the data payload. Encapsulating data in maximum-size packets like this allows for the least possible consumption of bandwidth by protocol overhead.

Unfortunately, not all links which compose the Internet have the same MTU. The MTU offered by a link may vary depending on the physical media type or configured encapsulation (such as GRE tunneling or IPsec encryption). When a router decides to forward an IPv4 packet out an interface, but determines that the packet size exceeds the interface's MTU, the router must fragment the packet to transmit it as two (or more) individual pieces, each within the link MTU. Fragmentation is expensive both in router resources and in bandwidth utilization; new headers must be generated and attached to each fragment. (In fact, the IPv6 specification removes transit packet fragmentation from router operation entirely, but this discussion will be left for another time.)

To utilize a path in the most efficient manner possible, hosts must find the path MTU; this is the smallest MTU of any link in the path to the distant end. For example, for two hosts communicating across three routed links with independent MTUs of 1500, 800, and 1200 bytes, the smallest (800 bytes) must be assumed by each end host to avoid fragmentation."

http://packetlife.net/blog/2008/aug/18/path-mtu-discovery/
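The arithmetic in the quoted post can be written out as a short sketch, assuming IPv4 and TCP headers without options (20 bytes each): the path MTU is the minimum of the link MTUs, and the largest TCP payload per unfragmented packet is that MTU minus 40.

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def path_mtu(link_mtus: List[int]) -> int:
    """The path MTU is the smallest MTU of any link on the path."""
    return min(link_mtus)


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(tcp_mss(1500))                         # 1460 (plain Ethernet)
print(tcp_mss(path_mtu([1500, 800, 1200])))  # 760 (the three-link example)
```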

Speaking of path MTU, I am reminded of the Path MTU Discovery issue Cloudflare dealt with in 2015.

https://blog.cloudflare.com/path-mtu-discovery-in-practice/

Regards,

Deepak Kumar


Joseph W. Doherty · 02-06-2019

". . . MSS is used in the transport layer to avoid fragmentation in the hosts that participate in the communication . . ."

Yes and no. Generally, when we say "fragmentation", we mean splitting an IP packet. If the sending host used an MSS larger than what one IP packet can contain, the host's own IP layer would have to fragment each TCP segment, so a segment would span multiple IP packets. You then run the risk that losing any one of those IP packets loses part of the TCP segment, which, if I remember correctly, requires retransmitting the whole segment, i.e. multiple packets. For this reason, TCP generally sets MSS so it doesn't exceed what one IP packet can contain; then, if a packet is lost, only that one IP packet needs to be retransmitted.
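Here is a sketch of how an oversized segment gets split at the sending host, assuming IPv4 without options. The function returns (offset, length, more-fragments) for each piece; if any one piece is lost, the receiver can't reassemble the datagram and TCP never sees the segment, which is why the whole segment has to be resent.

```python
from typing import List, Tuple

IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment(ip_payload_len: int, mtu: int) -> List[Tuple[int, int, bool]]:
    """Split an IPv4 payload into (offset, length, more_fragments) pieces."""
    # All fragments except the last must carry a multiple of 8 bytes,
    # because the Fragment Offset field counts in 8-byte units.
    max_data = (mtu - IPV4_HEADER) // 8 * 8
    pieces = []
    offset = 0
    while offset < ip_payload_len:
        length = min(max_data, ip_payload_len - offset)
        more = offset + length < ip_payload_len
        pieces.append((offset, length, more))
        offset += length
    return pieces


# A TCP segment carrying 2920 data bytes plus a 20-byte TCP header, sent over
# a 1500-byte MTU link, leaves the host as two IP fragments:
print(fragment(2920 + 20, 1500))  # [(0, 1480, True), (1480, 1460, False)]
```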

(BTW, the impact of losing one lower-layer unit that carries part of a higher-layer unit isn't limited to IP and TCP; there's a similar issue with ATM and IP. I.e. lose one ATM cell, and the whole IP packet needs to be resent.)

That said, if you know a certain path only supports a smaller MTU, you can have hosts use a smaller MSS to avoid the need to fragment IP packets downstream. (This is what the IOS interface command "ip tcp adjust-mss" is used for; the router rewrites the MSS value in TCP SYN packets passing through it.)
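A minimal sketch of what MSS clamping (on Cisco IOS, the "ip tcp adjust-mss" interface command) does to the MSS option of a forwarded SYN: it only ever lowers the value. The 1400-byte tunnel MTU is a hypothetical figure chosen for illustration, and the function names are made up for this sketch.

```python
def mss_for_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """Largest TCP payload that fits in one IPv4 packet of size `mtu`."""
    return mtu - ip_header - tcp_header


def clamp_syn_mss(advertised_mss: int, clamp: int) -> int:
    """MSS value a clamping router writes back into a forwarded TCP SYN.

    It only lowers the value; a host advertising less is left alone.
    """
    return min(advertised_mss, clamp)


# Hypothetical: a tunnel whose path MTU is 1400 bytes.
clamp = mss_for_mtu(1400)          # 1360
print(clamp_syn_mss(1460, clamp))  # 1360: the Ethernet host's MSS is lowered
print(clamp_syn_mss(536, clamp))   # 536: already small enough, unchanged
```

Because both ends see the clamped value in the SYNs, neither sends segments that would need fragmenting on the narrow part of the path.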

". . . PMTUD is used to discover the smallest MTU between two hosts that have established some TCP or UDP communication . . ."

It's not limited to just TCP or UDP; it's an IP packet parameter.

". . . but my question is, at what point is the PMTUD actually used? Are the machines enabled by default?"

It's generally off by default. (For example, Cisco IOS only uses PMTUD for the router's own TCP sessions when "ip tcp path-mtu-discovery" is configured, although many host operating systems, such as Linux and Windows, enable it for TCP by default.) On the sending side, PMTUD sets the DF bit in outgoing IP packets and reduces the packet size when told a packet is too big. At any IP hop where the packet can't be forwarded without being fragmented, and the DF bit is set, the packet is dropped and an ICMP message is sent to the sender saying fragmentation is required. (If the DF bit is not set, the packet is fragmented and forwarded. Also, if the fragmentation-needed message is blocked, as firewalls sometimes do, the host keeps sending the too-large packets, which all get dropped.)
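The per-hop decision described above can be sketched as a small function. This is a simplified model, not router code: it ignores ICMP rate limiting and folds "a firewall drops the ICMP reply" into the hop's outcome, since from the sender's point of view the result is the same black hole.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward as-is"
    FRAGMENT = "fragment, then forward"
    DROP_AND_ICMP = "drop, send ICMP fragmentation-needed to the source"
    BLACK_HOLE = "drop; the ICMP message never reaches the source"


def forward(packet_len: int, df: bool, egress_mtu: int, icmp_filtered: bool = False) -> Action:
    """Decide what happens to an IPv4 packet at one hop (simplified model)."""
    if packet_len <= egress_mtu:
        return Action.FORWARD
    if not df:
        return Action.FRAGMENT
    # DF set and too big: the packet is dropped. If a firewall filters the ICMP
    # reply, the sender hears nothing and keeps sending packets of the same size.
    return Action.BLACK_HOLE if icmp_filtered else Action.DROP_AND_ICMP


for df, filtered in [(False, False), (True, False), (True, True)]:
    print(df, filtered, forward(1500, df, 1400, filtered).value)
```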

BTW, since networks are dynamic, a path might change to include a link whose MTU is smaller than what was okay earlier. Conversely, a path might change so that larger MTUs are supported than before. Once a host is notified that its packets need fragmentation (NB: older messages only say that fragmentation is needed, and it's up to the sending host to try different sizes to find one that works; newer messages specify the max MTU that's supported), it generally starts a timer, and after some time (generally several minutes) it tries its max MTU again to determine whether a larger MTU can be used now. This PMTUD cycle is repeated. As noted above, for TCP, if MSS has been adjusted, PMTUD shouldn't be needed for that TCP traffic, although, again, remember that network paths can change. I.e. even with an MSS adjusted to avoid a known small path MTU, PMTUD is often still advisable.
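The "learn a smaller MTU, then retry the max MTU later" cycle can be sketched as a per-destination cache whose entries expire. The class and method names are hypothetical; the 600-second default follows the 10-minute timer value RFC 1191 recommends, and real stacks typically make it configurable. The clock is injectable so the example doesn't have to wait.

```python
import time
from typing import Callable, Dict, Tuple


class PmtuCache:
    """Per-destination path MTU estimates that age out (simplified sketch)."""

    def __init__(self, interface_mtu: int, expiry_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interface_mtu = interface_mtu
        self.expiry_s = expiry_s  # 10 minutes, the value RFC 1191 recommends
        self.clock = clock
        self._entries: Dict[str, Tuple[int, float]] = {}

    def on_frag_needed(self, dest: str, reported_mtu: int) -> None:
        """Record the next-hop MTU carried in a fragmentation-needed message."""
        self._entries[dest] = (min(reported_mtu, self.interface_mtu), self.clock())

    def mtu_for(self, dest: str) -> int:
        """MTU to use toward `dest`; falls back to the interface MTU once stale."""
        entry = self._entries.get(dest)
        if entry is None:
            return self.interface_mtu
        mtu, learned_at = entry
        if self.clock() - learned_at >= self.expiry_s:
            # Estimate expired: try the full interface MTU again. If the path
            # is still narrow, a new ICMP message will lower it once more.
            del self._entries[dest]
            return self.interface_mtu
        return mtu


now = [0.0]  # fake clock so the example does not have to wait 10 minutes
cache = PmtuCache(1500, clock=lambda: now[0])
cache.on_frag_needed("192.0.2.10", 1400)
print(cache.mtu_for("192.0.2.10"))  # 1400
now[0] = 601.0
print(cache.mtu_for("192.0.2.10"))  # 1500
```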

To avoid the PMTUD/fragmentation cycle for traffic other than TCP (which isn't limited by MSS), you need to lower the sending host's MTU. (This may be needed to avoid fragmentation of UDP video streams, for example.)
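Lowering the host MTU means the host itself never emits a packet too big for the path. For UDP, the related limit is that each datagram's payload must fit within the MTU minus 28 bytes (20 for IPv4 without options, 8 for UDP) to avoid fragmentation entirely. The sketch below shows that arithmetic plus a hypothetical helper an application could use to keep each datagram within it; the 1400-byte MTU is an assumed value.

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def max_udp_payload(mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def split_for_mtu(data: bytes, mtu: int) -> List[bytes]:
    """Cut an application buffer into datagram-sized pieces (hypothetical helper)."""
    size = max_udp_payload(mtu)
    return [data[i:i + size] for i in range(0, len(data), size)]


print(max_udp_payload(1500))                               # 1472
print([len(p) for p in split_for_mtu(bytes(3000), 1400)])  # [1372, 1372, 256]
```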

Diego Sousa · 02-08-2019

Hi, thank you.

I had to do some more research to understand this better, until I find that not all the computers that perform the PMTUD process actually do so: it is disabled in some operating systems, and it serves for the intermediate devices that make the interconnection to know the smallest MTU of the path, taking into account the TCP MSS, which ensures there is no fragmentation in the hosts that originate the communication.
But now a new question has come up: why do I have two MTUs if both have to be liars?

Joseph W. Doherty · 02-08-2019

". . . until I find that not all the computers that perform the PMTUD process . . ."

As my post noted, "It's generally off by default," but any IP host should be capable of doing it. I suspect the reason it's generally disabled is the extra time it takes to "recover" when a transit hop informs the sending host that its packet was too large (especially if the hop doesn't tell the sending host what its max MTU is).

Regarding the rest of your reply, it's unclear what you mean by "intermediate devices" and "know the smallest MTU of the path", as intermediate/transit L3 hops only "know" the MTUs of their own interfaces.

It's also unclear how you understand TCP MSS and its relationship to IP fragmentation. Again, TCP hosts can set MSS to whatever they want (i.e. up to the TCP max MSS of 65,535 bytes, about 64 KB, since the MSS option is a 16-bit field), but they almost never set MSS larger than what the host's max MTU can carry within a single packet.
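The 16-bit limit comes from how the MSS option is encoded in the TCP SYN: one byte of kind (2), one byte of length (4), then the MSS as an unsigned 16-bit value in network byte order. A minimal encode/decode sketch using only the standard `struct` module:

```python
import struct


def mss_option(mss: int) -> bytes:
    """Encode the TCP MSS option: kind=2, length=4, 16-bit MSS (network order)."""
    if not 0 <= mss <= 0xFFFF:
        raise ValueError("MSS must fit in 16 bits")
    return struct.pack("!BBH", 2, 4, mss)


def parse_mss_option(opt: bytes) -> int:
    """Decode a 4-byte MSS option back into the MSS value."""
    kind, length, mss = struct.unpack("!BBH", opt)
    if kind != 2 or length != 4:
        raise ValueError("not an MSS option")
    return mss


print(mss_option(1460).hex())                # 020405b4
print(parse_mss_option(mss_option(65535)))   # 65535, the largest encodable MSS
```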

Lastly, it's unclear who or what you mean by the two MTU "liars".

BTW, when PMTUD isn't enabled, some hosts don't use their max MTU when they transmit to destinations off their local network; instead, they limit themselves to a smaller IP packet size of 576 bytes, the datagram size every IPv4 host is required to be able to receive (possibly after reassembly).
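As a sketch of that conservative behaviour, here is a hypothetical MSS-selection policy, assuming IPv4 and TCP headers without options. 576 - 40 = 536, which is also the default send MSS RFC 1122 tells TCP to assume when the peer sends no MSS option. The function name and the exact policy are illustrative, not any specific operating system's logic.

```python
DEFAULT_IPV4_DATAGRAM = 576  # every IPv4 host must be able to receive this size
IPV4_HEADER = 20             # bytes, assuming no IP options
TCP_HEADER = 20              # bytes, assuming no TCP options


def choose_mss(interface_mtu: int, dest_on_link: bool, pmtud_enabled: bool) -> int:
    """Hypothetical policy: full MSS on-link or with PMTUD, 536 otherwise."""
    if dest_on_link or pmtud_enabled:
        return interface_mtu - IPV4_HEADER - TCP_HEADER
    # Off-link destination and no PMTUD: stay within the 576-byte datagram.
    return DEFAULT_IPV4_DATAGRAM - IPV4_HEADER - TCP_HEADER


print(choose_mss(1500, dest_on_link=True, pmtud_enabled=False))   # 1460
print(choose_mss(1500, dest_on_link=False, pmtud_enabled=False))  # 536
```

The trade-off is the one discussed above: 536-byte segments almost never need fragmenting, but they carry proportionally more header overhead than 1460-byte segments on paths that could have handled them.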