Today I took a support call from a customer who was having strange issues with their TDS Cable modem service. HTTP pages would load fine, but HTTPS pages had broken images or broken style sheets, or would not load at all. iTunes was flaky, and other apps using HTTPS were flaky too. It looked like many different problems, and then I remembered the first time I implemented PPPoE in my first WISP, Wickson Wireless. Same issue, and the cause was MSS (Maximum Segment Size) and MTU. I will leave the discussion of PPPoE out of this post, but here is how the issue occurs.

A post by Stretch at packetlife.net does a reasonable job of explaining the issue. I quote:

“When a host needs to transmit data out an interface, it references the interface’s Maximum Transmission Unit (MTU) to determine how much data it can put into each packet. Ethernet interfaces, for example, have a default MTU of 1500 bytes, not including the Ethernet header or trailer. This means a host needing to send a TCP data stream would typically use the first 20 of these 1500 bytes for the IP header, the next 20 for the TCP header, and as much of the remaining 1460 bytes as necessary for the data payload. Encapsulating data in maximum-size packets like this allows for the least possible consumption of bandwidth by protocol overhead.

Unfortunately, not all links which compose the Internet have the same MTU. The MTU offered by a link may vary depending on the physical media type or configured encapsulation (such as GRE tunneling or IPsec encryption). When a router decides to forward an IPv4 packet out an interface, but determines that the packet size exceeds the interface’s MTU, the router must fragment the packet to transmit it as two (or more) individual pieces, each within the link MTU. Fragmentation is expensive both in router resources and in bandwidth utilization; new headers must be generated and attached to each fragment.”
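Both halves of the quote come down to simple arithmetic, so here is a minimal Python sketch of it. It assumes plain IPv4 and TCP headers with no options (20 bytes each) and ignores the Don’t Fragment bit; the function names mss_for_mtu and fragment_sizes are just for illustration. The 1492-byte case is a PPPoE link, where 8 bytes of PPPoE and PPP headers come out of the 1500.

```python
from typing import List

# Plain IPv4 and TCP headers with no options, as in the quote above.
IPV4_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload (MSS) that fits in one IPv4 packet of `mtu` bytes."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small to carry TCP data")
    return payload


def fragment_sizes(packet_len: int, link_mtu: int) -> List[int]:
    """Sizes of the IPv4 fragments a router emits to fit one packet on a link.

    Each fragment gets its own 20-byte IP header, and every fragment except
    the last must carry a multiple of 8 data bytes, because the
    fragment-offset field counts in 8-byte units. The Don't Fragment bit
    is ignored in this sketch.
    """
    if packet_len <= link_mtu:
        return [packet_len]  # fits as-is, nothing to do
    chunk = (link_mtu - IPV4_HEADER) // 8 * 8  # max data bytes per fragment
    if chunk <= 0:
        raise ValueError(f"link MTU {link_mtu} is too small to fragment into")
    data_left = packet_len - IPV4_HEADER
    sizes = []
    while data_left > 0:
        piece = min(chunk, data_left)
        sizes.append(IPV4_HEADER + piece)
        data_left -= piece
    return sizes


if __name__ == "__main__":
    print(mss_for_mtu(1500))           # 1460 on standard Ethernet
    print(mss_for_mtu(1492))           # 1452 on PPPoE
    print(fragment_sizes(1500, 1492))  # [1492, 28]
```

A full-size Ethernet packet crossing a 1492-byte link turns into one large fragment plus a tiny 28-byte one, and an extra 20-byte header goes on the wire, which is the overhead the quote is talking about.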

Fortunately, the internet has a remedy for this problem in RFC 1191, called Path MTU Discovery (PMTUD). The RFC explains the process in detail, but suffice it to say it is not perfect because of the way some hosts on the internet behave. Stateless load balancers, for example, are unable or unwilling to handle the ICMP “Fragmentation Needed” messages that Path MTU Discovery relies on, which breaks the mechanism. Guess where this technology and its effects are seen most often? You guessed it: popular sites such as banking and other secure sites (community.ubnt.com, for example), and basically any site that pulls URLs from other sites, runs SSL, or uses load balancing technology. I am sure there are many more examples, but you get the idea: if images are broken, random HTTPS sites won’t load, and apps behave strangely or erratically, broken Path MTU Discovery is a likely culprit.
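The core PMTUD loop, and the way it fails, can be shown with a small simulation. This is a simplified model, not the full RFC 1191 behaviour (no timers, no plateau table for routers that omit the next-hop MTU): the sender transmits with Don’t Fragment set, and a router in front of a link that is too small drops the packet and reports that link’s MTU, unless ICMP from that hop is filtered. The function and parameter names are hypothetical.

```python
from typing import List, Optional


def discover_path_mtu(link_mtus: List[int], first_guess: int = 1500,
                      icmp_blocked_at: Optional[int] = None) -> Optional[int]:
    """Simulate Path MTU Discovery over the MTUs of each link on a path.

    Returns the path MTU the sender ends up using, or None if an ICMP
    "Fragmentation Needed" message is filtered (a PMTUD black hole).
    """
    estimate = first_guess
    while True:
        for hop, mtu in enumerate(link_mtus):
            if estimate > mtu:
                if hop == icmp_blocked_at:
                    # Packet dropped and the ICMP error filtered: the sender
                    # never learns why, so large packets just disappear.
                    return None
                # Router drops the DF packet and reports this link's MTU;
                # the sender lowers its estimate and retransmits.
                estimate = mtu
                break
        else:
            # No link was too small: the packet reached the destination.
            return estimate


if __name__ == "__main__":
    path = [1500, 1492, 1500]
    print(discover_path_mtu(path))                     # 1492
    print(discover_path_mtu(path, icmp_blocked_at=1))  # None
```

With the ICMP message filtered, small packets (like the initial HTTP request) still get through while the large ones carrying images and style sheets vanish, which is one way the half-loaded pages described above can arise.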

Now, the most important thing: how do we fix it? I started by calling the provider. I was escalated to an engineer who first blamed my router and modem (surprise) and then, after much clicking of keys (maybe he was playing a game), said he didn’t know how to fix it or whether there even was a problem. Thank goodness there was a MikroTik router on the client end of the link. MikroTik (and Linux) have a feature in mangle called “Clamp to pmtu”. Rather than changing any interface MTU, it rewrites the MSS value advertised in TCP SYN packets so that it fits the path MTU the router knows for that route (the MTU minus 40 bytes of IPv4 and TCP headers). Both endpoints then send segments small enough to pass without fragmentation, which prevents the weirdness I described earlier.
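The rewrite the mangle rule performs is small. Here is a minimal sketch of the logic, assuming IPv4 with 40 bytes of IP and TCP headers and that the clamp target is the MTU of the router’s outgoing route; clamp_mss and mangle_mss are illustrative names, not MikroTik or Linux APIs.

```python
from typing import FrozenSet, Optional

IPV4_AND_TCP_HEADERS = 40  # 20-byte IPv4 + 20-byte TCP header, no options


def clamp_mss(advertised_mss: int, route_mtu: int) -> int:
    """Lower (never raise) an advertised MSS so a full segment fits route_mtu."""
    return min(advertised_mss, route_mtu - IPV4_AND_TCP_HEADERS)


def mangle_mss(tcp_flags: FrozenSet[str], mss_option: Optional[int],
               route_mtu: int) -> Optional[int]:
    """Return the MSS option a clamping router would forward.

    Only packets with SYN set (SYN and SYN-ACK) carry the MSS option,
    so every other packet passes through untouched.
    """
    if "SYN" in tcp_flags and mss_option is not None:
        return clamp_mss(mss_option, route_mtu)
    return mss_option


if __name__ == "__main__":
    syn = frozenset({"SYN"})
    print(mangle_mss(syn, 1460, 1492))                 # 1452: lowered to fit
    print(mangle_mss(syn, 1380, 1492))                 # 1380: already small enough
    print(mangle_mss(frozenset({"ACK"}), None, 1492))  # None: not a SYN, untouched
```

Because the clamp only ever lowers the value, a host that already advertises a small MSS is left alone, and no ICMP feedback from the far end is needed for it to work.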

Here is my fix:

In Winbox, go to IP > Firewall > Mangle and create a new rule for TCP packets leaving the WAN interface, with Protocol set to tcp and Out. Interface set to ether1.

On the Advanced tab, set TCP Flags to syn so the rule matches only SYN packets.

On the Action tab, set Action to change MSS and New MSS to clamp to pmtu.

Repeat with a second rule for packets coming in on the WAN interface: the same settings, but with In. Interface set to ether1 instead of Out. Interface.

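The end result is two change MSS rules in the mangle list, one matching SYN packets leaving ether1 and one matching SYN packets arriving on it.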
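If you prefer the terminal to Winbox, the same two rules can be entered from the RouterOS CLI. The hypothetical Python helper clamp_rules below just prints those commands for a given WAN interface name. The command text is a reconstruction of the Winbox settings described above, not copied from the original screenshots; in particular chain=forward (so the rules apply to traffic routed through the router for LAN clients) and passthrough=yes are assumptions, so check them against your own configuration before pasting.

```python
from typing import List


def clamp_rules(wan_interface: str = "ether1") -> List[str]:
    """Build RouterOS commands for outbound and inbound MSS clamping.

    ASSUMPTION: chain=forward and passthrough=yes are not taken from the
    original screenshots; adjust them to match your router.
    """
    common = (
        "chain=forward protocol=tcp tcp-flags=syn "
        "action=change-mss new-mss=clamp-to-pmtu passthrough=yes"
    )
    return [
        # SYNs leaving the WAN: limits what the remote server sends back.
        f"/ip firewall mangle add {common} out-interface={wan_interface}",
        # SYN-ACKs arriving on the WAN: limits what the LAN client sends.
        f"/ip firewall mangle add {common} in-interface={wan_interface}",
    ]


if __name__ == "__main__":
    for command in clamp_rules("ether1"):
        print(command)
```

Clamping in both directions covers both halves of the handshake: tcp-flags=syn matches any packet with SYN set, so the outbound rule catches the client’s SYN and the inbound rule catches the server’s SYN-ACK.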

If you follow these steps, substituting your own WAN interface for ether1, you will fix the problem and be a hero. Good luck diagnosing and solving PMTU problems!