Variable MTU/MSS Across Site-To-Site VPN w/ ASAs?

stelker77
Level 1

Hey all,

 

Thanks in advance for reading this post. I'll try to keep it short but brevity isn't my strong suit.

 

My company is setting up a new virtual server environment for another company. We have noticed some issues with the site-to-site VPN that's connecting two of the virtualization platforms together.

 

In short, throughput is lower than expected. We can get upload speeds of 25-30 Mbps at both sites when using a speed test site such as fast.com. Across the VPN, however, iperf measures average upload speeds of 5-10 Mbps when initiated on a VM from site 01 to site 02, and 10-15 Mbps when initiated on a VM from site 02 to site 01. Sometimes the transfer rate falls to around 2 Mbps (01 -> 02) or even bottoms out at 0 Mbps. Both sites have 200-300 Mbps of download speed.

 

I did some digging today and after reading about how detrimental incorrect MTU and MSS settings could be for VPNs, I did some testing by pinging from a VM at site 01 to site 02 and vice versa. By setting the payload size on the ping and setting or not setting the Don't Fragment bit, my goal was to measure the MTU/MSS across the S2S VPN. I found something interesting.

 

It seems that a payload size of 1398 bytes for ping packets is always allowed across the S2S VPN without requiring fragmentation. However, 1400 bytes and up is a crapshoot. Sometimes I can transmit 1400 bytes with DF on, and sometimes I can't. I did notice that if 1400-byte pings are being allowed and I try something like a 1420-byte ping, it immediately kills the 1400-byte ping that was working, and I get responses saying that the packet needs to be fragmented. But 1402-, 1404- and 1406-byte pings were working fine just before the 1420-byte ping failed.
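The manual DF-bit testing described above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the non-Windows branch assumes the Linux iputils `ping` (`-M do` sets DF), and success is detected by a simple heuristic (exit code 0 plus a `ttl=` field in the output) that may not hold on every locale or ping version. The sweep goes upward and stops at the first failure, so no probe is sent after an oversized one.

```python
import platform
import subprocess
from typing import Callable, List, Optional


def build_ping_command(host: str, payload: int) -> List[str]:
    """Build one ping with the Don't Fragment bit set and a fixed payload size."""
    if platform.system() == "Windows":
        # -n 1: one echo request, -f: set DF, -l: payload size in bytes
        return ["ping", "-n", "1", "-f", "-l", str(payload), host]
    # Assumption: Linux iputils ping. -M do: set DF, -s: payload size in bytes
    return ["ping", "-c", "1", "-M", "do", "-s", str(payload), host]


def df_ping_ok(host: str, payload: int) -> bool:
    """Heuristic: treat the probe as successful only if an echo reply came back."""
    result = subprocess.run(
        build_ping_command(host, payload), capture_output=True, text=True
    )
    return result.returncode == 0 and "ttl=" in result.stdout.lower()


def largest_payload(
    probe: Callable[[int], bool], start: int, stop: int, step: int = 2
) -> Optional[int]:
    """Sweep payload sizes upward and return the last size that passed.

    Returns None if even the first size fails.
    """
    best = None
    for size in range(start, stop + 1, step):
        if not probe(size):
            break  # stop at the first failure; do not probe past it
        best = size
    return best
```

A call such as `largest_payload(lambda n: df_ping_ok("10.0.2.10", n), 1380, 1440)` would sweep the range tested by hand here; the address is a placeholder for a VM at the far site.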

 

The only theory I have is that sometimes the S2S tunnel is being established with different parameters than other times. I am not able to remotely access the customer's ASAs so I can't gather any live data from that perspective. Some packet captures I gathered previously on-site do appear to indicate that the TCP MSS is being "swapped" to 1380 bytes by the ASA, although I'm not 100% sure that's happening for TCP handshakes in both directions.

 

I think my test traffic indicates that a lower than default MSS (1380) is required. If I successfully send an ICMP payload of 1398 bytes with the DF bit set, that packet has an IP header but no TCP header, which would typically be 20 bytes. So that would correspond to an MSS of 1378 bytes. Once I get to 1400 bytes with ICMP, I *sometimes* need fragmentation, so a TCP packet with the default 1380-byte MSS would also require fragmentation because of just 2 extra bytes. Is this line of thinking correct?
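One way to check this line of thinking is to count every header explicitly. The sketch below assumes IPv4 with no IP options, an 8-byte ICMP echo header (which the ping payload size does not include), and a 20-byte TCP header with no options; the function names are illustrative. Under those assumptions, a 1398-byte ping payload is a 1426-byte IP packet, which corresponds to an MSS of 1386 rather than 1378, so a 1380-byte MSS (a 1420-byte IP packet) would fit within the size that reliably passes.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header
TCP_HEADER = 20   # bytes, assuming no TCP options


def ip_packet_size_for_ping(icmp_payload: int) -> int:
    """Total IPv4 packet size produced by a ping with the given payload."""
    return icmp_payload + ICMP_HEADER + IPV4_HEADER


def mss_for_ip_packet_size(ip_packet_size: int) -> int:
    """Largest TCP payload that fits in an IPv4 packet of the given size."""
    return ip_packet_size - IPV4_HEADER - TCP_HEADER


def ip_packet_size_for_mss(mss: int) -> int:
    """IPv4 packet size of a full-sized TCP segment with the given MSS."""
    return mss + TCP_HEADER + IPV4_HEADER


if __name__ == "__main__":
    packet = ip_packet_size_for_ping(1398)
    print(packet)                          # 1426
    print(mss_for_ip_packet_size(packet))  # 1386
    print(ip_packet_size_for_mss(1380))    # 1420
```

If these assumptions hold, the intermittent failures at 1400 bytes and above would point at something other than the 1380-byte MSS itself.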

 

I did notice there are IKEv2 settings for the tunnel even though I thought the IKEv1 settings were the ones being used. I am 95% sure that the IKEv2 checkbox is unchecked for both sides of the tunnel in ASDM.

 

I'm thinking about just lowering the MSS by a couple of bytes at a time and testing throughput over the course of a few hours until we find the sweet spot. I do wish there were a way to set the MSS only on S2S VPN traffic (is there?). At the end of the day, if we lose 20 bytes per TCP packet, it shouldn't be that consequential, especially considering that we stand to actually gain performance.

 

I tried setting the MSS in iperf to do some throughput testing, but it never seems to be reflected in the packets that are sent (I'm testing on Windows rather than Linux).
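A possible explanation, offered as an assumption rather than a confirmed diagnosis: iperf's MSS flag works by setting the `TCP_MAXSEG` socket option, and not every operating system accepts that option. The sketch below (with a hypothetical helper name, `try_set_mss`) simply attempts the call and reports whether the local TCP stack accepted it, which is a quick way to see if the test host is the limiting factor.

```python
import socket


def try_set_mss(sock: socket.socket, mss: int) -> bool:
    """Try to cap the MSS on an unconnected TCP socket.

    Returns True if the OS accepted the option, False if the option is
    missing from this platform's socket module or the call was rejected.
    """
    option = getattr(socket, "TCP_MAXSEG", None)
    if option is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, mss)
    except OSError:
        return False
    return True
```

Calling `try_set_mss(socket.socket(socket.AF_INET, socket.SOCK_STREAM), 1360)` before connecting shows whether the platform will take the setting at all; a `False` result would be consistent with the MSS never appearing in the captured packets.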

 

Thoughts?

1 Reply

stelker77
Level 1

Quick update:

It seems the Windows ping utility, or Windows itself, has a 'memory' of some sort. Once I pass the fragmentation threshold and get an actual reply from the router (ASA) saying that it needs to fragment my packet and can't, ping.exe won't send subsequent pings with a payload greater than 1398 or 1400 bytes for a moment. I get messages in the CLI saying fragmentation is needed, but there's no real traffic on the wire. Once I go back to a 1398-byte payload, the pings are actually put on the wire again. So my "constantly shifting MSS" may be a red herring. Will continue looking into it.
