Re: Windows access over PIX-PIX VPN Tunnel and fragmented packet

michael.dolan · 05-06-2004

Our company has a VPN tunnel between two sites, Site A and Site B.

Site A is the main office and is using a PIX515E U/R. Site B is using a PIX506E.

Site A has a leased line connection to the internet (1 Mbps) and Site B has an ADSL connection to the internet.

The tunnel is up and operational. However, we noticed that PCs were logging onto the domain very slowly (5-15 minutes) over the IPSec tunnel (all other network activity seemed fine), and group policies were not being applied.

During my investigation I found the following two points that apply to Windows PCs at boot-up/logon:

First, at logon Kerberos is used for authentication and by default uses UDP. The Kerberos packets can exceed 1500 bytes, causing them to be fragmented.

This was deemed to be the cause of the slow logon. The MaxPacketSize registry fix (as noted elsewhere) fixed the logon delay, but errors were still present in the event log.
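As a rough illustration of what the MaxPacketSize value controls: the Windows Kerberos client sends a message over UDP if it is no larger than MaxPacketSize and switches to TCP otherwise. The sketch below assumes exactly that "larger than the limit means TCP" rule; the function name is hypothetical and not a Windows API.

```python
def kerberos_transport(message_size: int, max_packet_size: int) -> str:
    """Hypothetical model: pick the transport for a Kerberos message of message_size bytes."""
    # Assumed rule: anything larger than MaxPacketSize goes over TCP, the rest over UDP.
    return "TCP" if message_size > max_packet_size else "UDP"


# With MaxPacketSize = 1, every real Kerberos message exceeds the limit,
# so the client always uses TCP, which the PIX can handle via MSS clamping.
for size in (200, 1400, 1600):
    print(size, kerberos_transport(size, max_packet_size=1))
```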

Second, when logging on, Windows PCs apply group policies, but first they ping the domain controller with a 2048-byte packet. This is to determine whether they are logging on over a slow connection (a calculation is performed using the response time and the size of the packet; if the result is less than 500 Kbps, the connection is considered slow). Because group policies were not being applied, I tried to simulate this by running a manual ping -l 2048 to the domain controller, but no response was received! I ran the same command against numerous public IP addresses to see whether the firewall could handle large packets at all (i.e. not over the VPN tunnel), and all were successful. So it seemed that only large packets over the IPSec tunnel were the problem.
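A simplified sketch of the slow-link check described above, assuming the 2048-byte payload travels out and back within the measured round-trip time and using the 500 Kbps threshold from the post. This is not Windows' exact algorithm; the function names are hypothetical. The "no reply" case is what happened over the tunnel at MTU 1500.

```python
from typing import Optional

SLOW_LINK_THRESHOLD_KBPS = 500  # threshold mentioned in the post


def link_speed_kbps(payload_bytes: int, rtt_ms: float) -> float:
    """Simplified estimate: payload sent out and echoed back within rtt_ms."""
    bits = payload_bytes * 8 * 2
    return bits / rtt_ms  # bits per millisecond == kilobits per second


def classify_link(payload_bytes: int, rtt_ms: Optional[float]) -> str:
    if rtt_ms is None:
        return "no reply"  # large ping lost, as seen over the tunnel
    if rtt_ms <= 0:
        return "fast"  # too quick to measure
    if link_speed_kbps(payload_bytes, rtt_ms) < SLOW_LINK_THRESHOLD_KBPS:
        return "slow"
    return "fast"


print(classify_link(2048, 20))    # about 1638 Kbps
print(classify_link(2048, 100))   # about 328 Kbps
print(classify_link(2048, None))  # ping never answered
```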

I ran debug icmp trace on both firewalls, and in whichever direction the ping is sent, the nearest PIX reports that it has received a fragmented ICMP packet, but it then sends an ICMP unreachable message (type code 3) "from its outside interface IP address to its outside interface IP address". In effect, each PIX recognizes that the packet is too big, reports that fact to itself, and then does nothing (i.e. it does not lower the PMTU).

To test PMTU, I ran ping -f -l 1472 (-f sets the Don't Fragment flag, and 1472 is the largest ICMP payload that fits in a 1500-byte MTU) against public IP addresses, and every reply came back fine. For anything over 1472, I correctly receive PMTU fragmentation-required messages. I assume this means there are no black holes or routers blocking ICMP messages.
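The 1472 figure is just the MTU minus the IPv4 and ICMP echo headers. A minimal sketch, assuming a 20-byte IPv4 header with no options:

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
ICMP_ECHO_HEADER = 8   # bytes


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -f -l` size that fits in a single packet for the given MTU."""
    return mtu - IPV4_HEADER - ICMP_ECHO_HEADER


print(max_ping_payload(1500))  # 1472
```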

My resolution was to lower the MTU of the outside interfaces of both PIXes to 1492, and everything works fine!

Is this a bug, or am I missing something?

We are running 6.3(3) on both sites.

The only thing out of the ordinary is that at Site B there is a router in front of the PIX506E that terminates the ADSL connection, and we have a one-to-one static NAT set up between the public IP address of the 857 and the outside interface IP address of the PIX. As a result, both IPSec SAs establish IPSec/NAT-T tunnels (with UDP encapsulation). Not a problem, I would have thought.

ehirsel · 05-07-2004

The only time you should have to adjust an Ethernet interface MTU from 1500 to 1492 is when you are crossing a device that talks Ethernet II on one interface and IEEE 802.3 (Ethernet with SNAP) on another; the 8-byte difference accounts for the LLC/SNAP header.
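The 8 bytes break down as a 3-byte LLC header plus a 5-byte SNAP header, both carried inside the 1500-byte 802.3 data field. A minimal sketch of that arithmetic:

```python
LLC_HEADER = 3   # DSAP, SSAP, control
SNAP_HEADER = 5  # OUI (3 bytes) + EtherType (2 bytes)


def ip_mtu_over_snap(frame_payload: int = 1500) -> int:
    """IP MTU left after LLC/SNAP encapsulation inside an 802.3 frame."""
    return frame_payload - (LLC_HEADER + SNAP_HEADER)


print(ip_mtu_over_snap())  # 1492
```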

Examine the router in front of the PIX at Site B to see whether it is using Ethernet SNAP encapsulation, or for some reason has one of its interface MTUs set to 1492 (this could be a default setting, or set by mistake when 802.3 is not used).

You mentioned the MaxPacketSize registry fix: what did you set it to?

I believe there is a Windows service pack that corrects an issue whereby Windows will not adjust the MTU size in response to ICMP unreachable (fragmentation needed) messages. I believe it was Windows 2000 SP3 (fixing an issue introduced in SP2).

You can configure Windows not to perform PMTU discovery, but I don't know whether that will cause the DF bit to be cleared in all frames (TCP and UDP) or still be set. I'll check and post what I find.

The default tcpmss value in the PIX code should be 1380 to account for IPsec tunnels, but it only applies to TCP and has no effect on UDP frames.

michael.dolan · 05-07-2004

The routers in front of both PIXes are not using SNAP, and their MTUs are 1500.

This was verified by the fact that when I run ping -f -l 1472 to any internet host from either site, I receive a reply. 1472 is the size of the ICMP data; adding the headers gives exactly 1500 (as per the MS article). This proves that the lowest common MTU between the firewall and most internet hosts is 1500.

The -f flag sets the DF flag on the ping packets.

When the MTUs were 1500, I set the MaxPacketSize registry value to "1", forcing Kerberos to use TCP. This allowed me to log on more quickly, BUT after lowering the MTU of both firewalls to 1492, I no longer need the registry value and group policies are being applied. So the PIX is now correctly handling UDP packets and large ICMP packets (as explained in my first post).

Am I right that sysopt connection tcpmss only applies to TCP communication, so ICMP and UDP packets are not affected by this value?

If so, why am I able to run a ping with a packet of any size to anywhere, EXCEPT over the VPN tunnel when the MTU is 1500?

My understanding is that the Windows PC fragments the packet into separate packets of 1500 bytes (because the PMTU is not less than 1500), the firewall receives the packets (which was evident when I ran debug icmp trace and received "Fragmented ICMP packet received" messages), and then passes them on. All of this works fine.
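A minimal sketch of how a host splits a large ping into IPv4 fragments at MTU 1500, assuming a 20-byte IP header. Each non-final fragment must carry a multiple of 8 bytes, because the fragment offset field counts 8-byte units. The function name is illustrative.

```python
from typing import List, Tuple

IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment(ip_payload_len: int, mtu: int = 1500) -> List[Tuple[int, int, bool]]:
    """Split an IPv4 payload into (offset_in_8_byte_units, data_len, more_fragments)."""
    # Non-final fragments must carry a multiple of 8 data bytes.
    max_data = (mtu - IPV4_HEADER) // 8 * 8
    frags = []
    offset = 0
    while offset < ip_payload_len:
        data_len = min(max_data, ip_payload_len - offset)
        more = offset + data_len < ip_payload_len
        frags.append((offset // 8, data_len, more))
        offset += data_len
    return frags


# ping -l 4000: 4000 bytes of data + 8-byte ICMP header = 4008 bytes of IP payload
for off, length, mf in fragment(4000 + 8):
    print(f"offset={off:4d} data={length:4d} MF={int(mf)} ip_len={length + IPV4_HEADER}")
```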

It is only when the packet traverses the tunnel that I receive "Request Timed Out" messages from my pings.

The debug icmp trace shows that an ICMP unreachable is sent, but rather than going to the client, it is sent to the PIX itself on its outside interface.

Am I right in saying that the PIX does the following when packets (in this case large packets) traverse the VPN tunnel:

1) A packet is sent by a Windows PC on Site B. If the packet is 4000 bytes, the Windows PC breaks it into 1500-byte frames.

2) The PIX receives the 1500-byte frames.

3) Because it must add 72 bytes (in my case) of IPSec headers, the PIX fragments the packet into multiple packets and sends those packets over the tunnel. (I read somewhere that the DF flag is copied from the original packet to the IPSec IP header.)
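The size problem in step 3 can be sketched as follows. This assumes a fixed 72-byte overhead (the figure from the post); in practice ESP overhead varies with the cipher, block padding, and NAT-T encapsulation. The function names are illustrative.

```python
IPSEC_OVERHEAD = 72  # bytes, figure quoted in the post (assumed constant here)


def tunnel_packet_size(inner_ip_len: int, overhead: int = IPSEC_OVERHEAD) -> int:
    """Size of the encrypted packet after the PIX adds its IPSec headers."""
    return inner_ip_len + overhead


def needs_fragmentation(inner_ip_len: int, outside_mtu: int,
                        overhead: int = IPSEC_OVERHEAD) -> bool:
    return tunnel_packet_size(inner_ip_len, overhead) > outside_mtu


def largest_inner_packet(outside_mtu: int, overhead: int = IPSEC_OVERHEAD) -> int:
    """Largest cleartext packet that can be encrypted without fragmenting."""
    return outside_mtu - overhead


print(tunnel_packet_size(1500))          # 1572
print(needs_fragmentation(1500, 1500))   # True
print(largest_inner_packet(1500))        # 1428
```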

I assume this is transparent to the client and the client assumes the PMTU is 1500.

Maybe the PIX does not calculate the size of the IPSec headers correctly, so when the packet is fragmented before passing over the VPN, the IPSec header is added and the result is slightly over 1500?

I admit that 1492 worked, so I left it at that value; I have not tried a higher value, such as 1496.

Thanks in advance

Brent

ehirsel · 05-07-2004

Run the show sysopt command on both PIX units and let me know what the tcpmss value is.

I do not believe that the PIX copies the DF flag from the original packet to the IPSec packet. Logon was quicker when TCP was used because the PIX makes the adjustment: if you run a sniffer trace during the TCP three-way handshake, your PC client will advertise an MSS of 1460 in the SYN frame, and the SYN-ACK will report an MSS of 1380. So when Kerberos was using TCP, the PIX transparently made the proper MSS adjustment and all was well.

What OS and service pack are the workstations running? Also, is there a router between the PIX and the users' workstations at either site?
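A minimal sketch of the MSS clamping described above: the advertised MSS is reduced to the configured tcpmss, so a full-sized TCP segment still fits once the tunnel overhead is added. It assumes 20-byte TCP and IP headers and the 72-byte IPSec overhead quoted earlier in the thread; exactly which handshake packet the PIX rewrites is not modeled.

```python
PIX_TCPMSS = 1380     # tcpmss value discussed in the thread
IPSEC_OVERHEAD = 72   # figure quoted earlier in the thread (assumed)


def clamp_mss(advertised_mss: int, limit: int = PIX_TCPMSS) -> int:
    """MSS left in the handshake by a device doing MSS clamping."""
    return min(advertised_mss, limit)


def encrypted_size_of_full_segment(mss: int, overhead: int = IPSEC_OVERHEAD) -> int:
    # TCP payload + 20-byte TCP header + 20-byte IP header, then the tunnel overhead
    return mss + 20 + 20 + overhead


mss = clamp_mss(1460)
print(mss, encrypted_size_of_full_segment(mss))  # 1380 1492
```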

Here is a link to an MS KB article stating that Windows 2000 SP2 introduced a change forcing all hosts on the same subnet to use the same MTU size, and the OS will ignore requests to lower it. Later service packs fixed that issue.

http://support.microsoft.com/default.aspx?scid=kb;en-us;301337

Let me know if this pertains to you.

michael.dolan · 05-10-2004

The tcpmss value is 1380, the default.

I also ran a network trace and confirmed that within the TCP three-way handshake the SYN packet has an MSS of 1460 and the SYN-ACK has an MSS of 1380.

I have also traced UDP (Kerberos) packets with a 1492 MTU on the outside interface of both firewalls, and the packets are successfully fragmented by the firewall and sent over the tunnel. With an MTU of 1500, however, the packets disappear.

All workstations are running the latest service packs: SP1a for XP and SP4 for Windows 2000. The KB article does not apply to us.

Thanx

ehirsel · 05-10-2004

What make and model is the router in front of the PIX units?

Rereading your initial post, I see you mentioned that NAT-T is being used. Here is how the tcpmss value of 1380 is derived:

1380 data + 20 TCP + 20 IP + 24 AH + 24 ESP_CIPHER + 12 ESP_AUTH + 20 IP = 1500 bytes
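The same budget written out as a small calculation, so it can be re-run for a different outside MTU. The header sizes are exactly the ones listed above.

```python
# Per-packet overhead budget from the formula above (bytes).
OVERHEAD = {
    "TCP": 20,
    "inner IP": 20,
    "AH": 24,
    "ESP cipher": 24,
    "ESP auth": 12,
    "outer IP": 20,
}


def max_tcp_payload(mtu: int = 1500) -> int:
    """TCP payload that still fits in one packet after all the headers above."""
    return mtu - sum(OVERHEAD.values())


print(max_tcp_payload())      # 1380
print(max_tcp_payload(1492))  # 1372
```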

You mentioned the ping test against public/internet addresses, but did you try the same test (ping -f -l 1472) to hosts across the VPN? If so, what were the results?

With NAT-T, and no AH, I would not expect NAT-T to require lowering tcpmss below 1380, since the 24 bytes that AH would use go toward the new UDP header, with 4 bytes left over.

FYI, MS KB articles 314053 and 120642 cover Windows registry parameters for TCP/IP connections and mention path MTU discovery and the two parameters that control it. I would not disable it, since small (576-byte) frames would be used otherwise, but the other parameter can tell Windows not to set the DF bit if some TCP segments go unacknowledged. I have never set them, and I do not know whether or how they apply to UDP connections.
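A rough check of that reasoning, assuming NAT-T wraps ESP in a plain 8-byte UDP header (as in RFC 3948; the pre-standard NAT-T drafts used by some PIX releases may differ) and that AH is not used alongside NAT-T. The function and its parameters are illustrative.

```python
def tunnel_packet_len(tcp_payload: int, use_ah: bool, nat_t_udp_header: int = 8) -> int:
    """Outer packet size for a full TCP segment, using the header sizes from the formula above."""
    # TCP 20 + inner IP 20 + ESP cipher 24 + ESP auth 12 + outer IP 20
    size = tcp_payload + 20 + 20 + 24 + 12 + 20
    if use_ah:
        size += 24  # AH, as in the original 1380 budget
    else:
        size += nat_t_udp_header  # assumed UDP encapsulation header for NAT-T
    return size


print(tunnel_packet_len(1380, use_ah=True))   # 1500
print(tunnel_packet_len(1380, use_ah=False))  # fits under 1500
```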

ehirsel · 05-12-2004

You mentioned an ADSL connection at Site B. Does that router use PPPoE over that connection? Similarly, does the router at Site A use ADSL and/or PPPoE too?

I was reading some information on Cisco's site about Path MTU Discovery (PMTUD), and it mentioned that PPPoE uses 8 bytes of overhead, making the effective IP MTU 1492 instead of 1500. Just curious to know whether that applies to you.
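The 8 bytes are the 6-byte PPPoE header plus the 2-byte PPP protocol field, both carried inside the 1500-byte Ethernet payload. A minimal sketch of the arithmetic:

```python
PPPOE_HEADER = 6        # version/type, code, session ID, length
PPP_PROTOCOL_FIELD = 2  # identifies the payload as IP


def pppoe_ip_mtu(ethernet_payload: int = 1500) -> int:
    """Effective IP MTU once PPPoE and PPP headers are carried in the Ethernet payload."""
    return ethernet_payload - PPPOE_HEADER - PPP_PROTOCOL_FIELD


mtu = pppoe_ip_mtu()
print(mtu, mtu - 28)  # 1492 1464 (largest ping -f -l payload)
```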

michael.dolan · 05-13-2004

Site B has an ADSL connection that uses PPPoA.

Site A has a standard PPP leased line

Site A has a 1720

Site B has a 837H

I checked the MTU using ping -f -l again and noticed that ping -f -l 1472 at Site A worked fine (as I mentioned in my first post). However, the highest value I could get at Site B was 1464. So you are correct that the ADSL connection has an effective MTU of 1492. (Sorry for the misleading information.)

However, I tried setting Site B's MTU to 1492 while keeping Site A's at 1500, and this did not work: the same symptoms appeared, with PCs' requests timing out and the router sending fragmentation-required messages back to itself.

I also tried setting Site A's MTU to 1492 while keeping Site B's at 1500; again, this did not work.

Both sides have to be at 1492 before packet fragmentation and PMTU work!
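Finding the largest working ping -f -l size by hand is a binary search. A minimal sketch, in which the probe function is a hypothetical stand-in for actually running ping -f -l against a host; it assumes the smallest size always succeeds and that success does not depend on anything but packet size.

```python
from typing import Callable


def find_max_df_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload for which probe(size) succeeds.

    probe(size) should return True if a DF-set ping with `size` data bytes gets a reply.
    Assumes probe(low) succeeds and success is monotonic in size.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def simulated_path(path_mtu: int) -> Callable[[int], bool]:
    # Hypothetical stand-in for `ping -f -l size`: succeeds if data + 28 header bytes fit.
    return lambda size: size + 28 <= path_mtu


site_b = find_max_df_payload(simulated_path(1492))
print(site_b, site_b + 28)  # 1464 1492
```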

Thanks again

ehirsel · 05-14-2004

You can leave both sides at 1492, as it is now, but if you are still willing to run some more tests, here is another suggestion:

1. Force Kerberos to use TCP, as you did before.

2. In one of the MS KB articles, there is a parameter to detect black-hole PMTU gateways and one to disable PMTU discovery. In the tests I have run (using a Cisco VPN client to connect to a PIX), I added the registry parameter to disable PMTU discovery. I noted that instead of using the smallest allowed frame, as the MS KB document describes, I was able to use normal-sized frames, and none of the TCP frames had the DF bit set. The UDP frames did not have the DF bit set either.

Make the corresponding change on one Windows host.

3. Set the PIX MTU at both sites back to 1500.

See if this will work in your environment.
