Some background regarding our setup. We have a WLC (5508) in our main office in Brisbane that is hosting two WLANs. One provides wireless access to our internal network and the second provides wireless guest access. The guest WLAN is anchored to a controller sitting in the DMZ at our Data Centre.
In the DMZ the anchor controller has a management interface and an interface in the DMZ for the wireless guest access. I am using the DHCP server on the anchor controller in the DMZ to provide IP addresses etc. to wireless guest clients. The default gateway is 10.8.144.1, which is a VIP on a pair of firewalls.
Initially everything works fine. Guests connect to the guest network, authenticate via a web portal (Cisco ISE server) and can then go on and use the internet. This works perfectly until the firewalls fail over and the secondary firewall takes over the VIP address. All access to the internet is lost at that point. If I disconnect and then reconnect a wireless client it connects, as in it gets an IP address, but DNS resolution stops and I do not get redirected to the web auth portal. If the firewalls are failed back to the primary then everything works again, no issues. However, if I reboot the WLC while the secondary firewall holds the VIP, everything works fine, just as it did on the primary. If the firewalls then fail over to the primary again, everything goes to crap until either the firewalls are failed back or the anchor WLC is rebooted.
Initially I thought this was an issue on the firewall, but this doesn't appear to be the case. When the firewall fails over it sends out a gratuitous ARP advising of the change in MAC address for the 10.8.144.1 IP address. The WLC seems to update its ARP table, because the command "show arp switch" lists the 10.8.144.1 IP address with the MAC address of the active firewall. From the client perspective, I ran Wireshark and captured packets on the wireless interface while trying to connect. The laptop continuously sends ARP requests for 10.8.144.1 but gets no reply. Without a reply the client cannot send an Ethernet frame to the gateway and hence cannot reach the DNS server or the web portal, so internet access breaks. A tcpdump on the active firewall shows it receiving the ARP request and sending a reply. The reply just never gets to the wireless client. Debugging ARP packets on the anchor WLC seems to indicate that the controller is receiving the ARP replies from the firewall. So I'm at a loss as to why things should break when the firewalls fail over.
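For anyone unfamiliar with what a gratuitous ARP looks like on the wire, here is a minimal Python sketch that builds one and recognises one in a raw frame. It assumes the request-style form (opcode 1, target IP equal to sender IP, broadcast destination); some devices send the reply-style form instead, and I have not confirmed which one these firewalls use. The MAC address and the function names are made up for illustration; only 10.8.144.1 comes from the setup described above.

```python
import struct

ETHERTYPE_ARP = 0x0806


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def ip_bytes(ip: str) -> bytes:
    """Convert a dotted-quad IPv4 address into 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))


def build_gratuitous_arp(sender_mac: str, ip: str) -> bytes:
    """Build a request-style gratuitous ARP frame (an assumption, see above)."""
    src = mac_bytes(sender_mac)
    # Ethernet header: broadcast destination, source MAC, EtherType ARP.
    eth_header = b"\xff" * 6 + src + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src + ip_bytes(ip)           # sender MAC and sender IP
    arp += b"\x00" * 6 + ip_bytes(ip)   # target MAC unused, target IP == sender IP
    return eth_header + arp


def is_gratuitous_arp(frame: bytes) -> bool:
    """True if the frame is ARP and its sender IP equals its target IP."""
    if len(frame) < 42 or frame[12:14] != struct.pack("!H", ETHERTYPE_ARP):
        return False
    sender_ip = frame[28:32]
    target_ip = frame[38:42]
    return sender_ip == target_ip


# Hypothetical MAC of the newly active firewall announcing the VIP.
frame = build_gratuitous_arp("00:11:22:33:44:55", "10.8.144.1")
```

A check like `is_gratuitous_arp` could be run over frames exported from a capture to confirm the failover announcement actually reaches a given segment.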
To make things even weirder, I have a 3750 switch in the DMZ with an SVI of 10.8.144.4. I thought I could work around the problem by making this the default gateway, the theory being that this interface's MAC address would never change. However, I was wrong. Even with this IP set as the gateway address for the wireless clients I see the exact same behaviour when the firewalls fail over. I can't explain it other than to say that the gratuitous ARP sent by the firewalls seems to kill the ability of ARP replies to be sent back to the wireless client.
I'm at a total loss at the moment. Any suggestions, no matter how crazy will be appreciated.
Good detail in your question. First, I have an identical setup and had a recent fw failover and didn't have any issues.
I'm curious, what code are your anchor and foreign controllers on?
The only difference in my design is that I am using a real DHCP server. But that shouldn't have any bearing on this.
The software versions that I am running are:
1. Anchor: 220.127.116.11
2. Foreign: 18.104.22.168
I don't have a lot of experience with setting up Cisco wireless networks, so perhaps I have not configured something correctly.
I also took another Wireshark capture, this time spanning the port on the foreign controller where the EoIP traffic originates / terminates. Once again I see the ARP requests being forwarded by the foreign controller, but no replies coming back the other way. It would appear that the issue is with the anchor controller.
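To make the "requests go out, no replies come back" check repeatable across captures, something like the following Python sketch could be used. It assumes the ARP packets have already been exported from the capture into simple records; the `ArpEvent` type, the `unanswered_requests` function, the one-second timeout and the client address 10.8.144.50 are all hypothetical and only there for illustration.

```python
from typing import List, NamedTuple


class ArpEvent(NamedTuple):
    time: float      # seconds since start of capture
    op: str          # "request" or "reply"
    sender_ip: str
    target_ip: str


def unanswered_requests(events: List[ArpEvent], timeout: float = 1.0) -> List[ArpEvent]:
    """Return ARP requests that saw no matching reply within `timeout` seconds."""
    missing = []
    for req in events:
        if req.op != "request":
            continue
        # A matching reply comes from the requested IP back to the requester.
        answered = any(
            rep.op == "reply"
            and rep.sender_ip == req.target_ip
            and rep.target_ip == req.sender_ip
            and 0 <= rep.time - req.time <= timeout
            for rep in events
        )
        if not answered:
            missing.append(req)
    return missing


# Hypothetical capture: one answered request before failover, one unanswered after.
events = [
    ArpEvent(0.00, "request", "10.8.144.50", "10.8.144.1"),
    ArpEvent(0.01, "reply", "10.8.144.1", "10.8.144.50"),
    ArpEvent(5.00, "request", "10.8.144.50", "10.8.144.1"),
]
missing = unanswered_requests(events)
```

Running the same check on captures from the client, the foreign controller port and the firewall would show at which hop the replies disappear.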
I know this is an old thread, but I have the exact same anchor controller setup, running version 8.0.133. I was experiencing this with version 8.0.121 and upgraded to 8.0.133 (that's the only code I could get for now), hoping it would solve the problem. It does not. The firewalls are Cisco ASAs with FirePOWER IPS. This happens whenever the firewalls fail over, in either direction. The only fix is to clear the host xlate entries on the active firewall, or to reboot the anchor controller. I don't see a bug report on this issue in any release. Is anyone experiencing this on the 8.0 train? Any help? I can't engage TAC right now, as I am waiting for SmartNet issues to be resolved, but I need to work this out sooner than it will take to get those issues sorted.
Just to share with you:
There is a good discussion available on Guest Access with Mobility Anchor Chalk Talk:
Have you got your issue resolved? I'm facing the same problem in an almost identical
configuration (anchor controller in the DMZ, two Check Point firewalls in between, no ARP reply).
Yes, I did get this resolved. I was told by TAC that it was a bug in how gratuitous ARP was being handled by the controllers. At the time it was recommended that I upgrade to version 7.3.101 of the software. This fixed the issue for me. Hope it does for you also.
My WLCs are still on 22.214.171.124. The only version I can upgrade to right now
is 126.96.36.199 - but I see nothing in the release notes indicating that the problem
is fixed there.
I cannot upgrade to 7.3 or higher, because we still use Cisco NCS 1.1.
So first I have to upgrade the NCS, then I can upgrade the controllers.
But I'll have to upgrade the NCS either way, because we just purchased
Cisco ISE 1.2, which requires the NCS upgrade.
So I'm in a kinda version hell right now, looking for an optimal upgrade