
HA SSO switchover isn't triggered when active WLC loses default gateway

Shiden
Level 1

Hello all,

I am currently configuring HA SSO with RMI+RP on a Catalyst 9800-L wireless controller (firmware 17.06.04).

The peering works perfectly:

WLC#sh chassis Rmi
Chassis/Stack Mac Address : 0845.d117.c840 - Local Mac Address
Mac persistency wait time: Indefinite
Local Redundancy Port Type: Twisted Pair
                                            H/W      Current
Chassis#   Role    Mac Address     Priority Version  State    IP            RMI-IP
--------------------------------------------------------------------------------------------------------
*1         Active  0845.d117.c840  2        V02      Ready    169.254.0.13  10.10.0.13
2          Standby 0845.d117.0960  1        V02      Ready    169.254.0.14  10.10.0.14

The switchover also works perfectly when the active WLC is powered off. Now I want a switchover to be triggered as well when the active WLC loses connectivity to the default gateway. Here is what I configured:

management gateway-failover enable

ip default-gateway <ip>

Here is the redundancy state of the WLC:

WLC#sh redundancy states
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Mode = Duplex
Unit = Primary
Unit ID = 1

Redundancy Mode (Operational) = sso
Redundancy Mode (Configured) = sso
Redundancy State = sso
Maintenance Mode = Disabled
Manual Swact = enabled
Communications = Up

client count = 150
client_notification_TMR = 30000 milliseconds
RF debug mask = 0x0
Gateway Monitoring = Enabled
Gateway monitoring interval = 8 secs
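
For anyone who wants to check these fields on several controllers, here is a minimal Python sketch that parses the text of `sh redundancy states` as pasted above. The function names `parse_states` and `gateway_failover_ready` are made up for illustration, and the three conditions it checks are simply the ones visible in this post, not an official readiness test.

```python
# Shortened copy of the output shown in this post.
SAMPLE = """\
my state = 13 -ACTIVE
peer state = 8 -STANDBY HOT
Redundancy Mode (Operational) = sso
Gateway Monitoring = Enabled
Gateway monitoring interval = 8 secs
"""

def parse_states(text):
    """Turn each 'key = value' line into a dict entry with a lowercased key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and lines without '='
            fields[key.strip().lower()] = value.strip()
    return fields

def gateway_failover_ready(fields):
    """Hypothetical check: monitoring on, SSO operational, peer in STANDBY HOT."""
    return (
        fields.get("gateway monitoring", "").lower() == "enabled"
        and fields.get("redundancy mode (operational)", "").lower() == "sso"
        and "standby hot" in fields.get("peer state", "").lower()
    )

fields = parse_states(SAMPLE)
print(gateway_failover_ready(fields))
print(fields["gateway monitoring interval"])
```

How the text is collected (SSH session, saved log file) is left out on purpose; the sketch only works on a string.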

 

Now my problem: when I unplug the uplink (the RP is still plugged in), nothing happens, and I don't know why. According to the Cisco documentation, the switchover should be triggered.

 https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/9800/17-1/deployment-guide/c9800-ha-sso-deployment-guide-rel-17-1.pdf (page 30).

The access points go down because the active WLC is no longer reachable. I also see logs on both WLCs (active and standby) saying that the RMI link is no longer reachable. The RMI links don't have to be up for the switchover to be triggered, right? Otherwise, what could it be?

Thanks in advance to everyone who takes the time to answer this post.

Accepted Solutions

Arshad Safrulla
VIP Alumni

Make sure that the mobility MAC address is configured. Is the RMI IP part of the same subnet as the WMI interface? (The recommendation is that it must be part of the same subnet.)

ip default-gateway must be configured, and it should be the gateway of the RMI interface (in your case, on the 10.10.0.0 network).
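
The subnet requirement can be sanity-checked offline with Python's standard `ipaddress` module. This is only a sketch: the thread does not give the prefix length, so the /24 below is an assumption, 10.10.0.1 is used purely as an example gateway, and `gateway_on_rmi_subnet` is a made-up helper name.

```python
import ipaddress

def gateway_on_rmi_subnet(rmi_ip, prefix_len, gateway_ip):
    """Return True if gateway_ip lies inside the subnet of the RMI address."""
    network = ipaddress.ip_interface(f"{rmi_ip}/{prefix_len}").network
    return ipaddress.ip_address(gateway_ip) in network

# RMI IPs from the 'sh chassis rmi' output; prefix length 24 is assumed.
for rmi in ("10.10.0.13", "10.10.0.14"):
    print(rmi, gateway_on_rmi_subnet(rmi, 24, "10.10.0.1"))
```

A gateway outside the RMI subnet, for example 10.10.1.1 with the same assumed /24, would make the function return False.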

Post the outputs below if you need more assistance:

  • show run all | i redun
  • show run | i redun
  • show run interface Vlan <WMI interface VLAN>

Most importantly, make sure that GARP is enabled where the gateway resides and that the upstream switchports connecting to the WLC are properly configured. (It would be great if you could post the config. Recommendations: no native VLAN, allow only the wireless VLANs, and add spanning tree portfast edge to the ports.)

 

 


marce1000
VIP

 

> ... when the active WLC loses connectivity to the default gateway ...
  - In general, HA SSO is not designed for that; it is designed to keep wireless service available after a 'box failure'. With RMI+RP you may get failover for a local link failure too, but not for a default gateway; that is an external network problem, so to speak.

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '


Hello @Arshad Safrulla,

Sorry for my late reply; I was on vacation for almost two weeks. When I came back, I checked the config again, and it was basically the same as the configuration you described. I then tried configuring the default gateway as a static route, "ip route 0.0.0.0 0.0.0.0 10.10.0.1", because I saw on another forum that this could fix the problem. I unplugged the uplinks again, and the switchover finally worked. To confirm that this was the reason, I removed the route and repeated the test, but the switchover still worked. Honestly, I am a bit confused by HA SSO; it feels like it only works if you are lucky that day. I don't know what the issue was before, but it seems to work now. So I know what you mean, @Scott Fella. Additionally, the WLC sometimes freezes after a switchover and has to be restarted manually.

Thank you all for your answers. I will accept this one because it contains excellent advice for HA SSO.

 

Rich R
VIP

Actually @marce1000 - the feature is supported from 17.1 (and 17.4 for IPv6) and designed to work exactly that way:
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-4/config-guide/b_wl_17_4_cg/m_vewlc_high_availability.html#id_109520
https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/9800/17-6/deployment-guide/c9800-ha-sso-deployment-guide-rel-17-6.pdf

"Default Gateway check is done by periodically sending Internet Control Message Protocol (ICMP) ping to the gateway. Both the active and the standby controllers use the RMI IP as the source IP. These messages are sent at 1 second interval. If there are 8 consecutive failures in reaching the gateway, the controller will declare the gateway as non-reachable.
After 4 ICMP Echo requests fail to get ICMP Echo responses, ARP requests are attempted. If there is no response for 8 seconds (4 ICMP Echo Requests followed by 4 ARP Requests), the gateway is assumed to be non-reachable.
IPv6 default gateway detection is supported starting release 17.4. Instead of ICMP and ARP in IPv4, IPv6 shall use ICMP to detect gateway failure."
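
To make the timing in that quote concrete, here is a small Python model of the IPv4 check as I read it: one probe per second, ICMP for the first four consecutive failures, ARP for the next four, and the gateway declared unreachable at eight consecutive failures. It sends no packets and is not Cisco's implementation. The names are made up, and the reset of the counter on any reply is my assumption based on the word "consecutive".

```python
ICMP_ATTEMPTS = 4                                   # from the quoted guide
ARP_ATTEMPTS = 4                                    # from the quoted guide
FAILURE_THRESHOLD = ICMP_ATTEMPTS + ARP_ATTEMPTS    # 8 probes, 1 second apart

def probe_type(consecutive_failures):
    """Which probe goes out next, given the failures seen so far."""
    return "icmp" if consecutive_failures < ICMP_ATTEMPTS else "arp"

def seconds_until_unreachable(replies):
    """replies: one bool per 1-second probe (True = gateway answered).

    Returns the 1-based second at which the gateway is declared
    unreachable, or None if the threshold is never reached.
    """
    failures = 0
    for second, answered in enumerate(replies, start=1):
        if answered:
            failures = 0          # assumption: any reply resets the count
        else:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                return second
    return None

# Uplink unplugged and never restored: declared unreachable at second 8.
print(seconds_until_unreachable([False] * 8))
# One reply at second 8 restarts the count, so detection moves to second 16.
print(seconds_until_unreachable([False] * 7 + [True] + [False] * 8))
```

This matches the "Gateway monitoring interval = 8 secs" line in the `sh redundancy states` output earlier in the thread.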

Scott Fella
Hall of Fame

Does the primary ever reboot, allowing the secondary unit to take over? With a hardware failure, or when you just power down the primary, the secondary takes over right away, but not in this scenario. If the primary never reboots, I would suspect a configuration issue or something broken in the back end. You might also try rebuilding the SSO pair.

I was never a fan of SSO. I have always tested it and have run into production issues, so I now stick with N+1. By no means am I saying SSO stinks; N+1 is manageable for me, and your environment might be different.

Open a TAC case since I would think that you have support on this and let us know how it was fixed.

-Scott
*** Please rate helpful posts ***

Agree with @Scott Fella - if you're sure you've followed the config guide correctly and it's not working then time for a TAC case.
We've generally found SSO very reliable. The only thing we have had occasional trouble with is the gateway reachability test failing and triggering a switchover when it shouldn't. Then different Cisco BUs fight over who lost the checks, the WLC or the router. I don't think we've seen that yet with the 9800 though, so maybe it is only an AireOS problem.
