
Best Practices for Pre-HA Failover Health Checks for Standby WLC

ctalsness
Level 1

Recently, a couple of manual HA failovers between pairs of HA WLCs have failed because the standby unit was in an apparently unhealthy state.

Besides checking for "Peer State Standby Hot" and "Bulk Sync Status Complete", what other best-practice checks should I make to confirm that the standby unit is healthy before performing a manual failover?

I'm interested in suggestions for both IOS XE and AireOS WLC platforms.
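For anyone who wants to script the two checks named in the question, here is a minimal Python sketch that parses saved `show redundancy summary` text from an AireOS WLC and flags anything other than a hot standby with a completed bulk sync. The sample output and its field spellings are assumptions (abridged, and possibly different on your release), and collecting the text (console log, SSH session) is left out.

```python
import re

# Hypothetical, abridged AireOS "show redundancy summary" output. The exact
# field names and spacing are assumptions; check them against your release.
SAMPLE = """\
 Redundancy Mode = SSO ENABLED
     Local State = ACTIVE
      Peer State = STANDBY HOT
            Unit = Primary
 BulkSync Status = Complete
"""


def parse_key_values(text: str) -> dict[str, str]:
    """Parse 'Key = Value' lines into a dict keyed by a normalized key."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop spaces and case so "Bulk Sync Status" and "BulkSync Status" match.
        fields[re.sub(r"\s+", "", key).lower()] = value.strip()
    return fields


def standby_problems(text: str) -> list[str]:
    """Return the failed checks; an empty list means both checks passed."""
    fields = parse_key_values(text)
    problems = []
    peer = fields.get("peerstate", "<missing>")
    if peer.upper() != "STANDBY HOT":
        problems.append(f"Peer State is {peer!r}, expected 'STANDBY HOT'")
    bulk = fields.get("bulksyncstatus", "<missing>")
    if bulk.lower() != "complete":
        problems.append(f"Bulk Sync Status is {bulk!r}, expected 'Complete'")
    return problems


print(standby_problems(SAMPLE))  # → []
```

Normalizing the keys keeps the check tolerant of small spelling differences between releases; a missing field is reported as a problem rather than silently passing.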


Leo Laohoo
Hall of Fame

@ctalsness wrote:
"Bulk Sync Status Complete"

That is what I look for before and after a failover.

Rich R
VIP

And if in doubt, make sure you have config backups.
Cisco closed our TAC case without any solution or explanation. In that case a 9800 crashed and failed over (the pair had been working and in sync), but the standby config got corrupted (maybe it was trying to sync at the moment of the crash) and then the pair synced a blank wireless config! Luckily we have a double backup, a second SSO pair in another data centre, so the APs failed over to the other data centre and we had to restore the failed pair from daily backups.

PS: Let me clarify the fault we saw. The controllers did not revert to a complete day-0 config: redundancy was still working and the base config (interfaces etc.) was still there, but ALL wireless config was lost from the running config, which was synced (although it was still present in the startup-config).
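One hedged idea based on the symptom above (wireless config gone from the running-config but still in the startup-config): before a manual failover, compare the wireless sections of saved copies of both configs. This is a minimal Python sketch; the keyword prefixes and sample configs are hypothetical, and a difference can also just mean someone removed config without saving, so treat a hit as a reason to investigate rather than proof of corruption.

```python
# Top-level config keywords treated as "wireless config" in this sketch.
# The prefix list is an assumption; extend it for your deployment.
WIRELESS_PREFIXES = ("wlan ", "wireless profile ", "wireless tag ", "ap profile ")


def wireless_lines(config_text: str) -> set[str]:
    """Collect unindented lines that open a wireless section."""
    return {
        line.rstrip()
        for line in config_text.splitlines()
        if line.startswith(WIRELESS_PREFIXES)
    }


def missing_from_running(running: str, startup: str) -> set[str]:
    """Wireless sections present in startup-config but absent from running-config."""
    return wireless_lines(startup) - wireless_lines(running)


# Hypothetical config excerpts reproducing the symptom described above.
startup = "hostname wlc1\nwlan corp 1 corp-ssid\n shutdown\nwireless profile policy corp-pp\n"
running = "hostname wlc1\n"
print(sorted(missing_from_running(running, startup)))
# → ['wireless profile policy corp-pp', 'wlan corp 1 corp-ssid']
```

Only unindented lines are compared, so sub-commands inside a section (such as ` shutdown`) are ignored; a deeper diff would be needed to catch changes within a WLAN.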

Arshad Safrulla
VIP Alumni

1. Make sure that the APs are primed properly with the primary and secondary WLCs configured. This prevents APs from joining controllers randomly.

2. If using N+1, make sure that the WLCs have the required licenses or are synced with the Smart Licensing portal. Importantly, make sure that the WLCs are running the same image.

3. If you are using certificates, make sure they are installed on both the Active/Standby or N+1 WLCs.

4. Make sure that the upstream switchports are configured to allow the required VLANs.

5. Make sure that the RP ports are connected properly.

6. Make sure that the service ports of the WLCs are configured (on IOS XE, Gig0 in the management VRF).
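Several items in the list above boil down to "the two units must agree". A minimal Python sketch of that comparison follows; the `UnitFacts` fields and sample values are hypothetical, and you would fill them in from whatever show commands you trust on your platform (this is most relevant to N+1, where the units are configured independently).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UnitFacts:
    """Facts gathered from one controller (all field names are hypothetical)."""
    name: str
    image: str
    certificates: frozenset[str] = field(default_factory=frozenset)
    licensed: bool = False
    service_port_configured: bool = False


def compare_units(active: UnitFacts, standby: UnitFacts) -> list[str]:
    """Return mismatches between two controllers; empty means they agree."""
    problems = []
    if active.image != standby.image:
        problems.append(f"image mismatch: {active.image} vs {standby.image}")
    missing = active.certificates - standby.certificates
    if missing:
        problems.append(f"certificates missing on {standby.name}: {sorted(missing)}")
    for unit in (active, standby):
        if not unit.licensed:
            problems.append(f"{unit.name}: licensing not confirmed")
        if not unit.service_port_configured:
            problems.append(f"{unit.name}: service port not configured")
    return problems


# Hypothetical values for illustration only.
a = UnitFacts("wlc-a", "17.9.4", frozenset({"WLC_CA", "web-auth"}), True, True)
b = UnitFacts("wlc-b", "17.9.3", frozenset({"WLC_CA"}), True, False)
for problem in compare_units(a, b):
    print(problem)
```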

It is highly recommended that you connect to the WLCs directly via console during a controlled failover so you can capture the logs.

Refer to the docs below:

High Availability SSO Deployment Guide for Cisco Catalyst 9800 Series Wireless Controllers, Cisco IOS XE Amsterdam 17.3

Cisco Wireless Controller Configuration Guide, Release 8.10 - High Availability [Cisco Wireless LAN Controller Software] - Cisco

High Availability (SSO) Deployment Guide - Cisco

Use the show commands below before and after failovers to verify the behaviour.

Some commands for AireOS:

show redundancy summary
show redundancy infra statistics
show redundancy transport statistics
show redundancy keepalive statistics
show redundancy gw-reachability statistics
show redundancy config-sync statistics
show redundancy ap-sync statistics
show redundancy client-sync statistics

For 9800:

show chassis
show chassis rmi
show redundancy
show romvar
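On the 9800, `show redundancy` repeats the same key ("Current Software state") for the local and the peer processor, so a parser has to track which section a line belongs to. A minimal Python sketch, assuming the abridged layout in `SAMPLE` below (the real output has more fields and may vary by IOS XE release):

```python
import re

# Hypothetical, abridged 9800 "show redundancy" output; the real layout has
# more fields and may differ between IOS XE releases.
SAMPLE = """\
Redundant System Information :
------------------------------
     Operating Redundancy Mode = sso
                Communications = Up

Current Processor Information :
-------------------------------
        Current Software state = ACTIVE

Peer Processor Information :
----------------------------
        Current Software state = STANDBY HOT
"""

# A section header is a line ending in ":" that carries no "=".
HEADER = re.compile(r"^\s*(.+?)\s*:\s*$")


def parse_sections(text: str) -> dict[str, dict[str, str]]:
    """Group 'Key = Value' lines under the most recent 'Header :' line."""
    sections: dict[str, dict[str, str]] = {}
    current = "global"
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            sections.setdefault(current, {})[key.strip()] = value.strip()
        elif (match := HEADER.match(line)):
            current = match.group(1)
    return sections


def check_9800(text: str) -> list[str]:
    """Return failed checks; an empty list means all checks passed."""
    sections = parse_sections(text)
    system = sections.get("Redundant System Information", {})
    peer = sections.get("Peer Processor Information", {})
    expected = [
        (system.get("Operating Redundancy Mode"), "sso", "Operating Redundancy Mode"),
        (system.get("Communications"), "Up", "Communications"),
        (peer.get("Current Software state"), "STANDBY HOT", "peer software state"),
    ]
    return [
        f"{label} is {actual!r}, expected {want!r}"
        for actual, want, label in expected
        if actual != want
    ]


print(check_9800(SAMPLE))  # → []
```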

After the switchover, you can check the following to verify the failover:

show ap summary
show ap uptime
show wireless client summary

Do you have any suggestions for what to look for when running the above commands? Are there things that might indicate an issue on the standby that needs to be addressed before performing the failover?

I would just look at the config sync state and the SSO state, as you mentioned.

For me, it is very important to make sure that all the APs and clients are synced to the standby WLC before a failover. So I would check that, and also capture a snapshot before and after the failover so I can compare them and confirm that all the APs and clients reported back to the former standby WLC.
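The before/after comparison can be sketched in Python as below. How the AP names and client counts are collected (for example from `show ap summary` and `show wireless client summary`) is left out, and the `Snapshot` structure and the 90% client threshold are hypothetical choices, since client counts naturally fluctuate during a failover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """AP names and client count captured at one moment (hypothetical structure)."""
    aps: frozenset[str]
    client_count: int


def compare_snapshots(before: Snapshot, after: Snapshot,
                      client_tolerance: float = 0.9) -> list[str]:
    """List differences worth investigating after a failover.

    client_tolerance is an arbitrary threshold: only a drop below 90% of the
    earlier client count is flagged by default.
    """
    problems = []
    missing = sorted(before.aps - after.aps)
    if missing:
        problems.append(f"{len(missing)} AP(s) not rejoined: {', '.join(missing)}")
    if after.client_count < before.client_count * client_tolerance:
        problems.append(
            f"clients dropped from {before.client_count} to {after.client_count}"
        )
    return problems


# Hypothetical snapshots for illustration only.
before = Snapshot(frozenset({"ap-lobby", "ap-floor1", "ap-floor2"}), 120)
after = Snapshot(frozenset({"ap-lobby", "ap-floor1"}), 95)
for problem in compare_snapshots(before, after):
    print(problem)
```

Comparing AP names rather than just counts shows which APs failed to rejoin, which is usually the first thing you need when troubleshooting.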


Rich R
VIP

Good points @Arshad Safrulla, and I'll add to that. Another thing (apart from certificates) that AireOS does not sync is network routes: they must be configured on each WLC.
https://www.cisco.com/c/en/us/td/docs/wireless/controller/technotes/8-1/HA_SSO_DG/High_Availability_DG.html#pgfId-43630

"The Peer Service Port and Static route configuration is a part of a different XML file, and will not be applied if downloaded as part of the configuration file." In other words, the active WLC sends its configuration to the standby as an XML file, but static routes are not part of the main XML configuration file, so they are not synced to the standby.

https://www.cisco.com/c/en/us/td/docs/wireless/controller/8-5/config-guide/b_cg85/high_availability.html#concept_FC84F350446D4D76A965400D13DA122A.

"The service port and route information that is configured is lost after you enable SSO. You must configure the service port and route information again after you enable SSO. You can configure the service port and route information for the standby-hot controller using the peer-service-port and peer-route commands."
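Because the routes are not synced, a quick pre-failover check is to compare the routes on the active WLC with the peer-route entries configured for the standby. A minimal Python sketch, assuming you have already extracted both lists as (network, netmask) pairs; the helper name and sample routes are hypothetical, and only destination networks are compared, not gateways.

```python
import ipaddress


def missing_peer_routes(active_routes: list[tuple[str, str]],
                        peer_routes: list[tuple[str, str]]) -> list[str]:
    """Return destinations routed on the active WLC with no matching peer-route.

    Each route is a (network, netmask) pair; gateways are not compared.
    """
    def to_networks(routes):
        # ip_network accepts "address/netmask" and raises ValueError on bad input.
        return {ipaddress.ip_network(f"{net}/{mask}") for net, mask in routes}

    return [str(n) for n in sorted(to_networks(active_routes) - to_networks(peer_routes))]


# Hypothetical routes for illustration only.
active = [("10.10.0.0", "255.255.0.0"), ("192.168.50.0", "255.255.255.0")]
peer = [("10.10.0.0", "255.255.0.0")]
print(missing_peer_routes(active, peer))  # → ['192.168.50.0/24']
```

Normalizing through `ipaddress` means the same network written in different forms still compares as equal.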