
WLC 9800-80 HA switchover happening daily.

pannick
Level 1

Hey Gang,

Here is a new one that we are troubleshooting. Maybe someone has run into this. We have a pair of 9800-80 WLCs running in HA that we have been slowly migrating to from our 8540s. The 9800 pair is doing an HA switchover daily, sometimes twice. We have a TAC case open, but so far there is no definite answer or reason for the issue.

Not sure if this is a bug or not.

Thanks 

We are running code version 17.9.3. We have 2700, 3700, 3800 and 9130 APs in our environment.

CWW#show redundancy switchover history
Index  Previous  Current  Switchover      Switchover
       active    active   reason          time
-----  --------  -------  ----------      ----------
4      1         2        Active lost GW  09:28:35 Eastern Thu May 11 2023
5      2         1        Active lost GW  16:48:19 Eastern Thu May 11 2023
6      1         2        Active lost GW  21:33:38 Eastern Thu May 11 2023
7      2         1        Active lost GW  06:53:46 Eastern Fri May 12 2023
8      1         2        Active lost GW  11:08:00 Eastern Fri May 12 2023
9      2         1        Active lost GW  13:07:08 Eastern Fri May 12 2023
10     1         2        Active lost GW  14:08:03 Eastern Fri May 12 2023
11     2         1        Active lost GW  11:09:42 Eastern Sat May 13 2023
12     1         2        Active lost GW  21:04:17 Eastern Sat May 13 2023
13     2         1        Active lost GW  06:32:07 Eastern Sun May 14 2023
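
For anyone who wants to line these switchovers up against switch or gateway logs, here is a minimal Python sketch that parses the history above and prints the gap between consecutive events. The function names (parse_history, gaps_in_hours) are made up for illustration, the parsing assumes every data row ends with six tokens (time, timezone, weekday, month, day, year) as in the paste, and month names are assumed to parse under an English locale.

```python
from datetime import datetime

# Data rows copied from the "show redundancy switchover history" output above.
HISTORY = """\
4 1 2 Active lost GW 09:28:35 Eastern Thu May 11 2023
5 2 1 Active lost GW 16:48:19 Eastern Thu May 11 2023
6 1 2 Active lost GW 21:33:38 Eastern Thu May 11 2023
7 2 1 Active lost GW 06:53:46 Eastern Fri May 12 2023
8 1 2 Active lost GW 11:08:00 Eastern Fri May 12 2023
9 2 1 Active lost GW 13:07:08 Eastern Fri May 12 2023
10 1 2 Active lost GW 14:08:03 Eastern Fri May 12 2023
11 2 1 Active lost GW 11:09:42 Eastern Sat May 13 2023
12 1 2 Active lost GW 21:04:17 Eastern Sat May 13 2023
13 2 1 Active lost GW 06:32:07 Eastern Sun May 14 2023
"""


def parse_history(text):
    """Return a list of (index, previous, current, reason, timestamp) tuples."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        # Skip header, separator, and blank lines.
        if len(parts) < 9 or not parts[0].isdigit():
            continue
        index, previous, current = (int(p) for p in parts[:3])
        # Assumption: the last six tokens are time, tz, weekday, month, day, year.
        clock, _tz, _weekday, month, day, year = parts[-6:]
        reason = " ".join(parts[3:-6])
        when = datetime.strptime(f"{month} {day} {year} {clock}", "%b %d %Y %H:%M:%S")
        events.append((index, previous, current, reason, when))
    return events


def gaps_in_hours(events):
    """Hours elapsed between each switchover and the next one."""
    return [
        round((later[4] - earlier[4]).total_seconds() / 3600, 2)
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = parse_history(HISTORY)
    for event, gap in zip(events[1:], gaps_in_hours(events)):
        print(f"index {event[0]}: {event[3]} at {event[4]} (+{gap} h)")
```

The gaps are irregular rather than on a fixed timer, which is worth keeping in mind when comparing against scheduled jobs or traffic peaks on the gateway.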


eglinsky2012
Level 4

I'm on the same hardware and software, no switchover issues. What is the topology of the WLC uplinks? Do they uplink to the same VSS/StackWise switch or different switches? Any strange logs in the switch(es)?

marce1000
VIP

 

   1) I would advise configuring a syslog server on the HA SSO pair, then following up on the logs sent to it, in particular any logs related to the problem you are describing.

   2) Are either or both of the controllers also restarting? Check this, for instance, with:
                         dir bootflash:/core/ | i core|system-report
                         show version | inc reload
     Make sure the commands can be executed on both controllers, for example by enabling the standby console.

   3) Review the (current active) controller configuration with the CLI command show tech wireless, and have the output analyzed with: https://cway.cisco.com/wireless-config-analyzer

   4) Test connectivity to the standby controller with the command : test wireless redundancy rping

   5) Try increasing the setting described at: https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/213915-configure-catalyst-9800-wireless-control.html#toc-hId-307825303
                 Verify changes with :  show chassis ha-status local

   6) A number of other related commands for analyzing and troubleshooting HA SSO:
show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show logging process stack_mgr internal to-file bootflash:

show platform hardware slot R0 ha_port interface stats
show platform hardware slot R0 ha_port sfp idprom (shows details of the SFP in SP, for a fiber-based redundancy link)

                Taking packet captures on the RP link
test wireless redundancy packetdump start
 or (test wireless redundancy packetdump start filter port <0-65535>)
test wireless redundancy packetdump stop
      
 show platform software stack-mgr chassis active R0 peer-timeout
show platform software stack-mgr chassis standby R0 peer-timeout
show platform software stack-mgr chassis active R0 sdp-counters
show platform software stack-mgr chassis standby R0 sdp-counters

show redundancy config-sync failures {bem|mcl|prc}
show redundancy config-sync historic mcl
show redundancy config-sync ignored failures historic mcl
show redundancy switchover history

 M.

-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

balaji.bandi
Hall of Fame

How is your STP running? If the same Layer 2 setup works with the 8540 controllers and the issue only occurs with the Cat 9800, then this could be a bug.

I have also seen this issue on 17.6.4; ports also go err-disabled randomly at times.

 

BB


Leo Laohoo
Hall of Fame

Post the complete output of the following commands: 

  1. dir
  2. sh log on switch 1 up detail
  3. dir flash-1:tracelogs/*.log

Rich R
VIP

Not seeing this at all on 17.9.3 - 100% stable.
Look into why it's losing sight of the gateway.  As per docs "The messages are sent at 1 second interval. If it takes 8 (or configured value) consecutive failures in reaching the gateway, the controller declares the gateway as non-reachable."
Check for control plane policing of ICMP/ARP or other QoS which might drop ARP/pings on the gateway.
Check CPU on WLC and gateway.
Look for any packet drops on interfaces.
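
To make the quoted rule concrete, here is a rough Python sketch of a consecutive-failure counter: one probe per second, and the gateway is declared unreachable once 8 probes in a row fail. This only models the behaviour described in the docs quote above; it is not the WLC's actual implementation, and the function name gateway_declared_down and its threshold parameter are made up for illustration.

```python
def gateway_declared_down(probe_results, threshold=8):
    """Return the index of the probe at which the gateway would be declared
    unreachable, or None if the threshold is never reached.

    probe_results: iterable of booleans, one per 1-second probe
                   (True = reply received, False = no reply).
    threshold:     consecutive failures required (8 per the quoted docs,
                   or whatever value is configured).
    """
    consecutive_failures = 0
    for i, reply_received in enumerate(probe_results):
        if reply_received:
            consecutive_failures = 0  # any reply resets the counter
        else:
            consecutive_failures += 1
        if consecutive_failures >= threshold:
            return i
    return None


if __name__ == "__main__":
    # 14 lost probes in total, but never 8 in a row: no switchover trigger.
    flaky = [False] * 7 + [True] + [False] * 7
    print(gateway_declared_down(flaky))

    # 8 losses in a row after 3 good probes: trigger at the 11th probe.
    outage = [True] * 3 + [False] * 8
    print(gateway_declared_down(outage))
```

The point is that scattered loss does not trip the check; it takes a continuous gap of roughly 8 seconds (at the default) in ARP/ICMP replies from the gateway, which is why policing, CPU spikes, or STP reconvergence on the path are worth checking.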
