
WLC 9800-80 HA switchover happening daily.

pannick
Level 1

Hey Gang,

 Here is a new one that we are troubleshooting. Maybe someone has run into this. We have a pair of 9800-80 WLCs running in HA that we have been slowly migrating to from our 8540s. The 9800 pair is doing an HA switchover daily, sometimes twice. We have a TAC case open, but so far no definitive answer or root cause for the issue.

Not sure if this is a bug or not.

Thanks 

We are running code version 17.9.3. We have 2700, 3700, 3800, and 9130 APs in our environment.

CWW#show redundancy switchover history
Index  Previous  Current  Switchover      Switchover
       active    active   reason          time
-----  --------  -------  ----------      ----------
4      1         2        Active lost GW  09:28:35 Eastern Thu May 11 2023
5      2         1        Active lost GW  16:48:19 Eastern Thu May 11 2023
6      1         2        Active lost GW  21:33:38 Eastern Thu May 11 2023
7      2         1        Active lost GW  06:53:46 Eastern Fri May 12 2023
8      1         2        Active lost GW  11:08:00 Eastern Fri May 12 2023
9      2         1        Active lost GW  13:07:08 Eastern Fri May 12 2023
10     1         2        Active lost GW  14:08:03 Eastern Fri May 12 2023
11     2         1        Active lost GW  11:09:42 Eastern Sat May 13 2023
12     1         2        Active lost GW  21:04:17 Eastern Sat May 13 2023
13     2         1        Active lost GW  06:32:07 Eastern Sun May 14 2023
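
For anyone who wants to line these switchovers up against switch logs or scheduled jobs, here is a rough Python sketch that parses the history rows above and computes the time between consecutive switchovers. It is only an illustration: the function names (parse_history, gaps_in_hours) and the regular expression are my own assumptions about the row layout, not anything provided by Cisco, and the timezone word is ignored.

```python
import re
from datetime import datetime

# Rows copied from the "show redundancy switchover history" output above.
SAMPLE = """\
4 1 2 Active lost GW 09:28:35 Eastern Thu May 11 2023
5 2 1 Active lost GW 16:48:19 Eastern Thu May 11 2023
6 1 2 Active lost GW 21:33:38 Eastern Thu May 11 2023
7 2 1 Active lost GW 06:53:46 Eastern Fri May 12 2023
8 1 2 Active lost GW 11:08:00 Eastern Fri May 12 2023
9 2 1 Active lost GW 13:07:08 Eastern Fri May 12 2023
10 1 2 Active lost GW 14:08:03 Eastern Fri May 12 2023
11 2 1 Active lost GW 11:09:42 Eastern Sat May 13 2023
12 1 2 Active lost GW 21:04:17 Eastern Sat May 13 2023
13 2 1 Active lost GW 06:32:07 Eastern Sun May 14 2023
"""

# Assumed row layout: index, previous active, current active, reason,
# HH:MM:SS, timezone word, weekday, month, day, year.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s+(\d{2}:\d{2}:\d{2})"
    r"\s+\S+\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{4})\s*$"
)


def parse_history(text):
    """Return (index, previous, current, reason, datetime) per data row."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip the prompt, header, and separator lines
        index, prev, cur, reason, clock, month, day, year = match.groups()
        when = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
        events.append((int(index), int(prev), int(cur), reason, when))
    return events


def gaps_in_hours(events):
    """Hours between each switchover and the one before it."""
    return [
        round((later[4] - earlier[4]).total_seconds() / 3600, 2)
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    history = parse_history(SAMPLE)
    for event, gap in zip(history[1:], gaps_in_hours(history)):
        print(f"index {event[0]}: {gap} h after the previous switchover")
```

If the gaps cluster around particular times of day, that may point at something periodic on the gateway side rather than on the WLC itself.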


eglinsky2012
Spotlight

I'm on the same hardware and software, no switchover issues. What is the topology of the WLC uplinks? Do they uplink to the same VSS/StackWise switch or different switches? Any strange logs in the switch(es)?

marce1000
VIP

 

 1) I would advise configuring a syslog server for the HA SSO pair and following up on the logs sent to it, especially any logs related to the problem you are describing.

 2) Are either or both of the controllers restarting too? Check this, for instance, with:
      dir bootflash:/core/ | i core|system-report
      show version | inc reload
    Make sure the commands are executed on both controllers, for example by enabling the standby console.

 3) Review the (currently active) controller configuration with the CLI command show tech wireless, and have the output analyzed with: https://cway.cisco.com/wireless-config-analyzer

 4) Test connectivity to the standby controller with the command: test wireless redundancy rping

 5) Try increasing the setting described here: https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/213915-configure-catalyst-9800-wireless-control.html#toc-hId-307825303
      Verify changes with: show chassis ha-status local

 6) A number of other related commands for analyzing and troubleshooting HA SSO:
show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show logging process stack_mgr internal to-file bootflash:

show platform hardware slot R0 ha_port interface stats
show platform hardware slot R0 ha_port sfp idprom (shows details of the SFP in SP, for a fiber-based redundancy link)

Taking packet captures on the RP link:
test wireless redundancy packetdump start
  or: test wireless redundancy packetdump start filter port <0-65535>
test wireless redundancy packetdump stop

show platform software stack-mgr chassis active R0 peer-timeout
show platform software stack-mgr chassis standby R0 peer-timeout
show platform software stack-mgr chassis active R0 sdp-counters
show platform software stack-mgr chassis standby R0 sdp-counters

show redundancy config-sync failures {bem|mcl|prc}
show redundancy config-sync historic mcl
show redundancy config-sync ignored failures historic mcl
show redundancy switchover history

 M.



                         



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

balaji.bandi
Hall of Fame

How is your STP running? If the same Layer 2 setup worked with the 8540 controllers and the issue occurs only with the Cat 9800, then this could be a bug.

I have also seen this issue on 17.6.4; ports also go err-disabled randomly at times.

 

BB


Leo Laohoo
Hall of Fame

Post the complete output of the following commands:

  1. dir
  2. sh log on switch 1 up detail
  3. dir flash-1:tracelogs/*.log

Rich R
VIP

Not seeing this at all on 17.9.3 - 100% stable.
Look into why it's losing sight of the gateway. As per the docs: "The messages are sent at 1 second interval. If it takes 8 (or configured value) consecutive failures in reaching the gateway, the controller declares the gateway as non-reachable."
Check for control plane policing of ICMP/ARP, or other QoS, which might drop ARP/pings on the gateway.
Check CPU on WLC and gateway.
Look for any packet drops on interfaces.
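
To make the quoted rule concrete, here is a minimal Python sketch of a consecutive-failure counter. It only models the behaviour described in the quote (one probe per second, declared unreachable after 8 consecutive misses by default); it is not the controller's actual implementation, and the function name and parameters are hypothetical.

```python
def gateway_declared_down(probe_results, threshold=8):
    """Model of the quoted rule: True once `threshold` probes fail in a row.

    probe_results is an iterable of booleans, one per 1-second probe
    (True = gateway answered, False = no answer). The default of 8 comes
    from the documentation quote; everything else is an assumption.
    """
    consecutive_failures = 0
    for answered in probe_results:
        if answered:
            consecutive_failures = 0  # any success resets the counter
        else:
            consecutive_failures += 1
            if consecutive_failures >= threshold:
                return True
    return False


# Seven misses, one reply, seven more misses: never 8 in a row.
flaky = [False] * 7 + [True] + [False] * 7
# Eight straight misses, e.g. the gateway dropping ICMP/ARP for 8 seconds.
outage = [True] * 5 + [False] * 8

if __name__ == "__main__":
    print(gateway_declared_down(flaky))
    print(gateway_declared_down(outage))
```

The point is that scattered single drops do not trigger a switchover under this rule; it takes a full window of back-to-back failures, so something is blocking or delaying the probes for several seconds at a time.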

agusdubi
Level 1

Hello there! Were you able to solve this issue?

What firmware is the WLC on?

Agusdubi,

 Yes, we did. We initially created two port channels on the 9800, thinking one for each device in the pair. We removed one and put all of the interfaces into one port channel and it resolved the issue.

That's interesting because that's not a Cisco supported configuration @pannick .  If you have problems with that TAC might tell you that you're on your own because it's not supported.  All the supported configurations require 2 separate port-channels.
https://www.ciscolive.com/c/dam/r/ciscolive/apjc/docs/2023/pdf/BRKEWN-2846.pdf
https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/9800/17-1/deployment-guide/c9800-ha-sso-deployment-guide-rel-17-1.pdf

The port channel change was made on the 9800. The wireless router has two port channels. One for each 9800 in the pair. It was TAC who walked us through the change.

Luckily, we haven't had the 9800 issues in almost a year. Knocking on wood.

Your previous post said "We initially created two port channels on the 9800, thinking one for each device in the pair. We removed one and put all of the interfaces into one port channel and it resolved the issue." That implies all ports from both devices are in one port-channel, but now you're saying that's not the case, which makes more sense. So which of the supported topologies are you using now?

[Attached image: RichR_0-1730468729087.png]

 
