05-15-2023 05:48 AM
Hey Gang,
Here is a new one that we are troubleshooting. Maybe someone has run into this. We have a pair of 9800-80 WLCs running in HA that we have been slowly migrating to from our 8540s. The 9800 system is doing an HA switchover daily, sometimes twice. We have a TAC case open, but so far no positive answer or reason for the issue.
Not sure if this is a bug or not.
Thanks
We are running code version 17.9.3. We have 2700, 3700, 3800, and 9130 APs in our environment.
CWW#show redundancy switchover history
Index  Previous  Current  Switchover      Switchover
       active    active   reason          time
-----  --------  -------  ----------      ----------
    4         1        2  Active lost GW  09:28:35 Eastern Thu May 11 2023
    5         2        1  Active lost GW  16:48:19 Eastern Thu May 11 2023
    6         1        2  Active lost GW  21:33:38 Eastern Thu May 11 2023
    7         2        1  Active lost GW  06:53:46 Eastern Fri May 12 2023
    8         1        2  Active lost GW  11:08:00 Eastern Fri May 12 2023
    9         2        1  Active lost GW  13:07:08 Eastern Fri May 12 2023
   10         1        2  Active lost GW  14:08:03 Eastern Fri May 12 2023
   11         2        1  Active lost GW  11:09:42 Eastern Sat May 13 2023
   12         1        2  Active lost GW  21:04:17 Eastern Sat May 13 2023
   13         2        1  Active lost GW  06:32:07 Eastern Sun May 14 2023
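If it helps to see how often this is really happening, here is a minimal Python sketch that parses the data rows of the switchover history above and reports the gaps between switchovers and the count per day. The function names (parse_history, gaps, per_day) are made up for illustration, the regex assumes the row layout shown in this post, and the month parsing assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Data rows copied from the "show redundancy switchover history" output above.
HISTORY = """\
4 1 2 Active lost GW 09:28:35 Eastern Thu May 11 2023
5 2 1 Active lost GW 16:48:19 Eastern Thu May 11 2023
6 1 2 Active lost GW 21:33:38 Eastern Thu May 11 2023
7 2 1 Active lost GW 06:53:46 Eastern Fri May 12 2023
8 1 2 Active lost GW 11:08:00 Eastern Fri May 12 2023
9 2 1 Active lost GW 13:07:08 Eastern Fri May 12 2023
10 1 2 Active lost GW 14:08:03 Eastern Fri May 12 2023
11 2 1 Active lost GW 11:09:42 Eastern Sat May 13 2023
12 1 2 Active lost GW 21:04:17 Eastern Sat May 13 2023
13 2 1 Active lost GW 06:32:07 Eastern Sun May 14 2023
"""

# Groups: index, previous active, current active, reason, clock time,
# then (after the timezone word and weekday) month, day, year.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s+"
    r"(\d{2}:\d{2}:\d{2})\s+\S+\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{4})\s*$"
)


def parse_history(text):
    """Return one dict per data row; header and separator lines are skipped."""
    events = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        index, prev, curr, reason, clock, mon, day, year = m.groups()
        when = datetime.strptime(f"{mon} {day} {year} {clock}", "%b %d %Y %H:%M:%S")
        events.append({
            "index": int(index),
            "previous": int(prev),
            "current": int(curr),
            "reason": reason,
            "when": when,  # naive local time; the timezone word is ignored
        })
    return events


def gaps(events):
    """Time between each switchover and the next one."""
    times = [e["when"] for e in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def per_day(events):
    """Number of switchovers per calendar day."""
    return Counter(e["when"].date() for e in events)


if __name__ == "__main__":
    events = parse_history(HISTORY)
    for day, count in sorted(per_day(events).items()):
        print(day, count)
    print("shortest gap:", min(gaps(events)))
    print("longest gap:", max(gaps(events)))
```

Lining the switchover times up against logs from the gateway and the uplink switches is probably the more useful next step; the script only makes the timestamps easier to work with.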
05-15-2023 06:29 AM
I'm on the same hardware and software, no switchover issues. What is the topology of the WLC uplinks? Do they uplink to the same VSS/StackWise switch or different switches? Any strange logs in the switch(es)?
05-15-2023 07:10 AM
1) I would advise configuring a syslog server on the HA SSO pair and following up on the logs sent to it, especially any related to the problem you are describing.
2) Are either or both of the controllers restarting too? Check this, for instance, with:
dir bootflash:/core/ | i core|system-report
show version | inc reload
Make sure the commands are executed on both controllers, if necessary by enabling the standby console.
3) Review the configuration of the (current active) controller with the CLI command show tech wireless; have the output analyzed with: https://cway.cisco.com/wireless-config-analyzer
4) Test connectivity to the standby controller with the command : test wireless redundancy rping
5) Have a test with increasing https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/213915-configure-catalyst-9800-wireless-control.html#toc-hId-307825303
Verify changes with : show chassis ha-status local
6) A number of other related commands for analyzing and troubleshooting HA SSO :
show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show logging process stack_mgr internal to-file bootflash:
show platform hardware slot R0 ha_port interface stats
show platform hardware slot R0 ha_port sfp idprom (shows details of the SFP in SP, for a fiber-based redundancy link)
Taking packet captures on the RP link
test wireless redundancy packetdump start
or (test wireless redundancy packetdump start filter port <0-65535>)
test wireless redundancy packetdump stop
show platform software stack-mgr chassis active R0 peer-timeout
show platform software stack-mgr chassis standby R0 peer-timeout
show platform software stack-mgr chassis active R0 sdp-counters
show platform software stack-mgr chassis standby R0 sdp-counters
show redundancy config-sync failures {bem|mcl|prc}
show redundancy config-sync historic mcl
show redundancy config-sync ignored failures historic mcl
show redundancy switchover history
M.
05-15-2023 07:15 AM
How is your STP running? If the same Layer 2 setup works with the 8540 controllers and the issue occurs only with the Cat 9800, then this could be a bug.
I have also seen this issue on 17.6.4; ports also go err-disabled randomly at times.
05-15-2023 09:09 AM
Post the complete output of the following command:
05-15-2023 05:01 PM
Not seeing this at all on 17.9.3 - 100% stable.
Look into why it's losing sight of the gateway. As per the docs: "The messages are sent at 1 second interval. If it takes 8 (or configured value) consecutive failures in reaching the gateway, the controller declares the gateway as non-reachable."
Check for control plane policing of ICMP/ARP or other QoS which might drop ARP/pings on the gateway.
Check CPU on WLC and gateway.
Look for any packet drops on interfaces.
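To make the quoted rule concrete, here is a rough Python sketch of a "consecutive failures" counter. It is only an illustration of the behavior described in the docs quote, not the controller's actual implementation; the function name first_gateway_down and the list-of-booleans input are assumptions made for the example.

```python
def first_gateway_down(probe_results, threshold=8):
    """Return the index of the probe at which the gateway would be declared
    unreachable, or None if that never happens.

    probe_results: iterable of booleans, one per probe (True = reply received).
    threshold: consecutive failures required (8 is the value quoted above).
    """
    consecutive = 0
    for i, ok in enumerate(probe_results):
        if ok:
            consecutive = 0  # any reply resets the failure streak
        else:
            consecutive += 1
            if consecutive >= threshold:
                return i
    return None


if __name__ == "__main__":
    # Seven losses, one reply, seven more losses: never eight in a row.
    print(first_gateway_down([False] * 7 + [True] + [False] * 7))
    # Five replies followed by eight straight losses.
    print(first_gateway_down([True] * 5 + [False] * 8))
```

With probes sent once per second, eight straight losses corresponds to roughly eight seconds without a reply from the gateway, so a short burst of drops from policing, QoS, or a busy CPU could be enough to trigger it, while scattered single losses would not.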
10-30-2024 12:47 PM
Hello there! Were you able to solve this issue?
10-30-2024 04:16 PM
What firmware is the WLC on?
11-01-2024 04:06 AM
Agusdubi,
Yes, we did. We initially created two port channels on the 9800, thinking one for each device in the pair. We removed one and put all of the interfaces into one port channel and it resolved the issue.
11-01-2024 04:53 AM
That's interesting because that's not a Cisco supported configuration @pannick . If you have problems with that TAC might tell you that you're on your own because it's not supported. All the supported configurations require 2 separate port-channels.
https://www.ciscolive.com/c/dam/r/ciscolive/apjc/docs/2023/pdf/BRKEWN-2846.pdf
https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/9800/17-1/deployment-guide/c9800-ha-sso-deployment-guide-rel-17-1.pdf
11-01-2024 04:59 AM - edited 11-04-2024 04:40 AM
The port channel change was made on the 9800. The wireless router has two port channels. One for each 9800 in the pair. It was TAC who walked us through the change.
Luckily, we haven't had the 9800 issues in almost a year. Knocking on wood.
11-01-2024 06:46 AM
Your previous post said "We initially created two port channels on the 9800, thinking one for each device in the pair. We removed one and put all of the interfaces into one port channel and it resolved the issue.", which implies all ports from both devices are in one port-channel, but now you're saying that's not the case, which makes more sense. So which of the supported topologies are you using now?