12-06-2023 04:36 AM
Hello all,
A few days ago (December 3rd), my WLC 9800-40 running software 17.3.7 had an issue where the active member and the standby switched over. I found the following logs about the event:
Dec 3 15:19:07.495: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)
Dec 3 15:19:07.495: %REDUNDANCY-3-REDUNDANCY_ALARMS: Unable to assert REDUNDANCY alarm
Dec 3 15:19:07.495: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_DOWN)
Dec 3 15:19:07.495: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_REDUNDANCY_STATE_CHANGE)
Dec 3 15:19:21.994: %PKI-4-TRUSTPOOL_DOWNLOAD_FAILURE: Trustpool Download failed
But when I check the output of the "show chassis ha-status local" command, it says the last switchover was a month ago (NTP is synchronized correctly):
sho chassis ha-status local
My state = ACTIVE
Peer state = STANDBY HOT
Last switchover reason = active unit removed
Last switchover time = 12:01:02 CET Mon Nov 6 2023
Image Version = 17.3.7
Chassis-HA Local-IP Remote-IP MASK HA-Interface
-----------------------------------------------------------------------------
Chassis-HA Chassis# Priority IFMac Address Peer-timeout(ms)*Max-retry
-----------------------------------------------------------------------------------------
This Boot: 2 1 500*5
Next Boot: 2 1 500*5
sho redundancy switchover history
Index  Previous  Current  Switchover           Switchover
       active    active   reason               time
-----  --------  -------  ----------           ----------
6      2         1        active unit removed  12:54:35 CEST Tue Oct 17 2023
7      1         2        active unit removed  15:27:30 CEST Tue Oct 17 2023
8      2         1        active unit removed  16:41:35 CEST Tue Oct 17 2023
9      1         2        active unit removed  16:48:45 CEST Tue Oct 17 2023
10     2         1        active unit removed  17:14:15 CEST Tue Oct 17 2023
11     1         2        active unit removed  13:17:33 CEST Wed Oct 18 2023
12     2         1        active unit removed  12:37:14 CET Mon Oct 30 2023
13     1         2        active unit removed  08:51:36 CET Tue Oct 31 2023
14     2         1        active unit removed  15:10:57 CET Tue Oct 31 2023
15     1         2        active unit removed  12:01:02 CET Mon Nov 6 2023
It looks like the last switchover or event hasn't been recorded by the controller.
Thanks for your help.
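The comparison described above (log event on December 3rd versus the last recorded switchover) can be checked mechanically. The following is only a rough Python sketch: the function names `parse_history` and `switchovers_near` are made up for illustration, the year of the log event is assumed to be 2023 because the syslog lines do not carry one, and timestamps are treated as naive local times (the CET/CEST label is kept but not converted).

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# One data row of "show redundancy switchover history", e.g.
# "15 1 2 active unit removed 12:01:02 CET Mon Nov 6 2023"
ROW = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<previous>\d+)\s+(?P<current>\d+)\s+"
    r"(?P<reason>.+?)\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})\s+(?P<tz>[A-Z]+)\s+"
    r"[A-Za-z]{3}\s+(?P<month>[A-Za-z]{3})\s+(?P<day>\d{1,2})\s+(?P<year>\d{4})\s*$"
)


def parse_history(text):
    """Return one dict per switchover row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        f = match.groupdict()
        rows.append({
            "index": int(f["index"]),
            "previous": int(f["previous"]),
            "current": int(f["current"]),
            "reason": f["reason"],
            # Naive local time: the timezone label is stored, not applied.
            "time": datetime(int(f["year"]), MONTHS[f["month"]], int(f["day"]),
                             int(f["hour"]), int(f["minute"]), int(f["second"])),
            "tz": f["tz"],
        })
    return rows


def switchovers_near(rows, event_time, window=timedelta(minutes=10)):
    """Return the rows whose switchover time lies within `window` of the event."""
    return [row for row in rows if abs(row["time"] - event_time) <= window]


HISTORY = """\
Index Previous Current Switchover Switchover
active active reason time
----- -------- ------- ---------- ----------
14 2 1 active unit removed 15:10:57 CET Tue Oct 31 2023
15 1 2 active unit removed 12:01:02 CET Mon Nov 6 2023
"""

rows = parse_history(HISTORY)
latest = max(rows, key=lambda row: row["time"])
event = datetime(2023, 12, 3, 15, 19, 7)  # STANDBY_LOST log time; year is an assumption
near = switchovers_near(rows, event)
print("latest recorded switchover:", latest["time"], latest["reason"])
print("switchovers near the log event:", near)
```

An empty list for the second print only means that no switchover was recorded around the time of the log messages; it does not explain what happened to the standby.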
12-06-2023 04:40 AM
- Check whether it could be related to a controller crash with (CLI): show version | inc reload
M.
12-06-2023 04:44 AM
Hello Marce and thank you for your immediate feedback!
Checking the reload reason, I found this:
WLC#sho ver | i reload
Last reload reason: LocalSoft
It's the first time I've seen this reload reason, so I don't know what it means.
Thank you very much for your help.
Best regards.
12-06-2023 04:56 AM
>...Last reload reason: LocalSoft
- No real conclusion can be drawn from that; it is also seen after a regular controller boot. I would advise upgrading to 17.9.4a because 17.3.x is EOL. Below is a list of commands for troubleshooting redundancy switchovers:
show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show logging process stack_mgr internal to-file bootflash:
M.
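For anyone who wants to gather the output of these commands in one pass, here is a minimal Python sketch. It deliberately does not assume any particular SSH library: `run_command` is a hypothetical callable that you would supply yourself (it takes a command string and returns the device output), and `collect_outputs` and `format_report` are illustrative names. The "show logging process ... to-file" command is left out because it writes a file on the controller instead of returning text.

```python
from typing import Callable, Dict, List

# Read-only show commands taken from the list above.
REDUNDANCY_COMMANDS: List[str] = [
    "show redundancy | i ptime|Location|Current Software state|Switchovers",
    "show chassis",
    "show chassis detail",
    "show chassis ha-status local",
    "show chassis ha-status active",
    "show chassis ha-status standby",
    "show chassis rmi",
    "show redundancy",
    "show redundancy history",
    "show redundancy switchover history",
    "show tech wireless redundancy",
    "show redundancy states",
]


def collect_outputs(run_command: Callable[[str], str],
                    commands: List[str]) -> Dict[str, str]:
    """Run each command through the caller-supplied function and keep the output.

    A failing command does not stop the collection; its error text is stored
    in place of the output.
    """
    outputs: Dict[str, str] = {}
    for command in commands:
        try:
            outputs[command] = run_command(command)
        except Exception as exc:  # keep going so one failure does not lose the rest
            outputs[command] = f"ERROR: {exc}"
    return outputs


def format_report(outputs: Dict[str, str]) -> str:
    """Join all outputs into one text block, each preceded by its command."""
    sections = [f"===== {command} =====\n{text}" for command, text in outputs.items()]
    return "\n\n".join(sections)


def fake_run(command: str) -> str:
    """Stand-in for a real session, used only to demonstrate the flow."""
    if command == "show chassis rmi":
        raise RuntimeError("simulated failure")
    return f"(output of {command})"


outputs = collect_outputs(fake_run, REDUNDANCY_COMMANDS)
report = format_report(outputs)
print(report)
```

Saving the resulting report before any upgrade or reload keeps the evidence available for a later TAC case.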
12-06-2023 04:50 AM
Do you see any port issues on the switch where the WLC is connected, or any Layer 2 connectivity problems between the WLCs (if they are not connected back to back)?
You can use the commands below to see the reason:
sho chassis ha-status standby
show redundancy switchover history
show version (also shows the reload reason)
show logging (check the logs)
If this is happening, it is also worth upgrading to IOS XE 17.9.x for the bug fixes.
12-06-2023 12:13 PM - edited 12-07-2023 04:05 AM
17.3 is almost end of life, so, as others have said, upgrade as per the TAC-recommended link below.
If I'm reading the limited info you provided correctly, I don't think there was a switchover; the standby might have crashed or reloaded instead. Look for crashinfo and core files on the standby unit, and use the commands Marce has already recommended, plus dir bootflash-2: and dir harddisk-2:core
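As a rough illustration of that reading, the Python sketch below filters syslog lines for %REDUNDANCY-3-STANDBY_LOST and pulls out the fault codes in parentheses. The function name `standby_lost_faults` is made up, and the sample lines are the ones posted at the top of the thread. It only sorts the messages; it does not prove whether a switchover happened.

```python
import re

# Generic Cisco syslog shape: %FACILITY-SEVERITY-MNEMONIC: message text
LOG = re.compile(
    r"%(?P<facility>[A-Z_]+)-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+): (?P<text>.*)$"
)
# Fault code in parentheses, e.g. "(PEER_DOWN)"
FAULT = re.compile(r"\((?P<fault>[A-Z_]+)\)")


def standby_lost_faults(lines):
    """Return the fault codes from %REDUNDANCY-3-STANDBY_LOST messages, in order."""
    faults = []
    for line in lines:
        entry = LOG.search(line)
        if entry is None:
            continue
        if entry["facility"] != "REDUNDANCY" or entry["mnemonic"] != "STANDBY_LOST":
            continue  # other messages (alarms, PKI, ...) are ignored
        fault = FAULT.search(entry["text"])
        if fault is not None:
            faults.append(fault["fault"])
    return faults


LOG_LINES = [
    "Dec 3 15:19:07.495: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)",
    "Dec 3 15:19:07.495: %REDUNDANCY-3-REDUNDANCY_ALARMS: Unable to assert REDUNDANCY alarm",
    "Dec 3 15:19:07.495: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_DOWN)",
    "Dec 3 15:19:07.495: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_REDUNDANCY_STATE_CHANGE)",
    "Dec 3 15:19:21.994: %PKI-4-TRUSTPOOL_DOWNLOAD_FAILURE: Trustpool Download failed",
]

faults = standby_lost_faults(LOG_LINES)
print(faults)
```

All three matching messages describe the standby being lost as seen from the active unit, which fits the idea of looking at the standby's crashinfo and core files.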
12-06-2023 11:43 PM
Hello Rich and thank you for your suggestion. Since I did not find any particular info in the crashinfo file, I'm now raising a TAC case in order to follow the case with an engineer and see if we can find the issue. Thank you all for your help!
12-06-2023 11:55 PM - edited 12-06-2023 11:56 PM
- It's good to do that, but TAC will most likely respond immediately with what we have already said;
try 17.9.4a first
M.
12-07-2023 04:13 AM
Agreed with Marce - TAC will remind you of https://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/ios-xe-17/ios-xe-17-3-x-eol.html and ask why you have not upgraded yet.
Even if they find a new problem with 17.3, they are not allowed to open a bug for it, so really the only thing they can do is recommend an upgrade, as we have already done.