08-21-2024 02:34 PM
I am attempting to upgrade our 9800-80 HA pairs to version 17.12.4 to resolve a certain bug. The upgrade from 17.9.5 went smoothly in the lab, but upgrading a pre-production pair from 17.9.4a has resulted in the standby being in a boot loop with the following message which occurs after a successful bulk sync:
Chassis 2 reloading, reason - Active and Standby configuration out of sync
This was a normal install-mode upgrade by GUI, not an ISSU upgrade.
I have notified TAC, but in the meantime, has anyone else experienced this and found a resolution?
I have attached the console log from the bootup sequence to the point at which the sync issue and reboot occur.
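For anyone triaging a similar console capture, here is a minimal Python sketch that pulls the reload messages out of a saved log and counts them, which makes a boot loop easy to spot. The regular expression is based only on the single message quoted above, so other reload reasons may be worded differently; the function names `find_reloads` and `count_reloads` are hypothetical helpers, not part of any Cisco tooling.

```python
import re
from collections import Counter

# Pattern derived from the one message quoted in this thread; this is an
# assumption about the general format, not a documented log specification.
RELOAD_RE = re.compile(r"Chassis (\d+) reloading, reason - (.+)")


def find_reloads(log_text):
    """Return a list of (chassis_number, reason) for each reload message found."""
    hits = []
    for line in log_text.splitlines():
        match = RELOAD_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2).strip()))
    return hits


def count_reloads(log_text):
    """Count reload messages per (chassis, reason) pair.

    A count above 1 for the same pair suggests the chassis is looping.
    """
    return Counter(find_reloads(log_text))


if __name__ == "__main__":
    sample = (
        "Chassis 2 reloading, reason - "
        "Active and Standby configuration out of sync\n"
    )
    # Simulate two passes through the boot loop in one capture.
    for (chassis, reason), n in count_reloads(sample * 2).items():
        print(f"chassis {chassis}: {n}x {reason}")
```

Run it against the text of a console capture (read the file and pass its contents to `count_reloads`) to see how many times each chassis reloaded and why.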
09-12-2024 04:56 PM - edited 09-12-2024 05:10 PM
I ask because there is an interesting bug "feature" that affects routers, switches, and WLCs that have had an SMU/APDP/APSP applied.
(And it is a "feature" because the devs don't want to fix it and calling it a "bug" is just an insult!)
During bootup of the platform, a built-in script is meant to remove SMU/APSP/APDP pointers if they are "old" relative to the firmware being loaded. Sometimes the script fails, and when the old pointers are still present in the boot-up script but the "old" SMU/APSP/APDP files have been cleaned out, a router/switch/WLC may, potentially, boot into ROMMON.
The only fix is to manually remove the old SMU/APSP/APDP from the bootup script:
install remove file flash:filename.bin
The command "sh install summary" will fail to report an SMU/APSP/APDP stuck in "limbo". The only way to spot one is to catch the errors during bootup with a console cable.
09-12-2024 05:32 PM
But "install remove" doesn't work if you have already upgraded, Leo. At that point clear is the only way to get rid of them. You would have to do the remove *before* upgrading, which makes the whole upgrade process a real mission. So clear is actually quicker, in my opinion.
09-12-2024 04:01 PM - edited 09-13-2024 02:57 AM
---
09-12-2024 07:19 PM - edited 09-13-2024 03:49 AM
Thanks for pointing my mistake out.
I have made corrections.
09-12-2024 03:59 PM
Sorry, I forgot to mention that the hitless SMU was in fact not hitless and initiated a reload, after which the 9800-CL did not recover!
It's not the first time I've seen a "hitless" SMU that was actually a reload SMU though...
Software quality = dreadful!
09-12-2024 06:15 PM
Here’s another fun one that was shared on another forum:
https://bst.cisco.com/bugsearch/bug/CSCwm42613
“Wireless clients are unable to join due to high memory usage - AAA_CHUNK_ATTR_SUBLIST”
I was leaning towards upgrading, but now downgrading seems more appealing.
Wonder if this is contributing to your memory issues, @Leo Laohoo?
09-12-2024 07:16 PM - edited 09-13-2024 03:48 AM
@eglinsky2012 wrote:
CSCwm42613
Thanks for sharing this Bug ID.
Thankfully, I have not received any complaints that point to this.
The Conditions are very troubling: 700+ APs, around 5000 clients on a 9800-80.
Again, the Bug ID talks about the culprit: wncd/wncmgrd.
I may have said it in jest, but I am no longer laughing -- The 03 May 2024 revised Cisco Catalyst 9800 Series Configuration Best Practices may need to be further revised to:
C9800 design is no different and, generally, Cisco recommends limiting the load to around 10% of the AP and client scale.
09-13-2024 02:58 AM
Another nasty one to track <sigh>!
09-20-2024 11:17 AM - edited 09-20-2024 11:24 AM
@Rich R, @Leo Laohoo - APSP2 for 17.12.4 is now out: https://software.cisco.com/download/home/286321396/type/286325254/release/17.12.4
Not sure what happened to APSP1... and I didn't even get an email for this release, even though I've checked and re-checked my email notifications for each category of 9800 updates! Grrr.
In other news, here's the final verdict from TAC on my 17.12.4 (or perhaps, not software version-related) config sync issue:
I think the scenario you faced was just unfortunate [a fluke - EG]. Of course, it is certainly a best practice to delete the persistent database before upgrading a pair of WLC, so if you plan to upgrade another pair in the future you can follow the same steps to make sure everything will go smoothly.
09-20-2024 07:51 PM
Thanks. APSP2 fixes CSCwm48646 which we hit.
Waiting for an SMU to fix 17.12.3/17.12.4 memory leak due to NMSP (DNAC Spaces) before we upgrade to 17.12.4.