
9800-80 active and standby configuration out of sync - 17.12.4

eglinsky2012
Spotlight

I am attempting to upgrade our 9800-80 HA pairs to version 17.12.4 to resolve a certain bug. The upgrade from 17.9.5 went smoothly in the lab, but upgrading a pre-production pair from 17.9.4a has left the standby in a boot loop. The following message appears after a successful bulk sync:

Chassis 2 reloading, reason - Active and Standby configuration out of sync

This was a normal install-mode upgrade by GUI, not an ISSU upgrade.

I have notified TAC, but in the meantime, has anyone else experienced this, and does anyone know of a resolution?

I have attached the console log from the bootup sequence to the point at which the sync issue and reboot occur.

39 Replies

I ask because there is an interesting bug "feature" that affects routers, switches, and WLCs that have had an SMU/APDP/APSP applied.

(And it is a "feature" because the devs don't want to fix it, and calling it a "bug" is just an insult!)

During bootup, a built-in script is meant to remove SMU/APSP/APDP pointers that are "old" relative to the firmware being loaded. Sometimes this script fails. If the old pointers are still present in the boot-up script but the "old" SMU/APSP/APDP files have already been cleaned out, the router/switch/WLC may boot into ROMMON.

The only fix is to manually remove the old SMU/APSP/APDP from the bootup script:

 

install remove file flash:filename.bin

 

The command "sh install summary" will not report an SMU/APSP/APDP that is stuck in "limbo". The only way to spot one is to catch the errors during bootup with a console cable.

 

But "install remove" doesn't work if you have already upgraded, Leo; then clear is the only way to get rid of them. You would have to do the remove *before* upgrading, which makes the whole upgrade process a real mission. So clear is actually quicker, in my opinion.


Thanks for pointing out my mistake.

I have made corrections. 

Sorry, I forgot to mention that the "hitless" SMU was in fact not hitless: it initiated a reload, after which the 9800-CL did not recover!
It's not the first time I've seen a "hitless" SMU that was actually a reload SMU though...

Software quality = dreadful!

eglinsky2012
Spotlight

Here’s another fun one that was shared on another forum:

https://bst.cisco.com/bugsearch/bug/CSCwm42613
Wireless clients are unable to join due to high memory usage - AAA_CHUNK_ATTR_SUBLIST

I was leaning towards upgrading; now downgrading seems more appealing.

I wonder if this is contributing to your memory issues, @Leo Laohoo?


@eglinsky2012 wrote:
CSCwm42613

Thanks for sharing this Bug ID.  

Thankfully, I have not received any complaints that point to this.

The conditions are very troubling: 700+ APs and around 5,000 clients on a 9800-80.

Again, the Bug ID names the culprit: wncd/wncmgrd.

I may have said it in jest, but I am no longer laughing. The Cisco Catalyst 9800 Series Configuration Best Practices, revised 03 May 2024, may need to be revised further to read:

C9800 design is no different and, generally, Cisco recommends limiting the load to around 10% of the AP and client scale.

 

Another nasty one to track <sigh>!  

eglinsky2012
Spotlight

@Rich R, @Leo Laohoo - APSP2 for 17.12.4 is now out: https://software.cisco.com/download/home/286321396/type/286325254/release/17.12.4

Not sure what happened to APSP1... and I didn't even get an email for this release, even though I've checked and re-checked my email notifications for each category of 9800 updates! Grrr.

In other news, here's the final verdict from TAC on my 17.12.4 (or perhaps not software-version-related) config sync issue:

I think the scenario you faced was just unfortunate [a fluke - EG]. Of course, it is certainly a best practice to delete the persistent database before upgrading a pair of WLC, so if you plan to upgrade another pair in the future you can follow the same steps to make sure everything will go smoothly.

Thanks. APSP2 fixes CSCwm48646, which we hit.

We are waiting for an SMU to fix the 17.12.3/17.12.4 memory leak caused by NMSP (DNAC Spaces) before we upgrade to 17.12.4.
