08-21-2024 02:34 PM
I am attempting to upgrade our 9800-80 HA pairs to version 17.12.4 to resolve a certain bug. The upgrade from 17.9.5 went smoothly in the lab, but upgrading a pre-production pair from 17.9.4a has left the standby in a boot loop. The following message appears after a successful bulk sync:
Chassis 2 reloading, reason - Active and Standby configuration out of sync
This was a normal install-mode upgrade by GUI, not an ISSU upgrade.
I have notified TAC, but in the meantime, has anyone else experienced this and found a resolution?
I have attached the console log from the bootup sequence to the point at which the sync issue and reboot occur.
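For anyone who wants to check their own console capture for the same loop, here is a minimal sketch in Python. It only assumes the reload message is worded exactly like the line quoted above; other releases may phrase it differently. The names `RELOAD_RE` and `count_reloads` are made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the single console line quoted in the post.
# Assumption: other reload reasons follow the same "Chassis N reloading,
# reason - ..." wording, which may not hold on every release.
RELOAD_RE = re.compile(r"Chassis (\d+) reloading, reason - (.+)")


def count_reloads(log_text: str) -> Counter:
    """Count (chassis number, reason) pairs found in a console capture."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RELOAD_RE.search(line)
        if match:
            chassis = int(match.group(1))
            reason = match.group(2).strip()
            counts[(chassis, reason)] += 1
    return counts


# Usage: paste or read the saved console log, then list what was found.
sample_log = (
    "Chassis 2 reloading, reason - "
    "Active and Standby configuration out of sync\n"
)
for (chassis, reason), hits in count_reloads(sample_log).items():
    print(f"chassis {chassis}: {hits} reload(s), reason: {reason}")
```

A count above one for the same chassis and reason in a single capture is a quick way to confirm a loop rather than a one-off reload.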
09-10-2024 04:30 PM
@Rich R wrote:
Frankly I'm dubious that deleting the binary config (which file is that by the way?) will make any difference.
The instructions were to run the following commands on both the active and the standby. I was able to do this via the console port since we have a console server; otherwise they suggested SSH to the RMI IP.
delete /force /recursive bootflash:.dbpersist/persistent-config.tar.gz
delete /force /recursive bootflash:.dbpersist/persistent-config.meta-
Then reload the stack, both units together ("reload" command).
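If it helps to keep the order straight during a change window, the steps above can be written out as a checklist. This is only a sketch: it prints text and does not connect to any device. The commands are copied verbatim from this post (the second path exactly as it appears here), and `build_checklist` and the "both" label are names made up for illustration.

```python
# Commands copied verbatim from the post; the second path is reproduced
# exactly as it was posted and is not completed or corrected here.
CLEANUP_COMMANDS = [
    "delete /force /recursive bootflash:.dbpersist/persistent-config.tar.gz",
    "delete /force /recursive bootflash:.dbpersist/persistent-config.meta-",
]


def build_checklist(units=("active", "standby")):
    """Return ordered (unit, command) steps: clean up each unit, then reload once."""
    steps = [(unit, cmd) for unit in units for cmd in CLEANUP_COMMANDS]
    # The post says to reload both units together, so a single reload
    # step is listed last instead of one reload per unit.
    steps.append(("both", "reload"))
    return steps


for number, (unit, command) in enumerate(build_checklist(), start=1):
    print(f"{number}. [{unit}] {command}")
```

Listing the reload as one final step is deliberate, since the instructions were to reload both units together after the files are removed on each.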
09-10-2024 04:05 PM - edited 09-10-2024 04:07 PM
Thanks for the info about CSCwj77042, @jasonm002. I will ask for that before going for 17.12.4.
PS: I've already queried why some of the fixes in 17.9.5 APSP5 are missing from 17.12.4! And I've already confirmed that some are fixed but the bug database hasn't been updated, which seems to be a fairly frequent thing these days!
Please click Helpful if this post helped you and Select as Solution (drop down menu at top right of this reply) if this answered your query.
------------------------------
TAC recommended codes for AireOS WLC's and TAC recommended codes for 9800 WLC's
Best Practices for AireOS WLC's, Best Practices for 9800 WLC's and Cisco Wireless compatibility matrix
Check your 9800 WLC config with Wireless Config Analyzer using "show tech wireless" output or "config paging disable" then "show run-config" output on AireOS and use Wireless Debug Analyzer to analyze your WLC client debugs
Field Notice: FN63942 APs and WLCs Fail to Create CAPWAP Connections Due to Certificate Expiration
Field Notice: FN72424 Later Versions of WiFi 6 APs Fail to Join WLC - Software Upgrade Required
Field Notice: FN72524 IOS APs stuck in downloading state after 4 Dec 2022 due to Certificate Expired
- Fixed in 8.10.196.0, latest 9800 releases, 8.5.182.12 (8.5.182.13 for 3504) and 8.5.182.109 (IRCM, 8.5.182.111 for 3504)
Field Notice: FN70479 AP Fails to Join or Joins with 1 Radio due to Country Mismatch, RMA needed
How to avoid boot loop due to corrupted image on Wave 2 and Catalyst 11ax Access Points (CSCvx32806)
Field Notice: FN74035 - Wave2 APs DFS May Not Detect Radar After Channel Availability Check Time
Leo's list of bugs affecting 2800/3800/4800/1560 APs
Default AP console baud rate from 17.12.x is 115200 - introduced by CSCwe88390
09-10-2024 06:14 PM
Just want to add that we've also hit this bug when control-plane memory utilization is north of 45%: CSCwi78109
We've observed this bug to be present in 17.9.X and 17.12.3.
09-10-2024 11:17 PM
Yes indeed... Our SE pointed that one out to me because they have published an SMU for CSCwi78109 on 17.12.4. I tried testing the SMU on a 9800-CL in the lab on Monday and it left the WLC unbootable! I recovered it from the console by reverting to the golden image, then deleting the SMU, then clearing the install state. I haven't had a chance to try again, or to try it on a 9800-80, but I decided to just use the workaround since we don't need nmsp enabled <smile>. So proceed cautiously with that SMU.
09-10-2024 11:31 PM
Thanks for the tip!
09-11-2024 03:35 PM
@Leo Laohoo, @Rich R, where does this leave us? Is 17.12.4 worth a try, with the CSCwi78109 SMU and APSP1 for CSCwj77042 applied? Any other known debilitating issues?
I flat-out asked TAC what to do, since 17.9.5/APSP5 seems to have made the CSCwj45141 / CSCwk48338 issue worse. We've even had to reboot a bunch of 2800s over the last couple of weeks.
Yet we have all these issues on 17.12. I'm flat-out scared of either 17.12.3 or 17.12.4 at this point and seriously contemplating going back to 17.9.4a/APSP8. On that release we had to reboot our high-density 9100 series on a schedule, but otherwise we didn't really have any other client-affecting issues.
09-11-2024 07:35 PM
@eglinsky2012 wrote:
is 17.12.4 worth a try
Unfortunately, we do not have a choice.
At the end of the day, it all boils down to poor coding and nonexistent quality control. And both factors are outside our control.
@eglinsky2012 wrote:
We've even had to reboot a bunch of 2800s over the last couple weeks.
If this works, a daily/weekly reboot of the 2800/3800/4800/1560 (a cold reboot is better) and a bi-yearly/yearly reboot of the controllers would be ideal. At the very best, the bugs become "familiar" and everyone has a known method to perform the workaround. Going to 17.12.X is going to be a big risk because everyone will have to "help Cisco find bugs".
09-12-2024 02:52 AM
Agreed - I'm working on the assumption that 17.12.4 is the best of the bad lot at present.
We've had 91xx 5GHz radios silently stop responding (no errors, no logs, WLC still thinks the radio is up and working but zero clients) on 17.9.4 APSP6 requiring reboots, so you can't win either way.
09-12-2024 04:29 AM
I read the daily bug reports and cry myself to sleep.
09-12-2024 05:37 AM
@Leo Laohoo - I do the same for my government's tax bills - LOL !
M.
-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
The mirror then always responds with ' The only thing that exceeds your brilliance is your beauty! '
09-12-2024 05:36 AM
@Rich R wrote:
We've had 91xx 5GHz radios silently stop responding (no errors, no logs, WLC still thinks the radio is up and working but zero clients)
Smells like CSCvx56223.
09-12-2024 05:45 AM
Yes except that CSCvx56223 *should* be fixed in 17.9.4 ...
09-12-2024 02:09 PM - edited 09-12-2024 03:44 PM
@Rich R @Leo Laohoo A couple updates.
Still haven't heard back from TAC on next steps, but I went ahead and upgraded the pre-production controller pair from 17.9.4a to 17.9.5 with APSP5 and the first 3 published SMUs (I now see that there are 2 more I wasn't aware of). Then I went to 17.12.4 and the CSCwi78109 NVGEN error SMU. No more config sync issues so far on that pair; however, there were a couple of surprises:
1. Apparently I also forgot to commit 17.12.4 in the lab. When I went to install the SMU there, I found it back on 17.9.5. Upgrading again to 17.12.4 yielded a familiar issue: Standby rebooting due to config sync issue! Just like the other, pre-production controller. That didn't happen after the first time I upgraded it, at least not that I realized... perhaps I missed it. Anyway, I issued the below commands on both the active and the standby (in the very short window between CLI availability and the config sync issue/reboot on the standby):
delete /force /recursive bootflash:.dbpersist/persistent-config.tar.gz
delete /force /recursive bootflash:.dbpersist/persistent-config.meta-
... then rebooted the active while the standby was also starting to reboot, and the units synced successfully upon booting back up together. However, after rebooting the units for the SMU (more on that below), the standby once again reloaded due to "Active and Standby configuration out of sync". I did not intervene this time. When the standby started rebooting, it rebooted again early in the boot process ("system requested reload"), after the chassis discovery. (See the attached abridged console output, from the initial config sync issue to the second reboot.) After the second reboot, they once again synced and stayed running. I'll see if it remains stable overnight and do some test reloads tomorrow to see if they stay stable.
2. The CSCwi78109 NVGEN error SMU for 17.12.4 is NOT hitless, it's a reload SMU! The software downloads page lies:
Per the WLC, it is a reload SMU (and my WLCs did in fact reload):
09-12-2024 03:31 PM - edited 09-12-2024 07:19 PM
@eglinsky2012 wrote:
2. The ... SMU for 17.12.4 is NOT hitless,
1. More evidence that the developers do not test their code.
2. Always assume an SMU is NEVER "hitless".
When you tested the lab WLC, did you see the bootup in console?
09-12-2024 03:42 PM
@Leo Laohoo wrote:
When you tested the lab WLC, did you see the bootup in console?
Yes, that snippet I attached in my previous message was from the console. I can provide the full output from active and standby if there's interest, but it's much the same as the one I posted at the beginning of this thread.
