9800 ISSU behavior

eglinsky2012
Spotlight

I'm attempting an ISSU upgrade from 17.9.3 to 17.9.4 on 9800-80s. In the lab, it worked well, as expected. The process installed the image to both active and standby, then predownloaded the image to the APs. The predownload failed on the 2700, but life went on. Then the ISSU process stopped, and in the GUI, there was a button to continue the upgrade (I forget exactly what it said). Once I clicked that, the standby rebooted. It took a long time for SSO to reach a terminal state (there was an error in the logs about a software mismatch), but after 15 minutes or so, the active finally rebooted. Then the APs did staggered reboots. Once ISSU was complete, the 2700 that had failed the predownload downloaded its new image and rebooted, which is good.

I was happy with how it went in the lab, so I tried on a production WLC which currently has no APs associated. Upon downloading and installing, it ran straight through the whole process, without stopping after the predownload stage before rebooting the standby. It got stuck at the "upgrading standby" stage; the standby kept rebooting every few minutes (I assume due to SSO sync failure, but I didn't think to check), so I did a "reload" on the active WLC, thinking sync would complete after the reboot, when they would be on matching versions. After the reboot, sync completed, but the ISSU process was stuck on "upgrading active" even after more than an hour of waiting. So, I did a "redundancy reload shelf," and both WLCs were on 17.9.4 and in sync, but ISSU was still stuck at the "upgrading active" step and the commit timer was still running. So, I did an ISSU terminate, back to 17.9.3.

I tried the process again, this time with a single AP associated, and the process did finish cleanly. However, it again continued straight through the reboots instead of stopping after the predownload. This is undesirable, since I want to do the install and predownload one day and the reboots the next day. The behavior in the lab would make that possible, but the behavior on the production WLC would not.

My question is, what's the normal behavior? Have any of you who have tried the ISSU process had good results, or is it glitchy?
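
For reference, a staged ISSU run driven from the CLI rather than the GUI might look roughly like the sketch below. The image filename is a placeholder, and the exact prompts and pauses can differ by release, so treat this as an outline to check against the configuration guide for your version, not a verified procedure.

```
! Day 1: stage the image and push it to the APs (no reloads expected here)
install add file bootflash:<new-image>.bin
ap image predownload
show ap image
! Day 2: standby reloads first, then switchover, then staggered AP reboots
install activate issu
show issu state detail
show install summary
! Once both chassis are on the new version and SSO is healthy
install commit
! To back out before committing
install abort issu
```

Running the steps one at a time like this gives an explicit stopping point between the predownload and the activation, which is the split described above. For a non-ISSU install-mode upgrade, the outline should be the same with plain `install activate` in place of `install activate issu`.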

23 Replies

I just completed firmware upgrading about 80 x 9500 (stand alone & VSS) to 17.9.4a (Install Mode) & without using DNAC, ISSU, FSU/eFSU/xFSU.  

I uploaded the firmware (and unpacked the packages) back in December 2023 and scheduled the reboot for 03 January 2024.  

No failure.  

eglinsky2012
Spotlight

For those of you who have had to use the "clear install state" command in the past, was that specifically to resolve/avoid ISSU problems? Or should it also be run ahead of a standard install mode upgrade when there is an APSP, APDP, or SMU installed? I'm preparing for my first upgrade off a version on which I have both an SMU and an APSP installed, and I'm wondering whether I need to do the "clear install state" and wait for the resulting reload.

We had to use the clear when just trying to upgrade an SSO pair from 17.6.6a to 17.12.3 because the install was failing for no obvious reason (worked first time after the clear)!

We're planning other upgrades (17.9.4 + SMUs + APSP -> 17.12.3) and have made no final decision yet on how we'll do them.  Removing the APSP first means the APs will be forced to do multiple downloads, but if we do the clear then we won't be able to install and pre-download as normal! We might have to force the APs onto the N+1 WLC while upgrading their normal primary, manually load the new image onto the APs, then swap image and push them back to the upgraded WLC, and then do the N+1.  Either way it's messy!  We don't even consider using ISSU anymore, just too much extra hassle/risk for us.  When someone can tell me it works first time, every time, with no problems I'll reconsider.

Whether this is the first, second, or third time doing ISSU, always raise a proactive TAC Case.  Make sure the TAC agent joins a WebEx session with the WLC and records it.  

If something eventually fails, the TAC agent's job is to immediately intervene and stabilize the network.

When ISSU fails and the two units go into a "ping pong" state, it can be a soul destroying moment watching the network slowly grind to a halt.  However, no one can hear the scream/cry on the drive (alone) to the site to yank the power out of the offending unit.  

eglinsky2012
Spotlight

@Rich R, @Leo Laohoo - Thank you for the advice.

Leo, sorry I didn't make it clear, but I am not considering an ISSU upgrade. (I guess I should have started a new thread for this question.) Rather, I plan a regular SSO upgrade with predownload. I'm just not sure if I should plan to do a "clear install state" ahead of time, or if that was only a problem with ISSU. Rich said it was also a problem with a regular upgrade for him. I'd rather not have to do the clear for reasons Rich outlined (extra reload, no predownload after the clear install state and reverting back to 17.9.4a with no APSP). I guess I'll take my chances without the clear install state, and if I have to use it, I may just have to install the APSP another day if I run out of time during the maintenance window.

Also, Rich, why are you going with 17.12.3 over 17.9.5? I'm a fan of MD over ED in general and bigger numbers at the end, but I do remember seeing some conversation about how 17.12.3 solved some issues from the earlier 17.9 versions. Are those not resolved in 17.9.5? Another consideration is that it doesn't look like there are any APSPs for 17.12.3 (maybe since it's not an MD?), so we might not get the latest AP bug fixes on that version until 17.12.4 comes out.

> Also, Rich, why are you going with 17.12.3 over 17.9.5?
- Crash on lab WLC running 17.9.5 while doing almost nothing (1 or 2 APs) which TAC couldn't satisfactorily explain
- Seeing new bugs getting fixed in 17.12 but not 17.9 and Cisco actively avoiding fixing things in 17.9 unless pressed.  I expect this is in the lead up to 17.12 becoming the preferred release before 17.9 goes end of software maintenance on 30 March 2025.
- The succession of new issues (regressions) popping up on 17.9 - reminiscent of what we saw happening on 17.3.  I get the impression that when they try to backport new fixes to older code they are more likely to introduce regressions which might also explain their apparent reluctance to fix things in 17.9.
- General reports of 17.12.3 being a good choice so far.  It also provides some new features which we could find handy and is a good upgrade path to 17.15 (currently in beta) which will be required for the upcoming WiFi 7 APs. 

* Note these are my personal opinions.

(Just to add to @Rich R response to "why 17.12.3".)

We have moved all of our production 9800-80 (Qty:  6) and test controllers (9800-LF x 4) to 17.12.3.  We have decided not to dilly-dally with 17.9.X.  Too many things that go "kaboom" in the night do not sit well with us (too many TAC cases).  And when 17.9.6 (and anything 17.9.X) is out, I will ignore it.  

I would also like to share something with everyone:  https://imgur.com/a/ePuUnR9

This is one of our production 9800-80s on 17.12.3 (uptime of 5 weeks), and it has 3050 APs.  The "inflection point", around 16:00 AEST on 16 May 2024, is when the number of APs went above 2900.  

Because of this graph, we have now come to the conclusion that the 9800-80 can only support up to 3000 APs.  


@eglinsky2012 wrote:
Rather, I plan a regular SSO upgrade with predownload. I'm just not sure if I should plan to do a "clear install state" ahead of time, or if that was only a problem with ISSU. Rich said it was also a problem with a regular upgrade for him. I'd rather not have to do the clear for reasons Rich outlined (extra reload, no predownload after the clear install state and reverting back to 17.9.4a with no APSP). I guess I'll take my chances without the clear install state, and if I have to use it, I may just have to install the APSP another day if I run out of time during the maintenance window.
conf t
 service internal
end
clear install state
conf t
 no service internal
end

The command is only when ISSU has failed.  Otherwise, don't use it.  

Please have a look at the PDF attached.  

 

> The command is only when ISSU has failed
In fact it will fix any corruption of the install database, not just ISSU failures.  That includes the problems with SMU and APSP not being correctly removed and random corruptions like what we encountered with the 17.6.6a to 17.12.3 upgrade (no SMU or APSP were installed).

The only way to reliably prevent the SMU and APSP from leaving the database corrupt is to either:
1. Deactivate and remove each before upgrade or
2. Use the clear command

Option 1 means the APs will reboot and download again after the APSP is removed, and the WLC will reboot if the SMU is a reload SMU, so it can take a while and involve both AP and WLC reloads.
So sometimes it's just quicker and easier to do a clear then upgrade.
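
Option 1 could be sketched roughly as below, assuming the SMU and APSP were installed through the normal install workflow. The filename is a placeholder (take the real ones from the `show install summary` output), and an APSP may need a rollback-based flow instead on some releases, so check the guide for your version before relying on this.

```
! List what is currently added, activated, and committed
show install summary
! Deactivate the patch (repeat per SMU/APSP file), then make it persistent
install deactivate file bootflash:<smu-or-apsp-file>.bin
install commit
! Clean up packages that are no longer active
install remove inactive
show install summary
```

As noted above, deactivating a reload SMU reloads the WLC and removing an APSP makes the affected APs download again, so this belongs inside a maintenance window.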

Of course none of this would be necessary if the install scripts just worked correctly and didn't leave the install database corrupted!  I live in hope that they will eventually work properly <smile>  On the plus side, "install remove inactive" seems to be about 10 times quicker from 17.9 onwards, so they've clearly made some improvement.  On older versions I repeatedly had my session time out while waiting for the prompt, so I had to attempt it a few times before succeeding!
