
9800 17.9.x APSP SMU install bug and workaround

jasonm002
Level 1

In 17.9.x it is possible to get a 9800-40 or 9800-80 in an HA SSO pair (with RMI) into a state where an APSP SMU from a previous version is distributed to APs as the active image. For example, I'm on 17.9.3 but APs are downloading 17.9.1 and joining the controller with 17.9.1. That is an invalid state, and it causes them to repeatedly disconnect and rejoin the controller on a timer.

I don't know the exact trigger, but I hit it on two separate pairs by doing roughly this (I might be leaving out some steps that I can't remember):

Upgrade to 17.9.1

Install 17.9.1 APSP SMU

Upgrade to 17.9.2 without removing the 17.9.1 APSP SMU

install remove inactive

Upgrade to 17.9.3

install remove inactive

Attempt to install 17.9.3 APSP1 SMU. The install throws an error that it can't find the 17.9.1 APSP SMU, which a previous rollback point expects to be there

Put the 17.9.1 APSP SMU files back on active and standby (both running 17.9.3)

Attempt to install 17.9.3 APSP1 SMU again. The install appears to succeed, but when I do "show ap image file summary" I see in the AP image active list that the 17.9.1 APSP code is the active AP image.

I had a TAC case open on this, but I don't really have the time to get the DE to accept it as a bug through TAC. The workaround TAC gave me was to make sure "service internal" is configured and then run "clear install state" in exec mode, which destroys all the SMUs and rollback points and reloads the boxes.

So just FYI: if anyone else runs into this, SMUs still seem to be bugged in general as of 17.9.x. If you're getting install errors about previous SMUs not being there, configure service internal, run clear install state, and let the boxes reload.



Leo Laohoo
Hall of Fame

That is not how SMU, APSP and APDP work. 

SMU, APSP and APDP work on a specific firmware version.  A SMU, APSP or APDP made for, say, 17.9.1 will only work on 17.9.1.  If the WLC is upgraded to 17.9.3, all the 17.9.1 SMU, APSP and APDP will be deleted and removed.

If you want a specific SMU, APSP and APDP "ported" from one version to another, your Cisco AM/SE will need to be engaged because it will require some coordination for a developer to re-write a SMU, APSP and APDP.

This is definitely not how it's supposed to work, but it's bugged at the moment and you can definitely get a 9800-40 or 80 into the state where it will make an old APSP image active and download it to APs on a version that it's not intended to be compatible with. I had a TAC case open on this and they were able to confirm that they have multiple cases where this has happened and the only thing they can do is offer people the workaround of enabling service internal and then clear install state (which reboots the WLC). 

I don't have the time to work with TAC and be Cisco's free QA unfortunately, so I just closed the case after receiving the workaround. 

TAC mentioned that the reason why the old SMUs need to be on the flash is because there are rollback points that reference those SMUs. Fine, but in this case we clearly see a couple of bugs.

Bug #1 is that the exec command "install remove inactive" deletes these SMUs even though they're being referenced by rollback points in the system. Install remove inactive should check all rollback points for the existence of an SMU and not delete it from the flash if it's being referenced by a rollback point. In my case this is what one of the rollback ids looked like that contained the old SMU:
9800#show install rollback id 6
Rollback id - 6 (Created on 2022-09-14 05:28:18.000000000 -0400)
Label: No Label
Description: No Description
Reload required: YES
State (St): I - Inactive, U - Activated & Uncommitted,
C - Activated & Committed, D - Deactivated & Uncommitted
--------------------------------------------------------------------------------
Type St Filename/Version
--------------------------------------------------------------------------------
IMG C 17.09.01.0.178
??? ? bootflash:C9800-universalk9_wlc.17.09.01.CSCwc82827.SPA.apsp.bin  

The file was missing because "install remove inactive" allowed this file to be deleted even though it was being referenced by an active rollback point, which should not happen.
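To illustrate the check I think "install remove inactive" should be doing, here is a minimal Python sketch. It parses the text of "show install rollback id" (the sample is the output pasted above) and reports any referenced bootflash: file that is not in a list of files present on flash. The function names and the files_on_flash input are hypothetical; you would have to collect that list yourself (for example from a directory listing) in the same "bootflash:name" form. This is not how the install code actually works, just a sketch of the missing sanity check.

```python
SAMPLE = """\
Rollback id - 6 (Created on 2022-09-14 05:28:18.000000000 -0400)
Label: No Label
Description: No Description
Reload required: YES
State (St): I - Inactive, U - Activated & Uncommitted,
C - Activated & Committed, D - Deactivated & Uncommitted
--------------------------------------------------------------------------------
Type St Filename/Version
--------------------------------------------------------------------------------
IMG C 17.09.01.0.178
??? ? bootflash:C9800-universalk9_wlc.17.09.01.CSCwc82827.SPA.apsp.bin
"""


def referenced_files(rollback_output):
    """Return the bootflash: paths named in the rollback entries table."""
    files = []
    for line in rollback_output.splitlines():
        parts = line.split()
        # Entry rows have three columns: type, state, filename or version.
        # Only rows whose third column is a bootflash: path name a file.
        if len(parts) == 3 and parts[2].startswith("bootflash:"):
            files.append(parts[2])
    return files


def missing_files(rollback_output, files_on_flash):
    """Return referenced files that are absent from files_on_flash.

    files_on_flash is a hypothetical input: an iterable of paths written
    as 'bootflash:name', gathered however you like.
    """
    present = set(files_on_flash)
    return [f for f in referenced_files(rollback_output) if f not in present]


if __name__ == "__main__":
    # With an empty flash listing, the APSP file shows up as missing.
    for path in missing_files(SAMPLE, []):
        print("rollback point references a missing file:", path)
```

Running something like this against every rollback id before "install remove inactive" would at least tell you which SMU files are still being referenced.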


Bug #2 is that, at least in 17.9.3, there is definitely a sequence of upgrade events that leaves the 17.9.1 APSP SMU code in the active AP image list, and the controller then downloads 17.9.1 code onto the APs even though it's running 17.9.3. This might be a side effect of bug #1, but they should really add more checks in the code so that if the AP code in the AP image active or prepare list (exec: show ap image file summary) doesn't match the version on the controller, it doesn't actually get downloaded onto the APs.
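The kind of guard I mean could look roughly like the Python sketch below. It assumes the first three dotted fields of a version string identify the release, so that "17.09.01.0.178" and "17.9.1" count as the same release; that is my assumption, not documented behaviour. The function names and the example AP image version strings are made up for illustration, and you would have to pull the real values out of "show ap image file summary" yourself.

```python
def normalize(version):
    """Reduce a dotted version to its first three fields as integers.

    Assumption: the first three fields identify the release, so
    '17.09.01.0.178' and '17.9.1' both become (17, 9, 1).
    """
    return tuple(int(part) for part in version.split(".")[:3])


def mismatched_ap_images(controller_version, ap_image_versions):
    """Return AP image versions whose release differs from the controller's."""
    target = normalize(controller_version)
    return [v for v in ap_image_versions if normalize(v) != target]


if __name__ == "__main__":
    # Illustrative inputs only; the AP image version strings are invented.
    bad = mismatched_ap_images("17.9.3", ["17.9.3.50", "17.09.01.0.178"])
    for version in bad:
        print("AP image does not match controller release:", version)
```

If the returned list is not empty, the image should not be offered to APs.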


Thanks @Rich R & @jasonm002 for the updates.  

I'll keep an eye out for this behaviour.  

Rich R
VIP

No Leo I agree with @jasonm002 - I've seen the same thing happen where the install script does NOT properly remove the previous version SMU when you upgrade and then continues to cause problems forever after (it seems to keep reference to it in some kind of registry even though the file is physically removed).  My own conclusion, which I believe agrees with Jason, is that the only safe thing to do is uninstall the SMUs and SPs before starting the upgrade.  I saw this with 17.6.

I didn't have time to open a TAC case (which can take weeks these days for even the most trivial problem), so at least I know the workaround for the future, thanks Jason.  I think I downgraded, then removed the SMU, then upgraded again to work around it.

Nobody would want a SMU or APSP ported to a new version anyway because those fixes are normally already in the later release.


@Rich R wrote:
I've seen the same thing happen where the install script does NOT properly remove the previous version SMU when you upgrade and then continues to cause problems forever after (it seems to keep reference to it in some kind of registry even though the file is physically removed).

SMU can be removed before or after.  

I use the command "sh version" or "show install summary" to get the file that just won't go away. Then use the command "install remove file bootflash:filename" to remove the infernal file.

I can assure you Leo - that does NOT work when this happens.  I tried it multiple times in every possible way.  The file is removed but the reference to it in the install registry is not, and the install script is incapable of resolving that.  You either have to revert to the previous software with the SMU re-installed and then remove it manually before the upgrade, or use the workaround Jason provided.

I never spent enough time to work out exactly what combination of install commands and/or SMU type triggers it (because all the installs and reloads are VERY time consuming) but it is definitely a thing, and install remove definitely does not solve it.  Just recovering from it wasted enough of my time. Thinking about it now, it might even have been related to trying to use ISSU but my memory of it is a bit faded now.

I've seen this sort of "sticky" or "stuck" SMU in 16.12.X.  Initially, I just ignored them, but from 16.12.6 we've gradually noticed that the bootup time is slightly longer.

This is when we found out about the command "install remove file bootflash:filename" which successfully removed the stuck SMU. 
