8386 Views, 42 Helpful, 16 Replies

9800 ISSU upgrade problem

schulcz
Level 1

Hi Guys,

I tried to upgrade an HA pair (SSO) from 17.3.5b to 17.6.3.

The process started, the standby WLC downloaded the image, and the new image predownload for the APs was successful.

After that, the standby WLC started to reboot, and after it came back, the log on the active member showed this:

ERROR: MCL-Terminating the current ISSU Operation. ISSU Abort operation is initiated
ERROR: Once ISSU abort is done please check full list of mismatched commands via:
ERROR: show redundancy config-sync failures historic mcl
ERROR: install_activate  exit(2 ) Mon Sep 12 20:15:47 CET 2022

The standby rebooted again and came back with the original software, so it reverted to the original state. I checked "show redundancy config-sync failures historic mcl" on the active, and it shows the "username xxx priv 15 secret xxx" rows from the config.

I wondered whether I could log in to the standby. I tried, but I couldn't: it showed authentication failed. I added the users on the active again, and after that I could log in on the standby as well.

What caused the problem, and what can I do? Should I forget about ISSU?

16 Replies

I'm planning a similar upgrade, 17.3.5a to 17.6.4, on a 9800-80 (SSO) pair soon. I will let you know if I experience any issues like this.

I think TAC is best placed to look at your logs and advise. If you haven't reached out to TAC yet, try that path.

HTH
Rasika

Leo Laohoo
Hall of Fame

@schulcz wrote:
What caused the problem, and what can I do? Should I forget about ISSU?

Please stop believing in the hype about ISSU, FSU/eFSU/xFSU -- They only work in labs and/or in "corner cases".  In production, they usually fail.

I'm beginning to believe you're right; I've never had success on any platform when using the ISSU method.
Then what do you think is the recommended upgrade method for an HA SSO pair to minimize downtime as much as possible?

There is a feature in 17.6.3 that I like:  Administration > Software Management > Software Upgrade. 

The feature is "Enable Hitless Upgrade" (N+1 Upgrade)

When this feature is enabled: 

  1. The Active controller will instruct the APs with ZERO clients to move to the Secondary controller.
  2. Secondary controller, through Mobility Group, will tell the Active controller which APs have joined the Secondary controller. 
  3. Active controller will start moving clients to the Secondary controller until all APs are moved to the Secondary controller. 
  4. Active controller reboots to new version. 
  5. Secondary controller moves APs with ZERO clients to the Active controller. 
  6. Secondary controller moves clients to Active controller until all APs are moved to Active controller. 
  7. Secondary controller reboots to new firmware 


@schulcz wrote:
what is the recommended upgrade method for an HA SSO pair to minimize downtime as much as possible?

Bite the bullet.  Organize for the WLC and APs to reboot outside business hours, like 5:00 am.

And use one-shot Install Mode: 

install add file bootflash:C9800-80-universalk9_wlc.17.03.05b.SPA.bin activate commit 

Be aware that 17.3.6 is about to drop in two weeks.  

Arshad Safrulla
VIP Alumni

@leo Hitless upgrade works only with N+1, not with SSO, and the OP's WLCs are in HA SSO.

Are both of your WLCs in install mode? Did you check whether the SSO state had been reached before you started the ISSU?

As per the release notes "

  • Controller upgrade from Cisco IOS XE Bengaluru 17.3.x to any release using ISSU may fail if the snmp-server enable traps hsrp command is configured. Ensure that you remove the snmp-server enable traps hsrp command from the configuration before starting an ISSU upgrade because the snmp-server enable traps hsrp command is removed from Cisco IOS XE Bengaluru 17.4.x."

But I agree with Leo as I have tested this in my Lab with positive results, but never on production as I never had successful ISSU upgrades with other Cisco devices/platforms (not 9800).


@Arshad Safrulla wrote:
@leo Hitless upgrade works only with N+1

I did say "The feature is "Enable Hitless Upgrade" (N+1 Upgrade)".


@Arshad Safrulla wrote:
but never on production as I never had successful ISSU upgrades with other Cisco devices/platforms (not 9800).

Believe me when I say, "you're not missing much".  

Cisco has never perfected ISSU, FSU/eFSU (and now xFSU) in classic IOS.  With the rate of garbage code being churned out, it is worse.  This is why there is a strong "push" from customer-facing Cisco staff because the developers themselves cannot be "bothered" to beta test their own code.  

JPavonM
VIP

My recommended manual way of doing it is the following (a lot of interaction, but ISSU always fails for me):

  1. install new code on primary and secondary WLCs (no activate, no commit)
  2. pre-download code on APs
  3. activate new code on secondary WLC and commit
  4. move APs to secondary WLC (each AP will auto-swap partition when it checks the WLC version)
  5. activate new code on primary WLC and commit
  6. move APs to primary WLC

schulcz
Level 1

Thanks for the replies. In the end I tried this method on the active WLC:

1. Copy image to flash
2. install add file bootflash:C9800-40-universalk9_wlc.17.06.03.SPA.bin
3. sh install summary -> seems everything OK
4. ap image predownload
5. show ap image -> waiting while every AP downloads image
6. ap image swap
7. ap image reset
8. install activate

After that, the WLCs started the upgrade and asked for a reboot; I chose yes.

I was also connected to the WLCs by console, so I saw the boot process. I saw these lines during boot (which worried me a little):
% Ambiguous command: "username user1 privilege 15 secret 9 xxxxxxxxxxxxxx type mgmt-user"
% Ambiguous command: "username user2 privilege 15 secret 9 xxxxxxxxxxxxxx type mgmt-user"
% Ambiguous command: "username user3 privilege 15 secret 9 xxxxxxxxxxxxxx type mgmt-user"

After the boot finished, I tried to log in to the WLCs without success. I tried all 3 mgmt users; none of them worked.

I opened a TAC case, and TAC says I need to do a password recovery. It is not clear what caused the error, but it really pushed me over the edge because the WLCs are at a remote site.
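For anyone capturing the console during an upgrade like this, a small script can pull the rejected configuration lines out of a saved boot log. This is only a minimal sketch: it assumes the log is saved as plain text and that rejected lines look exactly like the `% Ambiguous command: "..."` messages quoted above. The function name `rejected_commands` and the filler line in the sample are hypothetical.

```python
import re
from typing import List

# Pattern modelled on the console lines quoted in this thread.
AMBIGUOUS_RE = re.compile(r'^% Ambiguous command:\s*"(?P<cmd>.*)"\s*$')


def rejected_commands(console_log: str) -> List[str]:
    """Return the config commands the parser reported as ambiguous."""
    rejected = []
    for line in console_log.splitlines():
        match = AMBIGUOUS_RE.match(line.strip())
        if match:
            rejected.append(match.group("cmd"))
    return rejected


# Sample input: two lines copied from the post plus one hypothetical filler line.
SAMPLE_LOG = """\
% Ambiguous command: "username user1 privilege 15 secret 9 xxxxxxxxxxxxxx type mgmt-user"
some unrelated boot message (hypothetical)
% Ambiguous command: "username user2 privilege 15 secret 9 xxxxxxxxxxxxxx type mgmt-user"
"""

if __name__ == "__main__":
    for cmd in rejected_commands(SAMPLE_LOG):
        print(cmd)
```

Any command listed this way was not applied at boot, so local users defined by those lines would be missing from the running configuration.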

If there was a console attached to the WLC, please provide the entire console output during the bootup. 

Next, why 17.6.3 when 17.6.4 was released a few weeks ago?

That is not good. I understand you lost those configured usernames and could not access the WLC after the upgrade. It could be because the "type mgmt-user" part is not supported in 17.6.x; I'm not sure why it was configured like that in the first place.

Did your WiFi APs come up fine, with no other problems resulting from this upgrade?

Rasika

Rich R
VIP

I just tested that on 17.6.4 - "type mgmt-user" is not supported - that's what is causing your problem.
I've never used that config though - have you used the config analyser? (see any of @marce1000 's posts for instructions)
As far as I can tell that option is only supported with the user-name user1 approach:

9800(config)#user-name user1
9800(config-user-name)#?
  aaa                     AAA directive
  access-class            Restrict access by access-class
  algorithm-type          Algorithm to use for hashing the plaintext secret for the user
  autocommand             Automatically issue a command after the user logs in
  callback-dialstring     Callback dialstring
  callback-line           Associate a specific line with this callback
  callback-rotary         Associate a rotary group with this callback
  common-criteria-policy  Enter the common-criteria policy name
  creation-time           User creation time
  description             Set description
  dnis                    Do not require password when obtained via DNIS
  exit                    Exit from username sub-mode
  mac                     This entry is for MAC Filtering where username=mac
  masked-secret           secret input will be masked on screen and will be converted to type 9 by default
  nocallback-verify       Do not require authentication after callback
  noescape                Prevent the user from using an escape character
  nohangup                Do not disconnect after an automatic command
  nopassword              No password is required for the user to log in
  one-time                Specify that the username/password is valid for only one time
  password                Specify the password for the user
  privilege               Set user privilege level
  secret                  Specify the secret for the user
  type                    Specify the type of user
  user-maxlinks           Limit the user's number of inbound links
  view                    Set view name
  wlan-profile-name       Profile name associated with MAC-address

9800(config-user-name)#type ?
  default       User type is default
  lobby-admin   User type is lobby-admin
  mgmt-user     User type is mgmt-user
  network-user  User type is network-user

 

Based on this thread, I reached out to TAC to see if there were any other known issues with this process prior to our 9800-80 ISSU upgrade. They suggested that I remove the following configuration lines from 17.3.5a prior to the ISSU upgrade:

-snmp-server enable traps hsrp
-snmp-server enable traps wireless bsnMobileStation bsnAccessPoint bsnRogue bsn80211Security bsnAutoRF bsnGeneral MESH wireless_mobility rogue RRM SI

In our configuration we did not have "type mgmt-user" at the end of the username configuration, which I believe caused the main issue in your case when you did the upgrade manually. We had it in the following format:

username <> privilege 15 secret 8 <> 
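A quick way to check a saved running-config for the lines mentioned in this thread before starting an upgrade is sketched below. It is not an official checklist: the patterns come only from the release note and TAC advice quoted in this thread, and the names `RISKY_PATTERNS`, `find_risky_lines`, and the sample config are hypothetical.

```python
import re

# Patterns taken only from what was reported in this thread; not an official list.
RISKY_PATTERNS = {
    "hsrp traps (release notes: remove before ISSU from 17.3.x)":
        re.compile(r"^snmp-server enable traps hsrp\b"),
    "wireless traps (TAC suggested removing before ISSU)":
        re.compile(r"^snmp-server enable traps wireless\b"),
    "username with 'type mgmt-user' (rejected at boot on 17.6.x in this thread)":
        re.compile(r"^username \S+ .*\btype mgmt-user\s*$"),
}


def find_risky_lines(running_config):
    """Return (line_number, line, reason) for each config line matching a pattern."""
    findings = []
    for number, line in enumerate(running_config.splitlines(), start=1):
        stripped = line.strip()
        for reason, pattern in RISKY_PATTERNS.items():
            if pattern.search(stripped):
                findings.append((number, stripped, reason))
    return findings


# Hypothetical config excerpt for illustration only.
SAMPLE_CONFIG = """\
hostname wlc-1
snmp-server enable traps hsrp
username user1 privilege 15 secret 9 xxxxxxxxxxxxxx type mgmt-user
username user2 privilege 15 secret 8 xxxxxxxxxxxxxx
"""

if __name__ == "__main__":
    for number, line, reason in find_risky_lines(SAMPLE_CONFIG):
        print(f"line {number}: {line}\n    -> {reason}")
```

In the sample, lines 2 and 3 are flagged, while the plain `username ... secret 8 ...` form on line 4 is not.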

With those lines removed, we went ahead with the ISSU upgrade and did not experience any issues; the process went smoothly and everything was as expected. It took a little over 10 minutes to sync once the standby had first upgraded to 17.6.4.

Peer Processor Information :
----------------------------
Standby Location = slot 2
Current Software state = in progress to standby cold-bulk
Uptime in current state = 11 minutes
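If you want to watch the sync from a script rather than by eye, output like the block above can be parsed as simple "key = value" pairs. This is a rough sketch: it assumes the text has already been collected from the controller, and it assumes that a fully synced standby reports the state "STANDBY HOT", which does not appear in this thread. The function names are hypothetical.

```python
def parse_peer_info(show_output):
    """Parse 'Key = value' lines into a dict; lines without '=' are skipped."""
    info = {}
    for line in show_output.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip()
    return info


def standby_is_ready(info):
    # Assumption: the peer is fully synced once it reports "STANDBY HOT".
    return info.get("Current Software state", "").upper() == "STANDBY HOT"


# Peer section copied from the post above.
SAMPLE_OUTPUT = """\
Peer Processor Information :
----------------------------
Standby Location = slot 2
Current Software state = in progress to standby cold-bulk
Uptime in current state = 11 minutes
"""

if __name__ == "__main__":
    peer = parse_peer_info(SAMPLE_OUTPUT)
    print(peer["Current Software state"])
    print("ready" if standby_is_ready(peer) else "still syncing")
```

With the sample above, the peer is still in a bulk-sync state, so the check reports that it is not ready yet.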

HTH
Rasika
*** Pls rate all useful responses ***

Thanks for that update @Rasika Nayanajith - glad yours went well but unlikely that we will risk it after too many bad experiences with it in the past - it's just too fragile for my liking.
