9800 HA RMI + RP upgrade issues

nikolas-pereira
Level 1

The client has 9800 controllers in HA. Every time we upgrade, we run into access points whose image gets corrupted, because we purchased a batch of APs that has this manufacturing defect. Cisco said this was resolved in later versions (we had problems upgrading to 17.6.4 and from 17.6.4 to 17.9.4a, with the boot image getting corrupted on the APs).

Is there any way to separate the two controllers, move some APs to one of them and, if an error occurs, move them back to the other one, still running the old version?

Example: WLC 1 on 17.12.3 and WLC 2 on 17.9.4a. APs from WLC 2 move to WLC 1, and if an error occurs, they revert to WLC 2.

I would like a solution that ensures, quickly and without errors, that all access points download the correct image and verify that they will actually be able to boot it and work properly with the controller. I know the controller is supposed to do this, but in practice it always goes wrong and corrupted images get passed on.

We have 700 access points on this controller, all 9120AXI, 9120AXE, 9115AXI and 9124AXD. We also have several remote sites, none with more than 300 ms round-trip time, and the environment is managed with Cisco DNA Center. We are on 17.9.4a and I want to upgrade to 17.12.3 (the golden version).

marce1000
VIP

  - This will be difficult to achieve in an HA environment (well, actually impossible); it is better to switch to an N+1 redundancy setup, where you can select the primary and secondary controller for each AP.

 M.

-- Each morning when I wake up and look into the mirror I always say 'Why am I so brilliant?'
    And the mirror always responds with 'The only thing that exceeds your brilliance is your beauty!'

I found this option interesting, and I had even thought about it before. After switching to this redundancy model, I imagine I need to set the WLC running the new version as primary on all APs and the one running the current version as secondary; that part is fine. Once all the APs that failed and fell back to the secondary are running the new version on the primary, will I be able to form the HA with RMI + RP again without any major issues or needing a maintenance window?

Is this practice a good option, or just a "workaround" for my issues?
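With N+1, the per-AP primary/secondary assignment described above has to be pushed to every AP, which is tedious by hand for 700 of them. A minimal sketch that generates those lines, assuming the 9800 exec command form "ap name <AP> controller primary|secondary <controller-name> <controller-ip>" (check it in the command reference for your release); the controller names and AP names are hypothetical placeholders, the IPs are the ones used later in this thread.

```python
# Sketch: generate per-AP primary/secondary controller commands for an N+1
# migration. Controller names and AP names are hypothetical placeholders;
# verify the "ap name ... controller" syntax against the 9800 command
# reference for your release before pasting anything into a controller.
from typing import List, Tuple

Controller = Tuple[str, str]  # (controller name, management IP)


def priming_commands(ap_names: List[str], primary: Controller,
                     secondary: Controller) -> List[str]:
    """Return CLI lines pointing each AP at a primary and a secondary WLC."""
    lines: List[str] = []
    for ap in ap_names:
        name, ip = primary
        lines.append(f"ap name {ap} controller primary {name} {ip}")
        name, ip = secondary
        lines.append(f"ap name {ap} controller secondary {name} {ip}")
    return lines


# Hypothetical example: new-version WLC as primary, current-version WLC as secondary.
for cmd in priming_commands(["AP-0001"], ("WLC-NEW", "10.0.10.11"),
                            ("WLC-OLD", "10.0.10.10")):
    print(cmd)
```

Generating the list from a script rather than typing it keeps the primary and secondary pairing consistent across all APs.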

      > ...will I be able to form the HA with RMI + RP again without any major issues or needing a maintenance window?
           No, the transition, when needed, does require a maintenance window.

  Take this with you for 9800 controller configuration management:
     Always do an overall check of the 9800 controller's configuration with the CLI command
        show tech wireless
     and feed its output into Wireless Config Analyzer.
     Use the full command exactly as given; do not use a plain show tech as input for this procedure.

 M.

What about performing the upgrade using SMU and APSP to pre-download the images and then run the upgrade in N+1?

It seems like an interesting way to upgrade the access points and avoid many problems.

https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/technotes/8-8/Cisco_Catalyst_9800_Series_Wireless_Controllers_Patching.pdf

Any comments?

  - No, it's too complicated and could go bad (very bad). Also note that SMUs and APSPs apply to patches, not to
    full IOS-XE upgrades such as going to 17.12.3.

 M.

Thank you for the fast response.

Is your feedback the same for an ISSU upgrade + AP image predownload?

I'll share this with the customer.

      - Possible: https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/tech-notes/b_issu_9800.html
           Refer to https://www.cisco.com/c/dam/td-xml/en_us/wireless/vewlc/issu/ISSU.xlsx

           Check the upgrade path dependencies and always follow the official Cisco documentation for ISSU upgrades.

     M.

ISSU comes with its own set of problems: a high risk of failure during ISSU, which will require one or more reloads to recover. If you decide to use an ISSU upgrade, follow the instructions and release notes very carefully and make sure you have TAC with you on a WebEx before you start, so they can help if it goes wrong. We do not use ISSU because we haven't even been able to get it working reliably in the lab, so we simply will not risk it in production. We just reload during a maintenance window.

marce1000
VIP

 - (Added): > ...quickly and error-free, that all access points download the correct image and validate
      As far as this item is concerned:
         ref https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2024/pdf/BRKEWN-3628.pdf
         > ...
         > Fix: 17.13 has a complete corruption verification and prevention system
         > Now image is properly verified during download

      This of course also applies to all versions after IOS-XE 17.13.

 M.

Rich R
VIP

A LOT has been written about this, and it had the Cisco developers scratching their heads trying to solve it for a long time:
CSCvx32806, CSCwd90081 and CSCwc72021 are supposed to stop the AP from trying to boot a corrupt image, but they don't stop the image from getting corrupted during the download. The AP will just keep trying to download over and over until it (hopefully) succeeds... Those fixes are in all the latest maintenance releases, which you can see in the Fixed Versions list. But that doesn't help if the version you are upgrading from (the one doing the download) is still an older one.

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwf09053 is supposed to be the ultimate fix for this, but it doesn't show any fixed versions yet. The link @marce1000 provided implies this might be in 17.13 (and later), even though it isn't documented in the bug or the release notes.
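The difference between "don't boot a corrupt image" and "verify the image during download" can be illustrated in a few lines. This is a generic sketch of a verify-then-retry loop, not Cisco's implementation: fetch(), the expected SHA-256 digest and the attempt limit are all hypothetical.

```python
# Generic illustration (not Cisco's implementation): accept an image only once
# its SHA-256 digest matches the expected value, retrying the download a
# bounded number of times. fetch(), the digest and the limit are hypothetical.
import hashlib
from typing import Callable, Optional


def download_verified(fetch: Callable[[], bytes], expected_sha256: str,
                      max_attempts: int = 5) -> Optional[bytes]:
    """Call fetch() until the payload's digest matches expected_sha256.

    Returns the verified image, or None when every attempt was corrupt; in that
    case the caller keeps booting the image it already has instead of swapping.
    """
    for _ in range(max_attempts):
        image = fetch()
        if hashlib.sha256(image).hexdigest() == expected_sha256:
            return image  # verified: only now is it safe to stage for boot
    return None


# Simulated link that corrupts the first two transfers.
good = b"ap-image-bytes"
transfers = iter([b"corrupt-1", b"corrupt-2", good])
result = download_verified(lambda: next(transfers), hashlib.sha256(good).hexdigest())
print(result == good)  # True
```

Unlike the open-ended retrying described above, the sketch caps the attempts so a persistently bad link ends in a clear "keep the old image" outcome rather than an endless loop.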

Read:
https://www.cisco.com/c/en/us/support/docs/wireless/wireless-lan-controller-software/221869-safely-upgrade-access-points-avoiding-i.html
https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/220443-how-to-avoid-boot-loop-due-to-corrupted.html
https://www.cisco.com/c/en/us/support/docs/field-notices/741/fn74109.html

> I would like a solution that ensures, quickly and error-free, that all access points download the correct image and validate if they will indeed be able to boot and function properly with the controller. I know the controller does this, but in practice, it always goes wrong, and corrupted images are passed on.
- After you have upgraded to 17.12 you can use HTTPS downloads, which are also mentioned in the doc Marce linked and in the configuration guide. HTTPS downloads should not be affected by this issue at all.

Meanwhile, using a completely different method is probably your safest option until you are on a fixed software version or can use HTTPS downloads. If you can put the AP image on a local router, switch or server, then you can use the
ap name <AP-name> tftp-downgrade <tftp-server-IP> <filename>
command to download the image to the AP directly. You can use an Excel spreadsheet to quickly produce the CLI command for every AP.
For 17.12.4 you would use the corresponding AP image 15.3(3)JPQ3 as per the compatibility matrix (link below).  For example for 9120: https://software.cisco.com/download/home/286322988/type/286288051/release/15.3.3-JPQ3
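As an alternative to the spreadsheet, a short script can build the same per-AP commands from an inventory export. A minimal sketch, assuming a CSV with hypothetical "ap_name" and "model" columns; the TFTP server IP and image file names are placeholders, and the image is looked up per model in case your models need different files (take the real names from the compatibility matrix and download page).

```python
# Sketch: build one tftp-downgrade command per AP from a CSV export instead of
# an Excel sheet. Column names, TFTP server IP and image file names are
# hypothetical placeholders.
import csv
import io
from typing import Dict, List


def tftp_commands(ap_csv: str, tftp_ip: str,
                  image_by_model: Dict[str, str]) -> List[str]:
    """Return 'ap name <AP> tftp-downgrade <ip> <file>' for every CSV row.

    ap_csv must have 'ap_name' and 'model' columns. An unmapped model raises
    KeyError, so no AP is silently sent an image meant for another model.
    """
    commands = []
    for row in csv.DictReader(io.StringIO(ap_csv)):
        name = row["ap_name"].strip()
        if not name:
            continue  # skip rows without an AP name
        image = image_by_model[row["model"].strip()]
        commands.append(f"ap name {name} tftp-downgrade {tftp_ip} {image}")
    return commands


inventory = "ap_name,model\nAP-01,C9120AXI\nAP-02,C9115AXI\n"
images = {"C9120AXI": "image-for-9120.tar", "C9115AXI": "image-for-9115.tar"}
for cmd in tftp_commands(inventory, "192.0.2.10", images):
    print(cmd)
```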

> I want to upgrade to 17.12.3 (golden version)
- 17.12.4 is likely to become the recommended version in the next few weeks, so you might want to wait for it or consider using it now.

Great considerations, I really liked them.

I put together a more manual action plan that seems to make sense to me:

virtual WLC - 10.0.10.11 (17.12.3)
WLC-01 - 10.0.10.10 (17.9.4a)


Update:

Starting from WLC-01

1 - Initiate a pre-download on all access points joined to WLC-01, so they download version 17.12.3.

2 - Change the APs' primary controller to the virtual WLC. When they try to join, they will detect the correct version (17.12.3) and swap to this image.

3 - If some access points boot a corrupted 17.12.3 image, they will attempt to boot five times; on the fifth attempt they will revert to the image in the backup partition (17.9.4a) using the alt-boot feature.

4 - After reverting to 17.9.4a, the access points will join WLC-01 with the old version, allowing interventions on all problematic access points.

5 - Upgrade WLC-01 to 17.12.3 (only once all access points are on the virtual WLC).

6 - Once WLC-01 is on 17.12.3 and operational, start moving all access points back from the virtual controller to the physical controller (WLC-01).
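Steps 4 and 5 hinge on knowing exactly which APs are not yet on the virtual WLC with the new image. A minimal sketch of that check, assuming you can export each AP's name, joined controller IP and reported image version; the field names and example version strings are hypothetical, while the target version and virtual WLC IP come from the plan above.

```python
# Sketch for steps 4 and 5: find APs that are not yet on the virtual WLC with
# the target image, and only report "ready" when every expected AP is.
# Field names and example version strings are hypothetical.
from dataclasses import dataclass
from typing import List

TARGET = "17.12.3"
VIRTUAL_WLC = "10.0.10.11"


@dataclass
class ApState:
    name: str
    controller_ip: str  # controller the AP is currently joined to
    version: str        # image version the AP reports


def on_target(version: str) -> bool:
    # Match "17.12.3" or "17.12.3.<build>", but not e.g. "17.12.30".
    return version == TARGET or version.startswith(TARGET + ".")


def needs_intervention(aps: List[ApState]) -> List[str]:
    return [ap.name for ap in aps
            if not (ap.controller_ip == VIRTUAL_WLC and on_target(ap.version))]


def ready_for_wlc01_upgrade(aps: List[ApState], expected_count: int) -> bool:
    # APs missing from the export also mean "not ready", hence the count check.
    return len(aps) == expected_count and not needs_intervention(aps)


aps = [ApState("AP-01", "10.0.10.11", "17.12.3.10"),
       ApState("AP-02", "10.0.10.10", "17.9.4.27")]
print(needs_intervention(aps))                            # ['AP-02']
print(ready_for_wlc01_upgrade(aps, expected_count=700))   # False
```

Counting against the expected total matters because an AP stuck in a boot loop may not appear on either controller at all.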

That sounds like a workable plan, but personally I would go for a TFTP download from a local source rather than accept the uncertainty of possibly corrupted downloads.

700 APs would mean the controller has to be a 9800-40.
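The sizing point is simple arithmetic against the per-appliance AP limits. A minimal sketch, assuming the maximum AP counts from Cisco's Catalyst 9800 appliance datasheet (250 for the 9800-L, 2000 for the 9800-40, 6000 for the 9800-80); confirm them against the current datasheet. The 9800-CL is left out because its limit depends on the VM template chosen.

```python
# Maximum APs per Catalyst 9800 appliance as given in Cisco's datasheet
# (verify against the current datasheet before relying on these numbers).
MAX_APS = {"9800-L": 250, "9800-40": 2000, "9800-80": 6000}


def smallest_appliance(ap_count: int) -> str:
    """Return the smallest appliance whose AP limit covers ap_count."""
    candidates = [(limit, model) for model, limit in MAX_APS.items()
                  if limit >= ap_count]
    if not candidates:
        raise ValueError(f"{ap_count} APs exceed a single appliance's limit")
    return min(candidates)[1]


print(smallest_appliance(700))  # 9800-40
```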

Where is the line in the plan saying that the 9800-40 ROMMON will be upgraded to 17.12(1r)?

Good point Leo.  @nikolas-pereira refer to:
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/config-guide/b_upgrade_fpga_c9800.html
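When adding the ROMMON step Leo mentions to the plan, it helps to compare the ROMMON version the controller reports with 17.12(1r). A small sketch, assuming version strings in the "17.12(1r)" form; the installed value is whatever you copy from your own controller, and whether the upgrade applies to your hardware is what the guide above documents. The comparison is numeric because a plain string comparison gets it wrong ("17.3(1r)" sorts after "17.12(1r)" as text).

```python
# Compare ROMMON version strings such as "17.12(1r)" numerically.
import re
from typing import Tuple

_ROMMON = re.compile(r"^(\d+)\.(\d+)\((\d+)r\)$")


def parse_rommon(version: str) -> Tuple[int, int, int]:
    """Turn '17.12(1r)' into (17, 12, 1); reject anything else."""
    m = _ROMMON.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognised ROMMON version: {version!r}")
    major, minor, build = (int(g) for g in m.groups())
    return major, minor, build


def rommon_below(installed: str, target: str = "17.12(1r)") -> bool:
    """True if the installed ROMMON is older than target."""
    return parse_rommon(installed) < parse_rommon(target)


print(rommon_below("17.3(1r)"))  # True
```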