ISE 2.2 to 2.3 upgrade experience - URT is no guarantee of success

Arne Bier
VIP

The upgrade planning for ISE 2.2 patch 2 to ISE 2.3 patch 1 started weeks ago and I practised the upgrade more than three times in the lab.  The config was taken from production nodes and the practice was good and made me feel like I was ready for the real deal.  I even took into account all the special upgrade prep notes mentioned in the Upgrade Guide.  I thought it would be smooth sailing.   How wrong I was ...

The URT ran clean on Wednesday - no errors

On Thursday we made one change to a TACACS Policy Set (in prep for the upgrade: the patch 1 issue).

Secondary Admin failed to upgrade as a result.  Turns out there was a Java NULL Pointer issue in the upgrade code when processing that Policy Set.  I deleted the Policy Set in the interest of time and upgrade of Sec PAN proceeded.

Primary MnT next - upgrade failed due to some ORACLE HOST_CONFIG file.  This time the automatic rollback failed and it told me I had to rejoin the node to the 2.2 PAN.  After re-join the upgrade issue magically fixed itself.  So far I think I am styling, not having to involve the TAC with my little issues.

The first PSN was upgraded next. It reboots. The application restart takes ages. I do a show disk and find out the / filesystem is 100% full. No more SSH connections. This time I call the TAC with my pre-prepared TAC case so I can get into the Sev 1 queue. The TAC engineer gives us the special ISE recovery .iso to boot the VM. This is when I discover that the Cisco OVA doesn't include a CD-ROM in the profile. After some wrangling with the VMware engineer I persuade him that we can add one, with Cisco's blessing, without messing up the holy OVA.

We boot the .iso but the TAC can't get on because the customer does not allow vendors remote access via WebEx. Bad news, and I felt sorry for the TAC at that point because the engineer wanted to see what was causing the disk issues. In lieu of that, the TAC points me to the ISE 2.3 upgrade guide on CCO, which states that a PSN can take up to 4 hours to upgrade, and tells me I just need to be patient.

I disagree, because the URT told me it would take 50 minutes. Plus, how well does a Linux OS work when its root partition is full? The CLI was unresponsive and refusing new connections. This is a very small deployment with hardly any traffic on it. The PAN and MnT upgraded within the time scales predicted by the URT.

I try another PSN and the same thing happens. Post-upgrade reboot, 100% disk full on the root partition. Mind you, these VMs were originally built using the large CPU, 200GB OVAs.

I left the office with two dead PSNs - unless this is normal behaviour and by tomorrow it will have fixed itself? Luckily I have one working PSN that is still handling live traffic. I will most likely have to rebuild two PSNs and then roll the dice again.

The URT is a great idea.  But it brought home an important point, that it only checks the health of the PAN. What happens on the remaining nodes is anyone's guess.

#notimpressed

6 Replies

paul
Level 10

This is why I again stress that the fresh build/restore method is the best method for any ISE upgrade. It is the only method I use, and I have done 20-30 upgrades with it.

When you try using the upgrade path either in the GUI or the CLI you are setting yourself up for failure. If it works that is fine, but if something breaks along the way, good luck getting yourself out of it.

Run the URT tool to make sure it runs clean, kick the secondary admin node out of the deployment, fresh build to 2.3, restore data, validate things look good, then fresh build each node to 2.3 and join them to the 2.3 admin node.
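For anyone who wants to track that sequence during a change window, here is a minimal sketch of it as an ordered checklist that stops at the first step that fails. It is only an illustration: `build_plan`, `run_plan` and the `confirm` callback are hypothetical names, nothing here talks to ISE, and the backup step is assumed because the restore needs one.

```python
from typing import Callable, List, Optional, Tuple


def build_plan(other_nodes: List[str]) -> List[str]:
    """Return the fresh build/restore steps in the order described above."""
    plan = [
        "Run the URT and confirm it reports no errors",
        # Assumption: a backup is taken here so there is something to restore.
        "Take a configuration backup of the current deployment",
        "Remove the secondary admin node from the deployment",
        "Fresh-build that node on 2.3",
        "Restore the backup onto the fresh 2.3 node",
        "Validate the restored configuration",
    ]
    # Every remaining node is rebuilt and joined to the new 2.3 admin node.
    for node in other_nodes:
        plan.append(f"Fresh-build {node} on 2.3 and join it to the 2.3 admin node")
    return plan


def run_plan(
    plan: List[str], confirm: Callable[[str], bool]
) -> Tuple[List[str], Optional[str]]:
    """Walk the plan in order and stop at the first step that is not confirmed.

    `confirm` is whatever the operator uses to sign off a step, for example
    a prompt at the keyboard. Returns (completed steps, failed step or None).
    """
    done: List[str] = []
    for step in plan:
        if not confirm(step):
            return done, step
        done.append(step)
    return done, None
```

The point of stopping early is the same as in the prose: nothing is rebuilt and joined until the restored 2.3 admin node has been validated.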

I have never had an issue with this method of upgrade and you have total control of the process.  You aren't relying on pushing a button in a GUI and hoping for the best or hoping the CLI upgrade method doesn't bomb somewhere.

Paul, I must have missed your earlier postings on that subject. I had no idea that one could restore a 2.2 backup on a 2.3 fresh build?

I'll keep that in mind.  At least I can reliably build ISE nodes these days.  Just have to pray that the backup restore actually works.

You can restore data from any version listed in the release notes as a direct upgrade path. So for 2.3, if you are on 2.0 or later you can restore a backup to a 2.3 node. When you restore, the data is automatically converted to the new version. If you run the URT just before the upgrade and then take a backup, you should have no issues with the restore.
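As a rough sketch of that rule, the check below compares a backup's version against the 2.0 floor and 2.3 target mentioned above. The function names are hypothetical, and the floor applies only to a 2.3 target as stated in this thread; for any other target, take the supported range from that release's notes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.2' or '2.3.0' into a three-part tuple such as (2, 2, 0)."""
    parts = [int(part) for part in text.strip().split(".")]
    # Pad short versions with zeros so '2.3' and '2.3.0' compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def can_restore(
    backup_version: str,
    target_version: str = "2.3",
    oldest_supported: str = "2.0",  # floor quoted in this thread for a 2.3 target
) -> bool:
    """True if the backup version lies within the direct upgrade range."""
    backup = parse_version(backup_version)
    return parse_version(oldest_supported) <= backup <= parse_version(target_version)
```

For example, `can_restore("2.2")` is true, while `can_restore("1.4")` is false because 1.4 is below the 2.0 floor.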

Actually, my normal method is to build a temporary VM running the new version of code and restore data to it. That VM becomes my anchor point for the new version. I can test against it to make sure things work. Once everything is validated I simply rebuild each node and join it to my temp VM. At the end you just play a shell game, moving personas back to the way they were, and kick out the temp VM. The temp VM then becomes my lab box.

If you are doing traditional licensing you may need to do some rehosting as you do the upgrade, but Cisco licensing is very quick to respond.

Paul Haferman

I'll run that by the customer tomorrow.

We're in the process of moving Traditional Licenses to Smart Licenses.  So that would be of benefit for such an upgrade style.

I like it.

thanks

Yep, I try to move all my customers to Smart Licensing to make things easier.

Also, don't worry about changing things after the OVA is loaded. The key part of the OVA is the reservations it sets up for memory and CPU. If the customer uses the OVA (I like the .iso for total control of provisioning), I always tell them to delete all the E1000 NICs it puts in and add back a VMXNET3 NIC, which is a more modern NIC driver for VMware and is supported by ISE.

I have concluded that upgrading ISE nodes is a big no-no.

There should be a big warning sign in the Upgrade Guide to steer people away from this.

I gave up trying to upgrade my PSN nodes because they would crash with the root file system at 100%. Even a newly deployed 2.3 .OVA would die when registered into the new 2.3 deployment - same drama - root file system 100% full. When I asked around, the general consensus seems to be that nobody upgrades their nodes, even as far back as the 1.2 release. Do I feel like the fool now... and this unwritten wisdom is apparently known to Cisco engineers too.

I don't have an explanation for the failure, because the upgrade worked in my lab - but in my lab I used an .iso deployment, and not an .ova deployment.   My gut feel says that there is a problem with the .ova deployed nodes that would cause an upgrade to fail.

Busy rebuilding my entire deployment from scratch, just as Paul suggested.  Not got as far as restoring the config backup but I am hopeful that it will work using this technique.