
Stability of ISE version 3.2?

I am testing the upgrade of a single-node ISE 3.0 patch-7 to ISE 3.2 patch-2. When I run the preparation check using ise-upgradebundle-2.7.x-3.1.x-to-3.2.0.542b.SPA.x86_64.tar.gz, it fails at "Configuration at Data Upgrade". To be clear, this node is up and running just fine on ISE 3.0 patch-7, and the same preparation completes with no issue when I use the ISE 3.1 upgrade bundle ise-upgradebundle-2.6.x-3.0.x-to-3.1.0.518b.SPA.x86_64.tar.gz.

I also rebooted the ISE several times prior to running the preparation check with ise-upgradebundle-2.7.x-3.1.x-to-3.2.0.542b.SPA.x86_64.tar.gz but it fails every time.  This is what I am seeing in the log:

Fri May 12 10:03:13 UTC 2023 : Changing host config entry to standalone...
Fri May 12 10:03:13 UTC 2023 : ORACLE_SID : cpm11
Fri May 12 10:03:13 UTC 2023 : NODECNT :
Fri May 12 10:03:13 UTC 2023 : - Successful
Fri May 12 10:03:13 UTC 2023 : - Successful
Fri May 12 10:03:13 UTC 2023 :
Fri May 12 10:03:13 UTC 2023 : runDBClone method finished executing
Fri May 12 10:03:13 UTC 2023 : triggerUpgradeOnClonedInstance method started executing
Fri May 12 10:03:14 UTC 2023 : Modifying upgrade scripts to run on cloned database
Fri May 12 10:03:14 UTC 2023 : - Successful
Fri May 12 10:03:36 UTC 2023 :
Fri May 12 10:03:36 UTC 2023 : Running schema upgrade on cloned database
Fri May 12 10:06:59 UTC 2023 : - Successful
Fri May 12 10:06:59 UTC 2023 :
Fri May 12 10:06:59 UTC 2023 : Running data upgrade on cloned database
Fri May 12 10:07:52 UTC 2023 : - Failed
Fri May 12 10:07:52 UTC 2023 : ConfigDBUpgrade : Performing Clean-up
Fri May 12 10:08:11 UTC 2023 : ConfigDBUpgrade : Clean-up Completed
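
For anyone comparing their own preparation logs against the one above, here is a minimal Python sketch that pairs each step line with the "- Successful" / "- Failed" line that follows it and reports how long the step took. The pairing rule and the function names (`parse`, `step_results`) are my own assumptions for illustration, not anything ISE provides, and the embedded sample is just an excerpt of the log lines quoted above.

```python
from datetime import datetime

# Excerpt of the log quoted above (timestamps are in UTC).
LOG = """\
Fri May 12 10:03:36 UTC 2023 : Running schema upgrade on cloned database
Fri May 12 10:06:59 UTC 2023 : - Successful
Fri May 12 10:06:59 UTC 2023 :
Fri May 12 10:06:59 UTC 2023 : Running data upgrade on cloned database
Fri May 12 10:07:52 UTC 2023 : - Failed
"""

# Assumed timestamp layout, inferred only from the lines shown above.
STAMP = "%a %b %d %H:%M:%S UTC %Y"


def parse(line):
    """Split 'timestamp : message' into (datetime, message)."""
    stamp, _, message = line.partition(" :")
    return datetime.strptime(stamp.strip(), STAMP), message.strip()


def step_results(text):
    """Pair each step line with the next result line (heuristic)."""
    results = []
    pending = None  # (start_time, step_name) still waiting for a result
    for line in text.splitlines():
        if not line.strip():
            continue
        when, message = parse(line)
        if message in ("- Successful", "- Failed"):
            if pending:
                start, name = pending
                seconds = (when - start).total_seconds()
                results.append((name, message[2:], seconds))
                pending = None
        elif message:
            pending = (when, message)
    return results


if __name__ == "__main__":
    for name, outcome, seconds in step_results(LOG):
        print(f"{outcome:10} {seconds:6.0f}s  {name}")
```

On this excerpt the schema upgrade runs for a little over three minutes and succeeds, while the data upgrade fails in under a minute, which at least narrows the failure down to the data upgrade step on the cloned database.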

I really don't want to open a TAC case because it will take forever to get the case to a TAC engineer who can help me with this issue: weeks or months rather than days.

This makes me question the stability of ISE 3.2.  Thoughts?


Everyone will have their own horror stories but that doesn't mean there is a fundamental problem with that version of ISE. I have good news so far (touch wood).

I recently built a two-node guest Wi-Fi solution for a customer, and granted, it's just RADIUS and Portals, but it's working beautifully. The only issue I have had so far is with SNMPv3, which is a real developer hack-job. It seems to be a bit better in ISE 3.3.

I see a real use-case for the new API in ISE 3.3 now. In an ideal world, I would like to have the entire ISE VM build process, cert creation and deploy, and config in something like Ansible. The Cisco TMEs (Charlie and Thomas) have great YouTube videos on that - it's an eye-opener!  The dream seems to be possible now. I think the initial creation of such a playbook would be hard work, but in the end, we could blow away any version of ISE, and rebuild from ground zero. Never again having to restore from a config backup (which I believe is partly to blame for issues seen in ISE deployments). The other culprit in VM deployments is potentially VM Snapshots (happening without our knowledge or consent) and live vMotions (again, happening without our knowledge). Throw in the odd power failure here and there (disk corruption) and you have a nice mess in the database.

How do I know this?  99% of the time when I have gotten to the end of a TAC case, and the engineers use their root access to the CLI, they invariably fix something in the Oracle database.

My gut feeling is also that, under the hood, ISE is a pile of cobbled together Java code that is haemorrhaging errors at an alarming rate. I am amazed it even works.  And it's disappointing to think how inefficient the whole thing is (why does it take 10 min to start application services with all that horsepower??). Look at the logs of a system that is "running smoothly" (as far as you're concerned) and you'll be amazed at how many ERROR messages you see logged 24/7.
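
To put a number on that rather than eyeballing it, a rough Python sketch like the one below counts ERROR lines per hour in a log export. The line layout (date, time, level as the first three whitespace-separated fields) is an assumption, the sample lines are made up for illustration, and `errors_per_hour` is a hypothetical helper name; adjust the field positions to whatever your exported log actually looks like.

```python
from collections import Counter

# Made-up sample lines in an ASSUMED "date time LEVEL ..." layout.
SAMPLE = """\
2023-05-12 10:03:13,101 INFO  [main] service started
2023-05-12 10:03:14,220 ERROR [worker-1] example failure A
2023-05-12 10:41:02,007 ERROR [worker-2] example failure B
2023-05-12 11:05:50,913 WARN  [worker-1] example warning
2023-05-12 11:06:01,004 ERROR [worker-3] example failure C
"""


def errors_per_hour(text):
    """Count lines whose third field is ERROR, bucketed by date and hour."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ERROR":
            hour = f"{fields[0]} {fields[1][:2]}:00"
            counts[hour] += 1
    return dict(counts)


if __name__ == "__main__":
    for hour, count in sorted(errors_per_hour(SAMPLE).items()):
        print(hour, count)
```

Comparing the hourly counts before and after a patch or upgrade gives a crude baseline for whether the error rate has actually changed.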

Let's not forget that despite all that, when we're working in the Policy Set and making new Profiler Policies, etc, ISE is a great tool. When it does the job, it does it well. And if we find issues then we should report them to TAC to allow them to fix it.

Hi Arne,

Personally, I was able to migrate our AAA servers from ACS 4.x to ACS 5.3, then to ISE 2.3 and ISE 2.4. Every time, the headaches seem to increase along with the release number. I have been stuck on the 3.x migration since last December. First I tried the version recommended at the time, 3.1, and had to open 6 SRs with TAC, two of which were blocking. At the beginning of September, after waiting 8 months for TAC to solve a minor issue with 3.1, I decided to upgrade to the new recommended release before the actual migration from 2.4. The very next day I had to open 2 more SRs, one of which was identical to one I had opened for version 3.1. Please note that we use ISE just as a RADIUS server for basic wired/wireless 802.1X authentication and wireless guest authentication. I can't even imagine how many SRs I would have had to open if we used advanced features such as pxGrid or profiling. Yes, ISE is a good product, and we did not actually hit any critical issue with 2.4, but maybe it would be the same with FreeRADIUS or Microsoft NPS ...

 

 

Arne,

My two cents: IMHO Cisco is actually rewriting ISE from scratch, but the new version will be sold as SaaS or will be certified only on Azure, AWS or Google Cloud.

Regards

Hi,

I upgraded from 3.1 (latest patch) to 3.2 patch 3 using the "full upgrade procedure", since the NADs are still pointing to our 2.4 deployment. So far I have hit three issues:

1) MNT reports stopped working. I cannot say whether that was caused by the upgrade or by the restore of the previous MNT database, but the reports show only pre-upgrade data.

2) Live logs seem to record only session events triggered by accounting packets, not authentication events, although some failed attempts seem to be logged at random.

3) For the sponsor portal we hit this bug, https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwd97582, which we had already met in 3.1 and which had actually been solved there.

Of course, if your nodes are not dual-homed you do not have to worry about 3).

 

Regards

MM 

 

UPDATE

1) MNT reports stopped working. I cannot say whether that was caused by the upgrade or by the restore of the previous MNT database, but the reports show only pre-upgrade data.

I was able to solve the issue by resetting the M&T database, which means deleting all TS data. Unfortunately the issue is tied to the operational backup from ISE 3.1: as soon as I restore it on one of the MNTs, the issue comes back again. So the options are:

1) Keep the old deployment's MNT up, so that reports on operational data from before the migration remain available

2) Give up the failed/successful authentication reports

If you upgrade the production deployment directly, I'd suggest de-registering one of the two MNTs and using it as a standalone node to keep access to pre-migration operational data.

 

I'm in the same boat here. I'm about to upgrade from 3.1 patch 6 to 3.2, but I'm really concerned after reading all the comments here. The 3.1 version is not the greatest, and I've run into a couple of issues that were fixed with workarounds, but it is working overall.

Does anyone know, based on experience, if 3.2 is somewhat more stable than 3.1?

Hi @GFernandez07 ,

 1st: " ... Cisco ISE, Release 3.2, has parity with the Cisco ISE patch release: 2.7 Patch 7​, 3.0 Patch 6,​ and 3.1 Patch 3, ... "

 2nd: 3.1 Patch 6 was released on 20/Mar/2023, 3.2 Patch 2 was released on 09/May/2023 and Patch 4 is the latest release (19/Oct/2023)

 3rd: IMHO, creating your "Production LAB" is the best way to check the stability of a new ISE version in your own Production Environment ... I'm having different experience with different Customers on the same "from/to" version of ISE.
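
As a quick way to apply the parity statement in the first point, the Python sketch below checks whether a given source patch level is newer than the parity level quoted for the 3.2 base release. The parity numbers come from the quote above; the function name `beyond_parity` and the interpretation (a newer source patch may contain fixes that are not in base 3.2) are my own assumptions, and the check says nothing about which 3.2 patch, if any, picks those fixes up.

```python
# Parity levels quoted above for the ISE 3.2 base release.
PARITY = {"2.7": 7, "3.0": 6, "3.1": 3}


def beyond_parity(release, patch):
    """Return True if the source patch is newer than the quoted parity level.

    Returns None when the release is not in the quoted parity list.
    """
    base = PARITY.get(release)
    if base is None:
        return None
    return patch > base


if __name__ == "__main__":
    # Source versions mentioned in this thread: 3.0 patch 7 and 3.1 patch 6.
    for release, patch in [("3.0", 7), ("3.1", 6)]:
        print(release, "patch", patch, "beyond parity:", beyond_parity(release, patch))
```

Both source versions mentioned in this thread are above the quoted parity level, which is one more reason to test against the latest 3.2 patch in a lab rather than the base release.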

Hope this helps !!!

Arne Bier
VIP

I feel that the more we discuss these upgrade issues, the more we're becoming desensitized to this topic, and eventually so worn down by it that we just accept that it will happen. That's how I feel anyway. When I upgrade my Windows or iPhone I don't give it a second thought. Imagine having to worry whether you can log in after a patch/upgrade, or whether your phone can still send an SMS after upgrading to iOS 17?  Seriously.

There is no excuse for Cisco to keep breaking the fundamentals of ISE (RADIUS/802.1X/TACACS+) - but this is an ongoing theme. The customer can't get access to the Linux CLI to even help themselves when things go wrong.  We are beholden to the TAC and their slow processes to fix issues.

My solution to this is to never upgrade ISE in the traditional way (using the upgrade bundle). Sad but true, I have chosen to always build new VMs (or re-image the SNS) because the upgrade process is such a dumpster fire. Maybe it will work better in ISE 5.x - but by then we'll all have our Ansible Playbooks ready to deploy our ISE as "Infrastructure as code" - no need to upgrade.

My personal view is that ISE has become a sprawl of spaghetti code that is now so hard to maintain, that even Cisco have no way to keep quality control over it. One small change here breaks something else over there. Maybe they should consider re-working the architecture from scratch instead of piling onto that house of cards. Customers are the test bed, and these forums are proof of that. Many of us don't deploy cutting edge ISE features - we just want the basic stuff from version 1.0 - it should be taken for granted that this is rock-solid and maintained to the highest quality assurance testing in Cisco.  

The worst mistake Cisco made with ISE was running an Oracle database in the backend.  MySQL, MongoDB or Postgres would have been a much better choice than Oracle.