Cisco ISE Rebuild Issue

We had an ISE node fall over after I rebuilt its HA partner with the base software image (3.1.518) and deployed it back onto the network alongside the other appliance in the HA pair.

I've already raised this with Cisco TAC, but just wondering if someone experienced here can tell me where I have gone wrong?

We've got a pair of SNS-3615-K9s running ISE software version 3.1.0. One is in DC1, the other is in DC2.

Someone else in the team was tasked with upgrading the patch version of both units in the pair from 3.1.0.518 Patch 7 to Patch 10.

It was previously decided to do this upgrade one unit at a time. I wasn't originally involved.

After the first unit (DC1) was upgraded, its GUI would no longer run. The Application Server status showed 'Not Running', and it did not come up even after waiting 2 hours. Reloading the unit did not bring it back either. Luckily the other unit in the deployment was fine, and we were able to promote it to be the primary PAN.

He's now gone away and I am now tasked with fixing it.

I've rebuilt the failed ISE unit (DC1) with the base software image (3.1.518) and then applied Patch 7, which it was on previously and which matches the working DC2 unit, ready to re-deploy it into the pair with DC2.

To bring the rebuilt unit back into the deployment I followed these steps on the current active PAN (DC2):

  • Ensured the hostname configured on the newly rebuilt ISE (DC1) was pingable and resolved correctly from the still functional DC2 node.
  • The old ISE unit (DC1) was still listed with a red cross under its node object in the Administration > System > Deployment page of the DC2 unit.
  • De-registered the old node object - The old node was then completely gone from the list on the DC2 ISE.
  • Registered the new node object - Completed the node details, entering them exactly as they were on the old node. Before the new node appeared in the node list, the system popup message said: "Node was registered successfully. Data will be sync'd to the node, and then the application server will be restarted on the node. This processing may take several minute to complete. Please update smart licensing registration. When failover is required among multiple PSNs, please put the nodes in a Node Group".
  • Updated Smart Licensing registration - Clicked the "Renew Registration" button on the licensing page. It brought up a green "Server response" message.
  • New ISE successfully added back into the deployment - I was able to log in to the new ISE using my personal admin account (good result!), which showed me that the registration/join was successful and that the config must have synced across. It only has limited options as it's currently the secondary PAN. The licensing warning has disappeared, and so has the Licensing page itself (part of the limited options of being a secondary PAN).
  • I could see that the previously configured Node Group contained both ISEs. I had added the newly built ISE back into the Node Group as part of its initial configuration during the registration process.
  • Promoted the new ISE to primary - I did this from the new ISE (DC1) that I had just logged into. I then tried to log back into both units (DC1 and DC2), but on both I got a warning, shown only after logging in to the GUI, saying "Application server initializing". I tested a login to an end device during this time and TACACS+ would not work. After about 15 minutes the GUI for DC1 was back up (and TACACS+ was working again for end devices), but the DC2 unit is still not working: its GUI is down, and from the CLI its application server process is not running. I have no idea why. The DC1 ISE now cannot see the failed one (DC2), and I cannot log in to the GUI of the failed unit.
  • Alerts are now being generated on our SIEM monitoring systems every 15-30 minutes for the failed ISE (DC2). Our NOC can see the failed ISE flapping, as if it's going up and down trying to do something.
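
For anyone repeating the first step in the list above (checking that the peer's hostname resolves and is reachable before registering it), here is a minimal Python sketch of that check. It is only an illustration using the standard library, not an ISE tool: the hostname `ise-dc1.example.com` is a hypothetical placeholder, and testing TCP port 443 (the admin GUI) is my assumption about what is worth probing.

```python
import socket


def check_node(hostname, port=443, timeout=5.0):
    """Check forward DNS, reverse DNS and TCP reachability for one node.

    Returns a dict; 'ip' stays None if the forward lookup fails,
    'reverse' stays None if there is no reverse record, and
    'tcp_open' is True only if a TCP connection to `port` succeeded.
    """
    result = {"hostname": hostname, "ip": None, "reverse": None, "tcp_open": False}

    # Forward lookup: hostname -> IPv4 address.
    try:
        result["ip"] = socket.gethostbyname(hostname)
    except OSError:
        return result  # nothing else can be checked without an address

    # Reverse lookup: address -> name (may legitimately be missing).
    try:
        result["reverse"] = socket.gethostbyaddr(result["ip"])[0]
    except OSError:
        pass

    # TCP reachability on the chosen port.
    try:
        with socket.create_connection((result["ip"], port), timeout=timeout):
            result["tcp_open"] = True
    except OSError:
        pass

    return result


if __name__ == "__main__":
    # Hypothetical hostname; replace with the FQDN configured on the rebuilt node.
    print(check_node("ise-dc1.example.com"))
```

Running this from a host in the same network segment as the active PAN gives a quick view of whether the forward and reverse records agree with what was configured on the rebuilt node. It does not replace checking from the ISE CLI itself.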

So in summary, I've fixed the DC1 unit that was not working, and it is running fine now. But the DC2 unit broke after I promoted the newly built DC1 unit to primary, and I don't understand why.

1 Reply 1

Scott Fella
Hall of Fame

I have rebuilt a bunch of nodes, and it may be a good idea to rebuild your node in DC2 if it's down. At least that will clear up any stale data. The issue could have been some corruption when the deployment was patched, but that would be an assumption. Since your ISE in DC1 is now up and functioning, if it were me I would factory reset the ISE in DC2 and add it back into the deployment. I'm also assuming that if you run a test against the ISE node in DC2, things are failing?

-Scott
*** Please rate helpful posts ***