1439 Views · 5 Helpful · 7 Replies

ISE Distributed Deployment Upgrade and Testing Mid Way

Dokkie
Level 1

Hi all,

I am about to undertake an ISE upgrade for a customer to take their distributed deployment from 2.2 to 2.4. We have opted to do an inline upgrade as opposed to a parallel one, due to the extra VM resources the parallel upgrade would require. I have noted that the upgrade guide indicates that the upgrade of a distributed deployment can be done with minimal downtime if performed in the correct order, i.e.:

  • Secondary Admin Node
  • Primary Monitor Node
  • Policy Services Nodes
  • Secondary Monitor
  • Primary Admin

https://www.cisco.com/c/en/us/td/docs/security/ise/2-4/upgrade_guide/b_ise_upgrade_guide_24/b_ise_upgrade_guide_24_chapter_00.html
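For reference, the inline CLI upgrade of each node in the order above follows Cisco's prepare/proceed pattern. A minimal sketch, assuming the bundle is already staged in a repository named `upgradefiles` (the repository name and bundle filename here are illustrative, not taken from this deployment):

```
! Run on each node, in the order listed above (illustrative names)
ise/admin# application upgrade prepare ise-upgradebundle-2.x-to-2.4.0.357.SPA.x86_64.tar.gz upgradefiles
ise/admin# application upgrade proceed ise-upgradebundle-2.x-to-2.4.0.357.SPA.x86_64.tar.gz upgradefiles
```

The `prepare` step downloads and stages the bundle; `proceed` performs the actual upgrade of that node.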

 

The customer wants to know if we can upgrade half of the environment, take stock, and test (i.e. point the WLCs at the new environment) before completing the rest of the upgrade, see below:

 

Current Deployment

Node     Personas   Description                                     Version
ise001   PAN, MNT   Primary Admin Node, Secondary Monitoring Node   2.3
ise002   PAN, MNT   Secondary Admin Node, Primary Monitoring Node   2.3
ise003   PSN        Policy Service Node                             2.3
ise004   PSN        Policy Service Node                             2.3

 

Halfway Testing Deployment

Upgraded (new 2.4 deployment):

Node     Personas   Description                                     Version
ise002   PAN, MNT   Secondary Admin Node, Primary Monitoring Node   2.4
ise003   PSN        Policy Service Node                             2.4

Remaining (legacy 2.3 deployment):

Node     Personas   Description                                     Version
ise001   PAN, MNT   Primary Admin Node, Secondary Monitoring Node   2.3
ise004   PSN        Policy Service Node                             2.3

 

In theory I should be able to do this? Has anyone got any thoughts or reservations?

Cheers,

M

1 Accepted Solution


I agree with @Damien Miller. Here is something I did a few weeks back and shared in another post:

Environment: running 2.3 patch 5, targeting 2.4 patch 5, with 2 PANs and 2 PSNs.

Ensure your customer has, at a minimum, configuration backups before doing anything. I also recommend performing the upgrade via the CLI; I had issues with the GUI this morning due to some expired certs and other issues. Make sure name lookups work as well. The moves I performed for the 4 nodes were as follows:

1. Move the secondary PAN to 2.4.x (it becomes the new primary PAN until later on).
2. Move PSN1 to 2.4.x (during this move, PSN2 with the original PAN is still servicing requests).
3. Verify PSN1 is handling policy service requests as expected by checking the RADIUS live logs on the 2.4.x node (the former secondary PAN, which gets promoted to primary).
4. Once that is confirmed, move PSN2 to 2.4.x and the new 2.4 cluster (PSN1 is now servicing all NAD requests).
5. Finally, move the original PAN and, once moved over, promote it to primary again.
6. Apply whatever patch is necessary after the bundle upgrade succeeds; patching should take maybe an hour, depending on deployment size.

If your NADs are configured to use the distributed deployment, there should be no outages if things transition smoothly. As a workaround, for peace of mind, your customer could extend the reauthentication timers in the authorization profiles to something like 15 hours, ensuring that hosts already authenticated will not reauthenticate again during the transition window. This could help if they encountered issues with both PSNs during the transition. Keep in mind that each deployment scenario is unique, but this general approach should be similar. Good luck & HTH!
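Besides watching the RADIUS live logs, basic node health after each move can be sanity-checked from the CLI. A quick sketch (exact output varies by version):

```
! Confirm the node is now running the target 2.4.x version
ise/admin# show version

! Verify all ISE processes have come back up on the upgraded node
ise/admin# show application status ise
```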


7 Replies

Jason Kunst
Cisco Employee
Sounds like a split upgrade: upgrade only half and test it, keeping the other half on the old release. Take a look at this and let us know:

https://community.cisco.com/t5/security-documents/ise-upgrades-best-practices/ta-p/3656934

Hi Jason, yes, the scenario is a split upgrade. I have previously read the document you linked. It doesn't cover the scenario I am trying to achieve, but it is a good read and a useful general upgrade reference.

Damien Miller
VIP Alumni
Assuming you do the inline upgrade from the CLI, I have no reservations about it at all. Letting the GUI upgrade power through the entire deployment without testing is just asking for trouble. Parallel is nice and most would suggest it, but as you said, it's ruled out due to resources. If you have enough disk but not enough CPU/memory, I would just strip the resource reservations and deploy some new VMs.
Just make sure to have a change freeze in place or be prepared to make changes, and note that troubleshooting will depend on which PSN a client hits, since that determines where you look for logs. There will be no data synchronization between the two deployments while you stop for testing.
To save some time, stage the upgrade files on disk: and create a repo that points at disk:/, so you won't be waiting for upgrade files to copy.
While not guaranteed, you could also shut down the primary PAN/PSN you plan on upgrading first, take a VM backup, then restore them if you need to roll back. It might result in having to resync nodes, but I've used the process successfully in a lab.
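To illustrate the staging tip, here is a rough sketch of copying the bundle to local disk and defining a local repository from the CLI (the FTP URL, bundle filename, and repository name are all hypothetical):

```
! Copy the upgrade bundle to local disk first (hypothetical URL and filename)
ise/admin# copy ftp://user@ftpserver/ise-upgradebundle-2.x-to-2.4.0.357.SPA.x86_64.tar.gz disk:

! Create a repository that points at local disk so the upgrade reads files locally
ise/admin# configure terminal
ise/admin(config)# repository upgradefiles
ise/admin(config-Repository)# url disk:
ise/admin(config-Repository)# exit
```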
Jason's link is a good read.

Thanks Damien,

Yes, I am definitely going to do this from the CLI and pre-stage the upgrade files. I am not sure about the snapshot-based rollback you suggested, though, as the upgrade guide clearly states: https://www.cisco.com/c/en/us/td/docs/security/ise/2-4/upgrade_guide/b_ise_upgrade_guide_24/b_ise_upgrade_guide_24_chapter_01.html#reference_81B1170544474B7FA98AC3D80742C342

"When Cisco ISE runs on VMware, VMware snapshots are not supported for backing up ISE data.
VMware snapshot saves the status of a VM at a given point of time. In a multi-node Cisco ISE deployment, data in all the nodes are continuously synchronized with the current database information. Restoring a snapshot might cause database replication and synchronization issues. Cisco recommends that you use the backup functionality included in Cisco ISE for archival and restoration of data. "

If testing failed at the halfway point, I think I would be forced to fire up new VMs, have them rejoin the existing legacy cluster, reinstate the roles, and resync everything?

M


A quick follow up.

I successfully completed the inline upgrade method for a distributed deployment using the split upgrade and test method.

It worked very well. The only issue I had was a bug related to ISE 2.4 Patch 9 guest certificate chaining, which TAC helped me work around.

Glad to hear you were successful!