Re: Scenario: ISE-Pri dies and ISE-Sec is promoted to Primary

Xibachao1 · 05-26-2025

Hi all,

We are using two nodes with:

- ISE-Pri: personas Administration, Monitoring, Policy Service; roles PRI(A), PRI(M); services SESSION, PROFILER, DEVICE ADMIN
- ISE-Sec: personas Administration, Monitoring, Policy Service; roles SEC(A), SEC(M); services SESSION, PROFILER

I would like to discuss our disaster recovery plan.

What if ISE-Pri dies and we have to promote ISE-Sec to primary? Let's temporarily call it ISE-SecPri. I understand that ISE-SecPri will restart its services during the promotion.

And what will happen if I restore ISE-Pri? We would then have ISE-Pri (primary node) and ISE-SecPri (primary node) at the same time.

How do I get them back to the original setup? (Is a restart required?)

Jonatan Jonasson · 05-26-2025

There are a few aspects to this discussion.

Let's start with:

@Xibachao1 wrote:

And what will happen if I restore ISE-Pri? We would then have ISE-Pri (primary node) and ISE-SecPri (primary node) at the same time.

If you restore from a config backup, you'll be restoring to a standalone node; the restore process doesn't automatically put ISE-Pri back as the primary in the original deployment.

If you're considering a different type of restore, i.e. from a VMware snapshot or a similar third-party backup solution, you might be going down an unsupported route that would create additional issues.
For example, according to the current 3.4 documentation, ISE still doesn't support using VM snapshots to back up and restore the environment, because the restore could introduce database and synchronization issues.

See the 2nd note in this document: https://www.cisco.com/c/en/us/td/docs/security/ise/3-4/install_guide/b_ise_installationGuide34/b_ise_InstallationGuide_chapter_2.html

Going back to the original point: suppose ISE-Pri dies, you go through the process of promoting ISE-SecPri to primary, and you are now considering next steps.

You could start with a fresh install of ISE-Pri, add certificates, then join it to the deployment and sync the config from ISE-SecPri, which remains primary.
- Later, decide whether you want to switch roles.
Alternatively, you could start with a fresh install of ISE-Pri, restore it from a config backup, make it primary in a new deployment, change ISE-SecPri to standalone, and rejoin it to the new ISE-Pri deployment.
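The two recovery options above can be sketched as a toy Python model that tracks only each node's admin role. This is a conceptual illustration under simplifying assumptions (one admin role per node, and joining always turns a standalone node into a secondary); `Node`, `Deployment`, `option_1` and `option_2` are hypothetical names, not part of any ISE API, and the real steps are performed in ISE itself. Either path should end with exactly one admin-primary node:

```python
from dataclasses import dataclass, field

# Conceptual toy model, not an ISE API: all names here are hypothetical
# and only illustrate the order of operations in each recovery option.


@dataclass
class Node:
    name: str
    role: str = "standalone"  # "primary", "secondary" or "standalone"


@dataclass
class Deployment:
    nodes: list = field(default_factory=list)

    def primaries(self) -> list:
        """Nodes currently holding the admin-primary role."""
        return [n for n in self.nodes if n.role == "primary"]

    def join(self, node: Node) -> None:
        """Register a standalone node as secondary (config syncs from the primary)."""
        if node.role != "standalone":
            raise ValueError(f"{node.name} must be standalone before joining")
        node.role = "secondary"
        self.nodes.append(node)


def option_1(sec_pri: Node) -> Deployment:
    """Fresh-install ISE-Pri and join it under ISE-SecPri, which stays primary."""
    deployment = Deployment([sec_pri])
    deployment.join(Node("ISE-Pri"))  # a fresh install starts as standalone
    return deployment


def option_2(sec_pri: Node) -> Deployment:
    """Restore ISE-Pri from a config backup, make it primary, rejoin ISE-SecPri."""
    restored = Node("ISE-Pri")   # a config restore lands on a standalone node
    restored.role = "primary"    # make it primary of a new deployment
    deployment = Deployment([restored])
    sec_pri.role = "standalone"  # change ISE-SecPri to standalone
    deployment.join(sec_pri)     # rejoin it as secondary
    return deployment


for build in (option_1, option_2):
    result = build(Node("ISE-SecPri", role="primary"))
    # Either path should end with exactly one admin-primary node.
    assert len(result.primaries()) == 1
    print(build.__name__, "->", result.primaries()[0].name)
```

The practical difference is which node ends up primary: option 1 leaves ISE-SecPri in charge (you can swap roles later), while option 2 puts ISE-Pri back in charge immediately.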

Consider the "Cisco ISE Restore Operation" section for additional guidance on the restore process:
https://www.cisco.com/c/en/us/td/docs/security/ise/3-4/admin_guide/b_ise_admin_3_4/b_ISE_admin_maintain_monitor.html#ID312

Xibachao1 · 05-26-2025

Hi Jonatan Jonasson,

Actually, we have a backup solution (a full VM backup of ISE-Pri), and we have successfully tested restoring ISE-Pri as a disaster test case.

However, while ISE-Pri (the restored copy) was being restored, I kept ISE-Sec in secondary mode. When the restored ISE-Pri came back online, I had to sync them up. During that time ISE-Sec had no admin GUI, which caused me some problems: I could not check Live Logs, device profiling, and so on.

I want to get back to my scenario: what will happen in that case? That is my main concern.

Jonatan Jonasson · 05-27-2025

To your original point:

If ISE-Pri dies, you promote ISE-SecPri, and then you restore ISE-Pri, you would end up with two nodes both in the admin-primary role, and the deployment would not be synced.
To recover from this scenario and get it back to how it originally was, you could change the deployment on ISE-SecPri to standalone, and then re-join it to the deployment from ISE-Pri.
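To make the two-primary state and the recovery concrete, here is a toy Python model. It assumes a made-up `config_version` counter as a stand-in for each node's database state; `Node`, `has_split_brain` and `rejoin_as_secondary` are hypothetical helpers, not ISE functions. It also shows a consequence worth planning for: when ISE-SecPri is rejoined under a restored ISE-Pri, it takes ISE-Pri's (older) configuration, so in this model any changes made on ISE-SecPri while it was primary are lost.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, not an ISE API. "config_version" stands in for
# each node's database state so that stale data is easy to see.


@dataclass
class Node:
    name: str
    role: str
    config_version: int
    history: list = field(default_factory=list)  # role changes, in order

    def set_role(self, role: str) -> None:
        self.history.append(role)
        self.role = role


def has_split_brain(nodes: list) -> bool:
    """True when more than one node claims the admin-primary role."""
    return sum(n.role == "primary" for n in nodes) > 1


def rejoin_as_secondary(primary: Node, node: Node) -> None:
    """Change `node` to standalone, then rejoin it under `primary`.

    Joining replaces the node's configuration with the primary's,
    so anything changed only on `node` is overwritten.
    """
    node.set_role("standalone")                   # step 1: leave the deployment
    node.set_role("secondary")                    # step 2: register under primary
    node.config_version = primary.config_version  # config replicated from primary


# ISE-Pri restored from an older VM backup; ISE-SecPri promoted and then changed.
pri = Node("ISE-Pri", "primary", config_version=10)
sec_pri = Node("ISE-SecPri", "primary", config_version=12)

assert has_split_brain([pri, sec_pri])
rejoin_as_secondary(pri, sec_pri)
assert not has_split_brain([pri, sec_pri])
print(sec_pri.name, sec_pri.role, sec_pri.config_version)
```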

If you're feeling lucky, you could also just restore ISE-SecPri from a time before you promoted it to primary, or even from before ISE-Pri died. If you restore both nodes from the same point in time, they should be identical, assuming ISE-Pri died because of external factors and not a fault in ISE.
(I'm not recommending this, though.)

To my other point, that I recommend against this method:

I know that you can successfully restore an ISE node using your backup solution and find it working, I've done this myself a number of times.

However, it's still unsupported, and as the note says, it "might cause database replication and synchronization issues" because all nodes are continuously synchronized.
This ties back to your point, as this restore method relies on nothing else in the environment having changed in the meantime.
And if you promoted ISE-SecPri to primary, you've made a significant change.

The reason I recommend against this, and recommend a supported method instead, is that you can't fully predict what issues you will hit in a real-world disaster scenario. If you run into bigger issues during DR and need to engage customer support, you might not get the help you need if you insist on doing something the documentation explicitly recommends against.

Xibachao1 · 05-27-2025

Hi Jonatan Jonasson,

Thanks for the great response.

ahollifield · 05-27-2025

I would suggest adding a third node. In your current design, if you need to promote or swap roles, the nodes will restart at the same time.

https://cs.co/ise-scale

Aref Alsouqi · 05-27-2025

You can leverage the PAN auto-failover feature; however, as @ahollifield suggested, you need to add a third node to your deployment, because auto-failover requires a minimum of three nodes. Also, I agree with @Jonatan Jonasson: relying on restoring the primary PAN by restoring its VM is not recommended, especially when you have a secondary PAN promoted. Firstly, you can't have two primary nodes at the same time. Secondly, when you restore the original primary by restoring its VM, the secondary (new primary) has no way to demote itself. Finally, database inconsistency might occur in that case, as the restored primary will have an older state than the secondary (new primary).

Xibachao1 · 05-27-2025

Hi @ahollifield and @Aref Alsouqi,

Thanks for the information. I imagine we need a third node that monitors ISE-1 and ISE-2 and decides which one is primary (active).

In that case, when ISE-Pri dies, it loses its connection with ISE-Third, and ISE-Third chooses ISE-Sec as the active primary node without downtime.

Am I right about that?

Aref Alsouqi · 05-28-2025

You're welcome. Yes, you're right: the third node acts as the health checker, and if connectivity with the primary PAN is lost, it will instruct the secondary PAN to take over. I think that when the secondary PAN is promoted to primary, its services will still be restarted, and whether that has any impact depends on your environment. Please take a look at this table; it shows which services will not be available when the primary PAN is down and the secondary PAN hasn't taken over yet:

Cisco Identity Services Engine Administrator Guide, Release 3.0 - Deployment of Cisco ISE [Cisco Identity Services Engine] - Cisco
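The health-check behaviour described above can be sketched as a small Python simulation. It is a simplified model, not ISE's implementation: it assumes the third node polls the primary PAN once per interval and promotes the secondary only after a hypothetical `failure_polls` number of consecutive failures, with any successful poll resetting the count. `Pan` and `run_health_check` are illustrative names; the real failover settings and their defaults are documented in the ISE administrator guide.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical simulation of a health-check node, not ISE's actual code.
# `failure_polls` is an assumed, illustrative threshold parameter.


@dataclass
class Pan:
    name: str
    role: str


def run_health_check(polls: Sequence[bool], secondary: Pan,
                     failure_polls: int) -> Optional[int]:
    """Promote `secondary` after `failure_polls` consecutive failed polls.

    `polls` holds one entry per polling interval: True if the primary PAN
    answered. Returns the index of the poll that triggered failover, or None.
    """
    consecutive_failures = 0
    for i, primary_reachable in enumerate(polls):
        if primary_reachable:
            consecutive_failures = 0  # a successful poll resets the counter
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_polls:
            # Promotion restarts services on the new primary, so admin
            # functions can still be briefly unavailable.
            secondary.role = "primary"
            return i
    return None


ise_2 = Pan("ISE-2", "secondary")
trigger = run_health_check([True, False, False, True, False, False, False],
                           ise_2, failure_polls=3)
print(trigger, ise_2.role)
```

Requiring several consecutive failures, rather than reacting to a single missed poll, keeps a brief network blip from triggering an unnecessary promotion and service restart.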

Xibachao1 · 05-28-2025

Thank you so much for the great support. I really appreciate it.

@Aref Alsouqi @ahollifield @Jonatan Jonasson

Xibachao1 · 05-29-2025

Hi @Aref Alsouqi,

I have reviewed the Cisco document and am confused about one point.

I can read it two ways:

1. ISE-2 has only 12 hours in primary mode, and we have to bring ISE-1 back within that time. In this case, what happens to ISE-2 after 12 hours if ISE-1 is not back yet?

2. If ISE-1 is still dead after 12 hours, will the cluster remove it, so that we have to rejoin it when it becomes available?

Aref Alsouqi · 05-29-2025

Good question. To be honest, I don't think I have ever tested leaving the primary PAN off for more than 12 hours. However, I don't remember ever seeing an ISE node get removed from a cluster automatically. So I don't believe the cluster will remove the original primary PAN from the deployment. What I believe would happen is that when the original primary comes back online, no automatic synchronization will occur with the new primary PAN (the original secondary), and in that case you might need to initiate it manually.