Re: ISE failover

Anukalp S · ‎04-15-2018

Hi, I have recently deployed another ISE node as secondary so that if the primary node goes down there is no impact. I have two nodes (primary and secondary) in an ISE distributed environment.

I am a bit confused about failover. If the primary node is down, do I need to promote the secondary node to primary so that dot1x auth, TACACS, etc. keep working, or will they still work without promoting the secondary node to primary?

Please suggest.

Node-1 - Primary - (Role- Primary admin and Primary monitoring)

Node-2 - Secondary - (Role- Secondary admin and Primary monitoring)

Octavian Szolga · ‎04-16-2018

Hi,

PAN failover is needed for other services, not for RADIUS and TACACS. You won't be able to change policies or other configuration, but the services themselves will keep working. It's just a matter of the NAD detecting that RADIUS/TACACS is down on the first ISE node configured in its CLI and switching to the second node.
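As a rough illustration of that switch-over, here is a minimal Python sketch of ordered server selection. It only models the idea of "try the first configured node, fall back to the next"; the node names and the `is_alive` check are hypothetical, and a real NAD applies its own timeout, retransmit and dead-server logic.

```python
from typing import Callable, Optional, Sequence


def pick_server(servers: Sequence[str], is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first server, in configured order, that responds; None if all are down."""
    for server in servers:
        if is_alive(server):
            return server
    return None


# Hypothetical node names; on a real NAD the order comes from its RADIUS/TACACS server list.
configured = ["ise-node-1", "ise-node-2"]
down = {"ise-node-1"}  # simulate the first ISE node being offline

print(pick_server(configured, lambda s: s not in down))  # ise-node-2
```

Note that nothing in this selection depends on which node holds the primary PAN role, which is why authentication keeps working without a promotion.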

If other services are in place, it depends.

https://www.cisco.com/c/en/us/td/docs/security/ise/2-3/admin_guide/b_ise_admin_guide_23/b_ise_admin_guide_23_chapter_011.html#ID15

Regards,

Octavian

Anukalp S · ‎04-16-2018

Hi Octavian, thanks. So in my case the services (RADIUS and TACACS) will run on the secondary ISE node without promoting it to primary.

One more thing: when the primary node comes back up, will it automatically take the primary role? I mean, what will its role be after it comes back up?

Octavian Szolga · ‎04-16-2018

Hi,

Your primary PAN will still be primary after it comes back online. You can restart each node independently and each one will retain its former ISE persona/function.

Thanks,

Octavian

Marvin Rhoads · ‎04-16-2018

In a 2-node deployment such as yours, the normal persona (role) setup is:

Node 1: Primary PAN, Secondary MnT, PSN, Device Admin (TACACS)

Node 2: Secondary PAN, Primary MnT, PSN, Device Admin (TACACS)

Your network access devices are configured to use both nodes for RADIUS and TACACS services so the loss of either one does not affect those services.

As noted earlier, when the primary PAN is down you lose the ability to change settings (and a few more obscure things like automatic profiler updates from cisco.com) but everything else works fine.

Arne Bier · ‎04-16-2018

Just to add my 2c worth to what the others have said. When the PAN is unavailable, the Guest Sponsor Portal does not work (sponsors cannot log in, because the central database lives in the PAN). Guest authentications continue to operate normally.

In a distributed environment you can enable automatic PAN failover, but you need an external node such as an MnT or PSN to act as a health monitor of the PAN. I use this because it's handy when the PAN crashes at 2AM and nobody is around to promote the other PAN to Primary. These things are not fast - the detection should be intentionally slow to avoid any flip-flopping between PANs - usually 10 min to detect a failure, and then the X minutes it takes for your PAN to restart its services (on Secondary and Primary).
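To make the "intentionally slow" detection concrete, here is a small Python sketch of a health monitor that only declares a failure after several consecutive failed polls. The function name and the numbers (120-second polls, 5 failures) are illustrative assumptions picked to add up to roughly the 10 minutes mentioned above; check your own deployment's failover settings for the real values.

```python
from typing import Iterable, Optional


def detection_time(poll_results: Iterable[bool], interval_s: int, failures_needed: int) -> Optional[int]:
    """Return the seconds elapsed when failover would trigger, or None if it never does.

    poll_results yields True for a healthy poll and False for a failed one.
    Time is counted from the start of the first polling interval.
    A healthy poll resets the counter, which is what damps flip-flopping.
    """
    consecutive = 0
    for n, healthy in enumerate(poll_results, start=1):
        consecutive = 0 if healthy else consecutive + 1
        if consecutive >= failures_needed:
            return n * interval_s
    return None


# Illustrative numbers only: poll every 120 s, fail over after 5 consecutive misses.
print(detection_time([False] * 5, 120, 5))                         # 600
print(detection_time([False, False, True, False, False], 120, 5))  # None
```

The second call shows why a brief blip does not trigger a promotion: one healthy poll in the middle resets the count.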

Automatic PAN failover must be temporarily disabled if you plan to patch your system, or if you plan to restart the PAN services (e.g. for a TAC case or whatever). But mostly it's an insurance policy that I like to have in place.

One last thing - we are lucky enough to have a big F5 deployment and we use the GTM (Global Traffic Manager) to act as a kind of DNS for the auto failover. Our ISE admins don't care which PAN is active - they can always browse to (made up name) iseadmin.company.com and this resolves to the correct PAN (spread across two data centres). The GTM uses a RESTful API call to both PAN nodes to check which is active, and becomes authoritative for that FQDN. It's one less hassle for the operations people to deal with.
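The idea behind that setup can be sketched in a few lines of Python. This is not the F5 or ISE API: the node names, addresses and the `is_primary` check are hypothetical stand-ins for whatever status call the load balancer actually makes against each PAN.

```python
from typing import Callable, Dict, Optional


def resolve_admin_fqdn(pans: Dict[str, str], is_primary: Callable[[str], bool]) -> Optional[str]:
    """Return the address of the node that currently reports itself as the active PAN."""
    for name, address in pans.items():
        if is_primary(name):
            return address
    return None  # neither node claims the primary role, e.g. mid-failover


# Hypothetical nodes in two data centres, using documentation-range addresses.
pans = {"pan-dc1": "192.0.2.10", "pan-dc2": "198.51.100.10"}
active = "pan-dc2"  # pretend the second data centre holds the primary role

print(resolve_admin_fqdn(pans, lambda name: name == active))  # 198.51.100.10
```

Admins keep browsing to the same name; only the answer behind it changes when the roles move.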

Anukalp S · ‎04-19-2018

Thanks Marvin, Octavian & Arne for your helpful suggestions.

Marvin, as an aside: could you please briefly explain the difference between setting up the Primary MnT on the primary node versus the Secondary MnT on the primary node, as you mentioned in your post?

Marvin Rhoads · ‎04-19-2018

@Anukalp S,

I'm following Cisco's recommendation for a standard 2-node deployment. The rationale that's been explained to me is that MnT is a more resource-intensive function, so moving it off of the primary PAN helps system performance overall.

Refer to Cisco Live presentation "BRKSEC-3699 Designing ISE for Scale and High Availability" for much much more about this topic.