Solved: Re: ISE AD connector DC Failover how does it work?

Madura Malwatte · ‎06-13-2019

I have a question about how DC failover actually works. I read the section on DC failover in Active Directory Integration with Cisco ISE 2.x: https://www.cisco.com/c/en/us/td/docs/security/ise/2-0/ise_active_directory_integration/b_ISE_AD_integration_2x.html#reference_42F562CACEA745348AE47B601A29E151

"The AD connector detects if the currently selected DC becomes unavailable during the LDAP, RPC, or Kerberos communication attempt. The DC might be unavailable because it is down or has no network connectivity. In such cases, the AD connector initiates DC selection and fails over to the newly selected DC."

The "LDAP test DCs availability" diagnostic test returns 3 DCs available, and the first one to respond is DC1, which is the one the ISE nodes select. We are making some changes to DC1, and it will be down for a few hours. When DC1 goes down, how quickly will ISE detect that it is unavailable and fail over to another DC? How frequently does the AD connector attempt communication via LDAP, RPC, or Kerberos with the DC to determine that it is unavailable? Is this a fixed polling interval? During failover, will AD authentications be disrupted? How can I minimise any outage? Can I fail over to another DC (say DC2) in a controlled manner and then fail back to DC1 when it's back up?
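As a way to reason about the quoted paragraph, here is a minimal Python sketch of "failover on a failed communication attempt". It is a toy model only, not ISE's implementation: `DcSelector`, `DcUnavailable`, and the in-order reselection are all assumptions made for illustration, and the quoted text does not say whether the connector also polls on a timer.

```python
from typing import Callable, List


class DcUnavailable(Exception):
    """Raised by a request function when the DC cannot be reached."""


class DcSelector:
    """Toy model: keep one selected DC and reselect only when a request fails."""

    def __init__(self, candidates: List[str]) -> None:
        if not candidates:
            raise ValueError("at least one DC is required")
        self.candidates = list(candidates)
        self.current = self.candidates[0]

    def request(self, send: Callable[[str], str]) -> str:
        """Send via the current DC; on failure, try the remaining candidates."""
        start = self.candidates.index(self.current)
        # Current DC first, then the others in list order, wrapping around.
        order = self.candidates[start:] + self.candidates[:start]
        for dc in order:
            try:
                reply = send(dc)
            except DcUnavailable:
                continue  # the failure is noticed during the attempt itself
            self.current = dc  # stay on whichever DC answered
            return reply
        raise DcUnavailable("no candidate DC answered")


if __name__ == "__main__":
    down = {"dc1.example.com"}  # hypothetical host names

    def fake_send(dc: str) -> str:
        if dc in down:
            raise DcUnavailable(dc)
        return f"ok from {dc}"

    selector = DcSelector(["dc1.example.com", "dc2.example.com", "dc3.example.com"])
    print(selector.request(fake_send))
    print("selected:", selector.current)
```

In this model the time to fail over is bounded by how long a single attempt takes to fail (for example a connect timeout), which is why the timeout values matter more than any polling interval.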

Jason Kunst · ‎06-13-2019

Have you looked at the Cisco Live content on this?
http://cs.co/ise-training

What's new in ISE Active Directory connector - BRKSEC-2132

Madura Malwatte · ‎06-13-2019

Hi Jason,

Yes, I did go through BRKSEC-2132, but there are no details on DC failover. One slide mentions "faster failover", and that's about it.

Mike.Cifelli · ‎06-13-2019

We recently upgraded 3 DCs to Windows Server 2016. Here is one way you could minimize downtime:
I suggest making one of your secondary DCs the primary for your domain while you work on DC1. Then, assuming you have an ISE cluster, you can do the following:

In my case I have a 4-node cluster (1 PAN, 1 secondary PAN, 2 PSNs). I did a leave and re-join on one PSN, then the secondary PAN, then the other PSN, and finally the PAN. All 4 came back up with status operational on the new primary DC. We then proceeded to upgrade the original primary DC. This worked for us with no issues. If you do this, I suggest leaving without credentials so the ISE machine object in AD does not get deleted.

The documentation also states: "You can influence the domain controllers that Cisco ISE uses by creating and using an Active Directory site. See the Microsoft Active Directory documentation on how to create and use sites."

Good luck & HTH!

hslai · ‎06-13-2019

I like Mike.Cifelli's suggestions. It might not be a bad idea to open a proactive TAC case. If you set the Active Directory component to TRACE right before the change, you can tail ad_agent.log to see more details. If authentications are not going through properly, you could try [ Restart Active Directory Connector ] in the advanced tuning page.

Usually such failovers should take only a few seconds. You are correct that the Cisco Live session is a bit dated, especially because ISE 2.4 made a major improvement to DC failover. Hopefully, the presenter will update it in the near future. Meanwhile, we are working on documenting some of this information.

Madura Malwatte · ‎06-13-2019

Hi @Mike.Cifelli and @hslai thanks for the suggestions.

The documentation also states "Cisco ISE also provides the ability to define a list of preferred DCs per domain. This list of DCs will be prioritized for selection before DNS SRV queries." - do you know where this is configured in ISE?
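To make the quoted sentence concrete, here is a small Python sketch of "preferred DCs first, then DNS SRV results". It only illustrates the ordering idea and is not how ISE is configured or implemented: `SrvRecord` and `order_candidates` are hypothetical names, the SRV data would normally come from a lookup of `_ldap._tcp.dc._msdcs.<domain>`, and the sort is a simplification of RFC 2782 (which uses weighted random choice within a priority).

```python
from typing import Iterable, List, NamedTuple


class SrvRecord(NamedTuple):
    """One DNS SRV answer, reduced to the fields used for ordering."""
    priority: int
    weight: int
    target: str


def order_candidates(preferred: Iterable[str], srv: Iterable[SrvRecord]) -> List[str]:
    """Return DC names with the preferred list first, then SRV targets, deduplicated."""
    ordered: List[str] = []
    seen = set()
    # Simplification: lower priority first, then higher weight first.
    srv_sorted = sorted(srv, key=lambda r: (r.priority, -r.weight))
    for name in list(preferred) + [r.target for r in srv_sorted]:
        key = name.lower().rstrip(".")  # normalise case and the trailing DNS dot
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered
```

For example, with `dc2.example.com` as the only preferred DC, it would be tried first even if its SRV record has a worse priority than DC1 and DC3.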

Mike - when you made your secondary DC the primary, was this done by changing the sites in Active Directory? Also, when you did the domain leave/join to the secondary DC on the PSN, would you have had authentication failures for the 20 seconds or so while it joined to the new DC?

Hslai - as I understand it, this option is to bring down the primary DC, wait to see whether ISE automatically performs a new DC selection, and if not, use "Restart Active Directory Connector"?

I saw that others have asked similar questions in the support forums about how the AD connector handles DC failover, but no clear answer was given. Unfortunately I don't have a test environment with multiple DCs, otherwise this could be easily tested. Is it possible to get confirmation?

hslai · ‎06-13-2019

The advanced tuning page can define registry settings for preferred DCs. We do not encourage their general use, so they are not publicly documented. If you open a proactive TAC case, you should be able to get more information, and TAC may help evaluate your deployment, since you would be able to give more details on the ISE version and patch level, the AD infrastructure, etc.

My mention of "Restart Active Directory Connector" was just in case; it does not restart ISE services as a whole.

Simulating a DC failover does not require bringing any DC down. We could simply add a temporary blackhole route using the ISE CLI, if you have an ISE in the lab.
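During such a lab test it can help to confirm from a separate host that the "failed" DC is in fact still up, so that any change seen on ISE is due to the route and not a real outage. The sketch below is one hedged way to do that with plain TCP connects. `probe` and `DEFAULT_PORTS` are hypothetical names, the ports are the standard Kerberos (88), RPC endpoint mapper (135), and LDAP (389) ports, and it runs on a lab workstation, not on ISE.

```python
import socket
from typing import Dict, Mapping

# Standard TCP ports for the three protocols named in the ISE documentation.
DEFAULT_PORTS: Mapping[str, int] = {"kerberos": 88, "rpc-epmap": 135, "ldap": 389}


def probe(host: str, ports: Mapping[str, int] = DEFAULT_PORTS,
          timeout: float = 2.0) -> Dict[str, bool]:
    """Return, per service name, whether a TCP connection to host:port succeeded."""
    results: Dict[str, bool] = {}
    for name, port in ports.items():
        try:
            # A completed TCP handshake is treated as "reachable"; no protocol
            # exchange (bind, ticket request, etc.) is attempted.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Hypothetical DC names; replace with the DCs in your lab.
    for dc in ("dc1.example.com", "dc2.example.com"):
        print(dc, probe(dc))
```

A successful connect only shows that the port is open from that host; it says nothing about which DC ISE has currently selected, which still has to be read from ISE itself.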

Mike.Cifelli · ‎06-14-2019

@Madura Malwatte I honestly cannot give an exact answer on the DC switchover, since our on-site server team performed the change. As for the second question, if you have multiple PSNs serving requests behind some form of load balancing, you should not experience any failures.