Solved: Re: ISE Load Balancing and F5 Health Monitors for Active Directory

dab1 · ‎08-06-2018

In the ISE - f5 deployment guide it reads:

"If AD/LDAP account validation is required as terms for determining RADIUS status, then it is recommended to return Access-Accept when the identity store is available and to lock down authorization for probe account as noted. There are implications to RADIUS failover that need to be considered on backend store failure, i.e. return Process Error versus Access-Reject and potential impact to the load balancing cluster as a whole. If AD/LDAP is down for all PSNs, then the NADs need to failover to different cluster since the VIP will be declared down."

Consider a scenario with two separate DCs, each served by a different F5/VIP, where I want the ISE cluster in one DC to be marked down if ISE in that DC cannot reach AD. I read the document as saying that a valid account should be set up that returns an Access-Accept via AD, as opposed to making up a bogus account that would fail and return an Access-Reject. Is that right?

That's how the above paragraph from the deployment guide reads, but wouldn't a bogus account that returns Access-Reject be a good enough test to confirm AD functionality?

Cory Peterson · ‎08-06-2018

The benefit of a real account is that you get an access-accept back and it verifies that the External ID Store is also functioning correctly.

A dummy username will fail with an Access-Reject whether the ID store is up or down.
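
To make the comparison concrete, here is a minimal Python sketch of what a health probe would observe with each account type. It is a simplified model under stated assumptions, not behavior verified against a live ISE deployment: the function name `probe_reply`, the account labels, and the `on_process_error` parameter are all illustrative, with the parameter standing in for the policy choice of how a backend failure is answered.

```python
def probe_reply(account, ad_up, on_process_error="reject"):
    """Return the RADIUS outcome a health probe would observe (simplified model).

    account: "real" (valid AD service account) or "dummy" (non-existent user).
    ad_up: whether the PSN can reach the identity store.
    on_process_error: "reject" or "drop", the assumed policy when the store fails.
    """
    if not ad_up:
        # Backend failure: the lookup cannot complete for either account type.
        return "Access-Reject" if on_process_error == "reject" else "no response"
    if account == "real":
        return "Access-Accept"
    # Store is reachable but the user does not exist.
    return "Access-Reject"


for account in ("real", "dummy"):
    for ad_up in (True, False):
        state = "AD up" if ad_up else "AD down"
        print(account, state, "->", probe_reply(account, ad_up))
```

In this model the real account changes from Access-Accept to Access-Reject when AD fails, while the dummy account returns Access-Reject in both states, so a monitor watching only for Access-Accept cannot tell the two states apart with a dummy user.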


Damien Miller · ‎08-06-2018

Someone might come by who can better explain the wording in the document. I think they are trying to differentiate between "Process Error" and "Access-Reject", treating each differently. In practice I have treated an Access-Reject for our health checks as an indication that we don't want to send any further auths to that PSN. That means marking that PSN down in the LB VIP group, and marking the VIP down when all nodes are down, or potentially when all but one have failed. That's a deployment and scale decision.

From past experience I would recommend using two different valid service accounts for the RADIUS health checks: one for DC1 and a second for DC2. Make sure they are either non-expiring or on differing password reset schedules. The last thing one wants is a single account having an issue and causing both VIPs to go down.
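
The marking logic described above can be sketched roughly as follows. This is only an illustration in Python, not F5 configuration: the function names, the `min_members` parameter, and the PSN names are hypothetical, and the threshold is the deployment and scale decision mentioned above.

```python
def member_is_up(reply):
    """Treat anything other than Access-Accept as a failed health check."""
    return reply == "Access-Accept"


def vip_is_up(replies, min_members=1):
    """Keep the VIP up while at least `min_members` PSNs pass their check.

    min_members=1 marks the VIP down only when every PSN has failed;
    min_members=2 marks it down earlier, once all but one have failed.
    """
    healthy = sum(1 for reply in replies.values() if member_is_up(reply))
    return healthy >= min_members


# Hypothetical probe results for three PSNs behind one VIP.
replies = {
    "psn1": "Access-Accept",
    "psn2": "Access-Reject",
    "psn3": "no response",
}
print(vip_is_up(replies))                 # one healthy PSN is enough
print(vip_is_up(replies, min_members=2))  # require two healthy PSNs
```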

dab1 · ‎08-06-2018

Thanks Damien...

Would you envisage an issue with setting up a monitor with a non-existent username, so that an Access-Reject is returned in normal times but an error is returned if AD is unavailable?

What is the benefit of using a real account over a bogus account?

From a customer perspective, they would need to set the account up, maintain it, etc., which they would be reluctant to do without a benefit over a bogus account.

Thanks again.

Craig Hyps · ‎08-06-2018

As the author, I can speak to intent. The intent of my design guidance was to:

Be sure to understand the implications of the account used. For example, if you use a local account but all users authenticate via AD, then an AD failure will not result in the LB flagging the server down. After all, you tested RADIUS auth and it said it was OK. If you use an AD account, then you are validating that both the PSN and AD are functioning. However, if half of the endpoints are gaining access via MAB and profiling, for example, then an AD failure could fail the PSN and force those endpoints to fail over as well.
Ensure that when Access-Accept is returned, authorization grants zero access. Most LB health monitors only check that the server is up (Access-Accept returned) and have no interest in the authorization. However, you do not want the health monitor account to be abused by a malicious user to gain unauthorized access.
Understand the choices for when AD is down. For example, you have the option to treat AD down as a process error. Authentication policy can then treat a process error as an auth failure (Access-Reject) or a drop (no response to the NAD). Most LBs will treat Access-Rejects and drops the same, i.e. declare the server down. But Cisco IOS can treat an Access-Reject as a positive indicator that the PSN is still alive. It would be useful if load balancers made this configurable.
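
As a rough illustration of the last point, the two interpretations of a probe reply can be modeled as below. This is a sketch only: the `Reply` enum and both function names are hypothetical and do not correspond to any F5 or IOS setting.

```python
from enum import Enum


class Reply(Enum):
    ACCEPT = "Access-Accept"
    REJECT = "Access-Reject"
    DROP = "no response"


def lb_considers_up(reply):
    """Typical LB monitor as described above: only Access-Accept is healthy."""
    return reply is Reply.ACCEPT


def nad_considers_alive(reply):
    """IOS-style reading: any RADIUS answer, even a reject, shows the PSN is alive."""
    return reply is not Reply.DROP


for reply in Reply:
    print(
        f"{reply.value:14}"
        f" LB up: {lb_considers_up(reply)}"
        f"  NAD alive: {nad_considers_alive(reply)}"
    )
```

The two functions disagree only on Access-Reject, which is why the choice between reject and drop on a process error matters more to a NAD than to most load balancers.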

I am not totally clear on the mention of bogus accounts. I suspect you are looking to treat a bogus account as "User not found" and distinguish that from AD being down, reported as "Process error". I suppose that could work as well.

Craig

paul · ‎08-07-2018

I typically use a dummy account, i.e. where user not found is set to Continue, but I also set up a specific Policy Set for my F5 RADIUS probes where the admission criteria is "Device Type = F5 AND username = <Dummy Account name>". So there is no way it can be abused elsewhere. Granted, I am not checking the back-end AD store.
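
The admission criteria above amount to a simple two-part match. The Python sketch below only illustrates the logic and is not ISE policy syntax; the function name and the `f5-health-probe` username are hypothetical placeholders for the dummy account name.

```python
def matches_probe_policy_set(device_type, username, probe_user="f5-health-probe"):
    """True only when the probe account arrives from an F5 device."""
    return device_type == "F5" and username == probe_user


# The probe from the F5 matches; the same username from a switch does not.
print(matches_probe_policy_set("F5", "f5-health-probe"))
print(matches_probe_policy_set("Switch", "f5-health-probe"))
```

Because both conditions must hold, the dummy username is of no use from any other network device.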
