12-08-2011 02:02 AM - edited 03-10-2019 06:37 PM
I have a question regarding ISE profiling servers that are placed behind a load balancer:
If you have an ISE environment where both computers and users are being authenticated, and Machine Access Restriction (MAR) is enabled (so users can only authenticate on a previously authenticated machine), are the ISE servers aware of all successful computer authentications handled by the other ISE servers?
For example:
There are 2 ISE appliances (ISE01 and ISE02) behind a load balancer.
A user starts up his computer, and the computer authentication is handled by ISE01 (and is successful). When the user then logs in on that computer, the load balancer chooses ISE02 to authenticate the user.
Will ISE02 be aware that the corresponding computer was already successfully authenticated on ISE01, so that the user is able to log in? Or will it deny the user authentication because it thinks the computer is not (yet) authenticated and Machine Access Restrictions are enabled?
Kind regards,
Bert
12-08-2011 09:05 AM
are the ISE servers aware of all successful computer authentications handled by the other ISE servers?
=> No
They are independent servers that just replicate their configuration.
So a user should always authenticate with the same ISE.
Moreover, a load balancer kills profiling, since profiling requires you to SPAN some traffic to an ISE node.
12-09-2011 12:18 AM
Nicolas,
Thanks a lot for this explanation. Now I'm at least warned that we shouldn't place them behind a load balancer (although load balancing ISE policy servers is mentioned in the Cisco ISE user guide under section 9, "Setting up Cisco ISE in a Distributed Environment").
Actually, I don't understand why Cisco didn't implement synchronization of the machine cache for MAR (which is in fact just a cache of the MAC addresses of authenticated computers) between ISE servers in the same node group. Synchronizing a table of MAC addresses shouldn't be a big challenge, I assume? Or is there another reason this wasn't implemented?
Implementing this synchronization would be a big improvement if you ask me, as it adds extra redundancy in case one ISE server fails and users try to log on to machines that were already authenticated on the failed ISE.
Kind regards
12-09-2011 12:48 AM
I would guess it's probably a bit more complex than a MAC table synchronization. The real-time synchronization in particular could take a lot of bandwidth/CPU, I imagine, but yes, it would make sense as a feature request. I think the feature request list for ISE must be 1 km long by now :-)
I'll check with the developers if it's already on their roadmap or not.
12-13-2011 07:40 AM
After checking, MAR cache synchronization will be present in ACS 5.4.
Logically it should also be included in a future ISE release, but I have no further details.
12-14-2011 02:12 AM
Nicolas,
I'm glad to hear that the MAR cache sync is already in development for ACS, and I hope it will soon be implemented in ISE as well. I'll keep an eye on new release notes.
Thanks a lot!
01-18-2012 12:25 PM
>> They are independent servers that just replicate their configuration.
So a user should always authenticate with the same ISE.
Moreover, a load balancer kills profiling, since profiling requires you to SPAN some traffic to an ISE node. <<
Not entirely correct. Policy Service nodes are most certainly supported behind a load balancer, which is the intention of a node group. This is often the preferred method for high availability and scaling. In addition to supporting load distribution of RADIUS and other requests, members of a node group maintain a heartbeat to detect whether a peer member has failed. If so, the Monitoring node is queried to determine if there are any transient sessions that may require clean-up via RADIUS CoA, to help ensure that an endpoint is not left in a defunct auth state. LB functionality will depend on the load balancer used. Cisco ACE, for example, supports stickiness of RADIUS transactions based on source IP, Calling-Station-Id, or Framed-IP-Address.
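As a rough sketch of the ACE side (from memory; the group and farm names are placeholders, so verify the syntax against your ACE release), RADIUS stickiness keyed on Calling-Station-Id looks something like this:
! sticky group keyed on the RADIUS Calling-Station-Id, pointing at the PSN farm
sticky radius framed-ip calling-station-id ISE-RADIUS-STICKY
  serverfarm ISE-PSN-FARM
! RADIUS-aware load-balance policy that uses the sticky group
policy-map type loadbalance radius first-match ISE-RADIUS-LB
  class class-default
    sticky-serverfarm ISE-RADIUS-STICKY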
The impact of LB on profiling or other Policy Service node functions depends on the service/probe in question. For services like client provisioning, posture, and central web auth, HTTPS redirection always occurs back to the node that terminated the RADIUS session, so LB is transparent provided direct access is permitted to the real IP for the redirected HTTPS transactions (RADIUS transactions would be sent to the virtual IP).
Specific to profiling, SNMP queries can be triggered and will be sent by the Policy Service node that received the RADIUS Accounting Start packet (assuming the RADIUS probe is enabled) or the SNMP trap (assuming the SNMP Trap probe is enabled). SPAN is only one data collection method, used primarily for HTTP or DHCP capture. Methods other than SPAN/RSPAN are available to capture this data, but if SPAN is used, then it is correct that there is no specific mechanism to move SPAN sessions from one interface to another in case of NIC or node failure. I believe intelligent taps are available that can accomplish this, or else traffic can be mirrored to multiple nodes at the cost of duplicating profile data.
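For example, the DHCP probe can be fed with a plain IOS helper address instead of SPAN (the addresses below are only examples):
! relay client DHCP both to the real DHCP server and to the ISE PSN
! interface running the DHCP probe
interface Vlan100
 ip helper-address 10.1.1.5
 ip helper-address 10.1.2.21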
As noted, replication of MAR cache will be added to ACS 5.4, and no, this feature is not altogether trivial due to the number of transactions and updates that must be replicated and kept in sync across each node performing RADIUS services.
/CH
01-27-2012 01:56 AM
Hi,
Thanks, this is useful information.
In the documentation, it mentions that when using a node group, the NAS should have all of the ISEs configured under AAA to allow CoA. Would it be possible to use the VIP address and NAT the ISEs when they initiate an outbound connection from behind the ACE for CoA, or is RADIUS a bit deeper than that?
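By "configured under AAA" I understand defining each PSN's real IP as a CoA client on the NAS, i.e. something like this on an IOS switch (addresses and key are only examples):
! allow CoA from each PSN real IP rather than the VIP
aaa server radius dynamic-author
 client 10.1.2.21 server-key MySharedSecret
 client 10.1.2.22 server-key MySharedSecret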
Would you configure a node group for a pair of Policy Service nodes on a remote site that were not load balanced? What makes this specific to Policy Service nodes behind a LB? Assume both Policy Service nodes were configured in all NASes on that particular site.
I assume that when profiling is carried out, all data is replicated to the Admin node anyway (using DHCP helper, DNS, SNMP). When you start to look at a distributed ISE architecture with profiling, it starts to get messy; potentially a lot of helper and SNMP addresses have to be configured in the NASes.
Thanks.
06-29-2015 12:08 AM
Hi Craig,
We are in the process of migrating our ISE infrastructure from ACE to F5.
We followed your document for the configuration.
All looks OK except EAP-TLS authentication (PEAP user/computer authentication works fine).
In the document there is nothing special mentioned that needs to be done for TLS.
I think it may be related to fragmentation, but I'm not sure.
I can also add that if we point the NADs directly to a PSN, it works.
The problem occurs only when we use the VIP.
(PEAP works with the VIP as well.)
Do you know if something special needs to be done for TLS to work?
Any information or hint is appreciated.
Thanks,
Laszlo
06-29-2015 07:34 AM
It is not uncommon to see RADIUS load balancing issues with EAP-TLS related to fragmentation. The typical cases are either 1) failure of the load balancer to reassemble large RADIUS packets, for example TLS with larger key sizes, or 2) the load balancer dropping fragments that are deemed too small. For the first case, both Cisco ACE and F5 LTM should accommodate automatic reassembly if using the standard LB mechanism for RADIUS. LTM does not reassemble under the FastL4 profile by default, but that profile is normally not used for RADIUS and the guide does not use it. If fragments are too small, on both ACE and LTM you would need to change the default minimum fragment size to accept the exceptionally small fragments for reassembly. This can serve as a workaround, but I recommend finding and eliminating the device causing RADIUS packets to be fragmented below a reasonable size.
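A quick way to confirm the fragmentation theory is to capture on the PSN or load balancer and look for IP fragments mixed into the RADIUS flow, e.g. with tcpdump (interface name is just an example; the second clause matches any IP fragment, since non-first fragments carry no UDP header):
tcpdump -ni eth0 'udp port 1812 or (ip[6:2] & 0x3fff != 0)'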
Another common issue in load balancing is failure to understand the exact path taken by the entire flow to/from the real servers. Often ingress packets take one path but responses take another. This asymmetry often results in packet drops by the load balancer or another device in the path.
/CH
06-20-2017 09:10 PM
Hi laposilaszlo,
Did you end up resolving the issue with EAP-TLS?
I am using an F5 to load balance RADIUS and am having the same issue.
I am not sure that I want to alter the fragment size as a workaround.
Regards,
Raj
06-20-2017 10:10 PM
Hi Raj,
Yes we solved it.
In our case it was the Nexus switch. It has a security feature that discards small UDP packets,
and the last part of the certificate exchange was a small UDP packet that got discarded.
So we disabled this check and all is OK now.
After that we had some problems on the F5 regarding UDP fragments, which were solved by an F5 upgrade.
This was a long time ago, so the fix should be in current releases.
laszlo
06-20-2017 10:19 PM
Hi Laszlo,
Thanks for the quick reply!
We run a Nexus core as well. Could you please tell me how to check/disable the feature?
We are running ISE 2.2 & the issue still seems to persist.
We are running version 11.5.3 on the F5.
Regards,
Raj
06-21-2017 02:57 AM
MTN-GDC-AGG-N7018A-1# show hardware forwarding ip verify module 3

IPv4 and v6 IDS Checks         Status     Packets Failed
-----------------------------+---------+------------------
address source broadcast       Enabled    0
address source multicast       Enabled    0
address destination zero       Enabled    0
address identical              Enabled    134
address reserved               Enabled    2334940
address class-e                Disabled   --
checksum                       Enabled    0
protocol                       Enabled    0
fragment                       Enabled    34254
length minimum                 Enabled    0
length consistent              Enabled    0
length maximum max-frag        Enabled    0
length maximum udp             Disabled   --
length maximum max-tcp         Enabled    0
tcp flags                      Disabled   --
tcp tiny-frag                  Enabled    176552
version                        Enabled    0
-----------------------------+---------+------------------

IPv6 IDS Checks                Status     Packets Failed
-----------------------------+---------+------------------
length consistent              Enabled    0
length maximum max-frag        Enabled    0
length maximum udp             Disabled   --
length maximum max-tcp         Enabled    0
tcp tiny-frag                  Enabled    0
version                        Enabled    0
Workaround: disable the packet length check with the following commands:
config t
 no hardware ip verify length
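To verify, re-run the show command above; the length checks should now report Disabled:
show hardware forwarding ip verify module 3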