Aaron Woland
Cisco Employee

So, this is my first blog post on here.  Hope it goes well.

One of the most commonly asked questions of late is how to properly use a load-balancer with Cisco's Identity Services Engine.  Here are some basic guidelines to use when configuring a Load Balancer for the ISE Policy Services Nodes (PSNs).

Understanding terms:

PSN = Policy Services Node. The PSN is the ISE persona that handles all of the RADIUS requests and makes the policy decisions. If you are using profiling, the PSN also handles the profiling for you.

VIP = Virtual IP Address. This is the IP address that the load balancer listens on. The load balancer redirects traffic destined to the VIP to the real IP addresses of the servers in the server farm.

Server Farm = The grouping of servers that will be load balanced when traffic is destined to the VIP.

Endpoint = the actual device accessing the network.

NAD = Network Access Device.  The Access-Layer device (switch / wireless controller) that provides and enforces network access to the endpoint.

SNAT = Source Network Address Translation. A load-balancer function that replaces the source IP address of the NAD with an address owned by the load balancer, so the servers reply to the load balancer. This allows the load balancer to run "out of band".

General Guidelines


When using a load balancer (from any vendor), you must ensure a few things:

  • Each PSN must be reachable by the PAN / MNT directly, without having to go through NAT (routed-mode LB, not NAT). NO NAT. This includes the Accounting messages, not just the Authentication ones.
  • Each PSN must also be reachable directly from the clients, for redirections / CWA / Posture, etc.
  • You may want to "hack" the certificates to include the VIP FQDN in the SAN field.
  • Perform stickiness (aka persistence) based on Calling-Station-ID and Framed-IP-Address.
  • The VIP gets listed as the RADIUS server on each NAD for all 802.1X-related AAA.
  • Each PSN gets listed individually in the Dynamic-Authorization (CoA) configuration. Use the real IP address of the PSN, not the VIP.
  • The load balancer(s) get listed as NADs in ISE so their test authentications may be answered.
  • ISE uses the Layer-3 address to identify the NAD, not the NAS-IP-Address in the RADIUS packet. This is another reason to avoid SNAT.
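To make the stickiness rule concrete, here is a minimal Python sketch of a persistence table. It assumes a hypothetical load balancer that keys on Calling-Station-ID and Framed-IP-Address and binds both to the same PSN, so the accounting messages follow the authentication. Choosing the PSN by hashing the first key is an illustrative choice and the PSN addresses are made up; real load balancers implement this through their own persistence profiles.

```python
import hashlib


class StickyTable:
    """Hypothetical LB persistence table mapping an endpoint to one PSN."""

    def __init__(self, psns: list[str]) -> None:
        self.psns = psns                  # real IP addresses of the PSNs in the server farm
        self.table: dict[str, str] = {}   # persistence key -> chosen PSN

    @staticmethod
    def keys(attrs: dict[str, str]) -> list[str]:
        # Persistence keys taken from the RADIUS attributes, when present.
        return [attrs[a] for a in ("Calling-Station-Id", "Framed-IP-Address") if a in attrs]

    def pick(self, attrs: dict[str, str]) -> str:
        keys = self.keys(attrs)
        if not keys:
            return self.psns[0]  # no key at all: trivial fallback, illustrative only
        # Reuse an existing binding if any of the keys was seen before.
        psn = next((self.table[k] for k in keys if k in self.table), None)
        if psn is None:
            # First message from this endpoint: hash the first key to choose a PSN.
            digest = hashlib.sha256(keys[0].encode()).digest()
            psn = self.psns[int.from_bytes(digest[:4], "big") % len(self.psns)]
        for k in keys:
            self.table[k] = psn  # bind every key so later messages stick to the same PSN
        return psn


lb = StickyTable(["10.1.100.11", "10.1.100.12", "10.1.100.13"])
auth = {"Calling-Station-Id": "00-11-22-33-44-55"}
acct = {"Calling-Station-Id": "00-11-22-33-44-55", "Framed-IP-Address": "10.20.30.40"}
first = lb.pick(auth)
assert lb.pick(acct) == first                                  # accounting follows authentication
assert lb.pick({"Framed-IP-Address": "10.20.30.40"}) == first  # IP-only lookup still sticks
```

Binding both keys on the first message that carries them is what lets a later lookup by Framed-IP-Address alone land on the same PSN.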

Failure Scenarios:

  • The VIP is the RADIUS server, so if the entire VIP is down, the NAD should fail over to the secondary data center VIP (listed as the secondary RADIUS server on the NAD).
  • Probes on the load balancers should ensure that RADIUS is responding, as well as HTTPS (at minimum). The LB probes should send test RADIUS messages to each server periodically to ensure that RADIUS is actually responding, not just look for open UDP ports. The same goes for HTTPS.
  • Use node groups with the L2-adjacent PSNs behind the VIP. If a session is in progress and one of the PSNs in a node group fails (call it PSN1), another node-group member will issue a CoA-reauth, forcing the session to begin again. By this point the LB should have marked PSN1 as failed because of the probes configured on the LB, so the new authentication request will hit the LB and be directed to a different PSN.
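The point about probes is that the load balancer should actually speak RADIUS rather than just test UDP ports. As an illustration, here is a Python sketch of the packet side of such a probe, following RFC 2865: build an Access-Request with a User-Name and a hidden User-Password, then check the Response Authenticator of the reply. The shared secret and probe credentials are hypothetical, and the UDP exchange with the PSN (normally port 1812) is left out.

```python
import hashlib
import os
import struct

# RADIUS codes and attribute types from RFC 2865.
ACCESS_REQUEST, ACCESS_ACCEPT, ACCESS_REJECT = 1, 2, 3
USER_NAME, USER_PASSWORD = 1, 2


def hide_password(password: bytes, secret: bytes, req_auth: bytes) -> bytes:
    """RFC 2865 section 5.2: pad to 16-byte blocks, XOR each block with a chained MD5."""
    padded = password.ljust(max(16, (len(password) + 15) // 16 * 16), b"\x00")
    hidden, prev = b"", req_auth
    for i in range(0, len(padded), 16):
        key = hashlib.md5(secret + prev).digest()
        prev = bytes(p ^ k for p, k in zip(padded[i:i + 16], key))
        hidden += prev
    return hidden


def attribute(attr_type: int, value: bytes) -> bytes:
    """Type-Length-Value encoding; Length includes the 2-byte header."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_probe(ident: int, user: bytes, password: bytes, secret: bytes) -> tuple[bytes, bytes]:
    """Return (Access-Request packet, Request Authenticator)."""
    req_auth = os.urandom(16)  # Request Authenticator: 16 random octets
    attrs = attribute(USER_NAME, user) + attribute(
        USER_PASSWORD, hide_password(password, secret, req_auth))
    header = struct.pack("!BBH", ACCESS_REQUEST, ident, 20 + len(attrs))
    return header + req_auth + attrs, req_auth


def response_is_valid(resp: bytes, ident: int, req_auth: bytes, secret: bytes) -> bool:
    """True for an Accept or Reject matching our Identifier with a correct Response Authenticator."""
    if len(resp) < 20:
        return False
    code, rid, length = struct.unpack("!BBH", resp[:4])
    if code not in (ACCESS_ACCEPT, ACCESS_REJECT) or rid != ident or not 20 <= length <= len(resp):
        return False
    # ResponseAuth = MD5(Code + Identifier + Length + RequestAuth + Attributes + Secret)
    expected = hashlib.md5(resp[:4] + req_auth + resp[20:length] + secret).digest()
    return expected == resp[4:20]
```

A Reject with a valid authenticator is as good as an Accept for a liveness check: it shows the PSN parsed the request and shares the secret configured for the load balancer, which only works because the load balancer is listed as a NAD in ISE.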

Why can't we use Source NAT (SNAT)?


One of the most common questions when load balancing is: "Why can't we use SNAT?" Source NAT is a fantastic thing for general load balancing, but not with ISE. The reasons listed below pertain to ISE version 1.1.x and may change with ISE 1.2+.

Reason #1:  Network Access Device (NAD) will be wrong:
With SNAT, the source Network Access Device will show up in ISE as being the Load-Balancer, NOT the Network Access Device.

[Image: Source_is-ACE.png]

ISE uses sessionized network authentication. This means ISE tracks the session along with the NAD, so the NAD and ISE stay in sync about the state and location of the endpoint. This session also gives ISE the NAD address to send Change of Authorization (CoA) messages to, as well as the location of the endpoint. We use the source NAD in many different ISE policies, and if all requests always appear to come from the load balancer instead of the NAD, how can we know the location of the endpoint?

Location is not nearly as big a deal as Change of Authorization. ISE records the Layer-3 address of the NAD from the Layer-3 headers. There is a RADIUS attribute known as NAS-IP-Address, which carries the IP address of the network device inside the RADIUS packet. However, ISE does not currently use that attribute, so the L3 IP address of the NAD must be correct for a CoA to be sent to the correct device. If the NAD appears as the IP address of the load balancer, ISE will send the CoA to the load balancer, not the switch.
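A toy Python model of that CoA problem (not ISE code): the session table records the NAD by the packet's Layer-3 source address and ignores NAS-IP-Address, as described above for ISE 1.1.x. With SNAT, every endpoint's CoA target collapses to the load balancer's address. All addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RadiusPacket:
    l3_source: str       # source IP in the IP header, as the PSN receives it
    nas_ip_address: str  # NAS-IP-Address attribute inside the RADIUS payload
    mac: str             # Calling-Station-ID of the endpoint


LB_SNAT_ADDRESS = "10.1.100.5"  # hypothetical address the load balancer SNATs to


def through_lb(pkt: RadiusPacket, snat: bool) -> RadiusPacket:
    """The packet as the PSN sees it after the load balancer forwards it."""
    if snat:
        return RadiusPacket(LB_SNAT_ADDRESS, pkt.nas_ip_address, pkt.mac)
    return pkt  # routed mode: the NAD's source address is preserved


def coa_targets(packets: list[RadiusPacket]) -> dict[str, str]:
    """Toy session table: endpoint MAC -> address a CoA would be sent to.

    Uses the L3 source only; nas_ip_address is deliberately ignored.
    """
    return {p.mac: p.l3_source for p in packets}


requests = [RadiusPacket("10.50.1.2", "10.50.1.2", "00-aa-00-00-00-01"),
            RadiusPacket("10.60.1.2", "10.60.1.2", "00-aa-00-00-00-02")]
routed = coa_targets([through_lb(p, snat=False) for p in requests])
snatted = coa_targets([through_lb(p, snat=True) for p in requests])
print(routed)   # {'00-aa-00-00-00-01': '10.50.1.2', '00-aa-00-00-00-02': '10.60.1.2'}
print(snatted)  # {'00-aa-00-00-00-01': '10.1.100.5', '00-aa-00-00-00-02': '10.1.100.5'}
```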

Reason #2:  URL Redirection and Web Portals:
Next, ISE 1.1.x has only one interface that can be used for all functions. Yes, we can run RADIUS on any of ISE's four interfaces, but the Gigabit 0/0 interface is the ONLY interface for management traffic. Also, the FQDN of the Policy Services Node is embedded in the certificate in ISE 1.1.x, and that is what gets used for URL redirection for WebAuth, Device Registration, Supplicant Provisioning, etc.

[Image: LB - Cert_FQDN.png]

So, when the URL redirection occurs, the endpoints need to talk to ISE directly (not the VIP) to reach the web portals. The portals can ONLY exist on the Gigabit 0/0 interface in 1.1.x. (This may change in a future version of ISE.)

Reason #3:  Routing Tables:
Unless you add a static route to ISE for every NAD subnet, ISE 1.1.x cannot return traffic for a different subnet through a different gateway; it only uses its default gateway. Therefore, the load balancer MUST be the default gateway for the ISE PSNs.

Since the load balancer must be the default gateway, all management traffic also flows through the load balancer, unless you physically locate the Policy Administration Node (PAN) and Monitoring & Troubleshooting Node (MNT) behind the load balancer as well (just don't include them in the server farm).
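Reason #3 comes down to ordinary longest-prefix-match routing. The Python sketch below (standard library `ipaddress`, hypothetical addresses) shows a PSN whose only route points at the load balancer sending both NAD replies and PAN/MNT traffic through it. It also shows the alternative of a default route elsewhere plus per-NAD-subnet static routes, and what happens when one subnet is missed.

```python
import ipaddress


def next_hop(routes: list[tuple[str, str]], dest: str) -> str:
    """Longest-prefix match over (prefix, gateway) pairs; a default route guarantees a match."""
    addr = ipaddress.ip_address(dest)
    matching = [(ipaddress.ip_network(prefix), gw) for prefix, gw in routes
                if addr in ipaddress.ip_network(prefix)]
    _, gateway = max(matching, key=lambda item: item[0].prefixlen)
    return gateway


LB_INSIDE = "10.1.100.1"       # hypothetical load balancer interface on the PSN VLAN
OTHER_ROUTER = "10.1.100.254"  # hypothetical second router on the same VLAN

# Design from the post: the load balancer is the PSN's default gateway.
psn_routes = [("0.0.0.0/0", LB_INSIDE)]
print(next_hop(psn_routes, "10.50.1.2"))  # reply to a NAD: prints 10.1.100.1
print(next_hop(psn_routes, "10.9.9.10"))  # traffic to the PAN/MNT: also prints 10.1.100.1

# Alternative: default route elsewhere plus a static route per NAD subnet via the LB.
alt_routes = [("0.0.0.0/0", OTHER_ROUTER),
              ("10.50.0.0/16", LB_INSIDE),
              ("10.60.0.0/16", LB_INSIDE)]
print(next_hop(alt_routes, "10.50.1.2"))  # prints 10.1.100.1 (via the LB)
print(next_hop(alt_routes, "10.70.1.2"))  # forgotten NAD subnet: prints 10.1.100.254, bypassing the LB
```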

I hope that helps. 

Aaron

38 Comments
Aaron Woland
Cisco Employee

We are developing a configuration guide that includes F5 configuration, which will hopefully be published as a joint white paper between Cisco and F5.

If you take a look at the BYOD Smart Solution (Cisco Validated Design) here:  http://www.cisco.com/c/en/us/solutions/enterprise/data-center-designs-cloud-computing/own_device.html - it includes configuration of a Citrix Load Balancer.

Aaron

Excellent document!! Thanks for sharing!! 

Sloanstar
Level 5

<necromancy>

Aaron - Was a deployment guide ever finished for use with F5 LTM?

Additionally, I was wondering if you could expand on the following points in relation to WLC 5508:

  • VIP gets listed as the RADIUS server of each NAD for all 802.1X related AAA.
  • Each PSN gets listed individually in the Dynamic-Authorization (CoA).  Use the real IP Address of the PSN, not the VIP.

 

I'm assuming the following:

VIPs would be defined as "Network User" enabled not supporting RFC 3576

The PSNs would be defined "Network User" disabled supporting RFC 3576

All VIPs & PSNs would be added to the WLAN Security AAA tab. CoA first then Auth?

 

Thank you.

</necromancy>

grabonlee
Level 4

Hello Aaron,

I have your book and I must commend you on a wonderful work. I would like to know how to achieve ISE failover when a web redirect URL pointing to a PSN is configured on the WLC in an LWA setup. The primary and secondary RADIUS servers are specified in the WLAN SSID global configuration.

My reasoning is that with the PSNs in a node group, no load balancer, and RADIUS fallback configured on the WLC, the redirect should go to the next PSN even without manually changing the redirect URL. What are your thoughts?

P.S.

The customized HTML page resides on the PSN.

Jamil Salomon
Level 1

Hi Jason,

A guide has been published.

http://www.cisco.com/c/dam/en/us/td/docs/security/ise/how_to/HowTo-95-Cisco_and_F5_Deployment_Guide-ISE_Load_Balancing_Using_BIG-IP.pdf

laposilaszlo
Level 1

Hi Aaron,

 

We are in the process of migrating our ISE infrastructure from ACE to F5.

We followed Craig Hyps' document for the configuration.

 

All looks OK except EAP-TLS authentication (PEAP user/computer works fine).

In the document there is nothing special mentioned that needs to be done for TLS.

 

I think it may be related to fragmentation, but I'm not sure.

I can also add that if we point the NADs to the PSN directly, it works.

The problem is only when we use the VIP.

(PEAP works with the VIP as well.)

Do you know if something special needs to be done for TLS to work?

Any information or hint is appreciated.

 

Thanks,

Laszlo

 

 

Sloanstar
Level 5

TLS works through our VIP config, nothing special. Small chain: endpoint -> intermediate -> CA.

How large is the certificate chain you are sending?

You're not doing any NATing on the NADs correct?

What's the error message seen in ISE for the authentication attempts using TLS through the VIP?

Aaron Woland
Cisco Employee

Laszlo,

Chances are you are using certificates large enough that the RADIUS packets must be fragmented, and your RADIUS load balancer is not reassembling the fragments before forwarding them to a server in the server farm.

So the first fragment is load-balanced based on the RADIUS data, but the second fragment has no RADIUS headers, i.e., the load balancer has no way to keep it going to the correct PSN.
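The fragmentation problem can be illustrated with a simplified Python model (not a description of F5 internals): it splits a RADIUS datagram into IPv4 fragments for a 1500-byte MTU. Only the fragment at offset 0 carries the UDP header and the start of the RADIUS payload, so a load balancer that picks a PSN per packet without reassembling has nothing to key on for the rest. The 1,700-byte RADIUS size is illustrative.

```python
IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def fragments(radius_len: int, mtu: int = 1500) -> list[tuple[int, int, bool]]:
    """(offset in bytes, length, carries UDP header) for each IPv4 fragment of one RADIUS datagram."""
    payload = UDP_HEADER + radius_len
    per_frag = (mtu - IP_HEADER) // 8 * 8  # fragment offsets are counted in 8-byte units
    out, offset = [], 0
    while offset < payload:
        length = min(per_frag, payload - offset)
        out.append((offset, length, offset == 0))  # only offset 0 starts with the UDP header
        offset += length
    return out


for off, length, has_udp in fragments(1700):
    print(off, length, "UDP/RADIUS header visible" if has_udp else "no UDP or RADIUS header")
# 0 1480 UDP/RADIUS header visible
# 1480 228 no UDP or RADIUS header
```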

Make sure you are using the correct version of your F5. They fixed the reassembly of fragments in a certain version, and I believe Craig even calls it out in that guide.

Aaron

laposilaszlo
Level 1

Hi Jason,

 

We have Root CA, Primary, intermediate, host... not that big, but it is definitely not fitting in one packet.

NATing on the NADs... what do you mean here?

The NADs are WLCs, which work if we point them to the PSN directly (but this still goes through the F5, hitting a forwarding virtual server).

If we point them to the VIP, which is a standard VS, it's not working.

The error message is:

Event: 5400 Authentication failed
Failure Reason: 12521 EAP-TLS failed SSL/TLS handshake after a client alert
Resolution: Check whether the proper server certificate is installed and configured for EAP in the Local Certificates page ( Administration > System > Certificates > Local Certificates ). Also ensure that the certificate authority that signed this server certificate is correctly installed in client's supplicant. Check the previous steps in the log for this EAP-TLS conversation for a message indicating why the handshake failed. Check the OpenSSLErrorMessage and OpenSSLErrorStack for more information.

   

F5 is 11.5.2 Build 441.

Attached you can find the VS that we use for radius balancing.

 

What do you think?

 

thanks,

laszlo

 

laposilaszlo
Level 1

It seems that 11.4 is the minimum recommended version... we are using 11.5.

But it also says:

"1.6.0 HF2 incorporates performance enhancements that can improve RADIUS load balancing performance."

 

Maybe I should try that one.

 

thanks,

laszlo

 

Aaron Woland
Cisco Employee

Have you also opened a ticket with F5?  You need to find out why it's not reassembling.

Aaron

Sloanstar
Level 5

Laszlo,

 

We're running F5 v10.2.4 HF11 in production, with the TLS authentication working fine.

We're in the process of upgrading to v11.5.2 (HF4, I think). This is in our test area only at the moment, and our lab is working fine.

The NAT question was specifically about SNAT: making sure that the traffic was traversing the LTM without its source IP address being changed (SNAT will cause problems with CoA).

Just want to level-set your topology and confirm that the PSNs next hop is through the LTM.

I have some sanitized configs from my LTM configuration somewhere as I was working with Craig on some of our use cases for his document. I just need to dig them up. They are tmsh, cause that's how we roll.

 

Edit:

Couldn't find the sanitized tmsh, so I just dumped the appropriate config from the CLI

This is a lab config - iRules might need a little tweaking, can't remember if we changed anything going from lab to production.

 

Keep in mind the LTM has its own partition for ISE.

Creation of partitions / vlans / routing tables is outside the scope and not included in the config.

The LTM ISE interface is the default gateway for the ISE PSNs

laposilaszlo
Level 1

Hi Jason,

 

I am comparing now what I have with his.

My PSNs also have their gateway on the LTM, and no SNAT is performed on the incoming traffic.

 

Thanks,

laszlo

 

Sloanstar
Level 5

I stopped seeing updates, in my experience this means it's fixed.

Just curious what the issue was, if you care to share.

laposilaszlo
Level 1

Hi Jason,

 

Unfortunately it's not fixed yet. I also opened a TAC case today (just to rule out ISE and the NADs), but I don't think it's a Cisco problem.

Next week I will again have professional support from F5, and I will give an update if we fix it.

 

Thanks,

laszlo

 

 

 
