
IOS RADIUS load balancing

Arne Bier
VIP

I was listening to the latest episode of Packet Pushers Heavy Networking, which covered a customer's experience with ditching their F5 in favour of the built-in IOS-XE RADIUS load balancing feature.

Is anyone else out there using this? In particular, will it work well with 802.1X workflows as well as guest portals, etc.?

It seems like a great solution for customers who don't load balance at all and pin all their traffic to a primary PSN 99% of the time, or who attempt some manual load balancing by varying which PSN each NAD group uses.

In the podcast, you'll hear them load balancing over six PSNs, which is usually something you'd expect to see done by an expensive load balancer setup.

#bestkeptsecret


I have customers using this for wired 802.1X and MAB.  It works quite well.  I haven't done any testing with wireless or guest though.

Arne Bier
VIP

Thanks @ahollifield - I understand that folks mostly use this command with the default batch size and do not include the optional "ignore-preferred-server" parameter, and reports so far are that it just works. Nice.

I'd have to debug this to confirm that, for each 802.1X transaction, ALL the EAP messages are sent to the same PSN and, ideally, that the accounting Start/Interim messages also go to that same PSN, at least initially.

What is the default batch size of 25 based on? And how does it behave when the batch has one transaction remaining for PSNa and then an 802.1X request comes along? Will it send the first message to PSNa and then flip over to PSNb for the remainder? I don't understand how this batching works.

I'll have to look more closely at the debugs in this example, but it's oversimplified.
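To make the PSNa/PSNb batching question concrete, here is a minimal Python toy model of how I read the documented behavior of `load-balance method least-outstanding` (with its optional `batch-size` and `ignore-preferred-server` keywords). It is a hypothetical sketch, not Cisco's implementation: a batch of new sessions goes to the server with the fewest outstanding transactions, and follow-up messages for a session already seen stick to that session's preferred server. Whether sticky follow-ups consume batch slots on a real NAD is an assumption here, and the class and server names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LeastOutstandingSelector:
    """Hypothetical toy model of least-outstanding RADIUS load balancing.

    Assumption: this mirrors a reading of the documented behavior, not
    Cisco's code. A batch of new sessions goes to the server with the
    fewest outstanding transactions; follow-up messages for a known
    session stick to its preferred server and do not use a batch slot.
    """

    servers: list[str]
    batch_size: int = 25  # IOS-XE default batch size
    ignore_preferred_server: bool = False
    outstanding: dict[str, int] = field(default_factory=dict)
    preferred: dict[str, str] = field(default_factory=dict)
    _current: str | None = field(default=None, init=False)
    _left_in_batch: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        for server in self.servers:
            self.outstanding.setdefault(server, 0)

    def _next_from_batch(self) -> str:
        # Open a new batch on the least-loaded server once the current batch is used up.
        if self._left_in_batch == 0:
            self._current = min(self.servers, key=lambda s: self.outstanding[s])
            self._left_in_batch = self.batch_size
        self._left_in_batch -= 1
        return self._current

    def send(self, session_id: str) -> str:
        """Pick a server for one RADIUS request belonging to session_id."""
        if not self.ignore_preferred_server and session_id in self.preferred:
            server = self.preferred[session_id]  # stay on the preferred server
        else:
            server = self._next_from_batch()
            self.preferred[session_id] = server
        self.outstanding[server] += 1
        return server

    def complete(self, server: str) -> None:
        """Record that a response came back from server."""
        self.outstanding[server] -= 1


# Batch size of 2 keeps the trace short; "PSN-A"/"PSN-B" are placeholders.
sel = LeastOutstandingSelector(["PSN-A", "PSN-B"], batch_size=2)
print(sel.send("s1"))  # PSN-A: opens a batch on PSN-A
print(sel.send("s2"))  # PSN-A: takes the last slot of that batch
print(sel.send("s2"))  # PSN-A: later EAP round sticks to the preferred server
print(sel.send("s3"))  # PSN-B: new batch on the least-outstanding server
```

In this model, a new session that takes the last batch slot on PSNa keeps all its later EAP rounds on PSNa; only new sessions flip to PSNb. Individual messages of one session get spread across PSNs only with `ignore_preferred_server=True`. Debugs on a real NAD remain the way to confirm how IOS-XE actually behaves.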


@Arne Bier,

I'm the contributor who wrote the blogs and was a guest on the Packet Pushers podcast.

I'm happy to report everything is still running smoothly. We had a high-CPU event on one PSN on Friday, March 22nd. Requests queued up on that one PSN, and the load balancing responded by delivering less load to it. This is exactly what was supposed to happen, based on the Cisco documentation. I'm attaching a graphic from our monitoring software.

We didn't notice the problem until Saturday, and we didn't get the ServiceNow ticket in place to reboot the PSN until Sunday evening. By mid-Sunday, it had entered a new phase where the latency for that PSN went through the roof (i.e., thousands of milliseconds), and the IOS-XE load balancing shed even more load from it. We rebooted the node, the CPU went back to normal, and the RADIUS load went back to normal.

Final analysis: this is exactly how things are supposed to work. It remediated itself until a human had a chance to apply a more permanent fix.
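The self-healing described above can be illustrated with a small discrete-time simulation. This is a hypothetical Python sketch with made-up PSN names, capacities, and arrival rate; it is not measured data and not Cisco's code. Each tick, new requests go in batches to the server with the fewest outstanding transactions, then each server answers up to its per-tick capacity. One PSN stands in for the high-CPU node by answering only one request per tick.

```python
from __future__ import annotations


def simulate(
    capacity: dict[str, int], arrivals_per_tick: int, ticks: int, batch_size: int = 1
) -> dict[str, int]:
    """Toy discrete-time model of least-outstanding selection (assumption, not Cisco code).

    Each tick, new requests are assigned in batches to the server with the
    fewest outstanding transactions; then each server answers up to its
    per-tick capacity. Returns how many requests each server received.
    """
    outstanding = {s: 0 for s in capacity}
    assigned = {s: 0 for s in capacity}
    current, left = None, 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if left == 0:
                # New batch: pick the server with the shortest queue (ties go to the first listed).
                current = min(outstanding, key=outstanding.get)
                left = batch_size
            left -= 1
            outstanding[current] += 1
            assigned[current] += 1
        # Servers drain their queues; the "high-CPU" PSN drains slowly.
        for server, cap in capacity.items():
            outstanding[server] -= min(cap, outstanding[server])
    return assigned


# Hypothetical numbers: PSN-3 plays the high-CPU node and answers 1 request per tick.
print(simulate({"PSN-1": 5, "PSN-2": 5, "PSN-3": 1}, arrivals_per_tick=9, ticks=10))
# → {'PSN-1': 39, 'PSN-2': 39, 'PSN-3': 12}
```

After the first tick, the slow node settles at one new request per tick (its capacity) while the healthy nodes absorb the rest: the same shape of load shedding described above, in miniature.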

With regard to the Endpoint Ownership Change metric, we're still reading zero across the board, all the time. I'm using the measurement methodology described here:

https://packetpushers.net/blog/cisco-ise-lb-4/

I personally think the syslog message is broken and doesn't report the 4_EndPoint_OwnerShip_Change metric correctly.  How could it be zero all the time?  

I'm going to be at Cisco Live in Las Vegas. I'll try to connect with a deep subject matter expert to get more info.

Thanks @danmassa - you've done a great deal of advocacy for this feature and I hope that more people start using the feature. I am applying it wherever possible these days. Cisco Live Las Vegas - that's on my bucket list for sure!

Just ran across your blogs on Packet Pushers about ISE deployment. Great insight and a very nice read! Thank you for sharing the story and the thought process behind it!

Do you use CoA in your wireless environment with ISE, either for the web portal or to proactively disconnect a user?

According to the configuration guide:

Restrictions for RADIUS Server Load Balancing

  • Incoming RADIUS requests, such as Packet of Disconnect (POD) requests are not supported.

  • Load balancing is not supported on proxy RADIUS servers and private server groups.

  • Load balancing is not supported on Central Web Authentication (CWA).

https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-9/config-guide/b_wl_17_9_cg/m_radius_server_load_balancing.html#restrictions-for-radius-server-load-balancing

Did you gain any more insight into this feature with ISE deployments at Cisco Live?

Just found another bug:

https://quickview.cloudapps.cisco.com/quickview/bug/CSCwh61746

WLC 9800 RADIUS Load Balance with 802.1x authentication is invalid design (is this questionable?)

One more bug:

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvh03827

CWA failure when load balancing across ISE

Symptom: When load-balancing radius servers from the IOS-XE CLI, CWA will not work

Conditions: load balancing AAA servers on the 9800 WLC CWA WLAN

Workaround: Do not load balance with CWA


https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/213920-central-web-authentication-cwa-on-cata.html

Note: CWA does not work if you decide to load-balance (from the Cisco IOS XE CLI configuration) your radius servers due to Cisco bug ID CSCvh03827. The usage of external load balancers is fine.