
RADIUS Server Failover

fatalXerror
Level 5

Hi Guys,

What is the condition for the NAD to declare that the RADIUS server is dead? Is it based only on network reachability, or on service reachability?

Network reachability means the NAD can simply reach the RADIUS server over the network, regardless of whether the server is overloaded or having an issue.

Service reachability means the NAD can determine that the RADIUS service itself is up.

Thanks a lot.


Maxee
Level 1

On IOS you can configure an "automate-tester" to check whether your RADIUS server responds. The response can be either a failure (Access-Reject) or a success (Access-Accept); it doesn't matter which.

The config on my switches looks like this (I don't think this exists in legacy IOS versions before 15.0):

radius server ISE1
 address ipv4 10.0.0.1 auth-port 1645 acct-port 1646
 automate-tester username dummy-user ignore-acct-port probe-on
 key xyxx
!
radius server ISE2
 address ipv4 10.0.0.2 auth-port 1645 acct-port 1646
 automate-tester username dummy-user ignore-acct-port probe-on
 key xyxx
!


More information is available in this Cisco whitepaper:
https://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/identity-based-networking-services/whitepaper_C11-731907.html
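To make the "any reply counts" point concrete, here is a minimal Python sketch of what a probe like automate-tester does conceptually: send an Access-Request for a dummy user and treat either an Access-Accept (code 2) or an Access-Reject (code 3) as proof that the RADIUS service is up. This is not Cisco's implementation. The packet layout and password hiding follow RFC 2865; the function names, the password "dummy-password", and the shared secret are hypothetical placeholders.

```python
import hashlib
import os
import socket
import struct

# RADIUS packet codes (RFC 2865).
ACCESS_REQUEST = 1
ACCESS_ACCEPT = 2
ACCESS_REJECT = 3


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Hide User-Password as described in RFC 2865, section 5.2."""
    # Pad with NUL bytes to a multiple of 16 (minimum 16 bytes).
    width = max(16, -(-len(password) // 16) * 16)
    padded = password.ljust(width, b"\x00")
    hidden = b""
    prev = authenticator
    for i in range(0, len(padded), 16):
        key = hashlib.md5(secret + prev).digest()
        chunk = bytes(p ^ k for p, k in zip(padded[i:i + 16], key))
        hidden += chunk
        prev = chunk  # each block is chained to the previous hidden block
    return hidden


def build_probe(username: str, password: str, secret: bytes, identifier: int = 0):
    """Build an Access-Request carrying only User-Name and User-Password."""
    authenticator = os.urandom(16)  # Request Authenticator
    user = username.encode()
    pwd = hide_password(password.encode(), secret, authenticator)
    attrs = bytes([1, 2 + len(user)]) + user  # type 1 = User-Name
    attrs += bytes([2, 2 + len(pwd)]) + pwd   # type 2 = User-Password
    header = struct.pack("!BBH16s", ACCESS_REQUEST, identifier, 20 + len(attrs), authenticator)
    return header + attrs, authenticator


def server_is_alive(reply) -> bool:
    """Accept or Reject both prove the RADIUS service answered; silence does not.

    A real client would also check the identifier and the Response
    Authenticator (RFC 2865, section 3); this sketch only looks at the code.
    """
    if reply is None or len(reply) < 20:
        return False
    return reply[0] in (ACCESS_ACCEPT, ACCESS_REJECT)


def probe(host: str, port: int, secret: bytes, timeout: float = 5.0) -> bool:
    """Send one probe and report whether the service answered in time."""
    packet, _ = build_probe("dummy-user", "dummy-password", secret)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (host, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes socket.timeout
            return False
    return server_is_alive(reply)
```

Against the config above, probe("10.0.0.1", 1645, b"xyxx") would return True if ISE answers dummy-user with either an Accept or a Reject, and False if nothing comes back within 5 seconds.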

Hi @Maxee,

But what if I don't have that in my configuration? What will happen?

Also, what about the WLC?

Thanks

In IOS there are two criteria that determine when a RADIUS server is marked dead:
1. The server did not respond to a (configured) number of retransmissions
2. The server did not respond for the (configured) timeout

Both have to be met before the switch declares a RADIUS server dead, and both are service-reachability "checks". With "automate-tester", the switch periodically checks whether the RADIUS service is available again, so it can mark the server alive again before the configured timers expire.

For example

radius-server dead-criteria time 10 tries 3

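This determines how long and how often the switch waits for RADIUS responses before it declares a server dead: here, the server is marked dead only once no valid response has arrived for at least 10 seconds and 3 consecutive requests to it have timed out.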
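Because both conditions have to hold, a few quick timeouts alone are not enough. The small Python model below illustrates that AND logic for "time 10 tries 3". It is a sketch, not IOS code: the class and function names are hypothetical, and treating both thresholds as inclusive (>=) is an assumption about the boundary behaviour.

```python
from dataclasses import dataclass


@dataclass
class DeadCriteria:
    time: float  # seconds with no valid response ("dead-criteria time")
    tries: int   # consecutive request timeouts ("dead-criteria tries")


@dataclass
class ServerState:
    last_valid_response: float = 0.0  # timestamp in seconds
    consecutive_timeouts: int = 0
    dead: bool = False


def on_response(state: ServerState, now: float) -> None:
    """A valid reply resets both counters."""
    state.last_valid_response = now
    state.consecutive_timeouts = 0


def on_timeout(state: ServerState, now: float, criteria: DeadCriteria) -> None:
    """Record a request that timed out and apply both dead-criteria conditions."""
    state.consecutive_timeouts += 1
    silent_long_enough = now - state.last_valid_response >= criteria.time
    enough_tries = state.consecutive_timeouts >= criteria.tries
    if silent_long_enough and enough_tries:  # BOTH must hold
        state.dead = True


# radius-server dead-criteria time 10 tries 3
criteria = DeadCriteria(time=10, tries=3)
server = ServerState()
for t in (2, 4, 6):
    on_timeout(server, t, criteria)
print(server.dead)  # False: 3 timeouts, but only 6 s of silence
on_timeout(server, 12, criteria)
print(server.dead)  # True: 4 timeouts and 12 s of silence
```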

radius-server deadtime 15

This is how long, in minutes, a server stays marked dead before the switch tries it again (it's used to prevent flapping).

 

With "automate-tester", the remaining "deadtime" is skipped as soon as the probe detects that the primary RADIUS server is available again.
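The interplay between deadtime and the probe can be sketched as a server-selection loop: servers are tried in configured order, a dead server is skipped until its deadtime runs out, and a probe reply ends the deadtime early. This Python model only follows the description in this thread; the names are hypothetical, and what IOS does when every server is dead is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    dead_until: float = 0.0  # seconds; the server is skipped while now < dead_until


def mark_dead(server: Server, now: float, deadtime_minutes: int) -> None:
    """Dead-criteria met: skip this server for 'radius-server deadtime' minutes."""
    server.dead_until = now + deadtime_minutes * 60


def probe_reply(server: Server, now: float) -> None:
    """automate-tester got an answer: end the remaining deadtime right away."""
    server.dead_until = now


def pick_server(servers: List[Server], now: float) -> Optional[Server]:
    """Return the first server in configured order that is not inside its deadtime."""
    for server in servers:
        if now >= server.dead_until:
            return server
    return None  # every server is marked dead (real IOS behaviour not modelled)


ise1, ise2 = Server("ISE1"), Server("ISE2")
servers = [ise1, ise2]
mark_dead(ise1, now=100, deadtime_minutes=15)
print(pick_server(servers, 200).name)   # ISE2: ISE1 is inside its deadtime
probe_reply(ise1, now=300)
print(pick_server(servers, 300).name)   # ISE1: the probe ended the deadtime early
```

With a deadtime of 0 the server is usable again immediately, which is why, without a deadtime, the NAD keeps trying the dead primary first.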

 

It's similar with WLCs: if they don't get properly formed responses from a RADIUS server, they declare it dead.

On the WLC you can configure three fallback modes:

Active: In a failover scenario the WLC sends probe messages (like automate-tester) to the primary RADIUS server (I think the default interval before the first probe is 5 minutes). If it detects that the RADIUS server is back online, it uses the primary server (the one with the lowest priority number, i.e. the highest priority) again.

Passive: If the highest-priority RADIUS server is down or unavailable, the WLC uses the next server in priority order. After an interval (I think 5 minutes) it assumes the primary server is alive again and sends authentications to it. If it is still dead, it fails over again.

Off: No fallback; the WLC stays on the server it failed over to.
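The difference between the three modes (on AireOS set with `config radius fallback-test mode`) comes down to what the WLC does once the fallback interval has passed. The Python function below is a simplified, hypothetical model of the behaviour described above, not WLC code; the 300-second interval is the "I think 5 minutes" default from this post, and `primary_up` stands for whether an active probe would currently get a reply.

```python
from enum import Enum


class Fallback(Enum):
    OFF = "off"
    PASSIVE = "passive"
    ACTIVE = "active"


def server_for_next_request(mode: Fallback, failed_over_at: float, now: float,
                            interval: float, primary_up: bool) -> str:
    """Return 'primary' or 'backup' for the next authentication after a failover."""
    if mode is Fallback.OFF:
        return "backup"   # never moves back on its own
    if now - failed_over_at < interval:
        return "backup"   # still inside the fallback interval
    if mode is Fallback.PASSIVE:
        # Assumes the primary is back; a real client request is the "test",
        # and if it times out the WLC simply fails over again.
        return "primary"
    # ACTIVE: only move back once a probe to the primary gets a reply.
    return "primary" if primary_up else "backup"


print(server_for_next_request(Fallback.PASSIVE, 0, 300, 300, primary_up=False))  # primary
print(server_for_next_request(Fallback.ACTIVE, 0, 300, 300, primary_up=False))   # backup
```

The two printed cases show the trade-off: passive mode may send a real user to a still-dead server, while active mode spends probe traffic to avoid that.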

 

I hope that helped a bit :)

Hi @Maxee,

Thanks a lot for your great response.

So the automate-tester is an optional feature and seems to be mostly a fallback mechanism: when the server is up again, the NAD goes back to that server (network and service reachability), correct?

If I don't have the automate-tester feature, how long will it take for the NAD to determine that the RADIUS service is down?

In addition, this also applies when the RADIUS server is being upgraded (installing patches, updates, etc.), right? Can the NAD still determine that the RADIUS service is dead in that scenario?

Thanks

So the automate-tester is an optional feature and seems to be mostly a fallback mechanism: when the server is up again, the NAD goes back to that server (network and service reachability), correct?

Yes, correct. I recommend configuring the deadtime to prevent flapping and authentication disruption and to improve authentication times in a failed state; otherwise the NAD will keep trying the primary server first even though it's dead.

 

If I don't have the automate-tester feature, how long will it take for the NAD to determine that the RADIUS service is down?

That's determined by the dead-criteria, which you can configure as needed with this command:

radius-server dead-criteria time 10 tries 3

 

In addition, this also applies when the RADIUS server is being upgraded (installing patches, updates, etc.), right? Can the NAD still determine that the RADIUS service is dead in that scenario?

Yes. While the RADIUS server is being upgraded, the service is unavailable and shouldn't respond to RADIUS requests, which triggers the dead-criteria timeout.

Hi @Maxee,

Do you know the dead-criteria configuration for the WLC?

Thanks

 

 

Yes. The default is pretty aggressive: if a single endpoint has issues authenticating, with a timeout of 2 seconds, the WLC uses the next RADIUS server.

 

I configured it to use a 5-second timeout and to fail over only after about 3 or more endpoints have issues.

For the endpoint part, enter "config radius aggressive-failover disable" in the CLI.

@Maxee, sorry, I'm not much of a wireless guy.

By aggressive failover, do you mean the timer is only 2 seconds? Is that why you configured 5 seconds, to disable aggressive failover?

In addition, doesn't it have dead-criteria tries like the switch?

thanks

By aggressive failover I mean that it triggers the failover after only one endpoint has hit the authentication timeout of 2 seconds.
With the CLI command you can disable aggressive failover. Additionally, I set the server timeout to 5 seconds.
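The difference between aggressive failover and the disabled setting can be pictured as a counter of failing endpoints. In the hypothetical Python model below, aggressive mode fails over on the first endpoint that times out, while the disabled mode waits for about 3 distinct endpoints, the figure mentioned above. Counting distinct clients (so one client retrying repeatedly does not trigger failover) and resetting on any server reply are assumptions of this sketch, not documented WLC internals; the per-request timeout itself (2 or 5 seconds) is not modelled.

```python
class FailoverTracker:
    """Hypothetical model of when the WLC gives up on the current RADIUS server."""

    def __init__(self, aggressive: bool = True, clients_needed_when_disabled: int = 3):
        self.threshold = 1 if aggressive else clients_needed_when_disabled
        self.failed_clients = set()  # endpoints whose requests timed out
        self.failed_over = False

    def on_client_timeout(self, client_mac: str) -> bool:
        """Record a timed-out endpoint; return True once failover is triggered."""
        self.failed_clients.add(client_mac)
        if len(self.failed_clients) >= self.threshold:
            self.failed_over = True
        return self.failed_over

    def on_server_reply(self) -> None:
        # Any reply shows the server is working, so the count starts over.
        self.failed_clients.clear()


default = FailoverTracker(aggressive=True)
print(default.on_client_timeout("aa:aa:aa:aa:aa:01"))  # True: one endpoint is enough

relaxed = FailoverTracker(aggressive=False)
for mac in ("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:03"):
    result = relaxed.on_client_timeout(mac)
print(result)  # True only after the third distinct endpoint
```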

 

In addition, doesn't it have dead-criteria tries like the switch?

I think aggressive-failover is the equivalent. I don't know whether the WLC has this option.

Noted.

I believe the RADIUS behavior of IOS switches and the WLC is different, right? On IOS switches it checks whether the RADIUS servers are up, in order, but the WLC cannot fall back to its original RADIUS server unless you configure an active fallback mechanism like the automate-tester?

thanks

It's a bit different, yes.

The WLC can fall back if you set it to active or passive, just like the IOS switches, only under a different name.