
Radius : fail / fallback - overview ?

I seem to have a little radius trouble.

I have two radius servers on an iPSK with radius SSID.

Everything has worked just fine.

But then things (hence this post) started, ehh, not working fine.

A bit of investigation (packet captures) shows that the AP sends Access-Request to the radius server, as expected, then nothing happens (aka no response) and the AP sends the request again (duplicate packet).
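For anyone wanting to check a capture for the same symptom, here is a minimal sketch that counts retransmitted Access-Requests. It assumes the UDP payloads have already been extracted from the capture into tuples (that extraction step is not shown), and it assumes a retransmission reuses the same RADIUS identifier and authenticator as the original request. The function names are hypothetical, not part of any Meraki tool.

```python
import struct
from collections import Counter

ACCESS_REQUEST = 1  # RADIUS code for Access-Request (RFC 2865)


def parse_radius_header(payload: bytes):
    """Return (code, identifier, length, authenticator) from a RADIUS UDP payload."""
    if len(payload) < 20:
        raise ValueError("RADIUS header is 20 bytes; payload too short")
    code, identifier, length = struct.unpack("!BBH", payload[:4])
    return code, identifier, length, payload[4:20]


def count_retransmissions(packets):
    """Count repeated Access-Requests.

    packets: iterable of (src_ip, src_port, dst_ip, payload) tuples.
    Assumption: a retransmission reuses the identifier and authenticator
    of the original request, from the same source to the same server.
    Returns {key: extra_copies} for requests seen more than once.
    """
    seen = Counter()
    for src_ip, src_port, dst_ip, payload in packets:
        code, identifier, _length, authenticator = parse_radius_header(payload)
        if code == ACCESS_REQUEST:
            seen[(src_ip, src_port, dst_ip, identifier, authenticator)] += 1
    return {key: n - 1 for key, n in seen.items() if n > 1}
```

If every key in the result points at the primary server and nothing is ever sent to the secondary, that matches the behaviour described above.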

Apparently this continues, even though there are two radius servers configured on the SSID.

Why does it not switch to the secondary radius server ?

At the same time, I have another SSID running standard dot1x, nothing fancy, using the same radius servers in the same priority order. This one seems to be using radius 2 and, my guess is, switched over at some point.

Does anyone know where in the event log I can see that the radius servers switched over? (I can't seem to find such an option.) And is there a warning anywhere that tells me: "Oh look, your primary radius server has stopped responding"? I'm guessing there is not 🙂


aleabrahao
Meraki Community All-Star

You can also perform a packet capture.

I am not a Cisco employee. My suggestions are based on documentation of Meraki best practices and day-to-day experience.

Please, if this post was useful, leave your kudos and mark it as solved.

They "immediately" closed my case after replying: "The AP will send Access-Request messages to configured RADIUS servers using identity 'meraki_8021x_test' to ensure that the RADIUS servers are reachable." I didn't even have time to respond that I never see this identity in messages from the AP, and I didn't get an answer as to why, how and where I can see Radius down in the UI or event log. Perhaps I hit a nerve?

(And as far as I know : meraki_8021x_test is only for the switches where there actually is a "Radius testing" option that seems well defined.)

And the AP doesn't even show the warning 'Recent 802.1X failures'? That's very odd.

What MR version are you running ?

Have you tried enabling Radius testing on those APs?

No failures in the logs as far as I can tell. Not even Radius timeouts in the event log.

We were running 29.4.x and upgraded to 29.5.x just to see if anything changed, it did not.

I don't think there is an option to enable radius testing on wireless, is there?

There is on the switch, but I'm unsure where to find the same feature on the AP side of the dashboard. For wireless it's just a single test, right? Not continuous testing like on the switch.

If Radius testing is enabled, Meraki devices will periodically send Access-Request messages to these RADIUS servers using identity 'meraki_8021x_test' to ensure that the RADIUS servers are reachable.


But you cannot enable Radius testing for Wireless ? - please tell me where, because I can't find it for wireless.

I can find the "single" test, no problem, but the "Radius testing" like for switches, I can't find that.

Under Radius accounting servers:

image.png


Ah yes, there it is. I swear I have been staring at that page for so long trying to find it 🙂 I must be going blind.

But the BIG question is, still.... Is this how the Meraki AP detects that the radius server is working or not ? (with testing ?)

- I'm guessing no, because the normal behaviour, according to what I can read, is that it will try the primary server 3 times, after which it is "marked" as unreachable and the secondary server is used.

But what about fallback, you may ask. Well the documentation says :

"The fallback behavior depends on the order the servers are listed on the dashboard will dictate the priority of each one, For example:

  • Server 1 = priority 1
  • Server 2 = priority 2
  • Server 3 = priority 3

Where the available server with higher priority will be used (priority 1 is the highest). If Server 1 were to become unreachable, Server 2 would become active, and so on.

If the fallback option is enabled, once the server with higher priority recovers, the AP will switch back to using that preferred (higher priority) server."
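To make the quoted rules concrete, here is a minimal sketch of the selection logic as the documentation describes it. It is not Meraki's implementation: the `Server` class and `pick_server` function are hypothetical, and "stay on the current server while fallback is disabled" is my reading of the quote. How the `reachable` flag gets updated is deliberately left out, since that is exactly what the documentation does not say.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    priority: int           # 1 is the highest priority
    reachable: bool = True  # how this gets updated is the open question


def pick_server(servers, current, fallback_enabled):
    """Pick the server to use for the next request.

    servers: list of Server; current: the Server in use, or None.
    Returns None when no server is reachable.
    """
    available = sorted((s for s in servers if s.reachable), key=lambda s: s.priority)
    if not available:
        return None
    best = available[0]
    if current is None or not current.reachable:
        return best
    # With fallback on, return to a recovered higher-priority server;
    # with it off (assumed), stay on the current one while it keeps working.
    return best if fallback_enabled else current
```

With Server 1 down, this picks Server 2. Once Server 1 is marked reachable again, it only moves back when `fallback_enabled` is true.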

-

Now the big question is, what does "once the server with higher priority recovers" mean? How does the AP know? ICMP? The Radius testing (which is not enabled by default)? Or does it just, as I can see from captures, try the primary server once in a while with real auths from clients? (These clients will then have to wait until it times out and tries the secondary server, I'm guessing.)

And the second question is: if the AP has detected that the primary server is unreachable, why is there no alarm in the dashboard for this? (Might not be a question for this forum, but rather for Meraki support / development.)

Meraki devices will periodically send Access-Request messages to these RADIUS servers using identity 'meraki_8021x_test' to ensure that the RADIUS servers are reachable.

Why don't you perform a simple packet capture?


Raphael_L
Meraki Community All-Star

That behavior only applies if Radius testing is enabled, which it is not by default.

aleabrahao
Meraki Community All-Star

Yep, I mentioned it in a previous post. 😉


But the question is now : Is radius testing required for the AP to do proper failover, and fail back ?

And there is still nothing on the dashboard that tells you that the radius server is down.

Support told me that there should be a "Radius server online/offline" type of event to filter on in the event logs. I can't find it, or perhaps this is also only available when radius testing is turned on? (I asked.)

Regardless, it just seems broken compared to the documentation (in my opinion).

>Is radius testing required for the AP to do proper failover, and fail back

No, but failover is much slower without it.

With RADIUS testing enabled, it regularly tests RADIUS servers, and if one is down and a real request comes in, it goes to the next working RADIUS server.

Without testing, it tries the first RADIUS server and, after it has timed out enough times, moves on to the next RADIUS server. Your client will also need long enough timeouts for this to work.

When RADIUS servers have failed they appear in the event log, and the device usually changes to an alerting status, and when you go into it the device status says there was a recent RADIUS failure.
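The difference between the two modes can be sketched as a rough estimate of how long a client waits before its request reaches a working server. This is a simplification under stated assumptions: the function name, the attempt count and the timeout value are hypothetical parameters, and with testing enabled a dead server is assumed to be already marked down when the real request arrives.

```python
def wait_before_working_server(reachable, attempts_per_server, timeout_s, testing_enabled):
    """Seconds a client waits before its request reaches a working server.

    reachable: list of booleans in priority order (True = server answers).
    Returns None if no server answers.
    """
    waited = 0.0
    for up in reachable:
        if up:
            return waited
        if not testing_enabled:
            # Every attempt against a dead server has to time out first.
            waited += attempts_per_server * timeout_s
        # With testing, the dead server is assumed to be marked down
        # already, so it is skipped without any wait.
    return None
```

For example, `wait_before_working_server([False, True], 3, 5.0, False)` adds up three timeouts of a hypothetical 5 seconds each against the dead primary, which is why the client's own timeout has to be longer than that sum for the failover to succeed.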

This is my current response from support : "Radius testing is required to ensure that the RADIUS servers are reachable and if the primary server is not reachable then it will switch to the next available server in the list. Without the testing this will not be checked."

"When RADIUS servers have failed they appear in the event log, and the device usually changes to an alerting status, and when you go into it the device status says there was a recent RADIUS failure."

- And this was the problem here. No alerts, and nothing in the event log.

The dot1x SSID had changed to the secondary radius, but would "once in a while" try the primary one (with the ensuing timeout and everything, as you mention). I don't know why it does this. To test if it's back alive? And why does it also try, once in a while, to ping the radius server? This seems "important", but there is no explanation.

The iPSK network would not change to the secondary radius, and just continued trying the primary one.

- But from all this it seems that there really are, ehhh, how do I put this, "inconsistencies" between how AP radius functions, what people think, and what the documentation and support say, and that is what annoys me the most.
