08-26-2022 06:59 AM
Our customer has many branches, which are connected via MPLS lines.
The ISE PSNs and WLCs are located in a central datacenter.
We use MAB (+ACLs) for connecting the Cisco APs to the network. Switchports have a reauth timer set to 12h. We use 3850s/9300s with IBNS syntax.
There was just an MPLS outage at one of the branches. That usually isn't a big deal, as ports that are already in an "authc success" state remain in that state as long as the port doesn't flap, and since the DC resources are not reachable, no one cares whether their PC is connected anyway. Also, as soon as AAA is available again, the templates will reauthenticate dot1x and MAB.
We use the following template:
policy-map type control subscriber 1x_mab
 event session-started match-first
  10 class VOICE_VLAN do-until-failure
   10 authenticate using mab
  20 class always do-until-failure
   10 authenticate using dot1x priority 10
   20 authenticate using mab priority 20
 event authentication-failure match-first
  10 class AAA_SVR_DOWN_UNAUTHD_PHONE do-until-failure
   10 activate service-template VOICE_SGT
   20 authorize
   30 pause reauthentication
  30 class AAA_SVR_DOWN_AUTHD_HOST do-until-failure
   10 pause reauthentication
   20 authorize
  40 class DOT1X_FAILED do-until-failure
   10 terminate dot1x
  50 class DOT1X_NO_RESP do-until-failure
   10 terminate dot1x
   20 authenticate using mab priority 20
  60 class MAB_FAILED do-until-failure
   10 terminate mab
   20 authentication-restart 60
 event agent-found match-all
  10 class always do-until-failure
   10 terminate mab
   20 authenticate using dot1x retries 2 retry-time 2 priority 10
 event violation match-all
  10 class always do-until-failure
   10 restrict
 event authentication-success match-all
  10 class MAB do-until-failure
   10 terminate webauth
Unfortunately, there is an odd behavior with Cisco APs when their reauth timer runs out while the AAA servers are not available.
Instead of looping through:
  60 class MAB_FAILED do-until-failure
   10 terminate mab
   20 authentication-restart 60
(which usually works for unknown MAC addresses, because the MAB_FAILED class can't actually fail due to authentication-restart 60), it looks like this class gets skipped, since the MAB "process" itself failed.
We see the following indication in the logs:
May 8 07:28:34.444: RADIUS: No response from (UNKNOWN:1645,1646) for id 1646/230
May 8 07:28:34.445: RADIUS/DECODE: No response from radius-server; parse response; FAIL
May 8 07:28:34.445: RADIUS/DECODE: Case error(no response/ bad packet/ op decode);parse response; FAIL
The switchport then goes into the following state:
xxxlabor-1#show access-session interface gigabitEthernet 1/0/39 details
Interface: GigabitEthernet1/0/39
IIF-ID: 0x1011880000001DA
MAC Address: 4001.7axxxx
IPv6 Address: Unknown
IPv4 Address: 10.50.xxxx
User-Name: 40017axxx
Status: Unauthorized
Domain: UNKNOWN
Oper host mode: multi-domain
Oper control dir: in
Session timeout: N/A
Restart timeout: N/A
The session timeout is N/A, and the port won't reauthenticate, even if AAA is available again.
In this state the AP is unable to reach the WLCs and starts a reboot loop, which unfortunately does not flap the link on the switchport. To recover, we have to manually clear the session on the switch.
We could work around this by increasing the initial reauth period to 48h, which would probably limit the impact. We also thought about checking for aaa-available messages in the logs and restarting the template, like:
 event aaa-available match-all
  10 class MAB do-until-failure
   10 terminate mab
   30 activate service-template xxxx
But I'm currently not sure if that's the best way to work around or solve this.
Open to any input/feedback.
08-28-2022 01:30 PM
Hello @samuel.heinrich
The class "MAB_FAILED" won't apply here in the case of a re-auth, because as you rightly said, MAB has not failed. The situation you are facing is that AAA servers are unavailable.
I am referencing the Wired Prescriptive Guide.
The class that matches your situation (a AAA timeout on a host that is already authorized) looks like this:
class-map type control subscriber match-all AAA_SVR_DOWN_AUTHD_HOST
 match result-type aaa-timeout
 match authorization-status authorized
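The policy further down also references AAA_SVR_DOWN_UNAUTHD_HOST. Assuming the usual definition from the same guide, it differs only in the authorization status it matches, so treat this as a sketch and compare it with what is already on your switch:

```text
! Assumed companion class: AAA timeout on a host that is NOT yet authorized
class-map type control subscriber match-all AAA_SVR_DOWN_UNAUTHD_HOST
 match result-type aaa-timeout
 match authorization-status unauthorized
```

Both classes depend on the result type being aaa-timeout, which is why the dead detection settings mentioned later matter.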
In the IBNS 2.0 Policy Map you would catch that class with a Policy such as:
 event authentication-failure match-first
  5 class DOT1X_FAILED do-until-failure
   10 terminate dot1x
   20 authenticate using mab priority 20
  10 class AAA_SVR_DOWN_UNAUTHD_HOST do-until-failure
   10 clear-authenticated-data-hosts-on-port
   20 activate service-template CRITICAL_AUTH_ACCESS
   30 activate service-template DEFAULT_CRITICAL_VOICE_TEMPLATE
   40 authorize
   50 pause reauthentication
  20 class AAA_SVR_DOWN_AUTHD_HOST do-until-failure
   10 pause reauthentication
   20 authorize
...
....
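Since the classes above pause reauthentication during the outage, the policy usually also needs a handler for when AAA comes back. A minimal sketch along the lines of the prescriptive guide follows. The class names IN_CRITICAL_AUTH and NOT_IN_CRITICAL_AUTH are assumptions for illustration, and the service-template names are simply the ones used in the policy above, so adjust both to your own configuration:

```text
! Assumed helper classes: sessions that did / did not receive a critical template
class-map type control subscriber match-any IN_CRITICAL_AUTH
 match activated-service-template CRITICAL_AUTH_ACCESS
 match activated-service-template DEFAULT_CRITICAL_VOICE_TEMPLATE
class-map type control subscriber match-none NOT_IN_CRITICAL_AUTH
 match activated-service-template CRITICAL_AUTH_ACCESS
 match activated-service-template DEFAULT_CRITICAL_VOICE_TEMPLATE
!
! Fragment to place inside the existing control policy-map
 event aaa-available match-all
  10 class IN_CRITICAL_AUTH do-until-failure
   10 clear-session
  20 class NOT_IN_CRITICAL_AUTH do-until-failure
   10 resume reauthentication
```

The idea is that sessions placed into critical access are cleared and start over, while sessions that only had their reauthentication paused resume their timers. I have not verified this against the AP reboot-loop case described in this thread, so please test it in the lab first.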
The AAA server down detection only works if you have set up the dead detection mechanism in IOS-XE. These commands in the global config are typical examples:
radius-server dead-criteria time 10 tries 3
radius-server deadtime 15
You can check the AAA server status with the command:
show aaa servers
In the lab, I would test this mechanism by blocking all IP traffic from the switch towards all ISE servers (deny ip any xxx, where xxx is the IP of the ISE PSN(s)).
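One way to sketch that lab block is shown here. The ACL name BLOCK_ISE, the address 192.0.2.10 and the interface name are placeholders, not values from this thread. I am also assuming the ACL is applied on an upstream Layer 3 device, because as far as I know an outbound ACL on the access switch itself generally does not filter traffic that the switch originates:

```text
! On the upstream L3 device (assumption), not on the access switch under test
ip access-list extended BLOCK_ISE
 ! 192.0.2.10 is a placeholder for the ISE PSN; add one deny line per PSN
 deny ip any host 192.0.2.10
 permit ip any any
!
! Placeholder interface: the port facing the access switch
interface GigabitEthernet0/0/1
 ip access-group BLOCK_ISE in
```

Once the block is in place, wait for the dead-criteria to be met, confirm the server state with show aaa servers, then let the reauth timer expire on a test port and check which class the session hits.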