APs stuck in "MAB Authc failure" if AAA Servers are not available

samuel.heinrich
Level 1

Our customer has many branches, which are connected via MPLS lines.

The ISE PSNs and WLCs are located in a central datacenter.

We use MAB (+ACLs) to connect the Cisco APs to the network. The switchports have a reauth timer of 12h. We use 3850s/9300s with IBNS 2.0 syntax.

It just happened that there was an MPLS outage at one of the branches. That usually isn't a big deal: ports that are already in an "authc success" state remain in that state as long as they don't port-flap, and since the DC resources are not reachable, no one cares whether their PC is connected or not. Also, as soon as AAA is available again, the templates reauthenticate dot1x and MAB.
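A branch access port set up as described above would typically look roughly like the sketch below. It is not the customer's actual config: the interface name comes from the show output further down, the host mode matches the "multi-domain" shown there, and applying the 12h timer with `authentication periodic` / `authentication timer reauthenticate` is an assumption worth verifying on your IOS-XE release.

```
interface GigabitEthernet1/0/39
 switchport mode access
 ! matches the "Oper host mode: multi-domain" seen in the show output
 access-session host-mode multi-domain
 access-session port-control auto
 mab
 dot1x pae authenticator
 ! assumption: periodic reauth every 12h = 12 x 3600 = 43200 seconds
 authentication periodic
 authentication timer reauthenticate 43200
 ! attach the IBNS 2.0 control policy shown below
 service-policy type control subscriber 1x_mab
```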

We use the following template:

policy-map type control subscriber 1x_mab
 event session-started match-first
  10 class VOICE_VLAN do-until-failure
   10 authenticate using mab
  20 class always do-until-failure
   10 authenticate using dot1x priority 10
   20 authenticate using mab priority 20
 event authentication-failure match-first
  10 class AAA_SVR_DOWN_UNAUTHD_PHONE do-until-failure
   10 activate service-template VOICE_SGT
   20 authorize
   30 pause reauthentication
  30 class AAA_SVR_DOWN_AUTHD_HOST do-until-failure
   10 pause reauthentication
   20 authorize
  40 class DOT1X_FAILED do-until-failure
   10 terminate dot1x
  50 class DOT1X_NO_RESP do-until-failure
   10 terminate dot1x
   20 authenticate using mab priority 20
  60 class MAB_FAILED do-until-failure
   10 terminate mab
   20 authentication-restart 60
 event agent-found match-all
  10 class always do-until-failure
   10 terminate mab
   20 authenticate using dot1x retries 2 retry-time 2 priority 10
 event violation match-all
  10 class always do-until-failure
   10 restrict
 event authentication-success match-all
  10 class MAB do-until-failure
   10 terminate webauth

Unfortunately, there is odd behavior with Cisco APs when their reauth timer runs out while the AAA servers are not available.

We would expect the port to keep looping through this class:

  60 class MAB_FAILED do-until-failure
   10 terminate mab
   20 authentication-restart 60

That usually works for unknown MAC addresses, because the MAB_FAILED class can't actually fail due to "authentication-restart 60". Here, however, it looks like this class gets skipped, since the MAB "process" itself failed.
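For reference, this is how MAB_FAILED and MAB are usually defined (for example in the class maps IOS-XE generates with `authentication convert-to new-style`). The post does not show the customer's own class maps, so treat these definitions as an assumption. Note that MAB_FAILED only matches an authoritative MAB result, so a RADIUS timeout would not be expected to match it.

```
! Matches only when MAB itself returned a definitive failure
! (e.g. an Access-Reject), not when the RADIUS server did not answer
class-map type control subscriber match-all MAB_FAILED
 match method mab
 match result-type method mab authoritative
!
! Used by "event authentication-success" in the policy above
class-map type control subscriber match-all MAB
 match method mab
```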

We see the following in the logs:

May  8 07:28:34.444: RADIUS: No response from (UNKNOWN:1645,1646) for id 1646/230
May  8 07:28:34.445: RADIUS/DECODE: No response from radius-server; parse response; FAIL
May  8 07:28:34.445: RADIUS/DECODE: Case error(no response/ bad packet/ op decode);parse response; FAIL

The switchport then goes into the following state:

xxxlabor-1#show access-session interface gigabitEthernet 1/0/39 details
            Interface:  GigabitEthernet1/0/39
               IIF-ID:  0x1011880000001DA
          MAC Address:  4001.7axxxx
         IPv6 Address:  Unknown
         IPv4 Address:  10.50.xxxx
            User-Name:  40017axxx
               Status:  Unauthorized
               Domain:  UNKNOWN
       Oper host mode:  multi-domain
     Oper control dir:  in
      Session timeout:  N/A
      Restart timeout:  N/A

The session timeout is N/A, and the port won't reauthenticate, even when AAA is available again.

In this state the AP is unable to reach the WLCs and starts a reboot loop, which unfortunately does not flap the switchport link. To recover, we have to clear the session on the switch manually.
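The manual recovery amounts to clearing the session on the affected port in exec mode. A minimal sketch, using the interface from the show output above:

```
! Drop the stuck session; a new session (and a new MAB attempt)
! should start once traffic from the AP is seen again
clear access-session interface GigabitEthernet1/0/39
! Check that the AP has been authorized again
show access-session interface GigabitEthernet1/0/39 details
```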

We could work around this by increasing the initial reauth period to 48h, which would probably limit the impact. We also thought about reacting to aaa-available events and restarting the template, like this:

event aaa-available match-all
  10 class MAB do-until-failure
   10 terminate mab
   30 activate service-template xxxx

But I'm currently not sure if that's the best way to work around or solve this.

Open to any input/feedback.

1 Accepted Solution

Arne Bier
VIP

Hello @samuel.heinrich 

The class "MAB_FAILED" won't apply here in the case of a re-auth, because as you rightly said, MAB has not failed. The situation you are facing is that AAA servers are unavailable.

I am referencing the Wired Prescriptive Guide.

And that class looks like this:

class-map type control subscriber match-all AAA_SVR_DOWN_AUTHD_HOST
 match result-type aaa-timeout
 match authorization-status authorized

In the IBNS 2.0 Policy Map you would catch that class with a Policy such as:

event authentication-failure match-first
  5 class DOT1X_FAILED do-until-failure
   10 terminate dot1x
   20 authenticate using mab priority 20
  10 class AAA_SVR_DOWN_UNAUTHD_HOST do-until-failure
   10 clear-authenticated-data-hosts-on-port
   20 activate service-template CRITICAL_AUTH_ACCESS
   30 activate service-template DEFAULT_CRITICAL_VOICE_TEMPLATE
   40 authorize
   50 pause reauthentication
  20 class AAA_SVR_DOWN_AUTHD_HOST do-until-failure
   10 pause reauthentication
   20 authorize
  ...
  ....
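The `pause reauthentication` actions above are normally paired with an `aaa-available` handler that restores normal operation once a server comes back, which is close to what the question proposes. A sketch of that pattern, under these assumptions: the policy name 1x_mab is taken from the question, the template names from the policy above, and the AAA_SVR_DOWN_UNAUTHD_HOST, IN_CRITICAL_AUTH and NOT_IN_CRITICAL_AUTH class maps are the usual definitions rather than anything shown in this thread.

```
! Counterpart of AAA_SVR_DOWN_AUTHD_HOST for sessions that were never authorized
class-map type control subscriber match-all AAA_SVR_DOWN_UNAUTHD_HOST
 match result-type aaa-timeout
 match authorization-status unauthorized
!
! Sessions currently running on a critical (AAA-down) service template
class-map type control subscriber match-any IN_CRITICAL_AUTH
 match activated-service-template CRITICAL_AUTH_ACCESS
 match activated-service-template DEFAULT_CRITICAL_VOICE_TEMPLATE
!
! All other sessions
class-map type control subscriber match-none NOT_IN_CRITICAL_AUTH
 match activated-service-template CRITICAL_AUTH_ACCESS
 match activated-service-template DEFAULT_CRITICAL_VOICE_TEMPLATE
!
policy-map type control subscriber 1x_mab
 event aaa-available match-all
  ! Critical-auth sessions: tear down so they re-authenticate against ISE
  10 class IN_CRITICAL_AUTH do-until-failure
   10 clear-session
  ! Sessions that kept their authorization: undo "pause reauthentication"
  20 class NOT_IN_CRITICAL_AUTH do-until-failure
   10 resume reauthentication
```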

AAA-server-down detection only works if you have set up the dead-detection mechanism in IOS-XE. These global configuration commands are typical examples:

radius-server dead-criteria time 10 tries 3
radius-server deadtime 15
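For context, a sketch of where these commands typically sit in a RADIUS configuration. The server name, group name, address (192.0.2.10 is a documentation address), probe username and key are hypothetical placeholders. As I understand the IOS-XE behavior, a server is only marked dead when both dead-criteria conditions are met (no reply within the time window and the given number of consecutive unanswered tries), and deadtime is in minutes. The `automate-tester` line is optional; it makes the switch probe the server with test requests, and its exact keywords may differ on older releases.

```
aaa new-model
!
radius server ISE-PSN-1
 address ipv4 192.0.2.10 auth-port 1812 acct-port 1813
 key <shared-secret>
 ! optional active probing; "probe-user" is a hypothetical account
 automate-tester username probe-user probe-on
!
aaa group server radius ISE-GROUP
 server name ISE-PSN-1
!
aaa authentication dot1x default group ISE-GROUP
aaa authorization network default group ISE-GROUP
!
! dead = 10 s without a reply AND 3 consecutive unanswered tries
radius-server dead-criteria time 10 tries 3
! skip a dead server for 15 minutes before trying it again
radius-server deadtime 15
```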

You can check the AAA server status with the command:

show aaa servers

In the lab, I would test this mechanism by blocking all IP traffic from the switch towards all ISE servers (deny ip any xxx, where xxx is the IP of the ISE PSN(s)).
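A minimal sketch of that lab test. Assumptions: the ACL is applied inbound on the next-hop device facing the access switch rather than on the switch itself, because on IOS/IOS-XE outbound ACLs generally do not filter traffic the device originates; the ACL name, interface and 192.0.2.10 address are hypothetical.

```
! On the upstream device, facing the access switch under test
ip access-list extended BLOCK-ISE-LAB
 ! drop everything towards the PSN (add one entry per PSN)
 10 deny ip any host 192.0.2.10
 20 permit ip any any
!
interface GigabitEthernet0/0/1
 ip access-group BLOCK-ISE-LAB in
```

Once the dead criteria are met, `show aaa servers` on the access switch should report the PSN as DEAD; removing the ACL again exercises the aaa-available path.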