Solved: Re: Cisco ISE Dot1X / MAB - No Authentication Session after Reboot Issue

ucberry · ‎01-23-2020

We are doing a Cisco ISE wired 802.1X deployment with MAB + profiling and have found that one of our switch stacks (Catalyst 3850) running IOS XE 16.9.4 is behaving unexpectedly for MAB clients after a reboot.

Whenever we reboot the switch stack, a handful of endpoints that had an authenticated session and a dACL before the reboot do NOT have an authentication session afterwards. The command “show authentication session interface GigabitEthernet x/y/z detail” returns “No sessions match supplied criteria.” This means ISE never pushes down a dACL, the endpoint does not get the correct level of access, and it is therefore unreachable. The only fix so far has been to bounce the port whenever we see this occur. The affected devices have all been printers, IoT devices, or UPSs. What is especially odd is that the issue occurs on some printers and UPSs but not on others of the identical model (e.g. one APC Smart-UPS200 authenticates properly and another one doesn’t).

We want a cleaner solution than bouncing ports and ideally there is a permanent fix so that we are not manually "kickstarting" the authentication process because of some other issue. Some things we have tried that have not worked:

Clearing the MAC address table
Clearing the device tracking database
Disabling authentication on the port and re-enabling it (no authentication open)
Removing the endpoint from the ISE database and repeating the above tests

Some other important notes:

The switch stack is a stack of four WS-C3850-48Ps running IOS XE 16.9.4
We have already upgraded IOS and the problem persisted
The affected devices are spread across different switches in the stack.
We are able to replicate this issue every time we reboot with the same hosts
There are other printers (exact make /model) attached to this same switch stack that are not having the issue
We use macros to enable 802.1X and all access ports get the same macro to enable it
I’ve looked through the Bug Search Tool and don’t see anything related, but there was a lot to look through and I may have missed something.

Has anyone experienced something similar? What are some things I should be looking at to troubleshoot?

Amen · ‎05-17-2023

I had the same issue after a power outage. Only MAB was working and dot1x was failing every time, so I didn't see the supplicant in ISE at all; the authentication was failing on the switch itself:

May 17 10:18:17.559 CEST: %DOT1X-5-FAIL: Authentication failed for client (043a.5b26.c951) on Interface Gi2/0/1 AuditSessionID 000000000000005F0013B79C
smdl7-4ar-cuac-1#
May 17 10:18:17.639 CEST: %MAB-5-SUCCESS: Authentication successful for client (043a.5b26.c951) on Interface Gi2/0/1 AuditSessionID 000000000000005F0013B79C

My switch was connected to an aggregation switch, which was difficult to reboot because of the impact, so I rebooted the access switch first, but that didn't help. Rebooting the aggregation switch solved the issue.

Damien Miller · ‎01-23-2020

My experience has been that this occurs with devices that go into extended sleep modes. I would not expect steps 1 through 4 that you listed to achieve much, but I'm surprised that the port bounce caused by a switch reload doesn't wake up these endpoints.

The switch should initiate an authentication flow as soon as it registers a new MAC on the switch port. In traditional flows, the switch sends EAP-Request/Identity frames to solicit a dot1x supplicant, and eventually falls back to MAB if there is no response. Some policy sets now run these in parallel but the behavior isn't much different.
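
To check whether the switch has seen anything at all from the endpoint since the reload, something like the following can help. This is only a sketch: GigabitEthernet1/0/10 is a placeholder interface, and the exact output varies by release.

```text
! Placeholder interface; substitute the affected port.
! Is there a session at all?
show authentication sessions interface GigabitEthernet1/0/10 details
! Has the port received any frames since it came up?
show interfaces GigabitEthernet1/0/10 | include packets input
! Has the switch learned a MAC on the port?
show mac address-table interface GigabitEthernet1/0/10
```

If the input counter stays flat and no MAC is learned, the endpoint is most likely not transmitting, so there is nothing to trigger MAB.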

Now an alternative I have heard about but not directly encountered involves some IoT devices that have a poorly implemented network stack. In this case, the endpoint sees the PHY come up and tries to reach out, but fails either because it is too early and the switch won't pass traffic yet, or because a pre-auth ACL/closed/low-impact mode is in use. The endpoint's network stack then hangs until the port is opened up.

So if you bounce the port or wake up the endpoints and a new authentication session gets built, then it's very likely not a switch or ISE issue. I understand that it's still a frustrating issue; maybe some others have crafted an EEM script to bounce ports.
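
For illustration, a minimal sketch of what such an EEM applet could look like. The applet name, the 600-second delay, and the interface are all assumptions and would need adjusting per site. It fires once, some time after the applet is registered at boot, and bounces a single known-problem port.

```text
! Hypothetical sketch: applet name, delay, and interface are placeholders.
! The countdown timer starts when the applet is registered (at boot, when
! the startup configuration is parsed) and fires once after 600 seconds.
event manager applet BOUNCE-SILENT-PORT
 event timer countdown time 600
 action 010 cli command "enable"
 action 020 cli command "configure terminal"
 action 030 cli command "interface GigabitEthernet1/0/10"
 action 040 cli command "shutdown"
 action 050 wait 5
 action 060 cli command "no shutdown"
 action 070 cli command "end"
 action 080 syslog msg "Bounced Gi1/0/10 to restart authentication"
```

If command authorization is in use, the applet's CLI commands may also need an EEM session username configured. This only automates the workaround and does not address why the endpoint stays silent.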

Arne Bier · ‎01-23-2020

I have encountered this in healthcare with devices that are running IP stacks that are not very chatty. They don't send out any keepalives, and they often give up very early during the DHCP DORA cycle ... by which time the 802.1X has timed out and then it's too late for MAB ... and then you have a stalemate. The port bounce might be a solution but this would have to be quite specific and only applied to certain "problem cases". At this point I would probably argue that these devices are not fit for a NAC solution (too much operational overhead and potential for creating more problems) and one might be better off doing traditional access VLAN configs (assuming that the port outlets are secured from potential tampering). If there is a risk of someone attaching a rogue device to a switch port then I would not advocate this workaround.
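
On the timing problem described above (802.1X timing out before MAB gets a chance), one thing that can be tried is reordering the methods or shortening the dot1x timers on the affected ports. The following is a sketch using the legacy interface-level commands; the interface and the timer values are illustrative assumptions, not recommendations. With the commonly cited defaults (tx-period 30, max-reauth-req 2) the fallback to MAB takes on the order of 90 seconds.

```text
! Hypothetical port; timer values are illustrative only.
interface GigabitEthernet1/0/10
 ! Try MAB first so a quiet device's first frames are used for MAB,
 ! while still letting a dot1x result take precedence if one arrives.
 authentication order mab dot1x
 authentication priority dot1x mab
 ! Alternatively, keep dot1x first and shorten the wait before MAB fallback.
 dot1x timeout tx-period 10
 dot1x max-reauth-req 2
```

Neither change helps if the device never transmits a frame at all, since MAB still needs a source MAC to work with.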

It might be worth investigating whether the end device can send out regular automated Ethernet traffic (keepalives, ping probes, etc.) which would create some artificial activity on the port to allow MAB to work. At a minimum, any device that is configured to receive its IP address via DHCP has the obligation to perform a DHCP discovery as soon as it detects a link up. And furthermore, to CONTINUE doing this until it receives an offer. If it doesn't do this (or you miss the one and only DHCP Discovery packet) then you're entering a world of pain. If the IP address is statically configured, then it's even harder. The device has no reason to initiate any communication. Hence, no switch session is created.

It would be nice if the switch could send an automated ICMP message to a device on a port that has link up, but where there is no MAB session. If the device responded to the ICMP echo, the switch would see its traffic and that would trigger the MAB process. Perhaps this already exists; I have never thought about this.

ucberry · ‎01-26-2020

Arne,

I liked your note around looking at the endpoints to see if there are any sort of keepalives that might help here. I will investigate this.

Thanks!

ucberry · ‎01-26-2020

Hi Damien,

I appreciate the response!

I cannot say that I 100% know what extended sleep mode means from a networking stack perspective, but I will add some additional info of things we noticed and see if this lines up with what you were saying.

In our testing:

We run a continuous ping to one of the devices
We confirm the device has an authenticated session on the switch with a dACL
We reload the switch (pings stop at this point obviously)
The switch finishes the boot cycle
The continuous ping never comes back and the authentication process for dot1X / MAB never starts
Devices of the same make and model (at least for the printers we've confirmed this) that we were not pinging throughout the process have finished the MAB process and received the correct dACL

I guess our original thought was that the continuous ping would negate the "extended sleep" scenario, but maybe it hasn't...?

It is also worth noting that we currently are in low impact mode and have a Pre-authentication ACL that allows:

DNS to anywhere
DHCP to anywhere
IP traffic to any of the ISE servers
IP traffic to vulnerability scanners
Blocks all other RFC1918 traffic
Ends with a permit ip any any
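
For readers following along, a pre-authentication ACL matching that description could be sketched roughly as below. The ACL name, the interface, and every address are hypothetical placeholders, not values from this deployment.

```text
! Hypothetical names and addresses throughout.
ip access-list extended ACL-PREAUTH
 remark DNS to anywhere
 permit udp any any eq domain
 permit tcp any any eq domain
 remark DHCP to anywhere
 permit udp any eq bootpc any eq bootps
 remark ISE server (placeholder address)
 permit ip any host 10.10.10.11
 remark Vulnerability scanner (placeholder address)
 permit ip any host 10.10.10.50
 remark Block all other RFC1918 destinations
 deny ip any 10.0.0.0 0.255.255.255
 deny ip any 172.16.0.0 0.15.255.255
 deny ip any 192.168.0.0 0.0.255.255
 remark Everything else is allowed
 permit ip any any
!
interface GigabitEthernet1/0/10
 ip access-group ACL-PREAUTH in
```

Because the specific permits come before the RFC1918 denies, the ISE and scanner hosts stay reachable even though they sit in private address space.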

Appreciate your post!

andrewswanson · ‎01-24-2020

Hi

I assume you have DHCP snooping enabled, but are you saving the DHCP snooping binding table so that it survives a switch reload?

I've seen similar issues with Xerox MFDs that wouldn't renew their DHCP lease after a switch reload (and so wouldn't appear in the switch's dynamically recreated DHCP snooping binding table).

To resolve this, the switch's binding table was saved to flash:

ip dhcp snooping database flash:<FILENAME>
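
In context, that line sits alongside the rest of the snooping configuration. A minimal sketch, assuming VLAN 10 for the endpoints, a placeholder file name, and a placeholder uplink (all hypothetical):

```text
! Hypothetical VLAN, file name, and uplink interface.
ip dhcp snooping
ip dhcp snooping vlan 10
! Persist the binding table so it can be reloaded after a reboot.
ip dhcp snooping database flash:dhcp-snoop.db
ip dhcp snooping database write-delay 300
!
interface TenGigabitEthernet1/1/1
 ! Uplink towards the DHCP server must be trusted.
 ip dhcp snooping trust
```

After a reload, "show ip dhcp snooping database detail" should show successful reads if the saved bindings were picked up.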

hth
Andy

ucberry · ‎01-26-2020

Hi Andy,

I had not thought about DHCP snooping, so I liked where your head was at here.

It's been quite a while since I've done anything with DHCP so I may have missed something obvious, but it does not appear it is enabled on this switch stack:

OPSSW1#show ip dhcp snooping
Switch DHCP snooping is disabled
Switch DHCP gleaning is disabled
DHCP snooping is configured on following VLANs:
none
DHCP snooping is operational on following VLANs:
none
DHCP snooping is configured on the following L3 Interfaces:

Insertion of option 82 is enabled
   circuit-id default format: vlan-mod-port
   remote-id: 00e1.6d09.3d80 (MAC)
Option 82 on untrusted port is not allowed
Verification of hwaddr field is enabled
Verification of giaddr field is enabled
DHCP snooping trust/rate is configured on the following Interfaces:
Interface                  Trusted    Allow option    Rate limit (pps)
-----------------------    -------    ------------    ----------------

OPSSW1#show ip dhcp snooping binding
MacAddress          IpAddress        Lease(sec)  Type           VLAN  Interface
------------------  ---------------  ----------  -------------  ----  --------------------
Total number of bindings: 0

OPSSW1#show ip dhcp snooping statistics
 Packets Forwarded                                     = 0
 Packets Dropped                                       = 0
 Packets Dropped From untrusted ports                  = 0

OPSSW1#show run all | include dhcp snooping
ip dhcp snooping information option
ip dhcp snooping database write-delay 300
ip dhcp snooping database timeout 300
ip dhcp snooping verify mac-address
ip dhcp snooping verify no-relay-agent-address
no ip dhcp snooping wireless bootp-broadcast enable

OPSSW1#show ip dhcp snooping database detail
Agent URL :
Write delay Timer : 300 seconds
Abort Timer : 300 seconds

Agent Running : No
Delay Timer Expiry : Not Running
Abort Timer Expiry : Not Running

Last Succeded Time : None
Last Failed Time : None
Last Failed Reason : No failure recorded.

Total Attempts       :        0   Startup Failures :        0
Successful Transfers :        0   Failed Transfers :        0
Successful Reads     :        0   Failed Reads     :        0
Successful Writes    :        0   Failed Writes    :        0
Media Failures       :        0

First successful access: None

Last ignored bindings counters :
Binding Collisions    :        0   Expired leases    :        0
Invalid interfaces    :        0   Unsupported vlans :        0
Parse failures        :        0
Last Ignored Time : None

Total ignored bindings counters:
Binding Collisions    :        0   Expired leases    :        0
Invalid interfaces    :        0   Unsupported vlans :        0
Parse failures        :        0

Let me know if I missed something obvious, and thanks very much for the initial reply!
