CISCO ISE - Radius Failed Authentications

mattpant
Level 1

Hi All,

 

Apologies if this is in the incorrect place - just wondering if somebody could help explain something. We have a wireless network which is managed and maintained by the local authority - the WLC is located remotely (Catalyst 9800-80) along with the ISE server.

 

I have limited access to viewing logs on ISE and I do have limited access to Cisco Prime to view things - we don't have any access to the config / setup.

 

Our clients connect to 9120AXI-E access points. Throughout the day we have major issues with devices not being able to connect; this tends to happen more when they are roaming across the building, and they can then take quite some time to reconnect. When I look at the ISE Live Logs I see the following... Could somebody tell me what the difference is between the ones listed as HOST\laptopname, which keep showing up with a red cross, and those listed with just the laptop name, which have green ticks? The MAC address is the same for both.

 

Why are some listed as HOST\name and others not? How do we fix this, if it is a problem?

 

Cisco ISE Radius.jpg


Thanks


Matt

 

2 Accepted Solutions


Arne Bier
VIP

The "host/LAPTOP.." User-Name indicates that this is a machine authentication. It looks like the computer that is failing is perhaps not a domain-joined computer, but its wireless 802.1X supplicant has been configured for "User or computer authentication". The option you select is crucial and depends on whether the laptop is domain joined or not. If it is not, select User authentication. If it is domain joined, then at least choose Computer authentication.

The "User or computer authentication" will cause a network authentication event to ISE for every

laptop bootup (Computer auth)

user login (User auth)

user logoff (Computer auth)
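As a rough illustration of the list above, here is a Python sketch that maps each event to the identity a supplicant in "User or computer authentication" mode would present. It assumes machine authentications show up as host/<hostname> and user authentications as the plain user name; LAPTOP01 and matt are made-up values, and real identity formats depend on the EAP method and supplicant settings.

```python
from collections import Counter
from typing import List

# Credential type presented for each event when the supplicant is set to
# "User or computer authentication" (taken from the list above).
EVENT_CREDENTIAL = {
    "bootup": "computer",
    "user_login": "user",
    "user_logoff": "computer",
}


def identity_for(event: str, hostname: str, username: str) -> str:
    """Return the User-Name this sketch assumes is sent for an event.

    Assumption: machine auth uses "host/<hostname>", user auth the bare user name.
    """
    if EVENT_CREDENTIAL[event] == "computer":
        return f"host/{hostname}"
    return username


def summarize(events: List[str], hostname: str, username: str) -> Counter:
    """Count the authentications each identity would generate for a sequence of events."""
    return Counter(identity_for(e, hostname, username) for e in events)


# One boot, then two login/logoff cycles on a hypothetical laptop.
day = ["bootup", "user_login", "user_logoff", "user_login", "user_logoff"]
print(summarize(day, "LAPTOP01", "matt"))
```

In this made-up day the machine identity authenticates three times and the user twice, which is why host/ entries can easily outnumber user entries in the Live Logs.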

 

If the laptop is domain joined and you've configured the supplicant correctly and ISE is still unhappy then get the details of the failure by clicking on the details icon of the failed events. 

supplicant.png


In your case, those authentications that are failed and showing "host/..." are not unique or strange authentications (due to, e.g. misconfigured supplicants). I have observed in the lab that a successful Computer authentication will not display the Identity as "host/workstation123" in Live Logs, but it automatically strips the "host/" - you will still see the "host/" in the details section though. But my point is, you're seeing the "host/" in the Live Logs because this is an incomplete/broken authentication and ISE hasn't done the "tidying up" job of stripping the prefix. Thus, I think it's a red herring.
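To group Live Log rows per laptop regardless of how the machine identity is spelled, a small normalizing helper can be useful. This is a hypothetical Python sketch: it only handles the spellings that appear in this thread (host/NAME, HOST\NAME and NAME$@domain) and does not reproduce whatever ISE does internally when it strips the prefix.

```python
import re
from typing import Optional

# Machine-identity spellings seen in this thread (hypothetical helper, not ISE logic).
_PATTERNS = [
    # host/NAME or HOST\NAME, optionally followed by .domain
    re.compile(r"^host[/\\](?P<name>[^.@/\\]+)", re.IGNORECASE),
    # NAME$@domain (machine account form)
    re.compile(r"^(?P<name>[^$@/\\]+)\$@"),
]


def machine_name(identity: str) -> Optional[str]:
    """Return the upper-cased hostname for a machine identity, or None for anything else."""
    for pattern in _PATTERNS:
        match = pattern.match(identity)
        if match:
            return match.group("name").upper()
    return None


for ident in ["host/LAPTOP01", "HOST\\laptop01", "LAPTOP01$@corp.example", "matt"]:
    print(ident, "->", machine_name(ident))
```

All three machine spellings collapse to LAPTOP01 (corp.example is a placeholder domain), while a plain user name returns None.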

 

Have you considered enabling 802.11r or some kind of key caching mechanism to limit the amount of authentications that are done every time a supplicant roams from AP to AP? It's worth considering.

In addition, Session Resume should be enabled in ISE so that the TLS process can be greatly short-circuited on subsequent connections.

Administration > System > Settings > Protocols > EAP-TLS / EAP-PEAP
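To see why key caching or 802.11r matters when roaming, the sketch below counts full EAP exchanges with ISE along a made-up roaming path. The three modes are a deliberate simplification (PMKSA-style caching per AP versus OKC/802.11r-style fast roaming across APs); the sketch ignores cache lifetimes, controller boundaries and clients that do not support the mechanism.

```python
from typing import List


def full_eap_count(roam_path: List[str], caching: str = "none") -> int:
    """Count full EAP exchanges with ISE along a sequence of AP associations.

    caching:
      "none"       - every association runs a full EAP exchange
      "pmk_per_ap" - PMKSA-style caching: only the first visit to each AP
      "fast_roam"  - OKC / 802.11r-style: only the first association overall
    """
    if not roam_path:
        return 0
    if caching == "none":
        return len(roam_path)
    if caching == "pmk_per_ap":
        return len(set(roam_path))
    if caching == "fast_roam":
        return 1
    raise ValueError(f"unknown caching mode: {caching!r}")


# Hypothetical walk across the building and back.
path = ["AP1", "AP2", "AP3", "AP2", "AP1"]
for mode in ("none", "pmk_per_ap", "fast_roam"):
    print(mode, full_eap_count(path, mode))
```

Every full exchange avoided is one less chance for a roaming client to abandon an EAP session halfway through.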

 

Fast Reconnect is also an option, but it has potential security side effects. When enabled, the authentication is bypassed on subsequent reconnects for a period of time (avoiding the full authentication each time saves time). If you were to disable an AD account, for example, Fast Reconnect would still allow that user to connect for the period of time in which they are granted "fast reconnect".
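The Fast Reconnect trade-off can be made concrete with a toy model. The window length and the per-user timestamp are assumptions for illustration only, not how ISE actually stores its session cache.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FastReconnectModel:
    """Toy model: a full authentication opens a window (hypothetical length)
    during which reconnects skip the AD credential check."""
    window_s: float
    last_full_auth: Dict[str, float] = field(default_factory=dict)

    def reconnect(self, user: str, now: float, account_enabled: bool) -> str:
        started = self.last_full_auth.get(user)
        if started is not None and now - started <= self.window_s:
            # Inside the window: AD is not consulted, so a disabled account still gets in.
            return "accepted: fast reconnect"
        if account_enabled:
            self.last_full_auth[user] = now
            return "accepted: full authentication"
        return "rejected: account disabled"


model = FastReconnectModel(window_s=3600)
print(model.reconnect("matt", 0, True))      # full authentication
print(model.reconnect("matt", 600, False))   # account disabled, but still inside the window
print(model.reconnect("matt", 4000, False))  # window expired, AD check applies again
```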


15 Replies

I cannot see any screenshot; you may want to attach it instead.

This difference may be related to the Windows version, but it is surely not the cause of your problems.

Drops while roaming can be a coverage problem, as can any kind of disconnection. Start by double-checking the access point distribution inside the building; better yet, perform a site survey.

If everything is in place, then the problem may be in the device setup.

You said you have limited access to Prime, but can you at least enter a MAC address in the search window and see how the client is doing?

mattpant
Level 1

Hi,

 

Thanks for the reply - I’ve tried attaching the image - see if this works?

 

I don’t think coverage is an issue - another SSID on the same system that uses WPA2-PSK works fine - it’s only the devices that connect via RADIUS.

 

Yes, I’m able to search for a MAC address in Prime, and in Cisco DNA Centre.

Click on the dot; that will give you details on the errors.

Hi,

 

Sorry - I've only just got back in the office after the weekend - so didn't have access to the system to check...

 

When I click on the dot for more info, this is what it says for the ones showing as red, auth failed (listed as HOST\hostname):

A number of them show this:

5440 Endpoint abandoned EAP session and started new

Then the odd one shows this:

5411 Supplicant stopped responding to ISE
12934 Supplicant stopped responding to ISE during PEAP tunnel establishment

 

----

The machines are domain joined on our local network; the WLC sits on a different network at the council's HQ, and our domain has been added to Cisco ISE.
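One way to check whether the failures cluster on machine identities is to tally the failure reasons from the Live Log details by identity kind. The sketch below is hypothetical: the reason texts are the ones quoted above, and the (identity, code) input format is simply whatever you transcribe by hand or export.

```python
from collections import Counter
from typing import Iterable, Tuple

# Failure reasons quoted in this thread (code -> text as ISE shows it).
FAILURE_REASONS = {
    5440: "Endpoint abandoned EAP session and started new",
    5411: "Supplicant stopped responding to ISE",
    12934: "Supplicant stopped responding to ISE during PEAP tunnel establishment",
}


def identity_kind(identity: str) -> str:
    """'machine' for host/NAME or HOST<backslash>NAME identities, otherwise 'user'."""
    return "machine" if identity.lower().startswith(("host/", "host\\")) else "user"


def tally(failures: Iterable[Tuple[str, int]]) -> Counter:
    """Count (identity kind, reason) pairs from transcribed Live Log failures."""
    counts: Counter = Counter()
    for identity, code in failures:
        reason = FAILURE_REASONS.get(code, f"unknown code {code}")
        counts[(identity_kind(identity), reason)] += 1
    return counts


# Hypothetical transcription of a few red rows.
rows = [("HOST\\LAPTOP01", 5440), ("HOST\\LAPTOP02", 5440), ("HOST\\LAPTOP01", 5411)]
for (kind, reason), n in tally(rows).items():
    print(kind, n, reason)
```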


Hi Arne,

 

Thanks for your reply and for explaining the difference between HOST\HOSTNAME and HOSTNAME. I've just checked the Group Policy which pushes out the WiFi settings, and it's set as follows:

 

Screenshot 2022-04-04 100029.jpg

The machines are domain joined locally; the WLC and ISE sit remotely on a different network at the council's HQ, but our AD / domain has been added into ISE.

 

Looking at the details of some of the failed authentications for HOST\HOSTNAME, they show:

 

5440 Endpoint abandoned EAP session and started new
12934 Supplicant stopped responding to ISE during PEAP tunnel establishment

 

Thanks


Matt

Hi Matt

 

Oh dear. The dreaded “endpoint abandoned EAP”. That is the symptom of a problem (or problems), and the cause could be a few things. Sadly, it could be an indication that coverage is poor and that clients are losing signal during EAP tunnel establishment (a lengthy process in which up to 20 packets can be exchanged). If a client is moving, or the RSSI fluctuates to a level where the client disassociates, then the session with ISE will not complete. As the client comes back in range it will start EAP from scratch; hence you see these messages. Something is causing this to happen. In a well-designed wireless network your clients should always have enough coverage from an AP to prevent this from happening.
I am guessing here, but another cause might be that you have more than one controller and that a client is roaming in and out of coverage onto another controller? ISE can get confused and won’t like that. EAP has to come to a final conclusion on the same ISE node.

Thanks for the update on this one...

 

What I don't understand, though, is why it only happens (or is only recorded) where the device is listed as HOST\HOSTNAME (HOST\Laptop) and never on just the HOSTNAME??

 

Signal around the building isn't an issue: we have plenty of access points covering the site, and we don't have the issue when using another SSID which uses WPA2-PSK. It only seems to be an issue when using RADIUS.

 

Cisco ISE Radius.jpg

 

One theory I have (I might be wrong) is that every time a user roams from one AP to another, the request has to be authenticated by the WLC / ISE when using this RADIUS setup. With the WLC / ISE not on the same network and hosted remotely, the connection to the server might be timing out or slowing down and not passing the request on fast enough, perhaps because of the number of devices on our network relative to the speed of the internet link. With PSK the authentication is done on the AP, so no requests go to the WLC / ISE server??
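This theory can be sanity-checked with back-of-the-envelope arithmetic. Arne mentions that EAP tunnel establishment can involve up to 20 packets; the sketch below assumes each request/response pair crosses the WAN once, and the RTT and per-step ISE processing values are placeholders to be replaced with measured numbers.

```python
def eap_exchange_ms(packets: int, wan_rtt_ms: float, ise_ms_per_step: float = 0.0) -> float:
    """Rough duration of one EAP exchange whose every round trip crosses the WAN.

    Assumptions (placeholders, not measurements): packets are request/response
    pairs, each pair costs one WAN round trip plus optional ISE processing.
    """
    round_trips = (packets + 1) // 2  # an odd trailing packet still costs a trip
    return float(round_trips * (wan_rtt_ms + ise_ms_per_step))


# Hypothetical numbers: 20 packets, 40 ms WAN RTT, 5 ms ISE processing per step.
print(eap_exchange_ms(20, 40))
print(eap_exchange_ms(20, 40, 5))
```

With a 40 ms RTT this comes to 400 ms per full exchange. If the measured values are of that order, WAN latency alone would probably not explain reconnects that take minutes, although packet loss on the link could still cause retransmissions.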

 

But I'm still mystified as to why the authentication failures are only on the HOST\ entries.

 

Thanks


Matt 


We are dealing with the exact symptoms you have described. We have an EAP-TLS WLAN (computer auth), and it appears that devices will intermittently try to authenticate with "host/<computer name>", which fails every time. Then the same computer authenticates with "<computername>$@<domain name>" and succeeds. The problem is that it is intermittent, and we suspect that it causes clients to get blacklisted by our "client exclusion timeout policy". As you mentioned, they are all 5411 and 5440 errors in ISE. These locations have a dense AP deployment, so I don't think it is signal related, and DNAC reflects this as well.

Have you had any luck finding the answer?
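The suspected interaction with the client exclusion policy can be modelled as a counter of consecutive failures. The threshold and timeout below are hypothetical, not the controller's defaults; check the actual exclusion settings on the WLAN before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class ExclusionModel:
    """Toy client-exclusion policy: `threshold` consecutive failures exclude
    the client for `timeout_s` seconds. Both values are hypothetical."""
    threshold: int
    timeout_s: float
    failures: int = 0
    excluded_until: float = float("-inf")

    def attempt(self, now: float, success: bool) -> str:
        if now < self.excluded_until:
            return "excluded"  # attempts are ignored while excluded
        if success:
            self.failures = 0
            return "associated"
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0
            self.excluded_until = now + self.timeout_s
            return "excluded"
        return "failed"


# Hypothetical: three host/ failures in a row, then successful attempts.
model = ExclusionModel(threshold=3, timeout_s=60)
attempts = [(0, False), (5, False), (10, False), (20, True), (75, True)]
print([model.attempt(t, ok) for t, ok in attempts])
```

In this sketch the attempt at t=20 would have succeeded, but the client is still excluded, so the user only gets on once the timeout has expired.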

We see almost the exact same issue, and are struggling to determine if the issue is caused by the client behaving badly or what could be the cause.

A small percentage (1-3%) of the Windows 10 clients exhibit this behaviour, even though all clients are properly domain-joined and configured via GPOs.

Our issue occurs on both the wired and the wireless network.

The failing Windows clients have 4 failed attempts (5440 Endpoint abandoned EAP session and started new)
before finally succeeding on the fifth attempt, at which point the Live Log shows a success.

When it fails, the ISE logs something like this:

[SNIP]
12504 Extracted EAP-Response containing EAP-TLS challenge-response
12505 Prepared EAP-Request with another EAP-TLS challenge
11006 Returned RADIUS Access-Challenge
11001 Received RADIUS Access-Request
11018 RADIUS is re-using an existing session
12504 Extracted EAP-Response containing EAP-TLS challenge-response
12505 Prepared EAP-Request with another EAP-TLS challenge
11006 Returned RADIUS Access-Challenge
5440 Endpoint abandoned EAP session and started new (step latency=78124 ms Step latency=78124 ms)

 

and when it succeeds, it looks like this:

[SNIP]
12505 Prepared EAP-Request with another EAP-TLS challenge
11006 Returned RADIUS Access-Challenge
11001 Received RADIUS Access-Request
11018 RADIUS is re-using an existing session
12504 Extracted EAP-Response containing EAP-TLS challenge-response
12810 Prepared TLS ServerDone message
12571 ISE will continue to CRL verification if it is configured for specific CA - certificate for XXXXXX.XXXXX:XX
12571 ISE will continue to CRL verification if it is configured for specific CA - certificate for XXXXXXXXXXXX
12811 Extracted TLS Certificate message containing client certificate
[SNIP]

Any new ideas as to what happens on the client side? No visible errors are seen in the Wired AutoConfig logs on the Windows client.
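The failing and succeeding excerpts above can be told apart mechanically by their step codes. This hypothetical parser assumes the Steps list is pasted as plain text, one "code message" pair per line.

```python
import re
from typing import List, Optional, Tuple

# A step line starts with a 4-5 digit ISE step code followed by its text.
STEP_RE = re.compile(r"^(\d{4,5})\s+(.*)$")
# Matches the "step latency=NNN ms" annotation on the 5440 step above.
LATENCY_RE = re.compile(r"step latency=(\d+)\s*ms", re.IGNORECASE)


def parse_steps(text: str) -> List[Tuple[int, str]]:
    """Extract (code, message) pairs, ignoring lines such as [SNIP]."""
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line.strip())
        if m:
            steps.append((int(m.group(1)), m.group(2)))
    return steps


def outcome(text: str) -> Tuple[str, Optional[int]]:
    """Classify an excerpt: abandoned (5440, with its step latency if present),
    client certificate received (12811), or neither."""
    steps = parse_steps(text)
    for code, message in steps:
        if code == 5440:
            lat = LATENCY_RE.search(message)
            return "abandoned", (int(lat.group(1)) if lat else None)
    if any(code == 12811 for code, _ in steps):
        return "client-certificate-received", None
    return "incomplete", None


failed = """[SNIP]
11006 Returned RADIUS Access-Challenge
5440 Endpoint abandoned EAP session and started new (step latency=78124 ms Step latency=78124 ms)"""
print(outcome(failed))
```

Applied to the failing excerpt it reports the 78124 ms step latency (about 78 seconds) attached to 5440, which suggests ISE waited a long time for the client's next EAP message before treating the session as abandoned.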

Arne Bier
VIP

Are your ISE PSNs behind a load balancer? If so, the persistence might not be set up or working correctly. But if the NAS is going directly to the PSNs, then check whether the NAS is perhaps alternating between PSNs. If the PSN is always the same one but the client is still having this issue, then check the PSN performance/latency KPI data: is the PSN overloaded? Is the link between the PSN and the NAS over a congested WAN? So many variables to eliminate.
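Whether the NAS alternates between PSNs mid-conversation can be checked from exported RADIUS records. The sketch assumes hypothetical (session_id, psn) pairs from whatever export is available; it simply lists sessions that were handled by more than one PSN.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def split_sessions(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return sessions whose RADIUS requests were handled by more than one PSN.

    `records` are hypothetical (session_id, psn) pairs from a log export.
    """
    psns_by_session: Dict[str, Set[str]] = defaultdict(set)
    for session_id, psn in records:
        psns_by_session[session_id].add(psn)
    return {s: p for s, p in psns_by_session.items() if len(p) > 1}


records = [
    ("sess-a", "psn1"), ("sess-a", "psn1"),
    ("sess-b", "psn1"), ("sess-b", "psn2"),  # this EAP conversation hopped PSNs
]
print(split_sessions(records))
```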

Hi Arne, thanks for responding.

- We have no load balancer in this deployment.
- No WAN overload (lots of capacity)
- Only load issue we might have is related to profiling (Alarms: Profiler Queue Size Limit Reached)

I have tried doing a packet capture on the switch where a client was connected, but I only see the EAP/EAPOL traffic from the client towards the switch. I see lots of other traffic going TO the client, just not the EAP/EAPOL.

2023-11-17 09_38_38-oea-dot1xtest.pcap.png
The issue sounds a bit similar to this thread as well: both the client and the switch report that the other party stopped responding.

https://community.cisco.com/t5/network-access-control/nac-endpoint-abandoned-eap-session/td-p/4767631

I would just like to be able to show that with a capture as well.
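To show a one-sided EAPOL exchange in a capture, Wireshark's eapol display filter is usually enough. To count it programmatically from raw frames, the sketch below checks the EtherType (0x888E for EAPOL, looking past a single 802.1Q tag) and splits frames by direction relative to the client MAC. How you obtain the raw frame bytes (a pcap reader, for example) is left out, and the MAC addresses are made up.

```python
import struct
from typing import Iterable, Tuple

ETH_P_EAPOL = 0x888E  # EtherType used by EAPOL (802.1X)
ETH_P_8021Q = 0x8100  # EtherType of an 802.1Q VLAN tag


def ethertype(frame: bytes) -> int:
    """Return the EtherType of an Ethernet II frame, looking past one 802.1Q tag."""
    (etype,) = struct.unpack_from("!H", frame, 12)
    if etype == ETH_P_8021Q and len(frame) >= 18:
        (etype,) = struct.unpack_from("!H", frame, 16)
    return etype


def eapol_directions(frames: Iterable[bytes], client_mac: bytes) -> Tuple[int, int]:
    """Count EAPOL frames (sent by client, sent to client) for a 6-byte MAC."""
    from_client = to_client = 0
    for frame in frames:
        if len(frame) < 14 or ethertype(frame) != ETH_P_EAPOL:
            continue  # too short or not EAPOL
        if frame[6:12] == client_mac:
            from_client += 1
        elif frame[0:6] == client_mac:
            to_client += 1
    return from_client, to_client


client = bytes.fromhex("aabbccddeeff")
switch = bytes.fromhex("001122334455")
pae_group = bytes.fromhex("0180c2000003")  # 802.1X PAE group address
frames = [
    pae_group + client + b"\x88\x8e" + b"\x01\x01\x00\x00",  # EAPOL-Start from client
    client + switch + b"\x08\x00" + b"\x45\x00",            # IPv4 to client, ignored
]
print(eapol_directions(frames, client))
```

A result like (n, 0) for a capture taken at the client's port would match what is described above: EAPOL leaving the client but none coming back.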

It turns out to be a CRL checking issue.


Since we use OPEN mode with a PRE-AUTH access list that has openings for DNS and DHCP (for profiling to work properly),
the Windows client is able to resolve the name of the CRL server using DNS, but it cannot reach the CRL server, since there is no opening in the PRE-AUTH ACL for TCP/80 (used for CRL checking in this case).
This somehow confuses the Windows client, which eventually ends up in a wait state, waiting 1200 seconds before retrying to connect.

Three possible solutions are:
1 - Remove the DNS opening in the PRE-AUTH ACL. A test shows that this makes the client skip the CRL check and connect without delay.
2 - Add an opening in the PRE-AUTH ACL for the CRL servers (TCP/80). A test shows that this makes the client perform the CRL check and connect without delay.
3 - Disable the CRL check in the Windows client. Possibly not recommended, since it is a global setting?
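The test results behind the three options can be summarised as a small decision function. It encodes only the behaviour observed in this thread, not a model of how Windows performs CRL checks, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreAuthAcl:
    """Hypothetical summary of what the PRE-AUTH ACL permits before 802.1X succeeds."""
    dns: bool       # DNS queries allowed
    crl_http: bool  # TCP/80 to the CRL server allowed


def crl_outcome(acl: PreAuthAcl, client_crl_check: bool = True) -> str:
    """Return the outcome reported in this thread for a given ACL / client setting."""
    if not client_crl_check:
        return "no CRL check, connects"  # option 3
    if not acl.dns:
        return "CRL host unresolvable, check skipped, connects"  # option 1
    if acl.crl_http:
        return "CRL fetched, connects"  # option 2
    return "CRL host resolves but fetch blocked, client stalls (~1200 s retry)"  # original setup


for name, acl, check in [
    ("original", PreAuthAcl(dns=True, crl_http=False), True),
    ("option 1", PreAuthAcl(dns=False, crl_http=False), True),
    ("option 2", PreAuthAcl(dns=True, crl_http=True), True),
    ("option 3", PreAuthAcl(dns=True, crl_http=False), False),
]:
    print(name, "->", crl_outcome(acl, check))
```

Only the original combination (DNS allowed, TCP/80 blocked) ends in the stall; 1200 seconds is 20 minutes, which matches the long reconnect times reported earlier in the thread.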

The solution we will recommend to the client is to remove the opening for DNS traffic, since it should not be needed by any clients, and it fixes the issue.
I'm still puzzled as to how this is supposed to work with CRL check and machine authentication.
I found this nice graphic (below) in https://community.cisco.com/t5/network-access-control/ise-deployment-eap-tls-machine-or-user-certificates-native/m-p/4095197/highlight/true#M560791
and it shows that clients should not expect DHCP/DNS to work before machine authentication (802.1X) has taken place.
So why is Windows even trying to use DHCP/DNS before 802.1X has succeeded?

jyla_3-1704359440161.png