06-06-2023 02:50 AM
Hi Guys,
I have some reports from clients that they occasionally lose connectivity during authentication to an EAP-TLS enabled SSID (that is using AAA override to steer them into the correct VLAN / FlexConnect setup).
I've tried checking the live logs on ISE, and I can see that these issues can usually be traced to authentication failures which say
" 11514 Unexpectedly received empty TLS message; treating as a rejection by the client "
"Ensure that the client's supplicant does not have any known compatibility issues and that it is properly configured. Also ensure that the ISE server certificate is trusted by the client, by configuring the supplicant with the CA certificate that signed the ISE server certificate. It is strongly recommended to not disable the server certificate validation on the client!"
I can also see that the username is just "USERNAME" and not the client certificate name as it should be.
Does anyone have experience with what could be causing this?
06-06-2023 03:48 PM
- What model of WLC?
- What version of software?
- What model of AP?
- What OS is client and what network driver and version?
Refer to TAC recommended code versions below.
Check WLC config with https://cway.cisco.com/wireless-config-analyzer/ - use output of "show tech wireless" if 9800
Make sure client OS and network driver are fully up to date
06-07-2023 12:12 AM - edited 06-07-2023 12:15 AM
Hi Rich
- 5520 WLC, version 8.10.183.0
- We have a mixed deployment at that particular location (large vehicle manufacturing hall) consisting mostly of 9115, 2700, 9130 and some 2800 APs (all together 190 in that physical location - production hall)
- OS and network drivers are a sensitive topic, since the clients are mostly proprietary manufacturing devices used to flash software to cars. The end devices are based on some version of Linux and use a Linux supplicant (wpa_supplicant), but since they are not managed by us, it is hard to get any relevant info about drivers or basically anything.
I used the config analyzer a couple of times and nothing significant popped up, but I'll try to run it again and see.
Now I'm trying to run some client debugs on the WLC to see what is happening, because from ISE's perspective it seems like it sends out the server cert chain in RADIUS Access-Challenge messages, but the last fragment of the cert is only acked by the supplicant (or perhaps the authenticator, hard to say without proper debugs) in a RADIUS Access-Request after 30 sec, and most likely that message contains the empty TLS message. So at least from this perspective it seems like it could be an issue with the supplicant on the client, but I want to see exactly what is happening on the authenticator (WLC) side.
But it is really hard to capture, since these problems are intermittent, and are happening only from time to time.
I was also thinking that this could be a problem of not using 802.11r and slow roaming (the one AAA override SSID we're using is shared by many proprietary, and some old, clients that do not support 802.11r, hence we have it disabled).
So when the cars are moving around the facility and the clients do not finish the EAP-TLS auth in time (because they're roaming too fast), their supplicant might just get stuck or confused.
So my other line of thought was to create a new 802.11r enabled SSID and test on that (but I need to run a WLAN profiler to see if those clients even support 802.11r).
06-10-2023 10:21 AM
> since the end devices are not managed by us, it is hard to get any relevant info about drivers or basically anything
So maybe it would be better to have individual SSIDs with PSK and static VLAN assignment to keep it as simple as possible?
No harm experimenting with features like 11r but that can also cause problems for some clients.
06-10-2023 09:34 PM
Can you make sure the certificate used for EAP on ISE is not expired, and that on the client (supplicant) side the CA that signed the ISE certificate is selected under "validate server certificate"?
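For the expiry part of that check, here is a rough Python sketch. It assumes you have exported the ISE EAP certificate to a PEM file and read its end date with `openssl x509 -noout -enddate`, which prints a line such as `notAfter=Jun 16 00:00:00 2023 GMT`. The function name `days_until_expiry` and the sample dates are made up for illustration; only `ssl.cert_time_to_seconds` is a real standard-library call.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return whole days left before a certificate's notAfter time.

    not_after is the date text printed by `openssl x509 -noout -enddate`,
    with or without the leading "notAfter=" prefix. A negative result
    means the certificate has already expired.
    """
    text = not_after.strip()
    if text.startswith("notAfter="):
        text = text[len("notAfter="):]
    # cert_time_to_seconds expects the "Mon DD HH:MM:SS YYYY GMT" format
    expires = ssl.cert_time_to_seconds(text)
    if now is None:
        now = time.time()
    return int((expires - now) // 86400)


if __name__ == "__main__":
    # Hypothetical end date, not taken from any real ISE deployment
    print(days_until_expiry("notAfter=Jun 16 00:00:00 2023 GMT"))
```

This only covers expiry. Whether the supplicant actually trusts the issuing CA still has to be checked in the client's own configuration.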
06-12-2023 12:48 AM - edited 06-12-2023 12:50 AM
I ran some debugs and I can see the WLC receiving RADIUS messages containing the cert and then forwarding it to the client via EAPOL.
All EAPOL messages are received by the client (as I can see in the debugs from the client) and it is able to verify the server-side cert. But the problem occurs after the server-side cert verification, when the client should be sending its own certificate.
these are the logs from the wpa supplicant running on that device:
/var/opt/dsa/log/logs-today/wland.20.gz-r 230608 140934.499745 wland 1157 [wpasupp.cpp : 363] <3>CTRL-EVENT-EAP-STATUS status='remote certificate verification' parameter='success'
/var/opt/dsa/log/logs-today/wland.20.gz-I 230608 140934.499786 wland 1157 [wpasupp.cpp : 129] WPA_EV_MONITOR 0
/var/opt/dsa/log/logs-today/wland.20.gz-T 230608 140934.499822 wland 1157 [wland.cpp :5844] 7| <3>CTRL-EVENT-EAP-STATUS status='remote certificate verification' parameter='success'
/var/opt/dsa/log/logs-today/wland.20.gz-I 230608 140934.500233 wland 1157 [wpasupp.cpp : 129] WPA_EV_MONITOR 0
/var/opt/dsa/log/logs-today/wland.20.gz:T 230608 140934.500270 wland 1157 [wland.cpp :5844] 7| <3>CTRL-EVENT-EAP-STATUS status='local TLS alert' parameter='decrypt error'
It says "decrypt error", and I'm thinking that this is most likely some issue with the client itself, because from the infrastructure point of view everything is okay. The client is just having some trouble sending its cert.
I could also see some strange behaviour in the WLAN driver of the client, where it constantly switches between regulatory domain 00 (generic world) and SK (our country code), which does not seem like the way a WLAN driver should behave.
But as always, when something doesn't work it's a network issue.
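Since these failures are intermittent, it may help to scan the client logs for TLS alerts automatically. Below is a minimal Python sketch that pulls `CTRL-EVENT-EAP-STATUS` events out of log lines shaped like the ones above. The function names are made up for illustration, and the timestamp layout (`YYMMDD HHMMSS.microseconds`) is my assumption from the sample lines, not something documented for this device.

```python
import re
from datetime import datetime

# Assumed timestamp layout: "230608 140934.499745" = 2023-06-08 14:09:34.499745
TIMESTAMP_RE = re.compile(r"(\d{6}) (\d{6}\.\d+)")
EAP_STATUS_RE = re.compile(
    r"CTRL-EVENT-EAP-STATUS status='([^']*)' parameter='([^']*)'"
)

# Two lines copied from the logs posted above
SAMPLE = [
    "/var/opt/dsa/log/logs-today/wland.20.gz-r 230608 140934.499745 wland 1157 "
    "[wpasupp.cpp : 363] <3>CTRL-EVENT-EAP-STATUS "
    "status='remote certificate verification' parameter='success'",
    "/var/opt/dsa/log/logs-today/wland.20.gz:T 230608 140934.500270 wland 1157 "
    "[wland.cpp :5844] 7| <3>CTRL-EVENT-EAP-STATUS "
    "status='local TLS alert' parameter='decrypt error'",
]


def parse_eap_status(lines):
    """Return (timestamp, status, parameter) for every EAP status line."""
    events = []
    for line in lines:
        status = EAP_STATUS_RE.search(line)
        if status is None:
            continue  # not an EAP status event
        stamp = TIMESTAMP_RE.search(line)
        when = None
        if stamp:
            when = datetime.strptime(" ".join(stamp.groups()), "%y%m%d %H%M%S.%f")
        events.append((when, status.group(1), status.group(2)))
    return events


def tls_alerts(events):
    """Keep only the events whose status mentions a TLS alert."""
    return [event for event in events if "TLS alert" in event[1]]


if __name__ == "__main__":
    for when, status, parameter in tls_alerts(parse_eap_status(SAMPLE)):
        print(when, status, parameter)
```

Note that the same event can be logged twice (once from wpasupp.cpp and once from wland.cpp), so counts from this sketch may be doubled unless you de-duplicate.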
11-17-2024 08:42 PM
Did anyone get a resolution to this?
We have been having all sorts of random wifi drops since we upgraded ISE to version 3.3 all related to EAP-Key exchange timeout issues.
What we can establish so far is the following -
- Machine bypassing ISE and using a WPA key on Wi-Fi 6 (AX): works fine.
- Machine pointing to ISE on Wi-Fi 6 (AX): fails authentication and/or randomly drops out.
- Machine with AX disabled on the Wi-Fi NIC and moved to AC only: connects to ISE just fine with no key exchange timeout issues.
We disabled AX on the WLC globally and still observed the issues until we disabled AX on the NICs themselves. These are the Intel adapters which others have reported as having major issues (e.g. AX201, AX211), and yes, we did put them on the latest 23.90 drivers at the time and they were still having issues until AX was disabled on the adapters.
11-18-2024 03:39 PM
What model of controller and what version of software @omehmetoglu ?
11-18-2024 06:44 PM
We are running the 5520 on code 8.10.183.0, we did also trial it on 8.10.190.0 and still the same issue.
We never had this kind of instability when we were on ISE 3.2. We upgraded to 3.3 in July, and about 1.5 weeks later we saw a major increase in EAP-Key exchange timeout issues. We are seeing this across multiple devices where previously it was working fine. We have had this wireless environment for about 5-6 years now and it has been stable.
11-18-2024 11:28 PM
Well TAC will be reluctant to offer any support if you are not running 8.10.196.0 for a start ...
> we upgraded to 3.3 in July and about 1.5 weeks later we have seen a major increase in EAP-Key exchange timeout issues
Hmmm that doesn't stack up as a simple software issue unless it's a creeping death situation. If it is caused by ISE and time related you might need to be restarting ISE regularly. Either way you probably need to work it through with TAC but look for other possible causes that match that time line better. What else happened at that time - OS updates, certificate update etc?
11-18-2024 11:49 PM
We are actually upgrading the WLC to 8.10.196.0 tonight, but we have also been on 8.10.190.0 (since March) and have seen the problem on this code as well once we went to ISE 3.3 starting in July. I know for a fact that when we were on code 8.10.190.0 before the ISE upgrade we never had this problem, so I know it's not a WLC version issue.
We have not had any changes other than ISE going to version 3.3. More recently we moved to 3.3 Patch 4: we had seen performance issues with ISE since upgrading to 3.2 Patch 6, and a bug was identified related to heap dump space issues.
This was only applied 1 week ago, so we need to see whether it has resolved the EAP timeout issues; at this stage we cannot confirm it.
We are at a point where we may even rebuild ISE in AWS back on 3.2 just to test the theory. We have also had ongoing tickets with Cisco TAC for both items (ISE since March, WLC Wi-Fi dropouts since July); many TAC resources have investigated to date and no one has been able to figure it out yet.
11-19-2024 12:03 AM
M.