03-16-2023 04:41 AM
Hello Team,
We have a Cisco 5520 WLC, which we upgraded to image version 8.10.183.0.
After the upgrade, a few regions started reporting connection problems. On investigation we found that users are stuck in the 8021X_REQD state.
When we checked the logs in ISE, we found errors like 'supplicant stopped responding to ISE'.
We have already checked with Cisco TAC on both the wireless and ISE sides, but they have no findings.
Has anyone seen similar issues?
03-16-2023 05:46 AM
- Below is the output of your attached debug file when processed with https://cway.cisco.com/wireless-debug-analyzer/ (I used the Show All flag). I would look into things like disabling fast roaming settings on the WLAN, if applicable, and updating the client Wi-Fi (NIC) drivers if they are not the latest:
Mar 16 19:35:51.343 | *Dot1x_NW_MsgTask_3 | WLC/AP is sending EAP-Identity-Request to the client | |
Mar 16 19:35:51.382 | *Dot1x_NW_MsgTask_3 | Client sent EAP-Identity-Response to WLC/AP | |
Mar 16 19:35:51.382 | *aaaQueueReader | Radius request with ID 150 sent to 172.28.139.138. | |
Mar 16 19:35:51.384 | *radiusTransportThread | Radius request with ID 150 sent to 172.28.139.138. | |
Mar 16 19:35:51.427 | *aaaQueueReader | Radius request with ID 151 sent to 172.28.139.138. | |
Mar 16 19:35:51.434 | *radiusTransportThread | Radius request with ID 151 sent to 172.28.139.138. | |
Mar 16 19:35:51.502 | *aaaQueueReader | Radius request with ID 152 sent to 172.28.139.138. | |
Mar 16 19:35:51.503 | *radiusTransportThread | Radius request with ID 152 sent to 172.28.139.138. | |
Mar 16 19:35:51.563 | *aaaQueueReader | Radius request with ID 153 sent to 172.28.139.138. | |
Mar 16 19:35:51.564 | *radiusTransportThread | Radius request with ID 153 sent to 172.28.139.138. | |
Mar 16 19:35:51.623 | *aaaQueueReader | Radius request with ID 154 sent to 172.28.139.138. | |
Mar 16 19:35:51.624 | *radiusTransportThread | Radius request with ID 154 sent to 172.28.139.138. | |
Mar 16 19:35:51.686 | *aaaQueueReader | Radius request with ID 155 sent to 172.28.139.138. | |
Mar 16 19:35:51.688 | *radiusTransportThread | Radius request with ID 155 sent to 172.28.139.138. | |
Mar 16 19:35:51.731 | *aaaQueueReader | Radius request with ID 156 sent to 172.28.139.138. | |
Mar 16 19:35:51.733 | *radiusTransportThread | Radius request with ID 156 sent to 172.28.139.138. | |
Mar 16 19:35:51.864 | *aaaQueueReader | Radius request with ID 157 sent to 172.28.139.138. | |
Mar 16 19:35:51.865 | *radiusTransportThread | Radius request with ID 157 sent to 172.28.139.138. | |
Mar 16 19:35:51.910 | *aaaQueueReader | Radius request with ID 158 sent to 172.28.139.138. | |
Mar 16 19:35:51.912 | *radiusTransportThread | Radius request with ID 158 sent to 172.28.139.138. | |
Mar 16 19:35:52.041 | *aaaQueueReader | Radius request with ID 159 sent to 172.28.139.138. | |
Mar 16 19:35:52.042 | *radiusTransportThread | Radius request with ID 159 sent to 172.28.139.138. | |
Mar 16 19:35:52.102 | *aaaQueueReader | Radius request with ID 160 sent to 172.28.139.138. | |
Mar 16 19:35:52.106 | *radiusTransportThread | Radius request with ID 160 sent to 172.28.139.138. | |
Mar 16 19:35:52.153 | *aaaQueueReader | Radius request with ID 161 sent to 172.28.139.138. | |
Mar 16 19:35:52.163 | *Dot1x_NW_MsgTask_3 | RADIUS Server permitted access | |
Mar 16 19:35:52.163 | *Dot1x_NW_MsgTask_3 | Client will be required to Reauthenticate in 43000 seconds | |
Mar 16 19:35:52.163 | *Dot1x_NW_MsgTask_3 | 4-Way PTK Handshake, Sending M1 | |
Mar 16 19:35:52.217 | *Dot1x_NW_MsgTask_3 | 4-Way PTK Handshake, Received M2 | |
Mar 16 19:35:52.217 | *Dot1x_NW_MsgTask_3 | 4-Way PTK Handshake, Sending M3 | |
Mar 16 19:35:52.269 | *Dot1x_NW_MsgTask_3 | 4-Way PTK Handshake, Received M4 | |
Mar 16 19:35:52.269 | *Dot1x_NW_MsgTask_3 | Client has completed PSK Dot1x or WEP authentication phase | |
Mar 16 19:35:52.269 | *Dot1x_NW_MsgTask_3 | Client has entered DHCP Required state | |
Mar 16 19:35:54.552 | *emWeb | Client delete code: Multiple triggers That can be due to possible reasons: Received a CCX RM request from a client with CCX version lower than 2/ Radius server sent a disconnect request (RFC3576, etc)/ On some scenarios of client blacklist (administrator request)/ For HTTP profiling scenarios, after a vlan change, so policies can be reapplied, or when received policies have a different session timeout, from the client session timeout/ WLAN is deleted or disabledIn PMIPv6, MAG notified to delete the client/ Administrator request a client delete by CLI/GUI | |
Mar 16 19:35:54.552 | *emWeb | Client expiration timer code set for 1 seconds. The reason: Dissasociation or deauthentication received from client, this is valid on 802.11w scenario. Also, generic termination clause, reason would be provided by pervious log message | |
Mar 16 19:35:55.398 | *apfReceiveTask | Client session has timed out | |
Mar 16 19:35:55.398 | *apfReceiveTask | Client disassociation event has occured. Possible reasons may be due to AP Radio Reset usually due to channel change or wlan was manually disabled or Client unable to get valid DHCP IP for WLAN using DHCP required | |
Mar 16 19:35:55.398 | *apfReceiveTask | Client has been deauthenticated | |
Mar 16 19:35:55.398 | *apfReceiveTask | Client session has timed out | |
Connection attempt #1 | |||
Mar 16 19:35:58.490 | *apfMsConnTask_0 | Client roamed to AP/BSSID BSSID 24:36:da:13:db:f6 AP CN-07928ap-04 | |
Mar 16 19:35:58.490 | *apfMsConnTask_0 | The WLC/AP has found from client association request Information Element that claims PMKID Caching support | |
Mar 16 19:35:58.490 | *apfMsConnTask_0 | The Reassociation Request from the client comes with 1 PMKID | |
Mar 16 19:35:58.490 | *apfMsConnTask_0 | WLC cannot find a valid PMKID to match the one provided by the client. However, if the client performs OKC and not SKC, the WLC computes a new PMKID based on the information gathered (the cached PMK, the client MAC address, and the new AP MAC address) | |
Mar 16 19:35:58.490 | *apfMsConnTask_0 | Client is entering the 802.1x or PSK Authentication state | |
Mar 16 19:35:58.490 | *apfMsConnTask_0 | Client has successfully cleared AP association phase | |
Mar 16 19:35:58.490 | *apfMsConnTask_0 | WLC/AP is sending an Association Response to the client with status code 0 = Successful association | |
Mar 16 19:35:58.526 | *Dot1x_NW_MsgTask_3 | Client will be required to Reauthenticate in 43000 seconds | |
Mar 16 19:35:58.526 | *Dot1x_NW_MsgTask_3 | WLC/AP is sending EAP-Identity-Request to the client |
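For anyone who wants to skim a long analyzer dump like the one above, here is a minimal Python sketch that pulls the pipe-separated rows apart, lists the distinct RADIUS request IDs, and measures the time from the client's EAP-Identity-Response to "RADIUS Server permitted access". The function names `parse_rows` and `radius_summary` are my own, the row layout is only assumed from the paste above, and the year is a guess because the analyzer rows carry no year. The sample is a handful of rows copied from the output in this post.

```python
import re
from datetime import datetime

# Row layout assumed from the pasted output: "Mon DD HH:MM:SS.mmm | *task | message | |"
ROW = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) \| \*(\S+) \| (.*?)[\s|]*$")

def parse_rows(text, year=2023):
    """Return (timestamp, task, message) tuples; the year is an assumption."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headings and wrapped fragments
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S.%f")
        rows.append((stamp, match.group(2), match.group(3)))
    return rows

def radius_summary(rows):
    """Return (sorted distinct RADIUS IDs, seconds from EAP identity response to permit)."""
    ids = set()
    start = end = None
    for stamp, _task, message in rows:
        found = re.search(r"Radius request with ID (\d+)", message)
        if found:
            ids.add(int(found.group(1)))
        if start is None and "EAP-Identity-Response" in message:
            start = stamp
        if end is None and "RADIUS Server permitted access" in message:
            end = stamp
    elapsed = (end - start).total_seconds() if start and end else None
    return sorted(ids), elapsed

sample = """\
Mar 16 19:35:51.382 | *Dot1x_NW_MsgTask_3 | Client sent EAP-Identity-Response to WLC/AP | |
Mar 16 19:35:51.382 | *aaaQueueReader | Radius request with ID 150 sent to 172.28.139.138. | |
Mar 16 19:35:51.384 | *radiusTransportThread | Radius request with ID 150 sent to 172.28.139.138. | |
Mar 16 19:35:52.153 | *aaaQueueReader | Radius request with ID 161 sent to 172.28.139.138. | |
Mar 16 19:35:52.163 | *Dot1x_NW_MsgTask_3 | RADIUS Server permitted access | |
"""

ids, elapsed = radius_summary(parse_rows(sample))
print(ids, elapsed)
```

An attempt where `elapsed` comes back as `None` (identity response seen, but no permit) is the kind of session worth comparing against the ISE or NPS logs.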
03-16-2023 07:41 AM
Just to add, you should always run a diff between the old configuration and the post-upgrade configuration. This will show you what might have been added, disabled, or set back to default. Hopefully you have a backup config that you can diff against.
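If you don't have a diff tool handy, a minimal Python sketch with the standard `difflib` module does the job. The function name `config_diff` and the two sample configs are made up for illustration and are not real WLC output; in practice you would read the text of your pre-upgrade backup and the post-upgrade export from files.

```python
import difflib

def config_diff(before_text, after_text):
    """Return unified-diff lines between two config dumps, ignoring trailing whitespace."""
    before = [line.rstrip() for line in before_text.splitlines()]
    after = [line.rstrip() for line in after_text.splitlines()]
    return list(difflib.unified_diff(
        before, after,
        fromfile="pre-upgrade", tofile="post-upgrade",
        lineterm="",
    ))

# Made-up sample lines for illustration only.
before = "config wlan security ft disable 1\nconfig wlan session-timeout 1 43000\n"
after = "config wlan security ft enable 1\nconfig wlan session-timeout 1 43000\n"

for line in config_diff(before, after):
    print(line)
```

Lines starting with `-` exist only in the old config and lines starting with `+` only in the new one, so anything the upgrade reset to default shows up as a `-`/`+` pair.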
03-16-2023 09:57 AM - edited 03-16-2023 09:59 AM
There is also a bug that @Leo Laohoo pointed out to me, https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwe07802 , which is fixed in the next maintenance release, due out in the next week or two. Ask TAC about that. (What AP model are you seeing this on?)
And of course, as others said, make sure your Wi-Fi drivers are updated to the LATEST version. I say that because people quite often say "my drivers are up to date" (because Windows Update hasn't offered anything new) when the driver they're using is 2 years older than the one on the Intel web site. So if it's Intel, look at https://www.intel.com/content/www/us/en/download/19351/windows-10-and-windows-11-wi-fi-drivers-for-intel-wireless-adapters.html for example. (The earlier versions of those drivers are riddled with bugs.)
03-16-2023 10:13 AM
I think this is the bug that is affecting us. The issue is intermittent, occurs randomly, and points to EAP authentication.
Let me work with TAC on the next fix release.
03-16-2023 10:18 AM
TAC should be able to give you a copy of the latest beta if you're willing to test it.
03-16-2023 11:32 AM
I have this bug too. It seems to affect all devices, but not consistently. The issue first occurred over the weekend.
03-16-2023 11:59 AM
My question would be: did you run into this issue because you upgraded, or did you only now notice that users were having issues? There will always be bugs, and the biggest takeaway is that if you upgrade and users tell you weeks or months later that wireless sucks, you should revert. Users tend to find their own fixes, or let's say workarounds, until the problem becomes a pain in their rear ends. I have done so many upgrades with testing, and you will always run into one upgrade that bites you in the butt. The best approach is not to wait for a fix and then upgrade, only to find that it's still broken or that another issue appears; revert and do further testing. At the end of the day, you can't blame the vendor for a bug, because management will always look at the person or team that made the change.
03-16-2023 12:03 PM
We had been on 8.10.170 since it came out and ran into this issue on Monday (hmm, DST happened Sunday). We upgraded to 183 on TAC's advice, with no change. My debugs look exactly the same as the OP's, and the bug was logged this week. I have a 3504 controller with users hitting the same NPS server policy with no issues, but it runs 8.5. Why it would run fine for about a year and then fail like this, I don't know, but at this point it must be a WLC bug.
03-16-2023 12:28 PM
Things don't just break. You need to look at patches on the Windows devices, which can also break things. A NIC firmware upgrade can also introduce issues. So you have to go back a month or so, see what was pushed, and try to isolate the issue. New devices can also make it look like something just broke, when in fact a bunch of users just got their laptops refreshed. It's best to gather data on the devices through a device management system that can help you correlate NIC models and firmware with patches to see what might have caused the issue. In any case, take the time to reboot the controller or fail over to another controller to see if the issue goes away. Even though the controller seems okay, it just might not be. I have seen that too many times, just like folks who never shut down their laptops until eventually they are slow, have issues connecting, etc.
03-28-2023 09:41 AM
The root cause was Azure fragmenting and delivering packets out of order from the NPS server. We needed to get Azure to enable UDP Fragment reordering as this behavior is by design.
03-28-2023 10:14 AM
I ran into this too a few months back. Keep in mind that an Azure engineer will enable this on an Azure virtual network for the subscription. If you have multiple resource groups and need this feature, you will need to ask them to enable this flag. If you create a new virtual network gateway, you will need to open a ticket to have them enable the flag.
I saw the issue with ISE in Azure only with EAP-TLS, and the fragmentation showed up in an OTA capture.
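To make the fragmentation point concrete, here is a rough Python sketch of the arithmetic: how many IPv4 fragments a single RADIUS/UDP datagram is split into for a given MTU. The function name `ipv4_fragments` is my own, and the sketch assumes plain IPv4 with no IP options and a 1500-byte MTU by default; tunnel or cloud overhead would lower the usable MTU. The 4096-byte figure is the maximum RADIUS packet length from RFC 2865.

```python
import math

IPV4_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8     # bytes
RADIUS_MAX = 4096  # maximum RADIUS packet length per RFC 2865

def ipv4_fragments(udp_payload_len, mtu=1500):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((udp_payload_len + UDP_HEADER) / per_fragment)

print(ipv4_fragments(1400))        # 1: fits in a single packet
print(ipv4_fragments(RADIUS_MAX))  # 3: arrives as three fragments
```

If any of those fragments is dropped or mishandled in transit, the whole RADIUS packet is lost, which would look like the client or server simply not responding.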
03-17-2023 12:04 AM
Please look for clients where the OS and/or drivers have been upgraded, like @Scott Fella said. If something has been working consistently over the last few months and failures have appeared on all clients with a particular set of specifications (Intel in this case), look for the problem on that side.
I'd recommend subscribing to the Intel communities, where you can post the errors and work with Intel engineers on tracking down the issue and possibly fixing it. In parallel, other wNIC vendors such as Realtek and Mediatek do have known connectivity and performance issues under Windows, so always look for the most up-to-date driver in the Microsoft Update Catalog. There are some scripts that do this for you for drivers only; search for them in Google.
03-17-2023 07:19 AM
I'd entertain assuming it was just Intel if it wasn't the same behavior on Apple and Android devices.
03-17-2023 05:57 PM
We are currently investigating an Intel-related wireless NIC driver issue where the NIC drops its association if the SSID is configured for WPA2 Enterprise. Dropouts also occur with PSK, but not as frequently as with WPA2 Enterprise.
The matter was first observed when a large fleet of Chromebooks (CB) started having irregular dropouts. We raised the issue with Google, and Google tapped Intel. Intel confirmed an issue with the NIC drivers.
According to Google, the issue is caused by GTK regeneration, which the driver is unable to handle.
We suspect all drivers up to 22.150.3 are affected.