802.1X EAP-TLS roaming issue on 91xx APs with AireOS 8.10.183

mqontt
Level 1

Hey Guys,

I have noticed a strange issue with some clients while roaming (slow roaming, since the clients do not support FT).

They are connected to our AAA override SSID using dot1x (EAP-TLS).

Sometimes when they roam, the client loses connectivity and stays offline for a few minutes.

I've done some troubleshooting and captured some pcaps (from DNAC Intelligent Capture), and I see a common denominator when this happens.

1. The client decides to move to another AP.

2. The open auth exchange is fine.

3. The reassociation request / response is okay.

4. The client moves on to dot1x auth.

5. EAP frames are exchanged (certificate exchange).

6. Here comes the strange thing: the AP starts the EAPOL 4-way handshake before it sends the final EAP-Success message. So the AP sends EAPOL-Key M1 first, and only after that sends EAP-Success.

7. The client gets confused. In its debug output I can see the message "EAP: EAP-Success Id mismatch - reqId=12 lastId=-1". The client then discards the EAP exchange, deauthenticates itself from the BSS, and marks the BSS as "blacklisted" because of the failed EAP.
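The ordering problem in steps 6 and 7 can be checked mechanically once a capture has been decoded into a list of labeled frames. Below is a minimal sketch of such a check. The labels `eap_success` and `eapol_key_m1`, the `Frame` tuple layout, and the function name `success_after_m1` are all made up for illustration; they are not output from Intelligent Capture or Wireshark, and decoding the pcap into this form is left out.

```python
from typing import List, Optional, Tuple

# Hypothetical labels: a real capture has to be decoded into these first.
EAP_SUCCESS = "eap_success"
EAPOL_KEY_M1 = "eapol_key_m1"

Frame = Tuple[float, str]  # (timestamp in seconds, label)


def success_after_m1(frames: List[Frame]) -> Optional[float]:
    """Return how many seconds EAP-Success trails the first M1.

    Returns None when EAP-Success comes before M1 (the expected order)
    or when no EAP-Success is present at all.
    """
    first_m1 = None
    for ts, label in sorted(frames):  # order by timestamp
        if label == EAPOL_KEY_M1 and first_m1 is None:
            first_m1 = ts
        elif label == EAP_SUCCESS:
            # Only the first EAP-Success matters for this check.
            return None if first_m1 is None else ts - first_m1
    return None


good_roam = [(0.00, "eap_request"), (0.10, EAP_SUCCESS), (0.20, EAPOL_KEY_M1)]
bad_roam = [(0.00, "eap_request"), (0.10, EAPOL_KEY_M1), (0.20, EAP_SUCCESS)]

print(success_after_m1(good_roam))             # None: expected order
print(success_after_m1(bad_roam) is not None)  # True: Success arrived after M1
```

Running this over every roam in a capture would show whether the failed roams are exactly the ones where EAP-Success trails M1, which is the correlation described above.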

 

I noticed that the client then jumps to another BSS and only completes a successful roam when the AP sends EAP-Success as it should, i.e. before the 4-way handshake over EAPOL.

See the screenshots from the Wireshark capture. (As I said, it is from DNAC Intelligent Capture, so I'm not sure how relevant it is compared to a true OTA capture, which I can't really do because of how intermittent the issue is and how complex the environment is.)

 

Thanks for any help, because I've been scratching my head over this one for quite some time.

I wonder if I might be running into one of these?

https://bst.cisco.com/bugsearch/bug/CSCwe11747
https://bst.cisco.com/bugsearch/bug/CSCwe75100

6 Replies

Scott Fella
Hall of Fame

Well, let's start with some basic items. Are you able to replicate the issue, and can you identify whether all devices or only certain devices have it? You need to try to rule out device types, then compare devices of the same type to see what is working and what is not. Do certain laptops work fine, or maybe mobile devices? Keep in mind that when you have devices that only really need to connect to one SSID, you should push a policy (in GPO, for example) so that your other SSIDs are not shown as visible. Basically, you are filtering out the 'other' SSIDs. I have seen issues where a device all of a sudden tries to connect to a different SSID and then no longer connects, or takes a long time to connect, to a dot1x SSID. Also identify whether the issue is related to a particular area of the building; that can also help with your investigation.

-Scott
*** Please rate helpful posts ***

Hey Scott,

Thanks!

Currently the only complaints are about this specific type of STA. It's a little complicated, since these are not standard laptops, mobile phones, or tablets. It's a manufacturing facility, and these are special work tools that communicate over 802.11.

They already tried swapping the wireless radio chip on that particular STA for a different one, but the problem still occurs. It usually happens at the same physical location, but it is intermittent (a few times a week). I've already tried to see if there are any similar STAs around that I could compare these roams to.

Anyway, I just don't get why the AP sends the EAP-Success frame after M1 of the 4-way handshake (it might be an artifact of the Intelligent Capture pcap, but I can't be sure). I can see it in the packet capture from the AP and also in the logs from the client.

If it is sent in the correct order (EAP-Success, then the EAPOL frames of the 4-way handshake), everything is fine: the STA does not complain about an incorrect ID in the EAP frame, and authentication works.

If it arrives after the 4-way handshake has already started, it has an incorrect ID (at least that's what the STA says in its logs) and the whole EAP process is torn down by the station.
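The `lastId=-1` in the client log suggests the supplicant had no outstanding request Id by the time EAP-Success showed up. A rough model of that check, loosely following the peer state machine in RFC 4137 (which accepts Success only when `reqId == lastId` and initializes `lastId` to NONE), is sketched below. The class and method names are hypothetical, `-1` standing in for NONE is an assumption based on the log line, and this is not the STA's actual code. Why the state was reset before Success arrived is not established in this thread.

```python
NONE_ID = -1  # assumption: stands in for RFC 4137's NONE value of lastId


class EapPeerModel:
    """Toy model of the Id bookkeeping on the EAP peer (supplicant) side."""

    def __init__(self) -> None:
        self.last_id = NONE_ID

    def on_request(self, req_id: int) -> None:
        # After answering a Request, the peer remembers its Identifier.
        self.last_id = req_id

    def reset(self) -> None:
        # Models the state machine being re-initialized mid-roam.
        self.last_id = NONE_ID

    def on_success(self, req_id: int) -> bool:
        # Success is accepted only if it matches the last answered Request.
        return req_id == self.last_id


peer = EapPeerModel()
peer.on_request(12)
in_order = peer.on_success(12)   # Success right after the last Request

peer.reset()                     # something resets the peer before Success
late = peer.on_success(12)       # the same Success, now arriving too late

print(in_order, late)            # True False
```

In this model the late Success carries a perfectly valid Identifier; it is rejected only because the peer no longer remembers the request it belongs to, which would match the "reqId=12 lastId=-1" wording.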

 

Are these APs 911x or 912x?

Yep.

Well, I saw it happen only on the 9115i and 9130i. We do not have any 9120s at this particular location.

The other half of our APs are 2702i units that are to be replaced soon (and nobody has complained in the physical space where the Aironet APs are deployed).

 

Rich R
VIP

Both of those bugs are actually duplicates of others:
https://bst.cisco.com/bugsearch/bug/CSCwd59921
https://bst.cisco.com/bugsearch/bug/CSCwb82694
Neither of them sounds to me like the same problem you've observed but the bug descriptions sometimes don't list all the effects of the bug so it's possible you're encountering one of those.  But there are plenty of other bug fixes since 8.10.183.0.  
I'd recommend upgrading to 8.10.190.0 as a starting point to eliminate all those bugs which have already been fixed.  If you still see the problem after that then it might be an unresolved bug but be aware that you are very unlikely to get a fix for it in that case as AireOS 8.10 is now "end-of-life":
https://www.cisco.com/c/en/us/products/collateral/wireless/8500-series-wireless-controllers/wireless-software-8-10-pb.html
If that's the case then your only hope for getting it fixed will be to migrate to 9800 series WLC.

Yeah, I'm pushing to do the upgrade ASAP.

I have a 9800 ready, but I'm still waiting to migrate this site over to the IOS XE controller. Hopefully it will fix some of the issues we are currently experiencing.

Will definitely come back to this topic after it's done, thanks!
