06-18-2024 04:25 AM
Hey Guys,
I'm troubleshooting a client roaming issue in our environment and I need some help.
Environment - 9800-40 WLC in HA, version 17.9.3, 9115 + 9130i APs, FlexConnect, CentralAuth, AAA override SSID, EAP-TLS
The issue is that the client's roams are often incomplete (as can be seen in DNAC). The client bounces between APs for quite some time before completing the roaming process, and a roam can take up to a few minutes. The client usually roams between 2 APs that are in the same Site Tag. The Site Tag has PMK propagation enabled, and I can see on both the AP and the WLC that a PMKID for the client's MAC is cached.
I did OTA captures on 4 channels to catch what is actually happening in the air, and I noticed some strange behavior that I'll try to explain step by step:
1. The STA decides to roam because it (don't ask me why) runs periodic off-channel scans for new APs at a set interval, and if it finds an AP whose RSSI is more than 5 dB better, it just roams (even if it has a perfectly good RSSI/SNR). But okay.
2. The STA sends probes, receives a probe response from the AP with the better RSSI, and starts the reassociation process.
3. The STA completes open authentication with the AP it wants to roam to.
4. The STA sends a Reassociation Request with a PMKID inside the RSN information element.
5. The AP replies with a Reassociation Response.
6. The STA and AP proceed with the 4-way handshake using EAPOL.
7. PTK/GTK are exchanged and the STA starts to send encrypted data.
8. Here comes the funny part: the AP does not send any data from the distribution system to the STA (it should have had some cached data from the previous AP); instead it sends an EAP identity request frame.
9. The STA ignores the EAP identity request (because the 4-way handshake has already completed).
10. The AP sends two more EAP identity requests and then the STA gets deauthenticated.
11. The STA leaves the BSS (sends a deauth frame) and starts to probe for new APs again.
I can see in the RA traces that during this roam the AP complains about the PMKID:
2024/06/12 10:50:57.855260666 {wncd_x_R0-0}{1}: [dot11-validate] [21176]: (ERR): MAC: STA-macadd Failed to Dot11 validate dot11i pmkids. No matching pmkid for the pmk available in cache
When I check on the AP (show flexconnect pmk) I can see that the STA's MAC address is present.
If the AP can't validate the PMKID, why does it even continue with the 4-way handshake? And why does it send an EAP identity request after the 4-way handshake has completed?
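To see how often this validation failure occurs across roams, the RA trace can be filtered for it. A minimal Python sketch, assuming every trace line follows the same layout as the single line quoted above (the regex is modeled on that one line only, and `pmkid_failures` is a hypothetical helper name):

```python
import re

# Layout modeled on the one RA trace line quoted in this post (assumption):
# date time {process}{n}: [module] [id]: (LEVEL): message
TRACE_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"\{(?P<process>[^}]+)\}\{\d+\}: \[(?P<module>[^\]]+)\] \[\d+\]: "
    r"\((?P<level>\w+)\): (?P<message>.*)$"
)


def pmkid_failures(lines):
    """Return (date, time, message) for each dot11-validate line mentioning a PMKID."""
    hits = []
    for line in lines:
        m = TRACE_RE.match(line.strip())
        if m and m["module"] == "dot11-validate" and "pmkid" in m["message"].lower():
            hits.append((m["date"], m["time"], m["message"]))
    return hits


sample = [
    "2024/06/12 10:50:57.855260666 {wncd_x_R0-0}{1}: [dot11-validate] [21176]: "
    "(ERR): MAC: STA-macadd Failed to Dot11 validate dot11i pmkids. "
    "No matching pmkid for the pmk available in cache",
]

for date, time, message in pmkid_failures(sample):
    print(date, time, message)
```

Matching the timestamps of these hits against the reassociation requests in the OTA capture would show whether every failed roam lines up with a PMKID validation error.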
06-18-2024 05:25 AM
- FYI: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvz97359
  Have a go with https://software.cisco.com/download/home/286316412/type/282046477/release/Cupertino-17.9.5
M.
06-18-2024 07:39 AM - edited 06-18-2024 07:47 AM
Thx Marce
Thing is that we're currently on the "fixed" release, 17.9.3, and I don't think I can ask the operations team to just upgrade, since this is a 24/7 environment and we need to wait for a proper maintenance window.
Anyway, I looked at other clients roaming on these 4 channels, and all of them complete the roam with only the 4-way handshake (reassociation with a cached PMKID), and the AP does not send them an EAP identity request.
I believe the client might be sending a wrong/bogus PMKID in its reassociation request. That may be why I see "Failed to Dot11 validate dot11i pmkids" in the RA traces.
But shouldn't the AP deauth the client immediately after finding that it doesn't have the PMKID in its cache, and not proceed with the 4-way handshake at all?
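One way to test the bogus-PMKID theory in a lab where the PMK can be obtained is to recompute the PMKID and compare it with the value in the reassociation request. A rough Python sketch, assuming the SSID uses the SHA-1 based 802.1X AKM, where PMKID is the first 128 bits of HMAC-SHA1(PMK, "PMK Name" || AA || SPA); SHA-256 based AKMs use HMAC-SHA-256 instead. The function names and the dummy values are made up for illustration:

```python
import hashlib
import hmac


def mac_to_bytes(mac):
    """Convert a MAC written with ':', '-' or '.' separators to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", "").replace(".", ""))


def compute_pmkid(pmk, bssid, sta_mac):
    """PMKID for the SHA-1 based 802.1X AKM (assumption about the SSID's AKM).

    pmk     -- 32-byte PMK as bytes
    bssid   -- authenticator address (AA), the BSSID of the target AP
    sta_mac -- supplicant address (SPA), the client's MAC
    """
    data = b"PMK Name" + mac_to_bytes(bssid) + mac_to_bytes(sta_mac)
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


# Dummy values for illustration only, not taken from the capture.
pmk = bytes(32)
old_ap = "00:11:22:33:44:55"
new_ap = "00:11:22:33:44:66"
client = "66:77:88:99:aa:bb"

print("PMKID for old AP:", compute_pmkid(pmk, old_ap, client).hex())
print("PMKID for new AP:", compute_pmkid(pmk, new_ap, client).hex())
```

Because the BSSID is an input, the same PMK yields a different PMKID on every AP. A client that reuses the PMKID computed for the previous AP would therefore not match what the new AP expects, which would be consistent with the "No matching pmkid" error, although the capture alone does not prove that this is what the client does.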
06-18-2024 08:03 AM
>...Thing is that we're currently on "fixed" release - 17.9.3
- For this kind of issue, that sometimes needs to be taken with a grain of salt, and going a few steps higher in versioning can help. I understand that is difficult for you at the moment. In the past I have sometimes advised people to download the 9800-CL (for any version chosen, it is always free to download) and deploy it as a VM (lab) for free testing of issues like this. Of course that requires time (and perhaps spare AP equipment).
- Check the current 9800-40 WLC configuration with the CLI command show tech wireless, and have the output analyzed with https://cway.cisco.com/wireless-config-analyzer/
  Review all tabs (and advisories) in the resulting Excel file and check whether anything related comes up.
- Make sure the client's Wi-Fi driver(s) are up to date.
>...My question is why would AP even do this?
No direct insights on that.
M.
06-20-2024 10:50 AM
Maybe https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwf60519 which would require an update to 17.9.5