Client Roaming issue - PMKID cache problem (IOS-XE 17.9.3)
06-18-2024 04:25 AM
Hey guys,
I'm troubleshooting a client roaming issue in our environment and I need some help.
Environment - 9800-40 WLC in HA version 17.9.3, 9115 + 9130i APs, FlexConnect, CentralAuth, AAA override SSID, EAP-TLS
The issue is that the client's roams are often incomplete (as can be seen in DNAC): the client bounces between APs for quite some time before completing the roam, which can take up to a few minutes. The client usually roams between 2 APs that are in the same Site Tag. The Site Tag has PMK propagation enabled, and I can see on both the AP and the WLC that a PMKID is cached for the client's MAC.
I did OTA captures on 4 channels to try to catch what is actually happening in the air, and I noticed some strange behavior that I'll try to explain step by step:
1. STA decides to roam because it (don't ask me why) runs periodic off-channel scans for new APs at a set interval, and if it finds an AP whose RSSI is better by more than 5 dB, it just roams (even if its current RSSI/SNR is perfectly fine). But okay.
2. STA sends probes, receives a probe response from the AP with better RSSI, and starts the reassociation process.
3. STA completes open authentication with the AP it wants to roam to.
4. STA sends a Reassociation Request with a PMKID inside the RSN information element.
5. AP replies with a Reassociation Response.
6. STA and AP proceed with the 4-way handshake using EAPOL.
7. PTK/GTK are exchanged and STA starts to send some encrypted data.
8. Here comes the funny part: the AP does not send any data from the distribution system to the STA (it should have had some cached data from the previous AP), but instead it sends out an EAP identity request frame.
9. STA ignores the EAP identity request (because it is already past the 4-way handshake).
10. AP sends out two more EAP identity requests and then the STA gets deauthed.
11. STA leaves the BSS (sends a deauth frame) and starts to probe for new APs again.
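To spot the same pattern quickly in other captures, the sequence above can be checked mechanically. The following is only a minimal sketch: it assumes the capture has already been reduced to an ordered list of per-frame labels, and the label names (`reassoc_req`, `eapol_m1` to `eapol_m4`, `eap_identity_req`) are hypothetical, not something Wireshark or the WLC produces.

```python
from typing import List, Optional

# Hypothetical labels for the four EAPOL-Key messages, in handshake order.
HANDSHAKE = ["eapol_m1", "eapol_m2", "eapol_m3", "eapol_m4"]


def find_late_identity_request(events: List[str]) -> Optional[int]:
    """Return the index of the first EAP identity request that shows up
    after a complete 4-way handshake, or None if there is none."""
    progress = 0  # number of handshake messages seen in order so far
    for i, ev in enumerate(events):
        if ev == "reassoc_req":
            progress = 0  # a new reassociation restarts the handshake
        elif progress < 4 and ev == HANDSHAKE[progress]:
            progress += 1
        elif ev == "eap_identity_req" and progress == 4:
            return i
    return None


# The failing roam described in steps 3 to 10 (labels are illustrative).
failing_roam = [
    "auth_req", "auth_resp", "reassoc_req", "reassoc_resp",
    "eapol_m1", "eapol_m2", "eapol_m3", "eapol_m4",
    "data", "eap_identity_req", "eap_identity_req", "eap_identity_req",
    "deauth",
]
print(find_late_identity_request(failing_roam))
```

An identity request that arrives before the handshake (a normal full 802.1X authentication) is not flagged, only one that follows message 4.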
I can see in the RA traces that during this roam the AP complains about the PMKID:
2024/06/12 10:50:57.855260666 {wncd_x_R0-0}{1}: [dot11-validate] [21176]: (ERR): MAC: STA-macadd Failed to Dot11 validate dot11i pmkids. No matching pmkid for the pmk available in cache
When I check on the AP (show flexconnect pmk) I can see that the STA's MAC address is present.
If the AP can't validate the PMKID, why does it even continue with the 4-way handshake? And why does it send out an EAP identity request after the 4-way handshake has completed?
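To line up these failures with the timestamps in the OTA capture, the RA trace can be filtered for the PMKID validation error. This is a rough sketch that assumes every trace line has the same layout as the single line quoted above; the regular expression and the function name `pmkid_failures` are my own, not part of any Cisco tool.

```python
import re
from typing import Iterable, List, Tuple

# Layout assumed from the one trace line quoted in this post:
# <date> <time> {process}{n}: [module] [id]: (LEVEL): MAC: <mac> <message>
TRACE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \{\S+?\}\{\d+\}: \[(?P<module>[^\]]+)\] \[\d+\]: "
    r"\((?P<level>\w+)\): MAC: (?P<mac>\S+) (?P<msg>.*)$"
)


def pmkid_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, client MAC) for each PMKID validation failure."""
    hits = []
    for line in lines:
        m = TRACE_RE.match(line.strip())
        if m and "No matching pmkid" in m.group("msg"):
            hits.append((m.group("ts"), m.group("mac")))
    return hits


sample = [
    "2024/06/12 10:50:57.855260666 {wncd_x_R0-0}{1}: [dot11-validate] "
    "[21176]: (ERR): MAC: STA-macadd Failed to Dot11 validate dot11i pmkids. "
    "No matching pmkid for the pmk available in cache",
]
for ts, mac in pmkid_failures(sample):
    print(ts, mac)
```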
06-18-2024 05:25 AM
- FYI : https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvz97359
Have a go with https://software.cisco.com/download/home/286316412/type/282046477/release/Cupertino-17.9.5
M.
-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '
06-18-2024 07:39 AM - edited 06-18-2024 07:47 AM
Thx Marce
The thing is that we're currently on a "fixed" release (17.9.3), and I don't think I can ask the operations team to just upgrade, since this is a 24/7 environment and we need to wait for a proper maintenance window.
Anyway, I looked at other clients roaming on these 4 channels, and all of them complete the roam with only the 4-way handshake (reassociation with cached PMKID), and the AP does not send an EAP identity request to them.
I believe it might be the client that is sending a wrong/bogus PMKID in the reassociation request. That may be why I see "Failed to Dot11 validate dot11i pmkids" in the RA traces.
But shouldn't the AP deauth the client immediately after finding that it doesn't have the PMKID in the cache, and not proceed with the 4-way handshake at all?
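For reference on the "wrong PMKID" theory: in IEEE 802.11 the PMKID is the first 128 bits of HMAC(PMK, "PMK Name" || AA || SPA), where AA is the authenticator address (the BSSID) and SPA is the client's MAC. Because the AA is part of the input, a PMKID derived for one BSSID will not match on another BSSID even when the PMK is identical. The sketch below assumes the plain 802.1X AKM with HMAC-SHA1 (SHA-256 based AKMs use HMAC-SHA-256 instead, and 802.11r/FT uses a different key hierarchy altogether). The PMK and MAC addresses are made-up placeholders, and in practice the real PMK is usually not available for checking.

```python
import hashlib
import hmac


def mac_to_bytes(mac: str) -> bytes:
    """Accept aa:bb:cc:dd:ee:ff, aa-bb-..., or aabb.ccdd.eeff notation."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", "").replace(".", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw


def compute_pmkid(pmk: bytes, aa: str, spa: str, digest: str = "sha1") -> bytes:
    """PMKID = first 16 bytes of HMAC(PMK, "PMK Name" || AA || SPA).

    digest="sha1" is an assumption about the negotiated AKM; pass "sha256"
    for the SHA-256 based AKMs.
    """
    data = b"PMK Name" + mac_to_bytes(aa) + mac_to_bytes(spa)
    return hmac.new(pmk, data, digest).digest()[:16]


# Placeholder values for illustration only.
pmk = bytes(range(32))
sta = "02:00:00:00:00:01"
old_ap = "02:00:00:00:0a:01"
new_ap = "02:00:00:00:0b:01"

print(compute_pmkid(pmk, old_ap, sta).hex())
print(compute_pmkid(pmk, new_ap, sta).hex())  # differs: AA is part of the input
```

If the client put a PMKID computed for the old BSSID into the reassociation request to the new AP, a lookup on the new AP would not find a match. That would fit the "No matching pmkid" error, but it is only one possible explanation.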
06-18-2024 08:03 AM
>...Thing is that we're currently on "fixed" release - 17.9.3
- Sometimes, for this kind of issue, that needs to be taken with a grain of salt, and going a few steps higher in versioning can help. I understand that is difficult for you at this moment. In the past I have sometimes advised people to download the 9800-CL (for any chosen version; it is always free to download) and deploy it as a VM (lab) for free testing of issues like this. Of course that requires time (and perhaps spare AP equipment).
- Check the current 9800-40 WLC configuration with the CLI command show tech wireless, and have the output analyzed with https://cway.cisco.com/wireless-config-analyzer/
Review all tabs (and advisories) in the resulting Excel file and check if anything related comes up.
- Make sure the client's Wi-Fi driver(s) are up to date.
>...My question is why would AP even do this?
No direct insights on that.
M.
06-20-2024 10:50 AM
Maybe https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwf60519 which would require an update to 17.9.5
