
Client roaming issue - PMKID cache problem (IOS XE 17.9.3)

mqontt
Level 1

Hey Guys,

I'm troubleshooting a client roaming issue in our environment and I need some help.

Environment - 9800-40 WLC in HA, version 17.9.3, 9115 + 9130i APs, FlexConnect, central authentication, AAA override SSID, EAP-TLS

The issue is that the client's roams are often incomplete (as can be seen in DNAC): the client bounces between APs for quite some time before completing the roaming process, and a roam can take up to a few minutes. The client usually roams between 2 APs that are in the same site tag. The site tag has PMK propagation enabled, and I can see on the AP and on the WLC that a PMKID for the client's MAC is cached.

I did OTA captures on 4 channels to catch what is actually happening in the air, and I noticed some strange behavior that I'll try to explain in the steps below. I'll also include screenshots from OmniPeek (with notes).

1. The STA decides to roam because it (don't ask me why) runs periodic off-channel scans for new APs at a fixed interval, and if it finds an AP whose RSSI is better by 5 dB, it simply roams (even if its current RSSI/SNR is perfectly fine). But okay.

2. The STA sends probes, receives a probe response from the AP with better RSSI, and starts the reassociation process.

3. The STA completes open authentication with the AP it wants to roam to.

4. The STA sends a Reassociation Request with a PMKID inside the RSN information element.

5. The AP replies with a Reassociation Response.

6. The STA and AP proceed with the 4-way handshake using EAPOL.

7. PTK/GTK are exchanged and the STA starts to send encrypted data.

8. Here comes the funny part: the AP does not send any data from the distribution system to the STA (it should have had some buffered data from the previous AP), but instead sends an EAP Identity Request frame.

9. The STA ignores the EAP Identity Request (because the 4-way handshake has already completed).

10. The AP sends two more EAP Identity Requests and then the STA gets deauthenticated.

11. The STA leaves the BSS (sends a deauth frame) and starts to probe for new APs again.
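
The sequence above can be checked mechanically once a capture has been reduced to a list of frame types. The following is only a sketch: the frame labels and the function name are hypothetical, and mapping an OmniPeek export onto these labels is left out.

```python
from typing import List, Optional

# Hypothetical frame labels; a real capture would have to be mapped onto them.
FOUR_WAY = ["EAPOL-Key M1", "EAPOL-Key M2", "EAPOL-Key M3", "EAPOL-Key M4"]
EAP_ID_REQ = "EAP Identity Request"


def eap_identity_after_handshake(frames: List[str]) -> Optional[int]:
    """Return the index of the first EAP Identity Request that appears
    after a complete 4-way handshake, or None if there is none."""
    next_msg = 0  # position in FOUR_WAY of the next expected message
    for i, frame in enumerate(frames):
        if next_msg < len(FOUR_WAY):
            # Handshake still in progress: only advance on the expected message.
            if frame == FOUR_WAY[next_msg]:
                next_msg += 1
        elif frame == EAP_ID_REQ:
            # Handshake already finished, yet the AP restarts EAP.
            return i
    return None


# The failing roam from steps 3-11, reduced to frame labels.
failing_roam = [
    "Auth Req", "Auth Resp", "Reassoc Req", "Reassoc Resp",
    *FOUR_WAY,
    "Data", EAP_ID_REQ, EAP_ID_REQ, EAP_ID_REQ, "Deauth",
]
print(eap_identity_after_handshake(failing_roam))
```

A normal full 802.1X authentication, where the Identity Request comes before the handshake, is not flagged by this check.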

 

I can see in the RA traces that during this roam AP complains about the PMKID

2024/06/12 10:50:57.855260666 {wncd_x_R0-0}{1}: [dot11-validate] [21176]: (ERR): MAC: STA-macadd Failed to Dot11 validate dot11i pmkids. No matching pmkid for the pmk available in cache
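
To find every occurrence of this error in a longer RA trace, a small parser can help. This is a sketch that assumes all lines follow the same layout as the single line quoted above; the field names (`timestamp`, `process`, `module`, `level`, `message`) are my own labels, not Cisco's.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the one trace line quoted in this post (an assumption).
RA_LINE = re.compile(
    r"^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\{(?P<process>[^}]+)\}\{\d+\}: "
    r"\[(?P<module>[^\]]+)\] \[\d+\]: "
    r"\((?P<level>\w+)\): "
    r"(?P<message>.*)$"
)


def parse_ra_line(line: str) -> Optional[Dict[str, str]]:
    """Split one RA trace line into named fields, or return None if it does not match."""
    match = RA_LINE.match(line.strip())
    return match.groupdict() if match else None


def is_pmkid_miss(line: str) -> bool:
    """True if the line is an error reporting that no cached PMK matched the PMKID."""
    record = parse_ra_line(line)
    return (
        record is not None
        and record["level"] == "ERR"
        and "No matching pmkid" in record["message"]
    )


sample = (
    "2024/06/12 10:50:57.855260666 {wncd_x_R0-0}{1}: [dot11-validate] [21176]: "
    "(ERR): MAC: STA-macadd Failed to Dot11 validate dot11i pmkids. "
    "No matching pmkid for the pmk available in cache"
)
print(parse_ra_line(sample)["module"], is_pmkid_miss(sample))
```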

When I check on the AP (show flexconnect pmk), I can see that the STA's MAC address is present.

If the AP can't validate the PMKID, why does it even continue with the 4-way handshake? And why does it send an EAP Identity Request after the 4-way handshake has completed?

 

 


marce1000
VIP

 

- FYI: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvz97359
  Have a go with https://software.cisco.com/download/home/286316412/type/282046477/release/Cupertino-17.9.5

M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

mqontt
Level 1

Thx Marce

Thing is, we're currently on a "fixed" release (17.9.3), and I don't think I can ask the operations team to just upgrade, since this is a 24/7 environment and we need to wait for a proper maintenance window.

Anyway, I looked at other clients roaming on these 4 channels, and all of them complete the roam with only a 4-way handshake (reassociation with cached PMKID), and the AP does not send an EAP Identity Request to them.

I believe the client might be sending a wrong/bogus PMKID in the Reassociation Request. That may be why I see "Failed to Dot11 validate dot11i pmkids" in the RA traces.

But shouldn't the AP deauth the client immediately after finding that it doesn't have the PMKID in its cache, and not proceed with the 4-way handshake at all?
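
For anyone who wants to test the bogus-PMKID theory against a capture: IEEE 802.11 derives the PMKID from the PMK, the AP's BSSID (AA) and the STA's MAC (SPA), so the same cached PMK gives a different PMKID on every BSSID. The sketch below assumes the plain 802.1X AKM with HMAC-SHA-1 (SHA-256 based AKMs use HMAC-SHA-256 instead), which this thread does not confirm, and it uses a dummy PMK and made-up MAC addresses; the function names are mine.

```python
import hashlib
import hmac


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (also '-' or '.' separated) to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", "").replace(".", ""))


def pmkid(pmk: bytes, ap_bssid: str, sta_mac: str) -> bytes:
    """PMKID = first 128 bits of HMAC-SHA1(PMK, "PMK Name" || AA || SPA)."""
    data = b"PMK Name" + mac_to_bytes(ap_bssid) + mac_to_bytes(sta_mac)
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


# Dummy values for illustration only (assumptions, not taken from the capture).
dummy_pmk = bytes(range(32))
sta = "02:00:00:00:00:01"
old_ap = "02:00:00:00:aa:01"
new_ap = "02:00:00:00:bb:01"

# Same PMK, different BSSID: the PMKIDs differ.
print(pmkid(dummy_pmk, old_ap, sta).hex())
print(pmkid(dummy_pmk, new_ap, sta).hex())
```

If the PMKID in the Reassociation Request matches the value computed for the previous AP's BSSID rather than the new one, that would point at the client reusing a stale PMKID instead of recomputing it for the target AP. Doing this comparison on real traffic requires the actual PMK, which is normally not easy to obtain.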

 

>...Thing is that we're currently on "fixed" release - 17.9.3
- For this kind of issue, that sometimes needs to be taken with a grain of salt, and going a few steps higher in versioning can help. I understand that is difficult for you at this moment. In the past I have sometimes advised people to download the 9800-CL (for any version chosen; it is always free to download) and deploy it as a VM (lab) for free testing of issues like this. Of course that requires time (and perhaps spare AP equipment).

- Check the current 9800-40 WLC configuration with the CLI command show tech wireless, and have the output analyzed with https://cway.cisco.com/wireless-config-analyzer/
  Review all tabs (and advisories) in the resulting Excel file and check whether anything related comes up.

- Make sure the client's Wi-Fi driver(s) are up to date.

>...My question is why would AP even do this?
No direct insights on that.

M.




Maybe https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwf60519 which would require an update to 17.9.5
