
Outage on 802.1X Wifi

OrkhanRustamli
Level 1

Hello all,

I had a WiFi network with 8 APs and a single PSK-secured SSID. One month ago I created a new SSID secured with 802.1X via ISE, and it has turned into a nightmare for me. Some users complain that they face a lot of outages on the new WiFi. I checked and they were right: some people have weird problems. I updated all their drivers and OSes, and I also removed the DFS channels from DCA (my APs were seeing a lot of radar signals on DFS channels), but nothing helped. The interesting part is that people sit in the same room next to each other, and one has the problem while for the other everything is smooth. I wonder whether this is connected to any timeout timers or not. I disabled the Re-Authentication timeout and the Idle Session Timeout, which did not help. I am running the Cisco-recommended software.

 

Any help is appreciated.

21 Replies

pieterh
VIP

You need to supply some logging to see what's happening.

My guess is that the client switches back and forth between the old and new SSIDs.

1) ensure the old WLAN profile is removed or at least set to NOT connect automatically.

2) ensure dot1x profile is tested first in your ISE Configuration

3) do you use computer or user authentication for dot1x ->  set the WLAN profile accordingly.

 

Could you tell me which command outputs are needed?

start with describing the environment 

- WLC type/model

- (Aire)OS  version 

- AP type/model

- ISE version

 

If the clients run Windows, look at the Windows event log,

especially the "Applications and Services Logs -> Microsoft -> Windows -> WLAN-AutoConfig" log.

If possible, capture the output of the following on the wireless controller:

- config paging disable

- show logging

you may even debug a specific client

- debug client <MAC address>

and let this run for some time until you've captured some examples of these "outages"

 

Hi,

WLC Model - AIR-CT3504-K9

Image - 8.5.151.0

AP Type - AIR-AP1852I-E-K9

ISE - 2.4.0.357 patch 9

 

Let me describe the problem again. We have a fabric WiFi network with 802.1X authentication. User devices are mostly Macs, but there is also a significant number of Windows devices. I started moving people from the old legacy network to this WiFi, and some of them began complaining that the network drops constantly (WiFi is connected, but there is no connectivity even to the gateway). They can authenticate successfully, and at first everything is okay. However, after half an hour or so they start to lose internet access, and even pings to the gateway time out. I ran debug client on the AP and the WLC while they had the problem and saw nothing. DNAC shows a smooth health score of 10 for affected users at the time of the problem. I disabled the Session Timeout and set the EAP-Broadcast Key timeout to 86400, which did not help.

On the WLC I see a lot of log entries like these:

*Dot1x_NW_MsgTask_2: Jan 07 11:48:01.956: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: [PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client xxxxxxxxx
*Dot1x_NW_MsgTask_6: Jan 07 11:47:56.203: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: [PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client yyyyyyyyy

However, when I check the MAC addresses, I see that those clients are connected to the old legacy WiFi, which is the PSK network and not the 802.1X one.
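For anyone who wants to see which clients generate these messages and how often, a minimal Python sketch could tally them from saved `show logging` output. The regular expression is modelled only on the two sample lines quoted above, and the function name `count_eapol_retrans` is made up for illustration; other software releases may format the message differently.

```python
import re
from collections import Counter

# Pattern based only on the two sample lines quoted in this thread
# (assumption: other AireOS releases may word the message differently).
EAPOL_RE = re.compile(
    r"%DOT1X-4-MAX_EAPOL_KEY_RETRANS:.*?"
    r"Max EAPOL-key (M\d) retransmissions exceeded for client (\S+)"
)


def count_eapol_retrans(lines):
    """Return a Counter keyed by (client, handshake message) for matching lines."""
    counts = Counter()
    for line in lines:
        match = EAPOL_RE.search(line)
        if match:
            message, client = match.groups()
            counts[(client, message)] += 1
    return counts


# The two (already anonymised) lines from the post, used as sample input.
sample = [
    "*Dot1x_NW_MsgTask_2: Jan 07 11:48:01.956: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: "
    "[PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client xxxxxxxxx",
    "*Dot1x_NW_MsgTask_6: Jan 07 11:47:56.203: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: "
    "[PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client yyyyyyyyy",
]

for (client, message), n in count_eapol_retrans(sample).most_common():
    print(client, message, n)
```

The resulting client list can then be compared against the WLC client table to confirm which SSID each of those MAC addresses is actually on.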

 

Moreover, I created a new WiFi SSID on the same network as the 802.1X one and put the affected users there, and they reported no problems at all, which makes me think that the issue is not with the Fabric network.

Do you maybe have 802.11r enabled? If so, try it disabled.
Also you might want to try the latest release 8.5.160.0
https://www.cisco.com/c/en/us/td/docs/wireless/controller/release/notes/crn85mr6.html

Hi,

I have Fast Transition enabled on both the PSK and the 802.1X networks. I have now disabled it on the 802.1X network and will monitor what happens, but before that, could you explain how it can affect the 802.1X network?

See here what it does:

https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/technotes/80211r-ft/b-80211r-dg.html



In short, the wireless device drivers need to correctly handle those altered frames. If they don't (driver bugs, WLC bugs, OS bugs, ...), then they show very weird behavior.


Thank you for the information, but I still do not understand why it affects only the 802.1X network and not the PSK one.

Drivers react differently if it's 802.1x and PSK, so you never know.


They are different, just like an open SSID. Did you open a TAC case so they can review how your ISE policies are defined? What is the common message from users about the new SSID? What is your experience when using the new SSID? Have you isolated which devices work well and which devices have issues? Have you isolated whether the issue is in a specific area or on a specific floor?
-Scott
*** Please rate helpful posts ***

Hello, sorry for the late reply; I was on vacation.

*I opened a TAC case, but they did not look at ISE. They mostly wanted me to collect the output of almost 20 debug commands while the problem was happening, which is pretty hard because I do not know when it will happen or which users it will hit, and I cannot collect everything immediately. As a result, the case was closed.

*I ran several tests to try to figure out what is going on.

*I created the same network with PSK only, and users say that everything is smooth there.

The problem is not isolated to a specific room or area; I hear complaints from different parts of the office.

*The symptom is that users stay connected and the WiFi icon shows no error mark, but they have no connectivity at all. I managed to catch some affected laptops while the problem was occurring, and at that moment they could not even ping their anycast gateway, which sits on the closest edge switch.

*For other users, the WiFi works perfectly without any errors.

*After I disabled FT, some people said it seemed to be okay, but after about a week they reported the same problem again.
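Since the hard part here is knowing when an outage happens, one option is to have an affected laptop log periodic gateway probes and then turn that log into outage windows that can be matched against WLC and ISE logs. The following is only a sketch under assumptions: it assumes the probe results already exist as (timestamp, ok) pairs, and the function name `outage_windows` and the `min_failures` threshold are hypothetical.

```python
def outage_windows(samples, min_failures=3):
    """Group consecutive failed probes into (start, end, count) windows.

    samples: iterable of (timestamp, ok) pairs in time order.
    min_failures: hypothetical threshold; shorter runs are treated as noise.
    """
    windows = []
    run = []  # timestamps of the current run of failed probes
    for timestamp, ok in samples:
        if not ok:
            run.append(timestamp)
            continue
        # A successful probe ends the current run of failures.
        if len(run) >= min_failures:
            windows.append((run[0], run[-1], len(run)))
        run = []
    # Close a run that is still open when the data ends.
    if len(run) >= min_failures:
        windows.append((run[0], run[-1], len(run)))
    return windows


# Made-up demo data: timestamps are seconds since the start of the test.
demo = [(0, True), (5, False), (10, False), (15, False),
        (20, True), (25, False), (30, True)]
print(outage_windows(demo))
```

The single lost probe at 25 seconds is ignored because it is below the threshold, so only sustained losses are reported.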

Ok, then it could be an ACL or VLAN issue in regards to a policy that might get pushed from the ISE.


How can it be related to a policy when, during the issue, laptops cannot even ping their anycast gateway in the Fabric network, which is the first switch after the AP? Moreover, I am not pushing any VLAN in CoA. The only VLAN-related detail is that I am using a Fabric network managed by DNA Center. Unfortunately, I cannot create the SSID in DNAC and provision it to the WLC, because DNAC seems to have a bug. To work around that, I created the fabric SSID manually on the WLC, where I also had to select a VLAN interface (it is a mandatory drop-down in the WLC config). The VLAN interface and the Fabric network do not really relate to each other, because in a Fabric network the APs act in a standalone role and do not send any traffic to the WLC except management traffic.

The reason I do not pay much attention to this point is that not everyone has a problem with the WiFi.

Sorry I never worked with DNAC or Fabric, can't help you further.

In the old setups, what sometimes happened was that a policy pushed via ISE overrode the VLAN, and if the AP was in FlexConnect mode but the switchport didn't carry that VLAN, the client attached to that AP was stranded.