I had a WiFi network of 8 APs with a single PSK-secured SSID. One month ago I created a new SSID secured with 802.1X against ISE, and it has turned into a nightmare for me. Some users complain that they face a lot of outages on the new WiFi. I checked and they are right: some people have weird problems. I updated all their drivers and operating systems, and I also removed the DFS channels from DCA (my APs were seeing a lot of radar signals on DFS channels), but nothing helped. The interesting part is that of two people sitting next to each other in the same room, one has the problem while for the other everything is smooth. I wonder whether this is connected with any timeout timers or not. I disabled the Re-Authentication timeout and the Idle Session Timeout, which did not help. I am running the Cisco-recommended software.
Any help is appreciated.
You need to supply some logging to see what's happening.
My guess is that the client switches back and forth between the old and the new SSID.
1) Ensure the old WLAN profile is removed, or at least set to NOT connect automatically.
2) Ensure the dot1x profile is tested first in your ISE configuration.
3) Do you use computer or user authentication for dot1x? Set the WLAN profile accordingly.
start with describing the environment
- WLC type/model
- (Aire)OS version
- AP type/model
- ISE version
if the clients run Windows, look at the Windows event log,
especially the "Applications and Services Logs -> Microsoft -> Windows -> WLAN-AutoConfig" logging
if possible, capture the output of the following from the wireless controller
- config paging disable
- show logging
you may even debug a specific client
- debug client <MAC address>
and let this run for some time until you've captured some examples of these "outages"
WLC Model - AIR-CT3504-K9
Image - 184.108.40.206
AP Type - AIR-AP1852I-E-K9
ISE - 220.127.116.117 patch 9
Let me describe the problem again. We have a fabric WiFi network with 802.1X authentication. User devices are mostly Macs, but there is also a significant number of Windows devices. I started moving people from the old legacy network to this WiFi, and some of them began complaining that the network drops constantly (WiFi is connected but there is no connectivity, even to the gateway). They authenticate to the network successfully and at first everything is okay. However, after half an hour or so they lose internet access and even pings to the gateway time out. I ran debug client on the AP and the WLC while they had the problem and saw nothing. DNAC shows a smooth health score of 10 for the affected users at the time of the problem. I disabled the Session timeout and set the EAP broadcast key timeout to 86400, which did not help.
On the WLC I see a lot of
*Dot1x_NW_MsgTask_2: Jan 07 11:48:01.956: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: [PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client xxxxxxxxx
*Dot1x_NW_MsgTask_6: Jan 07 11:47:56.203: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: [PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client yyyyyyyyy
logs, but when I check the MAC addresses I see that those clients are connected to the old legacy WiFi, which is the PSK network, not the 802.1X one.
Moreover, I created a new WiFi SSID on the same network as the 802.1X one and put the affected users there. They reported no problems at all, which makes me think the issue is not with the fabric network.
I have Fast Transition enabled on both the PSK and the 802.1X network. I have disabled it on the 802.1X network and will monitor to see what happens, but in the meantime could you explain how it can affect an 802.1X network?
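To make it easier to see which clients these messages belong to, the lines can be tallied per client. This is only a rough sketch: it assumes the `show logging` output has been saved as text and that every line follows the same format as the two examples above. The function name `count_retrans` is made up for illustration, and the client IDs are the masked placeholders from the excerpt.

```python
import re
from collections import Counter

# Matches the message format seen in the two log lines above.
# Group 1 is the handshake message (e.g. M1), group 2 is the client ID.
RETRANS_RE = re.compile(
    r"%DOT1X-4-MAX_EAPOL_KEY_RETRANS:.*?"
    r"Max EAPOL-key (M\d) retransmissions exceeded for client (\S+)"
)


def count_retrans(lines):
    """Return a Counter keyed by (client, message) for matching log lines."""
    counts = Counter()
    for line in lines:
        match = RETRANS_RE.search(line)
        if match:
            message, client = match.groups()
            counts[(client, message)] += 1
    return counts


# The two lines quoted above, with the client IDs masked as in the post.
sample = [
    "*Dot1x_NW_MsgTask_2: Jan 07 11:48:01.956: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: "
    "[PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client xxxxxxxxx",
    "*Dot1x_NW_MsgTask_6: Jan 07 11:47:56.203: %DOT1X-4-MAX_EAPOL_KEY_RETRANS: "
    "[PA]1x_ptsm.c:550 Max EAPOL-key M1 retransmissions exceeded for client yyyyyyyyy",
]

counts = count_retrans(sample)
for (client, message), n in counts.most_common():
    print(client, message, n)
```

The resulting client list can then be compared against the clients on each SSID to confirm whether the messages really come only from the PSK network.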
Hello, sorry for the late reply; I was on vacation.
*I opened a TAC case, but they paid no attention to ISE and mostly wanted me to collect debugs from almost 20 commands at the moment the problem happens. That is pretty hard to do, because I do not know when and which users will face the issue and cannot collect all of them immediately, so the case ended up being closed.
*I ran several tests to try to figure out what is going on.
*I created the same network with PSK only, and users tell me that everything is smooth there.
The problem is not isolated to a specific room or place, because I hear users complaining from different parts of the office.
*The symptom is that users are still connected and the WiFi icon doesn't show any error mark, but they basically have no connectivity at all. I managed to catch some problematic laptops while the problem was occurring, and at that moment they could not even ping their anycast gateway, which sits on the closest edge switch.
*For other users the WiFi is completely perfect, without any errors.
*After I disabled FT, some people said it now seemed to be okay, but after about a week they started reporting the same problem again.
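Since I never know in advance when and for whom the problem will appear, one idea is to leave a small probe running on an affected laptop that pings the gateway and records when the outages start and stop, so the timestamps can be matched against the WLC and ISE logs afterwards. Below is only a sketch of that idea, not something I have validated in this network. The function names and the default values are assumptions for illustration, and the gateway address has to be supplied by whoever runs it.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2.0):
    """Send one echo request with the system ping; True if it was answered.

    Caveat: some ping implementations return exit code 0 for
    "destination unreachable" replies, so treat the result as a hint.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def outage_windows(samples, min_failures=3):
    """Group consecutive failed probes into (start, end, count) windows.

    samples: iterable of (timestamp, ok) pairs in time order.
    Runs shorter than min_failures are ignored as ordinary packet loss.
    """
    windows = []
    run = []
    for ts, ok in samples:
        if not ok:
            run.append(ts)
            continue
        if len(run) >= min_failures:
            windows.append((run[0], run[-1], len(run)))
        run = []
    if len(run) >= min_failures:
        windows.append((run[0], run[-1], len(run)))
    return windows


def monitor(host, probes, interval_s=1.0):
    """Probe host `probes` times and return the detected outage windows."""
    samples = []
    for _ in range(probes):
        samples.append((datetime.now(), ping_once(host)))
        time.sleep(interval_s)
    return outage_windows(samples)
```

For example, `monitor(gateway_ip, probes=3600)` with the anycast gateway address would cover roughly an hour at one probe per second and return the start time, end time and probe count of each outage.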
How can it be related to policy, when during the issue the laptops cannot even ping their anycast gateway in the fabric network, which is the first switch after the AP? Moreover, I am not pushing any VLAN in CoA. The only thing that could be related to VLANs is that I am using a fabric network managed by DNA Center. Unfortunately, it is not possible to create the SSID from DNAC and provision it to the WLC, because DNAC seems to have a bug. To work around that, I created the fabric SSID manually on the WLC, where I also had to select a VLAN interface (because it is a mandatory drop-down in the WLC config). A VLAN interface and a fabric network are an odd combination, because in a fabric network the APs act in a standalone role and don't send any traffic to the WLC except management.
The reason I have not paid attention to this point is that not everyone has a problem with the WiFi.