
Bizarre client behavior of Autonomous Access Points (1130AG and 1242)

gruntsbp40
Level 1

Before considering solutions, I must explain that there are two things I can NOT consider. First, migrating to a controller-based environment: my preference, but not an option. Second, upgrading the drivers on the client machines: again my preference, but not a permanent solution I can consider (I'll explain why later; I want this to be about how to troubleshoot the problem, not avoid it).

For background: we are a contractor that works primarily for a large enterprise, and as such most of us are issued two laptops, one from our company and one from the enterprise. The laptops are Dell Latitudes, and 99% run the Intel Centrino 6205-N wifi chipset. This installation has been working for 3+ years, and the problem manifested for the first time in summer 2013. We have a full wired infrastructure, so the easy workaround has been CAT-5.

The basic problem: Completely random spontaneous loss of connectivity via wireless.

Detailed symptoms and temporary resolution: The client maintains its association and IP address; however, the yellow splat appears over the signal bars and no traffic leaves or returns to the laptop until either the manual wireless switch is toggled or the adapter is disabled and re-enabled. The interval for which connectivity is maintained ranges from 30 seconds to more than an hour, but most of the time it is between 5 and 10 minutes. Resetting the adapter always restores connectivity for a brief period.

Affected users: TOTALLY AND COMPLETELY RANDOM. Here's what I know: none of the affected laptops has this problem anywhere else (home, enterprise campus, remote sites, Starbucks, airports). Most devices running driver rev 14.0 work fine; 2 of the 5 users on that driver revision have the problem. Almost all users on driver revision 15.3 have the problem. NO users on revision 15.9 ever have the problem. Because none of the laptops are affected anywhere but this office, and we can't tell our visiting customers to "upgrade drivers" when they show up for meetings, I CAN NOT simply tell people to "update drivers to use our wireless, even though your wireless works everywhere else". There is also a single Broadcom laptop that just this week decided 9am is the cut-off time for its operation. It works fine, then at 9am it gets the yellow splat and does not work for the rest of the day. Bizarre in the extreme.

TROUBLESHOOTING I HAVE DONE:

  1. Upgraded the software of my access points
  2. Pulled a heat map of our floor looking for sources of RFI, or neighboring access points and associated channels
  3. Manually set my channels to 1,6,11 and manipulated the transmit power
  4. Disabled Aironet Extensions
  5. Manually manipulated user density on the APs by forcing some people on the wired and making people shut off wifi on cell phones
  6. Removed our non-broadcast SSID and ran the APs in guest mode only
  7. Reviewed logs for sign of problems and these messages seem to appear each time the client loses connectivity:

*Mar  1 20:26:49.069: %DOT11-6-DISASSOC: Interface Dot11Radio0, Deauthenticating Station 64a3.cb4f.1023 Reason: Previous authentication no longer valid 
*Mar  1 20:26:49.076: %DOT11-4-MAXRETRIES: Packet to client 64a3.cb4f.1023 reached max retries, removing the client
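For anyone who wants to tally these messages per client rather than eyeball them, here is a minimal Python sketch. The regular expression is inferred only from the two sample lines above, not from any Cisco specification, and the names `LOG_RE`, `parse_line`, and `summarize` are made up for this illustration.

```python
import re
from collections import Counter

# Field layout inferred from the two sample messages only (an assumption).
LOG_RE = re.compile(
    r"%DOT11-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+): "
    r".*?(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})"
    r"(?:\s+Reason: (?P<reason>.*\S))?"
)

# The two log lines quoted above, copied as-is.
SAMPLE = [
    "*Mar  1 20:26:49.069: %DOT11-6-DISASSOC: Interface Dot11Radio0, "
    "Deauthenticating Station 64a3.cb4f.1023 Reason: Previous authentication no longer valid ",
    "*Mar  1 20:26:49.076: %DOT11-4-MAXRETRIES: Packet to client 64a3.cb4f.1023 "
    "reached max retries, removing the client",
]


def parse_line(line):
    """Return severity, mnemonic, mac and reason for a DOT11 message, or None."""
    match = LOG_RE.search(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count (client MAC, message type) pairs across a list of log lines."""
    counts = Counter()
    for line in lines:
        event = parse_line(line)
        if event:
            counts[(event["mac"], event["mnemonic"])] += 1
    return counts


if __name__ == "__main__":
    for (mac, mnemonic), n in summarize(SAMPLE).items():
        print(mac, mnemonic, n)
```

Feeding it a full day of syslog output would show which clients are being removed and how often, which is a starting point for correlating drops with driver revision.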

The "recommended action" for such messages appears to be "NONE": http://www.cisco.com/c/en/us/td/docs/wireless/access_point/12-3_2_JA/configuration/guide/i1232sc/s32err.html

I have also been searching the web for similar issues, but mine is so specific that I can't pin down anything that helps. I'm very confident the problem is an 802.11 problem, but since I don't have an 802.11 sniffer (and probably wouldn't know what to do with one if I did), I believe I need to debug the radio on the AP to at least see what is triggering the disconnects. My two major obstacles are that I don't know which debugs to run and I have no idea how to parse the output. I have also consulted the "troubleshooting autonomous APs" document, but it is so basic that it doesn't really help: http://www.cisco.com/c/en/us/td/docs/wireless/access_point/1130/installation/guide/1130-TD-Book-Wrapper/113h_c3.html#wp1057645

So I need some suggestions for how to either alter my assumption of what the problem is, or get some direction for debugging the 802.11 operation on the WAP and interpreting it to at least find out what the *actual* problem might be.  Thanks for any assistance.

6 Replies

Leo Laohoo
Hall of Fame

Because none of the laptops are affected anywhere but this office, and we can't tell our visiting customer to "upgrade drivers" when they show up for meetings, I CAN NOT simply tell people to "update drivers to use our wireless, even though your wireless works everywhere else".

No, you can't, but you can push a nightly package via MS SMS to distribute the firmware. We did this for 25K wireless netbooks because they were just dropping packets. We upgraded our netbooks to firmware version 15.6 and all our problems went away.

http://supportforums.cisco.com/discussion/11388801/intermittent-dropouts-intel-centrino-advanced-n-6205

This is a known issue with the Intel Centrino, as discussed in the link above.  

Leo - 

You misunderstand - we don't have administrative control over the laptops in question. They belong to A COMPLETELY SEPARATE ADMINISTRATIVE AUTHORITY/CORPORATION/DOMAIN/ETC and we can NOT tell them - "We know your wireless works fine everywhere else but our office, can you update your drivers so you can connect to *our* wireless?". This is unreasonable.

As for the "solution", here are a few things I've been up to - 

For our autonomous AP's, they appear to only run "Client" MFP, and the problem says it's with "infrastructure" MFP. A look at the problem AP's in question shows NO CHECKBOXES in any of the "MFP" options.

I decided based on the density of neighboring 802.11 sources (there are over 20), to investigate the possibility that when it wasn't the centrino issue that it was a resource contention/RFI issue and that upgrading to 3602's would solve the problem.
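To put a rough number on that contention, here is a small Python sketch that counts how many neighboring 2.4GHz APs overlap each of our channels. It uses the usual rule of thumb that two 2.4GHz channels overlap when they are fewer than 5 channel numbers apart. The neighbor list is made up for illustration; it is NOT our survey data, and the function names are hypothetical.

```python
def overlaps(ch_a, ch_b, min_separation=5):
    """Rule of thumb: 2.4GHz channels closer than 5 numbers apart overlap."""
    return abs(ch_a - ch_b) < min_separation


def contention(own_channels, neighbor_channels):
    """Map each of our channels to the number of neighbors overlapping it."""
    return {
        own: sum(overlaps(own, neighbor) for neighbor in neighbor_channels)
        for own in own_channels
    }


if __name__ == "__main__":
    # Hypothetical neighbor channels, for illustration only.
    neighbors = [1, 1, 3, 6, 6, 6, 9, 11, 11]
    for channel, count in contention([1, 6, 11], neighbors).items():
        print(f"channel {channel}: {count} overlapping neighbors")
```

A neighbor parked on channel 3 or 9 counts against two of the three standard channels, which is why the raw number of neighbors alone understates the contention.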

Amazingly, when I deployed a 3602 to replace an 1130 that had been working fine for me, my disconnect problem recurred.

TO BE CLEAR - I experienced this issue running driver revision 15.1 and AP model 1130. Now I have deployed a 3602 to improve performance and even though I'm on the latest driver now (15.9), the symptoms are exactly the same - I connect for 2 - 20 minutes, then I get the yellow splat and I have to reset my wireless adapter to reconnect.  WTF OVER?

Using the linked articles and still hoping that it is related to MFP, and the following config guide for MFP on autonomous AP's, I attempted to disable it:

 http://www.cisco.com/c/en/us/td/docs/wireless/access_point/12-4_3g_JA/configuration/guide/ios1243gjaconfigguide/s43roamg.html#wp1113447

I issued the following at the CLI:

no dot11 ids mfp det/dis/gen

which I assume is the same as NOT CHECKING THE BOXES in the GUI which I already knew were unchecked.

These are the log messages that occurred during the time the AP repeatedly disconnected my workstation. How can I determine the root cause? I don't believe MFP has anything to do with our problem!

Mar 28 18:20:05 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:20:05 UTC: %DOT11-6-DISASSOC: Interface Dot11Radio0, Deauthenticating Station 0811.960c.b214 Reason: Previous authentication no longer valid 
Mar 28 18:20:05 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:20:12 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:21:00 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:21:00 UTC: %DOT11-6-DISASSOC: Interface Dot11Radio0, Deauthenticating Station 0811.960c.b214 Reason: Previous authentication no longer valid 
Mar 28 18:21:00 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:21:08 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:22:34 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:22:34 UTC: %DOT11-6-DISASSOC: Interface Dot11Radio0, Deauthenticating Station 0811.960c.b214 Reason: Previous authentication no longer valid 
Mar 28 18:22:34 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:22:42 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:25:09 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:25:09 UTC: %DOT11-6-DISASSOC: Interface Dot11Radio0, Deauthenticating Station 0811.960c.b214 Reason: Previous authentication no longer valid 
Mar 28 18:25:09 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:25:17 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:28:48 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:28:48 UTC: %DOT11-6-DISASSOC: Interface Dot11Radio0, Deauthenticating Station 0811.960c.b214 Reason: Previous authentication no longer valid 
Mar 28 18:28:48 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:28:56 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
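To see how long each association actually lasted, the log can be reduced to session lengths. This is a minimal Python sketch under two assumptions: all events fall on the same day (only the time of day is parsed), and a session ends at the first MAXRETRIES or DISASSOC after an ASSOC. The names `EVENT_RE` and `session_lengths` are made up for the example.

```python
import re

# Matches the time of day and the message type in lines like the ones above.
EVENT_RE = re.compile(
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) UTC: "
    r"%DOT11-\d-(?P<mnemonic>[A-Z_]+):"
)

# Each ASSOC line from the log above, paired with the next MAXRETRIES line.
LOG = """\
Mar 28 18:20:12 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:21:00 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:21:08 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:22:34 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:22:42 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:25:09 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
Mar 28 18:25:17 UTC: %DOT11-6-ASSOC: Interface Dot11Radio0, Station   0811.960c.b214 Associated KEY_MGMT[WPAv2 PSK]
Mar 28 18:28:48 UTC: %DOT11-4-MAXRETRIES: Packet to client 0811.960c.b214 reached max retries, removing the client
""".splitlines()


def session_lengths(lines):
    """Return seconds between each ASSOC and the next MAXRETRIES or DISASSOC."""
    lengths = []
    assoc_at = None
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        t = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
        if m["mnemonic"] == "ASSOC":
            assoc_at = t
        elif m["mnemonic"] in ("MAXRETRIES", "DISASSOC") and assoc_at is not None:
            lengths.append(t - assoc_at)
            assoc_at = None  # ignore the duplicate removal messages that follow
    return lengths


if __name__ == "__main__":
    print(session_lengths(LOG))
```

For the log above this gives sessions of 48, 86, 147, and 211 seconds, each followed by a re-association about 8 seconds later.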

So I'm in the especially absurd position of telling people - the newest drivers will work with old APs.  The new drivers will NOT work with new APs. With only these log messages to go on - what and how can I debug this problem so I can determine root cause? Could it be mbssid?  Stripping an old AP down to a single SSID is also a proven way to resolve the problem. 

Because we are using WPA-PSK I attempted to run debugs IAW this article:
http://www.cisco.com/c/en/us/support/docs/wireless/aironet-1200-series/50843-debug-authen.html#wpa

I ran 'debug dot11 aaa authenticator process' and 'debug dot11 aaa authenticator state-machine' and those debugs do not generate any output - even when I manually forced associations and disassociations. I was disconnected multiple times while posting this - the only log from the AP was "DOT11-6-DISASSOC: Interface Dot11Radio0, Deauthenticating Station 0811.960c.b214 Reason: Sending station has left the BSS".   The attachments are to give you a look at what I see from my desktop.

So in dealing with this for the last 3 days, I'm more convinced than EVER that the problem is related to mbssid, and NOT mfp.  I believe this because we removed our non-guest mode SSID from our 1242 and have not had problem recurrence since.  Further, on our 3602's, I assigned one SSID to the 2.4GHz radio and one SSID to the 5GHz radio, disabled mbssid on both and have been stable since.

I could not find any similar mbssid-related results in the support forums, and I do not know how to debug mbssid. Does anyone else know what the issue could be?

Relevant info:

The SSID assigned to the 2.4GHz radio is Open/WEP w/ 108-bit key (let's not start a "WEP is obsolete" discussion please; I'm aware of this, and again it is not my decision to run it this way).

The SSID assigned to the 5GHz radio is Open/WPA-PSK

Again - it appears that the connectivity issue is wildly random, but occurs when "mbssid" is on the radio interface configuration.
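To check which radios on a set of APs still carry the setting, a saved running-config can be scanned for it. This is a minimal Python sketch that assumes `mbssid` appears as its own indented line under an `interface Dot11Radio...` block. The sample config is a trimmed, hypothetical fragment (the SSID names `corp` and `guest` are invented), and `radios_with_mbssid` is a name made up for this example.

```python
# Hypothetical, trimmed config fragment for illustration only.
SAMPLE_CONFIG = """\
interface Dot11Radio0
 no ip address
 ssid corp
 ssid guest
 mbssid
!
interface Dot11Radio1
 no ip address
 ssid corp
!
"""


def radios_with_mbssid(config_text):
    """Return names of Dot11Radio interfaces whose block contains 'mbssid'."""
    found = []
    current = None
    for raw in config_text.splitlines():
        line = raw.rstrip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
        elif line and not line.startswith(" "):
            # Any other top-level line (including "!") ends the interface block.
            current = None
        elif current and current.startswith("Dot11Radio") and line.strip() == "mbssid":
            found.append(current)
    return found


if __name__ == "__main__":
    print(radios_with_mbssid(SAMPLE_CONFIG))
```

Running this over the configs of stable and unstable APs would confirm or refute the correlation between `mbssid` and the disconnects before spending time on debugs.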

When you say you disabled MBSSID, what did you do?

 

Right now we have 3 SSIDs running in the default AP group that are broadcasted out all of our APs.

 

How does one disable MBSSID?

 

Thank you,

How does one disable MBSSID?

Not all wireless cards "play nice" when the SSID is not broadcast. It's also not recommended for security reasons: hackers treat it as a "challenge" to find out what the hidden SSID is, and there is a lot of free software available for that.

 

To disable/enable broadcasting SSID, go to your WLAN and on the main tab you should see a tick box that says "Broadcast SSID".

gruntsbp40
Level 1

Since removing mbssid from the radio interface, it appears that the stability problem has gone away. Is there an explanation for this? I believe mbssid was enabled at some point b/c someone thought we needed it to run "Multiple" Basic SSIDs. This is clearly not the case, but I don't understand why 'mbssid' on the interface was so disruptive, or why these Centrinos freaked out about it. Is there a way to debug what was happening? I'm going to try to recreate the issue while looking for mbssid debugs, but any ideas would be helpful.
