I work for a medium-sized municipality and provide wireless networking for many different types of clients: schools (high density), offices (medium to high density) and day care centres (very low density).
2x 8540 WLCs in HA setup - 22.214.171.124
1x 3504 WLC for testing - 8.5.140
About 2,200 APs
Yes, I know many of the above models should have been replaced - and I want to - but ...$$
For some time I have noticed that a specific type of IoT device has had a hard time staying connected to our WiFi. The device is “only” 2.4 GHz-capable. Since I had not received any other complaints or error messages from other, very different users, I concluded that it must be a client-side issue. However, over the last few months I’ve been receiving several complaints from users at different locations, telling me that their clients randomly disconnect or are unable to connect. This happens both at schools with many clients and at locations with ten or fewer clients. I primarily get complaints from users of iPads, but also from users of other devices such as PCs, MacBooks and Surfaces – primarily clients connected on 5 GHz. It seems that it’s always the same APs that the users complain about. Replacing the APs helps, and the users do not seem to have any problems afterwards. In several cases I’ve brought the failing APs to my office, but I can’t make them fail again. They just work perfectly. I can't tell whether it's the reboot or the different location/environment that makes the difference.
Well, sometimes a reboot seems to solve the problem – for a while – which could be related to DCA (Dynamic Channel Assignment), but I’m not sure.
I have been advised to do a proper site survey, and that has been done – to some extent. We have surveyed some locations for interference with a NETSCOUT AirCheck G2 Wireless Tester. At some locations we found some interference from PIR sensors – but only on channel 1. We didn’t find anything on 5 GHz.
At the small locations that we surveyed, we have about 9 APs, including outdoor APs, and only about 10-15 clients in total, so I don't think it's a load issue. We even tried disabling each band, one at a time, to force clients onto a specific band. Even though the test wasn’t perfect, at one location it seemed that the problem was only present on 5 GHz. I can’t conclude much from that, since we also have single-band (2.4 GHz) clients at other locations that also have problems – even where we can’t find any significant interference.
We also lowered the TCP MSS from default to 1250. We also tested at 1300 and 1200, but with no difference.
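The reasoning behind trying these MSS values can be sketched as simple arithmetic: the client's TCP payload, plus its IP and TCP headers, plus whatever the AP-to-WLC tunnel adds, has to fit within the path MTU. The overhead figure and the function name below are assumptions for illustration only, not measured values from our network.

```python
IP_HEADER = 20   # IPv4 header without options, in bytes
TCP_HEADER = 20  # TCP header without options, in bytes


def max_safe_mss(path_mtu: int, tunnel_overhead: int) -> int:
    """Largest client TCP MSS whose packet still fits in path_mtu once tunnelled."""
    return path_mtu - tunnel_overhead - IP_HEADER - TCP_HEADER


# ASSUMPTION: 100 bytes is a placeholder for the AP-to-WLC tunnel overhead.
# Replace it with the value that applies to your own path.
ASSUMED_TUNNEL_OVERHEAD = 100
PATH_MTU = 1500  # standard Ethernet MTU

limit = max_safe_mss(PATH_MTU, ASSUMED_TUNNEL_OVERHEAD)
for mss in (1200, 1250, 1300):
    verdict = "fits" if mss <= limit else "may fragment"
    print(f"MSS {mss}: {verdict} (limit {limit})")
```

If all tested values come out below the limit for the real overhead on the path, it would be consistent with what we saw: no difference between 1200, 1250 and 1300.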
Our tests are done with a setup consisting of a PC, a MacBook, 2 x iPad Air 2 with the latest iOS and an iPhone, within 2-5 m of the AP where the problem has been reported.
I’ve spent quite some time analyzing this problem, but haven’t found any reason for it. I even had a consultant and TAC evaluate this. I was advised to follow Cisco’s Best Practices and to upgrade the software, but that still doesn’t make sense to me. Here are the reasons why:
Various different locations with very different types of clients and density
I have tried to evaluate the WLC config with WLCCA, but a large part of the messages don’t make sense – either because I don’t understand them or because the setting is already configured.
Ex. 30057,General: Disabling low data rates/11b can help to optimize the channel utilization on the 2.4 band. Depending on RF coverage, or if using legacy clients, this may cause problems. Please validate before enforcing the changes, as this may have important RF dependencies. Global Configuration. Well, I have globally disabled data rates below 18 Mbps in both bands, so why do I get that message?
I also get: 20024,AP: Missing configuration, information not present in file. Possible corrupted file and 120008,Security: AP Local credentials to access point CLI are not configured. For best security practices, it is desirable to configure to Username/passwords to all Aps and 20030,AP: It is recommended to set the MSS size at 1300. Well, only the WLCCA recommends setting the MSS size to 1300. The Cisco Best Practices only say that 1300 is a good average, but that it can be optimized.
I also get confused by the Cisco AP performance view on my controllers. It does seem that there are interference problems, but I'm not exactly sure what's causing them. Maybe rogue APs - like WiFi sharing on cell phones!?! That makes sense at some locations - but certainly not all.
I've googled a lot and found a few others with a similar problem. Unfortunately I can't use the same solution - downgrading the software - due to the presence of APs that don't support lower versions. I also doubt that it would work, since we are experiencing the same problem on APs connected to our 3504-WLC for testing.
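To show why the 30057 data-rate message puzzles me, here is a minimal sketch of what "all rates below 18 Mbps disabled" leaves enabled. The rate sets are the standard 802.11b (DSSS/CCK) and 802.11a/g (OFDM) rates; the function names are hypothetical and this is not how WLCCA itself evaluates the config.

```python
# Standard legacy data rates in Mbps
DSSS_CCK_RATES = {1.0, 2.0, 5.5, 11.0}                         # 802.11b
OFDM_RATES = {6.0, 9.0, 12.0, 18.0, 24.0, 36.0, 48.0, 54.0}    # 802.11a/g


def enabled_rates(min_rate: float, band: str) -> list:
    """Rates still enabled when everything below min_rate is disabled."""
    # 802.11b rates only exist on the 2.4 GHz band.
    candidates = OFDM_RATES | (DSSS_CCK_RATES if band == "2.4" else set())
    return sorted(r for r in candidates if r >= min_rate)


def has_11b_rates(rates) -> bool:
    """True if any 802.11b rate is still enabled."""
    return any(r in DSSS_CCK_RATES for r in rates)


for band in ("2.4", "5"):
    rates = enabled_rates(18.0, band)
    print(band, "GHz:", rates, "| 11b rates enabled:", has_11b_rates(rates))
```

With a global minimum of 18 Mbps no 802.11b rate remains, so the warning may be triggered by something other than the global setting (for example a per-profile override), but that is only a guess on my part.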
I've attached the WLCCA HTML-report. Maybe you note something that I don't.
I really hope you can help.
@Leo Laohoo wrote:
What happens if the AP gets rebooted? Does the issue go away?
Well, in some cases it does seem to disappear - at least for a while. In other cases it doesn't. In two cases, I replaced "failing" APs with new ones and placed the old ones in my office. No issues so far.
I must add that I started troubleshooting this issue a few months ago and escalated it to Cisco TAC. Due to the Covid-19 situation, we had to close the case with TAC, because we weren't allowed to go onsite to do tests.
@Leo Laohoo wrote:
And which of the two firmware are the issues more evident?
Hi again Leo
Thanks for taking your time to reply.
Well, I have a hard time telling. Since the majority of our APs are connected to our 8540-WLC, it is naturally also the APs on that controller that we hear most about. At the moment we only have about 12 APs connected to our 3504-WLC, covering two locations. These APs have been moved to this controller to test on a different hardware platform and software version. We were told several times by the people working there that their clients still disconnected from WiFi. When we went to test, everything worked fine for about two hours, which made us decide to end the test. So for now, we haven't been able to confirm that the problem also occurs on APs on the 3504-WLC, but it does seem to.
In Prime Infrastructure, it's possible to see a history of which APs the clients roam to and from. In some cases it seems odd that clients roam to APs with very low RSSI. With the help of TAC, we conducted a test where a colleague of mine went to a location and placed himself and his clients very close (3-5 m) to an AP. We all saw that, for some reason, the client disconnected, then connected to another AP ... and after that returned to the first AP. Other reports claim that many clients from a single school class couldn't connect to the AP in their classroom. Rebooting helped, but the problem returned later.
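The pattern we saw in that test (AP A, then AP B, then back to AP A) is something one could look for systematically in a roaming history. Below is a minimal sketch, assuming the history has been exported into simple records. The `Association` record, its fields, the thresholds and the sample data are all hypothetical and do not reflect the actual Prime Infrastructure export format.

```python
from dataclasses import dataclass


@dataclass
class Association:
    client: str       # client identifier, e.g. MAC address
    ap: str           # AP name
    rssi_dbm: int     # RSSI seen at association time
    timestamp: float  # seconds since start of capture


def _by_client(events):
    """Group events per client, ordered by time."""
    grouped = {}
    for e in sorted(events, key=lambda e: e.timestamp):
        grouped.setdefault(e.client, []).append(e)
    return grouped


def find_ping_pong(events, window_s=300):
    """Return (client, first_ap, other_ap) for every A -> B -> A within window_s."""
    hits = []
    for client, seq in _by_client(events).items():
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            if a.ap == c.ap and a.ap != b.ap and c.timestamp - a.timestamp <= window_s:
                hits.append((client, a.ap, b.ap))
    return hits


def low_rssi_roams(events, threshold_dbm=-75):
    """Return associations where a client moved to a different AP below threshold_dbm."""
    hits = []
    for seq in _by_client(events).values():
        for prev, cur in zip(seq, seq[1:]):
            if cur.ap != prev.ap and cur.rssi_dbm < threshold_dbm:
                hits.append(cur)
    return hits


# Made-up sample data, for illustration only.
sample = [
    Association("ipad-1", "AP-classroom", -48, 0),
    Association("ipad-1", "AP-hallway", -79, 40),
    Association("ipad-1", "AP-classroom", -50, 95),
]
print(find_ping_pong(sample))
print(low_rssi_roams(sample))
```

Counting such hits per AP might help confirm whether it really is always the same APs that are involved.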
As you're still running 8.3.x for legacy reasons, contact TAC to get the latest escalation build, there were some bugs fixed, which could show the issue you have:
I owe you a reply.
Prior to this thread, I've been in contact with TAC, and my next step is to upgrade to the latest compatible version - 126.96.36.199. I am, however, a little concerned, because this problem seems to be present on different controllers with different hardware platforms.
Well. Anyway, thanks for your reply.