08-04-2023 01:32 PM - edited 08-04-2023 01:33 PM
Cisco 5520 in HA running 8.10.151 with 150 2802i access points. Local mode, local controller, central switching. WPA2 AES PSK. Fast roaming over the air (OTA). 802.11v/k enabled. Session timeout 28800 seconds. 5 GHz only, 20 MHz channels, all UNII-1/2/2e/3 channels enabled. No DFS issues. No auth/roaming problems. SNR is above voice grade.
We have over 10,000 of these Zebra TC51 Android 6.0 devices in our warehouse deployment. I have other branches running the same specs, but they are not reporting any issue.
Issue: users of the Zebra TC51 Android 6.0 devices at our New York branch are complaining about losing connection to their picking app. The RF handhelds show they are connected to Wi-Fi, but users have to toggle Wi-Fi to reconnect. I currently have an RF handheld in this "broken" state: it is in RUN state with an IP address, but it is not pingable. Everything at the core level checks out fine. It happens randomly to devices on Android 6.0; Android 8 devices do not have the problem.
The obvious answer is the one the problem description already points to: the Android version. This won't appease management, however, because they will ask why it is working at other branches. Cisco TAC has not provided many answers yet; I am still waiting for more information.
What could be causing this? Has anyone seen this before? The debugs even show the device roaming from AP to AP, but you cannot use it to send data and you cannot ping it. DNAC Intelligent Capture shows zero issues.
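To put a number on the "RUN state but not pingable" symptom across a large fleet of handhelds, one option is to cross-check the controller's client list against a Layer 3 probe. This is a minimal sketch, assuming the client records (MAC, IP, policy state) have already been exported from the WLC by some means; `ClientRecord`, `find_stranded_clients`, and the injected `probe` callable are hypothetical names, and the probe is left pluggable rather than tied to any particular ping library.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ClientRecord:
    """Hypothetical client entry exported from the WLC client table."""
    mac: str
    ip: str
    state: str  # e.g. "RUN" once the client is fully associated and authenticated


def find_stranded_clients(
    clients: Iterable[ClientRecord],
    probe: Callable[[str], bool],
) -> List[ClientRecord]:
    """Return clients the WLC reports in RUN state that fail a Layer 3 probe.

    `probe` takes an IP address and returns True if the host answered
    (for example, a wrapper around the OS ping command).
    """
    stranded = []
    for client in clients:
        # Only RUN-state clients that already hold an address are expected to answer.
        if client.state.upper() != "RUN" or not client.ip:
            continue
        if not probe(client.ip):
            stranded.append(client)
    return stranded


if __name__ == "__main__":
    # Illustrative data only; replace with a real export and a real probe.
    sample = [
        ClientRecord("aa:bb:cc:00:00:01", "10.1.1.10", "RUN"),
        ClientRecord("aa:bb:cc:00:00:02", "10.1.1.11", "RUN"),
        ClientRecord("aa:bb:cc:00:00:03", "", "DHCP_REQD"),
    ]
    reachable = {"10.1.1.10"}
    for c in find_stranded_clients(sample, lambda ip: ip in reachable):
        print(f"{c.mac} {c.ip}: associated (RUN) but not answering")
```

Tracking that count per day, per AP, and per Android version would show whether the stranded clients cluster on specific APs or only on the Android 6.0 builds.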
08-04-2023 02:18 PM - edited 08-04-2023 02:18 PM
What’s the difference between this branch and the other branches where they are working? Are they on different controllers with different software/APs/settings?
You should upgrade WLC software since that’s several versions and many bug fixes behind. 8.10.185.0 is recommended now.
*Cue Leo to discuss the many issues with 2800s on AireOS*
08-04-2023 02:58 PM
Looking at your issue, it is clearly some compatibility issue or a bug (client-side or AP-side). I would always look at the client side first in this scenario.
1. Is it possible to upgrade the OS on those devices (from Android 6 to a newer release)?
2. If you have the same Android 6 client devices at other sites on the same WLC/AP hardware and firmware, and they do not experience the problem, there is little possibility of an infrastructure-side issue. Still, from a best-practice point of view, as others suggested, I would go with the 8.10.185.0 code upgrade to see if that helps.
HTH
Rasika
08-04-2023 07:24 PM
This is exactly what I am thinking. Where I am currently at, management would rather move on to solutions than spend time chasing answers. My next step is a code upgrade; otherwise, there is no option but to expedite Android 8.0 (which they are due for in a few months).
08-04-2023 06:41 PM - edited 08-04-2023 06:44 PM
@thewifidude wrote:
The RF shows they are connected to WiFi but they have to toggle it to re-connect.
Let me guess:
The 2800/3800/4800/1560 belong to the same "family" and share one common component: the MARVELL Wi-Fi chipset.
Over the years, people have reported bugs about the 2800/3800/4800/1560 randomly dropping packets, such as DHCP, authentication, and voice traffic. Because most people who reported the issue were on AireOS (and the 2800/3800/4800/1560 are approaching their end-of-support date), the easiest way to "fix" this problem was to encourage people to migrate to IOS-XE (which translates into sales).
As people slowly transition from AireOS to IOS-XE, I am seeing Bug IDs reporting very similar issues in IOS-XE, such as CSCwh03842.
The list of 2800/3800/4800/1560 Bug IDs can be found HERE.
The "mega bug" CSCwa73245 talks about turning off MU-MIMO, and some bugs recommend turning off WMM as a workaround.
08-04-2023 07:21 PM
Actually, they all get IP addresses and they roam just fine. Rebooting the access points did not resolve the issue; they reported the problem again the very next day. The problem is strictly Layer 3. I have debugs from the device while it is unreachable: requesting DHCP, roaming, the 4-way handshake, etc. It's just that Layer 3 stops.
This is debug data of a roam while the device wasn't reachable. The odd part is that I was told they took an RF device to another branch and it didn't have the issue. I am leaning towards something in the Catalyst 9500 core. But again, I'm torn, because Android 8 on the same device doesn't experience the issue. The bugs you listed, however, are a good find, as I also have IOS-XE deployments.
08-04-2023 07:32 PM - edited 08-04-2023 07:36 PM
@thewifidude wrote:
I have debugs from the device while it is non-reachable, requesting DHCP, roaming, 4 way handshake etc. It's just layer 3 stops.
Go through the list of Bug IDs that I have compiled. The behaviour described really sounds like the MARVELL chipset hardware defect making its presence known.
There is really no (permanent) fix. No amount of upgrading or downgrading the WLC firmware will fix it. This is a hardware design fault and nothing can be done (unless you are a "whale"). Upgrading to the latest Catalyst 9k is not a fix either, because we are not sure what other people will be reporting in the coming months or years. The 2800/3800/4800/1560 chipset is made by MARVELL. The Catalyst 9120/9124 and below use Broadcom chipsets, and the Catalyst 9130 and 916X use Qualcomm. And we were told that programming the Broadcom chips is "challenging". For example, have a look at CSCwh12413: the AP in question is a 9120 (Broadcom), and this behaviour is very much like one of the bugs affecting the 2800/3800/4800/1560.
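To gauge how much of an estate falls into the Marvell-based family described in this post, an inventory of AP model strings can be bucketed by chipset vendor. This is a minimal sketch: the prefix table reflects only the models and vendors named in this post (it is not an official Cisco mapping), `chipset_vendor` is a hypothetical helper, and any model not listed returns `None`.

```python
import re
from typing import Optional

# (model-number prefix, chipset vendor), taken only from the models named in this post.
# This is not an official Cisco mapping.
_FAMILY_PREFIXES = [
    ("28", "Marvell"),    # 2800 series, e.g. 2802I
    ("38", "Marvell"),    # 3800 series
    ("48", "Marvell"),    # 4800
    ("156", "Marvell"),   # 1560 series, e.g. 1562
    ("9120", "Broadcom"),
    ("9124", "Broadcom"),
    ("9130", "Qualcomm"),
    ("916", "Qualcomm"),  # 9162/9164/9166 ("916X")
]


def chipset_vendor(model: str) -> Optional[str]:
    """Map an AP model string such as 'AIR-AP2802I-B-K9' to a chipset vendor."""
    match = re.search(r"(\d{4})", model)  # first 4-digit run is the model number
    if match is None:
        return None
    number = match.group(1)
    for prefix, vendor in _FAMILY_PREFIXES:
        if number.startswith(prefix):
            return vendor
    return None


if __name__ == "__main__":
    inventory = ["AIR-AP2802I-B-K9", "C9120AXI-B", "CW9164I-ROW", "AIR-AP1852I-B-K9"]
    for model in inventory:
        print(model, chipset_vendor(model))
```

Anything that comes back as `None` simply was not covered in this discussion and would need to be checked against Cisco's own documentation.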
08-04-2023 07:48 PM
I'll be honest, I went down a rabbit hole with all those bugs and it gave me anxiety, knowing this would likely impact my whole environment. @Scott Fella is right: we know the Fusion drivers on Android 6 are extremely out of date. It works at another branch, but I was not included in validating that. At this point, I have to quantify the time spent on an issue that has been going on since the branch opened and suggest they expedite the upgrade.
I will reach out to Cisco for comment on this because it's deeply concerning. The 9120 console bug (which never got filed) is, I suspect, the reason you can now disable the serial port in the AP Join Profile on the new 9800s. This issue, which is not documented anywhere, prevented us from executing any commands on a 9120 access point that had an Ethernet cable longer than 7 ft connected to the serial port.
I'm cracking the beer for now. Thanks to you all for the insights.
08-04-2023 07:37 PM
From reading this thread: in my experience, it has always been an issue with the device, and almost always the issue was with the firmware or custom firmware on the device. Testing a few at a different location is a great idea, especially if they test for a few weeks with devices that have been reported as not working. You do have to be part of that testing so you can validate it. Even if they get a few working devices from another location, whether users report issues or not, that can help you narrow things down even more.
08-05-2023 07:47 AM
To eliminate those known bugs (as much as possible), upgrade to 8.10.185.3, which replaces 8.10.185.0 (mentioned above by @eglinsky2012) because DFS is broken on those APs in 8.10.185.0 (and in 8.10.151.0, for that matter).
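When several controllers need checking against that target, comparing release strings as plain text can mislead (for example, "8.9" sorts after "8.10" alphabetically), so a numeric field-by-field comparison is safer. This is a minimal sketch: `needs_upgrade` and `parse_version` are hypothetical helpers, and the target value is simply the release named in this post, not necessarily Cisco's current recommendation.

```python
from typing import Tuple

# Target release named in this thread; confirm against Cisco's current recommendation.
RECOMMENDED = "8.10.185.3"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted AireOS release string such as '8.10.151.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(current: str, target: str = RECOMMENDED) -> bool:
    """True if `current` is older than `target`, compared numerically field by field."""
    cur, tgt = parse_version(current), parse_version(target)
    # Pad the shorter tuple with zeros so '8.10.151' compares like '8.10.151.0'.
    width = max(len(cur), len(tgt))
    cur += (0,) * (width - len(cur))
    tgt += (0,) * (width - len(tgt))
    return cur < tgt


if __name__ == "__main__":
    for release in ["8.10.151", "8.10.185.0", "8.10.185.3"]:
        print(release, "upgrade" if needs_upgrade(release) else "ok")
```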