05-21-2025 01:09 AM
Since we upgraded to 17.12.5 on our 9800 I see these logs from 9120 APs:
...Rx stuck detected,doing phy forcecal for radio 1
Has anyone else noticed this or know what they mean? It doesn't sound good.
05-21-2025 01:53 AM
- @talasgair - Looks like a bug; report it to TAC.
- You could try rebooting an affected AP and check whether the problem persists.
- Check if clients are affected using:
https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/217738-monitor-catalyst-9800-kpis-key-performa.html#toc-hId-866973845
M.
05-21-2025 06:44 PM
Could you please share the output of -
show controllers dot11Radio 1 reset
show flash crash
show flash cores
From the problem AP.
05-21-2025 11:44 PM
Here is the output from one of the problem APs.
05-21-2025 11:53 PM
@talasgair
There are problems with the Broadcom drivers https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwn27877
I believe the log comes from https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwk12169 which did not resolve the problem.
In the absence of an actual fix I think they've put something in the code to try to detect the Rx stuck condition (the radio Rx queue is full and stops receiving frames) and restart the radio. Clients which are already associated will have a dead service while the queue is stuck (no response from the AP, so they will eventually time out) and until after the radio is restarted. New clients cannot associate while the queue is stuck.
Do you see any clients connected to the 5GHz radio when you see these logs?
If there are no clients, do a shut/no shut on the radio (or reload the AP) and then see whether it's working again (but it can fail again within hours or days).
05-22-2025 12:46 PM
@Rich R wrote: @talasgair
There are problems with the Broadcom drivers https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwn27877
I believe the log comes from https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwk12169 which did not resolve the
Sounds like an issue I had last year affecting 9166, 9136, and 9130 when we were on 17.9.4a/APSP8. At the time, the issue was attributed to CSCwj45141 or CSCwk48338. Ultimately, upgrading to 17.12.4/APSP2 solved it, and we're still fine as of 17.12.4/APSP6.
Anyway, I hope that it's not the same issue returning to 17.12.5.
05-27-2025 10:32 AM
Sadly, the bug CSCwk48338 you noted was updated yesterday and now includes 17.12.5 as affected, so unfortunately it appears there's a regression on that one.
05-22-2025 01:40 AM
@Rich R There are no clients on these APs when I see the log. I just assumed that there were no clients on these APs because they are mostly in quiet areas, e.g. basements or areas that are usually unoccupied, but maybe it is because the radio is stuck. I also see the frequency of these logs increase overnight, and from more APs, when there are fewer people on site.
05-22-2025 07:17 AM
This is something that you need to validate. Don't assume that it's because of the log or because it's in a quiet area. Put a device there so that it can connect to that AP and see what happens over time. On Windows machines, you can use netsh wlan show wlanreport to get a history of the device's wireless connection over time. Correlating the controller logs with the netsh report can help you determine whether that log really coincides with dropped client connections or not.
My opinion is, if you upgrade (anything) and then you start having issues or seeing logs that were not there before, you need to open a TAC case and roll back. Keep your users happy and never drag things out unless there were already issues prior to the upgrade.
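The correlation suggested above could be sketched roughly like this in Python. This is only an illustration: it assumes you have already extracted the timestamps of the "Rx stuck detected" log lines and of the client disconnects yourself (the function name, the two-minute window, and the input lists are all hypothetical, not something the controller or netsh produces in this form).

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def correlate(
    log_times: List[datetime],
    drop_times: List[datetime],
    window: timedelta = timedelta(minutes=2),  # assumed tolerance, tune to taste
) -> List[Tuple[datetime, datetime]]:
    """Pair each AP log time with every client drop that falls within `window` of it."""
    pairs = []
    for log_t in log_times:
        for drop_t in drop_times:
            if abs(drop_t - log_t) <= window:
                pairs.append((log_t, drop_t))
    return pairs
```

An empty result over several days of data would suggest the log is not lining up with client drops on that test device; a long list of pairs would be worth attaching to the TAC case.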
05-23-2025 03:15 AM
We have a script running every 10 minutes doing "sh ap summ load-info" so if we see >1 clients on 2.4 GHz radio (slot 0) but zero on 5GHz radio (slot 1) then it's probably stuck and we do a shut/no shut on the 5GHz radio:
ap name <ap-name> dot11 5ghz shut
ap name <ap-name> no dot11 5ghz shut
That is the quickest way to get it working again.
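For anyone wanting to build something similar, the decision logic described above might look like the following Python sketch. It is not our actual script. It assumes the per-slot client counts have already been parsed into a dictionary (the ClientCounts shape, the function names, and the AP names are hypothetical), and it deliberately leaves out the parsing of the command output and the SSH session to the WLC.

```python
from typing import Dict, List

# Hypothetical input shape: AP name -> {slot number: client count}.
# Parsing the real "sh ap summ load-info" output is left out on purpose,
# because its exact column layout is not shown in this thread.
ClientCounts = Dict[str, Dict[int, int]]


def suspect_stuck_aps(counts: ClientCounts, min_24ghz_clients: int = 2) -> List[str]:
    """Return APs with clients on slot 0 (2.4 GHz) but none on slot 1 (5 GHz).

    The default of 2 mirrors the ">1 clients" rule mentioned above.
    """
    suspects = []
    for ap_name, slots in sorted(counts.items()):
        if slots.get(0, 0) >= min_24ghz_clients and slots.get(1, 0) == 0:
            suspects.append(ap_name)
    return suspects


def bounce_commands(ap_name: str) -> List[str]:
    """Build the shut / no shut pair quoted above for one AP."""
    return [
        f"ap name {ap_name} dot11 5ghz shut",
        f"ap name {ap_name} no dot11 5ghz shut",
    ]


if __name__ == "__main__":
    sample = {"ap-basement": {0: 3, 1: 0}, "ap-lobby": {0: 5, 1: 12}}
    for ap in suspect_stuck_aps(sample):
        print("\n".join(bounce_commands(ap)))
```

Keeping the detection separate from the command generation makes it easy to log the suspects for a while before letting the script actually bounce any radios.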
TAC and BU were not able to suggest any better method of detecting the problem which scales well.
The most reliable way is by logging in to each AP (or running remote AP commands from the WLC with the results going into the WLC logs, which is messy) and checking the radio stats directly: if you check show interfaces dot11Radio 1 a few times and you see that FCS errors are incrementing but none of the other Rx counters are incrementing, that means the Rx queue is stuck and the AP is not receiving any frames from the radio.
If you look at an Over The Air (OTA) capture you see clients trying to talk to the AP and zero response from the AP, because the AP never receives the client frames. You still see the AP beaconing as normal (which is why clients try to join) because the Tx is still working fine.
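The counter check described above (FCS errors incrementing while the other Rx counters stay flat) can be expressed as a small Python sketch. It assumes you have already collected a few snapshots of the Rx counters into dictionaries. The key names used here, such as rx_fcs_errors, are made up for illustration and are not the labels printed by show interfaces dot11Radio 1.

```python
from typing import Dict, List


def rx_looks_stuck(snapshots: List[Dict[str, int]], fcs_key: str = "rx_fcs_errors") -> bool:
    """Compare the first and last snapshot of Rx counters.

    Returns True only if the FCS error counter grew while every other
    counter in the snapshot stayed exactly the same.
    """
    if len(snapshots) < 2:
        return False  # one sample cannot show a trend
    first, last = snapshots[0], snapshots[-1]
    fcs_growing = last[fcs_key] > first[fcs_key]
    others_flat = all(last[key] == first[key] for key in first if key != fcs_key)
    return fcs_growing and others_flat
```

For example, two snapshots taken a minute apart where only the FCS counter moved would return True, while a healthy radio with incrementing data and management counters would return False.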
05-22-2025 04:00 PM
Reboot the APs daily -- This is going to be the new "fix" going forward.
05-23-2025 03:42 AM
- @Leo Laohoo > "Reboot the APs daily -- This is going to be the new "fix" going forward."
I can't agree, at all actually; there are many places which need round-the-clock wireless service.
For instance we have a chip factory with FABs in 24-hour production, hospitals, warehouses,
airports and numerous others. Perhaps if Cisco would also provide an approach such
as in flex upgrades, where APs can be rebooted in a pattern such that some coverage is always kept
and clients can hop to an available AP, it would be feasible.
Better is for them to fix the bugs,
M.
05-23-2025 03:56 AM - edited 05-23-2025 03:58 AM
I agree with the sentiments, and "reboot the APs daily" is not a sustainable solution. However, it is faster (and easier) to reboot the APs daily than to wait for Cisco to come up with a solution or fix.
And when I say "wait for Cisco to come up with a solution or fix", I am talking about 5 to 10 years away, if they promised to fix it.
My latest "product enhancement request" (without an "executive support") took, at the very least, 4 years. And it would have taken a lot longer had it not been for a "whale" leaning on Cisco -- and the solution is not even an APSP nor an APDP!
05-23-2025 06:51 AM
I hate to agree, but I have also had to write automation to find these and reboot the AP or radio. Back in the day I had Prime run reports on client count to find issues like this, but it seems like you might need to pull this info into a DB so you can filter by client count and determine what you do next.
05-27-2025 11:01 AM
Has anyone had luck with the radio monitoring settings found in the AP join profile? It seems to be a feature that should reset radios if there is no increment in the Tx and Rx statistics. I haven't seen much improvement so far using it.