Re: C9120 Rx Stuck using 17.12.5

talasgair · 05-21-2025

Since we upgraded our 9800 to 17.12.5, I see these logs from 9120 APs:

...Rx stuck detected,doing phy forcecal for radio 1

Has anyone else noticed this or know what they mean? It doesn't sound good.

marce1000 · 05-21-2025

- @talasgair - Looks like a bug; report it to TAC.
- You could try rebooting an affected AP and check whether the problem persists.
- Check whether clients are affected using:
https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/217738-monitor-catalyst-9800-kpis-key-performa.html#toc-hId-866973845

M.

-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
And the mirror then always responds with ' The only thing that exceeds your brilliance is your beauty! '

Saikat Nandy · 05-21-2025

Could you please share the output of the following commands from the problem AP?

show controllers dot11Radio 1 reset
show flash crash
show flash cores

talasgair · 05-21-2025

@Saikat Nandy

Here is the output from one of the problem APs.

Rich R · 05-21-2025

@talasgair
There are problems with the Broadcom drivers https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwn27877
I believe the log comes from https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwk12169, which did not resolve the problem.
In the absence of an actual fix, I think they've added something to the code that tries to detect the Rx stuck condition (the radio Rx queue is full and stops receiving frames) and then restarts the radio. Associated clients get no service from the moment the queue gets stuck until the radio is restarted (the AP doesn't respond and they eventually time out). New clients cannot associate while the queue is stuck.

Do you see any clients connected to the 5GHz radio when you see these logs?
If there are no clients, do a shut/no shut on the radio (or reload the AP) and see whether it starts working again (but it can fail again within hours or days).

------------------------------
Please click Helpful if this post helped you and Select as Solution (drop down menu at top right of this reply) if this answered your query.
------------------------------
TAC recommended codes for AireOS WLC's and TAC recommended codes for 9800 WLC's
Best Practices for AireOS WLC's, Best Practices for 9800 WLC's and Cisco Wireless compatibility matrix
Check your 9800 WLC config with Wireless Config Analyzer using "show tech wireless" output or "config paging disable" then "show run-config" output on AireOS and use Wireless Debug Analyzer to analyze your WLC client debugs
Field Notice: FN63942 APs and WLCs Fail to Create CAPWAP Connections Due to Certificate Expiration
Field Notice: FN72424 Later Versions of WiFi 6 APs Fail to Join WLC - Software Upgrade Required
Field Notice: FN72524 IOS APs stuck in downloading state after 4 Dec 2022 due to Certificate Expired
- Fixed in 8.10.196.0, latest 9800 releases, 8.5.182.12 (8.5.182.13 for 3504) and 8.5.182.109 (IRCM, 8.5.182.111 for 3504)
Field Notice: FN70479 AP Fails to Join or Joins with 1 Radio due to Country Mismatch, RMA needed
How to avoid boot loop due to corrupted image on Wave 2 and Catalyst 11ax Access Points (CSCvx32806)
Field Notice: FN74035 - Wave2 APs DFS May Not Detect Radar After Channel Availability Check Time
Leo's list of bugs affecting 2800/3800/4800/1560 APs
Default AP console baud rate from 17.12.x is 115200 - introduced by CSCwe88390

eglinsky2012 · 05-22-2025

@Rich R wrote:
@talasgair
There are problems with the Broadcom drivers https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwn27877
I believe the log comes from https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwk12169 which did not resolve the

Sounds like an issue I had last year affecting 9166, 9136, and 9130 when we were on 17.9.4a/APSP8. At the time, the issue was attributed to CSCwj45141 or CSCwk48338. Ultimately, upgrading to 17.12.4/APSP2 solved it, and we're still fine as of 17.12.4/APSP6.

Anyway, I hope that it's not the same issue returning to 17.12.5.

jasondodge · 05-27-2025

Sadly, the bug CSCwk48338 you noted was updated yesterday and now includes 17.12.5 as affected, so unfortunately it appears there's a regression on that one.

talasgair · 05-22-2025

@Rich R There are no clients on these APs when I see the log. I had assumed that was because they are mostly in quiet areas, e.g. basements or areas that are usually unoccupied, but maybe it is because the radio is stuck. I also see these logs become more frequent overnight, and come from more APs, when there are fewer people on site.

Scott Fella · 05-22-2025

This is something that you need to validate. Don't assume that the missing clients are because of the log or because it's a quiet area; put a device there so that it can connect to that AP and see what happens over time. On Windows machines, you can run netsh wlan show wlanreport to get a history of the device's wireless connections over time. Correlating the controller logs with that report can help you determine whether that log is indeed dropping client connections.
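The correlation Scott describes can be sketched as a simple time-window match. This is a minimal Python sketch, assuming you have already extracted the "Rx stuck detected" timestamps from the WLC/AP logs and the disconnect timestamps from the netsh wlan show wlanreport output (parsing is not shown), and that both clocks agree (e.g. via NTP). The function name correlate_disconnects and the 10-minute default window are hypothetical choices, not anything Cisco or Microsoft defines.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def correlate_disconnects(
    rx_stuck: List[datetime],
    disconnects: List[datetime],
    window: timedelta = timedelta(minutes=10),  # hypothetical tolerance
) -> List[Tuple[datetime, datetime]]:
    """Pair each client disconnect with the earliest Rx-stuck log within `window` of it.

    Disconnects with no Rx-stuck log nearby are left out of the result.
    Both lists are assumed to be already parsed and on the same clock.
    """
    stuck_sorted = sorted(rx_stuck)
    pairs = []
    for d in sorted(disconnects):
        for r in stuck_sorted:
            if abs(d - r) <= window:
                pairs.append((d, r))
                break  # one match per disconnect is enough
    return pairs


if __name__ == "__main__":
    rx = [datetime(2025, 5, 22, 2, 0)]
    disc = [datetime(2025, 5, 22, 1, 55), datetime(2025, 5, 22, 3, 30)]
    for d, r in correlate_disconnects(rx, disc):
        print(f"disconnect {d} near Rx stuck log {r}")
```

If disconnects repeatedly land next to Rx-stuck messages, that supports the theory; disconnects with no nearby Rx-stuck log point somewhere else.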

My opinion is, if you upgrade (anything) and then start having issues or seeing logs that were not there before, you need to open a TAC case and roll back. Keep your users happy and never drag things out unless there were already issues prior to the upgrade.

-Scott
*** Please rate helpful posts ***

Rich R · 05-23-2025

We have a script running every 10 minutes that runs "sh ap summ load-info". If we see more than 1 client on the 2.4 GHz radio (slot 0) but zero on the 5 GHz radio (slot 1), the 5 GHz radio is probably stuck, so we do a shut/no shut on it:

ap name <ap-name> dot11 5ghz shut

ap name <ap-name> no dot11 5ghz shut

That is the quickest way to get it working again.
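The detection heuristic above can be sketched in Python. This assumes the per-AP client counts for slot 0 (2.4 GHz) and slot 1 (5 GHz) have already been parsed out of the "sh ap summ load-info" output; the parsing, and how the commands are actually sent to the WLC, are not shown and depend on your tooling. The AP names and function names are hypothetical; only the shut/no shut commands come from the post.

```python
from typing import Iterable, List, Tuple

# Each row: (ap_name, clients on slot 0 / 2.4 GHz, clients on slot 1 / 5 GHz), already parsed.
LoadRow = Tuple[str, int, int]


def find_suspect_aps(rows: Iterable[LoadRow]) -> List[str]:
    """Return APs with more than 1 client on 2.4 GHz but none on 5 GHz."""
    return [ap for ap, slot0, slot1 in rows if slot0 > 1 and slot1 == 0]


def bounce_5ghz_commands(ap_name: str) -> List[str]:
    """The shut / no shut pair from the post, for one AP's 5 GHz radio."""
    return [
        f"ap name {ap_name} dot11 5ghz shut",
        f"ap name {ap_name} no dot11 5ghz shut",
    ]


if __name__ == "__main__":
    polled = [("AP-B1", 3, 0), ("AP-B2", 1, 0), ("AP-L1", 4, 6)]
    for ap in find_suspect_aps(polled):
        for cmd in bounce_5ghz_commands(ap):
            print(cmd)
```

AP-B2 is not flagged because a single 2.4 GHz client with nothing on 5 GHz is common enough in a quiet area that it is weak evidence of a stuck radio.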
TAC and the BU were not able to suggest any better detection method that scales well. The most reliable way is to check the radio stats directly, either by logging in to each AP or by running remote AP commands from the WLC with the results going into the WLC logs (which is messy).

If you run show interfaces dot11Radio 1 a few times and see that FCS errors are incrementing but none of the other Rx counters are, the Rx queue is stuck and the AP is not receiving any frames from the radio. In an Over The Air (OTA) capture you see clients trying to talk to the AP and zero response from the AP, because the AP never receives the client frames. You still see the AP beaconing as normal (which is why clients try to join) because Tx is still working fine.
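The counter check described above amounts to comparing two snapshots of the radio's Rx counters. A minimal sketch, assuming you have already parsed show interfaces dot11Radio 1 into a dict of counter name to value; the key names used here (fcs_errors, rx_frames, rx_mgmt) are hypothetical placeholders, not the AP's actual labels, and the dict should contain only Rx counters.

```python
from typing import Mapping


def rx_queue_looks_stuck(
    before: Mapping[str, int],
    after: Mapping[str, int],
    fcs_key: str = "fcs_errors",  # hypothetical key name
) -> bool:
    """True if FCS errors increased while every other Rx counter stayed flat.

    `before` and `after` are two snapshots of the same Rx counters taken a
    few seconds apart; both must contain the same keys.
    """
    fcs_increased = after[fcs_key] > before[fcs_key]
    others_flat = all(after[k] == before[k] for k in before if k != fcs_key)
    return fcs_increased and others_flat


if __name__ == "__main__":
    t0 = {"fcs_errors": 10, "rx_frames": 500, "rx_mgmt": 40}
    t1 = {"fcs_errors": 25, "rx_frames": 500, "rx_mgmt": 40}
    print(rx_queue_looks_stuck(t0, t1))
```

Taking more than two snapshots and requiring the pattern in each interval would reduce false positives on a genuinely idle radio.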

Leo Laohoo · 05-22-2025

Reboot the APs daily -- This is going to be the new "fix" going forward.

marce1000 · 05-23-2025

- @Leo Laohoo >....Reboot the APs daily This is going to be the new "fix" going forward.
I can't agree at all, actually; there are many places that need 24/7 wireless service.
For instance, we have a chip factory with fabs in 24-hour production, and there are hospitals, warehouses,
airports and numerous others. It might be feasible if Cisco also provided an approach like
flex upgrades, where APs are rebooted in a pattern that always keeps some coverage
so clients can hop to an available AP.
Better still would be for them to fix the bugs.
M.

Leo Laohoo · 05-23-2025

I agree with the sentiments and "reboot the APs daily" is not a sustainable solution, however, it is faster (and easier) to reboot the APs daily than wait for Cisco to come up with a solution or fix.

And when I say "wait for Cisco to come up with a solution or fix", I am talking about 5 to 10 years away, if they promised to fix it.

My latest "product enhancement request" (without "executive support") took at least 4 years. And it would have taken a lot longer had it not been for a "whale" leaning on Cisco -- and the solution is neither an APSP nor an APDP!

Scott Fella · 05-23-2025

I hate to agree, but I have also had to write automation to find these and reboot the AP or radio. Back in the day I had Prime run reports on client count to find issues like this, but it seems you might need to pull this info into a DB so you can filter by client count and determine what to do next.
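Pulling the client counts into a DB might look like this with Python's built-in sqlite3 module: store each poll of per-AP client counts, then flag APs that matched the Rx-stuck heuristic from Rich R's script (more than 1 client on 2.4 GHz, zero on 5 GHz) in each of the last N polls, which filters out APs that are only briefly empty. The table layout, function names and the N-consecutive-polls rule are hypothetical design choices; collecting the counts from the WLC is not shown.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS load_info (
               polled_at  TEXT    NOT NULL,  -- ISO 8601, so text order = time order
               ap_name    TEXT    NOT NULL,
               clients_24 INTEGER NOT NULL,  -- slot 0 (2.4 GHz)
               clients_5  INTEGER NOT NULL,  -- slot 1 (5 GHz)
               PRIMARY KEY (polled_at, ap_name))"""
    )
    return conn


def record_poll(conn: sqlite3.Connection, polled_at: str,
                rows: Iterable[Tuple[str, int, int]]) -> None:
    """Store one poll: rows are (ap_name, clients_24, clients_5)."""
    conn.executemany(
        "INSERT INTO load_info VALUES (?, ?, ?, ?)",
        [(polled_at, ap, c24, c5) for ap, c24, c5 in rows],
    )
    conn.commit()


def persistently_suspect(conn: sqlite3.Connection, last_n: int = 3) -> List[str]:
    """APs matching the heuristic in every one of the last `last_n` polls."""
    query = """
        WITH recent AS (
            SELECT DISTINCT polled_at FROM load_info
            ORDER BY polled_at DESC LIMIT ?
        )
        SELECT ap_name FROM load_info
        WHERE polled_at IN (SELECT polled_at FROM recent)
          AND clients_24 > 1 AND clients_5 = 0
        GROUP BY ap_name
        HAVING COUNT(*) = (SELECT COUNT(*) FROM recent)
        ORDER BY ap_name
    """
    return [row[0] for row in conn.execute(query, (last_n,))]
```

An AP missing from one of the recent polls is not flagged, which errs on the side of not bouncing radios on incomplete data.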

-Scott

jasondodge · 05-27-2025

Has anyone had luck with the radio monitoring settings found in the AP join profile? It seems to be a feature that should reset radios if there is no increment in the Tx and Rx statistics. I haven't seen much improvement so far using it.
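The exact criteria, intervals and thresholds behind the join-profile radio monitoring feature are not stated in this thread, so the following is only one hypothetical reading of "no increment in the Tx and Rx statistics": flag a stall when neither counter changes across N consecutive samples. It is not Cisco's implementation, and stall_detected and its parameters are illustrative names. Under this reading, a radio whose Rx queue is stuck but which keeps beaconing (Rich R notes Tx keeps working) would keep its Tx counter moving and would not be flagged; whether the real feature behaves that way is unconfirmed.

```python
from typing import Sequence, Tuple


def stall_detected(samples: Sequence[Tuple[int, int]], intervals: int) -> bool:
    """samples: (tx_count, rx_count) readings, oldest first.

    True only if neither counter changed across the last `intervals` intervals.
    """
    if intervals < 1 or len(samples) < intervals + 1:
        return False  # not enough history yet
    window = samples[-(intervals + 1):]
    # Each consecutive pair must be identical in both Tx and Rx.
    return all(curr == prev for prev, curr in zip(window, window[1:]))


if __name__ == "__main__":
    fully_stalled = [(100, 50), (100, 50), (100, 50)]
    rx_only_stalled = [(100, 50), (110, 50), (120, 50)]  # still beaconing
    print(stall_detected(fully_stalled, 2), stall_detected(rx_only_stalled, 2))
```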

Source: https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-12/config-guide/b_wl_17_12_cg/m_ap_crash_file_upload_ewlc.html#info-ap-real-time-statistics