off-channel TX or RX timeout and hostapd failed logs

Ajit Pai · ‎11-22-2023

Hi community,

I'm looking for people willing to share their experience with these types of logs on AP model 9115 running 17.9.4a code:

Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout

Nov 21 09:46:58 kernel: [*11/21/2023 09:46:58.7638] Sending Msg:2 in mode:4 to hostapd failed

- Access points that repeatedly log this message:

Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout

systematically do not accept clients. We see heavy retransmissions during both authentication and association, which prevents any client from connecting.

- Clients (iPhone, Samsung, Windows 10) attempting to connect to the access point go into a "stall", causing significant issues, disassociations and suboptimal roaming. The access points are unusable in that area. We see the same problem with Zebra Android handhelds at another location, so it is not isolated to individual clients: it is reproducible across all client types.

The circumstances of the problem are:

- Multiple L2 authentication types. SSIDs tested: open authentication, WPA2 or WPA2/WPA3 + FT, and WPA2 only.

- Access point C9115AXI-E in "stall" shows old clients (-90 dBm and 0Mbps) that have not been connected to the network for several days. We do not see them on the 9800 controller, only through SSH on the access point.

mon-2b-ap-177#sh dot11 clients
Total dot11 clients: 2
Client MAC         Slot ID  WLAN ID  AID  WLAN Name    RSSI  Maxrate  is_wgb_wired
84:E3:42:B5:A2:36  0        5        0    HOSPITALITY  -90   0Mbps    No
D8:1F:12:CF:DB:37  0        5        0    HOSPITALITY  -90   0Mbps    No

- Simultaneously, on the same access point, we observe many logs of the following type (mon-2b-ap-177 - 172.30.190.177 (172.30.190.177) -- 11-57.txt)

   Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout, ch = 11
   Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[1]: off-channel RX timeout, ch = 116

- From the rrm/off-channel scanning debug, we notice a repetition of these logs, which I am attaching for completeness (mon-2b-ap-177 - 172.30.190.177 (172.30.190.177) -- 11-57.txt)

   Nov 16 11:02:33 kernel: [*11/16/2023 11:02:33.0650] wcp/radio0_off_channel :: OffChannel req_service_notify: report ID 33555758 status 3 FAILED - Error
   Nov 16 11:02:37 kernel: [*11/16/2023 11:02:37.9830] wcp/radio1_off_channel :: OffChannel req_service_notify: report ID 33557439 status 4 FAILED - Tuned

- The same "FAILED" type debug logs and "off-channel timeout" logs are present on all problematic access points.

- We have compared the logs with other functioning access points, and the same debug as before returns "SUCCESS."
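
For anyone comparing a failing AP against a working one, the log patterns above can be tallied with a short script. This is only a rough sketch: the regular expressions are built from the sample lines in this post, the function name `summarize` is my own, and I am assuming that the "SUCCESS" lines on healthy APs use the same layout as the "FAILED" ones.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... DOT11_DRV[0]: off-channel TX timeout, ch = 11
# The ", ch = N" suffix is optional because some lines in the thread omit it.
TIMEOUT_RE = re.compile(
    r"DOT11_DRV\[(?P<slot>\d+)\]: off-channel (?P<dir>TX|RX) timeout"
    r"(?:, ch = (?P<ch>\d+))?"
)

# Matches lines such as:
#   ... wcp/radio0_off_channel :: OffChannel req_service_notify:
#       report ID 33555758 status 3 FAILED - Error
# Assumption: SUCCESS lines share this layout.
STATUS_RE = re.compile(
    r"wcp/radio(?P<slot>\d+)_off_channel :: OffChannel req_service_notify: "
    r"report ID \d+ status (?P<code>\d+) (?P<result>[A-Z]+)"
)


def summarize(lines):
    """Count off-channel timeouts and scan results per radio slot."""
    timeouts = Counter()  # keyed by (slot, "TX" or "RX")
    results = Counter()   # keyed by (slot, "FAILED" or "SUCCESS")
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[(int(m["slot"]), m["dir"])] += 1
            continue
        m = STATUS_RE.search(line)
        if m:
            results[(int(m["slot"]), m["result"])] += 1
    return timeouts, results


if __name__ == "__main__":
    sample = [
        "Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout, ch = 11",
        "Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[1]: off-channel RX timeout, ch = 116",
        "Nov 16 11:02:33 kernel: [*11/16/2023 11:02:33.0650] wcp/radio0_off_channel :: OffChannel req_service_notify: report ID 33555758 status 3 FAILED - Error",
    ]
    timeouts, results = summarize(sample)
    for key, count in sorted(timeouts.items()):
        print("timeout", key, count)
    for key, count in sorted(results.items()):
        print("result", key, count)
```

Running it against the saved log file from each AP should make it quick to see which radios only ever report FAILED.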

What we know for sure is:

- We have identified the 9115 access points that freeze, and the problem is only resolved by rebooting or, as we have seen today, by a hard reset.

- The frequent logs during this AP freeze phase are consistently:

  - Hostapd logs (those we investigated this morning)

  - Off-channel TX and RX timeout logs (those we investigated last week) are also present after the update to release 17.9.4a.

- Every time the AP freezes, two random clients with a signal of -90 appear in the client list.
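
Since the stale entries seem to be a reliable marker of the freeze, the "sh dot11 clients" output could be checked automatically. The sketch below is only an illustration: the function names are made up, the column order is taken from the output pasted above, and the -90 / 0Mbps criteria are simply what we observed, not a documented threshold.

```python
import re

MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def parse_dot11_clients(text):
    """Parse 'sh dot11 clients' output into a list of dicts.

    Assumes the column order shown in this thread:
    MAC, Slot ID, WLAN ID, AID, WLAN Name, RSSI, Maxrate, is_wgb_wired.
    """
    clients = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a MAC address; header and prompt lines do not.
        if len(fields) < 8 or not MAC_RE.fullmatch(fields[0]):
            continue
        mac, slot, wlan_id, aid = fields[:4]
        rssi, maxrate, wgb = fields[-3:]
        clients.append({
            "mac": mac,
            "slot": int(slot),
            "wlan_id": int(wlan_id),
            "aid": int(aid),
            # Anything between AID and RSSI is the WLAN name (may contain spaces).
            "wlan_name": " ".join(fields[4:-3]),
            "rssi": int(rssi),
            "maxrate": maxrate,
            "is_wgb_wired": wgb,
        })
    return clients


def suspect_clients(clients, rssi_floor=-90):
    """Return entries that look stale: RSSI at or below the floor and 0Mbps."""
    return [
        c for c in clients
        if c["rssi"] <= rssi_floor and c["maxrate"] == "0Mbps"
    ]


if __name__ == "__main__":
    output = """mon-2b-ap-177#sh dot11 clients
Total dot11 clients: 2
Client MAC Slot ID WLAN ID AID WLAN Name RSSI Maxrate is_wgb_wired
84:E3:42:B5:A2:36 0 5 0 HOSPITALITY -90 0Mbps No
D8:1F:12:CF:DB:37 0 5 0 HOSPITALITY -90 0Mbps No
"""
    for client in suspect_clients(parse_dot11_clients(output)):
        print(client["mac"], client["wlan_name"], client["rssi"])
```

Comparing the MAC addresses it reports against the controller's client list would confirm whether the entries exist only on the AP.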

Thank you. Regards.

marce1000 · ‎11-22-2023

- Could this be related to a specific client brand or model (e.g. WiFi driver bugs) using the mentioned APs?

M.

-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
When the mirror will then always respond to me with ' The only thing that exceeds your brilliance is your beauty! '

Rich R · ‎11-23-2023

We have seen a variety of similar problems. Best to open a TAC case so the TAC engineer can collect all the relevant debugs, packet captures, radioactive trace logs etc. for the dev team while the AP is in the failed state with clients attempting to connect.
It will also be useful to know if the problem is cleared by "ap name <apname> reset capwap" (much quicker than a reboot and resolves a number of the current open bugs for this type of problem).
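
If several APs are affected, the reset commands can be generated rather than typed by hand. This is just a sketch under assumptions: the per-AP timeout counts, the threshold of 5 and the second AP name are hypothetical and would come from your own log analysis. Only the "ap name <apname> reset capwap" command itself is the real one mentioned above.

```python
def reset_capwap_commands(timeout_counts, threshold=5):
    """Build one reset command per AP whose timeout count meets the threshold.

    timeout_counts: dict mapping AP name -> number of off-channel timeout
    log lines seen for that AP (hypothetical input).
    threshold: minimum count before an AP is considered stuck (assumption).
    """
    return [
        f"ap name {ap} reset capwap"
        for ap, count in sorted(timeout_counts.items())
        if count >= threshold
    ]


if __name__ == "__main__":
    # Example counts; replace with real figures from your own logs.
    counts = {"mon-2b-ap-177": 42, "example-healthy-ap": 0}
    for command in reset_capwap_commands(counts):
        print(command)
```

Review the list before pasting it into the WLC, since each command drops the clients on that AP.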

> Off-channel TX and RX timeout logs (those we investigated last week) are also present after the update to release 17.9.4a.
I don't think the AP code changed between 17.9.4 and 17.9.4a so what were you updating from?

------------------------------
Please click Helpful if this post helped you and Select as Solution (drop down menu at top right of this reply) if this answered your query.
------------------------------
TAC recommended codes for AireOS WLC's and TAC recommended codes for 9800 WLC's
Best Practices for AireOS WLC's, Best Practices for 9800 WLC's and Cisco Wireless compatibility matrix
Check your 9800 WLC config with Wireless Config Analyzer using "show tech wireless" output or "config paging disable" then "show run-config" output on AireOS and use Wireless Debug Analyzer to analyze your WLC client debugs
Field Notice: FN63942 APs and WLCs Fail to Create CAPWAP Connections Due to Certificate Expiration
Field Notice: FN72424 Later Versions of WiFi 6 APs Fail to Join WLC - Software Upgrade Required
Field Notice: FN72524 IOS APs stuck in downloading state after 4 Dec 2022 due to Certificate Expired
- Fixed in 8.10.196.0, latest 9800 releases, 8.5.182.12 (8.5.182.13 for 3504) and 8.5.182.109 (IRCM, 8.5.182.111 for 3504)
Field Notice: FN70479 AP Fails to Join or Joins with 1 Radio due to Country Mismatch, RMA needed
How to avoid boot loop due to corrupted image on Wave 2 and Catalyst 11ax Access Points (CSCvx32806)
Field Notice: FN74035 - Wave2 APs DFS May Not Detect Radar After Channel Availability Check Time
Leo's list of bugs affecting 2800/3800/4800/1560 APs
Default AP console baud rate from 17.12.x is 115200 - introduced by CSCwe88390

Ajit Pai · ‎11-23-2023

Hi,

Thank you. We are working the problem with TAC and are waiting for them to replicate the bug in the lab. We will try resetting CAPWAP. We upgraded from 17.6.4 to 17.9.3 to patch a bug, and then from 17.9.3 to 17.9.4a.
Regards.

Rich R · ‎11-23-2023

It will be extremely difficult for TAC to reproduce it so better to get all the relevant data from the live fault so they can pass that to dev team.


eglinsky2012 · ‎11-30-2023

Is APSP6 for 17.9.4a installed? I'm not sure it's relevant to this issue, but it is good practice and likely something TAC will tell you to do, if you haven't already, before in-depth troubleshooting.

Rich R · ‎11-30-2023

I second that - we installed APSP6 last night.
