off-channel TX or RX timeout and hostapd failed logs

Ajit Pai
Level 1

Hi community,

I'm looking for anyone willing to share their experience with these types of logs on AP model 9115 running 17.9.4a code:

Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout
Nov 21 09:46:58 kernel: [*11/21/2023 09:46:58.7638] Sending Msg:2 in mode:4 to hostapd failed 

- The access points that repeatedly log

Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout

consistently stop accepting clients. They generate heavy retransmissions during both association and authentication, which prevents any client from connecting.

- Clients (iPhone, Samsung, Windows 10) attempting to connect to the access point go into a "stall", causing disassociations and poor roaming. The access points are unusable in that area. The same problem occurs with Zebra Android handhelds at another site, so it is not isolated to individual clients: it is reproducible across all client types.

The circumstances of the problem are:

- Multiple L2 authentication types are affected. SSIDs tested: open authentication, WPA2 or WPA2/WPA3 with FT, and WPA2 only.

- A C9115AXI-E access point in the "stall" state shows stale clients (-90 dBm, 0 Mbps) that have not been connected to the network for several days. We do not see them on the 9800 controller, only via SSH on the access point itself.

mon-2b-ap-177#sh dot11 clients
Total dot11 clients: 2
Client MAC         Slot ID  WLAN ID  AID  WLAN Name    RSSI  Maxrate  is_wgb_wired
84:E3:42:B5:A2:36  0        5        0    HOSPITALITY  -90   0Mbps    No
D8:1F:12:CF:DB:37  0        5        0    HOSPITALITY  -90   0Mbps    No

- Simultaneously, on the same access point, we observe many logs of the following type (mon-2b-ap-177 - 172.30.190.177 (172.30.190.177) -- 11-57.txt)

   Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout, ch = 11
   Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[1]: off-channel RX timeout, ch = 116

- The RRM/off-channel scanning debug shows these logs repeating. I am attaching it for completeness (mon-2b-ap-177 - 172.30.190.177 (172.30.190.177) -- 11-57.txt):

   Nov 16 11:02:33 kernel: [*11/16/2023 11:02:33.0650] wcp/radio0_off_channel :: OffChannel req_service_notify: report ID 33555758 status 3 FAILED - Error
   Nov 16 11:02:37 kernel: [*11/16/2023 11:02:37.9830] wcp/radio1_off_channel :: OffChannel req_service_notify: report ID 33557439 status 4 FAILED - Tuned

- The same "FAILED" type debug logs and "off-channel timeout" logs are present on all problematic access points.

- We have compared the logs with other functioning access points, and the same debug as before returns "SUCCESS."
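When comparing many APs, counting these messages per radio can help separate the problematic APs from the healthy ones. Below is a minimal Python sketch, assuming the kernel log lines look exactly like the excerpts quoted in this post. The regexes and the `summarize_off_channel` helper are hypothetical, not Cisco tooling; the SUCCESS variant of the notify line is not shown in this thread and is only assumed to share the same layout, and logs from other releases may need adjusted patterns.

```python
import re
from collections import Counter

# Hypothetical patterns, built only from the log excerpts quoted in this thread.
TIMEOUT_RE = re.compile(r"DOT11_DRV\[(\d+)\]: off-channel (TX|RX) timeout")
NOTIFY_RE = re.compile(
    r"radio(\d+)_off_channel :: OffChannel req_service_notify: "
    r"report ID \d+ status \d+ (\w+)"
)
HOSTAPD_RE = re.compile(r"Sending Msg:\d+ in mode:\d+ to hostapd failed")


def summarize_off_channel(lines):
    """Count off-channel timeouts, notify results and hostapd send failures."""
    counts = Counter()
    for line in lines:
        if m := TIMEOUT_RE.search(line):
            counts[(f"radio{m.group(1)}", f"{m.group(2)} timeout")] += 1
        elif m := NOTIFY_RE.search(line):
            # Word after the status code, e.g. FAILED (SUCCESS assumed on healthy APs).
            counts[(f"radio{m.group(1)}", m.group(2))] += 1
        elif HOSTAPD_RE.search(line):
            counts[("hostapd", "send failed")] += 1
    return counts


sample = [
    "Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout, ch = 11",
    "Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[1]: off-channel RX timeout, ch = 116",
    "Nov 16 11:02:33 kernel: [*11/16/2023 11:02:33.0650] wcp/radio0_off_channel :: OffChannel req_service_notify: report ID 33555758 status 3 FAILED - Error",
    "Nov 16 11:02:37 kernel: [*11/16/2023 11:02:37.9830] wcp/radio1_off_channel :: OffChannel req_service_notify: report ID 33557439 status 4 FAILED - Tuned",
    "Nov 21 09:46:58 kernel: [*11/21/2023 09:46:58.7638] Sending Msg:2 in mode:4 to hostapd failed",
]

counts = summarize_off_channel(sample)
for key, n in sorted(counts.items()):
    print(key, n)
```

Running the same function over a log from a working AP should, going by the comparison described above, show SUCCESS entries and no timeouts.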

What we know for sure is:

- We have identified which 9115 access points freeze. The problem is only resolved by a reboot or, as we saw today, by a hard reset.

- The frequent logs during this AP freeze phase are consistently:

  - Hostapd logs (those we investigated this morning)

  - Off-channel TX and RX timeout logs (those we investigated last week) are also present after the update to release 17.9.4a.

- Every time the AP freezes, two random clients with an RSSI of -90 dBm appear in the client list.
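To check for these stale entries without reading the table by eye, a minimal Python sketch can parse the `sh dot11 clients` output captured over SSH and flag rows with a very weak RSSI and a 0 Mbps max rate. It assumes the 8-column layout of the capture in this post and WLAN names without spaces; the `find_stale_clients` name and the -90 dBm threshold are illustrative choices, not part of any Cisco tool.

```python
import re

# A MAC address in the colon-separated form used by the AP CLI output.
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def find_stale_clients(show_output, rssi_floor=-90):
    """Return (mac, wlan_name, rssi) for rows at or below rssi_floor with a 0Mbps max rate."""
    stale = []
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows have exactly 8 columns and start with a MAC; header/summary lines are skipped.
        if len(fields) != 8 or not MAC_RE.fullmatch(fields[0]):
            continue
        mac, _slot, _wlan_id, _aid, wlan_name, rssi, maxrate, _wgb = fields
        if int(rssi) <= rssi_floor and maxrate == "0Mbps":
            stale.append((mac, wlan_name, int(rssi)))
    return stale


output = """Total dot11 clients: 2
Client MAC Slot ID WLAN ID AID WLAN Name RSSI Maxrate is_wgb_wired
84:E3:42:B5:A2:36 0 5 0 HOSPITALITY -90 0Mbps No
D8:1F:12:CF:DB:37 0 5 0 HOSPITALITY -90 0Mbps No
"""

for mac, wlan, rssi in find_stale_clients(output):
    print(mac, wlan, rssi)
```

The flagged MACs can then be compared by hand with the client list on the 9800 controller; in this thread, entries visible only on the AP side coincided with the freeze.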

Thank you. Regards.


marce1000
Hall of Fame

- Could this be related to a specific client brand/model (e.g. Wi-Fi driver bugs) connecting to the mentioned APs?

M.

Rich R
VIP

We have seen a variety of similar problems. It is best to open a TAC case so the TAC engineer can collect all the relevant debugs, packet captures, RadioActive trace logs, etc. for the dev team while the AP is in the failed state with clients attempting to connect.
It will also be useful to know whether the problem is cleared by "ap name <apname> reset capwap" (much quicker than a reboot, and it resolves a number of the currently open bugs for this type of problem).

> Off-channel TX and RX timeout logs (those we investigated last week) are also present after the update to release 17.9.4a.
I don't think the AP code changed between 17.9.4 and 17.9.4a, so what were you upgrading from?

Hi,

Thank you. We are working on the problem with TAC and are waiting for them to replicate the bug in the lab. We will try resetting CAPWAP. We upgraded from 17.6.4 to 17.9.3 to patch a bug, and then from 17.9.3 to 17.9.4a.
Regards.

It will be extremely difficult for TAC to reproduce it, so it is better to collect all the relevant data from the live fault so they can pass it to the dev team.

eglinsky2012
Spotlight

Is APSP6 for 17.9.4a installed? I'm not sure it's relevant to this issue, but it is good practice and likely something TAC will tell you to do, if you haven't already, before in-depth troubleshooting.

I second that - we installed APSP6 last night.
