11-22-2023 01:57 AM - edited 11-30-2023 01:25 AM
Hi community,
I'm looking for people willing to share their experience with this type of log on the 9115 AP model running 17.9.4a code:
Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout
Nov 21 09:46:58 kernel: [*11/21/2023 09:46:58.7638] Sending Msg:2 in mode:4 to hostapd failed
- Access points that repeat this log
Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout
consistently stop accepting clients. We see heavy retransmissions during both association and authentication, which prevents any client from connecting.
- Clients (iPhone, Samsung, Windows 10) attempting to connect to the access point go into a "stall", which causes significant issues, disassociations and suboptimal roaming. The access points are unusable in that area. The same problem occurs with Zebra Android handhelds at another location, so the problem is not isolated to individual clients: it is reproducible across all client types.
The circumstances of the problem are:
- Multiple L2 authentication types. SSIDs tested: open authentication, WPA2 or WPA2/WPA3 + FT, and WPA2 only.
- An access point C9115AXI-E in "stall" shows old clients (RSSI -90 dBm and 0Mbps) that have not been connected to the network for several days. We do not see them on the 9800 controller, only through SSH on the access point.
mon-2b-ap-177#sh dot11 clients
Total dot11 clients: 2
Client MAC         Slot ID  WLAN ID  AID  WLAN Name    RSSI  Maxrate  is_wgb_wired
84:E3:42:B5:A2:36  0        5        0    HOSPITALITY  -90   0Mbps    No
D8:1F:12:CF:DB:37  0        5        0    HOSPITALITY  -90   0Mbps    No
- Simultaneously, on the same access point, we observe many logs of the following type (mon-2b-ap-177 - 172.30.190.177 (172.30.190.177) -- 11-57.txt)
Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout, ch = 11
Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[1]: off-channel RX timeout, ch = 116
- From the rrm/off-channel scanning debug, we notice a repetition of these logs, which I am attaching for completeness (mon-2b-ap-177 - 172.30.190.177 (172.30.190.177) -- 11-57.txt)
Nov 16 11:02:33 kernel: [*11/16/2023 11:02:33.0650] wcp/radio0_off_channel :: OffChannel req_service_notify: report ID 33555758 status 3 FAILED - Error
Nov 16 11:02:37 kernel: [*11/16/2023 11:02:37.9830] wcp/radio1_off_channel :: OffChannel req_service_notify: report ID 33557439 status 4 FAILED - Tuned
- The same "FAILED" type debug logs and "off-channel timeout" logs are present on all problematic access points.
- We compared these logs with those from functioning access points, where the same debug returns "SUCCESS."
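To compare a problematic AP against a healthy one, the log lines quoted above can be tallied per radio. This is only a minimal sketch: the regular expressions are written against the exact samples pasted in this post, the function name `summarize_offchannel` is made up for illustration, and other releases may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived only from the log samples quoted in this thread (assumption:
# the wording is stable across the APs being compared).
TIMEOUT_RE = re.compile(
    r"DOT11_DRV\[(\d+)\]: off-channel (TX|RX) timeout(?:, ch = (\d+))?"
)
NOTIFY_RE = re.compile(
    r"wcp/radio(\d+)_off_channel :: OffChannel req_service_notify: "
    r"report ID \d+ status (\d+) (\w+)"
)


def summarize_offchannel(lines):
    """Count off-channel timeouts and scan results, keyed by radio slot."""
    timeouts = Counter()  # (slot, "TX" or "RX") -> count
    results = Counter()   # (slot, "FAILED" / "SUCCESS" / ...) -> count
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[(int(match.group(1)), match.group(2))] += 1
            continue
        match = NOTIFY_RE.search(line)
        if match:
            results[(int(match.group(1)), match.group(3))] += 1
    return timeouts, results


SAMPLE_LOG = """\
Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[0]: off-channel TX timeout, ch = 11
Nov 16 11:01:20 kernel: [*11/16/2023 11:01:20.6240] DOT11_DRV[1]: off-channel RX timeout, ch = 116
Nov 16 11:02:33 kernel: [*11/16/2023 11:02:33.0650] wcp/radio0_off_channel :: OffChannel req_service_notify: report ID 33555758 status 3 FAILED - Error
Nov 16 11:02:37 kernel: [*11/16/2023 11:02:37.9830] wcp/radio1_off_channel :: OffChannel req_service_notify: report ID 33557439 status 4 FAILED - Tuned
"""

if __name__ == "__main__":
    timeouts, results = summarize_offchannel(SAMPLE_LOG.splitlines())
    print("timeouts:", dict(timeouts))
    print("scan results:", dict(results))
```

Running the same function over the saved console log of a working AP and a stalled AP gives a quick side-by-side count of FAILED versus SUCCESS results per radio.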
What we know for sure is:
- We have identified the 9115 access points that freeze, and the problem is resolved only by rebooting or, as we saw today, by a hard reset.
- The frequent logs during this AP freeze phase are consistently:
- Hostapd logs (those we investigated this morning)
- Off-channel TX and RX timeout logs (those we investigated last week) are also present after the update to release 17.9.4a.
- Every time the AP freezes, two random clients with a signal of -90 appear in the client list.
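Since the stale -90 entries show up every time an AP freezes, the `sh dot11 clients` output could be checked for them automatically. The sketch below is an illustration only: the function name `stale_clients` and the thresholds (RSSI at or below -90 with a 0Mbps rate) are assumptions taken from the single output pasted in this post, and the row pattern assumes WLAN names without spaces.

```python
import re

# One client row of "sh dot11 clients", as pasted in this thread.
# Assumption: the WLAN name contains no whitespace.
ROW_RE = re.compile(
    r"^(?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})\s+"
    r"(?P<slot>\d+)\s+(?P<wlan_id>\d+)\s+(?P<aid>\d+)\s+"
    r"(?P<wlan>\S+)\s+(?P<rssi>-?\d+)\s+"
    r"(?P<rate>[\d.]+)Mbps\s+(?P<wgb>\S+)\s*$"
)


def stale_clients(output, rssi_floor=-90):
    """Return MACs of clients that look stale: very weak RSSI and a 0Mbps rate.

    The thresholds are a guess based on the symptom described in this post,
    not a documented indicator.
    """
    stale = []
    for line in output.splitlines():
        match = ROW_RE.match(line.strip())
        if not match:
            continue  # header, prompt, or "Total dot11 clients" line
        if int(match["rssi"]) <= rssi_floor and float(match["rate"]) == 0:
            stale.append(match["mac"])
    return stale


SAMPLE_OUTPUT = """\
mon-2b-ap-177#sh dot11 clients
Total dot11 clients: 2
Client MAC Slot ID WLAN ID AID WLAN Name RSSI Maxrate is_wgb_wired
84:E3:42:B5:A2:36 0 5 0 HOSPITALITY -90 0Mbps No
D8:1F:12:CF:DB:37 0 5 0 HOSPITALITY -90 0Mbps No
"""

if __name__ == "__main__":
    for mac in stale_clients(SAMPLE_OUTPUT):
        print("possible stale client:", mac)
```

An AP that reports such entries while the 9800 controller shows no matching clients would be a candidate for the frozen state described here.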
Thank you. Regards.
11-22-2023 04:00 AM
- Could this be related to a specific client brand or model (e.g. WiFi driver bugs) using the mentioned APs?
M.
11-23-2023 08:30 AM - edited 11-23-2023 08:33 AM
We have seen a variety of similar problems. It is best to open a TAC case so the TAC engineer can collect all the relevant debugs, packet captures, radioactive trace logs, etc. for the dev team while the AP is in the failed state with clients attempting to connect.
It will also be useful to know if the problem is cleared by "ap name <apname> reset capwap" (much quicker than a reboot and resolves a number of the current open bugs for this type of problem).
> Off-channel TX and RX timeout logs (those we investigated last week) are also present after the update to release 17.9.4a.
I don't think the AP code changed between 17.9.4 and 17.9.4a so what were you updating from?
11-23-2023 08:52 AM
Hi,
Thank you. We are working on the problem with TAC and are waiting for them to replicate the bug in the lab. We will try resetting CAPWAP. We upgraded from 17.6.4 to 17.9.3 to patch a bug, and then from 17.9.3 to 17.9.4a.
Regards.
11-23-2023 11:25 AM
It will be extremely difficult for TAC to reproduce it, so it is better to collect all the relevant data from the live fault so they can pass it to the dev team.
11-30-2023 06:10 AM
Is APSP6 for 17.9.4a installed? I'm not sure it's relevant to this issue, but it is good practice and likely something TAC will tell you to do, if you haven't already, before in-depth troubleshooting.
11-30-2023 06:25 AM
I second that - we installed APSP6 last night.