02-07-2023 04:47 AM
Hello Folks,
We are observing a strange issue where cisco C9120AXI-E disconnects from wlc 9800L which is in GCP intermittently through out the day.
We have a cisco sdwan overlay which connect GCP vedge to remote site vedge via ipsec tunnels
C9800-CL-K9_IOSXE Version 17.3.5a
Please help
02-07-2023 04:59 AM - edited 02-07-2023 05:01 AM
- If the AP's are behind the tunnel then capwap latency may get exceeded (max 300ms (corrected)) , below some general troubleshooting info;s added :
- For starters have a checkup of the current controller configuration with the CLI command : show tech wireless , have the output analyzed by https://cway.cisco.com/
Checkout : https://logadvisor.cisco.com/logadvisor/wireless/9800/9800APJoin
The important ones are summarized here
show wireless stats ap join summary
show wireless dtls connections
show platform hardware chassis active qfp feature wireless capwap datapath statistics drop all
show platform hardware chassis active qfp feature wireless dtls datapath statistics all
show platform hardware chassis active qfp feature wireless capwap datapath mac-address <APradio-mac> details
show platform hardware chassis active qfp feature wireless capwap datapath mac-address <APradio-mac>
statistics
show platform hardware chassis active qfp feature wireless wlclient datapath statistics drop all
And last but not least check the controller logs too
02-07-2023 06:11 AM
Also take a packet capture on the tunnel to see if it is dropping any packets to or from the ap. Check your mtu and tcp mss to also validate each end it configured properly.
02-12-2023 11:56 PM
We are facing this issue from only one site out of the 33
I have enabled the link latency
Sometimes the AP join time take close to 15-30 mins
02-13-2023 02:08 AM
sounds like you might have packet loss from that site then.
02-13-2023 03:32 AM
Hello All
I ran some capture on both WLC and cisco switch to which the aps are connected
unable to attach pcaps
AP ip: 172.30.54.20/21/22/23
WLc1 ip: 10.250.8.4 Wlc 2 ip: 10.250.8.5 in n+1 setup
Local Dhcp server for aps: 10.54.1.80 with opt 43
Time stamp when ap disconnects from wlc
Logs from AP
Feb 13 10:06:56 kernel: [*02/13/2023 10:06:56.5170] Re-Tx Count=1, Max Re-Tx Value=5, SendSeqNum=20, NumofPendingMsgs=3
Feb 13 10:06:56 kernel: [*02/13/2023 10:06:56.5170]
Feb 13 10:06:59 kernel: [*02/13/2023 10:06:59.3680] Re-Tx Count=2, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:06:59 kernel: [*02/13/2023 10:06:59.3680]
Feb 13 10:07:02 kernel: [*02/13/2023 10:07:02.2190] Re-Tx Count=3, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:07:02 kernel: [*02/13/2023 10:07:02.2190]
Feb 13 10:07:05 kernel: [*02/13/2023 10:07:05.0700] Re-Tx Count=4, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:07:05 kernel: [*02/13/2023 10:07:05.0700]
Feb 13 10:07:07 kernel: [*02/13/2023 10:07:07.9210] Re-Tx Count=5, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:07:07 kernel: [*02/13/2023 10:07:07.9210]
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] Max retransmission count exceeded, going back to DISCOVER mode.
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 208, eleLen = 216, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] ....TLV: TLV_FLEX_MCUC_PAYLOAD(281), level: 0, seq: 0, nested: true
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 388, eleLen = 396, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] ....TLV: TLV_FLEX_IPV6_MCUC_PAYLOAD(2460), level: 0, seq: 0, nested: true
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 14, eleLen = 22, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 14, eleLen = 22, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 6, len = 1368, eleLen = 1376, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] ...Vendor SubType: AP_PATH_MTU_PAYLOAD(73) len: 1364 vendId 409600
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Flexconnect Switching to Standalone Mode!
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.9150]
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.9150] CAPWAP State: DTLS Teardown
Feb 13 10:07:10 NCI: CLEANAIR: Slot 0 CAPWAP down
Feb 13 10:07:10 NCI: I0: shutdownNci
Feb 13 10:07:10 upgrade: Script called with args:[ABORT]
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.9880] upgrade.sh: Script called with args:[ABORT]
Feb 13 10:07:11 kernel: [*02/13/2023 10:07:11.0290] do ABORT, part1 is active part
Feb 13 10:07:11 upgrade: Cleanup tmp files ...
Feb 13 10:07:11 kernel: [*02/13/2023 10:07:11.0590] upgrade.sh: Cleanup tmp files ...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5430] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5560] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5600] systemd[1]: Starting DHCPv6 client...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5950] systemd[1]: Started DHCPv6 client.
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.6170] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.0620] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.0730] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.0770] systemd[1]: Starting DHCPv6 client...
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.1100] systemd[1]: Started DHCPv6 client.
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.1290] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.0630] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.0740] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.0780] systemd[1]: Starting DHCPv6 client...
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.1130] systemd[1]: Started DHCPv6 client.
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.1320] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.4780] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.4900] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.4950] systemd[1]: Starting DHCPv6 client...
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.5280] systemd[1]: Started DHCPv6 client.
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.5480] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.4800] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.4930] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.4960] systemd[1]: Starting DHCPv6 client...
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.5300] systemd[1]: Started DHCPv6 client.
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.5500] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.8940] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9050] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9090] systemd[1]: Starting DHCPv6 client...
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9430] systemd[1]: Started DHCPv6 client.
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9630] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.8960] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9080] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9120] systemd[1]: Starting DHCPv6 client...
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9450] systemd[1]: Started DHCPv6 client.
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9640] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3110] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3230] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3270] systemd[1]: Starting DHCPv6 client...
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3600] systemd[1]: Started DHCPv6 client.
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3800] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3130] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3250] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3290] systemd[1]: Starting DHCPv6 client...
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3630] systemd[1]: Started DHCPv6 client.
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3820] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7280] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7400] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7440] systemd[1]: Starting DHCPv6 client...
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7780] systemd[1]: Started DHCPv6 client.
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7970] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7290] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7410] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7450] systemd[1]: Starting DHCPv6 client...
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7790] systemd[1]: Started DHCPv6 client.
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.8030] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:11:47 kernel: [*02/13/2023 10:11:47.8550] Discovery Response from 10.250.8.4
Feb 13 10:11:47 kernel: [*02/13/2023 10:11:47.8560] Discovery Response from 10.250.8.5
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.0000]
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.0000] CAPWAP State: DTLS Setup
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.4970]
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.4970] CAPWAP State: Join
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.5000] Sending Join request to 10.250.8.4 through port 5265
Feb 13 10:12:03 kernel: [*02/13/2023 10:12:03.1140] Sending Join request to 10.250.8.4 through port 5265
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.8660] Sending Join request to 10.250.8.4 through port 5265
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.8990] Join Response from 10.250.8.4
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.8990] AC accepted join request with result code: 0
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9010] Received wlcType 0, timer 30
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760]
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760] CAPWAP State: Image Data
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760] AP image version 17.3.5.42 backup 17.3.2.32, Controller 17.3.5.42
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760] Version is the same, do not need update.
Feb 13 10:12:07 upgrade: Script called with args:[NO_UPGRADE]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0140] upgrade.sh: Script called with args:[NO_UPGRADE]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0550] do NO_UPGRADE, part1 is active part
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0600]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0600] CAPWAP State: Configure
Feb 13 10:12:08 capwapd[5288]: Check lagloadbalance setting flex mode 1 cfg 0 linkstate 1 ap_type 86
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0790] capwapd: reading file /click/nss_lag_control/lacp_state: No such file or directory
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0790] capwapd: reading file /click/nss_lag_control/capwap_state: No such file or directory
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.6290]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.6290] CAPWAP State: Run
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.7340] AP has joined controller nfe-gcp01-wc01
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.8350] Flexconnect Switching to Connected Mode!
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.6180] Previous AP mode is 2, change to 2
Feb 13 10:12:09 capwapd[5288]: Check lagloadbalance setting flex mode 1 cfg 0 linkstate 1 ap_type 86
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.6370] capwapd: reading file /click/nss_lag_control/lacp_state: No such file or directory
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.6370] capwapd: reading file /click/nss_lag_control/capwap_state: No such file or directory
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.8420] systemd[1]: Starting Cisco syslogd watcher...
12-26-2023 11:45 PM
experiencing same issue here
02-13-2023 04:54 AM - edited 02-13-2023 04:55 AM
And that confirms you have packet loss which is why you're getting retransmissions.
So you need to work out why that site is suffering from packet loss - line quality, not enough bandwidth etc etc.
The APs dropping off is a symptom - not the root cause of the problem.
If you have QOS in place then CAPWAP control (5246) should be prioritised and not dropped with other user traffic (5247)
02-13-2023 06:49 AM
Hello
I have enabled link latency on all aps but i dont find any latency or packet loss issue on site
02-13-2023 07:12 AM
AP link latency doesn't show packet loss.
You need to troubleshoot that the old fashioned way.
02-13-2023 08:58 AM
If you are load-balancing the traffic across multiple WAN links, I would advise to disable and check. Make sure that on the transport tunnel, BFD path MTU discovery is used.
If not open a TAC case with SD-WAN team and ask them to collaborate with a wireless TAC engineer.
02-14-2023 11:30 PM
No we are not load balancing its and active-standby setup
Its already enabled BFD path MTU discovery
02-21-2023 06:19 AM
Do i need to worry about these malformed packets
These are seen during disconnections
02-21-2023 07:04 AM - edited 02-21-2023 07:05 AM
Hard to say without seeing the detail of the packets - need to see why Wireshark thinks they're malformed. Sometimes Wireshark can think a packet is malformed simply because the protocol dissector it's using doesn't recognise something in the protocol which is why there are protocol updates in every release of Wireshark.
But it is a striking coincidence if they occur at the same time as the drops so it probably is related. Packet captures at each end might reveal the problem - if the packet is good when leaving the AP but malformed when arriving at the WLC then you know something in the path is corrupting the packet. This can happen whenever a device is rewriting a packet - for example NAT. If it's already corrupted when leaving the AP then it's a bug in the AP software generating the malformed packet and only Cisco can help you with that. FYI: Note that some of the 18xx AP hardware failures (usually dead radio) result in malformed CAPWAP packets (missing whole required parts) but that would usually be on the CAPWAP control (5246) not CAPWAP data (5247) and seen at join time.
06-08-2023 08:05 AM
Hello All
The issue is resolved now or may i say we found a workaround.
The Aps are connected to a stack of 2x9300 switches.
Earlier the aps where connected to master switch and often get disconnected
I moved them to standby now and are working perfectly fine
I know its strange but it works for me now
Discover and save your favorite ideas. Come back to expert answers, step-by-step guides, recent topics, and more.
New here? Get started with these tips. How to use Community New member guide