cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
3813
Views
15
Helpful
18
Replies

Cisco C9120AXI-E get disconnected from wlc c9800L intermittently

gaurav mahajan1
Level 1
Level 1

Hello Folks,

We are observing a strange issue where cisco C9120AXI-E disconnects from wlc 9800L which is in GCP intermittently through out the day.
We have a cisco sdwan overlay which connect GCP vedge to remote site vedge via ipsec tunnels

gauravmahajan1_0-1675773807991.pnggauravmahajan1_1-1675773853798.png

 



C9800-CL-K9_IOSXE Version 17.3.5a

Please help 


18 Replies 18

marce1000
Hall of Fame
Hall of Fame

 

 - If  the AP's are behind the tunnel  then capwap latency may get exceeded (max 300ms (corrected))  , below some general troubleshooting info;s added :

  
    - For starters have a checkup of the current controller configuration  with the CLI command : show  tech   wireless , have the output analyzed by  https://cway.cisco.com/tools/WirelessAnalyzer/  , please note do not use classical show tech-support (short version) , use the command denoted in green for Wireless Analyzer.               Checkout all advisories!

                  Checkout : https://logadvisor.cisco.com/logadvisor/wireless/9800/9800APJoin

                      The important ones are summarized here 
                            show wireless stats ap join summary
                            show wireless dtls connections
                            show platform hardware chassis active qfp feature wireless capwap datapath statistics drop all
                            show platform hardware chassis active qfp feature wireless dtls datapath statistics all 
                            show platform hardware chassis active qfp feature wireless capwap datapath mac-address <APradio-mac> details
                           show platform hardware chassis active qfp feature wireless capwap datapath mac-address <APradio-mac>
statistics
                            show platform hardware chassis active qfp feature wireless wlclient datapath statistics drop all 

   And last but not least check the controller logs too



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    When the mirror will then always repond to me with ' The only thing that exceeds your brilliance is your beauty! '

Scott Fella
Hall of Fame
Hall of Fame

Also take a packet capture on the tunnel to see if it is dropping any packets to or from the ap.  Check your mtu and tcp mss to also validate each end it configured properly.

-Scott
*** Please rate helpful posts ***

gaurav mahajan1
Level 1
Level 1

We are facing this issue from only one site out of the 33 
I have enabled the link latency 
Sometimes the AP join time take close to 15-30 mins

gauravmahajan1_0-1676274960317.png

 

Rich R
VIP
VIP

sounds like you might have packet loss from that site then.

gaurav mahajan1
Level 1
Level 1

Hello All

I ran some capture on both WLC and cisco switch to which the aps are connected
unable to attach pcaps  

AP ip: 172.30.54.20/21/22/23
WLc1 ip: 10.250.8.4 Wlc 2 ip: 10.250.8.5 in n+1 setup
Local Dhcp server for aps: 10.54.1.80 with opt 43

Time stamp when ap disconnects from wlc

gauravmahajan1_0-1676286691675.png

gauravmahajan1_1-1676286749972.png

Logs from AP
Feb 13 10:06:56 kernel: [*02/13/2023 10:06:56.5170] Re-Tx Count=1, Max Re-Tx Value=5, SendSeqNum=20, NumofPendingMsgs=3
Feb 13 10:06:56 kernel: [*02/13/2023 10:06:56.5170]
Feb 13 10:06:59 kernel: [*02/13/2023 10:06:59.3680] Re-Tx Count=2, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:06:59 kernel: [*02/13/2023 10:06:59.3680]
Feb 13 10:07:02 kernel: [*02/13/2023 10:07:02.2190] Re-Tx Count=3, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:07:02 kernel: [*02/13/2023 10:07:02.2190]
Feb 13 10:07:05 kernel: [*02/13/2023 10:07:05.0700] Re-Tx Count=4, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:07:05 kernel: [*02/13/2023 10:07:05.0700]
Feb 13 10:07:07 kernel: [*02/13/2023 10:07:07.9210] Re-Tx Count=5, Max Re-Tx Value=5, SendSeqNum=21, NumofPendingMsgs=4
Feb 13 10:07:07 kernel: [*02/13/2023 10:07:07.9210]
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] Max retransmission count exceeded, going back to DISCOVER mode.
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 208, eleLen = 216, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] ....TLV: TLV_FLEX_MCUC_PAYLOAD(281), level: 0, seq: 0, nested: true
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7710] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 388, eleLen = 396, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] ....TLV: TLV_FLEX_IPV6_MCUC_PAYLOAD(2460), level: 0, seq: 0, nested: true
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 14, eleLen = 22, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 34, len = 14, eleLen = 22, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Dropping msg CAPWAP_WTP_EVENT_REQUEST, type = 6, len = 1368, eleLen = 1376, sendSeqNum = 22
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] ...Vendor SubType: AP_PATH_MTU_PAYLOAD(73) len: 1364 vendId 409600
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.7720] Flexconnect Switching to Standalone Mode!
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.9150]
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.9150] CAPWAP State: DTLS Teardown
Feb 13 10:07:10 NCI: CLEANAIR: Slot 0 CAPWAP down
Feb 13 10:07:10 NCI: I0: shutdownNci
Feb 13 10:07:10 upgrade: Script called with args:[ABORT]
Feb 13 10:07:10 kernel: [*02/13/2023 10:07:10.9880] upgrade.sh: Script called with args:[ABORT]
Feb 13 10:07:11 kernel: [*02/13/2023 10:07:11.0290] do ABORT, part1 is active part
Feb 13 10:07:11 upgrade: Cleanup tmp files ...
Feb 13 10:07:11 kernel: [*02/13/2023 10:07:11.0590] upgrade.sh: Cleanup tmp files ...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5430] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5560] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5600] systemd[1]: Starting DHCPv6 client...
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.5950] systemd[1]: Started DHCPv6 client.
Feb 13 10:07:25 kernel: [*02/13/2023 10:07:25.6170] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.0620] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.0730] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.0770] systemd[1]: Starting DHCPv6 client...
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.1100] systemd[1]: Started DHCPv6 client.
Feb 13 10:08:05 kernel: [*02/13/2023 10:08:05.1290] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.0630] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.0740] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.0780] systemd[1]: Starting DHCPv6 client...
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.1130] systemd[1]: Started DHCPv6 client.
Feb 13 10:08:15 kernel: [*02/13/2023 10:08:15.1320] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.4780] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.4900] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.4950] systemd[1]: Starting DHCPv6 client...
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.5280] systemd[1]: Started DHCPv6 client.
Feb 13 10:08:54 kernel: [*02/13/2023 10:08:54.5480] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.4800] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.4930] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.4960] systemd[1]: Starting DHCPv6 client...
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.5300] systemd[1]: Started DHCPv6 client.
Feb 13 10:09:04 kernel: [*02/13/2023 10:09:04.5500] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.8940] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9050] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9090] systemd[1]: Starting DHCPv6 client...
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9430] systemd[1]: Started DHCPv6 client.
Feb 13 10:09:43 kernel: [*02/13/2023 10:09:43.9630] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.8960] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9080] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9120] systemd[1]: Starting DHCPv6 client...
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9450] systemd[1]: Started DHCPv6 client.
Feb 13 10:09:53 kernel: [*02/13/2023 10:09:53.9640] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3110] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3230] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3270] systemd[1]: Starting DHCPv6 client...
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3600] systemd[1]: Started DHCPv6 client.
Feb 13 10:10:33 kernel: [*02/13/2023 10:10:33.3800] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3130] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3250] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3290] systemd[1]: Starting DHCPv6 client...
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3630] systemd[1]: Started DHCPv6 client.
Feb 13 10:10:43 kernel: [*02/13/2023 10:10:43.3820] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7280] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7400] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7440] systemd[1]: Starting DHCPv6 client...
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7780] systemd[1]: Started DHCPv6 client.
Feb 13 10:11:22 kernel: [*02/13/2023 10:11:22.7970] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7290] systemd[1]: Starting dhcpv6 client watcher...
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7410] systemd[1]: Stopping DHCPv6 client...
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7450] systemd[1]: Starting DHCPv6 client...
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.7790] systemd[1]: Started DHCPv6 client.
Feb 13 10:11:32 kernel: [*02/13/2023 10:11:32.8030] systemd[1]: Started dhcpv6 client watcher.
Feb 13 10:11:47 kernel: [*02/13/2023 10:11:47.8550] Discovery Response from 10.250.8.4
Feb 13 10:11:47 kernel: [*02/13/2023 10:11:47.8560] Discovery Response from 10.250.8.5
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.0000]
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.0000] CAPWAP State: DTLS Setup
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.4970]
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.4970] CAPWAP State: Join
Feb 13 10:11:58 kernel: [*02/13/2023 10:11:58.5000] Sending Join request to 10.250.8.4 through port 5265
Feb 13 10:12:03 kernel: [*02/13/2023 10:12:03.1140] Sending Join request to 10.250.8.4 through port 5265
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.8660] Sending Join request to 10.250.8.4 through port 5265
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.8990] Join Response from 10.250.8.4
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.8990] AC accepted join request with result code: 0
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9010] Received wlcType 0, timer 30
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760]
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760] CAPWAP State: Image Data
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760] AP image version 17.3.5.42 backup 17.3.2.32, Controller 17.3.5.42
Feb 13 10:12:07 kernel: [*02/13/2023 10:12:07.9760] Version is the same, do not need update.
Feb 13 10:12:07 upgrade: Script called with args:[NO_UPGRADE]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0140] upgrade.sh: Script called with args:[NO_UPGRADE]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0550] do NO_UPGRADE, part1 is active part
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0600]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0600] CAPWAP State: Configure
Feb 13 10:12:08 capwapd[5288]: Check lagloadbalance setting flex mode 1 cfg 0 linkstate 1 ap_type 86
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0790] capwapd: reading file /click/nss_lag_control/lacp_state: No such file or directory
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.0790] capwapd: reading file /click/nss_lag_control/capwap_state: No such file or directory
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.6290]
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.6290] CAPWAP State: Run
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.7340] AP has joined controller nfe-gcp01-wc01
Feb 13 10:12:08 kernel: [*02/13/2023 10:12:08.8350] Flexconnect Switching to Connected Mode!
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.6180] Previous AP mode is 2, change to 2
Feb 13 10:12:09 capwapd[5288]: Check lagloadbalance setting flex mode 1 cfg 0 linkstate 1 ap_type 86
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.6370] capwapd: reading file /click/nss_lag_control/lacp_state: No such file or directory
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.6370] capwapd: reading file /click/nss_lag_control/capwap_state: No such file or directory
Feb 13 10:12:09 kernel: [*02/13/2023 10:12:09.8420] systemd[1]: Starting Cisco syslogd watcher...

experiencing same issue here

Chrioni Siakla
Network Infrastructure– Senior Network specialist

Lihir Island New Ireland Province
PO Box 789 Port Moresby 121
Papua New Guinea

NEWMONT.COM

Rich R
VIP
VIP

And that confirms you have packet loss which is why you're getting retransmissions.

So you need to work out why that site is suffering from packet loss - line quality, not enough bandwidth etc etc.

The APs dropping off is a symptom - not the root cause of the problem.

If you have QOS in place then CAPWAP control (5246) should be prioritised and not dropped with other user traffic (5247)

gaurav mahajan1
Level 1
Level 1

Hello

I have enabled link latency on all aps but i dont find any latency or packet loss issue on site

gauravmahajan1_0-1676299718627.png

 

Rich R
VIP
VIP

AP link latency doesn't show packet loss.
You need to troubleshoot that the old fashioned way.

Arshad Safrulla
VIP Alumni
VIP Alumni

If you are load-balancing the traffic across multiple WAN links, I would advise to disable and check. Make sure that on the transport tunnel, BFD path MTU discovery is used.

If not open a TAC case with SD-WAN team and ask them to collaborate with a wireless TAC engineer.

gaurav mahajan1
Level 1
Level 1

No we are not load balancing its and active-standby setup
Its already enabled BFD path MTU discovery


gaurav mahajan1
Level 1
Level 1

gauravmahajan1_0-1676989106726.png



Do i need to worry about these malformed packets

These are seen during disconnections

Rich R
VIP
VIP

Hard to say without seeing the detail of the packets - need to see why Wireshark thinks they're malformed. Sometimes Wireshark can think a packet is malformed simply because the protocol dissector it's using doesn't recognise something in the protocol which is why there are protocol updates in every release of Wireshark.

But it is a striking coincidence if they occur at the same time as the drops so it probably is related.  Packet captures at each end might reveal the problem - if the packet is good when leaving the AP but malformed when arriving at the WLC then you know something in the path is corrupting the packet.  This can happen whenever a device is rewriting a packet - for example NAT.  If it's already corrupted when leaving the AP then it's a bug in the AP software generating the malformed packet and only Cisco can help you with that.  FYI: Note that some of the 18xx AP hardware failures (usually dead radio) result in malformed CAPWAP packets (missing whole required parts) but that would usually be on the CAPWAP control (5246) not CAPWAP data (5247) and seen at join time.

gaurav mahajan1
Level 1
Level 1

Hello All

The issue is resolved now or may i say we found a workaround.

The Aps are connected to a stack of 2x9300 switches.

Earlier the aps where connected to master switch and often get disconnected 
I moved them to standby now and are working perfectly fine
I know its strange but it works for me now




Review Cisco Networking for a $25 gift card