EWC AP C9120AXI 17.7.1 crash due to uplink instability report

joey.debra · 02-21-2022

Hi guys,

I was prepping a bunch of EWC APs that have no license (it isn't needed) and no support contract.

However, while prepping them I ran into crashes on the master AP and want to report it somewhere:

Basically, the EWC setup works. However, the AP acting as both preferred master and actual master was having issues with its uplink cable, and its way of dealing with that was rebooting several times. So I inspected the logging on the AP itself.

Apparently there were a lot of lost keepalives and link down/up events like this:

[*02/21/2022 08:00:18.9375] wired0 (Ext switch port: 7) (Logical Port: 15) (phyId: 1f) Link DOWN.
[*02/21/2022 08:00:18.9375] ===> Activate Deep Green Mode
[*02/21/2022 08:00:18.9375] bcmswlpbk0 (Ext switch port: (Logical Port: Virtual link DOWN
[*02/21/2022 08:00:22.0456] <=== Deactivate Deep Green Mode
[*02/21/2022 08:00:22.0456] bcmswlpbk0 (Ext switch port: (Logical Port: Virtual link UP
[*02/21/2022 08:00:22.0456] wired0 (Ext switch port: 7) (Logical Port: 15) (phyId: 1f) Link Up at 1000 mbps full duplex
[*02/21/2022 08:00:22.6776] Re-Tx Count=1, Max Re-Tx Value=5, SendSeqNum=159, NumofPendingMsgs=2
[*02/21/2022 08:00:22.6776]
[*02/21/2022 08:00:24.2656] ethernet_port wired0, ip 10.2.11.101, netmask 255.255.255.0, gw 10.2.11.1, mtu 1500, bcast 10.2.11.255, dns1 0.0.0.0, is_static true, vid 0, static_ip_failover false, dhcp_vlan_failover false
[*02/21/2022 08:00:24.2676] Controller ip address changed to [10.2.11.100].
[*02/21/2022 08:00:24.2906] !!!!! {/opt/cisco/bin/capwap_brain} forkexec failed with status 256 cmd -ip netns exec ewlcme ifconfig mgmt 10.2.11.100 netmask 255.255.255.255
[*02/21/2022 08:00:24.5106] ethernet_port wired0, ip 10.2.11.101, netmask 255.255.255.0, gw 10.2.11.1, mtu 1500, bcast 10.2.11.255, dns1 0.0.0.0, is_static true, vid 0, static_ip_failover false, dhcp_vlan_failover false
[*02/21/2022 08:00:24.5126] Controller ip address changed to [10.2.11.100].
[*02/21/2022 08:00:24.5396] !!!!! {/opt/cisco/bin/capwap_brain} forkexec failed with status 256 cmd -ip netns exec ewlcme ifconfig mgmt 10.2.11.100 netmask 255.255.255.255
[*02/21/2022 08:00:25.5286] Re-Tx Count=2, Max Re-Tx Value=5, SendSeqNum=163, NumofPendingMsgs=6

The link mostly comes back, but after a while it came back negotiated at 100 Mbps instead of 1000 Mbps, and then the AP just crashed.
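If you want to see how often the uplink actually flapped and when it came back at a lower speed, a small script can scan a saved copy of the AP console log for the `wired0 ... Link DOWN` and `wired0 ... Link Up at N mbps` lines shown above. This is only a sketch: it assumes the log was saved as plain text in exactly the format in this post, and `summarize_uplink` / `UplinkSummary` are hypothetical names of my own, not a Cisco tool.

```python
import re
from dataclasses import dataclass, field

# Matches the AP-side wired0 link events in the format seen in this thread, e.g.
# "[*02/21/2022 08:00:22.0456] wired0 (...) Link Up at 1000 mbps full duplex".
# Assumption: the log format is exactly as posted; other releases may differ.
LINK_RE = re.compile(
    r"\[\*?(?P<ts>[\d/]+ [\d:.]+)\] wired0 .*?Link (?P<state>DOWN|Up)(?: at (?P<speed>\d+) mbps)?"
)


@dataclass
class UplinkSummary:
    downs: int = 0
    ups: int = 0
    # (timestamp, speed in Mbps) for every "Link Up" event that reports a speed
    speeds: list = field(default_factory=list)
    # "Link Up" events that came back slower than the first speed observed
    downgrades: list = field(default_factory=list)


def summarize_uplink(lines):
    """Count wired0 link flaps and flag speed downgrades (hypothetical helper)."""
    summary = UplinkSummary()
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue  # not a wired0 link event (bcmswlpbk0, ethernet_port, etc.)
        if m.group("state") == "DOWN":
            summary.downs += 1
            continue
        summary.ups += 1
        if m.group("speed") is None:
            continue
        speed = int(m.group("speed"))
        if summary.speeds and speed < summary.speeds[0][1]:
            summary.downgrades.append((m.group("ts"), speed))
        summary.speeds.append((m.group("ts"), speed))
    return summary


if __name__ == "__main__":
    # A few lines copied from the logs in this post
    sample = [
        "[*02/21/2022 08:00:18.9375] wired0 (Ext switch port: 7) (Logical Port: 15) (phyId: 1f) Link DOWN.",
        "[*02/21/2022 08:00:22.0456] wired0 (Ext switch port: 7) (Logical Port: 15) (phyId: 1f) Link Up at 1000 mbps full duplex",
        "[*02/21/2022 08:24:52.2942] wired0 (Ext switch port: 7) (Logical Port: 15) (phyId: 1f) Link Up at 100 mbps full duplex",
    ]
    s = summarize_uplink(sample)
    print(s.downs, s.ups, s.downgrades)  # 1 2 [('02/21/2022 08:24:52.2942', 100)]
```

A high flap count or any entry in `downgrades` points at the physical layer (cable, patch panel, switch port) rather than the EWC config.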

Feb 21 08:23:36.923: %STACKMGR-6-KA_MISSED: Chassis 1 R0/0: stack_mgr: Keepalive missed for 2 times for Chassis 2
Feb 21 08:23:37.925: %STACKMGR-6-KA_MISSED: Chassis 1 R0/0: stack_mgr: Keepalive missed for 7 times for Chassis 2
Feb 21 08:23:38.535: %IOSXE_REDUNDANCY-6-PEER_LOST: Active detected chassis 2 is no longer standby
Feb 21 08:23:38.613: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_NOT_PRESENT)
Feb 21 08:23:38.613: %REDUNDANCY-3-REDUNDANCY_ALARMS: Unable to assert REDUNDANCY alarm

Feb 21 08:23:38.613: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_DOWN)
Feb 21 08:23:38.613: %REDUNDANCY-3-STANDBY_LOST: Standby processor fault (PEER_REDUNDANCY_STATE_CHANGE)
Feb 21 08:23:38.741: %RF-5-RF_RELOAD: Peer reload. Reason: EHSA standby down
Feb 21 08:23:38.528: %STACKMGR-6-CHASSIS_REMOVED: Chassis 1 R0/0: stack_mgr: Chassis 2 has been removed from the stack.
Feb 21 08:23:38.535: %STACKMGR-6-CHASSIS_REMOVED_KA: Chassis 1 R0/0: stack_mgr: Chassis 2 has been removed from the stack due to keepalive failure.
[*02/21/2022 08:23:45.6938] ethernet_port wired0, ip 10.2.11.101, netmask 255.255.255.0, gw 10.2.11.1, mtu 1500, bcast 10.2.11.255, dns1 0.0.0.0, is_static true, vid 0, static_ip_failover false, dhcp_vlan_failover false
[*02/21/2022 08:23:45.6958] Controller ip address changed to [10.2.11.100].
[*02/21/2022 08:23:45.7218] !!!!! {/opt/cisco/bin/capwap_brain} forkexec failed with status 256 cmd -ip netns exec ewlcme ifconfig mgmt 10.2.11.100 netmask 255.255.255.255
[*02/21/2022 08:23:48.2478] Re-Tx Count=1, Max Re-Tx Value=5, SendSeqNum=247, NumofPendingMsgs=2
[*02/21/2022 08:23:48.2478]

Feb 21 08:23:48.540: %PEER_SELECTION-5-EWC_PEER_SELECTION_REMOVE_EV: Chassis 1 R0/0: wncd: REMOVE event: Standby no longer available. Starting over standby selection (internal AP, selection ENABLED)
Feb 21 08:24:03.012: %CAPWAPAC_SMGR_TRACE_MESSAGE-5-AP_JOIN_DISJOIN: Chassis 1 R0/0: wncd: AP Event: AP Name: AP-BRED21-02 Mac: 2c1a.05b1.9280 Session-IP: 10.2.11.102[5250] 10.2.11.100[5246] Disjoined Max Retransmission to AP
Feb 21 08:24:03.207: %PEER_SELECTION-5-EWC_PEER_SELECTION_REMOVE_EV: Chassis 1 R0/0: wncd: REMOVE event: AP 'AP-BRED21-02' is no longer a peer (internal AP, selection ENABLED)
Feb 21 08:24:18.541: %PEER_SELECTION-5-EWC_PEER_SELECTION_SELECT_EV: Chassis 1 R0/0: wncd: SELECT event: Candidate AP 'AP-BRED21-03' has been SELECTED as stand-by
Feb 21 08:24:28.543: %PEER_SELECTION-5-EWC_PEER_SELECTION_REMOVE_EV: Chassis 1 R0/0: wncd: REMOVE event: Standby no longer available. Starting over standby selection (internal AP, selection ENABLED)
Feb 21 08:24:29.327: %APMGR_TRACE_MESSAGE-3-AP_NTP_SYNC: Chassis 1 R0/0: wncd: AP AP-BRED21-01 MAC 2c1a.055d.1f00, NTP sync has failed. Reason: The NTP server is unreachable
Feb 21 08:24:29.856: %CAPWAPAC_SMGR_TRACE_MESSAGE-5-AP_JOIN_DISJOIN: Chassis 1 R0/0: wncd: AP Event: AP Name: AP-BRED21-03 Mac: 2c1a.055f.39c0 Session-IP: 10.2.11.103[5268] 10.2.11.100[5246] Disjoined Max Retransmission to AP
Feb 21 08:24:29.862: %PEER_SELECTION-5-EWC_PEER_SELECTION_REMOVE_EV: Chassis 1 R0/0: wncd: REMOVE event: AP 'AP-BRED21-03' is no longer a peer (internal AP, selection ENABLED)
[*02/21/2022 08:24:52.2942] <=== Deactivate Deep Green Mode
[*02/21/2022 08:24:52.2942] bcmswlpbk0 (Ext switch port: (Logical Port: Virtual link UP
[*02/21/2022 08:24:52.2942] wired0 (Ext switch port: 7) (Logical Port: 15) (phyId: 1f) Link Up at 100 mbps full duplex
[*02/21/2022 08:
[02/21/2022 08:24:53.2342] Unable to handle kernel paging request at virtual address 0006e8cc

That final line is the crash, which was immediately followed by a reboot.

marce1000 · 02-21-2022

- FYI: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvq82908 , go for 17.7.1.11

M.

-- Each morning when I wake up and look into the mirror, I always say 'Why am I so brilliant?'
The mirror then always responds: 'The only thing that exceeds your brilliance is your beauty!'

Rich R · 02-24-2022

I don't think it's that bug, @marce1000, but it's certainly a software bug.

I'd focus on fixing the trigger instead, which is most likely a faulty cable or faulty hardware.

I'm not aware of any route for opening a TAC case or bug if you don't have a contract.

You could try emailing the info to tac@cisco.com.

For it to be of any use to them, you'd need to include the output of "show tech" and "show tech wireless", plus any crash files for them to decode.

------------------------------
Please click Helpful if this post helped you and Select as Solution (drop down menu at top right of this reply) if this answered your query.
------------------------------
TAC recommended codes for AireOS WLCs and TAC recommended codes for 9800 WLCs
Best Practices for AireOS WLCs, Best Practices for 9800 WLCs and Cisco Wireless compatibility matrix
Check your 9800 WLC config with Wireless Config Analyzer using "show tech wireless" output or "config paging disable" then "show run-config" output on AireOS and use Wireless Debug Analyzer to analyze your WLC client debugs
Field Notice: FN63942 APs and WLCs Fail to Create CAPWAP Connections Due to Certificate Expiration
Field Notice: FN72424 Later Versions of WiFi 6 APs Fail to Join WLC - Software Upgrade Required
Field Notice: FN72524 IOS APs stuck in downloading state after 4 Dec 2022 due to Certificate Expired
- Fixed in 8.10.196.0, latest 9800 releases, 8.5.182.12 (8.5.182.13 for 3504) and 8.5.182.109 (IRCM, 8.5.182.111 for 3504)
Field Notice: FN70479 AP Fails to Join or Joins with 1 Radio due to Country Mismatch, RMA needed
How to avoid boot loop due to corrupted image on Wave 2 and Catalyst 11ax Access Points (CSCvx32806)
Field Notice: FN74035 - Wave2 APs DFS May Not Detect Radar After Channel Availability Check Time
Leo's list of bugs affecting 2800/3800/4800/1560 APs
Default AP console baud rate from 17.12.x is 115200 - introduced by CSCwe88390