04-10-2025 05:35 AM
Hi team,
We installed a C9800 HA pair 2 months ago, running version 17.12.4.
We are now seeing random reloads of the standby WLC every few hours, with this information:
Reload History:
Reload Code: Reload
Reload Description: Reload Command - RIF: Bulk Sync, system report at bootflash:core/XXXXX
Reload Severity: Normal Reboot
Reload Time: 05:21:52 CEST Wed Apr 9 2025
Since the active WLC logs show missed keepalive packets, we were advised to increase the keepalive timer in the GUI (Administration -> Device -> Redundancy -> Keep alive timer) to 10 (currently set to 1).
My question: would making this change on a running HA pair cause any downtime for the active WLC, the HA pair, or associated clients?
Also, is there a way to test the RP cable (the direct connection between the active and standby WLCs) to prove a cable issue, or should we just replace the cable?
Thank you
04-10-2025 06:05 AM
Hi,
The keepalive timer is only used for communication between the active and standby WLCs; updating it just adjusts how long the active waits before declaring the standby "down".
It should not affect the active WLC or connected clients, but as a best practice I think you should make the change during a low-usage period and monitor closely.
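To put rough numbers on that wait, here is a minimal Python sketch. It assumes the timer value is expressed in units of 100 ms and that the peer is declared down after `retries` consecutive missed keepalives; both are assumptions to check against the configuration guide for your release. The function name and the retry count of 5 are illustrative only, not taken from this thread.

```python
def keepalive_detection_window_ms(timer, retries, unit_ms=100):
    """Approximate time (ms) before the peer is declared down.

    Assumption: one keepalive is sent every `timer * unit_ms` milliseconds
    and the peer is declared down after `retries` consecutive misses.
    unit_ms=100 is an assumed unit, not confirmed in this thread.
    """
    return timer * unit_ms * retries


# Timer 1 (current setting) vs. timer 10 (recommended), with a
# hypothetical retry count of 5 in both cases.
print(keepalive_detection_window_ms(1, 5))
print(keepalive_detection_window_ms(10, 5))
```

Under these assumptions a larger timer only widens the detection window, so short bursts of lost keepalives are tolerated at the cost of slower failure detection.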
Regarding the cable test, are you thinking of something like "test cable diagnostics tdr" on a switch? I don't think the WLC supports anything like that.
Have you seen any unusual output from:
show redundancy history
show redundancy summary
Any interface errors in show logging?
04-10-2025 06:55 AM - edited 04-10-2025 07:57 AM
- I can't see a correlation with the upgrade; you can run this CLI command:
test wireless redundancy rping
And keep it running for a while. Also check counters on both RP ports on the switches.
- Always validate the configuration of the primary controller with the CLI command:
show tech wireless
Feed the output from that into https://cway.cisco.com/wireless-config-analyzer/
M.
04-10-2025 07:07 AM
If your environment is otherwise experiencing no issues, you are in a good spot, but you should probably still open a TAC case. You already know that your secondary is the problem, or at least that it is the unit restarting. Have you tried a failover to see whether the issue stays with the same unit, which would be the primary after the failover, or moves to the new standby? This is also a good check to validate whether your wireless still works well after a failover.
If you still see issues with the unit that is your current secondary, I personally would remove it from the cluster, factory reset it, and apply the basic configuration so that you can enable SSO again. As the others mentioned, check all your cabling and look for interface errors. If there are none, a factory reset of that unit is probably the way to determine whether your secondary needs to be RMA'd, and that is where TAC can help, since TAC can look at the crash files.
04-10-2025 08:01 AM
- Adding a number of useful commands for troubleshooting HA SSO:
show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show platform hardware slot R0 ha_port interface stats
show platform hardware slot R0 ha_port sfp idprom (when using fiber)
M.
04-10-2025 10:27 AM
Changing the KA timers does not require downtime; it can be done at any time. Based on the logs, it looks like a system report is generated on every reload.
04-10-2025 10:53 PM
Thanks for all the input.
I set the timer and retries to the maximum, 10. I'm going to monitor the HA pair now, but yesterday I noticed some drops in the keepalive counter:
WC-01#show platform software stack-mgr chassis active R0 sdp-counters
Stack Discovery Protocol (SDP) Counters
---------------------------------------
Message              Tx Success    Tx Fail    Rx Success    Rx Fail
------------------------------------------------------------------------------
Discovery            6374          2          177           0
Neighbor             17951390      2          8975605       0
Keepalive            2830439       17169      2820659       0
SEPPUKU              0             0          0             0
Standby Elect Req    11            0          0             0
Standby Elect Ack    0             0          11            0
Standby IOS State    0             0          22            0
Reload Req           0             0          0             0
Reload Ack           0             0          0             0
SESA Mesg            0             0          0             0
RTU Msg              0             0          0             0
Disc Timer Stop      1             0          11            0
Does this indicate a faulty RP cable?
If so, what are the steps for replacing it on a running, operational HA pair? Just swap the cable and repatch it to the WLCs? I assume some downtime is needed.
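To judge how significant those failures are, the counters can be turned into a per-message Tx failure ratio. The following is a minimal Python sketch that parses a few rows copied from the output above; the function names are hypothetical, and the parser only assumes that each data row ends with four integer columns (Tx Success, Tx Fail, Rx Success, Rx Fail).

```python
SAMPLE = """\
Message Tx Success Tx Fail Rx Success Rx Fail
------------------------------------------------------------------------------
Discovery 6374 2 177 0
Neighbor 17951390 2 8975605 0
Keepalive 2830439 17169 2820659 0
Standby Elect Req 11 0 0 0
"""


def parse_sdp_counters(text):
    """Return {message: {tx_ok, tx_fail, rx_ok, rx_fail}} for each data row."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows end with four integers; header and separator rows do not.
        if len(parts) < 5 or not all(p.isdigit() for p in parts[-4:]):
            continue
        name = " ".join(parts[:-4])
        tx_ok, tx_fail, rx_ok, rx_fail = map(int, parts[-4:])
        counters[name] = {
            "tx_ok": tx_ok, "tx_fail": tx_fail,
            "rx_ok": rx_ok, "rx_fail": rx_fail,
        }
    return counters


def tx_fail_ratio(row):
    """Fraction of transmit attempts that failed (0.0 if nothing was sent)."""
    attempts = row["tx_ok"] + row["tx_fail"]
    return row["tx_fail"] / attempts if attempts else 0.0


for name, row in parse_sdp_counters(SAMPLE).items():
    print(f"{name}: {tx_fail_ratio(row):.4%} Tx failures")
```

For the Keepalive row this works out to roughly 0.6% of transmit attempts failing. The ratio alone does not say whether the cable is at fault.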
04-10-2025 11:26 PM
- @Scorpioo >...If so, what are steps for replacement on running and operational HA pair?
Simply power down the standby only, replace the cable, then power it on again.
This will have no impact; use the diagnostic commands I sent earlier
to verify the status.
M.
05-08-2025 10:38 PM
So, we replaced the RP cable with a new one, and the issue still persists.
The WLCs of the pair are on different floors. We connect them with a Cat6 cable more than 60 m long, and not even directly but via patch panels: WLC1 -> patch cable -> Cat6 RP cable between floors -> patch cable -> WLC2.
Could the loss of RP communication be related to this physical connection?
05-08-2025 11:48 PM
- @Scorpioo Check the output of the command show redundancy config-sync failures prc
M.
05-08-2025 11:54 PM
WC-01#show redundancy config-sync failures prc
PRC Failed Command List
-----------------------
The list is Empty
05-09-2025 12:06 AM
- @Scorpioo That's OK. If you suspect problems with the physical connection, then look at the port counters
of the RP port(s) on the switches.
Also look at the output of the command: show platform hardware slot R0 ha_port interface stats
M.
05-09-2025 12:22 AM - edited 05-09-2025 12:22 AM
There are no error indications or non-zero error counters, except that I see multiple Tx failures on the active WLC:
WC-01#show platform software stack-mgr chassis active R0 sdp-counters
Stack Discovery Protocol (SDP) Counters
---------------------------------------
Message              Tx Success    Tx Fail    Rx Success    Rx Fail
------------------------------------------------------------------------------
Discovery            1883          2          238           0
Neighbor             24047679      55         12023734      0
Keepalive            396228        2416       394779        0
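As a quick sanity check, the Keepalive row here can be compared with the one posted on 04-10, before the cable was replaced. This is only a rough Python sketch using the numbers from this thread; the function name is hypothetical, and it assumes the counters were reset at some point between the two snapshots (the totals are lower now), so each ratio reflects its own period.

```python
def fail_ratio(tx_success, tx_fail):
    """Fraction of keepalive transmit attempts that failed."""
    attempts = tx_success + tx_fail
    return tx_fail / attempts if attempts else 0.0


# (Tx Success, Tx Fail) for the Keepalive row in each snapshot.
snapshots = {
    "04-10 snapshot (old cable)": (2830439, 17169),
    "05-09 snapshot (new cable)": (396228, 2416),
}

for label, (ok, fail) in snapshots.items():
    print(f"{label}: {fail_ratio(ok, fail):.3%}")
```

Both snapshots come out at roughly 0.6%, so the Tx failure rate looks about the same with the old and the new cable.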
05-09-2025 12:25 AM
- @Scorpioo That's not the output from show platform hardware slot R0 ha_port interface stats
M.
05-09-2025 12:28 AM
HA Port
ha_port: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.141.20 netmask 255.255.255.0 broadcast 169.254.141.255
inet6 fe80::fac6:50ff:fe22:2500 prefixlen 64 scopeid 0x20<link>
ether f8:c6:50:22:25:00 txqueuelen 1000 (Ethernet)
RX packets 50849464 bytes 6738675926 (6.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 53788973 bytes 16471078111 (15.3 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xe1800000-e2000000
Settings for ha_port:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
MDI-X: on (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
NIC statistics:
rx_packets: 50849465
tx_packets: 53788975
rx_bytes: 6942073876
tx_bytes: 16686251472
rx_broadcast: 14
tx_broadcast: 814
rx_multicast: 22
tx_multicast: 260
multicast: 22
collisions: 0
rx_crc_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 6942073876
tx_dma_out_of_sync: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_hwtstamp_timeouts: 0
rx_hwtstamp_cleared: 0
rx_errors: 0
tx_errors: 0
tx_dropped: 0
rx_length_errors: 0
rx_over_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_queue_0_packets: 41049411
tx_queue_0_bytes: 10760587922
tx_queue_0_restart: 0
tx_queue_1_packets: 12739564
tx_queue_1_bytes: 5710492364
tx_queue_1_restart: 0
rx_queue_0_packets: 50849465
rx_queue_0_bytes: 6738676016
rx_queue_0_drops: 0
rx_queue_0_csum_err: 0
rx_queue_0_alloc_failed: 0
rx_queue_1_packets: 0
rx_queue_1_bytes: 0
rx_queue_1_drops: 0
rx_queue_1_csum_err: 0
rx_queue_1_alloc_failed: 0
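With this many counters it is easy to miss a non-zero one, so here is a small Python sketch that scans the "name: value" lines of the NIC statistics and reports only error-like counters that are not zero. The function name and the keyword list are my own assumptions, not part of any Cisco tool, and the ifconfig-style summary lines (for example "RX errors 0 dropped 0 ...") are deliberately not parsed.

```python
# Substrings that mark a counter as error-like (an illustrative choice).
ERROR_HINTS = ("error", "drop", "coll", "fail", "missed", "timeout")

SAMPLE = """\
ether f8:c6:50:22:25:00 txqueuelen 1000 (Ethernet)
Speed: 1000Mb/s
tx_packets: 53788975
collisions: 0
rx_crc_errors: 0
rx_missed_errors: 0
tx_timeout_count: 0
rx_queue_0_drops: 0
"""


def nonzero_error_counters(text):
    """Return {counter_name: value} for error-like counters with value != 0."""
    found = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Keep only simple "name: integer" lines.
        if not sep or not value.isdigit():
            continue
        if any(hint in name for hint in ERROR_HINTS) and int(value) != 0:
            found[name] = int(value)
    return found


print(nonzero_error_counters(SAMPLE))
```

Run against the lines sampled from the output above, this returns an empty result, which matches the observation that the HA port itself reports no errors.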