11-23-2024 04:25 AM
Hi, I have an HA pair of 9800 WLCs on VMware and, for some reason, they keep rebooting and failing over. From what I can see through our monitoring, there are no network events that would cause the failover. Any ideas? I've seen mentions of a keep-alive, but I can't find it in the running config.
I'm on 17.9.5, any help is greatly appreciated.
Wifi-Controller01-9800#sho chassis ha-status active
My state = ACTIVE
Peer state = STANDBY HOT
Last switchover reason = none
Last switchover time = none
Image Version = 17.9.5
Chassis-HA Local-IP Remote-IP MASK HA-Interface
-----------------------------------------------------------------------------
This Boot: 169.254.232.101 169.254.232.103 255.255.255.0 GigabitEthernet2
Next Boot: 169.254.232.101 169.254.232.103 255.255.255.0 GigabitEthernet2
Chassis-HA Chassis# Priority IFMac Address Peer-timeout(ms)*Max-retry
-----------------------------------------------------------------------------------------
This Boot: 1 2 00:50:56:B1:0D:8C 100*5
Next Boot: 1 2 00:50:56:B1:0D:8C 100*5
Wifi-Controller01-9800#sho chassis rmi
Chassis/Stack Mac Address : 0050.56b1.0d8c - Local Mac Address
Mac persistency wait time: Indefinite
H/W Current
Chassis# Role Mac Address Priority Version State IP RMI-IP
--------------------------------------------------------------------------------------------------------
*1 Active 0050.56b1.0d8c 2 V02 Ready 169.254.232.101 10.206.232.101
2 Standby 0050.5690.5a18 1 V02 Ready 169.254.232.103 10.206.232.103
Wifi-Controller01-9800#sh redundancy
Redundant System Information :
------------------------------
Available system uptime = 15 hours, 43 minutes
Switchovers system experienced = 0
Standby failures = 0
Last switchover reason = none
Hardware Mode = Duplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Maintenance Mode = Disabled
Communications = Up
Current Processor Information :
-------------------------------
Active Location = slot 1
Current Software state = ACTIVE
Uptime in current state = 15 hours, 43 minutes
Image Version = Cisco IOS Software [Cupertino], C9800-CL Software (C9800-CL-K9_IOSXE), Version 17.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2024 by Cisco Systems, Inc.
Compiled Tue 30-Jan-24 15:33 by mcpre
BOOT = bootflash:packages.conf,12;
CONFIG_FILE =
Configuration register = 0x102
Recovery mode = Not Applicable
Fast Switchover = Enabled
Initial Garp = Enabled
Peer Processor Information :
----------------------------
Standby Location = slot 2
Current Software state = STANDBY HOT
Uptime in current state = 15 hours, 41 minutes
Image Version = Cisco IOS Software [Cupertino], C9800-CL Software (C9800-CL-K9_IOSXE), Version 17.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2024 by Cisco Systems, Inc.
Compiled Tue 30-Jan-24 15:33 by mcpre
BOOT = bootflash:packages.conf,12;
CONFIG_FILE =
Configuration register = 0x102
Wifi-Controller01-9800#show ver
Wifi-Controller01-9800#show version
Cisco IOS XE Software, Version 17.09.05
Cisco IOS Software [Cupertino], C9800-CL Software (C9800-CL-K9_IOSXE), Version 17.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2024 by Cisco Systems, Inc.
Compiled Tue 30-Jan-24 15:33 by mcpre
Cisco IOS-XE software, Copyright (c) 2005-2024 by cisco Systems, Inc.
All rights reserved. Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0. The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY. You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0. For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.
ROM: IOS-XE ROMMON
Wifi-Controller01-9800 uptime is 15 hours, 43 minutes
Uptime for this control processor is 15 hours, 45 minutes
System returned to ROM by reload at 08:36:44 UTC Tue Nov 19 2024
System restarted at 19:09:56 UTC Fri Nov 22 2024
System image file is "bootflash:packages.conf"
Last reload reason: Reload reason not captured, system report at bootflash:core/Wifi-Controller01-9800-system-report_20241122-190750-UTC.tar.gz
This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.
A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html
If you require further assistance please contact us by sending email to
export@cisco.com.
AIR License Level: AIR DNA Advantage
Next reload AIR license Level: AIR DNA Advantage
Smart Licensing Status: Smart Licensing Using Policy
cisco C9800-CL (VXE) processor (revision VXE) with 12268676K/3075K bytes of memory.
Processor board ID 9Q1LCBDHQRM
Router operating mode: Autonomous
8 Virtual Ethernet interfaces
1 Gigabit Ethernet interface
32768K bytes of non-volatile configuration memory.
16332008K bytes of physical memory.
11526144K bytes of virtual hard disk at bootflash:.
11526144K bytes of virtual hard disk at bootflash-2:.
Installation mode is INSTALL
Configuration register is 0x102
Wifi-Controller01-9800# show redundancy switchover history
Wifi-Controller01-9800#
Wifi-Controller01-9800#
Wifi-Controller01-9800#
Wifi-Controller01-9800#show run | inc heart
Wifi-Controller01-9800#show chassis detail
Chassis/Stack Mac Address : 0050.56b1.0d8c - Local Mac Address
Mac persistency wait time: Indefinite
H/W Current
Chassis# Role Mac Address Priority Version State IP
-------------------------------------------------------------------------------------
*1 Active 0050.56b1.0d8c 2 V02 Ready 169.254.232.101
2 Standby 0050.5690.5a18 1 V02 Ready 169.254.232.103
Stack Port Status Neighbors
Chassis# Port 1 Port 2 Port 1 Port 2
--------------------------------------------------------
1 OK OK 2 2
2 OK OK 1 1
Wifi-Controller01-9800#
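On the keep-alive question: the `Peer-timeout(ms)*Max-retry` column in the `show chassis ha-status active` output above reads `100*5`. Below is a minimal sketch, assuming that field means a 100 ms keep-alive interval multiplied by 5 retries, of how the resulting peer-loss detection window could be computed from that line. The function name `peer_detection_window_ms` is hypothetical and not part of any Cisco tooling.

```python
import re


def peer_detection_window_ms(field: str) -> int:
    """Return interval_ms * retries for a field such as '100*5'.

    Assumption: the field is '<keep-alive interval in ms>*<max retries>',
    as suggested by the column header 'Peer-timeout(ms)*Max-retry'.
    """
    match = re.fullmatch(r"\s*(\d+)\s*\*\s*(\d+)\s*", field)
    if match is None:
        raise ValueError(f"unexpected Peer-timeout field: {field!r}")
    interval_ms, retries = (int(group) for group in match.groups())
    return interval_ms * retries


# Line copied from the 'This Boot' row of the output in this thread.
line = "This Boot: 1 2 00:50:56:B1:0D:8C 100*5"
window = peer_detection_window_ms(line.split()[-1])
print(f"peer considered lost after roughly {window} ms without keep-alives")
```

Under that assumption the window is 500 ms, which is short enough that brief stalls on a busy hypervisor could matter. Treat this as a reading aid for the output, not as a diagnosis.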
11-23-2024 04:42 AM
Did you check this:
system report at bootflash:core/Wifi-Controller01-9800-system-report_20241122-190750-UTC.tar.gz
11-23-2024 05:32 AM
Hi,
I have the file on a TFTP server now. What's the best method of analysing it, as it's quite large for a text file?
11-23-2024 05:46 AM
@paul-d wrote : I have the file on a TFTP server now. What's the best method of analysing it, as it's quite large for a text file?
Unfortunately, controller crash dumps cannot be analysed by ordinary users, only by TAC.
That is why I posted my earlier reply: to get an idea of whether the problem is related to the current VMware environment or is a native crash of the controller(s). Also, is there a special VMware topology involved? Are the controllers on the same hypervisor (box)?
Ref : https://community.cisco.com/t5/wireless/9800-wlc-ha-vmware-keeps-rebooting/m-p/5227986/highlight/true#M278046
M.
11-23-2024 04:43 AM
Just to confirm stability: can you turn off the standby WLC and check whether the active one is stable, with no reboots taking place?
Which ESXi version is running, and how is the VLAN setup done on ESXi? Is this a vSwitch or a dSwitch?
Are both virtual WLCs on the same ESXi host or on different ESXi hosts?
I have tested HA on the vWLC 9800 with older 17.x code. From the document I was reading:
It is important to put C9800 interface that we intend to use as a Redundancy Port L2 HA inter-vWLC 9800 link into a separate, unused VLAN
Check the guide below:
11-23-2024 05:40 AM
if you can turn off standby WLC and check is the Active stable ? and no reboot taking place. [I powered off the secondary, and the primary stayed up, no reboots]
what Esxi version running, how is the VLAN Setup done on esxi, is this vswitch or switch. [VMware ESXi, 7.0.3, 23794027. The VM has two network cards: one is connected to a VLAN for redundancy and the other is a trunk for all of the VLANs. The only things in the redundancy VLAN are the two WLCs, for their redundancy traffic. I believe they use vSwitches.]
Both the Virtual WLC in same Esxi or across different Esxi ? [both are on VMware ESXi, 7.0.3, 23794027]
I have tested old version vWLC 9800 17.X code with HA - as i was reading document :
It is important to put C9800 interface that we intend to use as a Redundancy Port L2 HA inter-vWLC 9800 link into a separate, unused VLAN [I can confirm we have done this: separate interface on the WLC, separate vSwitch and separate VLAN]
11-23-2024 06:01 AM
@paul-d wrote : >.... [I powered off the secondary, and the primary stayed up, no reboots]
Then running the Wireless Config Analyzer procedure on the primary should be considered mandatory. As a consistency check of the configuration, collect the output of the CLI command show tech wireless (not a simple show tech) and feed it into
Wireless Config Analyzer
Check if there are any remarks or errors concerning HA
M.
11-23-2024 04:43 AM
- Have a look at the outputs from: show reload-history
show version | inc reload
- I would advise configuring a syslog server on the (primary) controller and checking the logs on the syslog server,
especially just before and after reboots, to capture any last-gasp information
- Do these machines have sufficient resources at all times? This can be checked on the hypervisor by looking at the memory and CPU being allocated
- Check the primary controller configuration using the CLI command show tech wireless (not a simple show tech)
and feed the output into Wireless Config Analyzer
Additional commands useful for further research into the issue :
dir bootflash:/core/ | i core|system-report (look for crash reports , if any)
show version | inc reload
show platform
show inventory
show environment
Related to HA-SSO troubleshooting
show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show platform hardware slot R0 ha_port interface stats
test wireless redundancy rping
M.
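As a companion to the command list above, here is a minimal sketch of how the key fields of `show redundancy` could be pulled out of a saved capture so that they can be compared before and after a reboot. The function name `summarize_redundancy` and the choice of fields are assumptions made for illustration; the sample text is copied from the output earlier in this thread.

```python
def summarize_redundancy(output: str) -> dict:
    """Extract a few 'key = value' fields from saved 'show redundancy' text.

    Only the first occurrence of each field is kept. The field list below
    is an illustrative choice, not an official set.
    """
    wanted = (
        "Switchovers system experienced",
        "Last switchover reason",
        "Communications",
    )
    summary = {}
    for raw_line in output.splitlines():
        key, sep, value = raw_line.partition("=")
        key = key.strip()
        if sep and key in wanted and key not in summary:
            summary[key] = value.strip()
    return summary


# Excerpt copied from the 'sh redundancy' output posted in this thread.
sample = """\
Available system uptime = 15 hours, 43 minutes
Switchovers system experienced = 0
Standby failures = 0
Last switchover reason = none
Communications = Up
"""
for field, value in summarize_redundancy(sample).items():
    print(f"{field}: {value}")
```

Saving one such summary per capture makes it easier to spot when the switchover counter or reason changes between reboots.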
11-23-2024 05:58 AM
11-23-2024 06:08 AM - edited 11-23-2024 06:09 AM
@paul-d wrote : Thank you, i have provided the output for all commands and from the analyzer tool
I will not review the complete file, but there do seem to be errors. To start with, if you have the HTML page or the Excel file with the output, then all red-flagged errors (concerning the first WLC checks) must be corrected (this is mandatory).
From the outputs, I think that 230041 will be red-flagged and must be corrected, but the other red-flagged errors must be corrected too.
M.
11-23-2024 06:48 AM
Hi,
Just so I'm clear: the wireless management interface (currently on VLAN 657) needs to match the VLAN used on the redun-management interface (currently 1657)?
I thought the HA interfaces needed to be on their own VLAN. I didn't realise management must also be in the same VLAN.
Wifi-Controller01-9800#show running-config | include wireless management interface
wireless management interface Vlan657
Wifi-Controller01-9800#show running-config | include redun-management
redun-management interface Vlan1657 chassis 1 address 10.206.232.101 chassis 2 address 10.206.232.103
11-23-2024 07:13 AM
Correct - this is clearly stated in the docs:
https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/220277-configure-high-availability-sso-on-catal.html#toc-hId--1035674251
"which have been configured with separated WMIs and with GUI accessible at
In addition to these IP addresses, 2 additional ones into the same subnet (and VLAN) have been used, namely 10.48.39.131 and 10.48.39.132."
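The mismatch discussed above (wireless management interface on Vlan657, redun-management interface on Vlan1657) can be checked mechanically. The following is a minimal sketch, assuming the running config has been saved as text; the helper names `vlan_of` and `wmi_rmi_vlans` are hypothetical, and the two config lines are the ones posted in this thread.

```python
import re


def vlan_of(config: str, pattern: str):
    """Return the VLAN ID captured by 'pattern', or None if the line is absent."""
    match = re.search(pattern, config, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def wmi_rmi_vlans(config: str):
    """Return (WMI VLAN, RMI VLAN) found in saved running-config text."""
    wmi = vlan_of(config, r"^wireless management interface Vlan(\d+)")
    rmi = vlan_of(config, r"^redun-management interface Vlan(\d+)")
    return wmi, rmi


# Lines copied from the 'show running-config | include ...' output above.
running_config = """\
wireless management interface Vlan657
redun-management interface Vlan1657 chassis 1 address 10.206.232.101 chassis 2 address 10.206.232.103
"""
wmi, rmi = wmi_rmi_vlans(running_config)
if wmi != rmi:
    print(f"Mismatch: WMI is on VLAN {wmi}, RMI is on VLAN {rmi}")
else:
    print(f"WMI and RMI both use VLAN {wmi}")
```

With the configuration posted here, the script reports a mismatch between VLAN 657 and VLAN 1657, which is the condition the replies say must be corrected.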
11-23-2024 06:56 AM
Also refer to the Best Practices guide: https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/technical-reference/c9800-best-practices.html#C9800CLconsiderations