
9800 WLC HA Vmware -keeps rebooting

paul-d
Level 1

Hi, I have an HA pair of 9800 WLCs on VMware and, for some reason, they keep rebooting and failing over. From what I can see through our monitoring, there are no network events that would cause the failover. Any ideas? I've seen mentions of a keep-alive, but I can't find it in the running config.

I'm on 17.9.5. Any help is greatly appreciated.

Wifi-Controller01-9800#sho chassis ha-status active

My state = ACTIVE
Peer state = STANDBY HOT
Last switchover reason = none
Last switchover time = none
Image Version = 17.9.5

Chassis-HA   Local-IP          Remote-IP         MASK            HA-Interface
-----------------------------------------------------------------------------
This Boot:   169.254.232.101   169.254.232.103   255.255.255.0   GigabitEthernet2

Next Boot:   169.254.232.101   169.254.232.103   255.255.255.0   GigabitEthernet2


Chassis-HA   Chassis#   Priority   IFMac Address       Peer-timeout(ms)*Max-retry
-----------------------------------------------------------------------------------------
This Boot:   1          2          00:50:56:B1:0D:8C   100*5

Next Boot:   1          2          00:50:56:B1:0D:8C   100*5

Wifi-Controller01-9800#sho chassis rmi
Chassis/Stack Mac Address : 0050.56b1.0d8c - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W       Current
Chassis#   Role      Mac Address      Priority   Version   State   IP                RMI-IP
--------------------------------------------------------------------------------------------------------
*1         Active    0050.56b1.0d8c   2          V02       Ready   169.254.232.101   10.206.232.101
2          Standby   0050.5690.5a18   1          V02       Ready   169.254.232.103   10.206.232.103

 

Wifi-Controller01-9800#sh redundancy
Redundant System Information :
------------------------------
Available system uptime = 15 hours, 43 minutes
Switchovers system experienced = 0
Standby failures = 0
Last switchover reason = none

Hardware Mode = Duplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Maintenance Mode = Disabled
Communications = Up

Current Processor Information :
-------------------------------
Active Location = slot 1
Current Software state = ACTIVE
Uptime in current state = 15 hours, 43 minutes
Image Version = Cisco IOS Software [Cupertino], C9800-CL Software (C9800-CL-K9_IOSXE), Version 17.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2024 by Cisco Systems, Inc.
Compiled Tue 30-Jan-24 15:33 by mcpre
BOOT = bootflash:packages.conf,12;
CONFIG_FILE =
Configuration register = 0x102
Recovery mode = Not Applicable
Fast Switchover = Enabled
Initial Garp = Enabled

Peer Processor Information :
----------------------------
Standby Location = slot 2
Current Software state = STANDBY HOT
Uptime in current state = 15 hours, 41 minutes
Image Version = Cisco IOS Software [Cupertino], C9800-CL Software (C9800-CL-K9_IOSXE), Version 17.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2024 by Cisco Systems, Inc.
Compiled Tue 30-Jan-24 15:33 by mcpre
BOOT = bootflash:packages.conf,12;
CONFIG_FILE =
Configuration register = 0x102


Wifi-Controller01-9800#show ver
Wifi-Controller01-9800#show version
Cisco IOS XE Software, Version 17.09.05
Cisco IOS Software [Cupertino], C9800-CL Software (C9800-CL-K9_IOSXE), Version 17.9.5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2024 by Cisco Systems, Inc.
Compiled Tue 30-Jan-24 15:33 by mcpre


Cisco IOS-XE software, Copyright (c) 2005-2024 by cisco Systems, Inc.
All rights reserved. Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0. The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY. You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0. For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.


ROM: IOS-XE ROMMON
Wifi-Controller01-9800 uptime is 15 hours, 43 minutes
Uptime for this control processor is 15 hours, 45 minutes
System returned to ROM by reload at 08:36:44 UTC Tue Nov 19 2024
System restarted at 19:09:56 UTC Fri Nov 22 2024
System image file is "bootflash:packages.conf"
Last reload reason: Reload reason not captured, system report at bootflash:core/Wifi-Controller01-9800-system-report_20241122-190750-UTC.tar.gz

 

This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.

AIR License Level: AIR DNA Advantage
Next reload AIR license Level: AIR DNA Advantage

Smart Licensing Status: Smart Licensing Using Policy

cisco C9800-CL (VXE) processor (revision VXE) with 12268676K/3075K bytes of memory.
Processor board ID 9Q1LCBDHQRM
Router operating mode: Autonomous
8 Virtual Ethernet interfaces
1 Gigabit Ethernet interface
32768K bytes of non-volatile configuration memory.
16332008K bytes of physical memory.
11526144K bytes of virtual hard disk at bootflash:.
11526144K bytes of virtual hard disk at bootflash-2:.
Installation mode is INSTALL


Configuration register is 0x102

Wifi-Controller01-9800# show redundancy switchover history
Wifi-Controller01-9800#
Wifi-Controller01-9800#
Wifi-Controller01-9800#
Wifi-Controller01-9800#show run | inc heart
Wifi-Controller01-9800#show chassis detail
Chassis/Stack Mac Address : 0050.56b1.0d8c - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W       Current
Chassis#   Role      Mac Address      Priority   Version   State   IP
-------------------------------------------------------------------------------------
*1         Active    0050.56b1.0d8c   2          V02       Ready   169.254.232.101
2          Standby   0050.5690.5a18   1          V02       Ready   169.254.232.103

           Stack Port Status      Neighbors
Chassis#   Port 1     Port 2      Port 1   Port 2
--------------------------------------------------------
1          OK         OK          2        2
2          OK         OK          1        1

Wifi-Controller01-9800#
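
Editor's note on the keep-alive question: the `show chassis ha-status active` output above does list keep-alive related values in the `Peer-timeout(ms)*Max-retry` column (`100*5`). The sketch below is only an illustration of how to read that field from a saved capture. It assumes the peer is treated as lost after `Max-retry` consecutive missed keep-alives, so the product of the two numbers approximates the detection window; that interpretation is an assumption, not something confirmed in this thread. The function name `keepalive_window_ms` is hypothetical.

```python
import re

def keepalive_window_ms(ha_status_text):
    """Return (timeout_ms, max_retry, product) from the 'This Boot' row.

    Assumption: the product approximates how long the peer can be
    silent before it is declared lost.
    """
    for line in ha_status_text.splitlines():
        # Only the row ending in '<timeout>*<retries>' matches.
        m = re.search(r"This Boot:.*?(\d+)\*(\d+)\s*$", line)
        if m:
            timeout_ms, max_retry = int(m.group(1)), int(m.group(2))
            return timeout_ms, max_retry, timeout_ms * max_retry
    return None

# Rows copied from the output posted above.
sample = """\
Chassis-HA Chassis# Priority IFMac Address Peer-timeout(ms)*Max-retry
This Boot: 1 2 00:50:56:B1:0D:8C 100*5
Next Boot: 1 2 00:50:56:B1:0D:8C 100*5
"""

print(keepalive_window_ms(sample))
```

Under that assumption the pair tolerates roughly half a second of silence on the HA link, which is worth keeping in mind when looking at hypervisor-side stalls.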

 

 

 

 

1 Accepted Solution


 

  @paul-d wrote: [I powered off the secondary, and the primary stayed up, no reboots]

Then it is mandatory to run the Wireless Config Analyzer procedure for the primary controller, to get a consistency check of the configuration: run the CLI command show tech wireless (not a simple show tech) and feed the output into Wireless Config Analyzer.

Check whether there are any remarks or errors concerning HA.

 M.



-- Each morning when I wake up and look into the mirror I always say ' Why am I so brilliant ? '
    The mirror then always responds with ' The only thing that exceeds your brilliance is your beauty! '


12 Replies

@paul-d 

Did you check this:

system report at bootflash:core/Wifi-Controller01-9800-system-report_20241122-190750-UTC.tar.gz

Hi,

I have the file on a TFTP server now. What's the best method of analysing it, as it's quite large for a text file?

 

   @paul-d wrote: I have the file on a tftp server now, whats the best method of analysing it as its quite large for a text file.

Unfortunately, the controller crash dumps cannot be analysed by ordinary users, only by TAC.

That is why I posted my earlier reply: to get an idea of whether the problem is related to the current VMware environment or is a native controller crash. Also, is there a special VMware topology involved? Are the controllers on the same hypervisor (box)?


 Ref : https://community.cisco.com/t5/wireless/9800-wlc-ha-vmware-keeps-rebooting/m-p/5227986/highlight/true#M278046

 M.




balaji.bandi
Hall of Fame

Just to confirm stability: if you turn off the standby WLC, is the active one stable, with no reboots taking place?

Which ESXi version is running, how is the VLAN setup done on ESXi, and is this a vSwitch or a dSwitch?

Are both virtual WLCs on the same ESXi host or on different ESXi hosts?

I have tested an older 17.x version of the virtual 9800 WLC with HA. From the document I was reading:

It is important to put the C9800 interface that we intend to use as the Redundancy Port (the L2 HA link between the virtual 9800 WLCs) into a separate, unused VLAN.

Check the guide below:

https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/9800/17-1/deployment-guide/c9800-ha-sso-deployment-guide-rel-17-1.pdf

BB


If you turn off the standby WLC, is the active one stable, with no reboots taking place? [I powered off the secondary, and the primary stayed up, no reboots]

Which ESXi version is running, how is the VLAN setup done on ESXi, and is this a vSwitch or a dSwitch? [VMware ESXi, 7.0.3, 23794027. The VM has two network cards: one is connected to a VLAN for redundancy and the other is a trunk for all of the VLANs. The only things in the redundancy VLAN are the two WLCs, for their redundancy traffic. I believe they use vSwitches.]

Are both virtual WLCs on the same ESXi host or on different ESXi hosts? [Both are on VMware ESXi, 7.0.3, 23794027]

I have tested an older 17.x version of the virtual 9800 WLC with HA. From the document I was reading:

It is important to put the C9800 interface that we intend to use as the Redundancy Port (the L2 HA link between the virtual 9800 WLCs) into a separate, unused VLAN. [I can confirm we have done this: separate interface on the WLC, separate vSwitch and separate VLAN]

 


marce1000
Hall of Fame

 

  - Have a look at the outputs from: show reload-history and show version | inc reload

  - I would advise that you configure a syslog server on the (primary) controller and check the logging at the syslog server, especially just before and after reboots, to get last-gasp information.

  - Do these machines have sufficient resources at all times? This can be checked in the hypervisor by looking at the memory and CPU being allocated.

  - Run a checkup of the primary controller configuration using the CLI command show tech wireless (not a simple show tech) and feed the output into Wireless Config Analyzer.

 Additional commands useful for further research into the issue:

dir bootflash:/core/ | i core|system-report            (look for crash reports , if any)
show version | inc reload
show platform
show inventory
show environment

 Related to HA-SSO troubleshooting:

show redundancy | i ptime|Location|Current Software state|Switchovers
show chassis
show chassis detail
show chassis ha-status local
show chassis ha-status active
show chassis ha-status standby
show chassis rmi
show redundancy
show redundancy history
show redundancy switchover history
show tech wireless redundancy
show redundancy states
show platform hardware slot R0 ha_port interface stats
test wireless redundancy rping

 M.
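
Editor's note: when the outputs of the commands above are saved to text files, the `show redundancy | i ptime|Location|Current Software state|Switchovers` filter can be reproduced offline. The sketch below is a minimal illustration and assumes the full `show redundancy` output has been captured as a string. The helper name `include_filter` is hypothetical, and the match is case-sensitive on the assumption that this mirrors how the pattern behaves on the controller (note that `ptime` is what catches both `uptime` and `Uptime`).

```python
import re

# Same alternation as used after '| i' in the command list above.
PATTERN = re.compile(r"ptime|Location|Current Software state|Switchovers")

def include_filter(output):
    """Keep only the lines that match the pattern, stripped of padding."""
    return [line.strip() for line in output.splitlines() if PATTERN.search(line)]

# Lines copied from the 'sh redundancy' output earlier in the thread.
capture = """\
Available system uptime = 15 hours, 43 minutes
Switchovers system experienced = 0
Standby failures = 0
Last switchover reason = none
Active Location = slot 1
Current Software state = ACTIVE
Uptime in current state = 15 hours, 43 minutes
"""

for line in include_filter(capture):
    print(line)
```

Comparing these few lines between captures taken before and after an unexpected reboot gives a quick view of whether the uptime and switchover counters changed.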




Hi,

Thank you. I have provided the output for all commands and from the analyzer tool.

 

      @paul-d wrote: Thank you, i have provided the output for all commands and from the analyzer tool

I will not review the complete file, but it seems there are errors. To start with, if you have the HTML page with the output or the Excel file, then all red-flagged errors (concerning the first WLC checks) must be corrected (this is mandatory).
So, from the outputs, I think that 230041 will be red-flagged and must be corrected, but the other red-flagged errors must be corrected too.

 M.




Hi,

Just so I'm clear: the wireless management interface (currently on VLAN 657) needs to match the VLAN used on the redun-management interface (currently 1657)?

I thought the HA interfaces needed to be on their own VLAN. I didn't realise management must also be in the same VLAN too.

Wifi-Controller01-9800#show running-config | include wireless management interface
wireless management interface Vlan657
Wifi-Controller01-9800#show running-config | include redun-management
redun-management interface Vlan1657 chassis 1 address 10.206.232.101 chassis 2 address 10.206.232.103

 

Correct - this is clearly stated in the docs:
https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/220277-configure-high-availability-sso-on-catal.html#toc-hId--1035674251

"which have been configured with separated WMIs and with GUI accessible at

  • IP address 10.48.39.130 for the first one, referred to as WLC1;
  • IP address 10.48.39.133 for the second one, referred to as WLC2.

In addition to these IP addresses, 2 additional ones into the same subnet (and VLAN) have been used, namely 10.48.39.131 and 10.48.39.132."
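
Editor's note: the check discussed above (wireless management interface and redun-management interface in the same VLAN) can be scripted against a saved running config. This is a minimal sketch, assuming the two lines have the form shown in the output posted earlier in the thread. The function name `management_vlans` is hypothetical, and the chassis 2 address is taken from the `show chassis rmi` output.

```python
import re

def management_vlans(running_config):
    """Return (wireless management VLAN, redun-management VLAN) as ints, or None if absent."""
    wmi = re.search(r"^wireless management interface Vlan(\d+)", running_config, re.M)
    rmi = re.search(r"^redun-management interface Vlan(\d+)", running_config, re.M)
    return (int(wmi.group(1)) if wmi else None,
            int(rmi.group(1)) if rmi else None)

# Lines taken from the configuration posted in this thread.
config = """\
wireless management interface Vlan657
redun-management interface Vlan1657 chassis 1 address 10.206.232.101 chassis 2 address 10.206.232.103
"""

wmi_vlan, rmi_vlan = management_vlans(config)
print("WMI VLAN:", wmi_vlan)
print("RMI VLAN:", rmi_vlan)
print("Same VLAN:", wmi_vlan == rmi_vlan)
```

For the configuration in this thread the two VLANs differ (657 versus 1657), which is the mismatch being discussed.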

 

Rich R
VIP

Also refer to the Best Practices guide: https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/technical-reference/c9800-best-practices.html#C9800CLconsiderations
