05-11-2021 03:59 AM - edited 07-05-2021 01:17 PM
Hi board,
In a C9800 WLC SSO cluster with RMI (SW version 17.3.3), what is the preferred way to replace a failed cluster member?
I tried the following and it was a disaster:
Assume chassis 2 has failed completely and its replacement needs to be integrated into the cluster:
1.) Make sure the SW version on the new chassis is the same (install mode)
2.) Cable the new factory default chassis to the network (channel uplink and RP)
3.) Assign the correct chassis number to the new chassis
# Exec mode
chassis 2 priority 1
reload
==> Chassis boots up with chassis number 2
4.) Base SSO configuration:
interface Vlan<MANAGEMENT-VLAN-ID>
 ip address 192.168.0.1 255.255.255.0
 no shutdown
!
redun-management interface Vlan<MANAGEMENT-VLAN-ID> chassis 1 address <RMI-IPv4-CHASSIS1> chassis 2 address <RMI-IPv4-CHASSIS2>
!
end
!
write memory
After this, the following log appeared on the new (factory-default) chassis:
WLC#wr
Building configuration...
[OK]
WARNING: Reload HA Chassis for RMI configuration to take effect
WLC#
*May 11 10:46:16.688: %SYS-6-PRIVCFG_ENCRYPT_SUCCESS: Successfully encrypted private config file
Chassis 2 reloading, reason - stack merge
May 11 10:46:20.850: %PMAN-5-EXITACTION: C0/0: pvp: Process manager is exiting:
May 11 10:46:21.194: %PMAN-5-EXITACTION: F0/0: pvp: Process manager is exiting:
May 11 10:46:42.457: %PMAN-5-EXITACTIONvp: Process manager is exiting: process exit with reload fru code
May 11 10:46:54.985: %PMAN-3-PROCESS_NOTIFICATION: R0/0: pvp: System report core/WLC_2_RP_0-system-report_20210511-104649-UTC.tar.gz (size: 12529 KB) generated and System report info at core/WLC_2_RP_0-system-report_20210511-104649-UTC-info.txt
Initializing Hardware ...
==> Chassis 2 reboots
However (and that's the problem), chassis 1 reboots as well!
=> Wireless service disruption, because both chassis are booting at the same time.
I would have expected only chassis 2 to reboot and integrate itself into the cluster...
Am I doing something wrong here, or have I hit a bug?
05-11-2021 05:50 AM
Yes, it's expected for the existing primary/active WLC to reboot as well during first-time HA pairing, so the setup should be performed in a change window. Replacing a unit with a new or different WLC is similar to initial pairing; otherwise the pair gets stuck in maintenance mode, which requires manual intervention anyway, i.e., rebooting both WLCs at the same time.
05-11-2021 07:41 AM
Wow... needing a maintenance window to replace a failed chassis... I've lost my faith in current products...
Thank you for the answer!!!
05-11-2021 07:56 AM
Just out of curiosity, did you verify that chassis 1 had its priority set to 2? If I recall correctly, chassis 2 should come up and become the standby, then chassis 1 should reboot and chassis 2 becomes active with no interruption. There is a chance that chassis 2 could restart, like what happened in your case, but it's not supposed to.
You can also verify and/or test this by bringing up a couple of 9800-CLs and seeing what happens.
05-12-2021 12:32 AM
Hey Scott,
so from my point of view, the priorities are only relevant in the election process, when both WLCs are booting.
But in my case, I set the priorities as recommended in the HA paper:
myWLC#show chassis
Chassis/Stack Mac Address : f4bd.abcd.f660 - Local Mac Address
Mac persistency wait time: Indefinite
Local Redundancy Port Type: Twisted Pair
                                             H/W   Current
Chassis#   Role    Mac Address     Priority Version  State                 IP
-------------------------------------------------------------------------------------
*1       Active   f4bd.abcd.f660     2      V02     Ready                169.254.54.130
2        Standby  f4bd.abcd.f5a0     1      V02     Ready                169.254.54.131
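As a side note, roles and priorities in output like the above could be checked with a short script before touching the pair. This is only a rough sketch: the function name and the regular expression are my own assumptions, derived purely from the column layout shown here, and it expects the captured text to have one chassis row per line, as the CLI prints it.

```python
import re

# Hypothetical row pattern, based only on the column layout shown in this post.
# Limitation: it assumes a single-word State column (e.g. "Ready"); rows with a
# multi-word state are simply not matched and are skipped.
ROW = re.compile(
    r"^\s*(?P<local>\*)?(?P<chassis>\d+)\s+(?P<role>\S+)\s+"
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+(?P<priority>\d+)\s+"
    r"(?P<version>\S+)\s+(?P<state>\S+)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s*$",
    re.IGNORECASE,
)


def parse_show_chassis(text):
    """Return one dict per chassis row found in captured 'show chassis' text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "chassis": int(m["chassis"]),
                "local": m["local"] is not None,  # '*' marks the local chassis
                "role": m["role"],
                "mac": m["mac"],
                "priority": int(m["priority"]),
                "state": m["state"],
                "ip": m["ip"],
            })
    return rows


# The two rows from the output above.
SAMPLE = """\
*1       Active   f4bd.abcd.f660     2      V02     Ready                169.254.54.130
2        Standby  f4bd.abcd.f5a0     1      V02     Ready                169.254.54.131
"""

rows = parse_show_chassis(SAMPLE)
active = next(r for r in rows if r["role"] == "Active")
standby = next(r for r in rows if r["role"] == "Standby")
print(active["chassis"], active["priority"], standby["chassis"], standby["priority"])
# prints: 1 2 2 1
```

Header and separator lines do not start with a chassis number, so they are ignored without special handling.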
Either way: if chassis 1 had failed and I replaced it (with priority 2), I would not expect the new chassis 1 to take over. I would expect it to integrate as the standby.
The SSO paper has a nice list of how the active WLC is chosen:
1. The wireless controller that is currently the active wireless controller.
2. The wireless controller with the highest priority value.
3. The wireless controller with the shortest start-up time.
4. The wireless controller with the lowest MAC address.
So, based on the list, the currently active WLC should keep its role in any case (unless it fails).
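To make that reading of the list concrete, here is a minimal sketch that models only the four quoted rules as a sort key. It is not how IOS-XE actually implements the election; the `Controller` class, the `elect_active` function, and the start-up times are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controller:
    chassis: int
    mac: str                 # dotted hex, e.g. "f4bd.abcd.f660"
    priority: int            # higher value wins (rule 2)
    startup_seconds: float   # hypothetical measure of start-up time (rule 3)
    currently_active: bool = False


def elect_active(candidates):
    """Pick the winner by applying the four quoted rules in order."""
    def rank(c):
        return (
            not c.currently_active,           # rule 1: the active unit sorts first
            -c.priority,                      # rule 2: highest priority value
            c.startup_seconds,                # rule 3: shortest start-up time
            int(c.mac.replace(".", ""), 16),  # rule 4: lowest MAC address
        )
    return min(candidates, key=rank)


# Running active chassis 1 versus a freshly booted replacement chassis 2.
# The start-up times are made-up values.
old = Controller(1, "f4bd.abcd.f660", 2, 300.0, currently_active=True)
new = Controller(2, "f4bd.abcd.f5a0", 1, 120.0)
print(elect_active([old, new]).chassis)  # prints: 1
```

Under this model the running active unit wins regardless of the replacement's priority, start-up time, or MAC address, which is why I would not expect chassis 1 to be disturbed.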
05-12-2021 12:47 AM
05-12-2021 12:55 AM
05-11-2021 11:22 PM
https://www.cisco.com/c/dam/en/us/td/docs/wireless/controller/9800/17-1/deployment-guide/c9800-ha-sso-deployment-guide-rel-17-1.pdf
On C9800-40 and C9800-80 wireless controllers, enable High Availability SSO using the following command on each of the two wireless controller units:
chassis redundancy ha-interface local-ip <local IP> <local IP subnet> remote-ip <remote IP>
Reload both wireless controllers by executing the command reload from the CLI.
Note: It is recommended to configure HA using the Redundancy Management Interface (RMI) starting Release 17.1. To see configuration using RMI, please see the Redundancy Management Interface section.
05-12-2021 01:49 AM
I have always rebooted both WLCs as part of initial bring-up or when adding a replacement, to avoid frustration, particularly when the RPs are connected across L2.
It appears that when a new WLC tries to add itself to the HA stack as standby-hot for the first time (election process), the existing active WLC has to reboot to do the initial sync at boot-up, and all the other config databases are synced once both are fully booted. This initial scenario is different from a failure scenario in which both WLCs had already synced in the past. There are many document references for this scenario. I am unable to find a Cisco document stating that a new or replacement WLC will sync with the existing active WLC without the active rebooting; please point it out if you find one.