01-11-2023 08:32 PM
Trying to set up redundancy on 9800-CL in an ESXi environment. I have three NICs: Gi1 (OOB), Gi2 (trunk), Gi3 (HA). After issuing the following commands and reloading the controllers, one of the controllers goes into recovery mode and HA is not formed. Commands used:
chassis redundancy ha-interface gig3
&
redun-management interface Vlan40 chassis 1 address 192.168.31.4 chassis 2 address 192.168.31.5
IOS-XE version in use is 17.3.6. The controller that goes into recovery mode shows the following message: %RIF_MGR_FSM-6-RMI_LINK_DOWN. RMI link is down.
Looking for help and recommendations to resolve the issue.
Thanks
01-11-2023 10:53 PM
Hi,
Please follow these posts to set up HA between the two 9800-CL WLCs.
https://rowelldionicio.com/cisco-catalyst-9800-cl-high-availability/
https://wifininjas.net/2019/08/21/wn-blog-011-cisco-c9800-cl-wlc-redundancy-ha-sso/
Regards
Don't forget to rate helpful posts
01-11-2023 11:23 PM
- Check the current controller configuration (before the HA attempt) with the CLI command show tech wireless, and have the output analyzed by https://cway.cisco.com/
M.
01-11-2023 11:53 PM
Since this is virtual, first check how Gi3 is extended in the vSwitch, and check VLAN 40.
01-12-2023 03:06 AM - edited 01-12-2023 05:47 AM
- Also note that you can get additional state information on the RP (Gi3) with the command show platform hardware chassis active qfp datapath pmd ifdev (this also shows the state of the other interfaces). Compare the output before and after attempting to connect the redundancy peer, and correlate it with the intended network settings in the virtual environment (hypervisor).
M.
01-17-2023 11:20 AM
Thanks, all. Both 9800-CL controllers needed to be on the same ESXi host for HA (Gi3) to work. I had the 9800-CL controllers on two separate ESXi hosts. Once I moved them to a single ESXi host, the issue was fixed.
11-16-2023 01:23 PM
Does anyone know if this can work when the controllers are on different host servers? Having both controllers on the same host greatly reduces the benefit of HA. None of the configuration guides address this and so far our time working with TAC hasn't produced a resolution.
11-16-2023 02:52 PM
Maybe explain your scenario. If you are looking at different DCs, then try an N+1 deployment.
11-16-2023 04:11 PM
Sure thing. Our situation is very similar to the original post. We are trying to set up redundancy on 9800-CL in a VMware ESXi environment. We have followed the deployment guide and have a functioning controller with APs joined and active clients.
The issue is that we are unable to establish HA between the WLCs. We have multiple ESXi host servers in a cluster, and we would like to have WLC1 and WLC2 on different host servers. All of our attempts to get this working fail. The HA link will not establish.
All of the documentation we've found, such as "Configure Catalyst 9800 Wireless Controllers in High Availability (HA) Client Stateful Switch Over (SSO) in IOS-XE 16.12" (Cisco), only covers cases where WLC1 and WLC2 are on the same host.
What we need are the configuration steps for a setup where the WLCs are on different hosts. We worked with TAC multiple times and have not found a solution. We don't even know if it is a supported scenario.
If anyone has this running or can provide info on it, I'd greatly appreciate it.
11-16-2023 11:41 PM
>..Does anyone know if this can work when the controllers are on different host servers? Having both controllers on the same host greatly reduces the benefit of HA. None of the configuration guides address this and so far our time working with TAC hasn't produced a resolution.
- In that case, you need to make sure that the external VLANs bridging between the two controllers use the same VLAN tagging as defined for the inner HA on the 9800-CL pair.
M.
09-05-2024 12:01 AM - edited 09-05-2024 12:25 AM
OK, so for anyone like OP who runs into this problem, here is how I solved it:
We too have a VMware vSphere 8 Update 3 cluster (ESXi 8u3 Dell Custom A00). I saw the same behaviour: when the 9800-CL VMs are on the same host, the HA interface (Gi3) works fine, but as soon as they are moved to separate hosts, the 9800-CL keeps resetting and rebooting and won't form HA.
We are using one Distributed Virtual Switch (DVS) for data, so each host has a DVS switch and changes are easy to make.
The layout for the 9800 VM is as follows:
vnic 1 = OOB mgmt port group. Connect it in VMware and shut it down in IOS-XE, where it maps to Gi1 (the 9800-CL needs it connected, but you can shut it down in IOS-XE because OOB on a VM makes no sense, and you would need custom static routing as well).
vnic 2 = the wireless management interface. This is either an access mode port group or a trunk, depending on whether you need to trunk VLANs for central switching. We are using pure FlexConnect, so this is just set to our mgmt VLAN. On the 9800-CL that is Gi2, and it is the WMI and RMI.
vnic 3 = the HA interface. Set the VLAN on the DVS port group to NONE; we had it set to a VLAN and it wouldn't work when the 9800-CLs were on different hosts. In IOS-XE this is Gi3 (the HA interface).
I set both the vnic 2 and vnic 3 port groups on the DVS to promiscuous mode true and forged transmits true.
The reason, I think, is that the 9800-CL detects a switching loop when both Gi2 and Gi3 are connected to the same vSwitch or DVS and access mode is used. When you set the Gi3 port group to NONE, it is untagged only from the VMware side, so the 9800-CL does not see a switching loop.
As a side note, when you use vMotion, only move one 9800-CL VM at a time. Never move them both at the same time or it will bork and reset. And OP is right: no guide on cisco.com or anywhere else will tell you this. I noticed the screenshots in some of the guides use VLAN 0 / NONE for the HA interface, but they always describe a single ESXi host scenario.
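For anyone who wants to script a sanity check of the port group settings described above, here is a minimal Python sketch. It is only an illustration: the PortGroup class, its field names, and the sample port group names and VLAN number are made up for this example and are not a VMware API. You would fill the values in by hand or from your own inventory export.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PortGroup:
    """Hypothetical summary of one DVS port group (not a VMware API object)."""
    name: str
    vlan_id: Optional[int]   # None models the "VLAN: None" (untagged) setting
    promiscuous: bool
    forged_transmits: bool


def check_ha_portgroup(pg: PortGroup) -> List[str]:
    """Return deviations from the HA port group settings described in this post."""
    problems = []
    if pg.vlan_id is not None:
        problems.append(f"{pg.name}: VLAN is {pg.vlan_id}, expected None (untagged)")
    if not pg.promiscuous:
        problems.append(f"{pg.name}: promiscuous mode is off")
    if not pg.forged_transmits:
        problems.append(f"{pg.name}: forged transmits is off")
    return problems


# Sample values are assumptions for illustration only.
ha_ok = PortGroup("wlc-ha", None, True, True)
ha_bad = PortGroup("wlc-ha-old", 40, True, False)

print(check_ha_portgroup(ha_ok))
print(check_ha_portgroup(ha_bad))
```

An empty list means the port group matches what worked here; it does not prove the physical uplinks between hosts carry the untagged traffic.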
vwlc-a#show chassis detail
Chassis/Stack Mac Address : 0050.5697.3840 - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Chassis#   Role    Mac Address     Priority Version  State                 IP
-------------------------------------------------------------------------------------
*1       Active   0050.5697.3840     2      V02     Ready                169.254.135.112
 2       Standby  0050.5697.535f     1      V02     Ready                169.254.135.113
         Stack Port Status             Neighbors
Chassis#  Port 1     Port 2           Port 1   Port 2
--------------------------------------------------------
 1         OK         OK                2        2
 2         OK         OK                1        1
#
#
#
vwlc-a#show chassis ha-status active
My state = ACTIVE
Peer state = STANDBY HOT
Last switchover reason = active unit removed
Last switchover time = 00:16:34 AWST Thu Sep 5 2024
Image Version = 17.12.2
Chassis-HA      Local-IP          Remote-IP         MASK            HA-Interface
-----------------------------------------------------------------------------
This Boot:      169.254.135.112   169.254.135.113   255.255.255.0   GigabitEthernet3
Next Boot:      169.254.135.112   169.254.135.113   255.255.255.0   GigabitEthernet3
Chassis-HA      Chassis#   Priority   IFMac Address       Peer-timeout(ms)*Max-retry
-----------------------------------------------------------------------------------------
This Boot:      1          2          00:50:56:97:38:40   800*8
Next Boot:      1          2          00:50:56:97:38:40   800*8
vwlc-a#
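If you want to check the HA state from a script after a vMotion or reload, the "My state" and "Peer state" lines in the output above are easy to parse. Below is a minimal Python sketch, assuming you have already captured the text of show chassis ha-status active by some means (SSH library, copy and paste, etc.). The function names are my own, and the sample string is trimmed from the output in this post.

```python
import re


def parse_ha_states(output: str) -> dict:
    """Extract the 'My state' and 'Peer state' values from the CLI text."""
    states = {}
    for line in output.splitlines():
        m = re.match(r"\s*(My state|Peer state)\s*=\s*(.+?)\s*$", line)
        if m:
            states[m.group(1)] = m.group(2)
    return states


def ha_is_healthy(output: str) -> bool:
    """True only when this unit is ACTIVE and the peer is STANDBY HOT."""
    s = parse_ha_states(output)
    return s.get("My state") == "ACTIVE" and s.get("Peer state") == "STANDBY HOT"


# Trimmed copy of the output shown above.
sample = """\
vwlc-a#show chassis ha-status active
My state = ACTIVE
Peer state = STANDBY HOT
Last switchover reason = active unit removed
"""

print(parse_ha_states(sample))
print(ha_is_healthy(sample))
```

This only looks at the two state lines as they appear in this post; other releases may format the output differently, so treat the regex as a starting point.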