Configuring SSO on a pair of 9800-L issue

tdennehy
Level 2

I am trying to configure what should be a very simple setup.  Two 9800-Ls on a bench with a switch in between.  They can ping each other when I configure SSO on both boxes, and I can ping the secondary.  But neither will ever become the standby.

I'm wondering if there is "something else" that everyone always forgets to do when configuring SSO.  It's so simple, just using vlan1 on both, with 192.168.1.x addresses.

Waiting for remote chassis to join
#######################################################################################

wc01:

interface Port-channel1
 description ** uplink **
 switchport mode trunk
!
interface Port-channel2
 description ** uplink **
 switchport mode trunk
!
interface TenGigabitEthernet0/1/0
 switchport mode trunk
 no negotiation auto
 no mop enabled
 channel-group 1 mode on
!
interface TenGigabitEthernet0/1/1
 switchport mode trunk
 no negotiation auto
 channel-group 1 mode on
!
interface GigabitEthernet0
 vrf forwarding Mgmt-intf
 ip address 192.168.1.100 255.255.255.0
 negotiation auto
 no mop enabled
!
interface Vlan1
 ip address 192.168.1.249 255.255.255.0 secondary
 ip address 192.168.1.251 255.255.255.0
 ip helper-address 192.168.1.254
 no mop enabled
!
ip tftp source-interface GigabitEthernet0
ip route 0.0.0.0 0.0.0.0 192.168.1.254
ip route vrf Mgmt-intf 0.0.0.0 255.255.255.0 192.168.1.254
redun-management interface Vlan1 chassis 2 address 192.168.1.249 chassis 1 address 192.168.1.250

 

wc02:

!
interface Port-channel1
 description ** uplink **
 switchport mode trunk
!
interface Port-channel2
 description ** uplink **
 switchport mode trunk
!
interface TenGigabitEthernet0/1/0
 switchport mode trunk
 ! it's a 1 Gig SFP
 speed 1000
 no negotiation auto
 no snmp trap link-status
 no mop enabled
 channel-group 2 mode on
!
interface TenGigabitEthernet0/1/1
 switchport mode trunk
 speed 10000
 no negotiation auto
 no snmp trap link-status
 channel-group 2 mode on
!
interface GigabitEthernet0
 vrf forwarding Mgmt-intf
 ip address 192.168.1.101 255.255.255.0
 negotiation auto
 no mop enabled
!
interface Vlan1
 ip address 192.168.1.250 255.255.255.0 secondary
 ip address 192.168.1.252 255.255.255.0
 ip helper-address 192.168.1.254
 no mop enabled
!
ip tftp source-interface GigabitEthernet0
ip route 0.0.0.0 0.0.0.0 192.168.1.254
ip route vrf Mgmt-intf 0.0.0.0 255.255.255.0 192.168.1.254

redun-management interface Vlan1 chassis 1 address 192.168.1.250 chassis 2 address 192.168.1.249
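
A quick way to rule out a mismatch between the two redun-management lines is to parse them and compare the chassis-to-address mapping. The following is only a minimal sketch in Python; the helper name parse_rmi and the regular expression are made up for illustration and are not part of any Cisco tool.

```python
import re

# Hypothetical helper: matches the interface name and one or more
# "chassis N address A.B.C.D" pairs, in any order.
RMI_RE = re.compile(
    r"redun-management interface (\S+)((?: chassis \d+ address \S+)+)"
)


def parse_rmi(line):
    """Return (interface, {chassis_number: address}) for one config line."""
    match = RMI_RE.search(line)
    if match is None:
        raise ValueError(f"not a redun-management line: {line!r}")
    pairs = re.findall(r"chassis (\d+) address (\S+)", match.group(2))
    return match.group(1), {int(number): address for number, address in pairs}


# The two lines exactly as posted for wc01 and wc02.
wc01 = ("redun-management interface Vlan1 chassis 2 address 192.168.1.249 "
        "chassis 1 address 192.168.1.250")
wc02 = ("redun-management interface Vlan1 chassis 1 address 192.168.1.250 "
        "chassis 2 address 192.168.1.249")

# The order of the chassis keywords differs, but the mapping is identical.
print(parse_rmi(wc01) == parse_rmi(wc02))
```

For the two lines posted here the comparison comes out equal, so the RMI addressing itself is consistent on both boxes.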

Could I be missing something?  This should not be that difficult!!!

51 Replies

For SSO, there must be no port channel (PO) from the switch toward both WLCs!

This is wrong.

You need a single link from the switch to each WLC.

That is what causes the issue.

MHM

@MHM Cisco World - all of our 9800 HA SSO pairs run on port channels - it's part of the standard HA design.
It's shown in all the guides, for example on pages 45 and 49 of
https://www.ciscolive.com/c/dam/r/ciscolive/global-event/docs/2023/pdf/BRKEWN-2846.pdf

Friend,

Your link is correct, but check it.

There the PO is from the WLC to the switch.

In his case the PO is from the switch to both WLCs!

That does not work.

MHM

I think I found the issue.  I erased the config and started over.  Same exact issue.  As I was rebooting, I saw this message on the console.  I just happened to see it as it was scrolling past.

"Details: Chassis 2 is detected INCOMPATIBLE with software version of Active: FAILED: Version '17.03.04c' mismatch with Active's running version '17.09.03' for package: 'rp_base'  "

Do you know what that indicates?  I don't know where that version is coming from, since I put 17.09.04a on them both yesterday, thinking it might be a code issue.  Apparently something else is going on here.

Waiting for remote chassis to join

Chassis number is 1
All chassis in the stack have been discovered. Accelerating discovery
Jan 29 19:58:29.012: %BOOT-3-BOOTTIME_INCOMPATIBLE_SW_DETECTED: R0/0: issu_stack: Incompatible software detected. Details: Chassis 2 is detected INCOMPATIBLE with software version of Active: FAILED: Version '17.03.04c' mismatch with Active's running version '17.09.03' for package: 'rp_base'
Jan 29 19:58:29.094: %AUTO_UPGRADE-5-AUTO_UPGRADE_START_CHECK: R0/0: auto_upgrade_client: Auto upgrade start checking for incompatible switches.
Jan 29 19:58:29.227: %AUTO_UPGRADE-5-AUTO_UPGRADE_START_CHECK: R0/0: auto_upgrade_client: Auto upgrade start checking for incompatible switches.
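
To make the mismatch easier to see, the BOOTTIME_INCOMPATIBLE_SW_DETECTED line can be pulled apart with a small script. This is a minimal sketch in Python; the function name parse_mismatch and the pattern are my own illustration, written against the one log line quoted above rather than any documented message format.

```python
import re

# Pattern written against the single log line quoted in this thread.
MISMATCH_RE = re.compile(
    r"Chassis (\d+) is detected INCOMPATIBLE.*?"
    r"Version '([^']+)' mismatch with Active's running version '([^']+)' "
    r"for package: '([^']+)'"
)


def parse_mismatch(log_line):
    """Return the chassis, both versions and the package, or None if no match."""
    match = MISMATCH_RE.search(log_line)
    if match is None:
        return None
    chassis, peer_version, active_version, package = match.groups()
    return {
        "chassis": int(chassis),
        "peer_version": peer_version,
        "active_version": active_version,
        "package": package,
    }


log_line = (
    "Jan 29 19:58:29.012: %BOOT-3-BOOTTIME_INCOMPATIBLE_SW_DETECTED: R0/0: "
    "issu_stack: Incompatible software detected. Details: Chassis 2 is detected "
    "INCOMPATIBLE with software version of Active: FAILED: Version '17.03.04c' "
    "mismatch with Active's running version '17.09.03' for package: 'rp_base'"
)

print(parse_mismatch(log_line))
```

Neither of the two versions it extracts (17.03.04c on chassis 2, 17.09.03 on the active) is the 17.09.04a that was loaded on both boxes.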

@tdennehy 

There are a few conditions under which the WLC goes into recovery mode.  One of them is loss of gateway reachability from the standby WLC.

You can see the other possible reasons at this link.

Look for "Table 2. System and Network Fault Handling"

Cisco Catalyst 9800 Series Wireless Controller Software Configuration Guide, Cisco IOS XE Amsterdam 17.3.x - High Availability [Cisco Catalyst 9800 Series Wireless Controllers] - Cisco

 

I don't understand this new name of the WLC  
edh001-001-wc01(recovery-mode)>

Everything I have read indicates this means that the WLC can no longer ping the gateway, which in this case is simply vlan 1 on the switch with an IP of 192.168.1.254.

However, when the WLC has this name, it CAN ping the gateway.  In fact, both WLCs can ping the SVI.  It's almost as if the pair THINKS it cannot ping the gateway, but they both can.

Is there something else the pair is looking for, other than the gateway?  Are they trying to reach the Internet or something?

@tdennehy  This name (recovery-mode)> is a clever way the WLC has of telling you that something is not right with the setup.

In the link I shared above, we can see that there are a few situations where this can happen, not only a failed ping to the gateway.

Examples

Trigger | RP Link Status | Peer Reachability through RMI | Switchover | Result
-------------------------------------------------------------------------------------
Critical process crash | Down | Reachable | No | No action. One controller in recovery mode.
Forced switchover | Down | Reachable | N/A | No action. One controller in recovery mode.

 

RP Link: Down | Peer Reachability Through RMI: Reachable | Gateway From Active: Reachable | Gateway From Standby: Reachable
Switchover: No SSO
Result: Standby becomes active with (old) active going in to active-recovery mode. Configuration mode is disabled in active-recovery mode. All interfaces will be ADMIN DOWN with the wireless management interface having RMI IP. The controller in the active-recovery mode will reload to become standby when the RP link comes UP.

RP Link: Down | Peer Reachability Through RMI: Reachable | Gateway From Active: Unreachable | Gateway From Standby: Reachable
Switchover: RP link down, then active loses GW, then there won't be any SSO. GW down, within 8 seconds, RP link goes down, then there would be a SSO.
Result: Gateway reachability message is exchanged over RP+RMI links. Old-Active goes to active-recovery mode. The configuration mode is disabled in active-recovery mode. All interfaces will be ADMIN DOWN with the wireless management interface having RMI IP. The controller in active-recovery will reload to become standby (or standby-recovery if gateway reachability is still not available) when the RP link comes up.

RP Link: Down | Peer Reachability Through RMI: Reachable | Gateway From Active: Unreachable | Gateway From Standby: Unreachable
Switchover: No SSO
Result: Standby goes to standby-recovery
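
The three RP-link-down rows quoted above can also be read as a small lookup keyed on gateway reachability. The snippet below is only a sketch in Python of those three rows; the names RP_DOWN_OUTCOMES and outcome are illustrative, the result strings are my short paraphrases of the table, and any combination not quoted here is reported as not covered rather than guessed.

```python
# Keys: (gateway reachable from active, gateway reachable from standby).
# All three rows assume RP link Down and peer Reachable through RMI.
RP_DOWN_OUTCOMES = {
    (True, True): "No SSO; standby becomes active, old active goes to active-recovery",
    (False, True): "SSO depends on event order; old active goes to active-recovery",
    (False, False): "No SSO; standby goes to standby-recovery",
}


def outcome(gw_from_active, gw_from_standby):
    """Look up the paraphrased result for one of the quoted rows."""
    key = (gw_from_active, gw_from_standby)
    return RP_DOWN_OUTCOMES.get(key, "not covered by the rows quoted here")


# Both controllers can ping the gateway, the situation described in this thread.
print(outcome(True, True))
```

Note that even with the gateway reachable from both sides, an RP link that is down still leaves one controller in a recovery mode according to the first row.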

 

Leo Laohoo
Hall of Fame

Both units are Chassis 1?

Ugghhh.  I hope not.   I will go look!

edh001-001-wc02#sho chassis
Chassis/Stack Mac Address : 8c1e.806e.9080 - Local Mac Address
Mac persistency wait time: Indefinite
Local Redundancy Port Type: Twisted Pair
                                             H/W   Current
Chassis#   Role    Mac Address     Priority Version  State                 IP
-------------------------------------------------------------------------------------
*1       Active   8c1e.806e.9080     1      V02     Ready                169.254.1.250


edh001-001-wc01#sho chassis
Chassis/Stack Mac Address : 687d.b4fd.4640 - Local Mac Address
Mac persistency wait time: Indefinite
Local Redundancy Port Type: Twisted Pair
                                             H/W   Current
Chassis#   Role    Mac Address     Priority Version  State                 IP
-------------------------------------------------------------------------------------
*2       Active   687d.b4fd.4640     2      V02     Ready                169.254.1.249
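
The two outputs above can be compared side by side with a few lines of Python. This is a minimal sketch that assumes each data row of "sho chassis" has exactly the seven whitespace-separated fields shown here; parse_chassis_row is a made-up helper name.

```python
def parse_chassis_row(row):
    """Split one data row of 'sho chassis' into named fields."""
    chassis, role, mac, priority, hw_version, state, ip = row.split()
    return {
        "chassis": int(chassis.lstrip("*")),  # "*" marks the local unit
        "is_local": chassis.startswith("*"),
        "role": role,
        "mac": mac,
        "priority": int(priority),
        "hw_version": hw_version,
        "state": state,
        "ip": ip,
    }


# Data rows copied from the wc02 and wc01 outputs above.
wc02_row = "*1 Active 8c1e.806e.9080 1 V02 Ready 169.254.1.250"
wc01_row = "*2 Active 687d.b4fd.4640 2 V02 Ready 169.254.1.249"

units = [parse_chassis_row(row) for row in (wc02_row, wc01_row)]

# Are the chassis numbers distinct, and does each unit think it is Active?
distinct_numbers = len({unit["chassis"] for unit in units}) == len(units)
both_active = all(unit["role"] == "Active" for unit in units)
print(distinct_numbers, both_active)
```

So the chassis numbers are not duplicated (1 and 2), but each unit lists only itself and both report Active, which matches the symptom that neither ever becomes the standby.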

 

 

Scott Fella
Hall of Fame

I don't know which guide you followed, but when you enter this command, use a vlan that is not on your network; you just want the redundancy management on its own vlan.  You don't need to route this. Keep in mind, there are various ways this is done, and the docs show different ways. You also didn't show all the commands that are required.

Also, take a look at this guide, there are others that are good and some videos out there.

redun-management interface VlanXXX chassis 1 address 192.168.1.250 chassis 2 address 192.168.1.249

https://howiwifi.com/2021/01/17/cisco-9800-rmirp-high-availability-best-practice-configuration/

https://www.wiresandwi.fi/blog/cisco-wlc-9800-high-availability-sso-rmi-rp-cli-configuration

https://www.cisco.com/c/en/us/support/docs/wireless/catalyst-9800-series-wireless-controllers/220277-configure-high-availability-sso-on-catal.html#toc-hId-1451838582

-Scott
*** Please rate helpful posts ***

What I would do is upload the image again and verify that it doesn't error out and that you don't see any errors while updating.

-Scott
*** Please rate helpful posts ***